Last week we looked at converting WordPress HTML into normal HTML and Markdown (Rendering Markdown and HTML in Ruby).

This is a great first step, but we also need a way of taking the generated WordPress export and migrating the data to the new application.

This needs to be an automated process so I can run it periodically during development; then, when I put the new version live, the process will already be in place and I’ll just need to run it.

There are a couple of existing solutions for converting WordPress data to a new format, but because I’m not migrating to an off-the-shelf CMS, none of them would work for me.

Fortunately it’s not too difficult to do the job ourselves.

In today’s tutorial I will be walking through how I wrote my migration task.

Generating the Rake Task

Whenever I want to migrate the data, I’m going to run a command in Terminal to kick things off. Rails uses Rake (Understanding and Using Ruby Rake) for command-line tasks, so we can write our own Rake task to run the migration.

First we need to generate a new Rake task:

```bash
rails g task wordpress import
```

This will generate a new `import` task that is namespaced under `wordpress`.

If you look in the `lib/tasks` directory, you should find a new file called `wordpress.rake` (the generator fills in a placeholder `desc`, which I’ve replaced with a real description):

```ruby
namespace :wordpress do
  desc "Import WordPress data"
  task import: :environment do
  end
end
```

Reading the WordPress XML Export

Next we need to take the XML export that WordPress generates and read it into a structure we can work with.

In order to do this, I will be using the Nokogiri gem.

Add the following line to your Gemfile:

```ruby
gem "nokogiri"
```

And run the following command in Terminal:

```bash
bundle install
```

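Next I’m going to create a new directory under `lib` called `word_press` and, inside it, a new file called `data.rb`:

```ruby
module WordPress
  class Data
  end
end
```

Naming the directory `word_press` means Rails’ autoloader maps it to the `WordPress` module. Depending on your Rails version, you may also need to add `lib` to the autoload paths (or `require` the files from the Rake task) before `WordPress::Data` can be found.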

In order to pass the XML into Nokogiri, we first need to read the file.

I’ll handle this in the initialize method:

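```ruby
attr_reader :doc

def initialize
  path = File.expand_path("wordpress.xml")
  # Strip the EOT control character (\u0004), which is not valid in XML,
  # and assign to an instance variable so the attr_reader can return it
  @doc = Nokogiri::XML(File.read(path).gsub("\u0004", ""))
end
```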
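The `gsub` call above strips only the EOT character (`\u0004`). If an export turns out to contain other stray control characters, a broader filter is one option. Below is a minimal sketch, assuming you want to drop every C0 control character that XML 1.0 forbids (everything below `\u0020` except tab, line feed and carriage return); the `sanitize_xml` helper is a hypothetical name, not part of Nokogiri.

```ruby
# Hypothetical helper: removes the C0 control characters that are not
# allowed in XML 1.0 documents. Tab (\u0009), line feed (\u000A) and
# carriage return (\u000D) are legal, so they are left alone.
INVALID_XML_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F]/

def sanitize_xml(raw)
  raw.gsub(INVALID_XML_CHARS, "")
end

# The EOT character is removed while the trailing newline survives
puts sanitize_xml("<title>Hello\u0004 World</title>\n").inspect
```

In `initialize`, `sanitize_xml(File.read(path))` could then take the place of the single-character `gsub`.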

In `initialize` I’m hard-coding the path to the XML export. You could pass it in as an argument to the Rake task instead, but because this is specific to my application, and it’s never going to change, I don’t mind hard-coding it.
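Passing the export path from the command line, rather than hard-coding it, could look something like the following sketch. It uses Rake’s standard task-argument syntax and falls back to `wordpress.xml`. In a Rails app the task would be declared as `task :import, [:path] => :environment`, and `WordPress::Data#initialize` (which takes no arguments as written) would need a hypothetical `path` parameter to receive the value.

```ruby
require "rake"

# Make task/namespace/desc available at the top level of this script
extend Rake::DSL

namespace :wordpress do
  desc "Import WordPress data"
  # [:path] declares a named task argument
  task :import, [:path] do |_task, args|
    # Fall back to the hard-coded file when no path is supplied
    args.with_defaults(path: "wordpress.xml")
    path = File.expand_path(args[:path])
    puts "Importing from #{path}"
  end
end

# Equivalent to running: rake "wordpress:import[exports/blog.xml]"
Rake::Task["wordpress:import"].invoke("exports/blog.xml")
```

From the shell, the argument goes in square brackets, e.g. `rake "wordpress:import[exports/blog.xml]"`; the quotes stop shells such as zsh from treating the brackets as a glob pattern.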

Finally, I’m going to provide a single method for getting the posts from the export:

```ruby
def posts
  # Select only <item> elements whose wp:post_type is "post"
  doc.xpath("//item[wp:post_type = 'post']").collect do |post|
    WordPress::Post.new(post)
  end
end
```

Nokogiri provides an XPath interface for traversing the XML structure. I’m only interested in the posts, so that’s the only query I need.

The query returns a `Nokogiri::XML::NodeSet` of `<item>` elements. I `collect` over it, wrapping each element in a new `Post` object, and the resulting array is returned from this method.

For my application, I’m using the posts as an entry point for getting all of the data from the export.

Creating the Data Objects

The next step is to create Data Objects for each of the types of data you want to migrate.

```ruby
module WordPress
  class Post
    def initialize(doc)
      @doc = doc
    end
  end
end
```

By wrapping the Nokogiri element in a Ruby class, I can apply any customisations and conversions as the object is read.

For example, if you just want to pass the data on, you can simply provide a method and return the value:

```ruby
def title
  @doc.xpath("title").text
end

def slug
  @doc.xpath("wp:post_name").text
end
```

But if you want to convert to a different format, you can encapsulate that in the method.

For example, in last week’s tutorial I was converting the WordPress HTML into Markdown and regular HTML.

I can handle this conversion process inside this class:

```ruby
def content
  content = @doc.xpath("content:encoded").text
  content = format_syntax_highlighter(content)
  # Collapse runs of two or more newlines into a single blank line
  content.gsub(/\n{2,}/, "\n\n")
end

def html
  Render::HTML.new.render(markdown)
end

def markdown
  # Memoise the converted Markdown so it is only rendered once
  @markdown ||= Render::Markdown.new.render(content)
end

private

# Turn [lang]...[/lang] shortcodes into fenced code blocks
def format_syntax_highlighter(text)
  text.gsub(/\[(\w+)\](.+?)\[\/\1\]/m) do
    "\n```#{$1}#{$2}```\n"
  end
end
```

To the outside world, this conversion process is completely hidden.
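Because the text transformations are plain Ruby, they can be exercised without a WordPress export at all. The sketch below copies `format_syntax_highlighter` and pulls the blank-line collapsing out into a hypothetical `collapse_blank_lines` helper, then runs both over a small input containing a `[ruby]` shortcode.

```ruby
# Copy of the shortcode conversion: the \1 back-reference makes sure
# [ruby] is closed by [/ruby], and /m lets .+? span multiple lines.
def format_syntax_highlighter(text)
  text.gsub(/\[(\w+)\](.+?)\[\/\1\]/m) do
    "\n```#{$1}#{$2}```\n"
  end
end

# Hypothetical helper: any run of two or more newlines becomes one blank line
def collapse_blank_lines(text)
  text.gsub(/\n{2,}/, "\n\n")
end

input = "Intro\n\n\n\n[ruby]\nputs 1\n[/ruby]"
output = collapse_blank_lines(format_syntax_highlighter(input))
puts output
```

One thing to watch: the pattern matches any `[word]...[/word]` pair, so other shortcodes such as `[caption]...[/caption]` would also be turned into code fences.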

You can also create more classes to encapsulate related entities. For example, each post will have related comments so I can repeat the process of collecting these related items:

```ruby
def comments
  @doc.xpath("wp:comment").collect do |comment|
    WordPress::Comment.new(comment)
  end
end
```

Now I can deal with the comment-specific formatting in its own object.

Importing the Data

Finally, back in the `wordpress.rake` task, we can deal with the actual import process.

This means taking each object from the WordPress data export and creating the corresponding Active Record objects and relations.

```ruby
namespace :wordpress do
  desc "Import WordPress data"
  task import: :environment do
    # Get the WordPress data
    data = WordPress::Data.new

    # Import the posts (named `post` so it doesn't shadow `data`)
    data.posts.each do |post|
      article = Article.new
      article.title = post.title
      # etc.
      article.save!
    end
  end
end
```

The structure I’ve decided on for my CMS is more complicated than a regular blog’s, so this provides a nice opportunity to build the object graph for each article. That is something I definitely could not have done if I had used a general-purpose solution.
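Because the task is meant to be re-run throughout development, it may help to make it idempotent, so a second run updates existing rows instead of duplicating them. With Active Record this is usually done by calling `Article.find_or_initialize_by(slug: post.slug)` before assigning attributes and calling `save!`. The sketch below shows the same upsert-by-slug idea in plain Ruby so it can run without Rails; `ArticleStore`, `find_or_initialize_by_slug` and the two `Struct`s are hypothetical stand-ins for the `Article` model and the `WordPress::Post` objects.

```ruby
# Hypothetical stand-ins for the Article model and WordPress::Post wrappers
Article = Struct.new(:slug, :title)
PostData = Struct.new(:slug, :title)

# In-memory stand-in for the articles table, keyed by slug
class ArticleStore
  def initialize
    @articles = {}
  end

  # Mirrors Article.find_or_initialize_by(slug: slug)
  def find_or_initialize_by_slug(slug)
    @articles[slug] || Article.new(slug)
  end

  def save(article)
    @articles[article.slug] = article
  end

  def count
    @articles.size
  end
end

def import(posts, store)
  posts.each do |post|
    article = store.find_or_initialize_by_slug(post.slug)
    article.title = post.title
    store.save(article)
  end
end

store = ArticleStore.new
posts = [PostData.new("hello-world", "Hello world"), PostData.new("about", "About")]

# Running the import twice still leaves one article per slug
import(posts, store)
import(posts, store)
puts store.count
```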

Conclusion

In today’s tutorial we’ve covered a couple of interesting areas of Ruby development including creating Rake tasks as well as the very useful Nokogiri gem.

By encapsulating each chunk of data from the WordPress export as a class we can deal with whatever conversion details we require.

Although there are many existing solutions for migrating data from a WordPress blog, none of them came close to satisfying my requirements.

Hopefully if you are looking to do the same, you can use these last two posts as a foundation for building what you need.