Migrating Blogs (mysql export/import)

This site used to run on Wordpress and it served me well for the 8 years I used it.  There were so many features I never ever used that I decided to start over and write my own blog engine using django.  Whenever I've migrated copyandwaste to a different platform it's been pretty easy.. there were always builtin tools in the platform to export/import stuff. I didn't write any tools to migrate from wordpress to my django blog, I used built-in tools in MySQL to get the job done.

My Legacy Posts Table

mysql> describe wp_posts;
+-----------------------+---------------------+------+-----+---------------------+----------------+
| Field                 | Type                | Null | Key | Default             | Extra          |
+-----------------------+---------------------+------+-----+---------------------+----------------+
| ID                    | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| post_author           | bigint(20) unsigned | NO   | MUL | 0                   |                |
| post_date             | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_date_gmt         | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_content          | longtext            | NO   |     | NULL                |                |
| post_title            | text                | NO   |     | NULL                |                |
| post_category         | int(4)              | NO   |     | 0                   |                |
| post_excerpt          | text                | NO   |     | NULL                |                |
| post_status           | varchar(20)         | NO   |     | publish             |                |
| comment_status        | varchar(20)         | NO   |     | open                |                |
| ping_status           | varchar(20)         | NO   |     | open                |                |
| post_password         | varchar(20)         | NO   |     |                     |                |
| post_name             | varchar(200)        | NO   | MUL |                     |                |
| to_ping               | text                | NO   |     | NULL                |                |
| pinged                | text                | NO   |     | NULL                |                |
| post_modified         | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_modified_gmt     | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_content_filtered | text                | NO   |     | NULL                |                |
| post_parent           | bigint(20) unsigned | NO   | MUL | 0                   |                |
| guid                  | varchar(255)        | NO   |     |                     |                |
| menu_order            | int(11)             | NO   |     | 0                   |                |
| post_type             | varchar(20)         | NO   | MUL | post                |                |
| post_mime_type        | varchar(100)        | NO   |     |                     |                |
| comment_count         | bigint(20)          | NO   |     | 0                   |                |
+-----------------------+---------------------+------+-----+---------------------+----------------+

My New Posts Table

mysql> describe blog_post;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| title     | varchar(200) | NO   |     | NULL    |                |
| body      | longtext     | NO   |     | NULL    |                |
| tags      | varchar(255) | NO   |     | NULL    |                |
| slug      | varchar(255) | YES  | UNI | NULL    |                |
| author_id | int(11)      | YES  | MUL | NULL    |                |
| pub_date  | datetime     | YES  |     | NULL    |                |
| syn_date  | datetime     | YES  |     | NULL    |                |
| is_public | tinyint(1)   | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

Exporting Data from the Legacy Table for the new Table

As you can see there only a few columns that are the same, and since I'm the only author of posts on here I don't really need to keep track of the author id from the legacy table. I only care about published posts and the only columns I need to export are post_title, post_date, and post_content. So I wrote the select statement which does that and just to keep my sanity I select these columns "as" the names in my new table. But the good part is "INTO OUTFILE" this allows me to save the select into a filename of my choice.

select         
post_title as title,         
post_content as body,         
post_date as pub_date   

INTO OUTFILE "/tmp/cprm-wordpress_posts.sql"
FIELDS TERMINATED BY '~~~' LINES TERMINATED BY '^^^'
from wp_posts  
where post_status = "publish" and post_type = "post" order by post_date asc;

After I select the data into the outfile I loaded the data file into my new table:

LOAD DATA LOCAL INFILE '/tmp/cprm-wordpress_posts.sql' 
INTO TABLE blog_post
FIELDS TERMINATED BY '~~~' 
LINES TERMINATED BY '^^^' 
(title,body,pub_date);

At this point I've got the meat and potatoes of my posts in the new database table but need to do a little maintenance. Since I am the first user created in my new django blog engine, my user account has an id of 1. Also all the posts I am importing were published so I need to update all the rows in my table with is_public=1.

update blog_post set autor_id=1;
update blog_post set is_public=1;

What else do I care about? oh yeah comments.

Legacy comments table

mysql> describe wp_comments;
+----------------------+---------------------+------+-----+---------------------+----------------+
| Field                | Type                | Null | Key | Default             | Extra          |
+----------------------+---------------------+------+-----+---------------------+----------------+
| comment_ID           | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| comment_post_ID      | bigint(20) unsigned | NO   | MUL | 0                   |                |
| comment_author       | tinytext            | NO   |     | NULL                |                |
| comment_author_email | varchar(100)        | NO   |     |                     |                |
| comment_author_url   | varchar(200)        | NO   |     |                     |                |
| comment_author_IP    | varchar(100)        | NO   |     |                     |                |
| comment_date         | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| comment_date_gmt     | datetime            | NO   | MUL | 0000-00-00 00:00:00 |                |
| comment_content      | text                | NO   |     | NULL                |                |
| comment_karma        | int(11)             | NO   |     | 0                   |                |
| comment_approved     | varchar(20)         | NO   | MUL | 1                   |                |
| comment_agent        | varchar(255)        | NO   |     |                     |                |
| comment_type         | varchar(20)         | NO   |     |                     |                |
| comment_parent       | bigint(20) unsigned | NO   | MUL | 0                   |                |
| user_id              | bigint(20) unsigned | NO   |     | 0                   |                |
+----------------------+---------------------+------+-----+---------------------+----------------+

New comments table

mysql> describe django_comments;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| content_type_id | int(11)      | NO   | MUL | NULL    |                |
| object_pk       | longtext     | NO   |     | NULL    |                |
| site_id         | int(11)      | NO   | MUL | NULL    |                |
| user_id         | int(11)      | YES  | MUL | NULL    |                |
| user_name       | varchar(50)  | NO   |     | NULL    |                |
| user_email      | varchar(75)  | NO   |     | NULL    |                |
| user_url        | varchar(200) | NO   |     | NULL    |                |
| comment         | longtext     | NO   |     | NULL    |                |
| submit_date     | datetime     | NO   |     | NULL    |                |
| ip_address      | char(15)     | YES  |     | NULL    |                |
| is_public       | tinyint(1)   | NO   |     | NULL    |                |
| is_removed      | tinyint(1)   | NO   |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+

Exporting wp_comments

As you can see, all I care about is the name, email, url, comment that a user made in wordpress.

select comment_author, 
       comment_author_email,
       comment_author_url,
       comment_author_IP,
       comment_content,
       comment_date,
       comment_post_ID
INTO OUTFILE "/tmp/cprm-wordpress_comments.sql"
FIELDS TERMINATED BY '~~~' LINES TERMINATED BY '^^^'
       from wp_comments
       where comment_approved=1; 

Import Data into new table

LOAD DATA LOCAL INFILE '/tmp/cprm-wordpress_comments.sql' 
INTO TABLE django_comments
FIELDS TERMINATED BY '~~~' 
LINES TERMINATED BY '^^^' 
(user_name, user_email, user_url, ip_address, comment, submit_date, object_pk);

Cleaning-up

Django-comments requires you to set the content_type_id, in my setup the Post model id is 18.

update django_comments set content_type_id=18;

Conclusion

Because I made my object relations simple and stripped down functionality the migration was pretty painless. I can never remember mysql syntax so I templated out mysql table export and mysql table import templates over at HatchConfigs.

Tagged as hatch , migration mysql , wordpress
Written by Andrew Konkol on April 5th, 2012

2 Comments

Nice post, I was thinking about doing the same thing with Ruby. I moved my blog from Wordpress to Jekyll late last year but find myself wanting something database driven again. I like how you built comment authentication via Twitter, Facebook, etc.

Jeff Judge
April 5, 2012
I was going to use disqus, but it looks bad and I'm a minimalist. Managing all the social auth stuff is kind of a pain in the ass, but a learning experience nonetheless.

Andrew Konkol
May 7, 2012
Log in with Twitter, Google, Facebook, LinkedIn to leave a comment.