Moving to a new server
I’m in the process of moving my whole site to a new server. What a headache. I think it will be worth it, though.
Anyway, you won’t see any new entries for a little while, and if the site goes down at all, you know why.
Please don’t comment until I say that everything is working on the new server, or your comment may be lost. Thanks.
UPDATE: I have hit a major snag which will probably delay the move quite a bit. My WordPress database on my current host is encoded in UTF-8. The database on the new host is also encoded in UTF-8. As far as I can tell, the encoding is preserved through every step of the export/import process. Why, then, do Chinese characters come out garbled? Not cool.
Did you check that the MySQL versions are the same? If the new server/host is using an older version of MySQL (can’t remember which version), it won’t support UTF-8. I had the same problem in one of my host changes.
Could it be that there are different versions of UTF-8? Or could it be a platform issue?
Actually, that seems to be what the problem is. The database is running on MySQL 4 now, but the new server is running MySQL 5. The collation language encodings are different (but I’m not sure if that’s at the root of the problem).
I really can’t accept that I can’t import my database for such a silly reason, though. I’m still working on it. What did you do?
I had this exact same problem with Drupal and a migration from MySQL 4 to MySQL 5. I spent days trying to find an easy solution, but apparently there is no easy way to do it, nor did I find a good tool to handle the job (the command-line tools that come with MySQL are useless for this sort of thing). Since Python has good Unicode support and a well-tested MySQL driver, I used a Python script to pull all the data off the MySQL server and serialize it to disk. Then after the migration (with garbled text and all), I used another script to replace the garbled text with properly encoded UTF-8 text. This is admittedly a somewhat low-level hack and a big pain to boot. I started to really dislike MySQL+Drupal after this experience. Anyway, email me if you want to go this route and want the gory details.
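For anyone curious what the "serialize it to disk" half might look like, here is a minimal sketch. It assumes the rows have already been fetched from the old server as ordinary Python strings; the `save_rows` and `load_rows` helpers and the sample row layout are hypothetical, this is not feihong's actual script, and the database access itself is left out.

```python
import json
import os
import tempfile


def save_rows(rows, path):
    """Write rows (a list of dicts of plain values) to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file itself.
        json.dump(rows, f, ensure_ascii=False)


def load_rows(path):
    """Read the rows back, decoding the file explicitly as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical sample data standing in for rows pulled from the old server.
rows = [{"id": 1, "title": "你好"}, {"id": 2, "title": "plain ASCII"}]
path = os.path.join(tempfile.mkdtemp(), "backup.json")
save_rows(rows, path)
restored = load_rows(path)
```

Naming the encoding on both the write and the read is the point here: the backup no longer depends on whatever default either machine happens to use.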
Oh, forgot to mention: Yes, John, the collation encodings are at the heart of the problem. What it amounts to is this: MySQL 4 does not know how to export UTF-8 data in the format that MySQL 5 can import. Why? My guess is that those Swedish guys from MySQL AB don’t use UTF-8 much and didn’t think about this sort of thing.
However, I’m not too familiar with WordPress, and there’s no reason to assume that there isn’t some way WordPress can compensate for MySQL’s flaws. Is there any kind of XML import/export feature in WordPress, or perhaps a WordPress plugin?
My solution is probably not for you. At the time I hadn’t been blogging with Korean for long so I ran my site on both servers and edited entries with Korean on the new server cutting and pasting the old text in. It took a long time but it worked. However, now that I have hundreds of entries on the Korean blog I wouldn’t repeat the process.
You may want to try installing MySQL 4 on your desktop, importing the database, and then upgrading it to MySQL 5. Finally, export the upgraded database and import it at your new host. I’m not sure if that would work, but I would think it’s worth a try.
Proper UTF-8 support appears in MySQL 4.1.x, so I guess you are probably importing from MySQL 4.0.x. However, when I migrated my sites from 4.0 to 4.1 (and subsequently to 5.x), there did not appear to be any issue: all my Traditional Chinese posts turned out to be fine. Probably because I never bothered to encode my database with UTF-8. They are all in 8-bit latin-1 so all the gibberish gets converted untouched 🙂
Another thing to look out for is whether the web server is sending out the wrong content type. Even though you have specified UTF-8 in your template, the web server might still “suggest” latin-1/iso-8859-1 to the browser. That happens with the default configuration of my server (Apache 2), and I had to force it to stop giving suggestions.
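One way to check for this is to look at the Content-Type header the server actually sends and see which charset it names. The sketch below only parses a header value you have already copied from your browser's developer tools or similar; the function names are made up for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def server_overrides_utf8(header_value):
    """True if the header names a charset and that charset is not UTF-8."""
    charset = charset_from_content_type(header_value)
    return charset is not None and charset != "utf-8"
```

If `server_overrides_utf8` returns True for a page whose template declares UTF-8, the server is the one "suggesting" the wrong encoding.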
feihong, EFL Geek, and Scott Y,
THANK YOU for all the help. All my searching and e-mails to support were doing no good, and I was beginning to despair. At least now there are some ideas to try.
Some more info (confirming Scott’s suspicions):
My plan of action is now this:
Try some different WordPress export plugins to see if they can somehow solve the encoding issue. It’s sort of a long shot, but it’s by far the easiest solution, so I think I should try it.
Install MySQL locally, import, then upgrade locally. Then export, and import to the new server.
Try feihong’s script. (Do I need shell access for this?)
One important detail is that at my old server I don’t have shell access. I do at my new server, but I don’t see how that helps.
Found this (in Simplified Chinese):
Hopefully it helps.
Shell access is great, my host offers that and it’s really powerful for installing and backing up. I particularly like CVS which I use for upgrading my moodle installation.
Good luck with everything.
I’ve had my share of encoding problems with hostgator, myself. Who’s your host, EFL Geek?
Without shell access it will not work. I think you had better try your plan of action #2, and if the local upgrade doesn’t preserve the encodings, then you could run the Python script on your own machine. Assuming you don’t find some nice export plugin first.
I’ve had this issue, and the solution is to dump the database with the default character set explicitly set to latin1. The following command will do the trick if you have shell access:
mysqldump -u [user] -p --default-character-set=latin1 [database] > out.sql
If you’re using a web interface you’ll need to figure out how to set the character encoding for the database dump.
Technically, UTF-8 is fully ASCII-compatible: ASCII characters keep their single-byte values, and no byte of a multi-byte character falls in the ASCII range. This means that applications like MySQL, which rely only on ASCII characters to process data (such as quotation marks around strings), can handle UTF-8 transparently. If all you want to do is move data into and out of a database, keep your default encodings set to latin1; indexing and searching will be slightly faster, and you won’t have to worry about this sort of thing.
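The ASCII-compatibility claim is easy to verify for yourself. The following sketch (the helper name and the sample string are invented for the example) checks that ASCII characters encode to themselves and that every byte of a non-ASCII character is 0x80 or above, so a quotation mark byte can never appear inside a Chinese character.

```python
def is_ascii_safe(text):
    """Check the two properties that make UTF-8 safe for ASCII-only parsers."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII characters must encode to the identical single byte.
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            # No byte of a multi-byte character may look like ASCII.
            return False
    return True


sample = "title = '中文'"
# Byte offsets of the quote character in the encoded string.
quote_positions = [
    i for i, b in enumerate(sample.encode("utf-8")) if b == ord("'")
]
```

The only quote bytes found are the two real delimiters, which is why a parser that looks only for ASCII quotes is not confused by the multi-byte characters between them.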
One more thing…
MySQL dumps its database content to a plain text file consisting of the SQL syntax needed to recreate the database on a new install. So once you’ve dumped the database, you can check that the content has dumped correctly by manually opening the file and trying to view the Chinese content. In the event you have future problems, this will at least help you identify which server is giving you trouble.
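If you would rather not eyeball the file, a rough check can be scripted. This is a sketch under some assumptions: `classify_dump` is a made-up helper, it only looks for characters in the main CJK block, and it uses Python's latin-1 codec, which differs from MySQL's latin1 for a handful of bytes, so treat the answer as a hint rather than proof.

```python
def has_cjk(text):
    """True if the text contains a character in the main CJK ideograph block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def classify_dump(data):
    """Classify raw dump bytes as 'not utf-8', 'utf-8 with cjk',
    'double-encoded', or 'no cjk found'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    if has_cjk(text):
        return "utf-8 with cjk"
    try:
        # Undo one round of "UTF-8 bytes misread as Latin-1".
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "no cjk found"
    return "double-encoded" if has_cjk(repaired) else "no cjk found"
```

To use it, read the dump in binary mode (for example `open("out.sql", "rb")`) and pass the bytes in. A result of "double-encoded" suggests the text was converted to UTF-8 a second time somewhere along the way.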
That being said, I’m pretty confident that setting everything to latin1 will solve your problems.
Mark, I originally looked into hostgator, but there was something in their TOS about email limits that would conflict with the Moodle installation I use with my students, otherwise they were my first choice.
I ended up going with Site 5 (affiliates link embedded). I’ve been incredibly satisfied with site5. The service is great and the plans are well worth it.
If you run multiple domains, MultiSite is really cool – each domain gets its own control panel, which makes organizing files really easy. Additionally, you will get a private IP for your account, which is really useful as I won’t get banned in China just because of some other site on the same server as me.
Anyhow if you have any more questions drop by my site or send me an email: eslteacher at gmail dot com
I was all hung up on the entries being encoded in UTF-8 on both ends, when in reality, my entries were not encoded in UTF-8 in the database as I thought. WordPress encoded in UTF-8, then MySQL stored as Latin1. Evidently there were no problems with that.
So I was able to import to the new database as “Latin1” (instead of UTF-8), and everything now displays fine! Hard to believe it was that easy to fix.
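For anyone who wants to see why this works, here is a small simulation in Python. It is only a sketch: the function names are invented, and Python's latin-1 codec stands in for MySQL's latin1, which is not byte-for-byte identical to it.

```python
def store_as_latin1(utf8_bytes):
    """What a latin1 column 'thinks' it holds: one character per byte."""
    return utf8_bytes.decode("latin-1")


def export_as_latin1(stored):
    """Dumping as latin1 writes each character back as its original byte."""
    return stored.encode("latin-1")


def export_as_utf8(stored):
    """Dumping as UTF-8 re-encodes every 'character', doubling up the bytes."""
    return stored.encode("utf-8")


original = "中文博客"
utf8_bytes = original.encode("utf-8")   # what WordPress sent to the database
stored = store_as_latin1(utf8_bytes)

latin1_dump = export_as_latin1(stored)  # identical to what WordPress sent
utf8_dump = export_as_utf8(stored)      # longer, and garbled when displayed
```

The latin1 path hands back exactly the bytes WordPress wrote, so the page decodes them as UTF-8 and shows the right characters. The UTF-8 path converts data that was already UTF-8, which is where the garbling comes from.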
Thanks to everyone for all the help.
That’s great, John! That sounds way simpler than what I did. How did you set it to import as Latin1? Was that through phpMyAdmin on cPanel? Or was that some setting in WordPress?
I had trouble restoring my own website once. I had a backup of the WordPress database. I was working on the same server and my encoding was UTF-8. But when I restored it, some of the Chinese characters went crazy. I still don’t know why.
hey guys, i ran into the same issue when i was trying to move my site to dreamhost. however, after some research i found that it had nothing to do with the ‘encoding’, but with the ‘collation’ of a new version of MySQL. i have posted a very simple solution at http://blog.jtam.org/2006/01/mysql-import-to-dreamhost/. i am sorry but it’s in chinese. if you can’t read it, you may email me at jeffreytam[at]gmail.com for a patch for wordpress 2.0.1. good luck~
[…] Webmasters .ORG has a helpful post, as does Sinosplice. Reading through their notes is giving me a headache, so I’ll have to save this issue for […]