Unicode with Blogger

10 Sep 2003

Unicode is great, but so far underused. It’s basically a newer, larger character set designed to make multilingual computing easier, indirectly bringing peace and harmony to all. Maybe one day we’ll be free of the mojibake and luanma (that’s Japanese and Chinese for “garbage characters”) that thwart our otherwise well-intended communications. Unicode is a step in the right direction.

What does implementing Unicode mean? It means you’ll no longer load up a page to find “garbage characters” and have to change the encoding used for the page. It means you can have text in completely different languages and scripts (say, Chinese and Korean and French) on the same page. Check out Glome for a good example of that. Unicode is great.
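
To make that concrete, here is a minimal Python sketch of where garbage characters come from and why one Unicode encoding can hold several scripts at once. Python is just my choice for illustration, and the sample strings and variable names are made up; nothing here is specific to Blogger.

```python
# Text is stored as bytes; "garbage characters" appear when the bytes are
# decoded with a different encoding than the one used to write them.
name = "阿福"
utf8_bytes = name.encode("utf-8")

# Reading UTF-8 bytes as Latin-1 never fails, it just yields the wrong characters.
garbled = utf8_bytes.decode("latin-1")
# Reading them with the right encoding gives the original text back.
restored = utf8_bytes.decode("utf-8")

# One UTF-8 page can carry Chinese, Korean, and French together.
mixed = "中文 한국어 français"
mixed_round_trip = mixed.encode("utf-8").decode("utf-8")

# A legacy Chinese encoding such as GB2312 has no slot for Korean Hangul.
try:
    mixed.encode("gb2312")
    gb2312_can_hold_mixed = True
except UnicodeEncodeError:
    gb2312_can_hold_mixed = False

print(garbled)
print(restored)
print(gb2312_can_hold_mixed)
```

The point of the last part is that the older per-language encodings force a choice of one language per page, which is exactly the limitation Unicode removes.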

I bring this up largely because I think other China bloggers really ought to adopt Unicode in their blogs. Alf’s latest post reminded me of that. Even though he entered his Chinese name, “阿福,” correctly in Blogger, I can’t read it even when I change the encoding, and he made that post on my computer!

So I’d like to provide some instructions for those that use Blogger.

1. In Blogger, go to Settings, then Formatting.

2. Change the Encoding to “Universal (Unicode UTF-8)”.

3. Save Changes.

4. Go into the Blogger template.

5. In the <head> section of the document (that’s the part between the <head> and </head> tags), insert this line:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

6. Save Changes.

7. Publish.

Now when people visit your blog, it will automatically load with Unicode encoding and characters should display fine.
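
If you want to sanity-check the template edit from the steps above, something like the following Python sketch could confirm that the <head> really declares UTF-8. It uses the standard library’s html.parser module; the CharsetFinder class, the declared_charset function, and the sample template are hypothetical names invented for this example, not part of Blogger.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the charset from the first Content-Type <meta> tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.charset is not None:
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        # content looks like: text/html; charset=UTF-8
        for part in attr.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset":
                self.charset = value.strip().upper()


def declared_charset(html):
    """Return the charset declared in a meta tag, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


# Hypothetical template fragment containing the line from step 5.
sample = (
    "<html><head><title>My Blog</title>"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
    "</head><body></body></html>"
)
print(declared_charset(sample))
```

This only checks what the page itself declares. It says nothing about what the web server sends in its own headers.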

IMPORTANT NOTE: When you input Chinese into your blog entry through Blogger, you must be sure your browser is in Unicode encoding already. Otherwise it’ll all turn out as garbage. If you only remember to switch over halfway through your entry, post what you have first and then change the encoding, because changing the encoding wipes everything you’ve written in Blogger’s “Edit Post” window. If some of what you’ve written is already in Chinese, copy and paste it into a text file, switch over to Unicode encoding, then paste it back in. Nothing lost.

IMPORTANT NOTE 2: If you’ve written in Chinese in the past and it can be viewed successfully in your archives simply by switching to Chinese encoding, it will nevertheless become garbage after you switch over to Unicode. You’ll have to decide if you think it’s worth it to switch. I do.
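
Why do old posts turn to garbage? A rough Python sketch follows. It assumes the old text was stored as GB2312 bytes, which is my guess at a typical “Chinese encoding” and not something Blogger confirms. The re-encoding step at the end is only an illustration of the principle; whether you can actually apply it to your Blogger archives is a separate question.

```python
# Assumption: the archived post was saved as GB2312 bytes.
legacy_bytes = "阿福".encode("gb2312")

# A page now declared as UTF-8 makes the browser decode those same bytes as
# UTF-8. They are not valid UTF-8, so replacement characters show up instead.
seen_as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

# Decoding with the original encoding and re-encoding as UTF-8 keeps the text.
converted_bytes = legacy_bytes.decode("gb2312").encode("utf-8")
recovered = converted_bytes.decode("utf-8")

print(seen_as_utf8)
print(recovered)
```

The bytes on disk never changed; only the label telling the browser how to read them did.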


John Pasden

John is a Shanghai-based linguist and entrepreneur, founder of AllSet Learning.

Comments

  1. Konglong Says: March 9, 2004 at 10:40 pm

    John, are you using Blogger for your posts? If you are interested, MovableType works real well with UTF8. Your site seems to load ISO by default. Maybe I should send you an email since this is an old post.

    Kong

  2. Thanks so much! I was pulling my hair out trying to figure the encoding stuff out. I write a lot in Spanish and the ~ and the other accents were turned into garbage. Thanks for your tutorial!!

  3. Erland Lewin Says: February 13, 2006 at 11:02 pm

    Great post! I Google’d this quickly with ‘blogger unicode’, and your HTML-line was just what I needed.

    One quirk though: when I copied and pasted your HTML line, the double quotes turned to garbage, as they were not regular quotes but Unicode character 8221 (”).

    /Erland

  4. Erland Lewin Says: February 13, 2006 at 11:46 pm

    I had one more issue getting this to work. I am hosting my blog on my own machine, and the Apache web server was configured to use ISO-8859-1 as its default character encoding. That encoding was sent in the HTTP headers, which overrode what I was specifying in the HTML file.

    I tried adding an .htaccess file which said:

    AddCharset UTF-8 .html

    However, that didn’t work at first either, until I edited my server configuration to “AllowOverride All” for the relevant directory. Actually, I probably would’ve only needed “AllowOverride FileInfo”.

  5. Erland,

    Thanks for bringing this to my attention. If I had properly used the <code> tag, the quotation marks would not have been converted. It’s fixed now. It was good of you to add your experiences as well.

  6. Dear John:

    I put in nice meta tag inserts for my Blog and inserted at – success! When I checked with ‘Scrub the Web’ – excellent meta tag read out.

    But the inserts messed up the Blog which was suddenly filled with garbled letters and no email post facility.

    So I put your recommended and also put back the legend and Success. The Blog looks great again. No garbled letters and I have the email post facility back.

    EXCEPT although my meta tag info is still in my Template – none of it is being picked up by ‘Scrub the Web’.

    How do I have a great looking Blog and Great Meta Tags?

    Do you have any idea what I’m doing wrong?

  7. THANK YOU!!!!!!!!!! i’ve been searching high and low for this. i’ve tried changing the encoding from universal to everything else but nothing worked. then i slipped in this line and ta-dahh i can finally publish in chinese!!!!!!!!! YAY! thankyouthankyouthankyou

  8. Thanks a lot man. I have the first and the only blog written in a language called Assamese. I often had this problem of wrong encoding!! As I do not use blogger, I have to paste the line to the posts! Thanks a lot for solving this prob, I was pulling my leftover hair trying to get it!! Thanks again!!

  9. Wow! Just noticed that you are a linguist just as I am!!

  10. More and more surprises for me!! You are a gator!!!! I am doing my PhD in Linguistics at UF!!!! Do drop me an email whenever you can!!!!

  11. You are the MAN!
    You have saved me a lot of time searching for a solution. Thanks!!!
    One final question, are you ready, here we go: Yr codes worked w/ some areas in the new Beta Blogger formats like “header”, “about me”, “subtitle”, etc… but the main “post” body does not work. Was it something I have done wrong?
    Check this out to see what I meant: http://www.covietdesign.com
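
Erland’s Apache comment above points at a general rule: when the server sends a charset in its HTTP Content-Type header, browsers use that in preference to the charset in the page’s meta tag. The Python sketch below illustrates that precedence. The effective_charset function and the example header values are hypothetical, and the sketch ignores other signals a real browser also considers, such as a byte order mark.

```python
from email.message import Message


def effective_charset(content_type_header, meta_charset):
    """Pick the charset a browser would likely use for a page.

    The HTTP header's charset wins if present; otherwise fall back to
    the charset declared in the page's <meta> tag (which may be None).
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    header_charset = msg.get_content_charset()  # lowercased, or None
    if header_charset:
        return header_charset
    return meta_charset.lower() if meta_charset else None


# Server default overrides the page's own declaration.
print(effective_charset("text/html; charset=ISO-8859-1", "UTF-8"))
# No charset in the header, so the meta tag decides.
print(effective_charset("text/html", "UTF-8"))
```

So if the meta line seems to have no effect on a self-hosted blog, the server’s own Content-Type header is the first thing to check.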
