TagCrowd on China Blogs

11 Oct 2006

Hank at Network Sense introduced me to TagCrowd, a site that takes a chunk of text and displays its highest-frequency words as a tag cloud. He tried it out by copying all the text on the main page of several blogs, with interesting results. Here are two of his results and two that I did:

Sinosplice (before this post):

[Image: TagCrowd tag cloud for Sinosplice]

Shanghaiist:

[Image: TagCrowd tag cloud for Shanghaiist]

Danwei.org:

[Image: TagCrowd tag cloud for Danwei.org]

ESWN:

[Image: TagCrowd tag cloud for ESWN]

John Pasden

John is a Shanghai-based linguist and entrepreneur, founder of AllSet Learning.

Comments

  1. I can’t use it: my site is in Spanish, and it doesn’t ignore the Spanish prepositions.

    For example, the biggest word I have is “de”, followed by “el”, “en” and “que”.

    Only “China”, “chino”, “chinos” and “estrella” stand out as meaningful.

  2. That’s just nifty… the fact that “chickens” beat out “china” for size from my main page’s content says something about my blog 😉

  3. Yeah, you’re right Chinochano. It doesn’t work for Spanish. We get a sort of Spanish Prepositions Catalogue.

    Seems useful for English language blogs, though it’s also somewhat misleading.

  4. Good point. The word “the” accounts for over 6% of all words in English media. There are several other common words missing from those tag clouds. I wonder which words it ignores…

    http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm

  5. That is so cool! Thanks for sharing it. It would be interesting to do a comparison of tag clouds for a blog’s front page over a month or so, taking snapshots once or twice a week. And leaving out the words (permalink, for instance, in my tagcloud) that are not content-related.

  6. Hey, I’m the guy that created TagCrowd and I really appreciate all the comments. Since it’s still alpha software, all these ideas go towards making it a better and more useful tool.

    Lyn is right: the software works by automatically removing the most common English words, which tend to be irrelevant parts of speech, like ‘the’, ‘and’, ‘there’, etc.

    Until now, I hadn’t even thought about translating TagCrowd to other languages. That is super-valuable feedback. I’ll need to seek out equivalent word-lists for other languages in order for it to be useful for speakers/writers in those languages. I’d love anyone’s help with that if you know of good resources.

    If anyone has any more thoughts or suggestions, feel free to send me feedback.
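The approach described in the comments above (drop the most common words, then rank what is left by frequency) might look roughly like the sketch below. This is not TagCrowd's actual code: the function names `top_words` and `font_sizes`, the tiny stop-word lists, and the font-size range are all assumptions made for illustration. A real tool would need a much longer word list for each language.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word lists (assumed for illustration);
# a usable tool would load far longer lists per language.
STOP_WORDS = {
    "en": {"the", "and", "there", "a", "of", "to", "in", "is", "it"},
    "es": {"de", "el", "en", "y", "que", "la", "los"},
}


def top_words(text, lang="en", limit=50):
    """Return (word, count) pairs for the most frequent non-stop words."""
    # Lowercase the text and keep runs of letters only (accented ones included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    stop = STOP_WORDS.get(lang, set())
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(limit)


def font_sizes(ranked, smallest=10, largest=40):
    """Map each word to a font size scaled linearly between the two bounds."""
    if not ranked:
        return {}
    hi = ranked[0][1]    # count of the most frequent word
    lo = ranked[-1][1]   # count of the least frequent word kept
    span = max(hi - lo, 1)  # avoid dividing by zero when all counts are equal
    return {
        word: smallest + (count - lo) * (largest - smallest) // span
        for word, count in ranked
    }
```

Calling `top_words(page_text, lang="es")` with a Spanish word list is what keeps “de”, “el”, “en” and “que” out of the cloud. Passing a language with no list falls back to no filtering at all, which reproduces the problem reported in the first comment.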
