TagCrowd on China Blogs
Hank at Network Sense introduced me to TagCrowd, a site that takes a chunk of text and displays the highest-frequency words as a tag cloud. He tried it out by copying all the text on the main page of several blogs. Interesting results. Here are two of his results and two that I did:
Sinosplice (before this post):
ESWN:
I can't use it: my website is in Spanish, and it doesn't ignore Spanish prepositions.
For example, the biggest word I get is “de”, and after that, “el”, “en” and “que”.
Only “China”, “chino”, “chinos” and “estrella” are noteworthy.
That’s just nifty… the fact that “chickens” beat out “china” for size from my main page’s content says something about my blog 😉
Yeah, you’re right Chinochano. It doesn’t work for Spanish. We get a sort of Spanish Prepositions Catalogue.
Seems useful for English language blogs, though it’s also somewhat misleading.
Good point. The word “the” accounts for over 6% of all words in English media. There are several other common words missing from those tag clouds. I wonder which words it ignores…
http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm
That is so cool! Thanks for sharing it. It would be interesting to do a comparison of tag clouds for a blog’s front page over a month or so, taking snapshots once or twice a week and leaving out the words (permalink, for instance, in my tagcloud) that are not content-related.
Hey, I’m the guy that created TagCrowd and I really appreciate all the comments. Since it’s still alpha software, all these ideas go towards making it a better and more useful tool.
Lyn is right: the software works by automatically removing the most common English words, which tend to be irrelevant parts of speech, like ‘the’, ‘and’, ‘there’, etc.
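The filtering described above can be sketched roughly as follows. This is only an illustration in Python, not TagCrowd's actual code: the `STOP_WORDS` set and the `top_words` function are made-up names, and a real word list would be far longer than the handful of words shown here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "and", "there", "a", "of", "to", "in", "is", "it"}


def top_words(text, stop_words=STOP_WORDS, limit=5):
    """Return the `limit` most frequent words in `text`, skipping stop words."""
    # Lowercase the text and pull out runs of letters (accented ones included).
    # Note that this splits contractions such as "don't" into "don" and "t".
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Count only the words that are not in the stop-word list.
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(limit)


sample = "The chickens and the china: chickens in China, chickens there."
print(top_words(sample, limit=2))
```

Passing a different set as `stop_words` (for example, one containing “de”, “el”, “en” and “que”) is the only change a sketch like this would need for another language, which is why per-language word lists matter.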
Until now, I hadn’t even thought about translating TagCrowd to other languages. That is super-valuable feedback. I’ll need to seek out equivalent word-lists for other languages in order for it to be useful for speakers/writers in those languages. I’d love anyone’s help with that if you know of good resources.
If anyone has any more thoughts or suggestions, feel free to send me feedback.