Jukuu is interesting
27 Aug 2010
Sinosplice reader Matthew recently introduced me to Jukuu, a database of sample sentences. (The Chinese name, 句酷, is a pun on the word 句库, meaning “sentence base,” in the naming tradition of nciku.) There are some really interesting things going on at Jukuu. Here’s a screenshot from the results of a search for “get”:
I enjoyed some of those random sentences. Some things worth noting:
– The sample sentences in the screenshot above are all taken from About Face 3, a well-known book on goal-directed design, which has been published in multiple languages.
– Jukuu offers not only multiple translations (grouped by part of speech), but also the distribution of those various parts of speech in its database (that’s what the pie graph at the right represents).
– Jukuu also offers other word forms (词形) for “get” (in this case, “gets,” “getting,” “got,” “gotten,” and even “getable”).
– If you click on one of the translations in the top right, the resulting page shows you sentences with just that translation of “get” (for example, this one for 得到).
– You can get similar results without going the “exact translation” route by just searching for multiple words, in a mix of English and Chinese. (The sentences aren’t censored, either. Have fun with that!)
– If you go to the “get” results page, further down the right column, you also see links for “adjectives frequently preceding this word,” “verbs frequently preceding this word,” “prepositions frequently preceding this word,” and “nouns frequently following this word.”
This kind of thing is a linguist’s dream, and can only be accomplished by corpus analysis with part-of-speech tagging, which is a ton of work. It’s really cool to see a resource like this publicly available online.
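To give a rough idea of what that analysis involves, here is a minimal Python sketch of the two kinds of counts described above: the part-of-speech distribution for a word, and the words of a given type that frequently sit right before or after it. This is only an illustration and not how Jukuu actually works. The tiny hand-tagged corpus, the tag names, and the function names `pos_distribution` and `neighbors` are all made up for the example.

```python
from collections import Counter

# Hypothetical hand-tagged mini-corpus (an assumption for illustration):
# each sentence is a list of (word, part-of-speech tag) pairs.
CORPUS = [
    [("you", "PRON"), ("can", "AUX"), ("get", "VERB"), ("results", "NOUN")],
    [("they", "PRON"), ("quickly", "ADV"), ("get", "VERB"), ("tired", "ADJ")],
    [("we", "PRON"), ("never", "ADV"), ("get", "VERB"), ("good", "ADJ"), ("results", "NOUN")],
    [("good", "ADJ"), ("results", "NOUN"), ("take", "VERB"), ("time", "NOUN")],
]


def pos_distribution(corpus, word):
    """Count how often `word` occurs with each part-of-speech tag."""
    return Counter(tag for sent in corpus for w, tag in sent if w == word)


def neighbors(corpus, word, neighbor_tag, offset):
    """Count words tagged `neighbor_tag` found `offset` positions away
    from each occurrence of `word` (-1 = just before, +1 = just after)."""
    counts = Counter()
    for sent in corpus:
        for i, (w, _) in enumerate(sent):
            j = i + offset
            if w == word and 0 <= j < len(sent) and sent[j][1] == neighbor_tag:
                counts[sent[j][0]] += 1
    return counts


if __name__ == "__main__":
    print(pos_distribution(CORPUS, "get"))
    print(neighbors(CORPUS, "results", "ADJ", -1))  # adjectives preceding "results"
    print(neighbors(CORPUS, "get", "NOUN", 1))      # nouns following "get"
```

The counting itself is the easy part. The "ton of work" is getting a large corpus accurately tagged in the first place, which this sketch sidesteps by tagging four sentences by hand.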