Jukuu is interesting

Sinosplice reader Matthew recently introduced me to Jukuu, a database of sample sentences. (The Chinese name, 句酷, is a pun on the word 句库, meaning “sentence base,” in the naming tradition of nciku.) There are some really interesting things going on at Jukuu. Here’s a screenshot of the results of a search for “get”:

[Screenshot: Jukuu search results for "get"]

I enjoyed some of those random sentences. Some things worth noting:

– The sample sentences in the screenshot above are all taken from About Face 3, a well-known book on goal-directed design, which has been published in multiple languages.
– Jukuu offers not only multiple translations (grouped by part of speech), but also the distribution of those various parts of speech in its database (that’s what the pie graph at the right represents).
– Jukuu also offers other word forms (词形) for “get” (in this case, “gets,” “getting,” “got,” “gotten,” and even “getable”).
– If you click on one of the translations in the top right, the resulting page shows you sentences with just that translation of “get” (for example, this one for 得到).
– You can get similar results without going the “exact translation” route by just searching for multiple words, in a mix of English and Chinese. (The sentences aren’t censored, either. Have fun with that!)
– If you go to the “get” results page, further down the right column, you also see links for “adjectives frequently preceding this word,” “verbs frequently preceding this word,” “prepositions frequently preceding this word,” and “nouns frequently following this word.”

This kind of thing is a linguist’s dream, and can only be accomplished by corpus analysis with part-of-speech tagging, which is a ton of work. It’s really cool to see a resource like this publicly available online.
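
For the curious, here is a rough idea of what lists like “nouns frequently following this word” involve once a corpus has been tagged. This is only a minimal sketch in Python, not how Jukuu actually does it: the tiny hand-tagged corpus, the tag names, and the `neighbors` function are all made up for illustration. The hard part (tagging millions of sentences accurately) is assumed to be already done.

```python
from collections import Counter

# Toy hand-tagged corpus: hypothetical data, NOT taken from Jukuu.
# Each sentence is a list of (word, part-of-speech tag) pairs.
TAGGED = [
    [("I", "PRON"), ("get", "VERB"), ("good", "ADJ"), ("results", "NOUN")],
    [("we", "PRON"), ("get", "VERB"), ("results", "NOUN")],
    [("a", "DET"), ("good", "ADJ"), ("idea", "NOUN")],
    [("you", "PRON"), ("never", "ADV"), ("get", "VERB"), ("mail", "NOUN")],
]

def neighbors(corpus, target, tag, offset):
    """Count words tagged `tag` found `offset` positions away from `target`.

    offset=-1 means "immediately preceding", offset=+1 "immediately following".
    """
    counts = Counter()
    for sentence in corpus:
        for i, (word, _) in enumerate(sentence):
            j = i + offset
            if word.lower() != target or not 0 <= j < len(sentence):
                continue
            neighbor_word, neighbor_tag = sentence[j]
            if neighbor_tag == tag:
                counts[neighbor_word.lower()] += 1
    return counts

# "Nouns frequently following this word" for "get"
nouns_after_get = neighbors(TAGGED, "get", "NOUN", +1)

# "Adjectives frequently preceding this word" for "results"
adjectives_before_results = neighbors(TAGGED, "results", "ADJ", -1)
```

Calling `nouns_after_get.most_common(10)` would then give a ranked list like the ones in Jukuu’s right column. On a real corpus the counting stays this simple; the tagging is where the ton of work goes.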

John Pasden

John is a Shanghai-based linguist and entrepreneur, founder of AllSet Learning.

Comments

  1. Love it! This will be particularly useful when writing in a second language. Like a Google search without so much searching.

  2. I’ve been using Jukuu for a while now, but you should also check out a new dictionary/translating site I found. It’s Microsoft’s Engkoo. http://engkoo.com
    What’s great about Engkoo is that it seems to pull example sentences from the web itself, using the web as a corpus. If the translation can’t be found on the web, it goes for machine translation. Go check it out! Just as interesting as Jukuu in my opinion, if not more so.

    • 阿皮 (a1pi2) Says: August 28, 2010 at 7:07 am

      Engkoo is very cool. It’s actually intended as a tool to help Chinese speak English more proficiently. There’s a ton of linguistics research behind it.

      I really like the way that it highlights the same phrases–in different positions–in the different languages. It reminds me of the sing-a-long bouncing balls of the old silent movie era.

    • Thanks, I’ll have to take another look at Engkoo. I heard about it when it first came out, and I just couldn’t get over the fact that Microsoft was backing a project with such a ridiculous name. I’m over that, though. I’ll check it out again.

  3. Very clever site, but for us Chinese learners the weakness of this, as well as nciku, is that most of the example sentences are taken from English sources, with the consequence that you can never be sure just how natural and idiomatic the Chinese translation is. I find nciku extremely useful and convenient, and often peruse the sample sentences there, but I have seen innumerable translations which are just flat out wrong (we can only hope jukuu will raise the standards).

    • Very true. As a learner of Chinese in an English-learning land, I’ve long gotten used to the problems of using such resources in the reverse direction of that for which they were intended. Definitely worth noting, though.

  4. I’m surprised you weren’t aware of this one; it’s been pretty popular amongst hardcore learners for the past few years. It sure comes in handy at times, but one must take all the sentences given with a grain of salt, of course.
