Can Project Naptha Read Chinese Text in Images?

Yesterday Project Naptha hit Hacker News. It offers a way to extract electronic text from image files through a simple Chrome browser extension. Excited to see that it supports both simplified and traditional Chinese, I immediately installed it and tried it out.

The results? Unfortunately, not so great.

When it doesn’t work at all

First of all, the script needs to recognize the text in the image. This first step doesn’t always go too well, even if the text seems relatively clear to the human eye. Let’s look at some cases where the extension found nothing, despite the Chinese text being pretty legible.

In this first case, the font is non-standard. OK, fair enough. That’s to be expected.

Testing Project Naptha with Chinese

In this next case, the text is pretty clear, but the contrast is poor.

Testing Project Naptha with Chinese

In this final example, the text is fairly clear to the human eye, but also low-res and slanted. That probably makes it difficult for the algorithm.

Testing Project Naptha with Chinese

When it sort of works

In many other cases, some text was identified, but not enough for the extension to be really useful for anything. Here are some images where Project Naptha could identify some text, and the “select all text” function was applied. (The blue boxes show what Project Naptha identified in the images as “text.” Sometimes they are bizarrely incorrect.)

Some examples:

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

I found the last two quite surprising, considering how clear, straightforward, and high-res the text is.

When it actually works

Sometimes it was relatively successful in identifying the text. In these cases you must first set the language to Chinese (either simplified or traditional, depending on the text). There’s a cool effect showing you that some processing is going on. When that’s done, you can copy and paste the text.

Testing Project Naptha with Chinese

But… it might not be exactly what you were hoping for.

This selected Chinese text yielded the following copy-paste results:

Testing Project Naptha with Chinese

总统亲 ã热fl地接

\早、待了葫芦兄妹

If it had correctly captured all the text, it would have been:

10、总统亲自热情地接

待了葫芦兄妹

This one is better:

Testing Project Naptha with Chinese

雹电二怪对兄妹俩尽效使用现代

化武器况妹俩也不示弱 麝芦神功连

连使出 胭宙电二怪打入深深的山沟

It should have been:

355、雷电二怪对兄妹俩尽使用现代

化武器,兄妹俩也不示弱,葫芦神功连

连使出,把雷电二怪打入深深的山沟

Also, my sample size is too small to make any definite conclusions, but it seems like the extension works better for simplified characters than for traditional.

Conclusion

I don’t mean to sound overly critical. This is amazing technology here, and the fact that it launched with any support for Chinese characters at all is pretty awesome (and brave)! I’m sure the technology will improve with time, and that is going to be tremendously helpful to Chinese learners.

To put this in perspective, the development of OCR (optical character recognition) for mobile devices meant that you could point your cell phone’s camera at any characters you see, and get feedback on what the characters say (sometimes). Project Naptha means the same thing, but for your home browsing experience. For me, that’s when I do a lot more Chinese reading, so it’s even more important. Once this technology is perfected, as long as you have a tool to help you read electronic Chinese text, you’re all set!

Personally, I think this is especially great news for comics. It’s no coincidence that I tested this extension out on comic book text. I’m really looking forward to seeing how this extension develops.

Amazing Stencil-like Chinese Calligraphy Written in Chalk

OK, this would be amazing just by the fact that the following characters were written by hand, in chalk:

Amazing Chinese Calligraphy in Chalk

(I wouldn’t have believed that these weren’t stenciled on somehow if I hadn’t seen the man writing the characters with my own eyes.)

…but how about that they’re also written upside down, by a farmer with no legs?

Amazing Chinese Calligraphy in Chalk

When I went by this spot an hour later after dinner, the chalk characters had all been rubbed out.

FluentU: a Producer of Original Videos for Learning Chinese (2)

As I mentioned in Part 1 of this review, FluentU is showing a lot of potential as a learning platform and a content producer. In this review I’ll look more closely at FluentU’s self-produced video series, and finish with an interview of content director Jason Schuurman (my ex-co-worker at ChinesePod).

FluentU’s Video Series

So, assuming you have a FluentU account, where do you find the FluentU-produced videos on the FluentU site? It’s not quite as obvious as you might think. They’re not aggressively recommended. But if you go into “Courses,” you’ll notice that some course names start with “FluentU.” Those are the ones FluentU has produced on their own. Unfortunately they’re not really grouped together for easy identification, so I went ahead and listed out all the ones I could find currently available on the FluentU website:

  1. FluentU: A Good Morning (Newbie, 8 clips adding up to 2:07)
  2. FluentU: Table for Two (Elementary, 7 clips adding up to 3:48)
  3. FluentU: Making Friends and Drinking Coffee (Elementary, 12 clips adding up to 3:01)
  4. FluentU: A Trip to the Supermarket (Elementary, 7 clips adding up to 4:50)
  5. FluentU: Studying on Campus (Intermediate, 8 clips adding up to 4:58)
  6. FluentU: An Evening Get-together (Intermediate, 8 clips adding up to 8:20)
  7. FluentU: Dinner with a Friend (Intermediate, 9 clips adding up to 6:42)
  8. FluentU: Shopping at the Clothing Store (Intermediate, 7 clips adding up to 6:34)
  9. FluentU: A Basketball Afternoon (Intermediate, 8 clips adding up to 6:13)
  10. FluentU: Going in for a Job Interview (Upper Intermediate, 9 clips adding up to 7:13)

So only 1 Newbie series, 3 Elementary, a whopping 5 Intermediate, and 1 Upper Intermediate. I was hoping for more video at the lower level, but at least I got to see what FluentU created across 4 levels.

One of the things I really like about these series is that although they’re broken up into short episodic clips, there’s also a “full” version that ends each series, putting all the clips together. This is great for review, and it also meant that I could easily watch each series without having to watch them one by one. (Watching a series on FluentU isn’t quite as easy as watching a playlist on YouTube.)

FluentU Original Video Series

The videos themselves are professionally made, using young, attractive actors. There’s a bit of a Taiwanese flavor to them all (unsurprising, since they were all shot in Taipei, I believe), which may upset the Beijing-centric putonghua police. I think it’s fine, though; the videos seem to be designed to be universally applicable to mainland China learners as well. I didn’t encounter any Taiwan-only vocabulary, and pronunciation was pretty standard (although not perfect, by Beijing standards). There was also some Taiwan-style usage of sentence-final modal particles, but I didn’t find it too distracting.

At the lower levels, it’s clear that the creators slowed down the speech and added additional pauses for lower level learners. While this is nice, it has a funny effect, making some scenes feel quite awkward. You know those conversations where neither side is quite sure what to say, and there are long awkward silences? There’s a lot of that feeling in some of the lower-level video series. Even within just the Elementary series, though, we quickly see the awkwardness in one series–FluentU: Making Friends and Drinking Coffee–fade to a much more natural rate of speech in FluentU: A Trip to the Supermarket. By Intermediate, the language is a lot more natural, while still being quite clear. I found the Upper Intermediate surprisingly accessible (read: not difficult), meaning it will be more useful for more learners, which is a good thing. (Other material marked “Upper Intermediate” on FluentU is usually quite a bit more challenging.)

FluentU Original Video Series

I noticed that the writers also went to great pains to make the dialogs full of high-frequency words and phrases. While it’s usually easy to pick out words that aren’t useful in most videos or even standard textbooks, there really aren’t many at all in FluentU’s videos. In fact, the language is usually so simple and everyday that the videos don’t really throw you any curveballs at all. There aren’t “twist endings” like you frequently find in ChinesePod dialogs. One thing that keeps the videos interesting, though, is a pervasive flirtiness running through many of the male-female dialogs. You kind of expect the video to devolve into a “what’s your number?” or “we should hang out sometime,” but they stay innocent.

In fact, it’s the flirtiness, combined with an amusing awkwardness, that makes these original videos memorable. Probably my favorite example of this is the elementary video where the cute flirty waitress informs the young protagonist where the bathroom is, immediately followed up by a cutesy “don’t forget to wash your hands!” (Ha ha, wut??)

FluentU Original Video Series

Then there’s also the very awkward pre-interview high-five (between strangers) in the Upper Intermediate job interview lesson:

FluentU Original Video Series

(The girl and guy from that video are going to end up as co-workers, and if that sexual tension doesn’t later erupt in a sequel series, I really don’t know what the writers are trying to do here.)

Overall, I enjoyed the videos. Especially at the lower levels, they still feel like “studying,” but they’re very usable and easy on the eyes. I can see these being useful in the classroom, especially for college students.

Interview with FluentU’s Content Director, Jason Schuurman

thinking Jason

Me: Although it’s a very different service, FluentU kind of reminds me of ChinesePod in that it offers a lot of material and tools, and it’s up to learners to put them together in the way that makes sense for them. Can you comment on how that might work at FluentU, and how different types of users use the service?

Jason: Yeah, we definitely provide a lot of freedom and flexibility. We feel this not only allows for a broader range of users to benefit from the site, but it also lets our users take their learning into their own hands and not get bored with materials they weren’t interested in learning in the first place.

That being said, we still do provide structure for those who want it in the form of recommended content and more specifically, ‘courses’. Courses at FluentU are something like a playlist-meets-lesson plan, and guide users through various sets of content ranging from multi-part video series, to topically related materials and textbook vocab lists.

As far as users go, on one end of the spectrum, you have users who only use FluentU for the content itself. Our library of videos and audios is very large and already organized for you, translated into English, and completely annotated. These users usually already have a system for review they prefer, prefer not to review at all, or are in Chinese classes already and looking for engaging authentic content. Most of these users are intermediate and above and subscribe to our Basic subscription plan.

On the other end of the spectrum are the users who use FluentU as their single source for most or all of their language learning and are Plus subscribers. Not only do they get all the content, but they have access to and can create decks (vocab lists), and are able to actually learn everything within the FluentU library using our personalized review tools. Learn mode integrates videos you’ve watched and tracks your progress so that we can do things like make sure clips you see are comprehensible to you and do an even better job recommending you content. All that adds up to a pretty complete learning package.

How big is the video library, exactly? How many FluentU-produced videos do you have now?

Jason: Our total count in the video library as I write this is 1251 videos, with 65 at newbie, 86 at elementary, 138 at intermediate, 257 at upper intermediate, 360 at advanced, and 340 at native. Of those, we’ve produced 10 ‘Courses’ of videos of our own, each about 5-10 videos apiece. Though, it’s ever-increasing because we publish at least 12 videos a week, sometimes more.

In addition to that we also produce our own audio dialogs, which we call ‘Audios’. They’re similar to some videos, but are more lesson-like and practical, and of course, don’t require the user to actually watch anything, which is great for mobile. (They’ll also be available offline on our upcoming mobile app.) Right now we have 228 of those spread across the levels, and also publish around 10 a week.

And as for ‘Decks’, which are vocab lists of various types available for learning via the Learn Mode, we have 124, and continue to publish more as well.

How have you been growing the library? Are you focused on particular levels, or topics, or styles of video? What are the plans for that?

Jason: We’re always looking for and preparing new videos. In general, we make sure to cover a wide range of topics and formats, and also keep difficulty in mind so that we can continuously publish a good mix of content. Though we also keep things like the overall library balance and user feedback and requests in mind as well. So far this has worked really well, so we have no plans to change much in that regard.

How should learners decide what videos to watch on FluentU?

Jason: Browse! The videos in our library are divided into different topics, formats, and difficulties, which help users choose what to watch based on their interests and Chinese level. We also display the percentage of vocabulary the user already knows for each video, which allows them to choose videos that are more precisely at their level.

In addition to that, we have “Courses”, which, if you don’t know exactly what you want to watch or are looking for more structure, organize the content into playlists for you. Courses are also popular because they’re longer forms of content. You can slowly pick away at a course, with your next video, deck, audio, or learn mode session waiting for you every time you sign in, meaning you don’t always have to browse for new content.

How do your difficulty levels relate to textbooks or other online services like ChinesePod or the Chinese Grammar Wiki?

Jason: We based our difficulty levels on a lot of things, but mostly a combination of our own knowledge and expertise (bolstered of course via sources like the Chinese Grammar Wiki), and other helpful standards like the new HSK levels. When actually determining difficulties, we consider things like speed and clarity, but linguistically we place a strong emphasis on frequency and usefulness to ensure that lower-level videos are filled with the stuff you really need when starting a language.

Can you explain the process for how you went about creating and filming your original video series?

Jason: When we first started compiling our library of videos, we realized there was a lack of quality, entertaining, video content for lower-level learners. There’s certainly stuff out there, as our current library reflects, but nothing as well-produced and linguistically-minded as we would have preferred. The stuff out there was either very entertaining but not practical, or very practical, but not entertaining. We wanted both, and so we decided to just make them ourselves. Essentially, we wanted to make sure our lower-level learners were getting as great of a video-learning experience as our more advanced users, and making some of our own videos was the best way to do that.

To produce them, we teamed up with a small independent film production company who we felt really understood our ‘vision’ with the videos. From there, we decided on what scenarios were most needed/wanted, wrote the scripts with our pedagogy in mind, and produced the videos.

You say the FluentU video series is for “lower-level learners.” Can you explain that a bit more? Is it for someone with a year of formal Chinese under their belt, or is it an absolute newbie, or someone who’s learned pinyin and a few basic phrases but nothing else, or what?

Jason: We’ve produced our own video series at the newbie level all the way up to the upper intermediate level, but we made sure to produce more at the newbie, elementary, and intermediate levels simply because there’s less good material at that level.

The ‘newbie’ level courses are meant for learners with a little bit of understanding as to how the Chinese language works, in particular in terms of tones, pinyin and characters, but without any prior mastery of them. Through a combination of the scripts, which are written for language learning, and a filming style that is meant to take full advantage of the video-learning format, we wanted to make understanding the content just come naturally, which is very important for newbies. To do that we emphasize things like repetition of key vocab, visual cues, slightly slower speech, etc. The video player and learn mode also allow the user to start with pinyin first, and graduate up to characters when they’re ready. And finally, we also have other courses meant specifically for learners who aren’t familiar with, or who need more practice with, things like tones and pinyin.

I’d say with a year of formal Chinese under your belt, you’d be just about at our ‘intermediate’ level. With ‘elementary’ being somewhere in between that, and what I’ve just described above.

How are your own videos different from other video-based learning material? What makes them special?

Jason: Well, I think for one we put a ton of care into making sure they were not only useful, educational, and pedagogically sound, but also interesting and worth watching. I think the biggest problem with a lot of language learning video series, is not unlike the problem that many textbook dialogs have, in that they end up being either boring, or unnatural and feel somewhat stilted in the end. We tried really hard to make sure ours were visually interesting (and hopefully even sometimes funny!) and also that our actors did their best to speak and act naturally, while still maintaining the clarity of speech that is so important to a language learning video. That’s probably the biggest difference in my mind.


That’s the end of this two-part review of FluentU. Check out Part 1 if you missed it.

Shanghai’s Mobile Library

I was surprised to see a library-on-wheels in Shanghai’s Jing’an Park the other day. The vehicle is called “Reader No. 1” in English, “读者1号” in Chinese.

Shanghai's Mobile Library

Shanghai's Mobile Library

Shanghai's Mobile Library

Shanghai's Mobile Library

Shanghai's Mobile Library

The mobile library visits various spots in Jing’an District three times per week, for two hours each time. According to the sign, this has been going on since 2010? I had no idea.

I wonder how many foreigners are using this service?

FluentU: a Developing Video-based Platform for Learning Chinese (1)

FluentU has quickly become the most talked-about video service for learning Chinese online. The site sports a clean, modern feel, and the team have been very responsive over the past year, as user feedback has informed a number of nice changes. Although I’ve been following FluentU’s development (and even met with the founder a while back), I haven’t reviewed the service myself until recently. It’s not a coincidence; I’m actually a bit skeptical of video-based learning (it’s really hard to get right), and I wanted to wait until FluentU got a few more features out before I reviewed the service.

For the most part, I’m going to assume that most of my readers have already heard about FluentU (it’s certainly not new anymore!), and I won’t provide an in-depth introduction to how the service works. This is part 1 of a 2-part series.

Why video?

Why video? This is a really important question. Working at ChinesePod, we were often confronted with the “why don’t you guys do video?” question. The logic seemed to be: “if audio is good, video is better.” ChinesePod has done a few experiments in video over the years, but never fully committed to it. The reasons are:

  1. Professional video is much more labor-intensive than audio (by a factor of 5-10)
  2. Users often say they want video, but don’t really want to pay extra for it (poor ROI)
  3. Many users use audio material in a way that doesn’t work with video (e.g. listening while working out, or while driving)

What conclusions can I draw from this? Not a whole lot. Maybe video is just not a good fit for the ChinesePod brand. Building up a big fanbase over years and years, all centered on audio, probably doesn’t naturally lead to demand for video. If ChinesePod were to really commit to doing video, it would have to be a concerted, long-term effort, and more than just a few experimental videos.

FluentU, on the other hand, has focused on video from the start. In its early days, it utilized tons of clips from YouTube, which meant its resources could go into translation, vocab management, and other tools (rather than video production). More recently, FluentU has started producing its own professional video content.

Video is great for providing the full visual context of language, including both cultural elements and body language. This is especially powerful for learners not in China (learners who can also take advantage of the unblocked internet and faster speeds for viewing FluentU videos).

FluentU: the Video Player

FluentU does a great job of presenting video. The player is great, right down to all sorts of tiny details. If you know FluentU at all, you know this, so I won’t say too much here.

FluentU - Video - The Four Tones in Use

Some specific details I like:

  1. Being able to loop a specific clip within a video.
  2. Color coding in the video timeline so you can see where the dialog happens in the videos and where there’s no speaking going on.
  3. Hovering on the subtitles automatically pauses the video, so you can check the meanings or pinyin of the words you’re hearing.
  4. When you first select a video, you’re presented with the entire video transcript up front (which you can also download). This is especially useful for intermediate and above; if you can read enough to get the gist of the transcript, you don’t have to suffer through 5 minutes of a video before discovering it’s not what you want.

But there’s a catch… because FluentU makes extensive use of YouTube, it doesn’t work flawlessly in China. I have a VPN, of course, but it’s still a little slow. It’s usable, but the lag is quite annoying, I must admit. I imagine using FluentU on a fast (unfiltered) internet connection would be pretty awesome, though.

FluentU: Learn Mode

This is one of the key features I want to focus on. It wasn’t around in FluentU’s early free/beta days, and it has a lot of potential. Basically, “Learn Mode” is FluentU’s take on SRS, an idea which isn’t so great all by itself, but holds a lot of promise for enhancing other methods of learning.

When you choose a FluentU video at your level that you’re interested in, you can choose between “Watch” and “Learn.” “Watch” is just watching the video, as expected. “Learn” takes you to a new interface which is focused on figuring out which words in the video you actually know, and familiarizing you with the ones you don’t know. This process should feel very familiar to anyone who’s used Anki or other SRS vocabulary review software, but FluentU has done its own take on SRS.

FluentU

When you don’t “know” a word, you have the option of watching one or more short video clips which include the word. It’s a very cool cross-section of the word in action across all kinds of video content and contexts. Imagine that all those sample sentences you love so much in your favorite dictionary (or Chinese Grammar Wiki) were all mini video clips. That’s what it does, complete with transcript for each individual sentence.

After you “learn” the word and continue, the system will cycle back and test you on the words you should have “learned.” There are multiple-choice questions, fill-in-the-blank, and straight-up translation mini-quizzes for each word.

FluentU FluentU FluentU

So I’m totally on board with the idea of extending SRS into something more interesting, and I like seeing innovation around the boring SRS model, but there are a few issues (which I’m sure FluentU is working on). First, if you’re in China using a VPN, the lag issue is even worse for these tiny clips than for the full videos.

Second, the “Learn this word” vs. “Already Know” dichotomy may be a little hard for some types of learners to deal with. There are just so many words we learners are still working on, which fall in that fuzzy region somewhere between “Learn this word” (as if it were new) and “Already Know,” that being forced to choose may be just a little agonizing.

If you choose “Already Know,” then BAM, that word is forever (?) on your “known” list, which might make you feel like you damn well better know it before clicking “Already Know.” Perhaps that’s the idea: getting you to browse clips more, and make fuller use of FluentU’s archive of annotated video. Fair enough. I just think it will be hard for some users (read: super-serious learners with perfectionist tendencies, like I used to be) to confidently click on “Already Know.”

FluentU - The Four Tones in Use

One thing is for sure: the “Learn” mode offers a much more focused way to “study” FluentU’s video content, rather than just casually browsing. It really is a very different experience from the site’s main video-watching experience, more similar to a quiz than enjoying a TV show. I can see how this might attract some users and turn off others.

If FluentU can get “Learn” mode right and get more users actually using it, it has huge potential. Any learning service that can accurately determine what its users “know” is very well poised to offer an amazing, personalized learning experience. Right now, FluentU offers a little green strip next to every video displaying what is “known” (based on feedback from “Learn” mode). There’s a lot of potential here.

FluentU-01

Mini-Interview with FluentU’s Founder, Alan Park

alan-park

Me: The FluentU video player is fantastic! How did you design/develop it?

Alan: Thanks for the kind words! We designed/developed it through the same way that we develop the rest of the site: by going back and forth with our users and adjusting based on feedback, until they loved it. And then adjusting it some more.

FluentU has some great video content, but it seems to also be branching out into audio too. Are you having second thoughts about a “pure video” approach?

Alan: Our team doesn’t have many “sacred cows.” We experiment a lot and are always trying new things to make the best language learning site possible. We started with real-world videos because video has many advantages. Video is exciting, and it opens your eyes to a whole new world and culture. People talk naturally on video. It’s memorable and helps words stick. And most of all it’s fun. On the other hand, audio has 2 huge benefits: it’s cheaper to create than video, and it doesn’t require as much active engagement for the user as video. We’ve found that there is definitely a place for audio alongside video.

Is FluentU primarily aimed at individual self-study learners, or at schools and other institutions?

Alan: Our focus is individual learners, but many schools and institutions tell us that their students are loving FluentU.

You’ve launched other languages on the FluentU platform. What does this mean for Chinese? Will Chinese get any “special treatment” going forward, or are new features now “all or nothing”?

Alan: Chinese is our first language, so it will always get “special treatment.” And by virtue of the fact that there is pinyin and Chinese characters there is no way around it. Besides, it’s my favorite foreign language.

The “Learn” feature on FluentU is a unique take on spaced repetition. Is it popular with your users?

Alan: Yes, they love it. Instead of saying that it is a take on spaced repetition, I would say that spaced repetition is just one small part of it.

The “Learn” feature is really a personalized quiz for learning vocab through video contexts. Instead of learning vocab through flashcards, why not learn them through short video clips which are handpicked for you?

What’s next for the “Learn” feature?

Alan: We’re making it mobile friendly. Right now, it involves a lot of typing, which wouldn’t translate well to smartphones. Stay tuned!

Conclusions

Just a few takeaway points:

  • FluentU has a great, learner-centric video player with awesome features and real attention to detail
  • FluentU may not work well in China, even if you have a VPN
  • FluentU has “Learn” mode, which may not be for all users, but it definitely takes FluentU well beyond “a site with a bunch of videos,” and looks very promising

In part 2 I’ll be looking at the FluentU-produced video series, with a more in-depth interview with Content Director Jason Schuurman.

A Realistic Look at the Challenges of Reading Chinese

The following is a guest article written by a Sinosplice reader, Julian Suddaby. I have followed it with some commentary of my own.

Warning: if you’re a member of the “Chinese is super easy” faction, this article might annoy you a little, but be sure to read through to the end!


How Many Characters?

by Julian Suddaby, 2014-02-13

Introduction

I asked Google “how many chinese characters do I need to learn” and the best sites I found pointed to linguist Jun Da’s website and used his data to argue that 3,500 characters should be enough for most people, being that you’ll know around 99.5% of the characters in general circulation. [1] Is that really enough?

Well, if you’ve got to that point, congratulations. It’s an achievement. But you may not want to stop accumulating characters just yet. Indeed, sad to say, at 3,500 you won’t even be able to read Jun Da’s name, being that 笪 is way down at frequency #5,231. [2] So how many, then, do you need to learn? Well, that depends on one question that you should ask yourself: what exactly do you want to read?

A Newspaper

Students often want to read Chinese newspapers. The Southern Weekly 南方周末 being a popular choice, I took the ten most popular articles over the previous thirty days and ran them through a computer program that checked them against Jun Da’s most frequent 3,500 characters. The results are fairly encouraging for the Chinese student, I think: if you knew the 3,500 you’d only encounter forty-four new characters over the course of those ten articles, and twenty-nine of those you’d only see once and so would probably just take a guess at from context and move on. But you’d possibly want to look up 甄, a pseudonymous surname given to the subject of one of the articles (and thus appearing thirty-five times); 闰, used in the name of a Zhejiang corporation which appears to have buried five hundred tons of poisonous chemicals in their backyard (seven appearances); and 驿, used in the name of a company involved in an online security breach (also seven appearances). [3]

So, while you probably shouldn’t throw out your dictionary just yet, it does seem that trying to read a newspaper won’t be a disheartening experience.
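For readers curious what such a check might look like, here is a minimal Python sketch. It is not the program actually used for this article; the function names are made up for illustration, and the tiny `known` set is a toy stand-in for the real top-3,500 list, which you would load from a frequency file. It counts every Chinese character in a text that falls outside the known set, then separates out the ones seen only once.

```python
from collections import Counter


def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def unknown_characters(text, known):
    """Count each Chinese character in `text` that is not in `known`."""
    return Counter(ch for ch in text if is_cjk(ch) and ch not in known)


# Toy stand-in for the top-3,500 list (hypothetical; load the real list from a file).
known = set("的一是不了人我在有他这中大来上国个到说们先生")

# Made-up sample sentence, used only to exercise the function.
sample = "甄先生说他在驿中,甄先生不来了。"

counts = unknown_characters(sample, known)
once_only = [ch for ch, n in counts.items() if n == 1]

print("new characters:", dict(counts))
print("seen only once:", once_only)
```

Punctuation and non-Chinese text are skipped by the `is_cjk` test, so only characters a learner would actually need to look up are counted.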

A Children’s Book

Children’s novels are another popular choice of reading material for language students. Shen Shixi is a well-regarded children’s novelist, whose Jackal and Wolf has recently been translated into English by Helen Wang. I ran an analysis on another of Shen’s novels, 《鸟奴》(lit. “Bird Slave”). This is, character-wise, much more difficult than the newspaper articles, with two hundred and one characters not in the top 3,500. Ninety of those are used more than once. As you’d expect from Shen, the “king of animal fiction”, animal-related vocabulary is one particular problem here, and you’ll probably end up very confused if you don’t look up 鹩, used two hundred and eighty-four times; 喙, used thirty-six times; and 獾, used twenty-two times. [4]

The novel is about two hundred and forty pages long, and so you should expect to find a character you don’t recognize on most pages.

A wuxia novel

Jin Yong’s novels remain firm favorites. Rather than starting with the four volumes and 1,300 pages of The Legend of the Condor Heroes 《射雕英雄传》, students might perhaps try A Deadly Secret 《连城诀》, which is just four hundred pages or so. In those four hundred pages you’ll encounter two hundred and ninety-six characters not in the top 3,500. The most frequently used are from the protagonists’ names (水笙, 水岱, and 万圭), but there are plenty of new common nouns and verbs used multiple times as well. [5]

On a page-by-page basis, you should recognize more characters than in the Shen Shixi novel above. In terms of total characters, however, A Deadly Secret is more of a challenge.

A modern classic

Lu Xun’s A Call to Arms 《呐喊》, despite collecting stories he wrote at a very early stage of modern Chinese literary vernacularization, should not be much more difficult than the two novels above—at least in terms of basic character recognition. Two hundred and thirty unseen characters in total, with 闰 (remember that one from above?), 珂 (used in a name) and 锵 (a sound) taking the top three spots. [6]

Conclusion

Even from this very cursory analysis, it appears that if your goal is to read Chinese fiction comfortably without a dictionary, you’re going to need to recognize more than 3,500 characters. Chinese writers use characters well into the four or five thousand frequency range very regularly.

So although reaching 3,500 is worth celebrating, I wouldn’t stop trying to acquire characters just yet. Keep reading and dictionary-checking, and don’t abandon memorizing/spaced repetition if that’s something you find helpful. [7] You’ll still be coming across new characters for a long, long time…. [8]

  1. See http://lingua.mtsu.edu/chinese-computing/statistics/index.html.
  2. 笪 Dá (a surname here, but means “a coarse mat of rushes or bamboo”, with 旦 dàn providing the phonetic). Here and later I’m using Wenlin as my main reference for character glosses.
  3. 甄 Zhēn (a surname here, but originally meaning “to make pottery” and thus composed of 垔 and 瓦, but with no phonetic clue), 闰 rùn (used in a name here, but means “intercalary”; the much more common 润 shares the same pronunciation), 驿 yì (used together with 站 to mean “post/courier station”; right-hand side is the phonetic, as in 译).
  4. 鹩 liáo (“wren”, with the left-hand side providing the phonetic), 喙 huì (“snout; mouth; beak”, with both 口 and 彖 radicals semantic; no phonetic clue), 獾 huān (“badger”, with the right-hand side phonetic).
  5. 笙 shēng (“reed-pipe instrument”, bottom is the phonetic), 岱 Dài (“Taishan mountain”, top is the phonetic), 圭 guī (“jade tablet”, cf. 挂 or 桂 for the pronunciation).
  6. 珂 kē (“a jade-like stone”, right-hand side is the phonetic), 锵 qiāng (“clang”, right-hand side is the phonetic).
  7. For the more technologically-oriented student, another option may be available: thanks to the increasing availability of texts in machine-readable formats students could run their own frequency analysis on a text they wanted to read and pre-learn characters they don’t already know. It’s a pity there don’t seem to be any easy-to-use programs or websites that offer this functionality.
  8. It should also be noted that single character recognition is only part of reading Chinese, and is not on its own a good measure of reading proficiency. That said, the relative ease of measuring character recognition and frequency may justify its limited use as a self-diagnostic and motivational tool for learners of Chinese.

The following is my response:

Interesting! This sort of helps make a case for the importance of graded readers. (Have you seen Mandarin Companion?)

While I know your intent is to SEEK THE TRUTH, the overall tone of the article is, unfortunately, a little discouraging for struggling learners. For me, this totally highlights the need for materials that give the learner a sense of accomplishment for having reached 300, 500, 1000 characters, rather than an incessant message saying, “STILL NOT GOOD ENOUGH.”

His response:

You’re quite right, I suppose I am a little too rigidly 实事求是 in the piece! I completely agree with you about the need to avoid the demotivating “still not good enough” feeling and message that permeates most Chinese teaching materials (how I remember my exasperation when the 高级 textbook still required fifty plus new vocabulary items per short text!). There’s really a huge need for more good reading materials with limited character/vocabulary ranges, and your graded readers look fantastic.

The Laziest Animated Movie Title Translations Ever

I remember when I first moved to China, I used animated films to practice Chinese quite a bit. I quickly discovered that Disney did an especially good job with translating (my favorite was the Chinese version of The Emperor’s New Groove). But I also started noticing something strange about a lot of these animated films’ Chinese titles… the word 总动员 appeared, somewhat inexplicably, way too often.

What is 总动员?

toy-story-zongdongyuan

It was almost like a formula. In one word, what’s the movie about? That’s the main theme. Then just apply this formula:

[main theme] + 总动员

What was going on? I asked a number of native speakers about this phenomenon, and none of them had paid the issue much notice. One bit of helpful information they did give, however, was that the word 总动员, in mainland China, is tied in the minds of many to some popular variety shows that came out around the year 2000. Specifically, they were 欢乐总动员 (“Joyous Zongdongyuan”) and 全家总动员 (“Whole Family Zongdongyuan”). Both were loud, fun programs with lots of active people.

Pleco’s dictionaries give the following definitions for 总动员:

  1. General/total mobilization
  2. General (or total) mobilization
  3. general mobilization (for war etc)

OK, obviously those aren’t the meanings they’re shooting for in the titles of cartoon movies.

Native speakers seem to have trouble giving an exact definition of this use of 总动员, but the feeling is clear: exciting, happy, lively, 热闹, with lots of people.

The Rise of 总动员

Not appearing in dictionaries did not stop this word from popping up in animated feature titles all over the place, starting shortly before 2000. Many were Disney films, but not all:

finding-nemo-zongdongyuan

  • Toy Story, Toy Story 2, Toy Story 3 (1995, 1999, 2010) 玩具总动员: “Toy Zongdongyuan”
  • Joe’s Apartment (1996) 蟑螂总动员: “Cockroach Zongdongyuan”
  • Finding Nemo (2003) 海底总动员: “Bottom of the Sea Zongdongyuan”
  • Looney Tunes: Back in Action (2003) 巨星总动员: “Megastar Zongdongyuan”
  • The Incredibles (2004) 超人总动员: “Superman Zongdongyuan”
  • Cars, Cars 2 (2006, 2011) 赛车总动员: “Race Car Zongdongyuan”
  • Ratatouille (2007) 美食总动员: “Gourmet Zongdongyuan”
  • Bee Movie (2007) 蜜蜂总动员: “Bee Zongdongyuan”
  • WALL·E (2008) 机器人总动员: “Robot Zongdongyuan”
  • Planes (2013) 飞机总动员: “Airplane Zongdongyuan”
  • Free Birds (2013) 火鸡总动员: “Turkey Zongdongyuan”
  • Minuscule: Valley of the Lost Ants (2014) 昆虫总动员: “Insect Zongdongyuan”

This is not a complete list; rather, it’s an attempt to try to capture some of the biggest titles and the range that “zongdongyuan” covers.

cars-zongdongyuan

OK, some of these seem to work OK… Specifically, Cars seems to deserve the treatment. I can’t help but feel that “Gourmet Zongdongyuan” (Ratatouille) could have been a much better title, though, as could have “Robot Zongdongyuan” (WALL·E).

To be fair, most of these movies actually do have multiple titles, and a casual check seems to indicate that the translators over in Taiwan are putting a bit more thought into the translations of animated feature film titles. Still, I’ve been seeing these zongdongyuan translations for years, and it especially stands out for Disney films, which tend to have excellent translations for the actual movies themselves, despite the total cop-out titles.

The Fall of 总动员

I was thinking the linguistics nerds like me were the only ones that gave this kind of issue any consideration, but fortunately at least some Chinese movie fans are also getting fed up:

最烦动画电影的中文翻译!动不动就什么什么总动员,总动员个屁呀!!有点技术含量行不行,不会翻译就直接用英文名也比这个好吧,都是文化人,怎么那么俗啊?这几年的电影都被总动员了,汽车、玩具、海底、美食、机器人,你知道什么是总动员不?我了个呸!!!!!看见这么二货的翻译就来气,这么好的电影弄了个蹩脚翻译!气死我了!想起来就来气。。。

A rough translation:

I’m so annoyed by the Chinese translations of animated films. It’s just this zongdongyuan, that zongdongyuan… screw zongdongyuan! Can’t you have just a bit of skill? If you don’t know how to translate, directly using the English title is better than this. You’ve all got culture, but now why so crude? In recent years all movies have been zongdongyuan-ized: cars, toys, bottom of the sea, gourmet food, robots… Do you even know what a zongdongyuan is? Bloody hell! When I see a shoddy translation like this it sets me off. How can such a good movie have such a lame translation? I’m so pissed off! I get mad just thinking about it…

Anyway, the good news (for us translation purists) is that in recent years zongdongyuan seems to have worn out its welcome, and quite a few animated movies (including Disney/Pixar films) that almost certainly would have gotten zongdongyuan-ized 5 or 10 years ago did not: Brave, Tangled, A Bug’s Life, Madagascar, Rio, Up, Happy Feet (not 企鹅总动员!), Turbo, Mr. Peabody & Sherman, Despicable Me, The Croods (疯狂原始人, “crazy primitive men” rather than 原始人总动员!)… all escaped zongdongyuan-ization. (Whether or not those films’ titles have good Chinese translations, though, is another question… but at least they’re not quite so lazy.)

So did anyone else notice this lazy translation trend, or was it just me?

Sun Moon Eyeglasses

ri-yue-yanjing

I recently noticed an eyewear shop called 日月眼镜 (literally, “Sun Moon Eyeglasses”). This is a good example of a name that plays on common knowledge of characters and character components. The glasses themselves, of course, are unrelated to celestial bodies, but when you put the characters for sun (日) and moon (月) together, you get 明, a character which means “bright.”

Why “bright”? There are two reasons:

  1. The word 明亮 (“bright”) is frequently used to describe attractive, alert, healthy young eyes. So it’s a good association.
  2. Another association, although less direct, is that 明 can refer to eyesight itself. There’s a word 失明 (literally, “lose brightness”) which means “to lose one’s eyesight.” Logically, then, 明 can refer to eyesight, but there’s no word (of which I am aware) other than 失明 which uses 明 to mean “eyesight.”

Have you noticed any other Chinese shop or brand names that use “deconstructed characters” in their names?

Don’t Let the Air In

I saw this sign on the door of the AllSet Learning office building that leads out to the patio:

IMG_2999

Here’s a closeup:

IMG_3001

It reads:

请大家去阳台后
随手关门
以免雾霾进入楼层

Translation:

Please, everyone, when going out on the balcony
close the door behind you
to prevent smog from entering the building

A young Chinese guy (presumably the one who put up the sign) came by our office to call our attention to the sign and ask for our cooperation. It was a little awkward because our window was open at the time (oops).

It’s weird… there’s a very traditional Chinese belief in a need for “fresh air” (even in the depths of winter). This air pollution problem is now quite visibly butting heads with that belief.

What Makes Bad Characters Readable?

I recently stumbled upon an interesting blog post titled The persistence of comprehension, which focuses on this handwritten Chinese note:

sabotaged-chinese-note

Now before you go too crazy trying to read it, know this:

Some time ago, Instagram user jumppingjack posted the above image of a note she left to her mum. She said that her brother secretly added extra strokes to the characters in the note. The result is interesting though: even though extra strokes were added, the note is still readable to most competent Chinese speakers.

bad-characters

That brother is kind of awesome. That is the kind of mischief I would have been all over as a kid, if only I had had Chinese characters at my disposal. The character substitutions are pictured at the right. (Note: not all of them are real characters.)

(Also, I totally sympathize with jumpingjack for writing the character wrong, with the two sides swapped. I have done that way too many times myself.)

Try having a Chinese friend read the note and take note of what gives them the most trouble. Read the original post for the note’s original content (in electronic text) and the author’s analysis and conclusions.

Sinosplice and all material found herein © 2002-2014, John Pasden. All rights reserved.