Simulating 80% Comprehension in Chinese

13 Oct 2016

A while back I wrote about What 80% Comprehension Feels Like, quoting the English examples from Marcos Benevides’ excellent presentation, which simulate 80% comprehension in English by mixing in made-up, English-like vocabulary.

I’ve been thinking about that presentation a lot, both about the impact of such a demonstration and about how it could be done in Chinese. I ended up creating my own Chinese examples. I’ll share those first, then follow up with some discussion of the considerations involved.

(Before you attempt to read the following, please note that if your Chinese is not at least at an intermediate level, this exercise is not going to work. Like their English-language counterparts, these examples are most effective with native speakers.)

Chinese Samples

Here is 98% comprehension:

Chinese: 98% comprehension

Here is 95% comprehension:

Chinese: 95% comprehension

Here is 80% comprehension:

Chinese: 80% comprehension


The tricky thing about reading Chinese is that it’s not just a matter of vocabulary and grammar; there’s a third factor not present in English: Chinese characters. When a learner reads a difficult Chinese text, all three components tend to contribute to the difficulty: vocabulary, grammar, and characters.

But for the example to work for learners and native speakers alike, there needs to be a way to guarantee that parts of the text are incomprehensible, as the made-up words do in English. How can one do this in Chinese?

How I did it

First of all, to maximize the chances that the “intelligible” parts of the Chinese sample text are also readable by learners, I used as simple a text as I could find: a Level 1 Mandarin Companion graded reader, in this case The Secret Garden.

Then I had to make sure I chose the more difficult content words to swap out, and that I replaced every instance of them in each sample. Obviously, I had to count the words to get the desired percentage right. But equally important, to make my samples representative of real-life 98%, 95%, and 80% comprehension experiences, the words chosen had to “cloud” reading comprehension to the appropriate degree, no more, no less.
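The word counting can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used for these samples: it assumes the passage has already been segmented into words, and the hardest-first list of candidate words (`ranked_candidates`) is a hypothetical input standing in for the editorial judgment described above. Every running instance of a chosen word counts, since every instance gets swapped out.

```python
from collections import Counter


def pick_words_to_obscure(tokens, ranked_candidates, comprehension_pct):
    """Greedily choose words to obscure until enough running tokens are unreadable.

    tokens: the passage, already segmented into words (assumed).
    ranked_candidates: content words ordered hardest-first (hypothetical ranking).
    comprehension_pct: integer target such as 98, 95 or 80.
    Returns (chosen_words, obscured_token_count).
    """
    counts = Counter(tokens)
    # Tokens that must be obscured, rounded up; integer math avoids float error.
    needed = -(-(len(tokens) * (100 - comprehension_pct)) // 100)
    chosen, obscured = [], 0
    for word in ranked_candidates:
        if obscured >= needed:
            break
        if counts[word] == 0:
            continue  # candidate does not occur in this passage
        chosen.append(word)
        obscured += counts[word]  # every instance will be swapped
    return chosen, obscured


# Made-up ten-token passage: 80% comprehension means 2 tokens must be obscured.
tokens = "她 在 花园 里 看见 一 只 知更鸟 知更鸟 飞".split()
chosen, obscured = pick_words_to_obscure(tokens, ["知更鸟", "花园", "看见"], 80)
print(chosen, obscured)  # ['知更鸟'] 2
```

Because whole words are added one at a time, a frequent word can overshoot the target, so getting it “no more, no less” still calls for adjustment by hand.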

But here’s the tricky part: how to represent characters the reader doesn’t know. The obvious way would be to create my own characters that don’t really exist. I enjoy doing this, but it’s time consuming, and to make it look truly credible it would have to not stand out at all when mixed in with the other characters. Too much work.

So I turned to the Unihan database of Chinese characters. Over the years, more and more obscure characters have been added to this set, and I found a list of the most recent additions. (Most recently added should mean most obscure, but I chose Extension D from that list because it was both recent and a small download.)
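For anyone who wants to browse the same pool, here is a small Python sketch that lists the CJK Unified Ideographs Extension D block. The range U+2B740 to U+2B81D (222 assigned characters, added in Unicode 6.0) comes from the Unicode standard, not from this post. The script only produces a review list; deciding which characters look plausibly simplified is still a human call.

```python
import unicodedata

# CJK Unified Ideographs Extension D: assigned code points U+2B740..U+2B81D.
EXT_D_FIRST = 0x2B740
EXT_D_LAST = 0x2B81D


def extension_d_characters():
    """Return every assigned Extension D character, in code point order."""
    return [chr(cp) for cp in range(EXT_D_FIRST, EXT_D_LAST + 1)]


chars = extension_d_characters()
print(len(chars), "characters")
# One line per character for manual review; many fonts will show an empty box.
for ch in chars[:5]:
    print(f"U+{ord(ch):05X}", ch, unicodedata.name(ch, "<unnamed>"))
```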

A quick check confirmed that these characters were indeed obscure, but many of them didn’t look like simplified Chinese characters or were just too weird, so I had to choose carefully. After making my choices, I also had to confirm that educated Chinese adults didn’t recognize the characters (guessing doesn’t count).

After that, I selectively swapped out characters in the samples. (My 80% comprehension sample is the shortest because I was running out of “good” obscure characters and didn’t want to hunt for more!)
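The swap itself amounts to string replacement. Here is a minimal Python sketch, assuming a hand-made mapping from each chosen word to a stand-in string of obscure characters (the mapping and the sample sentence are hypothetical). Longer words are replaced first, so a chosen word that contains another chosen word is not broken up partway.

```python
def obscure_text(text, replacements):
    """Replace every instance of each chosen word with its stand-in.

    replacements: dict mapping a real word to a string of obscure
    characters (a hypothetical, hand-picked mapping).
    Longer words go first so a chosen one-character word inside a
    chosen longer word cannot split it before it is replaced whole.
    """
    for word in sorted(replacements, key=len, reverse=True):
        text = text.replace(word, replacements[word])
    return text


# Stand-ins drawn from CJK Extension D (U+2B740..U+2B742); most fonts lack them.
stand_ins = {"知更鸟": "\U0002B740\U0002B741", "鸟": "\U0002B742"}
sample = "知更鸟飞进花园。她看着知更鸟和别的鸟。"
result = obscure_text(sample, stand_ins)
print(result.count("\U0002B740"), "知更鸟" in result, "鸟" in result)  # 2 False False
```

Since written Chinese has no spaces, plain replacement can also hit a chosen word inside an unrelated word; this sketch does not guard against that, so each sample still needs a careful read afterward.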

One interesting side effect of using such obscure characters was that most software couldn’t render them: whatever fonts it used simply didn’t include those bizarre characters. Only Wenlin, with its custom font designed to render all kinds of obscure characters, could display them all, so I had to take screenshots of Wenlin’s interface.

How to use this


I used these passages as part of a presentation on extensive reading at LanguageCon in September. I got the effect I wanted: Chinese members of the audience giggled (embarrassedly?) at the characters they didn’t know, especially when they got to the 80% comprehension example.

Learners of Chinese smiled wryly: there wasn’t much amusing about a simulated version of the challenge they face every day when trying to read Chinese.

More than anything, I hoped that the Chinese audience could empathize with learners of Chinese. Most Chinese people never know what it feels like to learn so many unfamiliar characters as part of learning a foreign language. Through these examples, though, they can get an inkling.

Actually, maybe they were chuckling in relief… at least they’ve got that challenge behind them.

The AllSet Learning blog also has a similar Chinese-language article on this topic: 80%没有你想的那么多 (“80% Is Not as Much as You Think”).


John Pasden

John is a Shanghai-based linguist and entrepreneur, founder of AllSet Learning.


  1. “The obvious way would be to create my own characters that don’t really exist. I enjoy doing this, but it’s time consuming, and to make it look truly credible it would have to not stand out at all when mixed in with the other characters. Too much work.”

    What about raiding Xu Bing (徐冰)’s A Book from the Sky (天书)? It’s made (I believe) of imaginary characters made to look like real ones (using real radicals, etc).

    • With all respect to Xu Bing, the hard part is not in combining existing components in a novel way; the hard part is making the “fake” characters look legit when placed side by side with other characters represented in a standard font.

      I love Xu Bing’s work, though! (I once blogged about it.)

  2. Great post as always, John. To me there’s another interesting level to the “don’t recognize a word” issue, which is that sometimes, in some cases, the nature of Chinese characters makes 80% word recognition easier in Chinese than in English.

    Skipping past all of the usual discussion of whether the writing system is efficient (it’s not), the one small advantage characters have over English is that sometimes — maybe even often — they convey what the character means. Anyone who’s looked at a normal Chinese menu has probably come across weird food characters they couldn’t pronounce correctly but could guess the meaning of. Even with less context than a menu, correct obscure-Chinese-word classification happens pretty regularly. Old weapon characters come to mind as another pretty identifiable category.

    This is something that obscure English words don’t often give you: some clue as to their meaning. Sure, I know, you can try to look for word roots and foreign borrowings, and that works sometimes, but most of the time those origins are obscured by spelling weirdness and morphology and so on.

    So the 80% recognition thing might not quite be apples-to-apples between English and Chinese. Again, not to argue that Chinese isn’t damn hard, but that in this one limited area, it may have a slight advantage over English.

    • Steve,

      I know what you’re getting at, but I don’t think this plays a significant role in comprehension of new vocabulary until pretty late in the game. It takes a fair amount of experience in the language to be able to make use of semantic components to make educated guesses. (You’d need quite a bit of experience learning Chinese before you started trying to read about ancient Chinese weapons, for example.)

  3. Absolutely excellent. Thank you for taking the time to make an example that will speak to Chinese audiences so effectively. I will soon be part of a webinar with (almost all) native Chinese speakers about comprehensible input in Chinese teaching. May I share the link to this article as a way for them to feel the impact of different levels of comprehensibility?

  4. This is great. I remember your post on this for English but had forgotten the specifics so I had assumed you’d just be blacking out characters. It took me until my third unrecognised character in what was otherwise a simple piece of text for me to cotton on 🙂

  5. Really cool! It accomplishes the 80% comprehensibility goal pretty effectively, I think. At least in some cases, you also manage to obscure any good cues to pronunciation (and of course no one can guess the tones). That’s the part that really bothers me as a non-native reader.

  6. Thanks for this. Frustratingly, I can’t get the 95% and 80% samples to load at all from within China – even on a VPN. Any suggestions?

    • Even if the image is blocked, you should be able to click through to the URL the image is linked to. Does that work?

      This is an issue with hosting on Flickr, and given Yahoo’s imminent doom, I’m probably going to have to migrate everything off Flickr sooner or later, but for now it’s a huge headache that I’m delaying on…

  7. David Lloyd-Jones Says: November 16, 2016 at 5:39 pm

    Hi, John,

    I enjoy your thoughtful, intelligent and — to me at least — original stuff.

    Two queries. You write about the characters problem differentiating English from Chinese. This seems arbitrary to me: a very high percentage of native English-language readers agree that p-l-a-t-e spells “dish” unless they have some clue other than the letters on the page. Meself, I’m open to the possibility that all of us need clues other than the letters on the page, and it’s just that some of us are good at supplying our own of these.

    Somewhat similarly, you make the distinction “Guessing doesn’t count” when judging whether sophisticated native readers of Chinese are going after your obscure characters. Why not? What’s this distinction between guessing and knowing?

    I bring a bias of my own to questions like these. It seems to me there’s a Marco Polo Syndrome out there amongst experts of all kinds, starting with grade school teachers who think they know more than grade school students. This syndrome starts from the sure knowledge that one is superior because one has accomplished something astonishingly difficult, e.g. getting to Peiping in the 15th Century, and then back to Europe.

    This means you know something your audience doesn’t, and this, in turn, gives you license to tell them anything you damn well please. It is particularly important that one have this license in order to tell them things that force them to acknowledge one’s superiority.

    This certainly doesn’t mean that you should tell any untruths, you understand. In Marco Polo’s case, for instance, saying that people on the other side of the world have their faces on their chests and no heads is not an untruth, it’s just a shortcut. It would take so long to explain that they have sallow skins, epithelial folds and long fingernails, so chest-heads can serve as a sort of place-holder of truth.

    Language manuals are full of such place-holders. Folk etymologies and, worse, foreigners’ invented folk etymologies, are whole classes of these bogosities.

    I think that one of the triumphs of Eleanor Hartz Jordan of the US Army School in Michigan, and the State Department’s FSI in Georgetown, is the degree to which they avoid this class of error. Their methods consist of extreme drill in ver-ree genuine situations. “This is boring as hell, kids, but it works.”

    “Wear this fourteen-pound tape recorder around your neck for three months, play it eight hours a day, and you’re going to Osaka,” was Voice of America’s Jack Masey’s advice to new recruits in 1967. If you didn’t, you were going to be on the street…. Mormon missionaries, State Department FSO-2’s, and CIA folks, follow these instructions. They do well.

    What they accomplish in three months of hard work seems trivial. They speak a very little bit of the target language with a relatively good accent. This is, in fact, a miracle, and more than many people with huge vocabularies and reams of publications achieve in a lifetime. They function.


  8. Terry Waltz Says: January 7, 2017 at 2:14 pm

    I went to Georgetown’s school of languages for Chinese, and I know a number of Mormon missionaries who did Chinese. None of these claims is accurate about the degree of fluency or proficiency these programs impart. Sorry. John has it right. His point is not that Chinese is difficult per se, it’s simply that ANY language is acquired through comprehensible input, and that for independent reading, the level of comprehensibility has to be very high — higher than most teachers think. There’s nothing Marco Polo about that, except the fact that most teachers don’t get it.
