A Realistic Look at the Challenges of Reading Chinese
04 Apr 2014
The following is a guest article written by a Sinosplice reader, Julian Suddaby. I have followed it with some commentary of my own.
Warning: if you’re a member of the “Chinese is super easy” faction, this article might annoy you a little, but be sure to read through to the end!
How Many Characters?
by Julian Suddaby, 2014-02-13
I asked Google “how many chinese characters do I need to learn” and the best sites I found pointed to linguist Jun Da’s website and used his data to argue that 3,500 characters should be enough for most people, since you’ll then know around 99.5% of the characters in general circulation. Is that really enough?
Well, if you’ve got to that point, congratulations. It’s an achievement. But you may not want to stop accumulating characters just yet. Indeed, sad to say, at 3,500 you won’t even be able to read Jun Da’s name, since 笪 is way down at frequency rank #5,231. So how many, then, do you need to learn? That depends on one question that you should ask yourself: what exactly do you want to read?
Students often want to read Chinese newspapers. Since the Southern Weekly 南方周末 is a popular choice, I took its ten most popular articles from the previous thirty days and ran them through a computer program that checked them against Jun Da’s 3,500 most frequent characters. The results are fairly encouraging for the Chinese student, I think: if you knew the 3,500, you’d encounter only forty-four new characters over the course of those ten articles, and twenty-nine of those you’d see only once, so you would probably just guess at them from context and move on. But you’d possibly want to look up 甄, a pseudonymous surname given to the subject of one of the articles (and thus appearing thirty-five times); 闰, used in the name of a Zhejiang corporation which appears to have buried five hundred tons of poisonous chemicals in their backyard (seven appearances); and 驿, used in the name of a company involved in an online security breach (also seven appearances).
So, while you probably shouldn’t throw out your dictionary just yet, it does seem that trying to read a newspaper won’t be a disheartening experience.
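For readers curious what such a check might look like, here is a minimal sketch in Python. It is not the program used for the article. The names `is_han` and `unknown_characters` are made up for illustration, the eleven-character `frequency_list` is a toy stand-in for a real frequency table, and the Han test only covers the main CJK Unified Ideographs block.

```python
from collections import Counter


def is_han(ch):
    """True for characters in the main CJK Unified Ideographs block.

    Assumption: rarer characters in the extension blocks are ignored.
    """
    return "\u4e00" <= ch <= "\u9fff"


def unknown_characters(text, known):
    """Count every Han character in `text` that is not in the `known` set."""
    return Counter(ch for ch in text if is_han(ch) and ch not in known)


# Toy stand-in for a frequency-ordered character list (hypothetical data).
# A real run would load the first 3,500 entries of a frequency table.
frequency_list = "的一是不了人我在有他这"
known = set(frequency_list[:8])  # pretend the learner knows the top 8

sample = "我在这，他不在。他有书。"
missing = unknown_characters(sample, known)

# Characters seen only once are the ones a reader might just guess at.
seen_once = [ch for ch, count in missing.items() if count == 1]

print(len(missing), "distinct unknown characters")
print("most frequent:", missing.most_common(1))
print("seen only once:", seen_once)
```

Swapping the toy list for a real one and `sample` for the full text of an article or novel would give the same kind of counts quoted in this piece: how many distinct characters fall outside the known set, and which of them recur often enough to be worth looking up.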
A Children’s Book
Children’s novels are another popular choice of reading material for language students. Shen Shixi is a well-regarded children’s novelist, whose Jackal and Wolf has recently been translated into English by Helen Wang. I ran an analysis on another of Shen’s novels, 《鸟奴》 (lit. “Bird Slave”). This is, character-wise, much more difficult than the newspaper articles, with two hundred and one characters not in the top 3,500. Ninety of those are used more than once. As you’d expect from Shen, the “king of animal fiction”, animal-related vocabulary is one particular problem here, and you’ll probably end up very confused if you don’t look up 鹩, used two hundred and eighty-four times; 喙, used thirty-six times; and 獾, used twenty-two times.
The novel is about two hundred and forty pages long, and so you should expect to find a character you don’t recognize on most pages.
A Wuxia Novel
Jin Yong’s novels remain firm favorites. Rather than starting with the four volumes and 1,300 pages of The Legend of the Condor Heroes 《射雕英雄传》, students might perhaps try A Deadly Secret 《连城诀》, which is just four hundred pages or so. In those four hundred pages you’ll encounter two hundred and ninety-six characters not in the top 3,500. The most frequently used are from the protagonists’ names (水笙, 水岱, and 万圭), but there are plenty of new common nouns and verbs used multiple times as well.
On a page-by-page basis, you should recognize more characters than in the Shen Shixi novel above. In terms of total characters, however, A Deadly Secret is more of a challenge.
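The arithmetic behind that comparison can be made explicit with a rough sketch. It uses only the figures quoted above and treats the approximate page counts ("about two hundred and forty", "four hundred pages or so") as exact, so the result is a ballpark figure. It also counts each distinct character once and ignores how often that character recurs.

```python
# (distinct characters outside the top 3,500, approximate page count),
# both taken from the figures quoted in the article.
books = {
    "Bird Slave": (201, 240),
    "A Deadly Secret": (296, 400),
}

# New distinct characters per page: a crude density measure.
density = {
    title: new_chars / pages
    for title, (new_chars, pages) in books.items()
}

for title, per_page in density.items():
    print(f"{title}: {per_page:.2f} new distinct characters per page")
```

On these numbers the Shen Shixi novel comes out at a little over 0.8 new characters per page and A Deadly Secret at a little over 0.7, even though the latter has roughly half again as many new characters overall.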
A Modern Classic
Lu Xun’s A Call to Arms 《呐喊》, despite collecting stories he wrote at a very early stage of modern Chinese literary vernacularization, should not be much more difficult than the two novels above, at least in terms of basic character recognition. It contains two hundred and thirty unseen characters in total, with 闰 (remember that one from above?), 珂 (used in a name) and 锵 (a sound) taking the top three spots.
Even from this very cursory analysis, it appears that if your goal is to read Chinese fiction comfortably without a dictionary, you’re going to need to recognize more than 3,500 characters. Chinese writers use characters well into the four or five thousand frequency range very regularly.
So although reaching 3,500 is worth celebrating, I wouldn’t stop trying to acquire characters just yet. Keep reading and dictionary-checking, and don’t abandon memorizing/spaced repetition if that’s something you find helpful. You’ll still be coming across new characters for a long, long time…
- See http://lingua.mtsu.edu/chinese-computing/statistics/index.html.
- 笪 Dá (a surname here, but means “a coarse mat of rushes or bamboo”, with 旦 dàn providing the phonetic). Here and later I’m using Wenlin as my main reference for character glosses.
- 甄 Zhēn (a surname here, but originally meaning “to make pottery” and thus composed of 垔 and 瓦, but with no phonetic clue), 闰 rùn (used in a name here, but means “intercalary”; the much more common 润 shares the same pronunciation), 驿 yì (used together with 站 to mean “post/courier station”; right-hand side is the phonetic, as in 译).
- 鹩 liáo (“wren”, with the left-hand side providing the phonetic), 喙 huì (“snout; mouth; beak”, with both 口 and 彖 radicals semantic; no phonetic clue), 獾 huān (“badger”, with the right-hand side phonetic).
- 笙 shēng (“reed-pipe instrument”, bottom is the phonetic), 岱 Dài (“Taishan mountain”, top is the phonetic), 圭 guī (“jade tablet”, cf. 挂 or 桂 for the pronunciation).
- 珂 kē (“a jade-like stone”, right-hand side is the phonetic), 锵 qiāng (“clang”, right-hand side is the phonetic).
- For the more technologically-oriented student, another option may be available: thanks to the increasing availability of texts in machine-readable formats, students could run their own frequency analysis on a text they wanted to read and pre-learn the characters they don’t already know. It’s a pity there don’t seem to be any easy-to-use programs or websites that offer this functionality.
- It should also be noted that single character recognition is only part of reading Chinese, and is not on its own a good measure of reading proficiency. That said, the relative ease of measuring character recognition and frequency may justify its limited use as a self-diagnostic and motivational tool for learners of Chinese.
The following is my response:
> Interesting! This sort of helps make a case for the importance of graded readers. (Have you seen Mandarin Companion?)
> While I know your intent is to SEEK THE TRUTH, the overall tone of the article is, unfortunately, a little discouraging for struggling learners. For me, this totally highlights the need for materials that give the learner a sense of accomplishment for having reached 300, 500, 1000 characters, rather than an incessant message saying, “STILL NOT GOOD ENOUGH.”
> You’re quite right, I suppose I am a little too rigidly 实事求是 in the piece! I completely agree with you about the need to avoid the demotivating “still not good enough” feeling and message that permeates most Chinese teaching materials (how I remember my exasperation when the 高级 textbook still required fifty plus new vocabulary items per short text!). There’s really a huge need for more good reading materials with limited character/vocabulary ranges, and your graded readers look fantastic.