Jun 2017

Brendan on the Meaninglessness of Chinese Characters

I’ve been dealing a lot with clients’ Chinese character issues, and happened to stumble upon this Quora answer of Brendan O’Kane’s to a question about the origin of the character :

Chinese speakers believe a lot of things about their own writing system, many of them untrue. One of the deepest-rooted and most pernicious of these false beliefs is the notion that characters have meaning. They don’t. The Chinese language [simplifying here; feel free to replace with “Chinese languages,” if you prefer] was spoken long before it was ever written, and has been spoken fluently throughout its history by far more people than have been able to write it fluently. The modern components of a character are not a reliable guide to either the meaning of the character or the early forms of a character, and the characters that make up a word are not necessarily a reliable guide to the meaning of the word. A lot of the stuff referred to as “etymology” in Chinese would more accurately be described as “stories about pictures” — cute, and occasionally helpful for memorization, and sometimes even sort of accurate, but mostly no more truthful than the old story about the English word “sincere” coming from Latin “sine cera,” “without wax,” or about “history” being “his story.”

Lots of interesting ideas here, and Brendan is spot on. And although “Chinese speakers believe a lot of things about their own writing system, many of them untrue,” that doesn’t mean you shouldn’t learn much of what Chinese speakers believe about their language (and writing system). In fact, you kind of have to. That’s culture. It’s like learning about all the ways that “America” is “the land of the free,” even if you don’t believe that the U.S. is that great bastion of liberty. What a people believes about its country is important.

Still, you don’t take everything at face value. Brendan’s point might be a “there is no spoon” moment for you, though, if you’re ready for it.

There is no spoon (勺子)

The key point here is that no bit of language, either spoken or written, has a meaning that people haven’t given it. (For more information on where meaning comes from, read up on semiotics and semantics.) Furthermore, spoken language is primary. Written language is a technology employed by a society. Sure, it’s a special technology with special properties and all kinds of cultural power, but it’s not the language itself, nor is it inherently meaningful in itself. Chinese characters do not hold any meaning that people do not give them.

If all this sounds obvious, that’s great, but if you pay attention, you may notice that Chinese characters do sometimes seem to take on mystical qualities in Chinese culture.

I’m not trying to get overly philosophical or quibble over irrelevant details. The question for me is: what does this mean for the learner of Chinese? Here are a few points:

  • You don’t have to know the full origins of every character you learn. Sure, they are sometimes helpful for memorization, and if that’s the case, great.
  • It’s worth noting how many non-language-oriented native speakers, fully fluent and literate, have no interest in character origins, and have forgotten most of what they once knew about that stuff. And yet they are still fully fluent and literate in Chinese.
  • Since character meanings are neither inherent nor absolute, it’s not bad to sometimes make up your own little stories to help you remember characters. The key is consistency (so as not to confuse yourself), not factual accuracy.
  • Still, because characters are such an important part of Chinese culture, it’s not a good idea to make up your own stories that run counter to the standard ones that virtually every Chinese person knows, like the meanings of the most basic pictographic (人, 日, 木, etc.) or the simple or compound ideographic (上, 明, 好, etc.) ones. For the more complicated ones that most native speakers couldn’t explain, your own story mnemonics are safe to use.

This is a complicated issue with tons of cultural baggage, I realize. I’m happy to discuss in the comments!


May 2017

Learn the Structural Patterns of Chinese Characters

It’s hard to succinctly explain what I mean by this title, because “character structure” and “character composition” are pretty much always used to mean “the character components that make up a character” (or, to use the more outdated term, “radicals”). But the character components would be the content. The limited number of spatial configurations in which those components routinely combine are the “character structure patterns” I’m talking about in this post.

Take a look at this:

Chinese Character Structural Patterns

If that’s not clear enough, let me break it down for you.

First of all, these “structural patterns” of Chinese characters are referred to as “Ideographic Description Characters” in the IT world, and each one actually has its own Unicode character! So you can copy and paste them just like other text (provided you have Unicode support), and even Google them. (Pro tip: Baidu them. Baidu Baike (Baidu’s Wikipedia) has lots of examples of each type.)

Here are those 12 Unicode characters:

⿰, ⿱, ⿲, ⿳, ⿴, ⿵, ⿶, ⿷, ⿸, ⿹, ⿺, ⿻
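Since these are ordinary Unicode characters, you can also list them programmatically. Here is a minimal Python sketch that prints each one with its code point and official Unicode name. The helper name `idc_table` is made up for this illustration, and it assumes your Python's bundled Unicode data includes the U+2FF0 to U+2FFB range (any recent version should).

```python
import unicodedata

# The 12 Ideographic Description Characters occupy U+2FF0 through U+2FFB.
IDC_RANGE = range(0x2FF0, 0x2FFC)

def idc_table():
    """Return (character, code point label, Unicode name) for each pattern."""
    return [(chr(cp), f"U+{cp:04X}", unicodedata.name(chr(cp)))
            for cp in IDC_RANGE]

for char, label, name in idc_table():
    print(char, label, name)
```

The Unicode names (for example "LEFT TO RIGHT" for ⿰) double as a plain-English description of each spatial pattern.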

The patterns ⿰ and ⿱ (and sometimes a combination of those two, one embedded in the other) make up the most characters. Here are some simple examples of characters that use the more common structural patterns:

  1. ⿰:
  2. ⿱:
  3. ⿲:
  4. ⿴:
  5. ⿵:
  6. ⿸: 广
  7. ⿺:
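
To make the "one pattern embedded in the other" idea concrete, here is a small Python sketch that parses a description string (a pattern character followed by its parts) into a nested structure. The decomposition strings used below (好 as ⿰女子, 森 as ⿱木⿰木木) are hand-written for illustration, not pulled from any database, and the function name `parse_ids` is my own. It assumes ⿲ and ⿳ take three parts and the other patterns take two.

```python
# Patterns that combine two parts vs. three parts (assumption stated above).
BINARY = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")
TERNARY = set("⿲⿳")

def parse_ids(ids):
    """Parse a description string into a nested tuple (pattern, part, ...)."""
    def parse(pos):
        ch = ids[pos]
        if ch in BINARY or ch in TERNARY:
            arity = 3 if ch in TERNARY else 2
            parts = []
            pos += 1
            for _ in range(arity):
                node, pos = parse(pos)  # each part may itself be a pattern
                parts.append(node)
            return (ch, *parts), pos
        return ch, pos + 1  # a plain component

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters in description string")
    return tree

print(parse_ids("⿰女子"))       # 好: left-right
print(parse_ids("⿱木⿰木木"))   # 森: top-bottom, with a left-right pair below
```

The nesting in the second example is exactly the "⿰ inside ⿱" combination: a small number of patterns, reused recursively.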

My advice is:

  • If you’re learning characters, learn these patterns. There aren’t that many, and they’re useful. It’s also good to dispel the notion that character components can be combined in an infinite number of ways. It’s a lot to absorb, for sure, but it’s not an infinite number of options you’re dealing with.
  • If you’re teaching characters, teach these patterns (or at least point them out) as you teach the character components. Everyone teaches components, but it’s nice to add a little structure to the teaching of structure. Confirm the growing, amorphous familiarity your students are acquiring, and give it a definite form.
  • If you’re building a website or app, include these patterns. It’s not going to be useful to look up characters in this way, but if done right, it could be a great way to explore a character set, and self-directed exploration is one of the best ways to learn.


May 2017

Cool Custom Fonts for Chinese Book Covers

It’s a lot of work to create a new font in Chinese. Instead of English’s 26 capital letters, 26 lower-case letters, 10 numbers and a smattering of symbols, you have literally thousands of Chinese characters you need for even a basic font. But if you just need a special font for a logo or a book cover, it makes sense to put the design work into just the Chinese characters you need. And if you look at enough Chinese book covers, you discover some cool custom fonts!

Here are some covers with custom Chinese fonts which I discovered on a recent trip to the book store (Chinese title in text below each photo):









I suppose it’s possible that not all of these are custom-designed characters (they might just be fonts I’m not aware of), but they’re still pretty cool!


Feb 2017

Crazy Circus Chinese Characters

I’m always on the lookout for interesting, creative use of Chinese characters, and that includes cool and weird Chinese fonts. Well, the characters at the bottom of this poster really caught me by surprise, because I didn’t even realize they were characters at first:

Crazy Circus City

If you’re struggling to make anything out, note that English at the bottom right: “Crazy Circus City.” Big clue.

OK, here’s what it reads:


Literally, “crazy (疯狂) circus (马戏) city ().”

(Don’t worry if you still find it hard to read even after you know what it says, and even if you know the characters. It’s really hard to read!)

For some reason the traditional form of is used, though: . My Chinese teacher back in college always told me that mixing simplified and traditional characters was a big no-no. It’s just too… crazy.

If the whole thing had been in traditional characters, it would have read:



Oct 2016

Simulating 80% Comprehension in Chinese

A while back I wrote about What 80% Comprehension Feels Like, and I quoted the English examples used in Marcos Benevides’ excellent presentation which simulate 80% comprehension in English by including made-up English-like vocabulary words.

I’ve been thinking about that presentation a lot, both about the impact of such a demonstration, as well as about how it could be accomplished in Chinese. I ended up creating my own examples in Chinese. I’ll go ahead and share that first, and follow up with some discussion of the considerations involved.

(Before you attempt to read the following, please note that if your Chinese is not at least at an intermediate level, the following exercise is not going to work. Like their English-language counterparts, these examples are most effective with native speakers.)

Chinese Samples

Here is 98% comprehension:

Chinese: 98% comprehension

Here is 95% comprehension:

Chinese: 95% comprehension

Here is 80% comprehension:

Chinese: 80% comprehension


The tricky thing about reading Chinese is that it’s not just a matter of vocabulary and grammar; there’s an issue not present in English: the issue of Chinese characters. When a learner reads a difficult Chinese text, all three of these components tend to play a part in the difficulty: vocabulary, grammar, and characters.

But for the example to work for both learners and native speakers alike, there needs to be a way to guarantee that parts of the text are incomprehensible, as accomplished with made-up words in English. How can one do this in Chinese?

How I did it

First of all, to maximize the chances that the “intelligible” parts of the Chinese sample text are also readable by learners, I used as simple a text as I could: a Level 1 Mandarin Companion graded reader. For these examples, it was The Secret Garden.

Then, I had to be sure I chose the more difficult content words to swap out, and that I got all instances of them in each sample. Obviously, I had to count the words to make sure I got the desired percentage right. But equally important, to make my samples representative of real-life 98%, 95%, and 80% comprehension experiences, the words chosen should “cloud” reading comprehension to the appropriate degree, no more, no less.
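
The counting step is simple arithmetic, and a rough Python sketch might look like the following. It assumes the text has already been split into words (word segmentation for Chinese is a separate problem not covered here), and the function names and the toy token list are hypothetical, not what was actually used to build the samples.

```python
def comprehension_rate(tokens, unknown):
    """Share of running words the reader is assumed to know.

    Every instance of an unknown word counts against the total,
    which is why all instances must be swapped out together.
    """
    if not tokens:
        raise ValueError("empty text")
    known = sum(1 for t in tokens if t not in unknown)
    return known / len(tokens)

def words_to_hide(total_tokens, target):
    """How many running words must be unknown to hit a target rate."""
    return round(total_tokens * (1 - target))

# Toy example: 10 running words, 2 of them instances of a hidden word.
tokens = ["a"] * 8 + ["b"] * 2
print(comprehension_rate(tokens, {"b"}))
print(words_to_hide(200, 0.98))
```

Note that this only handles the percentage. Whether the hidden words "cloud" comprehension to a realistic degree is still a judgment call.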

But here’s the tricky part: how to represent characters the reader doesn’t know. The obvious way would be to create my own characters that don’t really exist. I enjoy doing this, but it’s time consuming, and to make it look truly credible it would have to not stand out at all when mixed in with the other characters. Too much work.

So I turned to the Unihan database of Chinese characters. Over the years, more and more obscure characters have been added to this set of characters, and I found a list of the most recent additions. (Most recently added should mean most obscure, but I chose Extension D from this page because it was both recent and a small download.)

A quick check confirmed that these characters were indeed obscure, but many of them didn’t look like simplified Chinese characters, or were just too weird, so I had to choose carefully. After making my choices, I also had to check to make sure that educated Chinese adults didn’t recognize the characters (guessing doesn’t count).

After that, I selectively swapped out characters in the samples. (My 80% comprehension text sample is the shortest, because I was running out of “good” obscure characters, and I didn’t want to have to find more!)
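
For the curious, the swapping step could be sketched in Python like this. The Extension D block boundaries (U+2B740 to U+2B81F) come from the Unicode standard, but everything else is an assumption for illustration: the token list, the function names, and especially the stand-in characters, which here are just the first two code points in the block rather than carefully vetted choices.

```python
# CJK Unified Ideographs Extension D block, per the Unicode standard.
EXT_D_FIRST, EXT_D_LAST = 0x2B740, 0x2B81F

def is_extension_d(ch):
    """True if the character's code point falls inside the Extension D block."""
    return EXT_D_FIRST <= ord(ch) <= EXT_D_LAST

def swap_words(tokens, replacements):
    """Replace every instance of each chosen word with its stand-in."""
    return [replacements.get(t, t) for t in tokens]

# Hypothetical pre-segmented sentence and an arbitrary stand-in.
tokens = ["我", "喜欢", "花园", "花园"]
stand_in = chr(0x2B740) + chr(0x2B741)  # arbitrary picks, not vetted
print("".join(swap_words(tokens, {"花园": stand_in})))
```

Depending on your fonts, the output may show up as empty boxes.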

One interesting side effect of using such obscure characters in my texts was that most software couldn’t render them. Whatever fonts they used just didn’t include those bizarre characters. Only Wenlin, with its custom font designed to render all kinds of obscure characters, could display them all. So I had to do screenshots of Wenlin’s interface.

How to use this


I used these passages as part of a presentation on extensive reading at LanguageCon in September. I got the effect I wanted: Chinese members of the audience giggled (embarrassedly?) at the characters they didn’t know, especially when they got to the 80% comprehension example.

Chinese learners smiled wryly: there wasn’t much amusing about a fake recreation of the challenge they face on a daily basis, trying to read Chinese.

More than anything, I hoped that the Chinese audience could empathize with the learners of Chinese. Most Chinese people never know what it feels like to have to learn so many foreign characters as a part of a foreign language learning experience. Through these examples, though, they can get an inkling.

Actually, maybe they were chuckling in relief… at least they’ve got that challenge behind them.

The AllSet Learning blog also has a similar Chinese language article on this topic: 80%没有你想的那么多.


Oct 2016

Traffic Characterplay

This image has been on posters all around Shanghai’s Jing’an District this past summer:


Each character is related to traffic:

  • : yield
  • : car
  • : slow
  • : stop

Actually, each character is merely “decorated” in the traffic theme. It’s not true “characterplay” in the sense that characters are constructed or represented in some non-standard way for artistic effect. I’ll take it, though!


Aug 2016

Chinese Characters with Disney Characteristics

I noticed this poster in the Shanghai Metro:


You gotta love Disney’s attention to detail. If you look at the characters carefully, you’ll see that elements of the iconic “Disney” typography have been incorporated into the Chinese characters:


(Oh, and nope, I’m still not planning any new trips to Shanghai Disneyland!)


Jul 2016

Punny Clothing Shop

This is a clothing store in Shanghai’s Jing’an Metro Station:


The name of the shop is 布言布语 (Bù Yán Bù Yǔ). The pun involves the character 布, which in this case is a substitute for 不.

The original expression is: 不言不语, which means “to not say a single word.”

The pun gives us 布言布语 (the pronunciation matches exactly), riffing on words like 布料 (“cloth”) which use the character 布.

Truth be told, 布言布语 is not a very clever name. Sure, it has the pun, but 不言不语 has nothing to do with clothing. Still, somebody thought it was good enough for a clothing shop name.

Other Chinese brands have similarly used a language theme in their names. The first one that comes to mind for me is “BreadTalk,” which is 面包新语 in Chinese.


Jun 2016

Of Forests and Graves

How do you turn a forest into a grave? Check out this innovative ad I spotted on the Shanghai Metro:


The (altered) character is 森, meaning “forest.” The text below it reads:


In English:

If the trees all disappear, forests will turn into graves.

To understand the message, you have to know that the character 森, meaning “forest,” is made up of three 木, which each mean “tree.” And 木 does indeed look like a little cross when you take away the two diagonal strokes.

Part of what makes this interesting to me is that crosses are, of course, not a feature of Chinese graveyards at all. Here’s a picture of a Chinese cemetery:


Still, innovative ad that drives the point home. Well done.


Oct 2015

Starbucks Hates Chinese Learners

I’d say that the Chinese name of Starbucks’ new flat white coffee is adequate proof that Starbucks hates Chinese learners. (The other piece of proof is that Starbucks employees in China probably play the fiercest language power struggle game of any group I know.) Anyway, the Chinese name of the flat white is 馥芮白:


Yeah, don’t feel bad if you don’t know those first two characters. They’re not at all common. And that first character… wow.

A little more info about the two hard characters:

– 馥 (fù) fragrant. (The right half is the 复 you might know from 复旦大学.)
– 芮 (ruì) small / surname. (I am familiar with this one mainly because of the “Réel” mall (芮欧百货) near my office.)


So in this case, even if you’re trying hard to use Chinese as much as possible, I’d say don’t feel bad if you took one look at this Chinese name and opted to use English.


Oct 2015

A Graffiti Theory on Love

I feel like this message is not something you’d see in American graffiti:



It reads:

> 爱情最终目的是婚姻

> Àiqíng zuìzhōng mùdì shì hūnyīn

> The ultimate goal of love is marriage

Hmmmm, not hard to guess the story behind that one.

The same graffiti “artist” seems to have left this as well:


幸好 is a word meaning “fortunately”, but the final character ( on ?) appears to not exist? The character comes close.


Jul 2015

A “Home” for Escher in Chinese Design

Here’s a poster I spotted in a mall recently:


The character there is 家, meaning “house,” “home,” or sometimes even “family.”

The first thing I noticed was its Escher-like quality, updated to a modern aesthetic. (Reminded me of Monument Valley even more than Escher directly, actually.) Very cool, and not something I see much in China, for sure!

The second thing I noticed was that the stylized character on the poster is missing a few strokes. If this character is 家, then the bottom part is supposed to be 豕, which has 7 strokes. Instead, the bottom part looks more like the 5-stroke , minus the top stroke.

I found this odd, because this is a pretty big difference, and in my experience the Chinese don’t take character mutilation too lightly, especially when it’s not just private use. My wife’s response was just to shrug it off, though, with a, “yeah, but it’s still supposed to be 家.”

What do your Chinese friends think? Cool design, or heinous affront to the sanctity of the 10 strokes of the Chinese character 家?


Jun 2015

The Chinese Chinese Room

I’ve been listening to a series of lectures called Philosophy of Mind: Brains, Consciousness, and Thinking Machines. A lot of these concepts I’ve read about before, but it’s nice to have everything together in a coherent set. One of the topics covered refers to the Chinese Room, an anti-AI argument devised by philosopher John Searle:

Modus scribendi

> Searle writes in his first description of the argument: “Suppose that I’m locked in a room and … that I know no Chinese, either written or spoken”. He further supposes that he has a set of rules in English that “enable me to correlate one set of formal symbols with another set of formal symbols”, that is, the Chinese characters. These rules allow him to respond, in written Chinese, to questions, also written in Chinese, in such a way that the posers of the questions – who do understand Chinese – are convinced that Searle can actually understand the Chinese conversation too, even though he cannot. Similarly, he argues that if there is a computer program that allows a computer to carry on an intelligent conversation in a written language, the computer executing the program would not understand the conversation either.

> The experiment is the centerpiece of Searle’s Chinese room argument which holds that a program cannot give a computer a “mind”, “understanding” or “consciousness”, regardless of how intelligently it may make it behave. The argument is directed against the philosophical positions of functionalism and computationalism, which hold that the mind may be viewed as an information processing system operating on formal symbols.

So my first thought was: what do the Chinese call the Chinese room? The laowai room? Turns out the answer is a little more boring than that: in Chinese it’s called the 中文房间. (You might have been tempted to translate it as the 中国房间, but that doesn’t work as well, since the whole point of the argument is that there are these inscrutable symbols that can be manipulated. It’s Chinese characters that are key, not Chinese culture or nationality.)

The room is “Chinese,” of course, because of the inscrutable nature of Chinese characters. It boosts the argument if it feels like there’s no chance that the person in the room will actually learn them in the process of their Chinese room work.

I have to take umbrage at that: you can read Chinese. (If you want to.) I admit, though, that the Chinese room is not the best way to go about it.

There’s one other thought experiment mentioned which touches on China: the China brain. That one doesn’t seem to have a well-known translation (and no page on Wikipedia), but it seems at least some translators have gone with the straightforward 中国脑子. (This one refers to the people of China, and not the Chinese language itself.)


Jun 2015

An Interview with Outlier Ash

I’m very happy to report that the Outlier Dictionary of Chinese Characters I wrote about before has met its $75k funding goal. That means that this dictionary will soon be available through Pleco, so if you were holding out, doubtful it would actually happen, doubt no longer. Congratulations to the Outlier Linguistic Solutions team!

Ash Henson

This is an interview with Ash Henson, Outlier Linguistic Solutions’ main academic guy. Like some other people I’ve spoken with, I was a bit apprehensive about the project at first, feeling it was all way too academic and probably not a good resource for beginners. The more I talked with Ash, though, the more I was convinced this was not the case. I do believe this is going to be a great resource for learners at all levels, and I look forward to using it myself, both for my own purposes, and for my beginner-level clients.

Anyway, here are some additional questions I had about the dictionary, answered by Ash.

1. You have an article on the problem with the concept of “radicals.” Would it be fair to say that radicals are just an outdated concept which we don’t need anymore because we can look almost everything up by computer now? Is your dictionary going to include the concept of radicals at all?

Well, I’d say that radicals are only reliable as a tool to look up characters in traditional dictionaries. If you only use electronic or software dictionaries, then it’s safe to say that you can ignore them. We will actually point out the radical for each character though, so that you can look up the radical for that character if you need to look it up in a paper dictionary. The main issue with “radicals” is that there are really several unique concepts that are called “radicals”. For instance, you often hear people say “Characters are made of radicals.” While that is a reasonable conclusion to make from the name “radical”, it misrepresents how characters actually work. There are around 500 semantic components that appear in characters and a lot of them cannot be broken down into “radicals”.

2. You’ve mentioned before that the Outlier Character Dictionary will include the most up-to-date research, including even corrections of mistakes in the legendary 说文解字 (Shuowen Jiezi). Could you give a simple example or two of that?

This type of data can be found in the Expert Edition. I’ll share two examples from the demo. For 監 (jiān) “to inspect”, the 說文 says that it is composed of the semantic component 臥 (wò) “to rest” which is used to express the idea “to look down from above” and the sound component 䘓 (kàn) “thick animal blood” abridged to 血. The problem is, 監 is a character from the early Shang dynasty (roughly 1600 bce to 1046 bce), while 臥 and 䘓 don’t appear until Warring States (roughly 475 bce to 221 bce).


Image taken from the Outlier Dictionary of Chinese Characters

Obviously, either this interpretation is anachronistic or maybe 臥 and 䘓 did exist earlier and we just haven’t found any proof. However, if you look at the earliest extant forms of 監, it’s very obvious that it’s a picture of a person looking into a container that has liquid in it. This “picture” is used to represent the idea “to inspect, examine” as this was how the ancients inspected their own faces, i.e., they used water in a container as a mirror.

Another example is 黑 hēi “black”. The 說文 says that the top part is a window and the bottom part is flame (炎 yán) and gives the meaning of 黑 as “the color of something burnt”. Note that the 說文 is explaining the Small Seal script form. The earliest forms show a person with a tattooed face. This is one of the ancient Five Punishments, where the name of the crime a person committed was tattooed onto their face.

3. After all this time, how can researchers be certain about what are mistakes in the 说文解字 (Shuowen Jiezi)?

Basically by way of tracing characters back to their earliest extant forms and seeing how characters are used in earlier scripts. Like in the 監 (jiān) example above, the 說文 says that it’s composed of 臥 and an abbreviated 䘓, but 臥 and 䘓 show up around a thousand years after 監. It’s like explaining the 1066 war in terms of the soldiers’ cell phones. Keep in mind, the author of the 說文 was a very erudite scholar, with a very broad range of knowledge, but he was limited by the information he had access to and by pre-scientific thinking. The 說文 is best understood as an insight into how Han dynasty Confucian scholars looked at the Small Seal script. Even with its problems, it still plays a very important role in this type of research.

4. You’ve told me before that a proper understanding of characters can help a learner guess the correct pronunciation of a character. This is hard to imagine, since a lot of components have a wide range of possible functions and even multiple possible pronunciations. (Examples: 干、赶、汗、旱 or 今、含、零、领、邻) How can you solve this mess?

Sound components can be really frustrating, because they generally don’t give an exact sound. In the same way semantic components give a hint as to the range of meaning a character might have, sound components generally also just give a range of sounds. English speakers might not realize this, but English spelling is very similar. That’s why the exact same spelling “minute” can be pronounced MIN-it for “60 seconds” or mahy-NOOT for “extremely small”. Actually, this second one can also be pronounced mahy-NYOOT, mi-NOOT or mi-NYOOT. As you can see, the spelling “minute” does not give an exact pronunciation, but a range of possible pronunciations.

For native English speakers, this isn’t a huge problem, because for the most part, we go from words we already know how to say correctly to learning how to write them. During college we learn a lot of new, specialized words for the field of work we are training for. Most of these are learned either from reading or from hearing professors or other students use them. When I was in college, I often heard people say words incorrectly because they had only seen them in writing. This is a reflection of the fact that English spelling only gives a range of possible pronunciations rather than an exact, IPA-like pronunciation.

Making sense of sound patterns in Chinese characters is very useful, because they can be used to remember how to write characters. For instance, before I learned how sound works, whenever I had to write a character containing 艮 or 良, I would always ask myself, “Oh, man. Do I put that dot here or not?” It was very frustrating. Once I learned how sound components work, I looked up the pronunciation for 艮 (gèn) and 良 (liáng). Then I noticed that for characters pronounced “gen”, “hen”, or “ken”, it was 艮. If it was pronounced “lang”, “liang”, “nang” or “niang”, then it was 良. So, by learning about sound relations, I went from a meaningless dot-or-no-dot question, to a meaningful “What is the pronunciation of the character I want to write?” question. Though sound isn’t represented exactly in Chinese writing, there are a lot of clues we can use, especially if we know to look for them.
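
The dot-or-no-dot rule described above boils down to a small lookup. Here is a hedged Python sketch of it: the syllable lists are exactly the ones mentioned in the answer, while the function name and the choice to take toneless pinyin as input are assumptions made for this illustration. It is not a general tool for predicting components.

```python
# Toneless pinyin syllables associated with each component, as listed above.
GEN_SYLLABLES = {"gen", "hen", "ken"}
LIANG_SYLLABLES = {"lang", "liang", "nang", "niang"}

def dot_or_no_dot(syllable):
    """Suggest 艮 or 良 from a toneless pinyin syllable, or None if neither fits."""
    s = syllable.lower()
    if s in GEN_SYLLABLES:
        return "艮"
    if s in LIANG_SYLLABLES:
        return "良"
    return None  # the rule says nothing about other syllables

for syllable in ["hen", "niang", "ma"]:
    print(syllable, dot_or_no_dot(syllable))
```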

Now to the examples you brought up: 干、赶、汗、旱 or 今、含、零、领、邻

Let’s look at 干 (gān), 赶 (gǎn), 汗 (hàn), and 旱 (hàn) first. Notice that they all have the ending “-an” and that they all share the component 干. This is a strong clue that there is a sound relation. Also note that there is no discernible pattern with the tones. That’s because tones generally are not taken into account. Native speakers would generally use “-an” as the sound clue. However, it’s very useful to remember that “g-”, “k-” and “h-” are very closely related sounds.

As for 零 (líng), 领 (lǐng), and 邻 (lín), notice that 令 is pronounced “lìng.” Once again, tones don’t count (not to say they aren’t important! They just aren’t represented by the sound component). Lastly, notice that the sound for 邻 ends in “-n” and not in “-ng.” In this particular case, that’s due to the simplification of 鄰 to 邻, and 粦 is pronounced “lín.”

Finally, looking at 今 (jīn) and 含 (hán), we notice that 今 and 令 above are graphically very similar, but like the 艮 (gèn) and 良 (liáng) example, we can use sound to keep 今 (jīn) and 令 (lìng) separate. Using sound patterns to understand the relation between 今 (jīn) and 含 (hán) is a little more complex. You have to understand both that “g-”, “k-” and “h-” are closely related as previously mentioned and that many “j-”, “q-”, and “x-” come from an earlier “g-”, “k-” and “h-”. In other words, two groups of closely related sounds are also somewhat related.
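
The "ignore the tone, compare the rest" idea from the examples above can be checked mechanically. The following Python sketch strips tone marks from pinyin and splits each syllable into an initial and a final, then summarizes two of the series discussed. The readings are the ones given in the answer. The helper names, the simplified list of initials, and the grouping of 邻 under 令 (which only holds for the simplified form) are assumptions of this illustration.

```python
import unicodedata

# Combining macron, acute, caron, grave: the four pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}
# Simplified list of pinyin initials; two-letter initials come first.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def strip_tones(pinyin):
    """Remove tone marks but keep other diacritics (such as the dots on ü)."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

def split_syllable(pinyin):
    """Split a pinyin syllable into (initial, final), ignoring tone."""
    s = strip_tones(pinyin).lower()
    for initial in INITIALS:
        if s.startswith(initial):
            return initial, s[len(initial):]
    return "", s

SERIES = {
    "干": ["gān", "gǎn", "hàn", "hàn"],    # 干, 赶, 汗, 旱
    "令": ["lìng", "líng", "lǐng", "lín"],  # 令, 零, 领, 邻
}

for component, readings in SERIES.items():
    parts = [split_syllable(r) for r in readings]
    initials = sorted({i for i, _ in parts})
    finals = sorted({f for _, f in parts})
    print(component, "initials:", initials, "finals:", finals)
```

For the 干 series, the finals collapse to a single value while the initials stay within the g/k/h group, which is the pattern described above. The 令 series shows the "-n" versus "-ng" wrinkle.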

Why do sound series have this kind of variation? The answer to this question is fascinating, but complex. Most characters in use today find their origins thousands of years ago during the Zhou dynasty. Back then, the language was very different and very possibly had prefixes and suffixes, and it was these prefixes and suffixes which caused this variation. Another reason is regular sound change over the last several thousand years.

5. Your dictionary is designed to provide a wealth of modern character research into characters through a modern interface. How would this be used by a beginner who sees characters as an annoying hurdle?

The key to optimal learning is obtaining the ability to use the system of Chinese characters as a tool for being able to recall character forms after long periods of time and as a tool for making intelligent guesses about characters you haven’t learned yet. Native speakers have these abilities, but they are far from perfect and they are the results of years of input. Non-native speakers learning Chinese can also get them after learning a few thousand characters.

However, as you can imagine, their instincts about characters are probably not as good as a native speaker’s. The main advantage of using our methods is that you can gain these abilities after a few hundred characters, because all of the sound and meaning connections are being pointed out explicitly for each character. And, as I showed above, if you learn our sound patterns, your feel for sound representation will be better than a native speaker’s. We also explain meaning connections in a more precise way, so your feeling for meaning representation will also be more accurate.

To those who think of characters as a nuisance, if you learn them our way, you’ll learn in a way that is both more meaningful (and therefore you’ll likely find it more interesting) and more effective, so you’ll spend less time re-learning characters. We can’t remove the pain entirely, but we can minimize it!

As of today, the Outlier Dictionary of Chinese Characters Kickstarter is still going.


May 2015

4 Reasons I Want the Outlier Dictionary of Chinese Characters

There’s a new Kickstarter project related to learning Chinese definitely worthy of more attention: the Outlier Dictionary of Chinese Characters. I’ve had the pleasure of multiple Skype calls with John and Ash of Outlier Linguistic Solutions, and this project is no joke. They’re out to build something I’ve wished has existed for quite a while, and they’ve got the skills and dedication to make it happen.

The Kickstarter page is packed with explanation, so I won’t rehash the same information you can check out on your own. But I will tell you what’s interesting about this project to me.

  1. It integrates with Pleco. Pleco is already my favorite dictionary, largely because it contains so many different dictionaries. It would be annoying if the Outlier Dictionary were a separate app, and building an app from scratch is a huge drain on resources. So I think this was a smart way to launch the dictionary.
  2. The Outlier founders are learners turned experts (check out this profile). Sure, no one knows Chinese better than the Chinese, but the perspective of a foreigner who has the passion to devote years and years of his life to it is hugely valuable. They have put a lot of thought into the difference between how native speakers learn Chinese and how foreigners learn Chinese, they’ve deconstructed the process, and they’ve come up with a better way for foreigners to learn characters. We learners need this!
  3. The dictionary is academically rigorous. Unlike most dictionaries, it doesn’t hold the legendary 说文解字 (Shuowen Jiezi) as the ultimate infallible reference. In fact, research into mistakes made by the Shuowen is part of the dictionary. This is amazing!
  4. The approach taken to Chinese character structure is new and necessary. I’ve complained about certain products claiming that radicals are a revolutionary way to learn characters. They’re not. In fact, the term “radical” itself is outmoded and confusing, because it’s tied to outdated dead-tree character dictionaries. So the Outlier Dictionary rightly ditches the term “radical” in favor of “functional component,” and it doesn’t stop there. Check out this breakdown:

Outlier Functional Components

OK, but is it too geeky?

One of the concerns I expressed to the Outlier team was that they were building a dictionary for academics that didn’t really serve the practical needs of the average learner. They fervently assured me this was not the case; they are building a dictionary that enables a strong understanding of the system of functional components behind characters, while also enabling curious learners to go as deep as they want in their character studies. This is exactly how it should be done, so I can’t wait to get my hands on this dictionary. I also plan to keep working with the Outlier team and deepen my involvement in their project. I know that clients of AllSet Learning could really use what Outlier is developing.

I’m embedding a demo video at the bottom, but there is a ton of information on the Kickstarter page, so check it out!

Outlier Linguistic Solutions — Demo Walkthrough from Outlier Linguistic Solutions on Vimeo.


Jul 2014

Abstracted Characters

Stylized letters and characters are interesting to me, but how abstract can you get with Chinese characters? You kinda have to retain the strokes and radicals and stuff, right? Maybe not…

Abstract Characters

The characters represented above are 小燕画院.

Although the name is readable, it might take a bit longer to decipher than most Chinese text, even for native speakers. Have you ever spotted characters that have been taken even further into the abstract?


Jun 2014

Spring 2014 Characterplay

It’s no secret that I enjoy seeing Chinese characters with some kind of visual design twist (I sometimes call it “characterplay”), and I’m getting more and more friends and readers sharing photos with me. Keep them coming!


Here’s one shared by Matt Scranton:


So the character that means “hair” is 髮 in traditional Chinese, 发 in simplified.

Here are some others I found myself around Shanghai:




This one took me a few seconds to figure out:

Notesy Name


A play about suddenly getting rich.



A play on the phrase 霸气十足, which means something like “totally dominating.” Changing the oppressive “霸气” to “爸气” (which isn’t a real word) makes it seem friendlier, and appropriate for a Father’s Day promotion.

Father's Day

Also, you have the little hearts in and .


Apr 2014

Can Project Naptha Read Chinese Text in Images?

Yesterday Project Naptha hit Hacker News. It offers a way to extract electronic text from image files through a simple Chrome browser extension. Excited to see that both simplified and traditional Chinese are supported, I immediately installed the extension and tried it out.

The results? Unfortunately, not so great.

When it doesn’t work at all

First of all, the script needs to recognize the text in the image. This first step doesn’t always go too well, even if the text seems relatively clear to the human eye. Let’s look at some cases where the extension found nothing, despite the Chinese text being pretty legible.

In this first case, the font is non-standard. OK, fair enough. That’s to be expected.

Testing Project Naptha with Chinese

In this next case, the text is pretty clear, but the contrast is poor.

Testing Project Naptha with Chinese

In this final example, the text is fairly clear to the human eye, but also low-res and slanted. That probably makes it difficult for the algorithm.

Testing Project Naptha with Chinese

When it sort of works

In many other cases, some text was identified, but not enough for the extension to be really useful for anything. Here are some images where Project Naptha could identify some text, and the “select all text” function was applied. (The blue boxes show what Project Naptha identified in the images as “text.” Sometimes they are bizarrely incorrect.)

Some examples:

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

Testing Project Naptha with Chinese

I found the last two quite surprising, considering how clear, straightforward, and high-res the text is.

When it actually works

Sometimes it was relatively successful in identifying the text. In these cases you must first set the language to Chinese (either simplified or traditional, depending on the text). There’s a cool effect showing you that some processing is going on. When that’s done, you can copy and paste the text.

Testing Project Naptha with Chinese

But… it might not be exactly what you were hoping for.

This selected Chinese text yielded the following copy-paste results:

Testing Project Naptha with Chinese

> 总统亲 ã热fl地接

> \早、待了葫芦兄妹

If it had correctly captured all the text, it would have been:

> 10、总统亲自热情地接

> 待了葫芦兄妹

This one is better:

Testing Project Naptha with Chinese

> 雹电二怪对兄妹俩尽效使用现代

> 化武器况妹俩也不示弱 麝芦神功连

> 连使出 胭宙电二怪打入深深的山沟

It should have been:

> 355、雷电二怪对兄妹俩尽使用现代

> 化武器况妹俩也不示弱,葫芦神功连

> 连使出,把雷电二怪打入深深的山沟

Also, my sample size is too small to draw any definite conclusions, but it seems like the extension works better for simplified characters than for traditional.


I don’t mean to sound overly critical. This is amazing technology, and the fact that it launched with any support for Chinese characters at all is pretty awesome (and brave)! I’m sure the technology will improve with time, and that is going to be tremendously helpful to Chinese learners.

To put this in perspective, the development of OCR (optical character recognition) for mobile devices meant that you could point your cell phone’s camera at any characters you see, and get feedback on what the characters say (sometimes). Project Naptha does the same thing, but for your home browsing experience. For me, that’s where I do a lot more of my Chinese reading, so it’s even more important. Once this technology is perfected, as long as you have a tool to help you read electronic Chinese text, you’re all set!

Personally, I think this is especially great news for comics. It’s no coincidence that I tested this extension out on comic book text. I’m really looking forward to seeing how this extension develops.


Apr 2014

Amazing Stencil-like Chinese Calligraphy Written in Chalk

OK, this would be amazing just for the fact that the following characters were written by hand, in chalk:

Amazing Chinese Calligraphy in Chalk

(I wouldn’t have believed that these weren’t somehow stenciled on if I hadn’t seen the man writing the characters with my own eyes.)

…but how about the fact that they’re also written upside down, by a farmer with no legs?

Amazing Chinese Calligraphy in Chalk

When I went by this spot an hour later after dinner, the chalk characters had all been rubbed out.


Apr 2014

A Realistic Look at the Challenges of Reading Chinese

The following is a guest article written by a Sinosplice reader, Julian Suddaby. I have followed it with some commentary of my own.

Warning: if you’re a member of the “Chinese is super easy” faction, this article might annoy you a little, but be sure to read through to the end!

How Many Characters?

by Julian Suddaby, 2014-02-13


I asked Google “how many chinese characters do I need to learn” and the best sites I found pointed to linguist Jun Da’s website and used his data to argue that 3,500 characters should be enough for most people, since you’ll know around 99.5% of the characters in general circulation. [1] Is that really enough?

Well, if you’ve got to that point, congratulations. It’s an achievement. But you may not want to stop accumulating characters just yet. Indeed, sad to say, at 3,500 you won’t even be able to read Jun Da’s name, since 笪 is way down at frequency #5,231. [2] So how many, then, do you need to learn? Well, that depends on one question that you should ask yourself: what exactly do you want to read?

A Newspaper

Students often want to read Chinese newspapers. The Southern Weekly 南方周末 being a popular choice, I took the ten most popular articles over the previous thirty days and ran them through a computer program that checked them against Jun Da’s most frequent 3,500 characters. The results are fairly encouraging for the Chinese student, I think: if you knew the 3,500 you’d only encounter forty-four new characters over the course of those ten articles, and twenty-nine of those you’d only see once and so would probably just take a guess at from context and move on. But you’d possibly want to look up 甄, a pseudonymous surname given to the subject of one of the articles (and thus appearing thirty-five times); 闰, used in the name of a Zhejiang corporation which appears to have buried five hundred tons of poisonous chemicals in their backyard (seven appearances); and 驿, used in the name of a company involved in an online security breach (also seven appearances). [3]

So, while you probably shouldn’t throw out your dictionary just yet, it does seem that trying to read a newspaper won’t be a disheartening experience.
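The kind of check described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the program used for this article: the names `is_cjk` and `unknown_characters` are hypothetical, and the tiny `known` set is a stand-in for a real list of the 3,500 most frequent characters, which you would load from a frequency list yourself. It counts every Chinese character in a text that falls outside the known set.

```python
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Return True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def unknown_characters(text: str, known: set) -> Counter:
    """Count each Chinese character in `text` that is not in `known`."""
    return Counter(ch for ch in text if is_cjk(ch) and ch not in known)


# Hypothetical stand-in for a real list of the 3,500 most frequent characters.
known = set("我是的一个学生")
sample = "甄是一个姓。我姓甄,我是甄的学生。"
unseen = unknown_characters(sample, known)

# List the unknown characters, most frequent first.
for ch, count in unseen.most_common():
    print(ch, count)
```

With this toy input, 甄 is listed first with a count of 3, followed by 姓 with a count of 2. Punctuation and non-Chinese text are ignored, so only characters you might actually need to look up are reported.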

A Children’s Book

Children’s novels are another popular choice of reading material for language students. Shen Shixi is a well-regarded children’s novelist, whose Jackal and Wolf has recently been translated into English by Helen Wang. I ran an analysis on another of Shen’s novels, 《鸟奴》(lit. “Bird Slave”). This is, character-wise, much more difficult than the newspaper articles, with two hundred and one characters not in the top 3,500. Ninety of those are used more than once. As you’d expect from Shen, the “king of animal fiction”, animal-related vocabulary is one particular problem here, and you’ll probably end up very confused if you don’t look up 鹩, used two hundred and eighty-four times; 喙, used thirty-six times; and 獾, used twenty-two times. [4]

The novel is about two hundred and forty pages long, and so you should expect to find a character you don’t recognize on most pages.

A wuxia novel

Jin Yong’s novels remain firm favorites. Rather than starting with the four volumes and 1,300 pages of The Legend of the Condor Heroes 《射雕英雄传》, students might perhaps try A Deadly Secret 《连城诀》, which is just four hundred pages or so. In those four hundred pages you’ll encounter two hundred and ninety-six characters not in the top 3,500. The most frequently used are from the protagonists’ names (水笙, 水岱, and 万圭), but there are plenty of new common nouns and verbs used multiple times as well. [5]

On a page-by-page basis, you should recognize more characters than in the Shen Shixi novel above. In terms of total characters, however, A Deadly Secret is more of a challenge.

A modern classic

Lu Xun’s A Call to Arms 《呐喊》, despite collecting stories he wrote at a very early stage of modern Chinese literary vernacularization, should not be much more difficult than the two novels above—at least in terms of basic character recognition. Two hundred and thirty unseen characters in total, with 闰 (remember that one from above?), 珂 (used in a name) and 锵 (a sound) taking the top three spots. [6]


Even from this very cursory analysis, it appears that if your goal is to read Chinese fiction comfortably without a dictionary, you’re going to need to recognize more than 3,500 characters. Chinese writers use characters well into the four or five thousand frequency range very regularly.

So although reaching 3,500 is worth celebrating, I wouldn’t stop trying to acquire characters just yet. Keep reading and dictionary-checking, and don’t abandon memorizing/spaced repetition if that’s something you find helpful. [7] You’ll still be coming across new characters for a long, long time…. [8]

  1. See http://lingua.mtsu.edu/chinese-computing/statistics/index.html.
  2. 笪 Dá (a surname here, but means “a coarse mat of rushes or bamboo”, with 旦 dàn providing the phonetic). Here and later I’m using Wenlin as my main reference for character glosses.
  3. 甄 Zhēn (a surname here, but originally meaning “to make pottery” and thus composed of 垔 and 瓦, but with no phonetic clue), 闰 rùn (used in a name here, but means “intercalary”; the much more common 润 shares the same pronunciation), 驿 yì (used together with 站 to mean “post/courier station”; right-hand side is the phonetic, as in 译).
  4. 鹩 liáo (“wren”, with the left-hand side providing the phonetic), 喙 huì (“snout; mouth; beak”, with both 口 and 彖 radicals semantic; no phonetic clue), 獾 huān (“badger”, with the right-hand side phonetic).
  5. 笙 shēng (“reed-pipe instrument”, bottom is the phonetic), 岱 Dài (“Taishan mountain”, top is the phonetic), 圭 guī (“jade tablet”, cf. 挂 or 桂 for the pronunciation).
  6. 珂 kē (“a jade-like stone”, right-hand side is the phonetic), 锵 qiāng (“clang”, right-hand side is the phonetic).
  7. For the more technologically-oriented student, another option may be available: thanks to the increasing availability of texts in machine-readable formats students could run their own frequency analysis on a text they wanted to read and pre-learn characters they don’t already know. It’s a pity there don’t seem to be any easy-to-use programs or websites that offer this functionality.
  8. It should also be noted that single character recognition is only part of reading Chinese, and is not on its own a good measure of reading proficiency. That said, the relative ease of measuring character recognition and frequency may justify its limited use as a self-diagnostic and motivational tool for learners of Chinese.

The following is my response:

> Interesting! This sort of helps make a case for the importance of graded readers. (Have you seen Mandarin Companion?)

> While I know your intent is to SEEK THE TRUTH, the overall tone of the article is, unfortunately, a little discouraging for struggling learners. For me, this totally highlights the need for materials that give the learner a sense of accomplishment for having reached 300, 500, 1000 characters, rather than an incessant message saying, “STILL NOT GOOD ENOUGH.”

His response:

> You’re quite right, I suppose I am a little too rigidly 实事求是 in the piece! I completely agree with you about the need to avoid the demotivating “still not good enough” feeling and message that permeates most Chinese teaching materials (how I remember my exasperation when the 高级 textbook still required fifty plus new vocabulary items per short text!). There’s really a huge need for more good reading materials with limited character/vocabulary ranges, and your graded readers look fantastic.