Oh, and by the way, in this instance, Google Wave wins, 65% to 35%, making it part of an exclusive club of things harder to understand than Google Wave, which also includes “women, Scientology, the United States Tax code, Chinese telegraph code, Microsoft Visio 2004, and Obama’s Nobel Peace Prize.”
My recent post on the Wikimedia Commons Stroke Order Project prompted Mark of Toshuo.com to decry the relative dearth of traditional characters being added to the project. To this, David on Formosa reminded Mark that there are also a large number of characters shared by the traditional and simplified character sets.
At this point I’ll interject a visual aid (gotta love them Venn diagrams!):
All this got me thinking about the following question: If “s” represents the characters in the simplified set not shared with the traditional set, while “t” represents the characters in the traditional set not shared with the simplified set, and “u” represents the characters shared by the two sets, then how many characters belong to groups s, t, and u, respectively?
It seems like a simple enough question, but it’s actually quite tricky for a number of reasons.
First, the total number of Chinese characters in existence varies according to source, and largely depends on how many non-standard variants you want to include in your total set. You can be reasonably certain the total number is less than 50,000, but that’s still a pretty ridiculously large number, when most Chinese people regularly use fewer than 5,000. For basic purposes of comparison, it makes sense to limit your set to a certain number of commonly used characters, but which set? One from the PRC? From Taiwan? From Hong Kong? From Unicode?
Second, you might be tempted to think that s = t, because simplified characters were “simplified from” traditional characters. This isn’t true, however, because in many cases multiple traditional forms were conflated into one simplified form. To give a very common example, traditional characters 干, 幹, and 乾 are all written 干 in simplified. So adding these three characters adds 1 to u, 2 to t, and 0 to s. There are lots of similar cases, so clearly t is going to be significantly larger than s. But by how many characters?
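To make the bookkeeping concrete, here is a minimal sketch of how s, t, and u could be counted once you have settled on a character list. It assumes a hypothetical traditional-to-simplified mapping (the `trad_to_simp` dictionary below is toy data made up for illustration, not a real character set), and the function name `count_s_t_u` is my own.

```python
def count_s_t_u(trad_to_simp):
    """Count characters unique to simplified (s), unique to traditional (t),
    and shared by both sets (u), given a traditional -> simplified mapping."""
    traditional = set(trad_to_simp)           # every traditional form
    simplified = set(trad_to_simp.values())   # every simplified form
    s = simplified - traditional   # only in the simplified set
    t = traditional - simplified   # only in the traditional set
    u = traditional & simplified   # written the same in both sets
    return len(s), len(t), len(u)


# The example from the text: three traditional forms, one simplified form.
gan_only = {"干": "干", "幹": "干", "乾": "干"}
print(count_s_t_u(gan_only))  # (0, 2, 1): 0 added to s, 2 to t, 1 to u

# Toy data only: one one-to-one simplification and one unchanged character added.
trad_to_simp = {"干": "干", "幹": "干", "乾": "干", "國": "国", "人": "人"}
print(count_s_t_u(trad_to_simp))  # (1, 3, 2)
```

Every many-to-one conflation pushes t further ahead of s, which is exactly why s = t can't hold. The hard part remains choosing the character list and the mapping, not the counting.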
I’d be very interested to see a concrete answer to this question, regardless of the character limit used. I also wonder how the proportions of s, t, and u vary as the character limit is increased, and more and more low-frequency characters are included.
If you’ve got an answer, I’d love to hear from you!
If you’ve checked out many online Chinese dictionaries or websites on learning Chinese, you’ve seen a variety of ways to present characters’ proper stroke order. Animated GIFs are a favorite, but they often fall flat in one important respect: they display each stroke in a single frame, often leaving the direction of the stroke somewhat unclear.
This is where the Wikimedia Commons Stroke Order Project impresses me: not only are the animated GIFs large and attractive, but they fluidly demonstrate the direction of each stroke. A nice example:
> Hello, and welcome to the Commons Stroke Order Project. This project aims to create a complete set of high quality and free illustrations to clearly show the stroke order of East Asian characters (hanzi, kanji, kana, hantu, and hanja). The project was started as there was none like it in terms of quality and it seems that it is the only one working on all three schools of Han character stroke order; simplified and traditional Chinese, and Japanese.
> You are free to use the graphics we’ve made and welcomed to join us and contribute to our progress. It’s easy, you just have to follow the simple steps stated in our graphics guidelines.
At 378 total characters, the project is still far from a complete set, but it’s off to a nice start!
I do wonder, though, what kind of stroke order information is freely available out there that could speed the process along. I’ve seen enough separate sets of animated characters to make me suspect many have been automatically generated. (Anyone have info on this?) I’m also curious how the project is going to deal with the annoying issue of variable stroke orders.
> I hope that my system gives a context, even for non-visual learners, for distinguishing between the four tones in Mandarin and providing a mnemonic system to help them remember which tone goes with a particular word.
From the moment I first heard of this idea, I was intrigued by it. Associating tones with colors does open up a lot of possibilities. Once the system is internalized, you can drop tone marks and tone numbers altogether, and you can tone-code the Chinese characters themselves using color. (The best non-color approximation to this would be writing the tone marks above the characters, which you will find in some textbooks and programs.) So I was very receptive to this idea.
Despite being very open to the concept, when I saw the actual colors chosen to represent each tone, they just felt wrong to me. The pairings Dummitt chose were:
Why would these colors feel wrong to me? How could the tone-color associations be anything but arbitrary?
The reason that the colors felt wrong to me was that I had already thought about the relationships between the tones and my own perceptions of those tones. I had even (briefly) considered color when I sketched my “Perceptual Tone Contours” idea:
Specifically, I felt that first and fourth tone feel similar, and that second and third tone feel similar. I believe that perceived similarity is strong enough that it affects both listening comprehension and production. This is why I purposely colored first and fourth tone red in my diagram, and second and third tone blue.
An Alternate Color Scheme
OK, so now we’re getting down to the point of my post. As a thought exercise I asked myself: If I had to assign colors to the four tones, which colors would I use?
In answering this question, one has to believe that there are underlying principles which, when followed, might produce better results. Otherwise, arbitrary assignment is fine. So what are the principles? I have two:
1. The colors need to have a high degree of contrast so that they will stand out on a white background and not be confused with each other.
2. The colors chosen need to reflect the appropriate perceptual similarities.
There are other considerations you might take into account if you want to be super-thorough, of course. From an Amazon reviewer of Dummitt’s book:
> If a person was going to design a color code tone system they would probably want to avoid using red and green in the same color scheme. Red – green color blindness causes an inability to discriminate differences in red and green. Hence the testing when you get your driver’s license. 5 to 8 percent of males have this color blindness.
> Using red and orange in the same scheme is also not very bright. Much language learning is done on buses, trains, planes and their attendant stations. Lighting is sub-optimal in all these situations and much worse in China. Low light intensity impairs the ability to discriminate red from orange.
These points have some merit, I suppose, but I’m not sure what colors they leave. I’m sticking to the two principles I listed above. I don’t see how you’re going to avoid either red or orange altogether if you need easily distinguishable, high-contrast colors.
Regarding the principle of high contrast, I can’t disagree with Dummitt’s choices. You can’t choose yellow, and the ones he chose are easy to distinguish quickly.
As for perceptual similarities, I would reflect these similarities by grouping the four tones into two warm and two cool colors. In my Chinese studies over the years, I have often associated fourth tone with aggression or anger, both concepts which I would associate with the color red. Red = fourth tone is the strongest association I have, but from there, all the others fall into place. You can’t use yellow (poor contrast), so orange is your other warm color, going to first tone. My diagram has fourth tone and second tone diametrically opposed (falling versus rising), and green is directly opposite red on the color wheel, so I would go with green for second tone. That makes third tone blue.
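For anyone who wants to try this scheme on screen, here is a rough sketch of tone-coding characters with color in HTML. The pairings come from the paragraph above (first tone orange, second green, third blue, fourth red); everything else is an assumption of mine: the function name `tone_color_html`, the use of plain CSS color names, and the choice to leave neutral-tone syllables uncolored, which the scheme above doesn't address.

```python
import html

# Pairings from the scheme above; the CSS color names are an assumption.
TONE_COLORS = {1: "orange", 2: "green", 3: "blue", 4: "red"}


def tone_color_html(syllables):
    """Wrap each character in a colored <span> according to its tone.

    `syllables` is a list of (character, tone) pairs. Tones outside 1-4
    (e.g. 5 for the neutral tone) are left uncolored.
    """
    parts = []
    for char, tone in syllables:
        safe = html.escape(char)
        color = TONE_COLORS.get(tone)
        if color is None:
            parts.append(safe)
        else:
            parts.append(f'<span style="color: {color}">{safe}</span>')
    return "".join(parts)


print(tone_color_html([("中", 1), ("文", 2)]))
```

The tone numbers have to come from somewhere (a dictionary lookup or hand annotation); this sketch only handles the coloring step.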
The comic says that the character 明 actually derives, not from 日 and 月 as is commonly taught, but from 囧 and 月. This etymology seems to confirm it. So one of the earliest character etymologies we learn (sun + moon = bright) is either a lie, or actually just a bit more ambiguous than we were led to believe? Interesting!
“Reduplication, in linguistics, is a morphological process by which the root or stem of a word, or part of it, is repeated” (Wikipedia). You see reduplication in Chinese a lot, with verbs (看看, 试试), nouns (妈妈, 狗狗), and even adjectives (红红的, 漂漂亮亮).
You get reduplication in Japanese too (some of the coolest examples are mimetic), in words such as 時々 or 様々. As you can see, rather than writing the character twice, the Japanese use a cool little iteration mark: 々. Now if the Japanese learned to write from the Chinese, why don’t the Chinese use the same iteration mark?
According to Wikipedia, the Chinese sometimes use 々, but you don’t see it in print. This is true; what the Chinese use (only when writing shorthand) actually looks something like ㄣ. Ostensibly, because you never see 々 in print in China (or it never even existed in neat, printed form), it comes out a bit sloppily as ㄣ in Chinese handwritten form.
I recently read a cutesy Taiwanese comic called 兔出没，注意！！！ Rabbits Caution about the lives of two rabbits named 呵呵 and 可乐 and their owners. In the comic, the author took a rather “mathematical” approach to reduplication. Look for 宝宝 and 玩玩 in this one:
Look for 看看 and 谢谢 in this one (and don’t be confused by the 回 in 回家):
In this frame, even “bye-bye” gets the treatment:
While cute, I figured this representation of reduplication was not likely original. I was quite surprised, however, to see an almost identical representation on Wikipedia dating back to 900 B.C.! The quote:
The bronzeware script on the bronze pot of the Zhou Dynasty, shown right, ends with “子二孫二寶用”, where the small 二 (two) is used as iteration marks to mean “子子孫孫寶用”.
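The rule behind an iteration mark is simple enough to sketch in a few lines: each mark stands for a repeat of the character just before it. The function below is my own illustration (the name `expand_iteration_marks` is made up), and it takes the mark as a parameter because 二 is also an ordinary character meaning “two,” so it should only be treated as a mark when you know that’s how it is being used.

```python
def expand_iteration_marks(text, mark="々"):
    """Replace each iteration mark with a copy of the preceding character."""
    out = []
    for ch in text:
        if ch == mark and out:
            out.append(out[-1])   # repeat the previous character
        else:
            out.append(ch)        # ordinary character (or a mark with nothing before it)
    return "".join(out)


print(expand_iteration_marks("時々"))                    # 時時
print(expand_iteration_marks("子二孫二寶用", mark="二"))  # 子子孫孫寶用
```

This only covers the one-character repeat seen in the examples here.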
Well, as they say, there’s nothing new under the sun, and history repeats itself. The weird thing is that 2 and 々 even sort of look alike, in the way that 々 and ㄣ do. 2 is 々 without the first stroke, and ㄣ is 々 without the last stroke. Meanwhile, the ancient Chinese iteration mark 二 bears a striking resemblance to the “ditto mark” used in modern English! (I’ll leave those for the orthographical conspiracy theorists among you to chew on.)
I had a great time interacting with other teachers at ACTFL 2008. Yes, what we do at Praxis Language is quite different from what the teachers in the trenches do, but it’s important to connect with them, to hear about how the classroom is changing, how the students are changing, and maybe even about how we might converge in some areas.
I sat in on some particularly interesting talks on CFL (Chinese as a Foreign Language). Only half a year after I finished my own thesis, I felt I really needed to be reminded of the wide world of academic pursuits… some of the research was quite fascinating. I’m planning to revisit some of the topics here in my blog in the next few weeks.
In the meantime, I’d just like to draw my readers’ attention to a cool product I ran into at ACTFL: Skritter [China-friendly link]. It’s a really well-executed online system for practicing character writing, and it has built-in support for Integrated Chinese. Check it out.
I started learning Japanese in 1996. When I began learning Mandarin in 1998, I already had a foundation in Chinese characters, thanks to my Japanese studies. Learning the two languages at the same time, I was frequently annoyed by little discrepancies such as 歩 and 步, 別 and 别, 氷 and 冰, etc. Those little character details caught my attention, though. I ended up writing my senior thesis on how and why the Chinese characters of the Chinese and Japanese writing systems ended up diverging.
One little detail that always nagged at me, though, was stroke order. The truth is, stroke order of Chinese characters is not consistent across Japanese and Chinese. I was reminded of this recently by Tae Kim’s blog entry entitled, What’s the stroke order of 【龜】? Who cares? He brought up the stroke order of the character 必 as an example of a “weird character.” This character just happens to be one of the ones whose correct stroke order has been ever so slightly bugging me all these years.
必 is a great example, because it shows up in plenty of relatively simple words in both languages, like 必要 (necessary) and 必须 (must) in Chinese, and 必ず (without fail) and 必要 (necessary) in Japanese.
Now let’s take a look at the stroke order of this simple character. I’ll have to assign letters to each stroke so that we can keep the different stroke orders straight:
– Ocrat, MDBG, and Wenlin all say A-B-C-D-E.
– Learn to Write Characters (click on 必), maintained by Dr. Tim Xie, says A-B-C-E-D.
– A-B-C-E-D makes a lot of sense to me, because the character’s radical is 心 (but that doesn’t necessarily matter at all).
– Remember that Chinese has the added excitement of the simplified/traditional divide, as well as other regional differences in the mainland, Taiwan, and Hong Kong.
– If you have more to add to this (especially from more authoritative sources), please leave a comment!
– WWWJDIC, Kawatsu, Kodansha, and Gakken all agree on the bizarre C-D-B-A-E.
– It’s almost as if they’re writing 义 first, then adding “wings,” but no, the radical here is 心 as well. (We can see why Tae calls it weird.)
Hmmm, that’s a lot of inconsistency. Gives you more respect for the people that can create good Chinese handwriting recognition software, doesn’t it?
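To tally the disagreement, here is a small sketch that groups the sources listed above by the stroke order they give for 必. The stroke letters and source names are taken from the lists above; the dictionary layout and the function name `group_by_order` are just my own way of organizing them.

```python
from collections import defaultdict

# Stroke orders for 必 as reported in the lists above (letters label the strokes).
STROKE_ORDERS = {
    "Ocrat": "ABCDE",
    "MDBG": "ABCDE",
    "Wenlin": "ABCDE",
    "Learn to Write Characters": "ABCED",
    "WWWJDIC": "CDBAE",
    "Kawatsu": "CDBAE",
    "Kodansha": "CDBAE",
    "Gakken": "CDBAE",
}


def group_by_order(orders):
    """Group source names by the stroke order they report."""
    groups = defaultdict(list)
    for source, order in orders.items():
        groups[order].append(source)
    return dict(groups)


for order, sources in group_by_order(STROKE_ORDERS).items():
    print("-".join(order), "<-", ", ".join(sources))
```

Eight sources, three different orders for a five-stroke character.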
But wait! It doesn’t end there. An even simpler character — 出 — behaves inconsistently as well. I’ll spare you all the details and jump to a diagram taken from a very interesting tool I found illustrating various stroke order differences:
Note that aside from the incredibly common 出, the heart radical 忄 — a component of tons of very common characters — is also among the ambiguously stroke-ordered. Notice too that the Japanese-only variants are not included in this list.
So what’s my point? Well, it’s not any of the following:
– Chinese is really hard
– Chinese characters are really complex
– Chinese characters are hard to learn
– Chinese character stroke order is fun!
Chinese is not semi-mystical. Chinese characters were created by people a really long time ago, and thus they make up an amazingly imperfect, inconsistent system. East Asian brains aren’t semi-mystical either; with all these differences going on you can bet that the Chinese and Japanese get mixed up too. In fact, armed with the chart above you’ll find it really easy to spark debates with very literate Chinese over the “correct stroke order.”
Like me, you may be bugged by these inconsistencies. You may feel compelled to seek out some underlying pattern or just memorize a big list of exceptions. Don’t do it! Be satisfied with a quick look over the chart above. Just get the non-exceptional stroke order basics down and you’ll be fine, trust me. Don’t obsess over perfect stroke order and all the exceptions, because it’s an imperfect system. The deck is stacked against you. Learn to read and use characters to communicate, and you win.
Thinking about it now, I find it strange that I’ve never written about James W. Heisig and his landmark work, Remembering the Kanji.
It was in 1997 while I was studying in Japan that I came across the book. I was still in this “I must write every new character a million times every day” frame of mind until I came upon this system, and after discovering it I abandoned the traditional approach forever. The book ignited my imagination and unleashed its energy on Chinese characters. Heisig’s system ensnared me immediately, but surprisingly, the more I studied the method, the more I found myself dissatisfied with Heisig’s mnemonics and devising my own. I bought a copy and wrote all over it, “correcting” it for myself. Personalizing it, you might say. Heisig would have approved.
I didn’t stay with the system forever. I never learned a mnemonic for every last character. There just came a point when everything sort of “clicked,” and memorizing characters wasn’t difficult anymore. Sure, I would forget characters (and I still do), but every time I’d forget one and have to look it up, those old mnemonics returned to me and helped lock that character back in my memory. The important thing is that I never had to write characters over and over again. I’ve passed various written Chinese tests without ever having to do that. I have been able to make better use of my time and of my mind.
Occasionally I would come upon a character that resolutely defied my memory. If the character mattered to me, it would get “special attention.” That meant setting aside some time to deconstruct the character, research the etymology (sometimes, but not always, a helpful practice), and apply some imagination. It might take as long as 20-30 minutes for just that one character, but eventually I would come up with a memorable story mnemonic involving the character components, tailor-made for me. And then I would not forget the character again.
In short, Heisig’s book totally changed the way I approach characters. It’s a triumph of imagination over rote learning. I am very grateful to him for that. If you’re trying to learn Japanese or Chinese, I strongly recommend you get Remembering the Kanji.
Language Log recently published a post by Victor Mair entitled How to learn to read Chinese, in which Dr. Mair talks about a Chinese language newspaper with pinyin accompanying each character called Guoyu Ribao (国语日报). He hails it as a great way to pick up characters.
This is all well and good, but I was quite surprised by this paragraph (bold mine):
> Guoyu Ribao was a godsend in that it enabled me to learn Chinese characters passively and painlessly. By assimilating massive amounts of publications from the Guoyu Ribao people, before long I was able to read texts without phonetic annotation. Slowly, with practice, I also became capable of writing in characters as well.
While I agree that overloading new students of Chinese with character memorization is a bad idea, the words passively and painlessly with regard to learning Chinese characters just don’t seem right. (Does Dr. Mair know Dr. David Moser?) Interesting material goes a long way toward motivating students to learn, but no matter how you slice it, there’s quite a bit of work involved in becoming literate in Chinese. Yeah, it’s a bit painful, and yeah, it’s active work. While Dr. Moser exaggerates for fun, Dr. Mair seems to give pinyin a bit too much credit.
Sam explains how net-savvy Chinese have re-appropriated the character 囧, using it for what it looks like (a distraught face), rather than for what it originally meant (“bright,” apparently). Sam explains various dimensions of the phenomenon on his blog, but this is really cool for linguistic reasons. It’s not often that a non-pictographic character (with a rather abstract meaning) is reenlisted as a pictographic character and used on a relatively large scale!
“Christmas” in Chinese is, of course, 圣诞节, but in the spirit of my previous Character Creations, I’ve created two new single characters that mean “Christmas.”
Character Notes: some radicals in the creations above were chosen for semantic reasons, but many elements were chosen for purely visual purposes. In some cases I purposely shunned a more obvious option (such as 木 for “tree” or 星 for “star”) because they didn’t have the visual effect I wanted. In the case of 光, it not only looks more like a traditional star-shape than 星, but it has Biblical meaning as well.
Here’s a picture of a place near work where I occasionally eat:
I have nicknamed it “Filthy Delicious.” The name says it all.
What’s interesting to me, though, is the name of the cuisine boldly painted in red on the wall: 麻辣汤. This is interesting because once upon a time I was under the impression that this was the correct name, but enough chastisement from Chinese friends converted me to the “real name”: 麻辣烫. And yet there it is, in red and white, on the wall in the picture (in traditional characters, which, as you can see, totally adds class).
Search results for the two terms:
The name 麻辣汤 makes sense, because the final character 汤 means “soup,” and the dish itself is a kind of soup. (As I’ve mentioned before, it’s sort of a spicy “poor man’s hot pot.”) The final character in the latter, “correct” one is 烫, which means “burning hot.” This makes a kind of sense, except that the name then becomes a bunch of adjectives without any noun (like “soup”) to anchor it. That noun would usually come last in a Chinese dish name (as in 麻辣汤).
So what’s going on here? I haven’t had time to research it (ah, the advantages of blogging!), but I suspect it’s about tones. The name “málàtàng” likely comes from a dialect where 汤 (soup) is read “tàng” rather than standard Mandarin’s “tāng.” This kind of thing happens all the time in China’s rich linguistic tapestry, and the questions raised go something like the following:
1. Can the character 汤 have the reading “tàng” as well as “tāng”? This is not ideal, especially if there is no precedent in standard Mandarin. This would amount to a “corruption” of the character’s original reading.
2. Can we change the pronunciation of “málàtàng” to “málà tāng” for consistency? This seems ideal except that it would never work. It’s awfully hard to control how people talk, especially after they’ve settled on something.
3. Can we change the character used to represent “tàng”? If it comes from a dialect, it likely doesn’t have a standard written form anyway. If we can find something similar in meaning, a practical compromise is reached.
It seems to me that in my imaginary scenario path #3 above was selected, and character 烫 did the dirty work. I call it “dirty” because while it is no longer “a corruption of the character’s original reading,” it is instead a semantic corruption. 烫 originally means “burning hot” or “boiling hot,” but now you’re making it mean “soup,” or, if you choose to put it another way, you’re making it a non-semantic syllable in the three-character unit 麻辣烫. I don’t buy that, though, because the characters 麻辣 clearly keep the meaning of “numbingly spicy.”
If I have a point with all this, it’s that you can’t control the evolution of a language. Sure, a writing system need not necessarily do that, but when you encode individual characters with both semantic and phonetic information and then try to keep either from changing, you’re just kidding yourself. This is only a small example, but it’s a pretty widespread phenomenon now that the writing system is being used by the literate masses as a whole rather than a few elite (and the internet is certainly exacerbating the situation). Given enough time, so many characters will have their meanings muddled that the writing system will be reduced to the world’s most cumbersome “phonetic” system.
I’d be really curious to see what the written Chinese language looks like in 2000 years. It’s not going to look at all like it does now.
It’s great because it can recognize fluid handwriting where the strokes run together. Yes, you may have seen that kind of software before, but keep in mind that this is a free online dictionary.
Below are some examples of horrible handwriting being correctly recognized.
(Each character to the right displays its pinyin when you mouse over it.)
One of the really cool things about the handwriting recognition is that it keeps going in realtime as long as you write, and it always guesses. I’ve used programs that reach their recognition limit and just say, “nope, can’t do it.” Well, not this one. It gets an A+ for effort.
This, of course, leads to some fun experimentation. Here are a few of mine:
Thanks to David for introducing me to this website.
I just found these on YouTube. Hilarious. Just watch.
The amazing thing is that there are apparently over 30 of them! The camera work and pedagogy don’t get any better over time.
The full description of the first one led me to believe that the whole thing is just mocking a well-meaning old Chinese man, but then why would it go on for over 30 lessons? Plus more and more effort is clearly going into the on-screen presentation with the later clips.
Shortly after I arrived in China and observed the deaf community in Hangzhou, a beautiful thought struck me. Deaf people communicate in an entirely different way. If all the deaf people in the world use sign language, they could all learn the same sign language and communicate with each other regardless of race or nationality. No barriers. A truly international language!
But alas, that was not to be. You see, sign language doesn’t just “substitute for” or “imitate” human language… it is a human language. As such, it is subject to the same restrictions and limitations by which all human languages are bound. In this case, one of the most important factors is that deaf communities are very often isolated. They’re isolated within a country, within a city, or within a district. Without a means to regularly communicate, communities drift apart linguistically over time.
Not only is Chinese sign language different from sign language of other countries, but it also varies from city to city. The sign language of Shanghai differs from that of Hangzhou or Beijing, for example. Even so, there is a national standard promoted. (I’m not sure how hard the Chinese deaf communities strive to adhere to it.)
One of the ways that Chinese sign language sets itself apart is its references to Chinese characters. Certainly not all signs make reference to Chinese characters, and those signs that do make reference to characters don’t necessarily do it in a character-for-character way, but the influence of characters in Chinese sign language is tangible.
So last week my hard drive stopped working right before the deadline for my semantics/pragmatics paper. I was able to get an extension, but “I had computer troubles” is the “dog ate my homework” excuse of the modern age, so, conscientious student that I am, I felt the need to get documented “proof” of my computer troubles. I had the computer shop that fixed my hard drive write up a note and put the company seal on it.
Yeah, the whole idea sounds kind of silly, but not nearly as silly as it ended up looking. The computer guy used a tiny scrap of paper to scrawl the note. It just looks ridiculous (and yet, somehow… awesome?):
Anyway, I thought I’d share it because I kind of enjoy the challenge of reading handwritten Chinese. This note isn’t too difficult to read, although a little challenging in parts. If you want help deciphering, though, click through to the Flickr page.