16 Feb 2006

This week I’m finally getting around to writing my two remaining final papers for last semester. Classes don’t start until something like the 27th though (I think). One of my assignments is to revise my paper on Chomsky according to my professor’s comments. That shouldn’t be too hard, except that she left a few questions on my paper that would seem to warrant entire essays of their own in order to answer. (Ah, she won’t remember what she wrote on my paper, right??)

The other essay is a response to one of the lectures given in a seminar course. The Chinese name of the course was 当代学术前沿讲座 which basically translates to “a bunch of boring lectures.” The only one I found remotely interesting was 汉语语法的问题和方法 (Issues and Methods in Chinese Grammar). So that sure narrows down my possible writing topics.

Looking at my notes for that lecture, I discovered a mere half page of scribblings. (Yeah, I sure got a lot of daydreaming done during those lectures…) Well, at least I managed to record the “issues” the professor discussed:

1. The issue of word order and functional particles (虚词) in Chinese
2. The issue of formal written language
3. The issue of the non-correlation of parts of speech and sentence components in Chinese
4. The issue of special sentence patterns

Hmmm. Fascinating. I think I’ll write about word order some and throw in other stuff I think is interesting. This isn’t a research paper, so I’m not sure exactly what I’m supposed to write. Are they asking for BS? Regurgitation? It’s all unclear. So I plan to share my amazing and unique insights as a foreigner.

One matter the professor brought up which I did find interesting is the matter of word parsing. Since there are no spaces between words in Chinese sentences (and even the concept of “word” as it applies to Chinese is debated), you can get some confusing possibilities. The professor offered this example:

> 解放大道路面积水。

The sentence seems pretty straightforward, and can be translated as something like, “Water accumulated on the surface of Jiefang Boulevard [lit. ‘Liberation Boulevard’].” Actually, though, any two adjacent characters in that sentence comprise a “word.” Check it out (definitions courtesy of Wenlin):

– 解放: jiěfàng* v. liberate; emancipate ◆n. ①liberation ②〈PRC〉 | ∼ qián before liberation in 1949
– 放大: fàngdà* r.v. enlarge; magnify; amplify
– 大道: dàdào n. ①wide road ②the way of virtue and justice ③the Great Dao; the Great Way M:¹tiáo
– 道路: ¹dàolù n. road; way; path M:¹tiáo
– 路面: ²lùmiàn n. road surface; pavement
– 面积: miànji n. surface area
– 积水: ¹jīshuǐ* n./v.o. stagnant/accumulated water; flood
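
For what it’s worth, the claim is easy to check mechanically. Here’s a minimal Python sketch, assuming a toy lexicon that contains only the seven words listed above (the `LEXICON` set and the `adjacent_pairs` helper are made up for illustration and have nothing to do with how Wenlin works internally):

```python
# Toy lexicon (an assumption): only the seven words listed above,
# not Wenlin's actual dictionary data.
LEXICON = {"解放", "放大", "大道", "道路", "路面", "面积", "积水"}


def adjacent_pairs(sentence):
    """Return every pair of adjacent characters in the sentence."""
    return [sentence[i:i + 2] for i in range(len(sentence) - 1)]


sentence = "解放大道路面积水"
pairs = adjacent_pairs(sentence)

# True only if every adjacent pair is an entry in the toy lexicon.
all_words = all(pair in LEXICON for pair in pairs)

print(pairs)
print(all_words)
```

The eight characters yield seven overlapping pairs, and every one of them is in the lexicon, which is exactly what makes the sentence awkward for a naive dictionary lookup.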

This issue is especially relevant in computational linguistics. How should computer programs interpret a sentence such as the one above? Wenlin doesn’t try to figure out which definitions apply semantically; it just offers you the definition of whatever word starts with the character you hover on (so if you hover on “放” you get the definition for 放大, not the relevant 解放). But if you were writing a script that attempted to parse the sentence according to its actual meaning, how would you do it? This is a major issue for translation projects like Adsotrans.

Obviously, this can also be a serious issue for students of Chinese. More than once I’ve been stumped by this kind of thing.

Another issue I think is somewhat interesting is one of punctuation. Specifically, the 逗号 (,) versus the 顿号 (、). The 顿号 is a special kind of comma used for items in a list. When I first started studying Chinese, I thought, “that’s dumb. Why do they think they need that? If one kind of comma is enough for English, why shouldn’t it be enough for Chinese?” It wasn’t until I got into some translation work involving long, complex sentences that I began to truly appreciate the 顿号. In the beginning, I would sort of read both kinds of commas in the same way, but then I would hit certain sentences which required me to pay attention to the 顿号 or I would lose the meaning of the sentence.

Sometimes I think of the Chinese comma (逗号 / ,) as sort of a run-on enabler. This is because Chinese is much less fascist than English about when you need to end a sentence with a period and when you can just keep blabbing forever, adding the occasional comma. This feature of the language leads to a tremendous number of run-on sentences when Chinese students write English, and it can be very annoying to English teachers.

This entry has been sort of an exercise in brainstorming to figure out what I’m going to write about in my paper. Any comments are very welcome (the sooner the better!), as they may provide further inspiration for what I write about in my paper.


John Pasden

John is a Shanghai-based linguist and entrepreneur, founder of AllSet Learning.


  1. Give up! Repent! Prepare to meet thy (academic) maker.

    What emphasis have your lecturers placed on the importance of context? And what approaches to explaining its role in the process of expressing meaning in interaction have they introduced? I know that this is key in any language, but as a student of Chinese, I remember having to understand the need to rely on 上下文 in a way that up to that point I hadn’t realized.

    One more thing, just remember that Chomsky is just an engineer. A very good engineer but a simple, low, uninspired engineer none the less. But Wittgenstein, he was a genius. When will you Americans learn that yours is a nationality not destined for greatness? Everyone knows that European Jews are just smarter. I think if your revised paper expresses these sentiments succinctly but entirely, you will score highly….

  2. Let me try some random thoughts.

    You are right about Chinese’s use of commas, and about 顿号. The latter I think is a cool invention in Chinese and, IMHO, should be imported into English (see below). Ha, like that’s gonna happen.

    In today’s (casual?) English, the (new? and/or unofficial?) punctuation mark of “/” is often used to connect optional words which I believe in Chinese would just be piled together with nothing in between. I would like to see this one imported into Chinese. Ya, this will likely happen.

    One thing that always gets on my nerves is the unnecessary and/or imprecise use of 连词. This, to me, (the use of these two commas for 插入语 is also absent in Chinese) is a weakness in the Chinese language and/or thinking. An example would be “于是她走了,那么她为什么走呢,所以我就不理解。” Here all 连词 are inserted as if they are 虚词. A worse, colloquial version goes like “然后她走了,然后她为什么走呢,然后我就不理解。” I always tease those who excessively use 然后 with a “再然后呢?” In this case, it would be clear enough to say “她走了。她为什么走呢?我就不理解。” However, people do speak and write this way, kinda like how in English people use “you know”、 “you know what”、 or “I tell ya what…”

    That Chinese often allows rearrangement of word/component order might make an interesting topic. For example, the following would all mean (perfectly) the same thing: 我今天早上对他讲了那事。 那事我今天早上对他讲了。 对他讲了那事我今天早上。 我早上对他讲了那事今天。 我那事今天早上对他讲了。 我今天早上讲了那事对他。 我今天早上那事讲了对他。 那事讲了对他今天早上我。

    I think you have an advantage in being able to think in a comparative way.

  3. However, some particular word orders DO change the meaning: 我对他讲了今天早上那事。 我今早上对他讲了那天事。

  4. One pause marker that continually confuses me is in the sentence 请大家前来朝拜、创造我们的天主, one of the antiphons from 每日礼赞. Every time I run across it, I think it must be a misprint. Sure, stringing verbs together usually requires some kind of coordinator, but it still seems to me like this sentence is inherently ambiguous, with or without the comma.

    The other punctuation issue that keeps tripping me up is the use of the dash: most style guides, it seems, require separation only from what comes before, and the choice of where to break off the comment is left to readers. A random example: “电视剧《梅艳芳菲》以香港影星梅艳芳生平为原型创作,早在去年就开始筹备,昨日,该剧出品方之一——万科影视方面证实,原来赵宝刚早已于春节前正式加盟剧组……” and the sentence goes on from there.

    What’s interesting to me is how people sometimes invert normal grammar for emphasis during speech – something like 好漂亮啊你,今天! – something that doesn’t seem to be covered by normal Chinese grammar – is it?

  5. 好漂亮啊你,今天!

    and another interesting sentence:
    shang hai zi lai shui lai zi hai shang. Whether you read this sentence from the left side or from the right side, it is the same.

  6. To answer your computational linguistics question on how the sentence “解放大道路面积水” can be
    understood, I believe a properly trained parser should be able to derive the correct parse tree for that
    particular sentence. This sentence does seem to have more than one parse, but I think the correct one would
    have the highest probability. Furthermore, I think that if this sentence can be broken down into separate
    phrases correctly, it would make the parsing much easier. I’m not familiar with Chinese parsing/
    segmentation technology, but I think a properly trained system based on the Viterbi algorithm or a hidden
    Markov model should be able to give you the correct segmentation, which is 解放 – 大道 – 路面 – 积水.

    I imagine a proper segmentation system with a 4-word phrase limit would start by looking for the proper
    segmentations from:


    Each interpretation would get a probability, then for each interpretation, the other words would be
    appended in a tree-like structure, then the one/ones with the highest probability would be chosen.
    Assuming the training corpus is good, and with proper pruning heuristics, it is very likely for a proper
    PC to reach the correct segmentation in real time. Of course, this method may become unusable
    for longer sentences as it’s pretty much combinatorial in complexity, but one can easily represent the
    problem in some other way to make it solvable.
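
One way to represent the problem so that it stays tractable is dynamic programming over word boundaries, which is the idea behind Viterbi-style decoding. The sketch below is only an illustration under stated assumptions: the `WORD_PROB` table holds made-up unigram probabilities (a real system would estimate them from a segmented training corpus), `MAX_WORD_LEN` reads the “4-word phrase limit” as a cap of four characters per word, and `segment` is a hypothetical helper name. It is a plain unigram model, not a full hidden Markov model.

```python
import math

# Made-up unigram probabilities, for illustration only (an assumption).
WORD_PROB = {
    "解放": 0.020, "放大": 0.004, "大道": 0.010, "道路": 0.015,
    "路面": 0.008, "面积": 0.012, "积水": 0.005,
    "解": 0.001, "放": 0.002, "大": 0.030, "道": 0.003,
    "路": 0.010, "面": 0.004, "积": 0.001, "水": 0.020,
}
MAX_WORD_LEN = 4  # assumed cap on characters per candidate word


def segment(sentence):
    """Return the segmentation whose word probabilities have the largest product."""
    n = len(sentence)
    # best[i] = (log-probability, words) of the best segmentation of sentence[:i]
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            # Skip unknown words and prefixes that cannot be segmented at all.
            if word not in WORD_PROB or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(WORD_PROB[word])
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1]


print(" – ".join(segment("解放大道路面积水")))
```

With these toy numbers the call returns 解放 – 大道 – 路面 – 积水. Each end position only looks back at most `MAX_WORD_LEN` characters, so the work grows roughly linearly with sentence length instead of combinatorially.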

  7. I find this kind of post really interesting, as I’m very curious about your experience in a Chinese grad school.

  8. 1) I love the fact that nobody knows when classes start, ever.

    2) The flexibility with respect to parsing and parts of speech and the great many homonyms (homophones?) should make Chinese a playground for deconstructive and psychoanalytic literary criticism.

  9. more linguistic posts, they are my favoritest,

    I’ll send you some of my Chinese grammar; I just kind of absorbed it somehow

  10. This is all over my head, but I just want to throw in one of my favorite ambiguous Chinese sentences: “保持共产党员先进性教育活动”

  11. Maybe it’s the medium that has all of the Chinese running horizontally. I’m wondering if reading it is any easier if oriented vertically. Same problems?

  12. Tim:

    No, printing vertically doesn’t make reading any easier, regardless of typesetting. It is the way the sentence was phrased rather than the way the text is oriented. Chinese has always been guilty of making run-ons and bunching several verbs as if they were adverbs, so it is a difficult language to punctuate effectively. In fact, punctuation solves several issues of how to parse a sentence – immediate punctuation was one of the KEY studies in the old days before printed punctuation was used. I mean, how would you punctuate the following sentence(s): “無名天地之始有名萬物之母” from 老子《道德經》?


    Haha, you mentioned the usage of “性”. This is supposed to be used as a modal modifier of a noun, but we know how it can be interpreted in such dubious ways, especially in long compound phrases.


    Well, 请大家前来朝拜创造我们的天主 doesn’t really need a pause; it was used to accentuate the Lord, “who creates us”! It should be read as (((请)(大家))((前来)(朝拜))(((创造)(我们的))(天主))) with each bracket/parenthesis marking what I think of as a vocabulary boundary.


    Chinese has been a great playground for deconstructive analysis by virtue of Classic Studies, due to the concise and cryptic nature of some classical Chinese texts…

  13. I see Gin’s examples as falling into two categories. Some sentences could appear in spoken form, but I can’t imagine them being written down, for example “对他讲了那事我今天早上”. Others are typical examples of topicalization, eg “那事我今天早上对他讲了”. The first kind gives me the impression of additional information being appended to the end of the sentence, while the second kind involves turning part of the sentence into the topic (or theme) by moving it to the front. Chinese is categorized as a topic prominent language, and this second phenomenon has been fairly well studied. Does Chinese linguistics make the same distinction? Are both these examples called 倒装句 (inverted sentences), or only the first category?

    The “piling together” of words with no punctuation between, which Gin mentions, certainly can lead to difficulty for people learning Chinese as a foreign language. It is particularly common when two disyllabic words are put together, for example “组织纪律性”. Should this be read as two separate ideas (“organization and discipline”), or as the former modifying the latter (“organizational discipline”)? Native speakers usually find it unambiguous because the two words are commonly used in conjunction and their meaning has become fixed, but for learners examples like these can cause problems.

    顿号 is a great piece of punctuation, but I don’t think English has a need for it. I don’t think I’ve ever come across an English sentence where using 顿号 would have made it less ambiguous.

    There are actually a couple of simple tricks that can increase segmentation accuracy, such as right-to-left rather than left-to-right segmentation, as well as doing statistical comparisons of word frequency across word boundaries and opting to avoid uncommon segmentations. Simple left-to-right longest-word-match algorithms produce segmentation accuracy rates of over 90% though, so segmentation really isn’t the huge problem it seems in theory.

    With the exception of a poor English definition for 积水 (since corrected), Adso pretty much nailed 解放大道路面积水. It also correctly pegged most of the components as nouns. 🙂

    In my experience the bigger technical problem is disambiguating the proper part of speech and definition for individual words. Does 解放 translate as “to liberate” or “liberation”, and is 基地 “Al Qaeda” or “base”? Coupled with the tendency of Chinese texts to selectively omit “implicit” parts of speech (like subjects or verbs), and the fact that common prepositional markers such as 对 or 在 are verbs in their own right, this makes it tough to understand longer sentences. English is much, much cleaner because the complex grammatical transformations provide much better indications of grammar flow. The words “at” or “in” are never used as verbs, for instance, while definite and indefinite articles are pretty good signposts that you’re in the presence of a noun.

    The 顿号 acts a lot like a semi-colon in English, incidentally.
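
The left-to-right and right-to-left longest-match tricks described in comment 14 above can be sketched as follows. This is a minimal illustration, not Adso’s actual code: the `LEXICON` set is a toy assumption holding only the seven two-character words from the post, and the function names `forward_max_match` and `backward_max_match` are hypothetical.

```python
# Toy lexicon (an assumption): the seven two-character words from the post.
LEXICON = {"解放", "放大", "大道", "道路", "路面", "面积", "积水"}
MAX_WORD_LEN = 2  # longest entry in the toy lexicon


def forward_max_match(sentence):
    """Left-to-right: at each position take the longest lexicon word."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or word in LEXICON:
                words.append(word)
                i += size
                break
    return words


def backward_max_match(sentence):
    """Right-to-left: the same idea, scanning from the end of the sentence."""
    words, j = [], len(sentence)
    while j > 0:
        for size in range(min(MAX_WORD_LEN, j), 0, -1):
            word = sentence[j - size:j]
            if size == 1 or word in LEXICON:
                words.append(word)
                j -= size
                break
    return words[::-1]  # restore reading order


print(forward_max_match("解放大道路面积水"))
print(backward_max_match("解放大道路面积水"))
```

Both directions return 解放 / 大道 / 路面 / 积水 for the professor’s sentence. On the fragment 道路面积水 (simply the sentence with its first three characters removed, not a natural phrase) they disagree: left-to-right gives 道路 / 面积 / 水, while right-to-left gives 道 / 路面 / 积水.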

  15. trevelyan,

    Thanks for the input. I obviously don’t know a whole lot about computational linguistics.

    Regarding the 顿号, though, I don’t see how it could be compared to a semicolon. Taking the first link when you do a Google search for 顿号, I find this sentence:

    亚马逊河、尼罗河、密西西比河和长江是世界四大河流。 (The Amazon, the Nile, the Mississippi, and the Yangtze are the world’s four greatest rivers.)

    I see this as a fairly typical (although basic) usage of the 顿号, but there’s no room for a semicolon there in the English.

  16. I’ve taken to using semicolons in lists whose elements might also contain lists, even though I’m not sure this is accepted usage. But I feel there is definitely a need to differentiate for the sake of clarity: “In 2006 GM plans to release new sedans, sports cars, SUVs, and trucks; restructure its financial, marketing, and distribution divisions; and increase advertising in North America.”

  17. Lennet,

    Well, it’s true that in certain special cases the semicolon can replace the comma (check out this page and look at #2). But that still doesn’t make the 顿号 more similar to the semicolon in English than the comma in the vast majority of cases.

    I think maybe trevelyan meant that in Chinese the 逗号 takes on the basic function of the semicolon in English (as well as its other duties). That I would agree with. (This helps explain all the run-on sentences Chinese students write.)

  18. Lennet: I just re-read Strunk and White a few weeks ago, and it approved of the semi-colon as list separator for lists with long items.

    Do Chinese students write run-on sentences? The biggest punctuation problem I observe–and it’s a very, very standard problem–is bad spacing around punctuation. About 100% of Chinese writing English omit the space after punctuation, and about 50% take the space and put it before the punctuation mark .Like that.

  19. I’m all in favor of importing the dun hao into English, but then I’m all in favor of emoticons in formal writing 😉 And also adopting Chinese style run on sentences, which are a much closer fit to the way people actually think.

  20. Micah, what does Strunk and White say about semicolons in sentences like: “I’ve been a living in China for 5.5 years; but I’ve been a total doofus for much much longer.” ?

  21. Lennet,

    I looked it up for you. It says:

    “I am looting your room right now.”

  22. Micah,

    Yes, they looooove run-on sentences. You’re sure right about the punctuation thing, though (especially online).

  23. Look out! Linguist on a rampage! Worst part is, you could have a riot in my room while I was at work and it probably wouldn’t look any worse than I left it.

    For real though: one of the things I like about Chinese is that thoughts can be put on paper much more directly. People don’t think in complete sentences (which seems a misnomer when English style complete sentences are compared to the much ‘fuller’ or ‘more robust’ sentences typical in Chinese writing), so thoughts must be distorted when they are written down in English, more so, it would seem, than in Chinese. I guess you could argue that it’s a process of refinement rather than distortion, but I don’t think I would buy it.

  24. Lennet,

    I doubt you would feel that way if you had been educated on Chinese the same way the Chinese have. I think communication in a foreign language naturally brings a sort of freshness and freedom… But as much as it delivers us from restrictions, it also betrays us when we make mistakes.

  25. “Make mistakes”? Speak for yourself. I don’t “make mistakes”, I “express my individuality”. Granted that learning Chinese as a second language and not learning it in the Chinese educational system have made it fresher and freer for me, but I think that the way Chinese is presently written, with clauses related by topic and linked by commas, is a better reflection of the way people think than is the structure of written English.

  26. If the structure of Chinese better matched the way people think, wouldn’t all languages have a similar structure? I think the structural variety found in the world’s languages is proof that these choices (eg subject-predicate vs theme-rheme, left-branching vs right-branching) are all arbitrary.

  27. “Micah, what does Strunk and White say about semicolons in sentences like: “I’ve been a living in China for 5.5 years; but I’ve been a total doofus for much much longer.” ?”

    Sorry for embedded speech marks. What you should write is this “I’ve been a living in China for 5.5 years; I’ve been a total doofus for much much longer.” No need for the ‘but’ as it seems superfluous or even redundant, interpret that as you will 😉 (written after 2 bottles of Great Wall’s finest).

    Do emotions count as grammar?

  28. In both English and Chinese, a semicolon is higher in the punctuation foodchain than a comma, thus called a supercomma; I have seen English writers confuse its use with that of a colon, we Chinese on the other hand do not tend to make that mistake.

    In Chinese, dun-hao is lower than a comma, I would call it a minicomma, it is strictly for lists only, as John pointed out.

    How do you like my run-on sentences?I miss the days when we never had to use any punctuation marks !

  29. Ha! I read Gin’s comment without noticing the ungrammatical run-ons, until he pointed them out. It’s no secret that the internet is not the best place to learn the rules of English spelling, punctuation, or capitalization.

  30. To me, the syntax and punctuation of computer languages are far easier to wrap my brain around. Why can’t parentheses in English be nested (at times it’s useful when giving additional information (especially details that you don’t want the reader to focus on)), despite the fact that it allows for clearer division of thoughts? Don’t even get me started with the American placement of periods inside quotes.

    trevelyan, have you considered using a genetic algorithm based approach to handle the problem of implicit speech?

