The 3-2 Tone Swap Error
This post identifies a type of tonal production error which many students of Mandarin Chinese make, not only in the beginner and elementary stages, but often well into the intermediate stage. While neither years of personal observation nor the multiple appearances in the audio data for my master’s thesis experiment constitute definitive evidence, it’s my belief that the phenomenon is real, and examining it can yield useful results for both students and teachers of Mandarin Chinese. I’m dubbing the error the “3-2 Tone Swap.”
The Error
Note that the term “error” is used in the error analysis sense, meaning that it is committed systematically, and is not merely a random mistake (which even native speakers make from time to time).
The error occurs, in two-syllable words, when the tonal pattern is 3-2. Many students will pronounce the 3-2 tone pattern incorrectly as 2-3. Some typical examples:
- 美国 (Correct: Měiguó, 3-2 Tone Swap Error: Méiguǒ)
- 法国 (Correct: Fǎguó, 3-2 Tone Swap Error: Fáguǒ)
- 五十 (Correct: wǔshí, 3-2 Tone Swap Error: wúshǐ)
- 可怜 (Correct: kělián, 3-2 Tone Swap Error: kéliǎn)
Personal History
I remember quite clearly when I discovered myself committing the 3-2 Tone Swap error. I had learned the word 可怜 (kělián) in Hangzhou from a friend. But I noticed that although I had “learned” the word, every time I tried to use it, my friend would correct my pronunciation. “No, it’s ‘kělián,’ not ‘kéliǎn.’” This was extremely frustrating for me, because I thought I had learned the word, and I was pronouncing it wrong even when I knew that the tones were 3-2. At the time I dismissed it as just a “problem word” that I would get eventually.
Around this time I became super-vigilant about my tones. I realized that although I was communicating pretty well, I was still making a lot of tone mistakes. Part of this new awareness came when I realized that native speakers were correcting me all the time using recasts, but I had previously been oblivious to it.
A typical conversation went like this:
Native Chinese speaker: 你是哪个国家的? [Which country are you from?]
Me: 美国。 [The USA.]
Native Chinese speaker: 哦,美国,是吗? [Oh, the USA, huh?]
Me: 对。 [Right.]
After having this same exchange about a million times, I had started to assume that it was just a natural conversational pattern in Chinese to have your country repeated back to you for verification. Yeah, it seems a little strange and inefficient, but there are stranger features of the Chinese language.
What I eventually came to realize, however, was that when I gave my answer, 美国, I was routinely mispronouncing it as *“Méiguǒ” (3-2 Tone Swap error), and then the other person was both (1) confirming the information and (2) modeling it for me in his response, which included the correct form “Měiguó” (a classic recast).
When I finally realized this, it sort of blew my mind. I had thought my tones were already pretty good, but I had been pronouncing the name of my own country wrong all this time?? Learning Mandarin Chinese is, if nothing else, an exercise in humility. There was nothing to do but hunker down and try to reform my pronunciation. While I found it easier to focus on high-frequency words like 美国, it quickly became apparent to me that the 3-2 tone swap issue was rampant in my pronunciation.
Research
Although the 3-2 Tone Swap phenomenon cropped up in my own experiment on tonal pairs for my master’s thesis, it was not the focus of my own research. If anyone knows of specific research done on this phenomenon, I would love to hear about it.
The data in my own experiment showed some interesting patterns. While errors in 3-2 tonal pairs were clearly more common than in the other two tonal pairs I examined (1-1 and 2-4), there were some inconsistencies. Namely:
- Errors were notably less frequent for numbers (e.g. 50, “wǔshí”)
- Errors were less frequent for one’s own country (e.g. “Měiguó”, “Fǎguó”)
While all subjects illustrated the first trend, the second was particularly well demonstrated by an intermediate-level French subject, who routinely pronounced “Fǎguó” [France] correctly, despite the existence of a 3-2 tonal pair, but then also routinely pronounced “Měiguó” [The United States] incorrectly as *“Méiguǒ” (the 3-2 Tone Swap).
What this suggests is that although some tonal pairs seem to take longer to master, the mastery is not categorical. In other words, you don’t suddenly “get” the pronunciation pattern and then just switch over to correct 3-2 pronunciation for all words where it occurs. Acquisition of the 3-2 tonal pair appears to occur more on a word-by-word basis, making it largely a matter of practice, practice, practice (which also explains the better performance with numbers). This mirrors my own experiences.
Questions
Tonal mastery is a long process for most students, with the 3-2 tone pair appearing to be one of the last patterns to acquire. Why?
I suspect that there is a relationship between the 3-2 Tone Swap error and the 3-3 tone sandhi (in which 3-3 tonal pairs are systematically converted to 2-3). The learners that exhibit the 3-2 Tone Swap error typically do very well with their 3-3 sandhi. Could learners be internalizing but then overextending the 3-3 tone sandhi rule to include not only 3-3 pairs, but also 3-2 pairs? It’s certainly possible.
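The overextension idea can be made concrete with a small sketch. This is only an illustration, not a model drawn from any research: tones are written as numbers 1 to 4, the function names are made up for this example, and `overextended_sandhi` is a hypothetical "learner rule" that simply assumes the sandhi trigger has been widened from 3-3 to 3-2 as well.

```python
# Tones are represented as integers 1-4; a two-syllable word is a (first, second) pair.

def third_tone_sandhi(pair):
    """Standard rule: a 3-3 pair is pronounced 2-3; everything else is unchanged."""
    if pair == (3, 3):
        return (2, 3)
    return pair


def overextended_sandhi(pair):
    """Hypothetical learner rule (an assumption for illustration):
    the 3-3 -> 2-3 change is also applied to 3-2 pairs."""
    first, second = pair
    if first == 3 and second in (2, 3):
        return (2, 3)
    return pair


# Dictionary tones for words mentioned in this post and its comments.
words = {
    "美国": (3, 2),  # Měiguó
    "五十": (3, 2),  # wǔshí
    "可怜": (3, 2),  # kělián
    "水果": (3, 3),  # shuǐguǒ
}

for word, tones in words.items():
    print(word, tones, third_tone_sandhi(tones), overextended_sandhi(tones))
```

Under the standard rule only 水果 changes, while the hypothetical overextended rule turns every 3-2 word into 2-3, which is exactly the swap described above (Měiguó becoming *Méiguǒ).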
Again, if anyone knows of any research into the above phenomena, I would appreciate links or more information!

This is one of those things that my brain knows but my mouth keeps forgetting. I probably just said it wrong enough times that it’s hard to change even though I know I need to.
I think you’re right on in thinking it’s connected to the 3-3 sandhi.
Additional note:
I was actually a little conflicted about this post. From a pedagogical perspective, the question is: even if this phenomenon is totally real, does the learner knowing about it help the learner any? Or does it just make him worry unnecessarily?
Every learner has different needs, but I think it depends on where the learner is in his progress… My gut tells me that it’s more of an intermediate learner issue.
This could well be from an overextension of 3-3 sandhi, but I think it could just as easily be from other factors, chief among them the basic weirdness, for many English speakers, of tones and especially the third tone. And then there’s the matter of the third tone’s behavior, and the way even an “ordinary” third tone changes to a half third tone depending on what’s around it. (I don’t know if this is more common in Northern Mandarin; my impression is that it is.)
Anyway, I think the 3/2 swap affects different people in different ways. I don’t have problems with this within words anymore (e.g. saying Méiguǒ for Měiguó is no longer an issue), but I’ll still occasionally catch myself doing this in connected speech. Beijing Sounds had a clip featuring me a while back in which I ask a cab driver where he learned his Spanish by asking “nár xuě de,” rather than “nǎr xué de.”
Personally, I think any serious learner should definitely be cognizant of this stuff once they reach the level of having grown-up conversations. A few years ago I was in the habit of forcing myself to record an audio diary in Chinese every night, and then listening to it and noting all the things I got wrong. (By this point I’d been living in China long enough that I could at least notice the obvious mistakes when listening, even if I couldn’t keep from making them while I was speaking.) This was absolutely excruciating, but it did do a lot to help me un-screw up my tones.
A while ago I was thinking of writing something on my own blog to the effect that while people should worry about tones when they’re starting to learn Chinese, they shouldn’t worry about them too much, since nobody, except maybe the borderline autistic, learns the tones right the first time around anyway, and pretty much everyone is going to have to go back and retrain themselves in a few years.
Brendan,
I totally agree… it’s really easy to get tripped up by the 3-2 combination in spontaneous combinations. I’ve found myself making this mistake too. (This is one of the reasons I created the Tone Pair Drills.)
Interestingly, I’ve also heard native speakers making the 3-2 tone swap “error,” particularly with the phrase “五毛” (wǔ máo), but it’s something I can’t explain.
I agree that getting to the level of really good, consistent tones is a long-term goal rather than a “first semester” goal, but don’t you think tone training could be structured in such a way as to build properly on itself, minimizing the need for true “retraining”?
I wonder if some of the problems with 3-2 combos come from the dissonance between textbook Chinese and real speech patterns.
As Brendan noted, the 3rd tone has more changes than just the 3-3 sandhi, and, in fact, is very rarely the full falling-rising third tone (214) that textbooks typically describe. Rather, in most situations it is realized as a half-third tone, where the main characteristic is LOW falling. Certainly producing a rising tone is easier after a low third tone than after a rising third tone.
Related to this, though perhaps a south/north/dialect issue, is the regular appearance of the neutral tone and how the third-neutral tone combo might affect the 3-2 combo. If I understand correctly, the country names in common northern pronunciation are Měiguo, Fǎguo, etc. rather than Měiguó, Fǎguó. If this is what learners are hearing in the wild, then, as with the half-third tone, this would be an example where textbook and real Chinese differ. It might also add to the difficulty of learners discerning and properly producing 3-2/neutral tone combos.
Thanks for an interesting post, John.
Great post, I have noticed that as well, I often pronounce faguo as “the fruit of the Fa tree”, LOL. Sometimes I even catch myself on the go, and I immediately repeat the word again with the right tones causing even more confusion.
I think you are onto something here, because clearly the symmetric error (i.e. pronouncing 水果 as a 3-2) is much less common, at least for me.
But I have a slightly different hypothesis to explain it, also tied to the 3-3 sandhi. It goes like this: when speaking Mandarin, the tone pattern 2-3 is a lot more common than the pattern 3-2 for individual words (because it comprises all original 2-3s plus all the 3-3s). Therefore, when you are unsure, or when your ROM is not fast enough to follow the pace of your words (i.e. you speak faster than you think), it is only natural that you assign 2-3 patterns to every pair of undetermined 2ish-3ish syllables. It is statistically a winner.
I think you already dealt with this in one of your previous posts: the 1st/4th group is quite distinct from the 2nd/3rd, and most difficulties come when differentiating sounds within those groups (i.e. 1st from 4th, or 2nd from 3rd). Due to this and to the variability of the 3-3 sandhi in long phrases, it can happen when listening that we don’t properly register the 3rd and 2nd tones, so many less common 2-3/3-2/3-3 words are stored as a nebulous mix of 2s and 3s. Our easy way out is to render all these as 2-3s in quick conversation, and we get it right most of the time…
All this is pure speculation of course, but I hope it helps to shed some light on that 2-3 “fruity” effect 🙂
BTW, thanks for writing these posts about tones. Like some commenters said above, it is one of the aspects that is worst explained in textbooks. I always learn a lot from your posts and the comments I read here.
This post reminds me of that old Queen song, “Tone Cold Crazy.”
They had a lot of good China-centric songs:
“Another One Bites the Duck”
“Electric Bicycle Race-I’m in love with my car”
“Under Pressure”– the Gaokao anthem.
“Princelings of the Universe”
I made this, just out of interest for people who might not be able to read the pinyin very well:
Hmm, now I’m worried! But I’d rather be warned, than remain oblivious to common problems.
I can provide further personal evidence…my wife complained for years that I regularly miss the 2nd tone. It took us a while to figure out when exactly that happens: Spoken in isolation 2nd is fine. But as soon as it comes in the company of a 3rd tone I mess it up.
Usually for me the error pattern is the swap 3rd-2nd –> 2nd-3rd
It always costs me several tries and physical effort to replicate my wife’s correct version and that never works in a casual conversation. As if the 2nd needs some heavy lifting in its first half.
Very good entry. I’m sure I’ve made this mistake too, but I can’t catch myself. Although the other day, I made an even more basic one. Instead of jiǎozi, I said jiáozǐ. So for whatever reason, even though I’ve said it right hundreds if not thousands of times in the past, in that one instance, textbook tonal sandhi carried the day.
法 in 法國 is pronounced with a fourth tone, not a third tone.
In Taiwan, maybe, but not in most of China.
Fàguó is actually the historical pronunciation for 法国 — if you watch 《梅兰芳》, you’ll hear people say it that way. (It is just about the only reason to see the movie, from what I understand — haven’t seen it myself, but a fellow geek – on Twitter, maybe? – mentioned this.)
I did this too (probably still do sometimes), and also assume it’s because of overcorrection for a 3-3 pattern. I’ve suspected that the main character in one of our textbooks was called Xiao Wang for exactly this reason, so that students had all year to get trained to say Xiao3 Wang2 and not *Xiao2 Wang3, as one tends to do.
@Britney @John
I can attest to this Taiwanese fourth tone pronunciation of 法 in 法国. I think it’s an older reading for this 多音字.
…or formally 多音字…
Another perspective:
Early on, having paid much more attention to spoken Chinese (as a result of laziness and fear of hanzi), most of my vocabulary was learned orally first. When I would acquire a 3-3 or 2-3 word, I couldn’t be sure which of the two patterns it followed. A common mistake at that point for me would be to pronounce the first element of the word wrongly in another word. An example would be first learning 美女 and then pronouncing 美国 as 2-2: 我是没国人!
One guy from Taiwan long ago suggested I view third tone as a tone that is low overall, rather than concentrating on the textbook 214 contour whose latter half sounds like second tone. This fits in with the “half third tone” idea which is also low overall.
On the other hand, if you start with the idea of third tone being low, it is likely to get some of the dipping contour anyway from adjacent higher syllables, so you don’t have to worry as much about not producing a dipping contour.
This also has a certain symmetry of 1 high vs. 3 low as well as 2 rising and 4 falling. I have never seen this idea published in textbooks or elsewhere though.
I’m a native speaker and I found out that I always pronounce a half third tone except when it is the last character in a sentence.
Very interesting… I’ll have to keep my ears open, but I definitely never noticed this one before. One that affects me and my friends a lot: swapping characters, especially in words in which neither character is really used independently of the other. I had a girl say “what is the deal with you foreigners always saying the characters in a word in the wrong order, it’s not like I swap English words around like puter-com, hopper-grass, etc.”
I don’t remember having this problem too much; the pronunciation makes the position clear and many hanzi have almost fixed positions.
By the way, why does everybody say “characters” instead of hanzi? In my Japanese journey I have never seen a single student who said “character” instead of kanji.
I had this problem with the pronunciation of 美國, and it took a long time to get over, and now if I am speaking quickly it still happens.
I think it does help to know that the problem exists but just don’t worry about it too much and just specifically add this pronunciation practice to your regular learning routine.
A teacher once gave me a list of a few hundred character pairs from each tone combination (1:1, 1:2, 1:3, 1:4, 2:1, etc.). Spending an hour or so a week running through the lists can help a lot. I’ll see if I can dig them out.
For me, I don’t really think about these things. I just hear these words over and over again on the streets, on TV, etc. and therefore it sticks. For learners overseas, you just have to keep at it. For learners in China, it will just sort of naturally stick with you after a while.
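For anyone who wants to build lists like these themselves, here is a rough sketch of how words could be grouped by tone combination. It is only an illustration: the names `TONE_PAIRS` and `build_drill_lists` are made up for this example, the neutral tone is ignored, and the sample words are just ones mentioned in this thread, not the teacher’s actual lists.

```python
from collections import defaultdict
from itertools import product

# The four full tones; the neutral tone is left out for simplicity.
TONES = (1, 2, 3, 4)

# All 16 two-syllable combinations: (1, 1), (1, 2), ... (4, 4).
TONE_PAIRS = list(product(TONES, repeat=2))


def build_drill_lists(words):
    """Group (word, (tone1, tone2)) entries by their tone combination."""
    lists = defaultdict(list)
    for word, pair in words:
        lists[pair].append(word)
    return lists


# Sample words from this thread, with their dictionary tones.
sample = [
    ("美国", (3, 2)),
    ("法国", (3, 2)),
    ("五十", (3, 2)),
    ("可怜", (3, 2)),
    ("水果", (3, 3)),
    ("美女", (3, 3)),
]

drills = build_drill_lists(sample)
for pair in TONE_PAIRS:
    if drills[pair]:
        print(f"{pair[0]}:{pair[1]}", " ".join(drills[pair]))
```

With a larger word list, each of the 16 groups becomes a drill sheet, and the 3:2 sheet is the one to spend extra time on if the swap described in the post sounds familiar.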
John, Caffiend hints at what might be the reason. At our level, we often exaggerate the falling-rising contour of the 3rd tone. This might cause it to end at a higher pitch than a 4th tone begins at. It is relatively difficult to abruptly drop down and then rise again for the second tone syllable. I’d guess it is the difficulty of this reversal of pitch direction that encourages us to substitute the reversed sequence. Experienced and native speakers do not drag their 3rd tones up so high before uttering a 2nd tone. (For native speakers of non-tonal languages, I’d guess that the same difficulty would show up when singing—combining slurred phrases that require reversals in pitch direction between the phrases will be harder than combining those where the following phrase can be extrapolated from the rise or fall of the previous one. If this is true (?) searching the psychology of music literature rather than phonology might turn up some relevant research.)
Hi guys, I’m just about to start learning this language, and as a beginner, congratulations, you just confused me. It seems like I’m changing my mind.