Tag: Mark

Sep 2009

A Character-Counting Challenge

My recent post on the Wikimedia Commons Stroke Order Project prompted Mark of Toshuo.com to decry the relative dearth of traditional characters being added to the project. To this, David on Formosa reminded Mark that there are also a large number of characters shared by the traditional and simplified character sets.

At this point I’ll interject a visual aid (gotta love them Venn diagrams!):

All this got me thinking about the following question: If “s” represents the characters in the simplified set not shared with the traditional set, while “t” represents the characters in the traditional set not shared with the simplified set, and “u” represents the characters shared by the two sets, then how many characters belong to groups s, t, and u, respectively?

It seems like a simple enough question, but it’s actually quite tricky for a number of reasons.

First, the total number of Chinese characters in existence varies according to source, and largely depends on how many non-standard variants you want to include in your total set. You can be reasonably certain the total number is less than 50,000, but that’s still a pretty ridiculously large number when most Chinese people regularly use fewer than 5,000. For basic purposes of comparison, it makes sense to limit your set to a certain number of commonly used characters, but which set? One from the PRC? From Taiwan? From Hong Kong? From Unicode?

Second, you might be tempted to think that s = t, because simplified characters were “simplified from” traditional characters. This isn’t true, however, because in many cases multiple traditional forms were conflated into one simplified form. To give a very common example, traditional characters 干, 幹, and 乾 are all written 干 in simplified. So adding these three characters adds 1 to u, 2 to t, and 0 to s. There are lots of similar cases, so clearly t is going to be significantly larger than s. But by how many characters?
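The bookkeeping described above is plain set arithmetic. Here is a minimal Python sketch, assuming each character set is available as a collection of characters. The function name `count_groups` and the sample data are made up for illustration; the sample uses the commonly cited merger of 干, 幹, and 乾 into 干.

```python
def count_groups(traditional, simplified):
    """Count characters unique to each set (s, t) and shared by both (u)."""
    trad, simp = set(traditional), set(simplified)
    return {
        "s": len(simp - trad),  # in the simplified set only
        "t": len(trad - simp),  # in the traditional set only
        "u": len(trad & simp),  # shared by both sets
    }


# Illustrative data: three traditional forms written as one simplified form,
# where the simplified form is identical to one of the three.
print(count_groups("干幹乾", "干"))  # {'s': 0, 't': 2, 'u': 1}
```

The hard part of the question is not this arithmetic but deciding which two character lists to feed in.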

I’d be very interested to see a concrete answer to this question, regardless of the character limit used. I also wonder how the proportions of s, t, and u vary as the character limit is increased, and more and more low-frequency characters are included.
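One way to explore how the counts shift with the character limit is to take the n most frequent traditional characters, derive the simplified set from them, and recount at each n. The sketch below assumes two inputs that are not specified in this post: a frequency-ranked list of traditional characters and a traditional-to-simplified mapping. The names `counts_by_limit`, `ranked`, and `mapping` and all the sample data are hypothetical.

```python
def counts_by_limit(trad_ranked, to_simplified, limits):
    """For each limit n, take the n most frequent traditional characters,
    derive the simplified set from them, and count s, t, and u."""
    rows = []
    for n in limits:
        trad = set(trad_ranked[:n])
        # Characters missing from the mapping are assumed to be unchanged.
        simp = {to_simplified.get(ch, ch) for ch in trad}
        rows.append((n, len(simp - trad), len(trad - simp), len(trad & simp)))
    return rows


# Hypothetical frequency ranking and mapping, for illustration only.
ranked = ["人", "發", "干", "髮", "幹", "乾"]
mapping = {"發": "发", "髮": "发", "幹": "干", "乾": "干"}

for n, s, t, u in counts_by_limit(ranked, mapping, [2, 4, 6]):
    print(n, s, t, u)
```

Starting from the traditional list is a design choice. Taking the top n of each set independently would force s = t, since both sets would then have exactly n members.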

If you’ve got an answer, I’d love to hear from you!

Aug 2009

The Spaced Repetition Party

So you’re at a party. It’s not some crazy kegger, it’s just one of those social mixers you go to every once in a while to meet people. A homely guy walks up to you and introduces himself as Craig. He’s a financial consultant. He soon moves on.

Photo by Wallie-The-Frog

A few minutes later, he walks up again, and asks, “Remember me?”

“Yes,” he says. “And what do I do?”

“Uhhhh,” you say intelligently as you draw a blank.

“Financial consultant!” he says snippily and walks off.

A few minutes later he’s back again. He walks up to you and looks at you. “Hey, Craig the financial consultant,” you say. He nods and moves on.

He shows up again an hour later, and then one more time before the end of the event. He’s satisfied you know who he is.

The scene described above is a fictional dramatization of how spaced repetition works. Just like you forgot unmemorable Craig’s profession only 5 minutes after meeting him, you forget most things you learn. That is, unless you’re reminded. And it turns out that there are optimal times to be reminded, and that the more you’re reminded, the less often you need to be reminded. This is the “spacing” of “spaced repetition,” and its rules have been pretty well figured out.

The famous Pimsleur language learning system is based on the principle of spaced repetition. It was designed for a time when static audio recordings were cutting edge, however, and the latest adaptation of the spaced repetition principle is spaced repetition software (SRS), which has been refined quite nicely in recent years by a Polish man named Piotr Wozniak.

With SRS, you “join the party” by starting up the software. You’re presented with various “cards” or “facts” which you want to remember. Some of them, like Craig, aren’t particularly memorable, and when they come up again, you may falter. No matter; SRS is infinitely patient. The more you have trouble with a fact, the more often it shows up in your review cycles, until eventually you get it down pat and it gets spaced out to the point where you hardly ever see it again.
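To make the scheduling idea concrete, here is a rough Python sketch loosely modeled on the publicly documented SM-2 algorithm from SuperMemo. It is a simplification under stated assumptions, not the exact behavior of any program named in this post. The names `Card` and `review` are made up, and `quality` is assumed to be a self-rating from 0 (total blank) to 5 (perfect recall).

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days to wait before the next review
    ease: float = 2.5       # multiplier that stretches the interval


def review(card: Card, quality: int) -> Card:
    """Return the card's new schedule after one review (quality is 0..5)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Forgotten: start over and show the card again tomorrow.
        return Card(repetitions=0, interval_days=1, ease=card.ease)
    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        interval = round(card.interval_days * card.ease)
    # Easy answers raise the ease; shaky ones lower it, never below 1.3.
    miss = 5 - quality
    ease = max(1.3, card.ease + 0.1 - miss * (0.08 + miss * 0.02))
    return Card(repetitions, interval, ease)


card = Card()
for quality in (5, 5, 5):
    card = review(card, quality)
    print(card.interval_days)  # prints 1, then 6, then 16
```

A card you keep getting right drifts further and further out, while one blank sends it back to tomorrow. That is Craig walking up to you again five minutes later.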

Sound like fun? In my experience, the idea of efficiently offloading the work of memorization to a computer program tends to appeal mainly to programmers. I was introduced to it by programmer friend John Biesnecker, who was seduced by SRS evangelist and blogger Khatzumoto (also a programmer). I’ve seen another programmer friend, Mark Wilbur, go fanatical about SRS. Meanwhile, linguists and language teachers tend to go, “meh.”

Photo by Tom Lin

Personally, while I have my misgivings about SRS (a topic for another post), I think it’s a fantastic concept. The idea that, through science, we can understand how we forget, describe it in algorithms, and then systematically counteract it through software and learned behaviors is nothing short of amazing. The problem is that most of us aren’t willing to simply plug in and “trust the machine.” We prefer to live our lives unplugged… or at least not to be ritually spoon-fed our knowledge.

Like any innovative new form of technology, SRS has its early adopters. Those people swear by SRS, daily executing their spaced “reps” with the leading software: SuperMemo, Mnemosyne, and Anki. At the same time, though, something bigger is happening. Behind the scenes, SRS methods are infiltrating other learning software, such as Pleco (a popular Chinese dictionary). Although perhaps not completely obvious, SRS methods are a cornerstone of innovative Chinese character writing service Skritter. Cerego, the company behind another learning system earning lots of praise, Smart.fm, describes itself thusly:

> Based on years of applied research, Cerego has built adaptive, web-based applications that accelerate knowledge acquisition. Cerego’s patented core learning engine is driven by algorithms that generate optimal learning schedules for discrete chunks of declarative learning content, called “items”. This intelligent scheduling is achieved by gathering metadata on individual user performance and modeling memory decay patterns at the granular level of every item.

Guess what? It’s SRS.

The fact is, the average person doesn’t need to learn to change his habits to adopt SRS. As various companies and developers realize the value that SRS integration offers any kind of learning system, they’re integrating it into their existing products and services. It’s starting to appear in more and more products we already use. In the next few years, you can expect the slower ones to join the party as well. SRS is coming to you.

Feb 2009

Mark's New Pinyin Input Firefox Extension

My friend Mark has created a Firefox add-on. It does one thing and it does it well: it converts onscreen text from numeral pinyin to pretty pinyin with tone marks. (It doesn’t convert characters to pinyin or any of that jazz.)

I find this very useful. If it sounds good to you, try out the Pinyin Input Firefox Extension.
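For the curious, the kind of conversion described above (numeral pinyin such as “ni3 hao3” to pinyin with tone marks) can be sketched in a few lines of Python. This is not Mark’s code, just an illustration of the standard tone-placement rules: the mark goes on a or e if present, on the o of “ou”, and otherwise on the last vowel. The function names are made up, and treating “v” and “u:” as ü is an assumption about the input conventions.

```python
import re
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def mark_syllable(syllable, tone):
    """Return one pinyin syllable with its tone mark (tone 5 is neutral)."""
    s = syllable.replace("u:", "ü").replace("v", "ü")
    lower = s.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        idx = max(lower.rfind(v) for v in "iouü")
    if idx < 0:
        return syllable + str(tone)  # no vowel found: leave the text alone
    if tone == 5:
        return s  # neutral tone: just drop the number
    marked = s[:idx + 1] + TONE_MARKS[tone] + s[idx + 1:]
    return unicodedata.normalize("NFC", marked)


def numbered_to_marked(text):
    """Convert every letters-plus-digit run in text, e.g. 'hao3'."""
    return re.sub(
        r"([A-Za-zü:]+)([1-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        text,
    )


print(numbered_to_marked("ni3 hao3 ma5"))  # nǐ hǎo ma
```

The sketch will happily “convert” any vowel-bearing letters followed by a digit from 1 to 5, so a real tool would also need to check that the letters form a valid pinyin syllable.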

09 Feb 2009

A while back I blogged about buying a PS2 in China, and there was a lot of interest. There’s not much to say about the PS3, because it is so far uncracked/unpirated, so everyone who plays PS3 here imports everything. Games are 200-300 RMB each. The XBox 360 has a similar status to the Wii in China as far as pirating goes, but I have almost no experience with it, so I will limit my observations to the Wii and its games.

Nintendo does not officially sell the Wii in the People’s Republic of China, so buyers must purchase an imported system. While previously Japanese Wii systems were the most common, now Korean imports are becoming more common. I imagine it is possible to buy the Wii imported from the United States and other countries as well.

These are the prices I was quoted at my local video game shop:

– Basic Wii system (one controller) imported from Korea: 1580 RMB
– Installation of WiiGator “backup launcher” (which allows you to play “backup copy” AKA pirated games): free
– Extra Wii controller set (Wii remote + “nunchuk”): 450 RMB
– Wii Fit imported from Japan (with Wii Fit game/software): 800 RMB
– 10 games (not imported, obviously): free

All games work fine as long as you load them through the WiiGator Gamma Backup Launcher 0.3. The system also comes preloaded with Homebrew and Softchip (an alternate backup launcher). The shopkeeper told me only to use the WiiGator Gamma Backup Launcher, but I did actually try out the Softchip launcher, and it worked for most games. The (Korean) Mii section, however, does not work at all. I’ve heard that it can easily be enabled; the shopkeeper I talked to said it’s a waste of precious memory. I didn’t buy any memory upgrades, and so far I’m doing fine without them.

Just like PS2 and XBox 360 games, Wii discs sell in Shanghai for 5 RMB each.

It is expected that “backup launchers” and other alternate Wii firmware will continue to make strides. Currently, for example, online access is impossible, and attempts to use it will likely lock down the offending Wii system. In the event that alternate firmware does release better versions, it’s understood that shopkeepers will upgrade the firmware of their customers’ systems free of charge.

I can’t actually help you buy a Wii; this information is for reference only. If you’re interested, please also see Buying a Wii in Taiwan, a sister blog post by my friend Mark, who lives in Taiwan.

31 Oct 2006

Some of you may have noticed that when I put up my new Tone Pair Drills I added a new Products section to this website along with it. I’ll introduce one of the items here from various fascinating sociopolitical angles.

The shirt says 请讲普通话 which means “please speak Mandarin” (rather than some other local dialect). The inspiration for this shirt can be seen at countless bus stops all over Shanghai: completely ineffectual “请讲普通话” propaganda. The Shanghainese continue blissfully barking at each other in their dialect regardless.

Simply by wearing this one shirt, you can:

– subtly poke fun at the PRC’s language policies
– inform Chinese people around you that you want to talk to them in Chinese
– inform Chinese people around you that you don’t want to engage them in whatever crazy dialect they speak (especially useful in Shanghai)

But I also created this shirt for another special reason. My roommate Lenny plans to move to Taiwan in December. He has made it clear on several occasions that he won’t put up with the degenerate dialect of Mandarin the Taiwanese call 国语, and he has made it his personal mission to reform the speaking habits of the whole island.

While I’m sure he will have no problem at all with that task (maybe he can even get Prince Roy to help, although Mark and Poagao may be thorns in his side), I thought he could use this shirt to aid his righteous crusade in some small way. The shirt is great for Taiwan because:

– The Taiwanese never say 普通话 (Mandarin), as that’s a politicized PRC word; they say 国语 (also Mandarin).
– Three out of five of the characters on the shirt are simplified. Simplified characters are, of course, an aesthetic affront to the Taiwanese which offends every fiber of their being.

Obviously there are many reasons why you need to order this shirt now. (Oh yeah, also: I ordered some merchandise before deciding to go with CafePress, and I can confirm that the quality of their stuff doesn’t suck anymore.) For more Sinosplice merchandise, check out the Sinosplice Store (more stuff to come soon).

Aug 2006

The Scallop that was Chicken

Chicken… or Scallops?

Recently Mark visited Shanghai. One night while we were having dinner at my place, we had a conversation that went something like this:

> Mark: This seafood is really good.

> John: Huh? What seafood?

> Mark: This seafood!

> John: That’s not seafood. That’s chicken.

> Mark: Really? Oh. In that case…

It wasn’t the first time that had happened. Sometimes chicken in China gets mistaken for scallop-like seafood. It’s not that the chicken tastes fishy, it’s that the texture is very much like scallop meat.

Why is this? Does anyone out there know? Should I be worried? What the hell am I really eating??

Jul 2006

Dongbei Bluntness

I have mentioned before that my ayi Xiao Wang is from Dongbei (东北, China’s northeast). I like her a lot, and perhaps one of the reasons is her impressive capacity for bluntness.

A while ago I was setting up an electric fan for her so she wouldn’t be so hot when she’s cooking in the kitchen. At first I thought it was broken, because when I pressed any of the buttons from speed 1 to 3, nothing happened. It was definitely plugged in securely in a good socket. What I forgot about was that some of these fans have an “oscillation” dial that controls degree of oscillation, but it can also be set to “off.” This means that there are two places the fan can be set to off, and if either is set to off, the fan stays off.

A little while after I incorrectly concluded the fan was broken, Xiao Wang realized the problem and got the fan working. I asked her what the problem had been. She responded, “You were so stupid! It was turned off on the direction dial.” Ah, thanks Xiao Wang. Always the charmer.

Then when Mark visited recently, she gave him a little taste of her bluntness as well:

> Your Chinese isn’t that good…. I don’t completely understand what you’re trying to say. Can people understand you in Taiwan? [source]

I can attest to the fact that Mark’s Chinese is not bad at all. Likely the main problem was that Xiao Wang is not used to a variety of accents (especially foreign ones).

There was one other incident that made me feel really bad. Xiao Wang bought a bag of fresh peaches for us. She bought them on a Thursday and left them on the kitchen counter. She doesn’t come for most of the weekend, and I happened to be super busy that weekend, so I totally forgot about them. When she showed up the next Sunday she discovered most of them had spoiled. She made a big fuss about how I had needlessly let good peaches go to waste, and she had bought them as a gift for us with her own money (this I didn’t realize) for 10 RMB. She made it very clear that she was upset. I apologized profusely and ate some peaches right away from the parts that hadn’t spoiled.

Xiao Wang is a good lady, but she’s blunt to the core. Gotta love her.

Apr 2006

Chinese Number Tool

A little while back I recommended that Mark of the blog Doubting to shuō make an online number conversion tool similar to his Pinyin Tone Tool and Cantonese Tone Tool. Well, Mark has done it. These are the kinds of conversions it can do:

| Input | Output |
| --- | --- |
| 五千八百 | 5800 |
| 兩百五十 | 250 |
| 三百萬 | 3000000 |
| 三百万 | 3000000 |
| 7十 | 70 |
| 九億 | 900000000 |
| 九亿 | 900000000 |
| 6.25亿 | 625000000 |
| 1500万 | 15000000 |
| 五億三千九百二十萬四千四百四十一 | 539204441 |
| 壹仟叄佰柒拾捌 | 1378 |

Note that the Chinese Number Tool handles simplified, traditional, and even 大写. The tool can also optionally insert commas in the output. See Mark’s blog entry about it for further explanation. Nicely done, Mark!
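For anyone wondering how a conversion like this might work, below is a rough Python sketch. It is not Mark’s implementation. It keeps a running section total that is multiplied out whenever a big unit (萬/万 or 億/亿) appears, and it accepts Arabic digits mixed in with Chinese units. The function name `chinese_to_number` and the character tables are my own assumptions, and things like digit-by-digit years (二〇〇九) or colloquial shortcuts (两万五) are not handled.

```python
import re
from fractions import Fraction

# Simplified, traditional, and 大写 forms of each digit.
DIGITS = {}
for value, forms in enumerate(
    ["零〇", "一壹", "二貳贰兩两", "三叁叄參", "四肆",
     "五伍", "六陸陆", "七柒", "八捌", "九玖"]
):
    for ch in forms:
        DIGITS[ch] = value

SMALL_UNITS = {"十": 10, "拾": 10, "百": 100, "佰": 100, "千": 1000, "仟": 1000}
BIG_UNITS = {"萬": 10**4, "万": 10**4, "億": 10**8, "亿": 10**8}


def chinese_to_number(text):
    """Convert a Chinese (or mixed Chinese/Arabic) numeral to a number."""
    total = 0      # value of the completed 萬/億 groups
    section = 0    # value accumulated since the last big unit
    number = None  # pending digit or Arabic number with no unit yet
    last_big = 0   # most recent big unit seen
    for tok in re.findall(r"[0-9]+(?:\.[0-9]+)?|\S", text):
        if tok[0] in "0123456789":
            number = Fraction(tok)  # exact, so 6.25亿 has no float error
        elif tok in DIGITS:
            number = DIGITS[tok]
        elif tok in SMALL_UNITS:
            # A bare 十 means 10, so a missing digit counts as 1.
            section += (1 if number is None else number) * SMALL_UNITS[tok]
            number = None
        elif tok in BIG_UNITS:
            unit = BIG_UNITS[tok]
            section += number or 0
            number = None
            if unit > last_big:
                total = (total + section) * unit  # e.g. 一万亿
            else:
                total += section * unit           # e.g. 一亿三千万
            section = 0
            last_big = unit
        else:
            raise ValueError(f"unexpected character: {tok!r}")
    result = total + section + (number or 0)
    return int(result) if result.denominator == 1 else result


print(chinese_to_number("五億三千九百二十萬四千四百四十一"))  # 539204441
```

The optional commas are then a one-liner in Python: `format(chinese_to_number("1500万"), ",")` gives `'15,000,000'`.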