The New Pleco OCR Is Amazing
There has been a bit of a buzz lately among the techy students of Chinese in Shanghai, and it’s all about the new functionality coming to the Pleco iPhone app. From the site:
> We’ve just announced an incredibly cool new feature for the next version of Pleco, 2.2; an OCR (Optical Character Recognition) that lets you point your iPhone’s camera at Chinese characters to look them up “live” (similar to an “augmented reality” system): demo video is here (or here if you can’t access YouTube).
Watch the video. Seriously. This is big.
Basically, the new app lets you add “popup definitions” to any Chinese you’re reading, even a book. It’s instantaneous. It uses the iPhone camera, but it’s not like taking a photo at all. (It’s more like using 3D goggles… Magical 3D goggles that provide pinyin readings and definitions for Chinese words.)
The technology behind this app is not terribly new… optical character recognition for Chinese characters has been getting steadily better over the years. But no smartphone app has done this well yet, and it’s a bit stunning to see Pleco performing so admirably right out of the gate.
Oh, and more good news from Pleco:
> Also, we’re finally working on an Android version of Pleco, and have just signed a license for our first Classical Chinese dictionary….
Awesome. Congratulations to Michael Love and the rest of the Pleco team.
You are right, Pleco’s new feature is amazing.
What is even more amazing: the ad on the right…. http://img683.imageshack.us/img683/3101/asiandatingatsinospilic.png
“Chinese women seek men …. ” – I am excited!
Looks pretty good. Have you got some background on it? Does it rely on a black-on-white-background image, or can it even pick up characters in the real world around me?
I love it. The drag box is so Photoshop-esque efficient. A helpful app when you’re in a pinch and have text to read!! I wonder if it can read/scan signs and other things in different environments, not just a paper book.
Dear dad, can you buy me an iPhone so I can learn Chinese? You know that’s the main reason why Apples sell – EDUCATION – parents will pay anything for it.
Dangit, I don’t want an iPhone, but Pleco looks so good. I’m hoping that by the time I can get a smartphone Pleco will be ported to Android.
BTW, your “notify me by email” checkbox doesn’t actually have a checkbox, is that by design?
I’ve been waiting for this for years… I’m a bit surprised it wasn’t/isn’t included in any of the many e-dictionaries you can buy, but I’m sure this will be more convenient… Now if it can just ‘autofit’ to the size of the text…
I wonder if the developers’ work on this OCR thing wasn’t the final impetus to go for Android (at last!). A few Android apps (Barcode Scanner and Google Goggles come to mind immediately) use the camera this way with awesome effectiveness. It just makes me suspect there’s something fundamental in how the camera “talks” to the OS that gets developers all hot and bothered.
Sorry, pointless speculation; couldn’t help myself.
Bottom line on the OCR app: WANT.
We have it so much easier these days.
After hating the iPhone for so long… it suddenly seems more attractive.
Hating the iPhone, really? Why? It has struck me as awesome since the beginning.
Does anyone have any idea of the technology they use?
Live OCR is not obvious on a smartphone.
Thanks for all of the comments!
Hans – it can do black-on-white or white-on-black or light/dark colors; we’ve got a pretty good system for automatically detecting whether it’s white-on-black versus black-on-white though there’s also a manual switch to force it to go one way or the other if it gets it wrong. And it does seem to work pretty well on signs as long as they’re well-lit and in a standard font.
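For the curious, here is a minimal sketch of how automatic polarity detection with a manual override could work. It is not Pleco's actual method, which isn't described here; it simply assumes that the background covers more of the frame than the text does. The function name `is_light_on_dark` and the `force` parameter are made up for illustration.

```python
def is_light_on_dark(gray, force=None):
    """Guess whether a grayscale crop shows light text on a dark background.

    gray  -- non-empty list of rows, each a list of 0-255 brightness values
    force -- hypothetical manual switch: None for auto, True/False to override

    Assumption (not from Pleco): the background occupies most of the crop,
    so whichever side of the brightness midpoint holds more pixels is the
    background.
    """
    if force is not None:
        return force  # the user flipped the manual switch

    pixels = [p for row in gray for p in row]
    midpoint = (min(pixels) + max(pixels)) / 2
    dark_count = sum(1 for p in pixels if p < midpoint)

    # Mostly dark pixels means a dark background, hence light text.
    return dark_count > len(pixels) / 2


# A mostly white crop with a few black "stroke" pixels: dark-on-light.
page = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]
print(is_light_on_dark(page))              # False
print(is_light_on_dark(page, force=True))  # True, override wins
```

A heuristic like this can be fooled by bold signs where the strokes fill most of the box, which is one plausible reason to keep a manual switch around.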
Wilson – thanks!
StanDuke – “autofit” is a bit dicey for performance reasons; while in alphabetic languages you can autofit to a word using nothing but a simple visual analysis – find the big black blob and draw a box around it – with Chinese you have to look at the actual characters to know where the word boundaries are, which means we’d end up wasting time recognizing a lot of extra characters on each frame.
However, as long as you line the left edge of the recognition box up with the start of the word it’s perfectly OK if it goes longer (it’ll return the longest match it finds), and indeed we expect that for a given document most people will simply set the box to a size big enough to encompass ~4 characters and then just point it at words without continuously resizing it.
A manually-resizable box has an added benefit of making the interface easier to understand – many people still haven’t figured out how the |-> / |<- buttons in our document reader work (they allow you to expand / contract the highlight to focus in on one particular character) and we’re hoping that the draggable box will make it easier for people to figure out how to zero in on the definition for a single character.
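The “longest match” lookup mentioned above can be sketched roughly like this. It is only an illustration under simple assumptions: the recognized characters arrive as a string starting at the left edge of the box, and the dictionary is a plain Python dict. The name `longest_match`, the `max_len` parameter, and the toy entries are hypothetical, not Pleco's code.

```python
def longest_match(chars, dictionary, max_len=4):
    """Return the longest dictionary entry that starts at chars[0], or None.

    chars      -- recognized characters, left edge of the box first
    dictionary -- mapping of headword -> definition (toy stand-in)
    max_len    -- longest word length to try (roughly the box width)
    """
    # Try the longest candidate first, then shrink from the right,
    # so extra characters past the end of the word are simply ignored.
    for end in range(min(len(chars), max_len), 0, -1):
        candidate = chars[:end]
        if candidate in dictionary:
            return candidate
    return None


toy_dict = {
    "中": "middle",
    "中国": "China",
    "中国人": "Chinese person",
}

# The box covers four characters, but only the first three form an entry.
print(longest_match("中国人民", toy_dict))  # 中国人
```

This is why an oversized box is harmless as long as its left edge sits on the first character of the word: the trailing characters just fail to extend the match.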
Rachel – the impetus for Android was more a matter of them finally releasing a version of the NDK (Native Development Kit) that was robust enough that we could actually use it for large-scale development; previous versions of it seemed to be mostly designed for making little performance-critical sections of apps run faster – there wasn’t even a decent debugging system – but with this summer’s R4 release they finally changed that, and between that and some clever hacking with an open-source program called SWIG we were able to get our cross-platform Palm/Windows/iPhone code running on Android easily enough to make an Android port viable.
But you’re correct that camera-to-OS interfaces had a lot to do with this; the 3GS had the camera and the processor to do live OCR a year ago, but until last month there wasn’t a way for us to get the operating system to feed us frames of video right as they came off the camera. Live-capturing barcode reader apps like RedLaser all worked by taking screen snapshots from the iPhone’s built-in photo capture preview screen, which is considerably slower / more awkward than reading the video data directly. In hindsight it looks like we might have been able to get live OCR working even with that, but we had to invest a lot of money into getting a good Chinese OCR system ported over to iPhone before we’d know for certain if it would work, and it took iOS 4 and the iPhone 4’s faster processor / 720p video capture support to make us confident enough to take that gamble.
William – the core recognition engine was licensed from a company in China – there isn’t yet anything open-source with the performance/accuracy we need – though there’s also a bunch of video processing code we came up with ourselves, since I don’t believe anybody’s developed an OCR engine specifically for video yet.
Any word on when the 2.2 version with OCR will be released? I’ve been checking regularly since first reading this post, can’t wait for it!
Keep up the good work…
This is a gallery of Pleco in action. Photos from Tany Hart in Hong Kong.