Teaching My Android to Read
About a month ago, I finally gave in and joined the smartphone revolution. After careful consideration, I went with the T-Mobile MyTouch 3G (powered by Google Android). I was attracted to the liberating UI, the elegant system architecture, and, of course, the great SDK.
I had been racking my brain for a couple of weeks trying to think of a great application to build when it hit me. I was marveling at the Barcode Scanner application when the idea struck: what if I combined the device’s camera with OCR?
OCR is the process of turning pictures into text, and once you have raw text, the possibilities are endless. As a simple example, imagine being able to take a picture of an address on a brochure and having it immediately displayed on a map. The same idea could also be applied to calling phone numbers, or even translating!
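The phone-number case reduces to a text search once OCR has done its job. A minimal sketch, assuming the recognizer hands back a plain string (the class name and the simplified US-style pattern here are mine, for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberFinder {
    // Simplified, hypothetical US-style pattern: optional area code,
    // with -, ., or space as separators.
    private static final Pattern PHONE =
            Pattern.compile("\\(?\\d{3}\\)?[-. ]?\\d{3}[-. ]\\d{4}");

    /** Returns every phone-number-looking substring in the OCR output. */
    public static List<String> findNumbers(String ocrText) {
        List<String> hits = new ArrayList<>();
        Matcher m = PHONE.matcher(ocrText);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

From there, each hit could be offered to the dialer with a single tap.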
Another technology that could come into play with this concept is the newly added TTS feature. Now you could have things read out loud to you. Can’t afford to take your eyes off the road? Can’t see because you forgot your glasses? Want to turn every book into a self-reading children’s book? No problem: just take a quick picture, and the text will be read out loud.
The next step is to do layout analysis on the picture. This tells you things like: at (x, y) in the picture, there is letter/word/block of text z. So now you could just hold your phone over a page of text and have every instance of the word “the” highlighted in yellow – finally, that real-life Ctrl+F (find) you’ve always wanted!
Well friends, after about two weeks of development, I finally have a working prototype! I stole about 98% of the code from the ZXing, Ocrad, and STLport (Gears’s fork) projects, but hey, this is open source, and that’s how we roll. Here’s how it works:
- Hold your device over a document. The camera will auto-focus and send the image to Ocrad for processing.
- Once Ocrad has identified some text, it will be returned and displayed on the screen.
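The hand-off between the two steps is mostly a pixel-format chore: Ocrad consumes portable anymap images (PBM/PGM/PPM), so the camera frame has to be flattened to grayscale and wrapped in a PGM header before the native code sees it. A minimal sketch, with method names of my own choosing:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class PgmEncoder {
    /**
     * Wraps 8-bit grayscale pixels in a binary ("P5") PGM container,
     * the portable-graymap format that Ocrad can read.
     */
    public static byte[] toPgm(byte[] gray, int width, int height) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "P5\n" + width + " " + height + "\n255\n";
        out.write(header.getBytes(StandardCharsets.US_ASCII));
        out.write(gray, 0, width * height);
        return out.toByteArray();
    }

    /** Standard luma approximation for collapsing one RGB pixel to grayscale. */
    public static byte toGray(int r, int g, int b) {
        return (byte) ((299 * r + 587 * g + 114 * b) / 1000);
    }
}
```

On Android the preview frames actually arrive as YUV, whose leading luminance plane is already grayscale, so in practice the `toGray` step can often be skipped entirely.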
Here is an example of it working on the Domino’s ad that was on my door this afternoon:
Surprisingly, Ocrad only took about 200ms to process the entire image on the device. It takes Barcode Scanner about twice as long to process a 2D barcode. Although, I did cheat by processing natively (Barcode Scanner has the overhead of Java), and mine only has to scan for one type of image (Barcode Scanner scans for many different formats for each picture).
Next, I’m going to get it processing the layout and visually overlaying it somehow on the screen. I have a feeling this is going to take a pretty big bite out of the performance, but as long as I can keep it at least as fast as Barcode Scanner, I think it will be acceptable.
When I have time, I’ll also try to get the code posted on CodePlex for those interested in seeing the exact details of how it was all done.