Why Did Google Open Source Tesseract OCR?

Lora Bentley

When it comes to open source initiatives, Google has always been among the first to lend a hand -- be it financially or via code contributions. This summer at the O'Reilly Open Source Convention, the search giant even launched its own open source project hosting portal, and is reportedly cooperating with SourceForge.net, rather than attempting to draw projects away from the popular site.


However, Google also drew criticism at OSCON for a perceived failure to put its money where its mouth is, so to speak, by releasing more of its own code in addition to contributing to everyone else's projects. In response, company representatives explained that they release code when it's possible to do so without compromising trade secrets -- when it makes sense for them to do so.


Therefore, the company's recent decision to re-release the Tesseract optical character recognition project must make sense on some level, but at this point we're scratching our heads. The project sat dormant for nearly 10 years after HP pulled out of the OCR business. The Information Sciences Research Institute at the University of Nevada, Las Vegas, asked Google to help with bug fixes last year, and after eliminating some of the critical ones, the search giant decided the technology was "stable enough" to release as open source.


If there are motives beyond altruism (and when are there not?), time will reveal them.



Sep 8, 2006 12:43 PM Dick Weisinger says:
Yes. It might be nice to see some Google Open Source code projects derived from applications like Google Maps or GMail. But without knowing all details, I think that taking on sponsorship of Tesseract was a good thing. As late as March of this year, the Tesseract code was not in good shape: http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-03-20,4
If Google has fixed bugs to make the project stable and repackaged the software to be useful, I think that's great. If it's not as advertised by Google, I think you may have a case that they're just releasing junk Open Source for the sake of adding to their list of Google-sponsored Open Source projects. But if the code isn't good, would they want the Google name on it?
OCR seems a natural fit for Google as part of their library scan project, and Google is hiring OCR engineers.
Sep 8, 2006 7:17 PM luke says:
I thought it was interesting that Google released Tesseract on SourceForge.net rather than their own Google Code Hosting site. It seems like Google Code Hosting isn't exactly married in with some of their other open source strategies or initiatives, maybe?
Mar 23, 2007 9:47 AM bob says:
I tried Tesseract; it is total garbage. It produced total gibberish for output; not a single word was recognizable. A 10-year-old copy of TextBridge did much better.
May 24, 2007 9:06 AM Janvl says:
You only get gibberish if you do not install it correctly. So use Google and forums to do some reading before you condemn a program that does its job quite well.
Feb 2, 2008 8:23 PM Alan says:
Wow, I just tried it out and it worked far better than the other open source packages I've tried (clara, gocr).
