Google Docs OCR

Google Docs now offers OCR (optical character recognition), but I’ve had little success getting it to work.

The link to upload files was flaky under Firefox 3.6.4. The underlined text that says “Select files to upload” is not clickable, but you can click the white space a few millimeters above or below what looks like a link. However, the clickable white space didn’t do anything when I clicked it. The link worked just fine in IE 8 and Safari 5.0.

screen shot of page to upload documents for OCR

I clicked the check box that says “Convert text from PDF or image files to Google Docs documents” and uploaded a PDF file. The file was a decent quality scan of a paper document.

section of text from a scanned article

I got a message back saying “Unable to convert document.”

So I tried again with a PDF file that had been created from a LaTeX file using pdflatex. The optical quality of the document was perfect since the document it wasn’t a scan but rather an electronic document printed directly to PDF. Moreover, the PDF file contains the plain text. Google indexes such PDFs created with pdflatex just as easily as HTML files. However, I still got the message “Unable to convert document.”

My experience with Google OCR wasn’t a total failure. I created a Microsoft Word document with text in 12-point Times New Roman — I figured this was as commonplace as I could get — and printed it to PDF. Google Docs did successfully convert that document to text.

I imagine Google’s OCR feature will be useful once they debug it. But it doesn’t yet seem ready for prime time based on my limited experience.

One thought on “Google Docs OCR”