How to OCR Text in PDF and Image Files in Adobe Acrobat
Scanned documents are great. They let you archive stacks of paper into folders on your computer, taking up far less space and being infinitely easier to organize, move, and copy. What's not so great is finding content stored away inside one of your hundreds of scanned documents. By default, they're little more than a picture of your document—and if you want to find info inside them, you'll have to open each one and read it for yourself.
Or, you could let your computer do the heavy lifting for you, by turning your image into text and letting you search through your scanned documents as easily as you search through any other documents. That's what OCR—Optical Character Recognition—does. It uses your computer's smarts to recognize letter shapes in an image or scanned document, and turn them into digital text you can copy and edit as needed.
Here's how you can use the OCR tool built into Adobe Acrobat to turn your scanned documents and pictures of text into real digital text.
OCR a Document or Image in Acrobat
Adobe Acrobat is the original standard program for creating, editing, and viewing PDF files. It's commonly used in business, and is bundled with Adobe Creative Suite and the full version of Creative Cloud, so there's a good chance your business computer already has it installed. If not, you can install it at no extra cost as part of your Creative Cloud subscription. Either way, it's a great tool to OCR your documents quickly on a Mac or PC.
Note: this tutorial requires Adobe Acrobat, not Adobe Reader. The latter is a free app just for viewing PDFs. If that's all you have, jump to the end of this tutorial for some other great OCR tools you can use.
Acrobat can recognize text in any PDF or image file in dozens of languages. All you have to do is open the scanned document or image that you'd like to OCR, then click the blue Tools button in the top right of the toolbar. In that sidebar, select the Recognize Text tab, then click the In This File button.
You'll now get some options to tweak your OCR. If you're recognizing a document that's in your computer's default language (English (US) in my case), simply click OK to get your text recognized. Otherwise, click the Edit... button to select your OCR language, pick your PDF output style, and set the resolution you want Acrobat to use while recognizing your text.
After a brief pause indicated by a progress bar on the bottom of the window, your text will be fully recognized. It took only around 15 seconds to recognize text on a scanned one-page form on my 2012 MacBook Air, but a couple of minutes on a 30-page full-color textbook PDF. Once it's done, you can select any text in the document and copy it as normal, or search for text in the document. By default, Acrobat will save the recognized text inside the original file when you OCR a PDF, and if you OCR an image it'll save the image with its text in a new PDF file. Either way, the recognized text will show up in any PDF reader afterwards, just as if it were an original digital document.
With the text recognized, you can now mark up the PDF using all the normal markup tools: you can highlight, cross out text, and more. You can even copy the text with the detected formatting, though that's often less accurate than the text recognition itself.
Export Your OCRed Documents
If you want to edit your original scanned documents, or perhaps reuse the info in them in a new document, you'll want more than just selectable text on a PDF. You'll want the full document converted. Acrobat makes that easy as well, OCRing the text and exporting it as a new document in one step.
Just open the document you want to OCR and convert, click File > Save As... and choose the format you'd like. You can export as a Word or rich text document, Excel or CSV spreadsheet, or as HTML. Add the file name you want and the location you'd like to save your new file, and click Save. Acrobat will proceed to show the same progress bar at the bottom of the window as it recognizes the text and formatting in your document, and then will save the exported copy.
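Once you've exported a scan as HTML, you can also search it with a script instead of opening it by hand. Here's a minimal sketch in Python, assuming the export is ordinary HTML. The names `TextExtractor`, `html_to_text`, and `find_phrase` are made up for this example and aren't part of Acrobat; the sample markup is invented too.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the pieces with spaces, then squeeze repeated whitespace.
    return " ".join(" ".join(parser.parts).split())


def find_phrase(html, phrase):
    """Case-insensitive check for a phrase in the exported document's text."""
    return phrase.lower() in html_to_text(html).lower()


# Invented sample standing in for an exported scan.
sample = "<html><body><h1>Application Form</h1><p>First <b>Name</b>: Jane</p></body></html>"
print(find_phrase(sample, "first name"))
```

This is deliberately simple: it treats everything between tags as text, so a real export with embedded styles or scripts would need extra filtering.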
Acrobat exports from scanned documents are both surprisingly good and frustratingly bad. It'll recognize most of the text and formatting, and you'll likely be surprised by how nice the finished exported document looks if it's not too complex. But then, it's still not the original document. There will be mistakes, formatting you'll need to fix, and more. The best way is always to use the original digital document, but this is a great way to get back a digital copy of a document if all you have is a scan.
While OCR isn't perfect, Acrobat's OCR is quite good. In this scanned form, almost every word was detected correctly, though one instance of the word Name was detected as N""e. That's perfectly good enough if you just want to search through your documents using your PDF reader's search tool. If you're actually using the OCR to make a copy of the original text, though, you'll want to proofread it first and correct any obvious mistakes.
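If you've copied the recognized text out of a long document, a quick script can point you at the words most worth proofreading. Here's a rough sketch in Python that flags words containing characters that rarely show up inside a real word, like the N""e mistake above. The function name `flag_suspicious_words` and the character list are assumptions for illustration, not anything Acrobat provides, and it won't catch misreads that still look like normal words.

```python
import re

# Characters that rarely appear inside an ordinary English word.
# This set is an illustrative assumption; tune it for your own documents.
SUSPICIOUS = re.compile(r'["|\\~^`{}<>]')


def flag_suspicious_words(text):
    """Return (line_number, word) pairs for words that look like OCR errors."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            # Strip punctuation that legitimately wraps a word, such as quotes.
            core = word.strip('"\'.,;:!?()[]')
            if core and SUSPICIOUS.search(core):
                flagged.append((line_number, core))
    return flagged


# Invented sample text modeled on the scanned form described above.
sample = 'First N""e: John\nLast Name: Smith\nShe said "hello" today.'
for line_number, word in flag_suspicious_words(sample):
    print(f"line {line_number}: {word}")
```

Quotation marks around a whole word are stripped before the check, so properly quoted words aren't flagged, only words with odd characters in the middle.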
OCR Multiple Documents At Once
Got a ton of documents you want to OCR at once? Acrobat's great for that as well. Just open any document in Acrobat, then open the Recognize Text sidebar pane as before. This time, select the In Multiple Files button, and you'll see a window where you can drag all the files you want to OCR. Again, you can add PDF or image files, and Acrobat will recognize the text and save them in PDF format. There are also a few extra options, where you can choose where to save the finished files and how you'd like them named.
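Before running a big batch, it can help to decide on a naming scheme so the recognized copies don't get mixed up with the original scans. Here's a small Python sketch of one possible scheme. The `_ocr` suffix, the folder names, and the function `plan_output_names` are all assumptions for this example; they aren't Acrobat's defaults.

```python
from pathlib import PurePosixPath


def plan_output_names(input_files, output_folder, suffix="_ocr"):
    """Map each input file to the PDF path its recognized copy would use."""
    plan = {}
    for name in input_files:
        source = PurePosixPath(name)
        # Keep the original base name, add the suffix, and always end in .pdf,
        # since recognized images are saved as PDFs too.
        target = PurePosixPath(output_folder) / f"{source.stem}{suffix}.pdf"
        plan[name] = str(target)
    return plan


# Hypothetical file names, just to show the mapping.
batch = ["scans/invoice.pdf", "scans/receipt.png"]
for original, finished in plan_output_names(batch, "finished").items():
    print(f"{original} -> {finished}")
```

The sketch only plans names; it doesn't touch any files, so you can check the list before committing to it.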
Other OCR Tools
Acrobat isn't the only way to OCR text from your scanned documents, of course. If you don't already have a copy of it, there are plenty of other tools you can use. We already covered the best tools for OCR on your Mac: Prizmo, FineReader, the Doxie app, PDFPen, and Evernote. Prizmo and PDFPen also work on your iOS devices for OCR on the go, and the Doxie app also works on PCs. Evernote doesn't let you copy text out, but it works everywhere, and on the PC, OneNote's OCR is great and free.
There's also the free Tesseract OCR library, with a terribly basic free Mac app and a nicer PC app that can recognize text for you. On the Mac, another nice cheap OCR tool that's closer to that free PC app is Picatext for $3.99. Either way, if OCR is all you need, you don't have to get a copy of Acrobat just for that. But if you have Acrobat, its OCR tool is a great extra.
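Tesseract also comes with a command-line program, which is handy if you'd rather script your OCR than click through an app. Here's a minimal Python sketch that builds and runs a `tesseract` command to turn an image into a searchable PDF. It assumes Tesseract is installed and on your PATH with the English language data; the file names and the helper functions `build_tesseract_command` and `ocr_to_searchable_pdf` are made up for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, language="eng"):
    """Build the argument list for: tesseract <image> <output base> -l <lang> pdf"""
    # The trailing "pdf" asks Tesseract to write <output_base>.pdf with a text layer.
    return ["tesseract", image_path, output_base, "-l", language, "pdf"]


def ocr_to_searchable_pdf(image_path, output_base, language="eng"):
    """Run Tesseract on one image and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("The tesseract program was not found on your PATH.")
    command = build_tesseract_command(image_path, output_base, language)
    # check=True raises an error if Tesseract exits with a failure code.
    subprocess.run(command, check=True)
    return output_base + ".pdf"


# Show the command for a hypothetical scan without actually running it.
print(" ".join(build_tesseract_command("scan.png", "scan_ocr")))
```

To recognize a real file, you'd call `ocr_to_searchable_pdf("scan.png", "scan_ocr")` with your own image in place of the placeholder name.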
Taking a few minutes to OCR your PDF documents is all it'll take to get them from being basic images of your paper documents to full-fledged digital documents you can search, copy text from, mark up, and export in Office formats. Acrobat has been maligned for its PDF reader, but it still has a ton of great features, and OCR is one of them.
If you have a copy of Acrobat, or a Creative Cloud subscription, give it a try and get your scanned documents OCRed. They'll instantly be way more valuable to you than they'd ever be as plain scans.