If you’ve been spurred to try and move towards a paperless office then you’re not alone! 2013 is the year of going paperless, and that doesn’t just mean scanning those old receipts and correspondence; it means fully digitising them too. In this guide we’ll show you five different apps that can convert documents you’ve scanned into fully searchable ones using a technology called OCR.
Update: You can also easily scan documents with OCR technology via your iPhone, by using the new DocScan app on Envato Market.
If you'd prefer to use a traditional scanner to scan your documents, read on for the full details of how to do that.
What Is OCR?
OCR stands for Optical Character Recognition. Whenever you scan a document, the scanner itself has no way of telling text apart from an image, so everything you scan is effectively an image. This applies even if you choose to save the scan as a PDF: you won’t be able to select any text yet.
OCR technology has been around for quite a while, but it's an understated feature that is often overlooked. If you've bought a scanner in the last few years then chances are you already had some pretty nifty OCR software on the disc it came with! As Mac users, we're sometimes spoilt by the fact that we hardly ever need to worry about installing drivers, so the software on those same discs is often ignored.
Tip: A PDF is just a container for text and images so any receipts or correspondence that you’ve scanned and saved as PDF aren’t yet searchable.
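If you're curious whether a PDF you've already scanned has any text in it, here's a rough sketch of one way to guess. It isn't a real PDF parser: it assumes that a bare scan contains image objects but no font resources, and that these markers aren't hidden inside compressed object streams, which some PDFs use. The function name is my own and purely illustrative.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF is a bare scan with no text layer.

    Heuristic only (an assumption, not a PDF parser): text in a PDF
    needs a font resource, so a file that contains image objects but
    never mentions /Font is probably just a picture of the page.
    """
    has_image = re.search(rb"/Subtype\s*/Image", pdf_bytes) is not None
    has_font = b"/Font" in pdf_bytes
    return has_image and not has_font


# Usage with a hypothetical file name:
#   from pathlib import Path
#   print(looks_image_only(Path("scan.pdf").read_bytes()))
```

The quickest test is still the manual one: open the PDF and try to select some text. If you can't, it hasn't been through OCR.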
1. Prizmo
Prizmo is a dedicated OCR app. It isn’t designed to help you crop or straighten your scanned documents; its sole purpose is to analyse the text of any scans and convert it into searchable text. It’s not limited to plain text documents such as receipts and correspondence, either. Prizmo will analyse old newspapers, magazines and book covers, and will convert pretty much anything with text of any shape, size and colour.
Prizmo includes the ability to capture scans directly from Image Capture, OS X’s built-in camera and scanner import app, so you can use it in conjunction with any existing scanner. You can also import existing files if you’ve been scanning them already.
When you launch Prizmo it will prompt you to either create a new document or open an existing one. Note that this refers to a Prizmo document, not the file you want to analyse. It can be a little confusing, but Prizmo can save the scans you’ve processed in case you ever need to go back and alter the text. For example, you might scan a 200-page PDF and later notice that some pages in the middle weren’t properly analysed and some text is missing. Saving your work in Prizmo means you can go back and make any changes as needed.
Select New Document… and you’ll be presented with a new Prizmo document to start using.
We can drag and drop an image file (JPG, PDF, TIFF, etc.), import from our scanner or even browse a photo library.
For the purposes of this tutorial, I’m using an existing document that I had scanned in using my flatbed scanner. It was saved directly as a PDF and as you can see, I can’t highlight any text.
Select Open Image File… and select an image to use.
Once you see the image loaded, you’ll be presented with a familiar page layout, complete with page thumbnails on the left side.
We have some adjustment controls at the bottom where we can adjust the rotation, crop the image and more. Prizmo will detect the document’s requirements and make any necessary settings changes automatically, but we can always tweak them whenever necessary.
I’m happy with the default settings, so I’ll simply click Recognize, and Prizmo will detect any text areas and analyse them almost instantly.
Just as you’d draw an area to scan when scanning a document, you can draw the text areas you want analysed. Prizmo will attempt this automatically but, again, you have complete control.
The analysed text is then displayed on the right-hand side of the document. At this stage, nothing is saved. If the OCR wasn’t fully accurate, you can go in and make any changes.
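If you'd like a rough number for how accurate an OCR result is, one option is to compare it with a passage you've typed out by hand. The sketch below is only an illustration and has nothing to do with how Prizmo works internally. It uses Python's built-in difflib to score how similar two strings are, and the sample strings and function name are made up for the example.

```python
from difflib import SequenceMatcher


def ocr_similarity(expected: str, recognised: str) -> float:
    """Return a similarity score between 0.0 and 1.0.

    1.0 means the recognised text matches the hand-typed text exactly.
    """
    return SequenceMatcher(None, expected, recognised).ratio()


# Made-up sample: the OCR has mistaken the letter "l" for the digit "1".
expected = "Total due: 42.00"
recognised = "Tota1 due: 42.00"
print(f"Similarity: {ocr_similarity(expected, recognised):.1%}")
```

For everyday receipts and letters, reading through the recognised text by eye is usually enough.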
Prizmo was 100% accurate with my document so there are no changes for me to make. You can export your document to a number of cloud services such as Dropbox and Google Drive, or attach it to a new mail message.
I’m going to save my document to my desktop, so I’ll select File…
Prizmo is extremely useful for anyone scanning documents on a regular basis with any type of scanner. The advanced controls you have access to mean you can fine-tune how the OCR process works instead of relying on fully automatic settings.
Learn more about Prizmo.
2. ABBYY FineReader Express
ABBYY FineReader Express is another specialised OCR tool designed specifically for the task, and it does it very well. The OCR process is automated, so the only user interaction is telling ABBYY FineReader Express which document to load and where to save the OCR’d version.
Instead of creating a new document or opening an existing one, ABBYY FineReader Express has a Quick Tasks panel that opens on launch. It’s a quick way to OCR documents with as few mouse clicks as possible.
You can convert scanned documents to a number of different formats. An ace up its sleeve is the ability to OCR a spreadsheet and output a fully searchable - and editable - one, making it very tempting for business users.
As we already have a PDF we need to OCR, launch ABBYY FineReader Express, select Convert to Searchable PDF and pick the document you want to OCR.
That’s actually it! ABBYY FineReader Express will prompt you to save the new OCR’d document to a location of your choice. Strangely, you’re prompted to save the document before the preview loads. To check whether ABBYY FineReader Express was able to OCR the document properly, you’ll need to cancel the save prompt and then save the document from the menu instead.
Learn more about ABBYY FineReader Express.
3. Doxie
We’ve covered the Doxie scanner and software in our previous guide “Go Paperless With Doxie”, but it’s worth mentioning its built-in OCR features again.
Doxie includes built-in OCR in its import app so any documents you scan will have the option of being analysed. However, Doxie doesn’t contain a lot of controls and automates most of the process.
Scan any document you want to OCR and then launch the Doxie app, making sure your Doxie scanner is connected.
Once you’ve imported your scanned document you can then select where and how you’d like to export it. In this instance, I’ll be selecting PDF with OCR (Black and White). It’s worth choosing the option that matches the type of document, as Doxie can then use a format that takes up less space.
Tip: You’ll often find that companies send letters in a particular colour in keeping with their branding - I’d recommend just saving them as black and white to keep the file size as low as possible.
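To get a feel for why black and white saves so much space, here's a quick back-of-the-envelope calculation. It's only a sketch under some assumptions of my own: an A4 page scanned at 300 dpi, 24 bits per pixel for colour and 1 bit per pixel for black and white, with no compression at all. Real PDFs are compressed, so the files Doxie produces will be much smaller, but the gap between the two remains.

```python
import math


def raw_scan_bytes(width_in: float, height_in: float, dpi: int,
                   bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scan with the given dimensions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


# Assumed page: A4 (8.27 x 11.69 inches) scanned at 300 dpi.
colour = raw_scan_bytes(8.27, 11.69, 300, bits_per_pixel=24)
mono = raw_scan_bytes(8.27, 11.69, 300, bits_per_pixel=1)

print(f"24-bit colour:         {colour / 1e6:.1f} MB")
print(f"1-bit black and white: {mono / 1e6:.1f} MB")
```

Before compression, the colour scan carries roughly 24 times as much data as the black and white one, which is why the colour setting matters for file size.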
Once you select where to save the PDF, Doxie will OCR the document and export it. The recognised text is completely searchable and doesn’t replace the scanned image. Instead, Doxie uses a clever feature of PDFs called text overlay. Your document may look the same as it did before, but that’s a good thing: the recognised text is placed transparently over the scanned text, making it searchable and highlightable.
Whilst the Doxie process is very straightforward, there aren’t as many options as in a dedicated OCR app such as Prizmo. However, if you’re already a Doxie user or only do light scanning, those extra options may not be of much benefit to you.
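One way to picture a text overlay is as a list of recognised words, each with the position it occupies on the page. The sketch below is a simplified model of that idea, not the actual PDF format or Doxie's implementation, and the class name, function name and coordinates are all made up for illustration. In a real PDF the overlay text is drawn invisibly on top of the image.

```python
from dataclasses import dataclass


@dataclass
class OverlayWord:
    """A recognised word and where it sits on the scanned page."""
    text: str
    x: float
    y: float
    width: float
    height: float


def find_highlights(overlay: list[OverlayWord], query: str) -> list[OverlayWord]:
    """Return the words whose text contains the query, ignoring case."""
    needle = query.lower()
    return [word for word in overlay if needle in word.text.lower()]


# Made-up overlay for a one-page receipt.
overlay = [
    OverlayWord("Invoice", 72, 700, 60, 12),
    OverlayWord("Total", 72, 120, 40, 12),
    OverlayWord("42.00", 120, 120, 50, 12),
]

# A PDF viewer would draw a highlight box at each matching position.
for word in find_highlights(overlay, "total"):
    print(word.text, (word.x, word.y, word.width, word.height))
```

Because the search only touches the overlay, the scanned image underneath never changes, which is why the page looks identical after OCR.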
Doxie scanners start from $119 and are available from Getdoxie.com.
4. PDFPen
PDFPen is a little different from Prizmo as it’s not just an OCR tool. It’s an all-in-one tool designed to fill in, edit and alter PDFs. One of its features is that it can detect scanned documents and perform OCR in one step.
Launch PDFPen and it will automatically prompt you to select a PDF to open. Select a scanned document and click Open.
Once PDFPen opens the document and detects it was scanned (rather than downloaded or computer generated), it will ask whether you’d like to analyse it and digitise the text. You have the option of running the OCR tool on just the current page or on the entire document.
Specify the language required and select the relevant button - in this case I just selected OCR Document.
Once it’s finished, save the PDF. Unlike Doxie or Prizmo, you don’t create another copy immediately. PDFPen modifies existing PDF files so you can simply save the changes, eliminating the inconvenience of managing an additional file.
Learn more about PDFPen.
5. Evernote
Evernote is an extremely popular note-syncing service that acts more as a hybrid between a scrapbook and a notebook. Think of it as having a filing cabinet full of pieces of information that’s always available and always easy to search.
We’ve covered Evernote extensively before here on Mactuts+ and I encourage anyone who uses Evernote (or is interested in using it more) to read our article “Taming the Elephant: Awesome Evernote Tips and Tricks” to learn more about it.
One feature of Evernote that is often overlooked and never really shown to the user is its automatic OCR service. Yep, any image you add to Evernote is scanned for text, which is then added to your note. The OCR is performed server-side, so a document you add to Evernote isn’t converted instantly. Due to the number of Evernote users, all documents requiring OCR are queued to prevent server problems. There’s no way to know when a document will be scanned, but it’s usually within 24–48 hours. If you’re a premium member, it’s quicker.
To have a document scanned, simply drag it into a new or existing note, making sure to sync Evernote as soon as you’ve done it. That’s all there is to it.
Eventually, Evernote will scan the document and perform OCR. Once that happens, the document will then be updated and sync back to Evernote on your device. It took about ten minutes for Evernote to OCR the document I added (I’m an Evernote Premium subscriber so times will vary).
The OCR is usually very accurate but there is no control over how the OCR works. It’s done automatically with no user input or settings.
You can then search for text and, as you can see, the text highlights as you search. Looking through the note, the OCR appears to have been 100% accurate.
Step 4 (Optional)
If you’d like to keep a searchable PDF version outside of Evernote, you can right-click and select Save Searchable PDF As…
It’s not ideal, as Evernote wraps every word with a green box, so printing it may not be such a good idea, but it works.
Whilst its features are quite basic, using Evernote as a central hub for a paperless office is becoming ever more popular. If you want to do the same, you could skip a separate OCR step and just drop scans directly into Evernote, which will take care of the OCR for you. Since most items are going to be receipts and correspondence, you’ll likely have almost no problems with Evernote’s OCR service.
Evernote is free, with premium accounts at $5 per month or $35 per year.
There are a number of ways you can digitise those scanned documents to make them text-searchable, and the cost of using an OCR tool has dropped dramatically. Gone are the days when you were stuck with whatever app your scanner came with; you’re now free to use pretty much any OCR app you’d like.
If you want not only OCR but also a way of manipulating PDFs, then PDFPen is the best choice. For anyone who just wants OCR, I’d recommend Prizmo. Even if you have a Doxie, Prizmo gives you more control over how the OCR process works.
For anyone who only needs to OCR something very occasionally, a free Evernote account is the most economical option.
Have you tried to go paperless? Do you bother with OCR or is everything searchable in your digital office? We’d love to hear from you so, as always, discuss the topic further in the comments.