How to Make Scanned Documents Searchable

Posted by Scott Hambrick

Find the Information You Need, Faster!

Making sure you can find the information you want is exactly what the electronic document management (EDM) game is all about. But it can be like finding a needle in a haystack.


You must first determine what kind of information you want. Do you need to search the body of your scanned documents for keywords, phrases or numbers? That kind of search requires making the text of each document itself searchable. Perhaps you want to jump to a specific portion of a large, multi-page document to find specific information. But you may also want to locate every single document in a repository that contains a specific phrase, for example “pencil sharpener”. This level of search is what I call “ENTERPRISE SEARCH”, and it would be very useful in fulfilling a discovery request to find all documents pertaining to pencil sharpener injuries.
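To make the repository-wide phrase search concrete, here is a minimal sketch in Python. It assumes the scanned documents have already been run through OCR and that the extracted text is available per document. The repository contents, file names and the `find_documents` helper are hypothetical, invented for illustration; a real EDM system would use its own index rather than scanning every document.

```python
import re

# Hypothetical repository: document ID -> text extracted from the scan by OCR.
REPOSITORY = {
    "claim-001.pdf": "Employee cut a finger on the pencil sharpener in the break room.",
    "invoice-778.pdf": "Invoice for 12 boxes of pencils and 3 staplers.",
    "memo-042.pdf": "Reminder: the Pencil  Sharpener on floor 2 is out of service.",
}


def normalize(text):
    """Lowercase and collapse whitespace so spacing quirks do not block a match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def find_documents(repository, phrase):
    """Return the sorted IDs of every document whose text contains the phrase."""
    needle = normalize(phrase)
    return sorted(
        doc_id for doc_id, text in repository.items() if needle in normalize(text)
    )


matches = find_documents(REPOSITORY, "pencil sharpener")
print(matches)
```

In this sketch the invoice is skipped because it mentions pencils but never the full phrase, which is the behavior you would want for a discovery request.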

When you are on the phone with a customer, you need the exact document immediately. That requires well-designed, library-style search, like looking up a book by the title, subject or author and getting the location of the shelf for the single book you want. Accessing a scanned document is no different. You want to be able to enter a few search criteria and receive precisely one result, which must be correct.

You can ensure a great search experience by 1) prepping or designing the documents correctly, 2) tagging properly and 3) using the search function correctly.

1. Prep/Design

Documents need to be designed so that each one corresponds to one record or data type. A challenging example would be medical records. Your medical record contains a history of physicals, labs, nurses’ notes, scripts, copies of IDs and insurance cards, etc. The physician has to rifle through all of that to find the one piece he needs. Keeping search in mind while prepping a medical record for scanning, we would split this record into many, many parts. Not just labs, notes, etc., but labs by date, scripts by date, etc.  As records are created electronically, they should be designed so that different record types do not aggregate in one single PDF or electronic record.
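The splitting described above can be sketched in a few lines of Python. This assumes each scanned page has already been labeled with a record type and a date during prep. The page list, the labels and the `split_record` helper are hypothetical and only illustrate the idea of one document per record type and date.

```python
from collections import defaultdict

# Hypothetical page labels assigned during prep: (page number, record type, date).
PAGES = [
    (1, "lab", "2023-01-15"),
    (2, "lab", "2023-01-15"),
    (3, "nurse_note", "2023-01-15"),
    (4, "lab", "2023-06-02"),
    (5, "prescription", "2023-06-02"),
]


def split_record(pages):
    """Group page numbers into one document per (record type, date) pair."""
    documents = defaultdict(list)
    for page_number, record_type, record_date in pages:
        documents[(record_type, record_date)].append(page_number)
    return dict(documents)


documents = split_record(PAGES)
for (record_type, record_date), page_numbers in documents.items():
    print(record_type, record_date, page_numbers)
```

Here the five-page record becomes four separate documents, so a search for the January labs returns only pages 1 and 2 instead of the whole chart.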

2. Tagging/Indexing

In order to tag your documents correctly, you need to know how your staff will be searching for and through documents later on. In our library example, we always know at least two out of three things about a book: the author, the title, or what the book is about. You can search for a single book by any of those three indexes and locate it. Those indexes were created when the new book was brought into the library.

We do the same for your records. When we scan an invoice into a document management system we need to capture at least four pieces of data about that invoice, because people always know at least one of these things about an invoice when they discover they need it. Those pieces of information are: 1) Invoice Number, 2) Date, 3) Vendor Name and 4) Amount.  If the user doesn’t have any of that information, they lack the information that defines the record and can’t search for it.
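As an illustration of the four invoice indexes, the following Python sketch stores them alongside each scanned file and looks records up by whichever values the user happens to know. The sample invoices, the `location` field and the `search_invoices` helper are assumptions made for this example, not the layout of any particular document management system.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class InvoiceIndex:
    invoice_number: str
    invoice_date: date
    vendor_name: str
    amount: Decimal
    location: str  # hypothetical path to the scanned file


# Hypothetical sample data for illustration only.
INVOICES = [
    InvoiceIndex("INV-1001", date(2023, 3, 1), "Acme Supply", Decimal("125.50"), "scans/inv-1001.pdf"),
    InvoiceIndex("INV-1002", date(2023, 3, 9), "Acme Supply", Decimal("980.00"), "scans/inv-1002.pdf"),
    InvoiceIndex("INV-1003", date(2023, 3, 9), "Bolt Hardware", Decimal("42.75"), "scans/inv-1003.pdf"),
]


def search_invoices(records, **criteria):
    """Return the records that match every index value supplied."""
    if not criteria:
        # With none of the indexes known, there is nothing to search on.
        raise ValueError("Provide at least one index value to search on.")
    return [
        record
        for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


by_number = search_invoices(INVOICES, invoice_number="INV-1002")
by_vendor = search_invoices(INVOICES, vendor_name="Acme Supply")
narrowed = search_invoices(INVOICES, vendor_name="Acme Supply", amount=Decimal("980.00"))
print(by_number[0].location, len(by_vendor), narrowed[0].invoice_number)
```

A unique index such as the invoice number gives exactly one result on its own, while a shared value such as the vendor name needs a second criterion to narrow the list to a single invoice.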

HR, oil and gas, quality control, engineering, sales and other record types all need a minimal set of indexes that are unique to the record type in order to be functional. If you need help determining what these indexes should be, we can help. We have a number of guides to record indexing for different business records types that we would be glad to share with you.

3. Using the Search Function Correctly

Are you spelling the search term correctly? Does the search engine use simple search or Boolean search? Do you know all of the possible search operators? These factors all make a big difference when you are trying to locate a difficult-to-find record.
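To show why operators and spelling matter, here is a small Python sketch of Boolean matching with AND, OR and NOT conditions. It is a simplified illustration, not the query syntax of any specific search engine: the document titles and the `boolean_match` helper with its `all_of`, `any_of` and `none_of` parameters are hypothetical.

```python
import re

# Hypothetical document titles for illustration only.
DOCUMENTS = {
    "a": "Pencil sharpener injury report",
    "b": "Pencil order form",
    "c": "Electric sharpener warranty",
}


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_match(text, all_of=(), any_of=(), none_of=()):
    """Apply AND (all_of), OR (any_of) and NOT (none_of) conditions to the text."""
    words = tokenize(text)
    if any(term.lower() not in words for term in all_of):
        return False
    if any_of and not any(term.lower() in words for term in any_of):
        return False
    if any(term.lower() in words for term in none_of):
        return False
    return True


def search(documents, **conditions):
    """Return the sorted IDs of the documents that satisfy the conditions."""
    return sorted(
        doc_id for doc_id, text in documents.items() if boolean_match(text, **conditions)
    )


broad = search(DOCUMENTS, any_of=["pencil", "sharpener"])
precise = search(DOCUMENTS, all_of=["sharpener"], none_of=["warranty"])
misspelled = search(DOCUMENTS, all_of=["sharpner"])
print(broad, precise, misspelled)
```

The OR query returns all three documents, the AND plus NOT query returns only the injury report, and the misspelled term returns nothing at all, which is why checking the spelling comes first.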

