OCR (Optical Character Recognition) is the process of recognizing the characters within an image so that the document becomes text searchable. OCR was originally developed to replace the process of coding documents into a database. While OCR software has greatly improved over the years, its results are still not an effective replacement for coding. There are three main issues that determine the quality of the results from an OCR’d image:
Condition of the Original Documents
Quality of the Images
OCR Software Solution Used for Processing
First, the condition of the original documents before they are scanned greatly affects the quality of the OCR results. The best results come from clean, laser-printed, first-generation originals in letter, legal, or ledger size. Examples include depositions, letters, correspondence, manuals, and reports without graphics or tables. The worst results come from documents that are old, handwritten, heavy on graphics or tables, not laser printed, or not first generation, as well as those on NCR paper, green bar, or other colored paper, or with light grayscale print, colored ink, or pencil. Examples include invoices, diaries, magazines, newspapers, accounting spreadsheets, faxes, ads, and handwritten field reports.
The second issue that affects results is the quality of the image files themselves. While the original documents are being scanned, we adjust the scanner settings to capture the best possible image.
Following the scanning process, we perform a post-processing quality review of the images against the originals to verify that the best possible image has been captured. During this step, we also perform clean-up actions such as de-skewing, de-speckling, black-border removal, and boundary verification.
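To give a rough idea of what one of these clean-up actions involves, the sketch below shows a very simple form of de-speckling in Python. It is an illustration only and not the software used in our processing: the function name `despeckle`, the list-of-rows image representation, and the rule "clear any black pixel with no black neighbors" are all assumptions chosen to keep the example small.

```python
def despeckle(image):
    """Return a copy of a binary image with isolated black pixels cleared.

    `image` is a list of rows, where 1 is a black (ink) pixel and 0 is white.
    Assumption for this sketch: a black pixel with no black pixel among its
    eight neighbors is treated as a speck.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # copy so the input is left unchanged

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbor = False
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1:
                        has_neighbor = True
            if not has_neighbor:
                cleaned[y][x] = 0  # isolated pixel: treat as scanner noise
    return cleaned


# Hypothetical 4x5 scan fragment: one stray dot and part of a pen stroke.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a stroke
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
```

In this fragment the stray dot is cleared while the connected stroke pixels are kept. A rule this simple would also erase legitimate single-pixel marks, such as a faint period, which is one reason cleaned images are still reviewed against the originals.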
The final issue that determines the results of OCR processing is the software. LDM Group, LLC uses the latest ExperVision OCR engine, which employs a proprietary character recognition system called Machine Learned Fragment Analysis (MLFA) that is significantly faster and more accurate than other existing OCR technologies. We also use the latest version of Adobe Acrobat Professional for OCR processing, and we continually test and compare the results of other software solutions to improve the overall speed and reliability of our processing.
OCR is often one of many tools that litigation teams choose to employ when preparing their discovery documents for review and production. Please contact one of our litigation technology consultants to review your project and to arrange a sample test of your documents to determine whether OCR is right for your document collection.