Automating Image System Indexing

Oct 06, 2014

The Problem

An accounting department scans accounts payable (AP) documents for storage in an imaging system. To be useful, the images must be indexed into the imaging system, organized by payment and searchable on multiple fields (amount, date, vendor, payee, type, etc.).

Because of the variety and quality of the documents, traditional indexing methods were either infeasible or labor-intensive. The documents were not structured enough for traditional OCR, and inserting pre-printed bar code cover sheets between payments was time-consuming and could not capture all of the searchable information.

Three well-known commercial OCR/automation solutions were evaluated. None of them worked well across the variety of documents, and all were expensive to license.


The Solution

Point Enterprises (PEI) developed a custom VB.Net web service that runs under the NetXed architecture. The web service integrates the open-source Tesseract OCR engine with a small third-party engine that reads MICR lines optically from the scanned images. NetXed provides security, website management and overall process management.

  • Each AP payment consists of a photocopy of the check/voucher or an ACH payment notice, followed by any number of pages of supporting documents. The department scans batches of intermixed check and ACH payments, ranging from a few pages to over 100 pages, and the scanner saves each batch as a single large multi-page TIFF file.
  • Users select a file on a web page in the application and click the Process button. The web service splits the file into individual pages and passes each page first through MICR recognition and then through OCR. Business rules and “fuzzy logic” then determine whether each page is a check/voucher, an ACH notice or a supporting document.
  • The pages are grouped into payments, which the program then tries to find and verify in the Accounting database. Approximately 90% of payments are found and verified correctly. Because the program interfaces directly with the Accounting database, it can extract all the needed index information from there.
  • Of the remaining payments, some are correctly identified as payments but cannot be matched in Accounting with sufficient confidence; these are marked as Suspects. A few payments are not recognized as payments at all and instead appear as supporting documents in the previous payment.
  • A web page lets the department quickly verify Suspects, which are marked with a special icon. Clicking any payment triggers an immediate lookup in the Accounting database. If the retrieved record is correct, the user simply clicks Update. If it is not, the user can search for other payments in the Accounting database, or click an icon to view the image and search AP using information visible on the document.
  • With most payments verified automatically, the remaining Suspects are usually matched correctly as soon as the record is selected, so they can be verified in two clicks and a few seconds. Occasionally users have to view the image and search, but verification typically takes less than 30 seconds and very little keying.
  • Once a file is fully verified, the Release button is enabled. Releasing generates a virtual bar-coded cover sheet for each payment, inserts it electronically among the page TIFFs, and reassembles the entire file as a single large TIFF that now includes the cover sheets. No physical cover sheets are printed, saving both time and printing costs. The generated file can be submitted directly to the imaging system and auto-indexed.
  • Because some index fields contain special characters or exceed 25 characters, they cannot be rendered as bar codes. Only key fields are therefore bar coded on the cover sheets, and a back-end process fills in the remaining index fields.
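The grouping and verification steps in the list above can be sketched roughly as follows. This is a minimal Python illustration, not PEI's VB.Net code: it assumes each page has already been classified by the MICR/OCR business rules and that the Accounting lookup returns a match score between 0 and 1. The names `PageType`, `Payment`, `group_pages`, `mark_status` and the 0.85 threshold are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PageType(Enum):
    CHECK = "check/voucher"
    ACH = "ACH notice"
    SUPPORT = "supporting document"


# Hypothetical cut-off: payments whose Accounting match scores below this are Suspects.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Payment:
    kind: Optional[PageType]                        # CHECK or ACH; None if no payment page was found
    pages: List[int] = field(default_factory=list)  # 0-based page indexes within the batch TIFF
    status: str = "Unverified"


def group_pages(page_types: List[PageType]) -> List[Payment]:
    """Start a new payment at each check/voucher or ACH page and attach
    supporting pages to the most recent payment."""
    payments: List[Payment] = []
    for index, page_type in enumerate(page_types):
        if page_type in (PageType.CHECK, PageType.ACH):
            payments.append(Payment(kind=page_type, pages=[index]))
        elif payments:
            # A payment page misclassified as supporting material also lands
            # here, which is how it ends up inside the previous payment.
            payments[-1].pages.append(index)
        else:
            # Assumption: supporting pages before any payment page form an
            # unclassified group that a user must review.
            payments.append(Payment(kind=None, pages=[index], status="Suspect"))
    return payments


def mark_status(payment: Payment, match_confidence: float) -> None:
    """Record the outcome of the (hypothetical) Accounting lookup score."""
    if payment.kind is None or match_confidence < CONFIDENCE_THRESHOLD:
        payment.status = "Suspect"
    else:
        payment.status = "Verified"


batch = [PageType.CHECK, PageType.SUPPORT, PageType.SUPPORT,
         PageType.ACH, PageType.CHECK, PageType.SUPPORT]
print([p.pages for p in group_pages(batch)])  # [[0, 1, 2], [3], [4, 5]]
```

Keying the grouping purely on where a check/voucher or ACH page appears mirrors the document layout described above; any real scoring of the Accounting match would replace the single threshold test.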

The client recently converted to a different imaging system, and nothing changed for the Accounting department. The cover sheet format changed slightly, and the back-end process switched from a direct database update to generating a CSV file that defines the remaining index fields. Both were minor changes to PEI's supporting web service, implemented in a few man-hours.
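The split between bar-coded key fields and the CSV back end might look like the sketch below. It is written in Python for illustration and rests on assumptions the article does not state: that the cover sheets use basic Code 39 (character set 0-9, A-Z, space and `- . $ / + %`), and that the CSV has one row per payment keyed by a document ID. The names `barcodable`, `split_fields`, `remaining_fields_csv` and `doc_id` are hypothetical.

```python
import csv
import io
import string
from typing import Dict, List, Tuple

# Assumption: cover sheets use basic Code 39 symbology.
CODE39_CHARS = set(string.digits + string.ascii_uppercase + " -.$/+%")
MAX_BARCODE_LEN = 25  # length limit described in the article


def barcodable(value: str) -> bool:
    """True if the value is short enough and uses only Code 39 characters."""
    return len(value) <= MAX_BARCODE_LEN and all(c in CODE39_CHARS for c in value)


def split_fields(fields: Dict[str, str],
                 key_fields: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (fields to bar code on the cover sheet, fields left for the CSV)."""
    on_sheet = {k: v for k, v in fields.items() if k in key_fields and barcodable(v)}
    remaining = {k: v for k, v in fields.items() if k not in on_sheet}
    return on_sheet, remaining


def remaining_fields_csv(rows: List[Tuple[str, Dict[str, str]]],
                         columns: List[str]) -> str:
    """Build CSV text: a header row, then one row per payment (doc id + fields)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doc_id"] + columns)
    for doc_id, remaining in rows:
        writer.writerow([doc_id] + [remaining.get(c, "") for c in columns])
    return buffer.getvalue()


fields = {"check_no": "104522", "amount": "1250.00", "vendor": "Acme Supply & Co."}
on_sheet, remaining = split_fields(fields, ["check_no", "amount"])
print(on_sheet)  # {'check_no': '104522', 'amount': '1250.00'}
print(remaining_fields_csv([("DOC0001", remaining)], ["vendor"]))
```

Routing a key field that fails the bar code check to the CSV, rather than rejecting it, is a choice made for this sketch; the article only says that key fields are bar coded and the rest are filled in by the back end.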


Benefits

Using NetXed as the infrastructure for automation projects offers several benefits that enable rapid, cost-effective implementation. Providing Security, Reporting, Data and Interface Management, Business Analytics, Intelligent Forms, Website and Page Management, and Web Services in a single environment reduces implementation effort. Employees have a familiar user interface, even for widely different functions, and I.T. has one integrated environment to manage.