Document capture software
This article relies too much on references to primary sources. (August 2009)
Document capture software refers to applications that automate the process of scanning paper documents. Most scanning hardware, including both scanners and copiers, can scan to a range of image file formats, such as PDF, TIFF, JPEG, and BMP. Document capture software augments this basic functionality, adding efficiency and standardization to the process.
Typical features of document capture software include:
- Barcode recognition
- Patch code recognition
- Optical Character Recognition (OCR)
- Optical Mark Recognition (OMR)
- Quality Assurance
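Barcode and patch code recognition are commonly used to split a continuous stack of scanned pages into separate documents, with a coded separator sheet placed in front of each document. The following is a minimal sketch of that separation step, assuming a hypothetical recognizer has already labelled each page with the code it detected (or `None`). The separator value `PATCH_T` and the `Page`/`Document` types are illustrative assumptions; real products let the operator configure which code marks a boundary.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical separator value; which code acts as a document boundary is
# normally configurable in capture software.
SEPARATOR = "PATCH_T"


@dataclass
class Page:
    image_name: str
    # Code reported by a (hypothetical) patch code/barcode recognizer, or None.
    detected_code: Optional[str] = None


@dataclass
class Document:
    pages: list[Page] = field(default_factory=list)


def split_batch(pages: list[Page], separator: str = SEPARATOR) -> list[Document]:
    """Split a scanned batch into documents at separator sheets.

    A separator sheet closes the document in progress and starts a new one;
    the sheet itself is discarded, since it carries no document content.
    """
    documents: list[Document] = []
    current = Document()
    for page in pages:
        if page.detected_code == separator:
            if current.pages:            # close the document in progress
                documents.append(current)
            current = Document()         # drop the separator sheet
        else:
            current.pages.append(page)
    if current.pages:                    # keep the final document
        documents.append(current)
    return documents


batch = [
    Page("p1.tif", "PATCH_T"), Page("p2.tif"), Page("p3.tif"),
    Page("p4.tif", "PATCH_T"), Page("p5.tif"),
]
for doc in split_batch(batch):
    print([p.image_name for p in doc.pages])
```

Leading separators and empty documents (two separator sheets in a row) are skipped rather than producing empty output, which is one reasonable design choice among several.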
Goal for Implementation of a Document Capture Solution
The goal of implementing a document capture solution is to reduce the time spent on the scanning and capture process, and to produce metadata along with an image file and/or OCR text. This information is then migrated to a document management or enterprise content management system. These systems often provide a search function that lets users find assets by their metadata and then view them using document imaging software.
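The output described above (an image file, its metadata, and optional OCR text) can be modelled as a simple record, with search over the metadata fields. This is a minimal sketch; the `CaptureRecord` schema, its field names, and the case-insensitive exact-match search are illustrative assumptions, not the data model of any particular document management system.

```python
from dataclasses import dataclass, field


@dataclass
class CaptureRecord:
    """One captured document as it might be handed to a DMS/ECM (hypothetical schema)."""
    image_path: str                                         # e.g. the scanned PDF or TIFF
    metadata: dict[str, str] = field(default_factory=dict)  # index fields
    ocr_text: str = ""                                      # optional full text


def search(records: list[CaptureRecord], **criteria: str) -> list[CaptureRecord]:
    """Return records whose metadata matches every given field (case-insensitive)."""
    def matches(rec: CaptureRecord) -> bool:
        return all(
            rec.metadata.get(key, "").casefold() == value.casefold()
            for key, value in criteria.items()
        )
    return [rec for rec in records if matches(rec)]


repository = [
    CaptureRecord("inv-0001.pdf", {"doc_type": "Invoice", "supplier": "Acme"}),
    CaptureRecord("po-0042.pdf", {"doc_type": "Purchase Order", "supplier": "Acme"}),
]
print([r.image_path for r in search(repository, doc_type="invoice")])
```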
Document Capture System Solutions - General
Integration with Document Management System
Enterprise content management (ECM) systems, and their document management system (DMS) components, are being adopted by many organisations as corporate repositories for all types of electronic files, such as Microsoft Word documents and PDFs. However, much of the information held by organisations is on paper, and it needs to be integrated into the same document repository.
By scanning paper documents, companies can convert them into image formats such as TIFF and JPEG, and can also use OCR technology to extract valuable index information or business data from them. The digital documents and their associated metadata can then easily be stored in the ECM in a variety of formats. The most popular of these is PDF, which not only provides an accurate representation of the document but also allows all of the OCR text to be stored behind the page image. This format is known as PDF with hidden text, or text-searchable PDF. Users can then search for documents by keywords in the metadata fields or by searching the content of PDF files across the repository.
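The "hidden text" idea can be shown with a hand-built one-page PDF: the scanned image is painted over the page, and the OCR text is drawn on top using PDF text rendering mode 3 (`3 Tr`), which neither fills nor strokes the glyphs, so the text is searchable but invisible. This is a minimal sketch using only the Python standard library; the function names are hypothetical, the OCR words and their positions are assumed to be supplied by an OCR engine, and a production tool would also compress the image and scale each word to match its bounding box.

```python
def _pdf_string(text: str) -> str:
    """Escape text for use inside a PDF literal string ( ... )."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_searchable_pdf(pixels: bytes, width: int, height: int,
                         ocr_words: list[tuple[float, float, str]],
                         page_size: tuple[float, float] = (612, 792)) -> bytes:
    """Build a one-page PDF: a grayscale page image plus an invisible OCR text layer.

    pixels    -- width*height bytes of 8-bit grayscale image data, row by row
    ocr_words -- (x, y, text) tuples in PDF points, as an OCR engine might report
                 them; text must be encodable as Latin-1 in this simple sketch
    """
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match width*height")
    pw, ph = page_size

    # Paint the image over the whole page, then draw the OCR text with
    # rendering mode 3 (invisible) so it is searchable but never painted.
    content = f"q {pw} 0 0 {ph} 0 0 cm /Im1 Do Q\nBT /F1 10 Tf 3 Tr\n"
    for x, y, word in ocr_words:
        content += f"1 0 0 1 {x} {y} Tm ({_pdf_string(word)}) Tj\n"
    content += "ET\n"

    def stream(header: str, data: bytes) -> bytes:
        return (f"<< {header} /Length {len(data)} >>\nstream\n".encode()
                + data + b"\nendstream")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {pw} {ph}] "
         "/Resources << /Font << /F1 4 0 R >> /XObject << /Im1 5 0 R >> >> "
         "/Contents 6 0 R >>").encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        stream(f"/Type /XObject /Subtype /Image /Width {width} /Height {height} "
               "/ColorSpace /DeviceGray /BitsPerComponent 8", pixels),
        stream("", content.encode("latin-1")),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))             # byte offset recorded in the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"          # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)


pdf = build_searchable_pdf(bytes([255] * 4), 2, 2, [(72, 720, "Invoice (No. 1234)")])
```

Writing `pdf` to a file (for example with `open("scan.pdf", "wb").write(pdf)`) should give a blank-looking page on which a PDF viewer's text search finds "Invoice".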
Advantages of scanning documents into an ECM/DMS
Information held on paper is usually just as valuable to organisations as the electronic documents they generate internally. Often this information represents a large proportion of the day-to-day correspondence with suppliers and customers. Managing and sharing it internally through a document management system such as SharePoint can improve collaboration between departments and employees, and can also reduce the risk of losing it in disasters such as floods or fire.
Organisations adopting an ECM/DMS often implement electronic workflow, which allows information held on paper to be included in electronic business processes and incorporated into a customer record file along with associated office documents and emails. For business-critical documents, such as purchase orders and supplier invoices, digitising can help speed up business transactions and reduce the manual effort of keying data into business systems such as CRM, ERP, and accounting systems. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow.
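Routing a scanned invoice for approval usually comes down to a rule that picks an approver from the captured data, followed by a notification. The following is a minimal sketch, assuming a hypothetical policy in which the invoice amount (extracted by OCR) decides who approves; the thresholds, role names, and addresses are illustrative. It builds the notification with Python's standard `email.message.EmailMessage` but does not send it.

```python
from dataclasses import dataclass
from decimal import Decimal
from email.message import EmailMessage

# Hypothetical approval policy: (upper amount limit, approver role), checked in order.
APPROVAL_LIMITS = [
    (Decimal("1000"), "department_manager"),
    (Decimal("10000"), "finance_manager"),
]
FALLBACK_APPROVER = "cfo"

# Hypothetical role-to-address mapping.
APPROVER_EMAILS = {
    "department_manager": "dept.manager@example.com",
    "finance_manager": "finance.manager@example.com",
    "cfo": "cfo@example.com",
}


@dataclass
class ScannedInvoice:
    invoice_number: str
    supplier: str
    amount: Decimal   # extracted by OCR, ideally checked by an operator


def route_invoice(inv: ScannedInvoice) -> str:
    """Pick the approver role for an invoice under the hypothetical policy."""
    for limit, approver in APPROVAL_LIMITS:
        if inv.amount <= limit:
            return approver
    return FALLBACK_APPROVER


def approval_request(inv: ScannedInvoice) -> EmailMessage:
    """Build (but do not send) the approval email for the chosen approver."""
    msg = EmailMessage()
    msg["To"] = APPROVER_EMAILS[route_invoice(inv)]
    msg["Subject"] = f"Approval required: invoice {inv.invoice_number} from {inv.supplier}"
    msg.set_content(
        f"Please approve payment of {inv.amount} for invoice {inv.invoice_number}.\n"
        "The scanned invoice is stored in the document management system."
    )
    return msg
```

Sending the message would be a separate step (for example through `smtplib`), and a workflow engine could replace email entirely; the routing rule stays the same either way.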
Document Capture Software
Many document capture software providers offer integration with ECM systems, to varying degrees. Some provide a batch interface that simply drops images and index data into a directory and relies on a batch upload utility to transfer these documents into the ECM. Others integrate directly with particular ECM systems, allowing documents and metadata to be exported into specific folders. A few capture providers offer a tightly integrated, bi-directional interface with some ECM systems, for example:
- PSI:Capture from PSIGEN 
- Prevalent Software's Quillix provides flexible methods for capturing documents to SharePoint.
- ChronoScan Capture offers batch scanning with barcode recognition and OCR, and direct export to SharePoint web services; it has a free version for non-commercial use.
- Ephesoft offers an open-source version of its document capture software, integrated with open-source document management products such as Alfresco as well as with commercial ones.
- Librex from Corium provides intelligent scan and capture and smart connectors to multiple systems like Alfresco, SharePoint, Clara, COBA, Docuthèque, IntelliGID, SyGED, Ultima or simply to a network folder. Librex offers a free version (limited page volume) and an enterprise one.
- GScan Online from GRADIENT provides scanning and document capture directly within on-premises SharePoint and Office 365, with advanced image processing, OCR, and full-text-searchable PDFs, without leaving the Microsoft environment. The GScan Online app is available in a free version (limited users and page volume) via the Office Store. For higher-volume desktop scanning there is GScan, and for mobile document capture there is GScan Mobile.
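The simplest of the integration styles described above, the batch interface, can be sketched as an image file plus an index (metadata) file dropped into a shared directory, which a separate upload utility later collects. This is a minimal sketch under stated assumptions: the `.tif` image name, the JSON index format, and the function names are hypothetical (real products use a variety of index formats, such as CSV or XML). Each file is written under a temporary name and then renamed, so the upload utility never sees a half-written file, and the index file is written last so that its presence signals a complete pair.

```python
import json
import os
from pathlib import Path


def export_to_hot_folder(folder: Path, doc_id: str, image: bytes, index: dict) -> Path:
    """Drop an image and its index file into a folder for a batch upload utility."""
    folder.mkdir(parents=True, exist_ok=True)
    image_path = folder / f"{doc_id}.tif"
    index_path = folder / f"{doc_id}.json"

    # Image first, index last: the index file acts as the "ready" marker.
    for final_path, data in ((image_path, image),
                             (index_path, json.dumps(index, indent=2).encode("utf-8"))):
        tmp_path = final_path.with_name(final_path.name + ".tmp")
        tmp_path.write_bytes(data)
        os.replace(tmp_path, final_path)   # atomic rename on the same filesystem
    return index_path


def ready_documents(folder: Path) -> list[tuple[Path, dict]]:
    """What a (hypothetical) upload utility might do: collect complete image/index pairs."""
    ready = []
    for index_path in sorted(folder.glob("*.json")):
        image_path = index_path.with_suffix(".tif")
        if image_path.exists():
            ready.append((image_path, json.loads(index_path.read_text("utf-8"))))
    return ready
```

A direct integration would replace the shared folder with calls to the ECM's own API, which is why the two functions are kept separate here.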
Distributed Capture Solutions
Distributed document capture is a technology that allows documents to be scanned into a central server from individual capture stations. A variation of distributed capture is thin-client document capture, in which documents are scanned into a central server through a web browser. AIIM reviewed one such web-based product, stating that "(this product) is a thin-client distributed capture system that streamlines the process of acquiring and creating documents." This streamlining results from several factors, including the absence of software that must be installed at every scanning station and the variety of input sources from which documents can be captured, such as email, fax, or a watched folder.
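The central-server model can be sketched as a single intake queue that receives submissions from several sources: capture stations, email, fax, or a watched folder. The following is a minimal sketch of the watched-folder source feeding such a queue. Polling the folder and remembering already-seen file names is a simple stand-in for operating-system file notifications; it, the `Submission` type, and the source labels are assumptions for illustration, not how any particular product works.

```python
import queue
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Submission:
    source: str   # e.g. "scanner:station-1", "email", "fax", "watched-folder"
    path: Path


class WatchedFolderSource:
    """Poll a folder and submit each new file to the central intake queue once."""

    def __init__(self, folder: Path, inbox: "queue.Queue[Submission]", pattern: str = "*.pdf"):
        self.folder = folder
        self.inbox = inbox
        self.pattern = pattern
        self.seen: set[str] = set()   # names already submitted

    def poll(self) -> int:
        """Submit files that appeared since the last poll; return how many."""
        new = 0
        for path in sorted(self.folder.glob(self.pattern)):
            if path.name not in self.seen:
                self.seen.add(path.name)
                self.inbox.put(Submission("watched-folder", path))
                new += 1
        return new
```

Capture stations would put their own `Submission` objects on the same queue, so the central server processes every input source through one pipeline.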
Jeff Shuey, Director of Business Development at Kodak, distinguishes between distributed capture and what he calls "remote" capture. In an article published by AIIM, he said that the key difference between the two is whether the information captured from scanning needs to be sent to a centralized server. As he points out, if a document only needs to be scanned and committed to a SharePoint system, without being sent on to another centralized server, the situation is remote capture.
Comparisons of document capture software are available, covering some of the most prominent products (such as EMC Captiva, IBM Datacap, and Ephesoft) and summarizing their performance and key features.
References
- SharePoint Capture and OCR
- Quillix Capture
- ChronoScan Capture
- Peelen, Tjarda. "Software for Document Capture". Open Source ECM. Retrieved 18 February 2014.
- Feild, Don. "Ephesoft".
- Librex Document Capture and Smart Connectors
- GScan Online
- GRADIENT ECM
- GScan Online App Office Store Trial
- GScan (desktop)
- GScan Mobile
- Association for Information and Image Management. "Prevalent Software - Quillix". Retrieved 29 August 2011.
- Association for Information and Image Management. "Remote or Distributed Scanning - Are They Different?". Retrieved 29 August 2011.