Comparison of optical character recognition software
An OCR SDK is a software development kit for adding optical character recognition capabilities to forms processing applications, document imaging management systems, e-discovery systems and records management solutions.
To ease the integration of OCR technology, some OCR SDKs expose a large number of APIs and support multiple operating systems and programming languages.
Here is a non-exhaustive comparison of optical character recognition software:
| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 186[1] | ? | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[2] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | Works with structured, semi-structured, and unstructured documents. |
| CuneiForm/OpenOCR | ? | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | Enterprise-class system; can save text formatting and recognizes complicated tables of any structure |
| ExperVision TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 17 | 2618 | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[3][citation needed] |
| OCRFORMS[6] | 2009 | 11.10 | 2011 | Proprietary | No | Yes | No | Yes | No | C/Python | No | Any language based on the Latin alphabet | Printed and handwritten Latin fonts | Features a complete GUI and has a command-line tool for batch processing. Proprietary algorithms for OCR/ICR/OMR and advanced string correction technology |
| GOCR | ? | 0.47 | 2009 | GPL | Yes[7] | Yes | Yes | Yes | Yes | C | ? | ? | ? | |
| LEADTOOLS[8] | 1990[9] | 17 | 2010 | Proprietary | No | Yes | No | No | No | various | Yes | 56[10] | Any printed font | Supports Latin, Asian, Arabic, and MICR character sets.[8] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[11] ICR (handwritten text recognition) is supported.[12] |
| Java OCR | ? | Java OCR | 2010 | ? | No | Yes | No | No | No | ? | ? | ? | ? | Uses Java[citation needed] |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Uses OmniPage[citation needed] |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
| NSOCR[13] | 2009 | 2.2 | 2012 | Proprietary | No | Yes | No | No | No | C/C++ | Yes | 7 | Any printed font | OCR software development kit. Emphasizes recognition quality, agility, small size, ease of use, and royalty-free licensing. |
| Ocrad | ? | 0.20 | 2010 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | Command line |
| OCRopus | ? | 0.3.1 | 2008 | Apache | No | No | No | Yes | No | C++ and Lua | ? | ? | ? | Pluggable framework which can use Tesseract |
| OCRFeeder | ? | 0.7.7 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | No | No | C/C++/C#[14] | Yes | ? | ? | Product of Nuance Communications |
| PrimeOCR | 1994 | 5.1 | 2011 | Proprietary | No | Yes | No | No | No | C/C++/C#, VB/VB.NET[15] | Yes | 11 | OmniFont | Uses voting technology. Includes several OCR engines. Focuses on character recognition accuracy. |
| PSI:Capture | 1995 | 4.1 | 2011 | Proprietary | No | Yes | No | No | No | C# | No | 99 | Any printed font | Scans, captures and extracts data from business documents such as invoices, forms and correspondence, and exports images/data to over 50 different backend systems, including Microsoft SharePoint. |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications |
| Readiris | ? | 12 Pro | 2009 | Proprietary | No | Yes | Yes | No | No | C++ | Yes | ? | ? | Product of I.R.I.S. Group of Belgium. Asian and Middle Eastern editions. |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. |
| RelayFax | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | Many | ? | Converts faxed pages into editable document formats (DOC, PDF, etc.). |
| Scantron Cognition | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | For working with localized interfaces, corresponding language support is required. |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | For musical scores |
| Tesseract | ? | 3.01 | 2010 | Apache | Yes[16] | Yes[17] | Yes | Yes | No | C++, C | ? | 35+[18] | ? | Created by Hewlett-Packard; under further development by Google |
| Transym OCR | ? | 3.0 | 2008 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | |
| Zonal OCR | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
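
Several of the open-source engines listed above, such as Tesseract, are driven from the command line rather than through a GUI. As an illustration only, the following Python sketch shows how an application might wrap such an engine. It assumes Tesseract's basic invocation form (`tesseract <image> <outputbase> -l <lang>`) and that the `tesseract` executable is on the `PATH`. The function names `build_tesseract_command` and `run_ocr` and the file names in the usage line are hypothetical and are not part of any product in the table.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract invocation.

    Assumes the basic usage form: tesseract <image> <outputbase> -l <lang>
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Hypothetical helper: requires a tesseract executable on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if the engine exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes its plain-text result to <outputbase>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

A call such as `run_ocr("page.tif", "page_out")` would then return the recognized text, provided the engine and the requested language data are installed.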
References
1. http://finereader.abbyy.com/full_feature_list/ocr_accuracy/
2. http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html
3. http://www.isri.unlv.edu/downloads/AT-1994.pdf
4. "Expervision TypeReader Desktop 7.0". http://www.expervision.com/ocr-software/desktop-ocr-typereader-7. Retrieved 2010-11-15.
5. Mendelson, Edward. "TypeReader 2008". PC Magazine. http://www.pcmag.com/article2/0,2817,2326804,00.asp.
6. http://www.ocrforms.com/
7. http://jocr.sourceforge.net/
8. http://www.leadtools.com/sdk/ocr/default.htm
9. http://www.leadtools.com/corporate/corporate.htm
10. http://www.leadtools.com/sdk/ocr/product-comparison-chart.htm
11. http://www.leadtools.com/sdk/recognition-imaging.htm
12. http://www.leadtools.com/sdk/ocr/icr.htm
13. http://www.nicomsoft.com/nsocr/
14. http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp
15. http://primeocr.com/prime_ocr.htm
16. http://code.google.com/p/tesseract-ocr/
17. Works as a command-line program, as it provides no GUI.
18. Based on a count of the language training files for version 3.x available at the download page on 14 December 2010.