Comparison of optical character recognition software

From Wikipedia, the free encyclopedia

An OCR SDK is a software development kit for adding optical character recognition capabilities to forms processing applications, document imaging management systems, e-discovery systems and records management solutions.

To ease the integration of OCR technology, some OCR SDKs expose a large number of APIs and support multiple operating systems and programming languages.

Here is a non-exhaustive comparison of optical character recognition software:

| Name | Founded | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 186[1] | ? | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License editions for Windows; Express Edition for Mac.[2] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | Works with structured, semi-structured and unstructured documents. |
| CuneiForm/OpenOCR | ? | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | Enterprise-class system; can preserve text formatting and recognizes complex tables of any structure. |
| ExperVision TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 17 | 2618 | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[3][citation needed] PC Magazine described ExperVision's OpenRTK as four to eight times faster than the competition,[4] but also as "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats."[5] |
| OCRFORMS[6] | 2009 | 11.10 | 2011 | Proprietary | No | Yes | No | Yes | No | C/Python | No | Any language based on the Latin alphabet | Printed and handwritten Latin fonts | Features a complete GUI and a command-line tool for batch processing. Proprietary algorithms for OCR/ICR/OMR and advanced string-correction technology. |
| GOCR | ? | 0.47 | 2009 | GPL | Yes[7] | Yes | Yes | Yes | Yes | C | ? | ? | ? | |
| LEADTOOLS[8] | 1990[9] | 17 | 2010 | Proprietary | No | Yes | No | No | No | Various | Yes | 56[10] | Any printed font | Supports Latin, Asian, Arabic and MICR character sets.[8] For full-page, zonal and form image processing. Includes OCR, barcode, OMR and forms recognition.[11] Supports ICR (handwritten text recognition).[12] |
| Java OCR | ? | Java OCR | 2010 | ? | No | Yes | No | No | No | ? | ? | ? | ? | Uses Java[citation needed] |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Uses OmniPage[citation needed] |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
| NSOCR[13] | 2009 | 2.2 | 2012 | Proprietary | No | Yes | No | No | No | C/C++ | Yes | 7 | Any printed font | OCR software development kit with a royalty-free licensing policy; emphasizes recognition quality, agility, small size and ease of use. |
| Ocrad | ? | 0.20 | 2010 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | Command-line tool |
| OCRopus | ? | 0.3.1 | 2008 | Apache | No | No | No | Yes | No | C++ and Lua | ? | ? | ? | Pluggable framework that can use Tesseract |
| OCRFeeder | ? | 0.7.7 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | Features a full user interface and a command-line tool for automated operation. Has its own segmentation algorithm but relies on system-wide OCR engines such as Tesseract or Ocrad. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | No | No | C/C++/C#[14] | Yes | ? | ? | Product of Nuance Communications |
| PrimeOCR | 1994 | 5.1 | 2011 | Proprietary | No | Yes | No | No | No | C/C++/C#, VB/VB.NET[15] | Yes | 11 | OmniFont | Uses voting technology. Includes several OCR engines. Focuses on character recognition accuracy. |
| PSI:Capture | 1995 | 4.1 | 2011 | Proprietary | No | Yes | No | No | No | C# | No | 99 | Any printed font | Scans, captures and extracts data from business documents such as invoices, forms and correspondence, and exports images and data to more than 50 back-end systems, including Microsoft SharePoint. |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
| Readiris | ? | 12 Pro | 2009 | Proprietary | No | Yes | Yes | No | No | C++ | Yes | ? | ? | Product of I.R.I.S. Group of Belgium. Asian and Middle Eastern editions are available. |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Scans, captures and classifies business documents such as invoices, forms and purchase orders, and integrates them with business processes. |
| RelayFax | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | Many | ? | Converts faxed pages into editable document formats (DOC, PDF, etc.). |
| Scantron | ? | Cognition | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | Working with localized interfaces requires the corresponding language support. |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | For musical scores |
| Tesseract | ? | 3.01 | 2010 | Apache | Yes[16] | Yes[17] | Yes | Yes | No | C++, C | ? | 35+[18] | ? | Created by Hewlett-Packard; under further development by Google |
| Transym OCR | ? | 3.0 | 2008 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | |
| Zonal OCR | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | |

References
