Comparison of optical character recognition software

From Wikipedia, the free encyclopedia

This comparison of optical character recognition software includes:

  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output formats | Notes
Tesseract | 1985 | 3.02 | Oct 2012 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 35+[1] | ? |  | Created by Hewlett-Packard; under further development by Google.[2] It was one of the top 3 engines in the 1995 UNLV Accuracy test.
ExperVision[3] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 |  | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[4][citation needed] PC Magazine wrote that "The speed of ExperVision's OpenRTK is four to eight times faster than competition"[5] but also described it as "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats."[6]
Readiris | 1987 | 12 Pro | 2009 | Proprietary | No | Yes | Yes | No | No | C++ | Yes | 137[7] | ? |  | Product of I.R.I.S. Group of Belgium. Asian and Middle Eastern editions.
ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[8] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[9] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[10]
AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? |  | Works with structured, semi-structured, and unstructured documents.
LEADTOOLS[11] | 1990[12] | 18.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[13] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[14] | Supports Latin, Asian, Arabic, and MICR character sets.[11] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[15] ICR (handwritten text recognition) is supported.[16]
PrimeOCR | 1994 | 5.1 | 2011 | Proprietary | No | Yes | No | No | No | C/C++, C#, VB, VB.NET[17] | Yes | 11 | OmniFont |  | Uses voting technology. Includes several OCR engines. Focuses on character recognition accuracy.
PSI:Capture | 1995 | 4.1 | 2011 | Proprietary | No | Yes | No | No | No | C# | No | 99 | Any printed font |  | Scans, captures, and extracts data from business documents such as invoices, forms, and correspondence, and exports images/data to over 50 different backend systems, including Microsoft SharePoint.
CuneiForm/OpenOCR | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font |  | Enterprise-class system; can save text formatting and recognizes complicated tables of any structure.
Transym OCR | 2000 | 3.3 | 2011 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? |  |
Image to OCR Converter | 2010[18] | 1.2[19] | 2012 | Proprietary | No | Yes | No | No | No | C/C++, VB and .NET | Command Line | 40 | ? | Searchable PDF, Text-Only PDF, Word, HTML, Text[20] | It can read most image formats and PDF files, and can scan images from a scanner or camera.[21][22]
Aquaforest OCR SDK | 2001 | 1.3 | 2009 | Proprietary | Yes[23] | Yes | No | No | No | C#, ASP, VB.NET | Yes | 23 | OmniFont (Extended Module available, including support for over 100 languages)[24] |  | [23][25] Aquaforest's OCR SDK for .NET enables developers to use the Aquaforest OCR engine directly in their own applications and create searchable PDFs, RTF or text files from TIFFs, Bitmap and Image-Only PDF.
SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  |
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[26] | ? | PDF, TXT | Dynamsoft is the leading provider of image capture SDKs and version control tools.
OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | No | No | C/C++, C#[27] | Yes | ? | ? |  | Product of Nuance Communications
Indian Scripts OCR | 2006 | 1.2 | 2012 | ? | Yes[28] | Yes | No | Yes | No | C#, C/C++, ASP.NET | Yes | 7 | Any printed font |  | An online OCR tool for Indian languages (Bangla, Devanagari, Gurumukhi, Kannada, Malayalam, Tamil)[29]
Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  |
SoftLogic's OMR Software | 2008 | v12 | 2012 | Proprietary | Yes | Yes | No | No | No | .NET | No | [30] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[30] | SoftLogic is a leading provider of OMR software, OCR software and question bank software.[30]
New OCR | 2009 | 2.0 | 2012 | Proprietary | Yes[31] | No | No | No | No | C/C++, PHP | No | 58 | Any printed font | TXT, DOC, ODT, RTF, PDF, HTML | Supports 58 recognition languages, page layout analysis (multi-column text recognition), poorly scanned and photographed pages, and low-resolution images.
NSOCR[32] | 2009 | 2.2 | 2012 | Proprietary | No | Yes | No | No | No | C/C++ | Yes | 7 | Any printed font |  | OCR software development kit. Recognition quality, agility, small size, ease of use, and a royalty-free licensing policy.
OCRFORMS[33] | 2009 | 11.10 | 2011 | Proprietary | No | Yes | No | Yes | No | C, Python | No | Any language based on Latin alphabet | Printed and written Latin fonts |  | Features a complete GUI and a command-line tool for batch processing. Proprietary algorithms for OCR/ICR/OMR and advanced string correction technology.
Digital Syphon's Sonic Imagen | 2012 | 08 | 2012 | Proprietary | Yes | Yes | Yes | No | No | C/C++, .NET, JNI/JAVA | Yes | 186[34] | Any | XML, TXT[34] | Multi-core aware: processes 2, 4, or 8 image files at one time. 64-bit, with one common, simple API. Enterprise and Consumer License Editions for Windows.
FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | [35]
GOCR | ? | 0.49 | 2010 | GPL | Yes[36] | Yes | Yes | Yes | Yes | C | ? | ? | ? |  |
Ocrad | ? | 0.20 | 2010 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? |  | Command line
Java OCR | ? | 1.101 | 2010 | BSD | No | Yes | No | No | No | Java | Yes | Any, with training | Any printed font, with training[37] |  | Sourceforge project.[38]
SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? |  | For musical scores
Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Uses OmniPage[citation needed]
Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font |  | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications.
ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Scans, captures, and classifies business documents such as invoices, forms, and purchase orders; integrated with business processes.
RelayFax | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | Many | ? |  | Converts faxed pages into editable document formats (DOC, PDF, etc.).
Scantron | ? | Cognition ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | For working with localized interfaces, corresponding language support is required.
OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? |  | Features a full user interface and a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad.
OCRKit | 2009 | 2.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | ? | Yes | 25 | any | PDF, RTF, HTML, text |
OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? |  | Pluggable framework which can use Tesseract
pic2txt | 2012 | ? | 2012 | Proprietary | Yes | Yes | No | Yes | Yes | C/C++ | ? | ? | ? | text, image | Independent recognition software based on contour analysis

References

  1. Based on count of language training files for version 3.x on 14 December 2010. Available at the download page.
  2. http://code.google.com/p/tesseract-ocr/
  3. http://www.expervision.com/ocr-sdk-toolkit/openrtk-ocr-toolkit-sdk
  4. http://www.isri.unlv.edu/downloads/AT-1994.pdf
  5. "Expervision TypeReader Desktop 7.0". Retrieved 2010-11-15.
  6. Mendelson, Edward. "TypeReader 2008". PC Magazine.
  7. http://www.irislink.com/Documents/Image/aa-products/readiris/v14/pdf/RI12vsRI14-tableau%20comparatif-uk.pdf
  8. http://finereader.abbyy.com/full_feature_list/ocr_accuracy/
  9. http://finereader.abbyy.com/professional/tech_specs/
  10. http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html
  11. http://www.leadtools.com/sdk/ocr/default.htm
  12. http://www.leadtools.com/corporate/corporate.htm
  13. http://www.leadtools.com/sdk/ocr/product-comparison-chart.htm
  14. http://www.leadtools.com/sdk/formats/ocr.htm
  15. http://www.leadtools.com/sdk/recognition-imaging.htm
  16. http://www.leadtools.com/sdk/ocr/icr.htm
  17. http://primeocr.com/prime_ocr.htm
  18. http://ivr.tmcnet.com/topics/ivr-voicexml/articles/73470-new-image-ocr-converter-reads-converts-text-various.htm
  19. http://www.softpedia.com/get/Multimedia/Graphic/Image-Convertors/Image-to-OCR-Converter.shtml
  20. http://www.yourdigitalspace.com/2011/02/best-and-free-ocr-tools-to-convert-your-images-into-text/
  21. http://savedelete.com/best-free-ocr-software-tools.html
  22. http://www.planetpdf.com/enterprise/article.asp?ContentID=Image_to_OCR_Converter_updated&gid=7974
  23. http://www.aquaforest.com/en/index.asp
  24. http://www.aquaforest.com/files/adx_en_xo.pdf
  25. http://www.aquaforest.com/en/ocrsdk.asp
  26. http://www.dynamsoft.com/Downloads/OCR-Language-Package.aspx
  27. http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp
  28. http://tdil-dc.in/index.php?option=com_content&view=article&id=63&lang=en
  29. http://tdil-dc.in/index.php?option=com_content&view=article&id=63&lang=en/
  30. http://softlogic.co.in
  31. http://www.newocr.com/
  32. http://www.nicomsoft.com/nsocr/
  33. http://www.ocrforms.com/
  34. http://www.digitalsyphon.com/technologies_sonicimagen.asp?contentpage=technologies_sonicimajen&bodyid=technologies&technologies=technologies
  35. http://www.paperfile.net/
  36. http://jocr.sourceforge.net/
  37. http://roncemer.com/software-development/java-ocr/
  38. http://sourceforge.net/projects/javaocr/files/