Comparison of optical character recognition software
This comparison of optical character recognition software includes:
- OCR engines, which perform the actual character identification
- Layout analysis software, which divides scanned documents into zones suitable for OCR
- Graphical interfaces to one or more OCR engines
- Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesseract | 1985 | 3.02 | Oct 2012 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 35+[1] | ? | | Created by Hewlett-Packard; under further development by Google.[2] It was one of the top 3 engines in the 1995 UNLV Accuracy test. |
| ExperVision[3] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 | | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[4][citation needed] |
| Readiris | 1987 | 12 Pro | 2009 | Proprietary | No | Yes | Yes | No | No | C++ | Yes | 137[7] | ? | | Product of I.R.I.S. Group of Belgium. Asian and Middle Eastern editions. |
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[8] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[9] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[10] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
| LEADTOOLS[11] | 1990[12] | 18.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[13] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[14] | Supports Latin, Asian, Arabic, and MICR character sets.[11] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[15] ICR (handwritten text recognition) is supported.[16] |
| PrimeOCR | 1994 | 5.1 | 2011 | Proprietary | No | Yes | No | No | No | C/C++, C#, VB, VB.NET[17] | Yes | 11 | OmniFont | | Uses voting technology. Includes several OCR engines. Focuses on character recognition accuracy. |
| PSI:Capture | 1995 | 4.1 | 2011 | Proprietary | No | Yes | No | No | No | C# | No | 99 | Any printed font | | Scans, captures and extracts data from business documents such as invoices, forms and correspondence, and exports images/data to over 50 different backend systems including Microsoft SharePoint. |
| CuneiForm/OpenOCR | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | | Enterprise-class system; can save text formatting and recognizes complicated tables of any structure. |
| Transym OCR | 2000 | 3.3 | 2011 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | | |
| Image to OCR Converter | 2010[18] | 1.2[19] | 2012 | Proprietary | No | Yes | No | No | No | C/C++, VB and .NET | Command Line | 40 | ? | Searchable PDF, Text-Only PDF, Word, HTML, Text[20] | Reads most image formats and PDF files, and can acquire images from a scanner or camera.[21][22] |
| Aquaforest OCR SDK | 2001 | 1.3 | 2009 | Proprietary | Yes[23] | Yes | No | No | No | C#, ASP, VB.NET | Yes | 23 | OmniFont (Extended Module available, including support for over 100 languages)[24] | | Aquaforest's OCR SDK for .NET enables developers to use the Aquaforest OCR engine directly in their own applications and create searchable PDF, RTF or text files from TIFF, bitmap and image-only PDF files.[23][25] |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | |
| Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[26] | ? | PDF, TXT | Dynamsoft is the leading provider of image capture SDKs and version control tools. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | No | No | C/C++, C#[27] | Yes | ? | ? | | Product of Nuance Communications. |
| Indian Scripts OCR | 2006 | 1.2 | 2012 | ? | Yes[28] | Yes | No | Yes | No | C#, C/C++, ASP.NET | Yes | 7 | Any printed font | | An online OCR tool for Indian languages (Bangla, Devanagari, Gurumukhi, Kannada, Malayalam, Tamil).[29] |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | |
| SoftLogic's OMR Software | 2008 | v12 | 2012 | Proprietary | Yes | Yes | No | No | No | .NET | No | [30] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[30] | SoftLogic is a leading provider of OMR software, OCR software and question bank software.[30] |
| New OCR | 2009 | 2.0 | 2012 | Proprietary | Yes[31] | No | No | No | No | C/C++, PHP | No | 58 | Any printed font | TXT, DOC, ODT, RTF, PDF, HTML | Supports 58 recognition languages, page layout analysis (multi-column text recognition), poorly scanned and photographed pages, and low-resolution images. |
| NSOCR[32] | 2009 | 2.2 | 2012 | Proprietary | No | Yes | No | No | No | C/C++ | Yes | 7 | Any printed font | | OCR software development kit. Recognition quality, agility, small size, ease of use, and a royalty-free licensing policy. |
| OCRFORMS[33] | 2009 | 11.10 | 2011 | Proprietary | No | Yes | No | Yes | No | C, Python | No | Any language based on Latin alphabet | Printed and written Latin fonts | | Features a complete GUI and has a command-line tool for batch processing. Proprietary algorithms for OCR/ICR/OMR and advanced string correction technology. |
| Digital Syphon's Sonic Imagen | 2012 | 08 | 2012 | Proprietary | Yes | Yes | Yes | No | No | C/C++, .NET, JNI/JAVA | Yes | 186[34] | Any | XML, TXT[34] | Multi-core aware: processes 2, 4, or 8 image files at one time; 64-bit; one common, simple API. Enterprise and Consumer License Editions for Windows. |
| FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | [35] |
| GOCR | ? | 0.49 | 2010 | GPL | Yes[36] | Yes | Yes | Yes | Yes | C | ? | ? | ? | | |
| Ocrad | ? | 0.20 | 2010 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | | Command line |
| Java OCR | ? | 1.101 | 2010 | BSD | No | Yes | No | No | No | Java | Yes | Any, with training | Any printed font, with training[37] | | SourceForge project.[38] |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | | For musical scores |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Uses OmniPage[citation needed] |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
| RelayFax | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | Many | ? | | Converts faxed pages into editable document formats (DOC, PDF, etc.). |
| Scantron Cognition | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | For working with localized interfaces, corresponding language support is required. |
| OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad. |
| OCRKit | 2009 | 2.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | ? | Yes | 25 | Any | PDF, RTF, HTML, text | |
| OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? | | Pluggable framework which can use Tesseract |
| pic2txt | 2012 | ? | 2012 | Proprietary | Yes | Yes | No | Yes | Yes | C/C++ | ? | ? | ? | text, image | Independent recognition software based on contour analysis |
References
1. Based on count of language training files for version 3.x on 14 December 2010. Available at the download page.
2. http://code.google.com/p/tesseract-ocr/
3. http://www.expervision.com/ocr-sdk-toolkit/openrtk-ocr-toolkit-sdk
4. http://www.isri.unlv.edu/downloads/AT-1994.pdf
5. "Expervision TypeReader Desktop 7.0". Retrieved 2010-11-15.
6. Mendelson, Edward. "TypeReader 2008". PC Magazine.
7. http://www.irislink.com/Documents/Image/aa-products/readiris/v14/pdf/RI12vsRI14-tableau%20comparatif-uk.pdf
8. http://finereader.abbyy.com/full_feature_list/ocr_accuracy/
9. http://finereader.abbyy.com/professional/tech_specs/
10. http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html
11. http://www.leadtools.com/sdk/ocr/default.htm
12. http://www.leadtools.com/corporate/corporate.htm
13. http://www.leadtools.com/sdk/ocr/product-comparison-chart.htm
14. http://www.leadtools.com/sdk/formats/ocr.htm
15. http://www.leadtools.com/sdk/recognition-imaging.htm
16. http://www.leadtools.com/sdk/ocr/icr.htm
17. http://primeocr.com/prime_ocr.htm
18. http://ivr.tmcnet.com/topics/ivr-voicexml/articles/73470-new-image-ocr-converter-reads-converts-text-various.htm
19. http://www.softpedia.com/get/Multimedia/Graphic/Image-Convertors/Image-to-OCR-Converter.shtml
20. http://www.yourdigitalspace.com/2011/02/best-and-free-ocr-tools-to-convert-your-images-into-text/
21. http://savedelete.com/best-free-ocr-software-tools.html
22. http://www.planetpdf.com/enterprise/article.asp?ContentID=Image_to_OCR_Converter_updated&gid=7974
23. http://www.aquaforest.com/en/index.asp
24. http://www.aquaforest.com/files/adx_en_xo.pdf
25. http://www.aquaforest.com/en/ocrsdk.asp
26. http://www.dynamsoft.com/Downloads/OCR-Language-Package.aspx
27. http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp
28. http://tdil-dc.in/index.php?option=com_content&view=article&id=63&lang=en
29. http://tdil-dc.in/index.php?option=com_content&view=article&id=63&lang=en/
30. http://softlogic.co.in
31. http://www.newocr.com/
32. http://www.nicomsoft.com/nsocr/
33. http://www.ocrforms.com/
34. http://www.digitalsyphon.com/technologies_sonicimagen.asp?contentpage=technologies_sonicimajen&bodyid=technologies&technologies=technologies
35. http://www.paperfile.net/
36. http://jocr.sourceforge.net/
37. http://roncemer.com/software-development/java-ocr/
38. http://sourceforge.net/projects/javaocr/files/