Talk:Comparison of optical character recognition software
|Text and/or other creative content from this version of Optical character recognition was copied or moved into List of optical character recognition software with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted so long as the latter page exists. The former page's talk page can be accessed at Talk:Optical character recognition.|
|Text and/or other creative content from this version of OCR Software was copied or moved into List of optical character recognition software with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted so long as the latter page exists. The former page's talk page can be accessed at Talk:OCR Software.|
|Text and/or other creative content from this version of OCR SDK was copied or moved into List of optical character recognition software with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted so long as the latter page exists. The former page's talk page can be accessed at Talk:OCR SDK.|
Horribly biased table
As it is currently (2010.09.14), the majority of the table is tied up in various secondary concerns. While the operating platform is very important to a user, is it really worth dedicating 7 whole columns to it? While I am a big proponent of open source software (I use GNU/Linux on all boxen both at work and at home), I don't think it should be the primary feature (and hence the first feature column) when evaluating which OCR to choose. Why is the programming language its own column? The real features do not begin until the end of the table, where at least the number of languages and number of fonts supported can be viewed as differentiators between the software packages. However, both of these columns are horribly deficient, which ruins their utility (the number of fonts is given for only one entry, alongside three that just say "any"). I can see that having the SDK is an important feature for some, but not for typical or casual users (who are the ones coming to this page for information). The only real feature column is the Notes column at the very end, which is also very inconsistent. In one entry it presents an excerpt from a magazine review. In another it repeats the marketing slogan ("Developed for ultimate accuracy" - Transym). In another it disparages the software as "not entirely accurate" (as if any of the others are, or ever purport to be).
I do enjoy all the information already here; I am just suggesting that it could be presented in a better way. I still much prefer the current format over a simple list (like List_of_statistical_packages), as I like to sort by Linux to see the 8 (2 commercial) that I can use natively. However, there are good examples to follow, such as List_of_computer_algebra_systems or Comparison_of_video_codecs. In the CAS example, the information is split into three tables: general, functionality and OS support. I think that such a breakdown of the tables would prove very useful, making this much information far more accessible.
A future feature list could include:
- images directly from scanner
- images from scanned files (each entry could list supported image formats: pdf, png, tif)
- export support (searchable pdf, txt, html, .doc, .odf, .tex)
- layout analysis
- number of fonts for English (or maybe for all languages)
- number of languages
- handwritten documents
- spreadsheet support
- mathematical formulas (I know of only InftyReader that can, which needs to be added)
- barcode scanning
- SDK (and the language of the SDK)
Well just my 2 cents, as I think that the current version is not exceptionally useful (except to figure out what you can run on your linux box or OSX box). —Preceding unsigned comment added by 220.127.116.11 (talk) 16:51, 14 September 2010 (UTC)
- I agree that the multi-table format is going to be necessary to fit in all the information that is useful to have. It's certainly feasible to serve multiple audiences - casual users, software developers, OCR researchers, etc. I would also suggest separating "Developer" into a separate column, as I've seen done on a number of pages; that info is currently sneaking into the "Notes" column. The main challenge right now is for us to simply put in the time to research all the interesting aspects not currently covered or filled in. -- Beland (talk) 15:52, 20 May 2013 (UTC)
- Looks like that's now at http://sourceforge.net/projects/easy-ocr/ -- Beland (talk) 21:24, 20 May 2013 (UTC)
- From :
- fuzzyocr - spamassassin plugin to check image attachments
- libhocr0 - Hebrew OCR
Use two tables?
Maybe this list should be split into two tables: one showing software that contains an OCR engine, and another showing frontends, GUIs, etc. that use those engines. Jdc843 (talk) 08:03, 21 December 2012 (UTC)
- mobile OSes: such as Android (Text Fairy etc., Tesseract) or iOS?
- input: printed letters, handwriting, any alphabet (learns letters) — Preceding unsigned comment added by 18.104.22.168 (talk) 19:18, 8 June 2015 (UTC)
What are "Founded Year" and "Release Year"? Founding of the company? Release of the first version or the latest version?