|Original author(s)||Cognitive Technologies|
|Initial release||Source April 2, 2008|
|Stable release||1.1 / April 19, 2011|
|Written in||C and C++|
|Type||Optical character recognition|
CuneiForm is a software tool for optical character recognition. It was originally developed at Cognitive Technologies and, after a few years with no development, released as freeware on December 12, 2007. The kernel of the OCR engine was released under the open-source BSD license at the beginning of April 2008.
The algorithms used in CuneiForm are derived from the rules by which letters are written and from their topology, and do not require pattern-recognition training. CuneiForm recognizes any print font (scanned from books, newspapers, magazines, laser printer output, dot-matrix printer output, typewriter text, etc.). It does not recognize handwritten or pseudo-handwritten text, nor does it recognize decorative fonts (e.g. Gothic). CuneiForm has special settings for recognizing text from dot-matrix printers and from faxes at 200x100 DPI resolution.
CuneiForm can save text formatting, and also recognizes complicated tables (of any structure).
It recognizes Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Russian-English bilingual, Serbian, Slovene, Spanish, Swedish, Turkish, and Ukrainian text.
CuneiForm can be used as a stand-alone command-line application or as a back-end to other programs. It also comes with its own graphical interface. In addition, CuneiForm can be used as an OCR engine in OCRFeeder.
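As a rough illustration of the back-end use case, the following Python sketch wraps a call to a `cuneiform` executable. It assumes the command-line front end of the open-source port is installed and on `PATH`. The option names used here (`-l`, `-f`, `-o`, `--dotmatrix`, `--fax`), the language code `eng`, and the format name `text` are assumptions that should be checked against `cuneiform --help` on the target system. The helper names `build_cuneiform_command` and `run_ocr` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_cuneiform_command(image, output, language="eng", fmt="text",
                            dotmatrix=False, fax=False):
    """Build the argument list for a `cuneiform` call.

    All option names are assumptions about the CLI, not a documented API.
    """
    cmd = ["cuneiform", "-l", language, "-f", fmt, "-o", str(output)]
    if dotmatrix:
        cmd.append("--dotmatrix")  # assumed switch for dot-matrix printouts
    if fax:
        cmd.append("--fax")        # assumed switch for low-resolution faxes
    cmd.append(str(image))
    return cmd


def run_ocr(image, output, **options):
    """Run the OCR engine on one image and return the recognized text."""
    if shutil.which("cuneiform") is None:
        raise FileNotFoundError("cuneiform executable not found on PATH")
    subprocess.run(build_cuneiform_command(image, output, **options), check=True)
    return Path(output).read_text(encoding="utf-8", errors="replace")
```

A caller would then write something like `text = run_ocr("page.png", "page.txt", language="eng")`. Keeping the command construction separate from the process call makes the wrapper easy to check without the engine installed.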
In 1993, Cognitive Technologies signed an OEM contract with Corel Corporation, which allowed the Cognitive recognition library to be built into the popular publishing package Corel Draw 3.0 (and subsequent versions).
In 1996, OCR CuneiForm'96 was released, the first OCR package to include the adaptive recognition method. This method combines two types of printed-character recognition algorithms: multifont and omnifont. The self-learning system can recognize poorly printed symbols by building an internal font from those symbols that were printed well enough to be recognized. In this way the system adjusts (adapts) dynamically to the specific input characters.
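The general idea of adaptive recognition can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CuneiForm's actual algorithm. Each glyph is assumed to arrive as a flattened binary bitmap together with a first-pass guess and a confidence score from some font-independent classifier. The confidence threshold, the averaged-bitmap templates, and the absolute-difference distance are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_templates(glyphs, threshold):
    """Average the bitmaps of confidently recognized glyphs, per label.

    The result plays the role of the 'internal font' for this document.
    """
    groups = defaultdict(list)
    for g in glyphs:
        if g["confidence"] >= threshold:
            groups[g["guess"]].append(g["pixels"])
    templates = {}
    for label, samples in groups.items():
        n = len(samples)
        templates[label] = [sum(column) / n for column in zip(*samples)]
    return templates


def distance(pixels, template):
    """Sum of absolute pixel differences between a glyph and a template."""
    return sum(abs(p - t) for p, t in zip(pixels, template))


def adaptive_recognize(glyphs, threshold=0.8):
    """Keep confident guesses; re-label the rest by nearest template."""
    templates = build_templates(glyphs, threshold)
    result = []
    for g in glyphs:
        if g["confidence"] >= threshold or not templates:
            result.append(g["guess"])
        else:
            best = min(templates,
                       key=lambda label: distance(g["pixels"], templates[label]))
            result.append(best)
    return result
```

In this sketch, a damaged letter that the first pass misreads with low confidence is compared against templates learned from clean letters in the same document, so it can be corrected without any font knowledge supplied in advance.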
Cognitive Technologies plans to start developing a new version of the software, acting as investor and coordinator of the project. The developers chose the BSD license for the release in order to take all legal and technical nuances into account, but the whole program or its separate modules may later be released under the GPL.
In September 2008, part of CuneiForm was released as open-source software. One of the missing parts is table analysis; however, Cognitive has promised to release this component in the future.
- (English) Official website
- (Russian) Official website
- Download page for the Russian and English versions of the installer and for the source code.
- Puma.NET is a wrapper library for the Cognitive Technologies CuneiForm recognition engine. It makes it easy to incorporate OCR functionality into any .NET Framework 2.0 (or higher) application.