hOCR
hOCR is an open standard that defines a data format for representing the output of optical character recognition (OCR). The standard aims to embed layout, recognition confidence, style and other information into the recognized text itself. It does this by encoding that information in standard HTML, so an hOCR document is also an ordinary HTML document.
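As an illustration of how this information can sit inside ordinary HTML, the following is a minimal sketch in Python. It assumes the conventions described in the public hOCR specification: elements are marked with classes such as ocr_page, ocr_line and ocrx_word, and their properties (for example bbox for the bounding box and x_wconf for word confidence) are stored in the title attribute, separated by semicolons. The sample markup, its coordinates and confidence values, and the helper names parse_title, WordExtractor and extract_words are invented for this example and are not part of any hOCR tool.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment, made up for illustration only.
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 600 800">
  <span class="ocr_line" title="bbox 10 20 300 50">
    <span class="ocrx_word" title="bbox 10 20 120 50; x_wconf 96">Hello</span>
    <span class="ocrx_word" title="bbox 130 20 300 50; x_wconf 88">world</span>
  </span>
</div>
"""


def parse_title(title):
    """Split a title attribute into {property name: [value tokens]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word span.

    Assumes word spans contain plain text with no nested elements.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word being read, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "ocrx_word" in classes:
            props = parse_title(attr.get("title") or "")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(conf[0]) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def extract_words(hocr):
    """Return a list of word dictionaries found in an hOCR string."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word["text"], word["bbox"], word["conf"])
```

Because the layout data lives in attributes rather than in the visible text, the same file can still be opened in a web browser and read as plain recognized text.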
See also
- Software that uses this format:
- OCRopus — free OCR software for Linux
- Tesseract — OCR engine used by OCRopus (as of 3.0)
- Cuneiform — free OCR software
- ExactImage — free image processing software
External links
- Public Specification for the hOCR Format
- hocr-tools on Google Code
- hOCR discussion group
- moz-hocr-edit hOCR document editor