hOCR is an open standard for representing formatted text obtained from optical character recognition (OCR). The format encodes text, style, layout information, recognition confidence metrics and other information as markup embedded in Hypertext Markup Language (HTML) or XHTML, the XML-based variant of HTML.
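As an illustration of how this information can be read back, the following minimal sketch extracts words, bounding boxes and confidence values from a small hOCR fragment using only the Python standard library. The fragment itself is invented for the example. The class name `ocrx_word` and the `title` properties `bbox` and `x_wconf` follow conventions of the hOCR specification and are not described in this article, so they should be treated as assumptions here. The names `SAMPLE`, `parse_title`, `WordExtractor` and `extract_words` are hypothetical.

```python
from html.parser import HTMLParser

# Invented hOCR fragment: layout and confidence live in class/title attributes.
SAMPLE = """<html><body>
<div class='ocr_page' title='bbox 0 0 600 800'>
 <span class='ocr_line' title='bbox 10 20 300 50'>
  <span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 100 20 190 50; x_wconf 88'>world</span>
 </span>
</div>
</body></html>"""


def parse_title(title):
    """Split a title attribute such as 'bbox 1 2 3 4; x_wconf 90'
    into a dict mapping each property name to its list of values."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collects every element whose class list contains 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None
        self._tag = None      # tag name of the word element
        self._depth = 0       # nested elements with the same tag name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            if tag == self._tag:
                self._depth += 1
            return
        if "ocrx_word" in (attrs.get("class") or "").split():
            props = parse_title(attrs.get("title") or "")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(conf[0]) if conf else None,
            }
            self._tag = tag
            self._depth = 0

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is None or tag != self._tag:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self._current["text"] = self._current["text"].strip()
        self.words.append(self._current)
        self._current = None


def extract_words(hocr):
    """Return a list of dicts with 'text', 'bbox' and 'conf' keys."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

Calling `extract_words(SAMPLE)` returns one dictionary per recognized word. Because the data is ordinary HTML, any HTML parser could be used in place of the standard-library one chosen for this sketch.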
Software that uses this format includes:
- Cuneiform — free OCR software
- OCRopus — free OCR software for Linux
- Tesseract — OCR engine used by OCRopus (as of 3.0)
- Thomas Breuel, ed. (March 2010). "The hOCR Embedded OCR Workflow and Output Format".