hOCR


hOCR is an open standard that defines a data format for representing the output of optical character recognition (OCR). The standard aims to embed layout, recognition confidence, style, and other information in the recognized text itself. It does so by encoding that data in standard HTML markup.
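
As an illustration of how such embedded data can be read back, the following is a minimal sketch in Python, not a reference implementation. It assumes a common hOCR convention in which each recognized word is a span with class `ocrx_word` whose `title` attribute carries properties such as `bbox` (bounding box) and `x_wconf` (word confidence). The sample markup and the names `SAMPLE_HOCR`, `parse_title`, `WordExtractor`, and `extract_words` are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment, written by hand for illustration.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 100 20 200 50; x_wconf 88'>world</span>
  </span>
</div>
"""


def parse_title(title):
    """Split a title attribute into {property_name: [value, ...]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collect text, bounding box, and confidence for each ocrx_word span.

    Assumption: word spans contain plain text only (no nested elements).
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            props = parse_title(attrs.get("title") or "")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "confidence": float(conf[0]) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == "span":
            self.words.append(self._current)
            self._current = None


def extract_words(hocr):
    """Return a list of word records found in an hOCR string."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word["text"], word["bbox"], word["confidence"])
```

Because the layout and confidence data live in ordinary HTML attributes, a standard-library HTML parser is enough for this sketch; real hOCR files may use further classes and properties that it does not handle.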

See also

  • Software that uses this format:
    • Cuneiform — free OCR software
    • OCRopus — free OCR software for Linux
    • Tesseract — OCR engine used by OCRopus (as of 3.0)

External links