ALTO (XML)

ALTO (Analyzed Layout and Text Object) is an open XML standard for describing the OCR text and layout information of printed documents. It is often used together with the METS standard.

Structure

An ALTO file consists of three major sections as children of the root <alto> element:[1]

  • The <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.
  • The <Styles> section contains the text and paragraph styles with their individual descriptions:
    • <TextStyle> holds font descriptions
    • <ParagraphStyle> holds paragraph descriptions, e.g. alignment information
  • The <Layout> section contains the content information. It is subdivided into <Page> elements.

A skeleton file showing this structure:

    <?xml version="1.0"?>
    <alto>
      <Description>
        <MeasurementUnit/>
        <sourceImageInformation/>
        <Processing/>
      </Description>
      <Styles>
        <TextStyle/>
        <ParagraphStyle/>
      </Styles>
      <Layout>
        <Page>
          <TopMargin/>
          <LeftMargin/>
          <RightMargin/>
          <BottomMargin/>
          <PrintSpace/>
        </Page>
      </Layout>
    </alto>
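
The structure above can be checked programmatically. The following is a minimal Python sketch, not part of the ALTO specification, that parses the skeleton with the standard-library xml.etree.ElementTree module and lists the top-level sections and the children of each <Page> element. The helper names (local_name, top_level_sections, page_regions) are hypothetical and chosen for illustration. The namespace stripping is an assumption made so that the same code would also work on files that declare an XML namespace, which the skeleton does not.

```python
import xml.etree.ElementTree as ET

# Skeleton copied from the example above; it declares no XML namespace.
ALTO_SKELETON = """<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop a leading '{namespace}' prefix from an ElementTree tag, if any."""
    return tag.rsplit("}", 1)[-1]


def top_level_sections(xml_text):
    """Return the names of the direct children of the root <alto> element."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "alto":
        raise ValueError("root element is not <alto>")
    return [local_name(child.tag) for child in root]


def page_regions(xml_text):
    """Return one list of child element names per <Page> inside <Layout>."""
    root = ET.fromstring(xml_text)
    regions = []
    for layout in root:
        if local_name(layout.tag) != "Layout":
            continue
        for page in layout:
            if local_name(page.tag) == "Page":
                regions.append([local_name(child.tag) for child in page])
    return regions


if __name__ == "__main__":
    print(top_level_sections(ALTO_SKELETON))
    print(page_regions(ALTO_SKELETON))
```

Run on the skeleton, the first call returns the three section names in document order (Description, Styles, Layout), and the second returns a single list with the four margins followed by PrintSpace. Matching on local names rather than fixed paths is a design choice that keeps the sketch independent of any particular schema version.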
