Inside–outside–beginning (tagging)

The IOB format (short for inside, outside, beginning) is a common format for tagging tokens in a chunking task in computational linguistics (e.g. named-entity recognition).[1] It was presented by Ramshaw and Marcus in their 1995 paper "Text Chunking using Transformation-Based Learning".[2] An I- prefix before a tag indicates that the token is inside a chunk, and an O tag indicates that a token belongs to no chunk. A B- prefix indicates that the token is the beginning of a chunk, but it is used only when a chunk immediately follows another chunk of the same type with no O tokens between them; otherwise the first token of a chunk is also tagged with I-.

Another widely used format is IOB2, which is the same as the IOB format except that the B- prefix is used at the beginning of every chunk (i.e. all chunks start with a B- tag).
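
The difference between the two formats can be illustrated with a small conversion routine. The following is a minimal sketch rather than a reference implementation; the function name iob1_to_iob2 and the sample tag sequence are illustrative assumptions and are not taken from the cited sources.

```python
def iob1_to_iob2(tags):
    """Convert IOB tags to IOB2 so that every chunk starts with a B- tag."""
    converted = []
    prev = "O"  # original tag of the previous token
    for tag in tags:
        if tag.startswith("I-"):
            chunk_type = tag[2:]
            # An I- tag continues a chunk only if the previous token
            # belongs to a chunk of the same type; otherwise it opens one.
            continues = prev != "O" and prev[2:] == chunk_type
            converted.append(tag if continues else "B-" + chunk_type)
        else:
            # B- and O tags are already valid IOB2.
            converted.append(tag)
        prev = tag
    return converted


# Two adjacent PER chunks, then a two-token LOC chunk (hypothetical data).
iob1 = ["I-PER", "B-PER", "O", "I-LOC", "I-LOC"]
print(iob1_to_iob2(iob1))
# ['B-PER', 'B-PER', 'O', 'B-LOC', 'I-LOC']
```

Only I- tags that open a chunk are rewritten; B- and O tags mean the same thing in both formats.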

A readable introduction to entity tagging is given in Bob Carpenter's blog post, "Coding Chunkers as Taggers".[3] 'BIO' is commonly used as a synonym for 'IOB'.

An example with IOB format:

Alex I-PER
is O
going O
to O
Los I-LOC
Angeles I-LOC

An example with IOB2 format:

Alex B-PER
is O
going O
to O
Los B-LOC
Angeles I-LOC
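
Because every chunk in IOB2 starts with a B- tag, the chunks can be recovered in a single left-to-right pass. The sketch below assumes tokens and tags are given as parallel lists; the function name iob2_chunks and the (type, text) output shape are illustrative choices, not part of the format.

```python
def iob2_chunks(tokens, tags):
    """Return (chunk_type, text) pairs decoded from IOB2 tags."""
    chunks = []
    current_type, current_tokens = None, []

    def flush():
        # Close the chunk being built, if there is one.
        if current_tokens:
            chunks.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)  # continue the open chunk
        elif tag == "O":
            flush()
            current_type, current_tokens = None, []
        else:
            # B- starts a new chunk; a stray I- is treated the same way.
            flush()
            current_type, current_tokens = tag[2:], [token]
    flush()
    return chunks


tokens = ["Alex", "is", "going", "to", "Los", "Angeles"]
tags = ["B-PER", "O", "O", "O", "B-LOC", "I-LOC"]
print(iob2_chunks(tokens, tags))
# [('PER', 'Alex'), ('LOC', 'Los Angeles')]
```

Treating an I- tag that does not match the open chunk as the start of a new chunk is a lenient design choice in this sketch; a stricter decoder could reject such input instead.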

Related tagging schemes sometimes include "START/END: This consists of the tags B, E, I, S or O where S is used to represent a chunk containing a single token. Chunks of length greater than or equal to two always start with the B tag and end with the E tag."[4]

Other tagging schemes include BIOES and BILOU, where 'E' (end) or 'L' (last) denotes the final token of a chunk and 'S' (single), or 'U' (unit) in BILOU, denotes a chunk consisting of a single token.

An example with BIOES format:

Alex S-PER
is O
going O
with O
Marty B-PER
A. I-PER
Rick E-PER
to O
Los B-LOC
Angeles E-LOC
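
BIOES tags can be derived from IOB2 tags by looking one token ahead to see whether the current chunk continues. The following is a minimal sketch under that assumption; the function name iob2_to_bioes is hypothetical, and the input is the IOB2 tagging of the sentence above.

```python
def iob2_to_bioes(tags):
    """Convert a well-formed IOB2 tag sequence to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, chunk_type = tag[0], tag[2:]
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The chunk continues only if the next tag is I- of the same type.
        continues = next_tag == "I-" + chunk_type
        if prefix == "B":
            # A chunk that starts and ends on the same token becomes S-.
            bioes.append(("B-" if continues else "S-") + chunk_type)
        else:
            # The last I- token of a chunk becomes E-.
            bioes.append(("I-" if continues else "E-") + chunk_type)
    return bioes


iob2 = ["B-PER", "O", "O", "O", "B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(iob2_to_bioes(iob2))
# ['S-PER', 'O', 'O', 'O', 'B-PER', 'I-PER', 'E-PER', 'O', 'B-LOC', 'E-LOC']
```

The same routine yields BILOU tags if E- is replaced by L- and S- by U-.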

References

  1. ^ "Entity Recognition".
  2. ^ Ramshaw and Marcus (1995). "Text Chunking using Transformation-Based Learning". arXiv:cmp-lg/9505040.
  3. ^ Bob Carpenter (2009). "Coding Chunkers as Taggers: IO, BIO, BMEWO, and BMEWO+".
  4. ^ http://cs229.stanford.edu/proj2005/KrishnanGanapathy-NamedEntityRecognition.pdf