Semi-structured data

From Wikipedia, the free encyclopedia

Semi-structured data[1] is a form of structured data that does not conform to the formal structure of the tables and data models associated with relational databases, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. For this reason, it is also described as schemaless or self-describing.

In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together, and the order of the attributes is not important.
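As a minimal sketch of this property, the Python snippet below models two records of the same class as dictionaries. The class ("person"), the attribute names, and the helper functions are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
# Two records of the same hypothetical class "person".
# They are grouped together even though their attributes differ.
people = [
    {"name": "Ada", "email": "ada@example.org"},
    {"phone": "555-0100", "name": "Grace", "languages": ["COBOL"]},
]


def attribute_names(records):
    """Return the union of attribute names seen across all records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return names


def same_entity(a, b):
    """Compare two records; dict equality ignores attribute order."""
    return a == b


all_names = attribute_names(people)

# The same attributes as the first record, listed in a different order.
reordered = {"email": "ada@example.org", "name": "Ada"}
order_irrelevant = same_entity(people[0], reordered)
```

Here no single fixed set of columns describes both records, and reordering the attributes of a record does not change its identity.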

Semi-structured data has become increasingly common since the advent of the Internet, where full-text documents and databases are no longer the only forms of data and different applications need a medium for exchanging information. Semi-structured data is also often found in object-oriented databases.

Types of semi-structured data

XML,[2] other markup languages, email, and EDI are all forms of semi-structured data. OEM (Object Exchange Model)[3] was created prior to XML as a means of representing self-describing data structures.
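To illustrate how tags make XML self-describing, the following sketch uses Python's standard xml.etree.ElementTree module to read a small document without any predefined schema. The document content, the element names, and the function records_from_xml are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: both <person> elements belong to the same
# class, but they carry different child elements in a different order.
DOCUMENT = """
<people>
  <person>
    <name>Ada</name>
    <email>ada@example.org</email>
  </person>
  <person>
    <phone>555-0100</phone>
    <name>Grace</name>
  </person>
</people>
"""


def records_from_xml(text):
    """Turn each <person> element into a dict keyed by child tag name."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in person}
        for person in root.findall("person")
    ]


records = records_from_xml(DOCUMENT)
```

The field names are recovered from the tags in the data itself, which is what allows the two records to differ in shape.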

References

  1. Peter Buneman, tutorial on semi-structured data, Symposium on Principles of Database Systems, 1997.
  2. The Penn database group's semi-structured and XML data project.
  3. Stanford University's Lore DBMS.