Database preservation

Database preservation usually involves converting the information stored in a database to a form that is likely to remain accessible in the long term as technology changes, without losing the initial characteristics of the data (context, content, structure, appearance and behaviour).

Preservation formats

SIARD (Software Independent Archiving of Relational Databases) Version 1.0

SIARD is an open format developed by the Swiss Federal Archives for archiving relational databases in a vendor-neutral form. A SIARD archive is a ZIP-based package of files based on XML and SQL:1999. It incorporates not only the database content but also machine-processable structural metadata that records the structure of the database tables and their relationships. The ZIP file contains an XML file describing the database structure (metadata.xml) and a collection of XML files, one per table, capturing the table content. The archive may also contain text files and binary files representing database large objects (CLOBs and BLOBs). Because the package is an ordinary ZIP file, individual tables can be accessed directly with standard ZIP tools. A SIARD archive is not an operational database, but it supports re-integration of the archived database into another relational database management system (RDBMS) that supports SQL:1999. In addition, SIARD supports the addition of descriptive and contextual metadata that is not recorded in the database itself, as well as the embedding of documentation files in the archive.[1]
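The general idea of a ZIP package holding one structural metadata file plus one XML file per table can be illustrated with a short Python sketch. The layout below is a deliberately simplified assumption: the file path content/books.xml, the element names (database, table, column, row, c1, c2) and the helper functions are hypothetical and do not follow the actual SIARD directory structure or XML schema, which are defined by the specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout. Real SIARD archives use a stricter
# directory structure and XML schema defined by the specification.
METADATA_XML = """<database name="library">
  <table name="books" file="content/books.xml">
    <column name="id" type="INTEGER"/>
    <column name="title" type="VARCHAR(200)"/>
  </table>
</database>"""

BOOKS_XML = """<table>
  <row><c1>1</c1><c2>Dune</c2></row>
  <row><c1>2</c1><c2>Emma</c2></row>
</table>"""


def build_toy_archive() -> bytes:
    """Create an in-memory ZIP with a metadata file and one table file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("metadata.xml", METADATA_XML)
        archive.writestr("content/books.xml", BOOKS_XML)
    return buffer.getvalue()


def read_table(archive_bytes: bytes, table_name: str) -> list[dict[str, str]]:
    """Look up a table in metadata.xml, then read only that table's file."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        metadata = ET.fromstring(archive.read("metadata.xml"))
        for table in metadata.findall("table"):
            if table.get("name") == table_name:
                columns = [c.get("name") for c in table.findall("column")]
                content = ET.fromstring(archive.read(table.get("file")))
                # Pair each cell with its column name, in document order.
                return [
                    dict(zip(columns, (cell.text or "" for cell in row)))
                    for row in content.findall("row")
                ]
    raise KeyError(table_name)


toy_rows = read_table(build_toy_archive(), "books")
```

Only the metadata file and the requested table file are read from the package, which mirrors the direct per-table access described above. All values come back as strings, because this sketch ignores the declared column types.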

SIARD (Software Independent Archiving of Relational Databases) Version 2.0

SIARD 2.0 is a new version of the SIARD preservation format, designed and developed by the EARK project.[2] A draft specification is available, together with a request for comments.

DBML (Database Markup Language)

An XML schema created by the academic researcher José Carlos Ramalho[3] of the University of Minho that can capture both table information and data from a relational database.[4]

Software

Database Preservation Toolkit

The Database Preservation Toolkit (also known as DBPTK) converts between database formats, including by connecting to live systems, for the purpose of digitally preserving databases. The toolkit can convert live or backed-up databases into preservation formats such as SIARD, an XML-based format created for database preservation. It can also convert the preservation formats back into live systems, restoring the full functionality of the database. For example, it supports a specialized export into MySQL, optimized for phpMyAdmin, so that the database can be explored through a web interface.
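The round trip from a live database to an XML representation and back into a live system can be sketched in Python as follows. This is not the toolkit's implementation or its output format: the functions export_table and import_table, the XML element names, and the use of in-memory SQLite databases are all assumptions chosen for illustration. The sketch also discards column types, so restored values are stored as text, and it does not distinguish NULL from an empty string.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn: sqlite3.Connection, table: str) -> str:
    """Serialize one table (column names plus rows) to an XML string."""
    # The table name is interpolated into SQL, so it must be trusted input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    root = ET.Element("table", name=table)
    columns = ET.SubElement(root, "columns")
    for description in cursor.description:
        ET.SubElement(columns, "column", name=description[0])
    for record in cursor:
        row = ET.SubElement(root, "row")
        for value in record:
            cell = ET.SubElement(row, "cell")
            cell.text = None if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def import_table(conn: sqlite3.Connection, xml_text: str) -> None:
    """Recreate the table in another database from the XML string."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    names = [column.get("name") for column in root.find("columns")]
    column_list = ", ".join(f'"{name}"' for name in names)
    placeholders = ", ".join("?" for _ in names)
    # Columns are created without types: this sketch does not archive them.
    conn.execute(f'CREATE TABLE "{table}" ({column_list})')
    rows = [[cell.text for cell in row] for row in root.findall("row")]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


# "Live" source system (hypothetical sample data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE books (id INTEGER, title TEXT)")
source.executemany("INSERT INTO books VALUES (?, ?)", [(1, "Dune"), (2, "Emma")])
archived = export_table(source, "books")

# Re-integration into a different database instance.
target = sqlite3.connect(":memory:")
import_table(target, archived)
restored = target.execute("SELECT id, title FROM books ORDER BY id").fetchall()
```

A real preservation format would also record data types, keys and relationships in its structural metadata, so that the restored database matches the original more closely than this sketch does.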

The toolkit was originally part of the RODA[5] project and has since been released as a project of its own, owing to increasing interest in this particular feature. It is now being further developed in the EARK project, together with a new version of the SIARD preservation format.

The toolkit is designed as a platform built around input and output modules. Each module supports reading from and/or writing to a particular database format or live system. New modules can be added by implementing a new interface and adding new drivers.[6]
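A plug-in design of this kind can be illustrated with a minimal Python sketch. The class names (ImportModule, ExportModule, InMemoryImport, CsvTextExport) and the convert function are hypothetical and are not the toolkit's actual interfaces; they only show how a converter can stay independent of the concrete formats on either side.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator

Row = tuple
Table = tuple[str, list[str], Iterable[Row]]  # (name, column names, rows)


class ImportModule(ABC):
    """Reads tables from some database format or live system."""

    @abstractmethod
    def read_tables(self) -> Iterator[Table]: ...


class ExportModule(ABC):
    """Writes tables to some database format or live system."""

    @abstractmethod
    def write_table(self, name: str, columns: list[str], rows: Iterable[Row]) -> None: ...


class InMemoryImport(ImportModule):
    """Toy input module backed by a dictionary instead of a real database."""

    def __init__(self, tables: dict[str, tuple[list[str], list[Row]]]):
        self._tables = tables

    def read_tables(self) -> Iterator[Table]:
        for name, (columns, rows) in self._tables.items():
            yield name, columns, rows


class CsvTextExport(ExportModule):
    """Toy output module producing naive comma-separated text (no quoting)."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def write_table(self, name: str, columns: list[str], rows: Iterable[Row]) -> None:
        lines = [",".join(columns)]
        lines += [",".join(str(value) for value in row) for row in rows]
        self.files[name] = "\n".join(lines)


def convert(source: ImportModule, target: ExportModule) -> None:
    """Stream every table from the input module into the output module."""
    for name, columns, rows in source.read_tables():
        target.write_table(name, columns, rows)


exporter = CsvTextExport()
convert(InMemoryImport({"books": (["id", "title"], [(1, "Dune"), (2, "Emma")])}), exporter)
```

Supporting another format in this sketch means writing one new subclass. The convert function and the existing modules stay unchanged.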

Database preservation projects

Several research groups have contributed solutions to the problems of database preservation. Research projects in this area include:

  • Software Independent Archiving of Relational Databases (SIARD)[7]
  • Database Preservation Toolkit (open source, supports SIARD 2.0)[6]
  • Repository of Authentic Digital Objects (RODA)[8]
  • Digital Preservation Testbed[9]
  • Lots of Copies Keep Stuff Safe (LOCKSS)[10]

References