Database preservation

Database preservation usually involves converting the information stored in a database to a form likely to be accessible in the long term as technology changes, without losing the initial characteristics (context, content, structure, appearance and behaviour) of the data.

Preservation formats

SIARD

Software Independent Archiving of Relational Databases (SIARD) is a format developed by the Swiss Federal Archives for archiving relational databases in a vendor-neutral form. A SIARD archive is a ZIP-based package of files based on XML and SQL:1999. It holds not only the database content but also machine-processable structural metadata that records the structure of the database tables and their relationships. The ZIP file contains an XML file describing the database structure (metadata.xml) and a collection of XML files, one per table, capturing the table content. The archive may also contain text files and binary files representing database large objects (CLOBs and BLOBs). Because the package is a ZIP file, individual tables can be accessed directly with standard ZIP tools. A SIARD archive is not an operational database, but it supports re-integration of the archived database into any relational database management system (RDBMS) that supports SQL:1999. SIARD also supports adding descriptive and contextual metadata that is not recorded in the database itself, and embedding documentation files in the archive.[1]
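
The package layout described above (one structural metadata file plus one XML file per table, all inside a ZIP container) can be illustrated with a minimal Python sketch. The file paths, element names and sample rows below are simplified placeholders chosen for illustration and do not follow the actual SIARD schema; the sketch builds a toy archive in memory and then reads a table back using only the metadata file to interpret it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-ins for the files in a SIARD-like
# package. Real SIARD metadata and table files use a much richer schema.
METADATA_XML = """<archive>
  <table name="customers" file="content/table0.xml">
    <column name="id" type="INTEGER"/>
    <column name="name" type="VARCHAR(50)"/>
  </table>
</archive>"""

TABLE_XML = """<table>
  <row><c1>1</c1><c2>Ada</c2></row>
  <row><c1>2</c1><c2>Grace</c2></row>
</table>"""


def build_toy_archive() -> bytes:
    """Pack the metadata file and one table file into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("header/metadata.xml", METADATA_XML)
        zf.writestr("content/table0.xml", TABLE_XML)
    return buf.getvalue()


def read_table(archive_bytes: bytes, table_name: str) -> list[dict]:
    """Look the table up in the metadata, then pair each cell with its column name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        meta = ET.fromstring(zf.read("header/metadata.xml"))
        for table in meta.findall("table"):
            if table.get("name") == table_name:
                columns = [c.get("name") for c in table.findall("column")]
                rows = ET.fromstring(zf.read(table.get("file")))
                return [
                    dict(zip(columns, (cell.text for cell in row)))
                    for row in rows.findall("row")
                ]
    raise KeyError(table_name)


rows = read_table(build_toy_archive(), "customers")
```

Only the table that is requested is decompressed, which is the practical benefit of keeping one file per table inside the ZIP container.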

A new version of the SIARD preservation format was designed and developed by the E-ARK project.[2]

DBML (Database Markup Language)

DBML is an XML schema created by researcher José Carlos Ramalho of the University of Minho to capture table information and data from a relational database. It was published in 2007.[3]
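
The general idea of capturing both table information and data in one XML document can be sketched as follows. This is an illustration only: the element and attribute names (`structure`, `column`, `data`, `row`, `cell`) are hypothetical and are not taken from the DBML schema, and SQLite is used simply because it ships with Python.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Serialise one table's column definitions and rows into a single XML tree.

    The table name is interpolated into SQL, so it must come from a trusted
    source (for example, the database's own catalogue).
    """
    root = ET.Element("table", name=table)

    # Structural part: PRAGMA table_info yields
    # (cid, name, type, notnull, dflt_value, pk) for every column.
    structure = ET.SubElement(root, "structure")
    columns = []
    for _cid, name, col_type, _notnull, _default, pk in conn.execute(
        f'PRAGMA table_info("{table}")'
    ):
        ET.SubElement(
            structure, "column",
            name=name, type=col_type, key=str(bool(pk)).lower(),
        )
        columns.append(name)

    # Data part: one <row> per record, one <cell> per column.
    data = ET.SubElement(root, "data")
    for row in conn.execute(f'SELECT * FROM "{table}"'):
        row_el = ET.SubElement(data, "row")
        for name, value in zip(columns, row):
            cell = ET.SubElement(row_el, "cell", column=name)
            if value is not None:  # SQL NULL is kept as an empty cell
                cell.text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, None)])
doc = table_to_xml(conn, "customers")
xml_text = ET.tostring(doc, encoding="unicode")
```

Keeping the structure next to the data means the XML file can be interpreted without access to the original database system.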

Software

Database Preservation Toolkit

The Database Preservation Toolkit (DBPTK) converts between database formats, including connections to live systems, for the purpose of digitally preserving databases. It can convert live or backed-up databases into preservation formats such as SIARD, an XML-based format created for database preservation. It can also convert preservation formats back into live systems, restoring the full functionality of the database. For example, it supports a specialized export into MySQL, optimized for phpMyAdmin, so that the database can be fully explored through a web interface.

The toolkit was originally part of the RODA project[4] and was later released on its own. It has been further developed in the E-ARK project, together with a new version of the SIARD preservation format.

The toolkit is built around input and output modules. Each module supports reading from and/or writing to a particular database format or live system. New modules can be added by implementing the module interface and adding the required drivers.[5]
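
A module-based design of this kind can be sketched in a few lines. The sketch below is not the toolkit's actual API: the names `Table`, `ImportModule`, `ExportModule`, `InMemoryImport`, `CsvTextExport` and `convert` are all hypothetical, and CSV text stands in for a real output format. It only illustrates how a conversion reduces to pairing any reader with any writer.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Protocol


@dataclass
class Table:
    """Format-neutral representation passed between modules (hypothetical)."""
    name: str
    columns: list
    rows: list = field(default_factory=list)


class ImportModule(Protocol):
    def read(self) -> Iterator[Table]: ...


class ExportModule(Protocol):
    def write(self, tables: Iterable[Table]) -> None: ...


class InMemoryImport:
    """Stand-in for a module that would read a live database or an archive."""

    def __init__(self, tables: Iterable[Table]) -> None:
        self._tables = list(tables)

    def read(self) -> Iterator[Table]:
        yield from self._tables


class CsvTextExport:
    """Stand-in for an output module; keeps one CSV string per table."""

    def __init__(self) -> None:
        self.outputs = {}

    def write(self, tables: Iterable[Table]) -> None:
        for table in tables:
            buf = io.StringIO()
            writer = csv.writer(buf, lineterminator="\n")
            writer.writerow(table.columns)
            writer.writerows(table.rows)
            self.outputs[table.name] = buf.getvalue()


def convert(source: ImportModule, target: ExportModule) -> None:
    """Stream every table from the input module into the output module."""
    target.write(source.read())


source = InMemoryImport([Table("customers", ["id", "name"], [(1, "Ada"), (2, "Grace")])])
target = CsvTextExport()
convert(source, target)
```

Because `convert` depends only on the two interfaces, supporting a new format in this sketch means writing one new class rather than changing the conversion logic.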

Database preservation projects

Research projects in this regard include:

References

  1. "SIARD (Software Independent Archiving of Relational Databases) Version 1.0". 30 May 2015.
  2. Pardo, Chris. "Home - E-Ark Project".
  3. José Carlos Ramalho, Miguel Ferreira, Luís Faria, and Rui Castro (August 7, 2007). "Relational Database Preservation through XML modelling" (PDF). Extreme Markup Languages. Retrieved April 16, 2017.
  4. "RODA Community - Repository of Authentic Digital Objects".
  5. "db-preservation-toolkit by keeps".
  6. Heuscher, Stephan; Jaermann, Stephan; Keller-Marxer, Peter; Moehle, Frank (2004). "Providing Authentic Long-term Archival Access to Complex Relational Data". Proceedings PV-2004: Ensuring the Long-Term Preservation and Adding Value to the Scientific and Technical Data, 5-7 October 2004. pp. 241–261. arXiv:cs/0408054. Bibcode:2004cs........8054H.
  7. "RODA and Crib: A Service-Oriented Digital Repository" (PDF).
  8. "Duurzaam beheer van digitaal archiefmateriaal - Nationaal Archief" (PDF).
  9. "LOCKSS - Lots of Copies Keep Stuff Safe". Stanford University. Retrieved April 16, 2017.