Database preservation usually involves converting the information stored in a database to a form likely to be accessible in the long term as technology changes, without losing the initial characteristics (context, content, structure, appearance and behaviour) of the data.
Version 1.0 of the Software Independent Archiving of Relational Databases (SIARD) format was developed by the Swiss Federal Archives in 2007. It was designed for archiving relational databases in a vendor-neutral form. A SIARD archive is a ZIP-based package of files based on XML and SQL:1999. It holds both the database content and machine-processable structural metadata that records the structure of the database tables and their relationships. The ZIP file contains an XML file describing the database structure (metadata.xml) and a collection of XML files, one per table, capturing the table content. It may also contain text files and binary files representing database large objects (CLOBs and BLOBs). Because the package is a ZIP file, individual tables can be accessed directly with ordinary ZIP tools. A SIARD archive is not an operational database, but it supports re-integration of the archived database into another relational database management system (RDBMS) that supports SQL:1999. In addition, SIARD allows descriptive and contextual metadata that is not recorded in the database itself to be added, and documentation files to be embedded in the archive. SIARD version 1.0 was approved as standard eCH-0165 in 2013.
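The package layout described above can be explored with any ZIP library. The following is a minimal Python sketch, not a SIARD reader: it builds a toy archive in memory and then separates the metadata.xml entry from the per-table XML files. The folder names (`header/`, `content/schema0/...`) and the XML content are assumptions made for illustration and are not taken from the format specification; the helper names are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath


def build_toy_archive() -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout described above.

    Folder names and XML content are illustrative assumptions, not the
    normative SIARD layout or schema.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("header/metadata.xml", "<archive><dbname>demo</dbname></archive>")
        archive.writestr("content/schema0/table0/table0.xml", "<table><row><c1>1</c1></row></table>")
        archive.writestr("content/schema0/table1/table1.xml", "<table><row><c1>a</c1></row></table>")
    return buffer.getvalue()


def split_entries(data: bytes) -> tuple[str, list[str]]:
    """Return the metadata entry name and the sorted names of the other XML entries."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_names = [n for n in archive.namelist() if n.endswith(".xml")]
    metadata = [n for n in xml_names if PurePosixPath(n).name == "metadata.xml"]
    if len(metadata) != 1:
        raise ValueError("expected exactly one metadata.xml entry")
    tables = sorted(n for n in xml_names if n != metadata[0])
    return metadata[0], tables


def read_entry(data: bytes, name: str) -> str:
    """Read a single entry without unpacking the rest of the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(name).decode("utf-8")


if __name__ == "__main__":
    data = build_toy_archive()
    metadata_name, table_names = split_entries(data)
    print("structure:", metadata_name)
    for name in table_names:
        print("table content:", name)
```

Reading one entry at a time, as `read_entry` does, reflects the point made above that a single table can be inspected without restoring the whole database.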
Version 2.0 of the SIARD preservation format was designed and developed by the Swiss Federal Archives under the auspices of the E-ARK project. Version 2.0 is based on version 1.0 and defines a format that is backwards-compatible with version 1.0. New features in version 2.0 include:
- An upgrade of SQL:1999 support to SQL:2008 support
- Support for all SQL:2008 types, in particular user-defined data types (UDTs)
- More explicit validation rules for data type definitions using regular expressions
- Support for storing large objects outside of the SIARD file using "file:" URIs
- Support for "deflate" as a compression mechanism
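Two of the features listed above, deflate compression and "file:" URIs for externally stored large objects, can be illustrated with the Python standard library. This is a sketch under stated assumptions: the entry name `table0.xml`, the sample URI, and both helper functions are hypothetical, and the code does not implement the SIARD rules for where or how such URIs are recorded.

```python
import io
import zipfile
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def pack_table(xml_text: str, compression: int) -> zipfile.ZipInfo:
    """Write one XML entry with the given ZIP method and return its ZipInfo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        archive.writestr("table0.xml", xml_text)  # hypothetical entry name
    with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
        return archive.getinfo("table0.xml")


def external_lob_path(uri: str) -> PurePosixPath:
    """Turn a file: URI into a POSIX-style path; reject any other scheme."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file: URI: {uri!r}")
    return PurePosixPath(unquote(parsed.path))


if __name__ == "__main__":
    rows = "<row><c1>value</c1></row>" * 1000
    for method in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
        info = pack_table(rows, method)
        print(method, info.file_size, info.compress_size)
    print(external_lob_path("file:///archive/lobs/record%201.bin"))
```

Repetitive table XML compresses well under deflate, while keeping large objects outside the package keeps the ZIP itself small.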
DBML (Database Markup Language)
DBML is an XML schema created by researcher José Carlos Ramalho of the University of Minho to capture table information and data from a relational database. It was published in 2007.
Database Preservation Toolkit
The Database Preservation Toolkit (DBPTK) converts between database formats, and can connect to live systems, for the purpose of digitally preserving databases. It converts live or backed-up databases into preservation formats such as SIARD, an XML-based format created for database preservation. It can also convert preservation formats back into live systems, so that the full functionality of the database becomes available again. For example, it supports a specialized export to MySQL, optimized for phpMyAdmin, so that the database can be explored through a web interface.
This toolkit was originally part of the RODA project and then released on its own. It has been further developed in the E-ARK project together with a new version of the SIARD preservation format.
The toolkit uses input and output modules. Each module supports reading from and/or writing to a particular database format or live system. New modules can be added by implementing a new interface and adding new drivers.
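The input/output module design can be sketched as a pair of small interfaces joined by a conversion loop. The Python below is a hypothetical illustration of that pattern only: the names `InputModule`, `OutputModule`, `Table`, `InMemoryInput`, `CsvTextOutput` and `convert` are invented for this example and are not the toolkit's actual API.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Protocol


@dataclass
class Table:
    """Format-neutral representation passed between modules (hypothetical)."""
    name: str
    columns: list[str]
    rows: list[tuple] = field(default_factory=list)


class InputModule(Protocol):
    def read_tables(self) -> Iterator[Table]: ...


class OutputModule(Protocol):
    def write_table(self, table: Table) -> None: ...


class InMemoryInput:
    """Stand-in for a module that would read a live database or an archive."""

    def __init__(self, tables: Iterable[Table]) -> None:
        self._tables = list(tables)

    def read_tables(self) -> Iterator[Table]:
        yield from self._tables


class CsvTextOutput:
    """Stand-in for a module that would write a preservation format."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def write_table(self, table: Table) -> None:
        lines = [",".join(table.columns)]
        lines += [",".join(str(value) for value in row) for row in table.rows]
        self.files[table.name] = "\n".join(lines)


def convert(source: InputModule, target: OutputModule) -> int:
    """Stream every table from source to target; return the number converted."""
    count = 0
    for table in source.read_tables():
        target.write_table(table)
        count += 1
    return count


if __name__ == "__main__":
    people = Table("people", ["id", "name"], [(1, "Ada"), (2, "Lin")])
    output = CsvTextOutput()
    print(convert(InMemoryInput([people]), output))
    print(output.files["people"])
```

Because `convert` depends only on the two interfaces, supporting another format in this sketch means writing one new class, which is the property the module design aims for.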
Database preservation projects
Research projects in this area include:
- Software Independent Archiving of Relational Databases (SIARD)
- Database Preservation Toolkit (open-source software, supports SIARD 2.0)
- Repository of Authentic Digital Objects (RODA)
- Digital Preservation Testbed
- Lots of Copies Keep Stuff Safe (LOCKSS), a project led by libraries at Stanford University