Digital preservation: Difference between revisions

From Wikipedia, the free encyclopedia
Davissp (talk | contribs)
mNo edit summary
Davissp (talk | contribs)
→‎Digital Format Preservation Concerns: Replaced introductory paragraphs with Definition and Challenges. [per NDSA Working Group WikiProject]
Line 5: Line 5:




== Challenges of digital preservation ==
==Digital Format Preservation Concerns==
Society's heritage has been presented on many different materials, including stone, vellum, bamboo, silk, and paper. Now a large quantity of information exists in digital forms, including emails, blogs, social networking websites, national elections websites, web photo albums, and sites which change their content over time. According to an article by Brewster Kahle, in 1996 founder of [[Internet Archive]], "Preserving the Internet", Scientific American, the average life of a URL was, in 1997, 44 days.<ref>Brewster Kahle [http://web.archive.org/web/19980627072808/http://www.sciam.com/0397issue/0397kahle.html/ ''Preserving the Internet''. «Scientific American», 276 (1997), n. 3, p. 72-74.] retrieved on 2011-02-06</ref>


Society's heritage has been presented on many different materials, including stone, vellum, bamboo, silk, and paper. Now a large quantity of information exists in digital forms, including emails, blogs, social networking websites, national elections websites, web photo albums, and sites which change their content over time. With digital media it is easier to create content and keep it up-to-date, but at the same time there are many challenges in the preservation of this content, both technical and economic.
The unique characteristic of digital forms makes it easy to create content and keep it up-to-date, but at the same time brings many difficulties in the preservation of this content. [[Margaret Hedstrom]] points out that "...digital preservation raises challenges of a fundamentally different nature which are added to the problems of preserving traditional format materials."<ref>Hedstrom, M. (1997). Digital preservation: a time bomb for Digital Libraries. Retrieved on December 4th, 2007, from http://www.uky.edu/~kiernan/DL/hedstrom.html.</ref>


Unlike traditional analog objects such as books or photographs where the user has unmediated access to the content, a digital object always needs a software environment to render it. These environments keep evolving and changing at a rapid pace, threatening the continuity of access to the content.<ref>{{cite journal|last=Becker,C. et al.|title=Systematic planning for digital preservation|journal=International Journal on Digital Libraries|date= 2009 |issue=10|pages=pp.133–157|doi=10.1007/s00799-009-0057-1}}</ref> Physical storage media, data formats, hardware, and software all become obsolete over time, posing significant threats to the survival of the content.<ref name="Evans 2008"></ref> This process can be referred to as [[digital obsolescence]].
===Physical deterioration===
The media on which digital contents are stored are more vulnerable to deterioration and catastrophic loss than some analog media such as [[paper]]. While acid paper is prone to deterioration, becoming brittle and yellowing with age, the deterioration may not become apparent for some decades and progresses slowly. It remains possible to retrieve information without loss once deterioration is noticed. Digital data recording media may deteriorate more rapidly and once the deterioration starts, in most cases there may already be data loss. This characteristic of digital forms leaves a very short time frame for preservation decisions and actions.


In the case of born-digital content (e.g., institutional archives, Web sites, electronic audio and video content, born-digital photography and art, research data sets, observational data) the enormous and growing quantity of content presents significant scaling issues.
===Digital obsolescence===
{{Main|digital obsolescence}}
Another challenge is the issue of long-term access to data. Digital technology is developing quickly and retrieval and playback technologies can become obsolete in a matter of years. When faster, more capable and less expensive storage and processing devices are developed, older versions may be quickly replaced. When a software or decoding technology is abandoned, or a hardware device is no longer in production, records created with such technologies are at great risk of loss, simply because they are no longer accessible. This process is known as [[digital obsolescence]].


Digital content can often present challenges to preservation because of its complex and dynamic nature, e.g., interactive Web pages, virtual reality and gaming environments, learning objects, social media sites.<ref>{{cite conference |author=Arora, Jagdish |year= 2009 |title=Digital Preservation, an Overview. |booktitle= Proceedings of the National Seminar on Open Access to Textual and Multimedia Content: Bridging the Digital Divide, January 29-30, 2009 |date=2009 |page=111}}</ref>
This challenge is exacerbated by a lack of established standards, protocols and proven methods for preserving digital information.<ref>Levy, D. M. & Marshall, C. C. (1995). Going digital: a look at assumptions underlying digital libraries," Communications of the ACM, 58, No. 4: 77-84.</ref> We used to save copies of data on tapes, but media standards for tapes have changed considerably over the last five to ten years, and there is no guarantee that tapes will be readable in the future.<ref>Flugstad, Myron. (2007). Website Archiving: the Long-Term Preservation of Local Born Digital Resources. Arkansas Libraries v. 64 no. 3 (Fall 2007) p. 5-7</ref> Recovering these materials may require special tools <ref>{{Cite book| last =Ross | first =Seamus |last2 =Gow | first2 =Ann | publication-date = 1999| title = Digital archaeology? Rescuing Neglected or Damaged Data Resources | url = http://www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf| publication-place = Bristol & London | publisher = British Library and Joint Information Systems Committee |isbn = 1-900508-51-6| postscript =<!--None--> }}</ref> Hedstrom further explained that almost all digital library researches have been focused on "...architectures and systems for information organization and retrieval, presentation and visualization, and administration of intellectual property rights" and that "...digital preservation remains largely experimental and replete with the risks associated with untested methods".

The economic challenges of digital preservation are also great. Preservation programs require significant up front investment to create, along with ongoing costs for data ingest, data management, data storage, and staffing. One of the key strategic challenges to such programs is the fact that while they require significant current and ongoing funding, their benefits accrue largely to future generations.<ref>{{cite web|last=Blue Ribbon Task Force on Sustainable Digital Preservation and Access|title=Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information, final report|url=http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf|publisher=La Jolla, Calif. |date=2010 |page=35 |accessdate=July 5, 2012}}</ref>


==Strategies==

Revision as of 21:24, 13 September 2012

Digital preservation can be understood as the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.[1] It combines policies, strategies and actions to ensure access to reformatted and born-digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.[2]


Challenges of digital preservation

Society's heritage has been presented on many different materials, including stone, vellum, bamboo, silk, and paper. Now a large quantity of information exists in digital forms, including emails, blogs, social networking websites, national election websites, web photo albums, and sites which change their content over time. Digital media make it easier to create content and keep it up to date, but preserving that content raises many challenges, both technical and economic.

Unlike traditional analog objects such as books or photographs where the user has unmediated access to the content, a digital object always needs a software environment to render it. These environments keep evolving and changing at a rapid pace, threatening the continuity of access to the content.[3] Physical storage media, data formats, hardware, and software all become obsolete over time, posing significant threats to the survival of the content.[2] This process can be referred to as digital obsolescence.

In the case of born-digital content (e.g., institutional archives, Web sites, electronic audio and video content, born-digital photography and art, research data sets, observational data) the enormous and growing quantity of content presents significant scaling issues.

Digital content can often present challenges to preservation because of its complex and dynamic nature, e.g., interactive Web pages, virtual reality and gaming environments, learning objects, social media sites.[4]

The economic challenges of digital preservation are also great. Preservation programs require significant up-front investment to create, along with ongoing costs for data ingest, data management, data storage, and staffing. One of the key strategic challenges to such programs is that, while they require significant current and ongoing funding, their benefits accrue largely to future generations.[5]

Strategies

In 2006, the Online Computer Library Center developed a four-point strategy for the long-term preservation of digital objects that consisted of:

  • Assessing the risks for loss of content posed by technology variables such as commonly used proprietary file formats and software applications.
  • Evaluating the digital content objects to determine what type and degree of format conversion or other preservation actions should be applied.
  • Determining the appropriate metadata needed for each object type and how it is associated with the objects.
  • Providing access to the content.[6]

There are several additional strategies that individuals and organizations may use to actively combat the loss of digital information.

Refreshing

Refreshing is the transfer of data between two instances of the same type of storage medium, so that the bitstream is not altered in any way.[7] An example is transferring census data from an old preservation CD to a new one. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or is unable to understand the format of the data. Refreshing will likely always be necessary because physical media deteriorate.
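As a minimal sketch of refreshing, the following Python copies a file to a new location and confirms that the bitstream is unchanged by comparing checksums. The function names (sha256_of, refresh) and the choice of SHA-256 are illustrative assumptions, not part of any cited standard.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target and confirm the bitstream is unchanged.

    Hypothetical helper: returns the digest of the new copy, or raises
    OSError if the copy differs from the original.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"refresh altered the data: {source} -> {target}")
    return after
```

Recording the returned digest alongside the new copy would let a later refresh verify against the same value.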

Migration

Migration is the transfer of data to newer system environments (Garrett et al., 1996). This may include conversion of resources from one file format to another (e.g., conversion of Microsoft Word to PDF or OpenDocument), from one operating system to another (e.g., Windows to GNU/Linux) or from one programming language to another (e.g., C to Java) so the resource remains fully accessible and functional. Migrated resources risk losing some functionality, since newer formats may be incapable of capturing all the functionality of the original format, or the converter itself may be unable to interpret all the nuances of the original format. The latter is often a concern with proprietary data formats.
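The risk of loss during migration can be illustrated with a deliberately small stand-in: re-encoding text from one character encoding to another and checking whether the result still carries everything the original did. Real format migrations (for example, word-processor documents to PDF) are far more involved; the function name migrate_text and the round-trip check are assumptions made for this sketch.

```python
from typing import Tuple


def migrate_text(data: bytes, source_encoding: str,
                 target_encoding: str) -> Tuple[bytes, bool]:
    """Re-encode text and report whether the migration was lossless.

    Characters the target encoding cannot represent are replaced with
    "?", which mimics a target format that cannot capture everything
    in the original.
    """
    text = data.decode(source_encoding)
    migrated = text.encode(target_encoding, errors="replace")
    # The migration is lossless only if decoding the result gives back
    # exactly the original text.
    lossless = migrated.decode(target_encoding) == text
    return migrated, lossless


original = "café".encode("latin-1")
print(migrate_text(original, "latin-1", "utf-8"))   # lossless target
print(migrate_text(original, "latin-1", "ascii"))   # lossy target
```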

The US National Archives Electronic Records Archives and Lockheed Martin are jointly developing a migration system that will preserve any type of document, created on any application or platform, and delivered to the archives on any type of digital media.[8] In the system, files are translated into flexible formats, such as XML; they will therefore be accessible by technologies in the future.[8] Lockheed Martin argues that it would be impossible to develop an emulation system for the National Archives ERA because the volume of records and cost would be prohibitive.[8]

Replication

Creating duplicate copies of data on one or more systems is called replication. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental catastrophes such as fire and flooding. Digital data is more likely to survive if it is replicated in several locations. Replicated data may introduce difficulties in refreshing, migration, versioning, and access control since the data is located in multiple places.
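One way replicated copies might be kept honest is a periodic audit that compares checksums across locations and flags any copy that disagrees with the majority. The sketch below assumes the copies are available as in-memory byte strings keyed by a hypothetical location name; it does not resolve ties and is not drawn from any particular replication system.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def audit_replicas(replicas: Dict[str, bytes]) -> List[str]:
    """Return the locations whose copy disagrees with the majority digest.

    Assumption for this sketch: the most common digest is treated as the
    correct one. With an even split, the first digest seen wins.
    """
    if not replicas:
        return []
    digests = {
        location: hashlib.sha256(data).hexdigest()
        for location, data in replicas.items()
    }
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(
        location
        for location, digest in digests.items()
        if digest != majority_digest
    )


copies = {"site-a": b"record", "site-b": b"record", "site-c": b"recorc"}
print(audit_replicas(copies))
```

A flagged location would then be repaired from one of the agreeing copies.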

Emulation

Emulation is the replication of the functionality of an obsolete system.[9] Examples include emulating an Atari 2600 on a Windows system or emulating WordPerfect 1.0 on a Macintosh. Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems, such as with the MAME project. The feasibility of emulation as a catch-all solution has been debated in the academic community (Granger, 2000).
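At its core, an emulator is a program that reproduces the behaviour of another machine instruction by instruction. The toy sketch below emulates an entirely hypothetical 8-bit accumulator machine with five invented instructions; it is not modelled on the Atari 2600, MAME, or any real system, and only illustrates the fetch, decode and execute loop.

```python
from typing import List, Tuple

Instruction = Tuple[str, int]  # (opcode, operand) for the hypothetical machine


def run(program: List[Instruction]) -> List[int]:
    """Emulate the hypothetical machine and return the values it prints."""
    accumulator = 0
    pc = 0  # program counter
    output: List[int] = []
    while pc < len(program):
        opcode, operand = program[pc]
        if opcode == "LOAD":
            accumulator = operand & 0xFF
        elif opcode == "ADD":
            # Reproduce the original machine's 8-bit wraparound rather
            # than Python's unbounded integers.
            accumulator = (accumulator + operand) & 0xFF
        elif opcode == "PRINT":
            output.append(accumulator)
        elif opcode == "JNZ":
            if accumulator != 0:
                pc = operand
                continue
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc += 1
    return output


# Count down from 3; adding 255 is subtracting 1 in 8-bit arithmetic.
countdown = [("LOAD", 3), ("PRINT", 0), ("ADD", 255), ("JNZ", 1), ("HALT", 0)]
print(run(countdown))
```

The wraparound in ADD shows why faithful emulation matters: software written for the old machine may depend on quirks that the host platform does not share.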

Raymond A. Lorie has suggested a Universal Virtual Computer (UVC) could be used to run any software in the future on a yet unknown platform.[10] The UVC strategy uses a combination of emulation and migration. The UVC strategy has not yet been widely adopted by the digital preservation community.

Jeff Rothenberg, a major proponent of emulation for digital preservation in libraries, working in partnership with the Koninklijke Bibliotheek and the Nationaal Archief of the Netherlands, developed a software program called Dioscuri, a modular emulator that succeeds in running MS-DOS, WordPerfect 5.1, DOS games, and more.[11]

Metadata attachment

Metadata is data about a digital file that includes information on creation, access rights, restrictions, preservation history, and rights management.[12] Metadata attached to digital files may be affected by file format obsolescence. ASCII is considered to be the most durable format for metadata[13] because it is widespread, backwards compatible when used with Unicode, and uses human-readable characters rather than numeric codes. It retains the information, but not the structure in which the information is presented. For higher functionality, SGML or XML should be used. Both markup languages are stored in ASCII format, but contain tags that denote structure and format.
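The idea of structured metadata stored as ASCII can be sketched with Python's standard xml.etree.ElementTree module, which serializes to ASCII bytes and writes non-ASCII characters as numeric character references. The element names used here (metadata, creator, created, access_rights) are hypothetical and do not follow any particular metadata schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_metadata(fields: Dict[str, str]) -> bytes:
    """Serialize metadata fields as an ASCII-encoded XML record.

    Field names must be valid XML element names; this sketch does not
    validate them.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="us-ascii")


record = build_metadata({
    "creator": "José Example",          # hypothetical values
    "created": "2012-09-13",
    "access_rights": "public",
})
print(record.decode("ascii"))
```

Because the tags travel with the values, a future reader can recover both the information and its structure from a plain ASCII file.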

Trustworthy digital objects

Digital objects that can speak to their own authenticity are called trustworthy digital objects (TDOs). TDOs were proposed by Henry M. Gladney to enable digital objects to maintain a record of their change history so future users can know with certainty that the contents of the object are authentic.[14] Other preservation strategies like replication and migration are necessary for the long-term preservation of TDOs.
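The notion of an object carrying its own change history can be illustrated with a generic hash chain, in which each history entry records a digest of the content and of the previous entry, so that later tampering with the history is detectable. This is only an illustrative sketch and should not be read as Gladney's actual TDO design; all names are hypothetical, and a real scheme would also need signatures or another trusted anchor.

```python
import hashlib
import json
from typing import Dict, List

History = List[Dict[str, str]]


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _entry_digest(body: Dict[str, str]) -> str:
    # json.dumps escapes non-ASCII by default, so ASCII encoding is safe.
    return _digest(json.dumps(body, sort_keys=True).encode("ascii"))


def append_event(history: History, description: str, content: bytes) -> None:
    """Append a change record that is chained to the previous record."""
    body = {
        "description": description,
        "content_digest": _digest(content),
        "previous": history[-1]["entry_digest"] if history else "",
    }
    history.append({**body, "entry_digest": _entry_digest(body)})


def verify(history: History) -> bool:
    """Check that no record has been altered, removed or reordered."""
    previous = ""
    for entry in history:
        body = {k: v for k, v in entry.items() if k != "entry_digest"}
        if body.get("previous") != previous:
            return False
        if _entry_digest(body) != entry.get("entry_digest"):
            return False
        previous = entry["entry_digest"]
    return True
```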

Digital sustainability

Digital sustainability encompasses a range of issues and concerns that contribute to the longevity of digital information.[15] Unlike traditional temporary strategies and more permanent solutions, digital sustainability implies a more active and continuous process. Digital sustainability concentrates less on the solution and technology and more on building an infrastructure and approach that is flexible, with an emphasis on interoperability, continued maintenance and continuous development.[16] Digital sustainability incorporates activities in the present that will facilitate access and availability in the future.[17][18]

Digital preservation standards

To standardize digital preservation practice and provide a set of recommendations for preservation program implementation, the Reference Model for an Open Archival Information System (OAIS) was developed. The reference model (ISO 14721:2003) includes the following responsibilities that an OAIS archive must abide by:

  • Negotiate for and accept appropriate information from information Producers.
  • Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation.
  • Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided.
  • Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information.
  • Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original.
  • Make the preserved information available to the Designated Community.[19]

OAIS is concerned with all technical aspects of a digital object’s life cycle: ingest into and storage in a preservation infrastructure, data management, accessibility, and distribution. The model also addresses metadata issues and recommends that five types of metadata be attached to a digital object: reference (identification) information, provenance (including preservation history), context, fixity (authenticity indicators), and representation (formatting, file structure, and what "imparts meaning to an object’s bitstream").[20]
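The five metadata types named above can be pictured as fields of a single record attached to a digital object. The sketch below is a hypothetical data structure, not the OAIS information model itself; the field contents, the use of SHA-256 for fixity, and the helper names are assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class PreservationMetadata:
    reference: str         # identification of the object
    provenance: List[str]  # preservation history, oldest event first
    context: str           # relationship to other material
    fixity: str            # SHA-256 digest used as an integrity indicator
    representation: str    # format information, here a media type


def describe(identifier: str, bitstream: bytes, media_type: str,
             context: str) -> PreservationMetadata:
    """Build a metadata record for a newly ingested bitstream."""
    return PreservationMetadata(
        reference=identifier,
        provenance=["ingested"],
        context=context,
        fixity=hashlib.sha256(bitstream).hexdigest(),
        representation=media_type,
    )


def fixity_matches(metadata: PreservationMetadata, bitstream: bytes) -> bool:
    """Return True if the bitstream still matches the recorded digest."""
    return hashlib.sha256(bitstream).hexdigest() == metadata.fixity
```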

Gladney's proposal of TDOs was preceded by the Research Libraries Group's (RLG) development of "attributes and responsibilities" that denote the practices of a "Trusted Digital Repository" (TDR). The seven attributes of a TDR are: "compliance with the Reference Model for an Open Archival Information System (OAIS), Administrative responsibility, Organizational viability, Financial sustainability, Technological and procedural suitability, System security, Procedural accountability." Among RLG’s attributes and responsibilities were recommendations calling for the collaborative development of digital repository certifications, models for cooperative networks, and sharing of research and information on digital preservation with regard to intellectual property rights.[21]

Digital sound preservation standards

In January 2004, the Council on Library and Information Resources (CLIR) hosted a roundtable meeting of audio experts discussing best practices, which culminated in a report delivered in March 2006. This report investigated procedures for reformatting sound from analog to digital, summarizing discussions and recommendations for best practices for digital preservation. Participants made a series of recommendations for improving the practice of analog audio transfer for archiving.[22]

Updated technical guidelines on the creation and preservation of digital audio have been prepared by the International Association of Sound and Audiovisual Archives (IASA).[23]

Preservation Repository Assessment and Certification

A few of the major frameworks for digital preservation repository assessment and certification are described below. A more detailed list is maintained by the U.S. Center for Research Libraries.[24]

Specific tools and methodologies

TRAC

In 2007, CRL/OCLC published Trustworthy Repositories Audit & Certification: Criteria & Checklist (TRAC), a document allowing digital repositories to assess their capability to reliably store, migrate, and provide access to digital content. TRAC is based upon existing standards and best practices for trustworthy digital repositories and incorporates a set of 84 audit and certification criteria arranged in three sections: Organizational Infrastructure; Digital Object Management; and Technologies, Technical Infrastructure, and Security. [25]

TRAC provides tools for the audit, assessment, and potential certification of digital repositories, establishes the documentation required for audit, delineates a process for certification, and establishes appropriate methodologies for determining the soundness and sustainability of digital repositories.[26]

DRAMBORA

DRAMBORA (Digital Repository Audit Method Based On Risk Assessment), introduced by the Digital Curation Centre (DCC) and Digital Preservation Europe (DPE) in 2007, offers a methodology and a toolkit for digital repository self-assessment.

The DRAMBORA process is arranged in six stages and concentrates on evaluation of likelihood and potential impact of risks on the repository. The auditor is required to describe and document the repository’s role, objectives, policies, activities and assets, in order to identify and assess the risks associated with these activities and assets and define appropriate measures to manage them. [27]
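Evaluating risks by likelihood and potential impact is commonly done by scoring each risk on both dimensions and ranking by the product. The sketch below shows that general idea only; the 1 to 5 scales, the example risks and the multiplication rule are assumptions for illustration and are not taken from the DRAMBORA toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        """Combined severity, assumed here to be likelihood times impact."""
        return self.likelihood * self.impact


def rank_risks(risks: List[Risk]) -> List[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


register = [
    Risk("format obsolescence", likelihood=3, impact=2),
    Risk("media failure", likelihood=4, impact=3),
    Risk("loss of funding", likelihood=2, impact=5),
]
for risk in rank_risks(register):
    print(f"{risk.score:>2}  {risk.name}")
```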

PLATTER

PLATTER (Planning Tool for Trusted Electronic Repositories) is a tool released by DigitalPreservationEurope (DPE) to help digital repositories in identifying their self-defined goals and priorities in order to gain trust from the stakeholders. [28]

PLATTER is intended to be used as a complementary tool to DRAMBORA, NESTOR, and TRAC. It is based on ten core principles for trusted repositories and defines nine Strategic Objective Plans, covering such areas as acquisition, preservation and dissemination of content, finance, staffing, succession planning, technical infrastructure, data and metadata specifications, and disaster planning. The tool enables repositories to develop and maintain documentation required for an audit. [27]: 49 

Examples of digital preservation initiatives

Large-scale digital preservation initiatives (LSDIs)

Many research libraries and archives have begun or are about to begin large-scale digital preservation initiatives (LSDIs). The main players in LSDIs are cultural institutions, commercial companies such as Google and Microsoft, and non-profit groups including the Open Content Alliance (OCA), the Million Book Project (MBP), and HathiTrust. The primary motivation of these groups is to expand access to scholarly resources.

LSDIs: library perspective

Approximately 30 cultural entities, including the 12-member Committee on Institutional Cooperation (CIC), have signed digitization agreements with either Google or Microsoft. Several of these cultural entities are participating in the Open Content Alliance (OCA) and the Million Book Project (MBP). Some libraries are involved in only one initiative and others have diversified their digitization strategies through participation in multiple initiatives. The three main reasons for library participation in LSDIs are access, preservation, and research and development. It is hoped that digital preservation will ensure that library materials remain accessible for future generations. Libraries have a perpetual responsibility for their materials and a commitment to archive their digital materials. Libraries plan to use digitized copies as backups for works in case they go out of print, deteriorate, or are lost or damaged.

See also

Footnotes

  1. ^ Digital Preservation Coalition (2008). "Introduction: Definitions and Concepts". Digital Preservation Handbook. York, UK. Retrieved 24 February 2012.
  2. ^ a b Evans, Mark; Carter, Laura. (December 2008). The Challenges of Digital Preservation. Presentation at the Library of Parliament, Ottawa.
  3. ^ Becker, C.; et al. (2009). "Systematic planning for digital preservation". International Journal on Digital Libraries (10): 133–157. doi:10.1007/s00799-009-0057-1.
  4. ^ Arora, Jagdish (2009). "Digital Preservation, an Overview". Proceedings of the National Seminar on Open Access to Textual and Multimedia Content: Bridging the Digital Divide, January 29-30, 2009. p. 111.
  5. ^ Blue Ribbon Task Force on Sustainable Digital Preservation and Access (2010). "Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information, final report" (PDF). La Jolla, Calif. p. 35. Retrieved July 5, 2012.
  6. ^ Online Computer Library Center, Inc. (2006). OCLC Digital Archive Preservation Policy and Supporting Documentation, p. 5
  7. ^ Cornell University Library. (2005) Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems.
  8. ^ a b c Reagan, Brad (2006). "The Digital Ice Age". Popular Mechanics.
  9. ^ Rothenberg, Jeff (1998). Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Washington, DC, USA: Council on Library and Information Resources. ISBN 1-887334-63-7.
  10. ^ Lorie, Raymond A. (2001). "Long Term Preservation of Digital Information". Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '01). Roanoke, Virginia, USA. pp. 346–352.
  11. ^ Hoeven, J. (2007). "Dioscuri: emulator for digital preservation". D-Lib Magazine. 13 (11/12). doi:10.1045/november2007-inbrief.
  12. ^ NISO Framework Advisory Group. (2007). A Framework of Guidance for Building Good Digital Collections, 3rd edition, p. 57.
  13. ^ National Initiative for a Networked Cultural Heritage. (2002). NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials
  14. ^ Gladney, H. M. (2004). "Trustworthy 100-year digital objects: Evidence after every witness is dead". ACM Transactions on Information Systems. 22 (3): 406–436. doi:10.1145/1010614.1010617.
  15. ^ Bradley, K. (Summer 2007). Defining digital sustainability. Library Trends v. 56 no 1 p. 148-163.
  16. ^ Sustainability of Digital Resources. (2008). TASI: Technical Advisory Service for Images.
  17. ^ Towards a Theory of Digital Preservation. (2008). International Journal of Digital Curation
  18. ^ Electronic Archives Preservation Policy
  19. ^ Consultative Committee for Space Data Systems. (2002). Reference Model for an Open Archival Information System (OAIS). Washington, DC: CCSDS Secretariat, p. 3-1
  20. ^ Cornell University Library. (2005) Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems
  21. ^ Research Libraries Group. (2002). Trusted Digital Repositories: Attributes and Responsibilities
  22. ^ Council on Library and Information Resources. Publication 137: Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes March 2006 Retrieved Sept.6, 2012
  23. ^ IASA (2009). Guidelines on the Production and Preservation of Digital Audio Objects
  24. ^ "Center for Research Libraries - Other Assessment Tools". Retrieved Sept. 6, 2012.
  25. ^ OCLC and CRL (2007). "Trustworthy Repository Audit & Certification: Criteria & Checklist" (PDF). Retrieved April 16, 2012.
  26. ^ Phillips, Stephen C (2010). "Service level agreements for storage and preservation, p.13". Retrieved May 1, 2012.
  27. ^ a b Ball, Alex (2010). "Preservation and Curation in Institutional Repositories (version 1.3)" (PDF). Edinburgh, UK: Digital Curation Centre. p. 48. Retrieved June 24, 2012.
  28. ^ DigitalPreservationEurope (2008). "DPE Repository Planning Checklist and Guidance DPED3.2" (PDF). Retrieved June 24, 2012.

References

External links
