Internet Memory Foundation

Type: Non-profit foundation
Industry: Web archiving and preservation
Founded: 2004 as European Archive; 2010 as Internet Memory
Headquarters: Amsterdam, The Netherlands
Key people: Julien Masanès (Director)

The Internet Memory Foundation (formerly the European Archive Foundation) is a non-profit foundation whose purpose is archiving web content. It supports projects and research that include the preservation and protection of multimedia content. Its archives form a digital library of cultural content.

History

The non-profit institution was officially launched as the European Archive Foundation during the opening of the Cross Media Week in Amsterdam (September 2006). The creation of the European Archive[1] Foundation was driven by several Web personalities, including Brewster Kahle, who is also the founder of the Internet Archive.[citation needed] Operating from Amsterdam and Paris, it collected and made freely accessible public domain collections (see Audio and Video collection) and large web archives.

In December 2010, the Foundation changed its name from European Archive to Internet Memory. The new name clearly emphasizes its mission and gives it a wider scope of activities in this field.

In 2011, the foundation archived dozens of terabytes of data per month and developed several technologies to support the growth and use of the Internet memory. The foundation is a member of the IIPC (International Internet Preservation Consortium)[2] and has developed collaborations with both cultural institutions (The UK National Archives, CERN, The National Library of Ireland...) and research teams (Max Planck Institute, TU Berlin, University of Southampton, Institut Telecom Paris Tech...) to fulfil its mission.

Within the framework of the European research project Living Web Archives, Internet Memory (IM) carried out a survey on Web archiving among 360 European and international institutions. The aim of this survey was to gain a clearer understanding of the problems encountered in the field of Internet archiving.

Features

Collaborative Research Projects

The foundation is involved in several research projects that aim to improve technologies for web-scale crawling, data extraction, text mining, and preservation, in support of the growth and use of the Internet memory.

  • Living Web Archives[3] funded by the European Commission[4] (LiWA – project N°216267). The result of LiWA’s work (from February 2008 to January 2011) is a set of next-generation Web archiving methods and tools that make possible the creation and long-term usability of high-quality Web archives.
  • LivingKnowledge[5] funded by the European Commission (LK – project N°231126). The goal is to improve navigation and search in very large multimodal datasets.
  • Longitudinal Analytics of Web Archive data[6] funded by the European Commission (LAWA – project N°258105). The major objective for IM is the design and implementation of a new architecture for Web-scale crawling (billions of resources) and the storage of collected documents in a sophisticated data repository able to serve complex searches on Web-scale datasets.
  • Collect-All ARchives to COmmunity MEMories[7] funded by the European Commission (ARCOMEM – project N°270239). It aims at reducing the risk of losing irreplaceable ephemeral web information, facilitating cost-efficient and effective archive creation, and supporting the creation of more valuable archives.
  • SCAlable Preservation Environments[8] funded by the European Commission (SCAPE – project N°270137). It will develop scalable services for planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects.

Tools

The Web crawler currently used is Heritrix version 3. Heritrix stores the resources it collects in a “container” file, the ARC file (.arc). The ARC format has been extended into the WARC (Web ARChive) file format (.warc), which was approved as an international standard in June 2009 (ISO 28500:2009).
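
To make the idea of a “container” record concrete, the following is a minimal sketch in Python of how a single WARC record can be laid out and read back. It is not the foundation's tooling and not Heritrix code (Heritrix is a Java crawler); the function names build_warc_record and parse_warc_record are hypothetical, and the sketch only assumes the general record layout of WARC/1.0: a version line, named header fields, a blank line, the content block, and two trailing line breaks.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/plain") -> bytes:
    """Serialize one 'resource' record (hypothetical helper, illustration only)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header; two CRLFs follow the content block.
    return head + CRLF + payload + CRLF + CRLF


def parse_warc_record(record: bytes):
    """Split one record into (version, header fields, payload)."""
    head, _, rest = record.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version = lines[0].decode("utf-8")
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.decode("utf-8").partition(":")
        fields[name.strip()] = value.strip()
    # Content-Length, not a delimiter search, decides where the payload ends.
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


if __name__ == "__main__":
    rec = build_warc_record("http://example.org/", b"hello")
    version, fields, payload = parse_warc_record(rec)
    print(version, fields["WARC-Type"], payload)
```

Real archives concatenate many such records into one file, usually compressed, so in practice an established WARC library would be used rather than hand-written parsing.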

Collections

Audio and Video collection

Before focusing on web archiving, the European Archive Foundation collected one of the largest free online classical music collections (more than 800 pieces, from Mozart to Dvořák) and Public Information Films from the British Government, in collaboration with the Netherlands Institute for Sound and Vision and the UK National Archives.

Selective web collection

The foundation has archived a snapshot of the Italian web domain, in collaboration with the National Library of Italy, and an archive of political websites of the 25 EU member states captured during the European constitutional debate. It also regularly archives UK government websites, in collaboration with The National Archives of the UK, and the CERN website, among others.

See also

References

External links