
Web archive file

From Wikipedia, the free encyclopedia

A web archive file is an archive file that contains all resources necessary to display a web page, including the base HTML as well as images, audio, video, CSS, and scripts. Some web archive formats, such as the Mozilla Archive Format, can store more than one web page.
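As a rough illustration of the idea, the sketch below packs one HTML page and one of its resources into a single ZIP container, which is the container type this article attributes to MAFF and EPUB. It is not a valid MAFF or EPUB file: real files in those formats need additional metadata and a specific layout that is omitted here. The function names `bundle_page` and `list_bundle`, the file names, and the placeholder image bytes are all assumptions made for the example.

```python
import io
import zipfile


def bundle_page(html: str, resources: dict[str, bytes]) -> bytes:
    """Pack a page and its resources into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The base HTML goes in first, under a hypothetical fixed name.
        archive.writestr("index.html", html)
        # Every resource the page refers to is stored next to it.
        for name, data in resources.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def list_bundle(blob: bytes) -> list[str]:
    """Return the names of all entries stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.namelist()


page = '<html><body><img src="logo.png"></body></html>'
blob = bundle_page(page, {"logo.png": b"placeholder image bytes"})
names = list_bundle(blob)
```

Here `names` lists the page and its image as entries of one file, which is the property that makes such a file a self-contained archive of the page.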

Known formats

| Name | Filename extension | Description |
| --- | --- | --- |
| Mozilla Archive Format (MAFF) | .maff | A legacy, open file format for Firefox[1] used to store one or more web pages with their associated resources in a single ZIP file.[2][3] The Mozilla extension that implements MAFF supports versions of Firefox from 2007 to 2017 but not later, and there are no plans to update it.[4] |
| Microsoft Compiled HTML Help | .chm | A legacy, proprietary format originally developed for online help. It can store multiple web pages, all their associated resources, a table of contents for navigating those pages, and an index to facilitate searching within the CHM contents. CHM files use the LZX compression scheme. CHM files are sometimes used for e-books; Microsoft Reader's .lit format is a modification of the CHM format.[5] |
| MHTML | .mht, .mhtml | A proposed open standard to store a single HTML file as well as all its associated resources.[6] MHTML is in plain text. It uses the Base64 binary-to-text encoding to store binary resources such as images. Most modern web browsers based on Chromium support this format. |
| Webarchive | .webarchive | The web archive format of the Safari web browser. It can store a single HTML file and its associated resources. |
| WARC | .warc | An ISO standard that specifies a method for combining multiple digital resources into an aggregate archive file together with related information. These combined resources are saved as a WARC file, which can be replayed with appropriate software or used by archive websites such as the Wayback Machine. WARC is the successor of the Internet Archive's ARC_IA file format, which has traditionally been used to store "web crawls" as sequences of content blocks.[7] |
| EPUB | .epub | An e-book format that can store multiple HTML files and their associated resources inside a ZIP file. |
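The MHTML entry above describes a plain-text file that carries binary resources as Base64. MHTML is built on MIME, so the idea can be sketched with Python's standard `email` package: the HTML and an image become parts of one `multipart/related` message, and the image part is Base64-encoded. This is a minimal sketch, not a reproduction of what any particular browser writes; the function names `build_mhtml` and `read_mhtml`, the example URL, and the placeholder image bytes are assumptions made for the example.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, image_bytes: bytes, image_url: str) -> bytes:
    """Combine a page and one image into a single MIME message."""
    root = MIMEMultipart("related")
    root.attach(MIMEText(html, "html", "utf-8"))
    # MIMEImage applies Base64 encoding to its payload by default.
    image = MIMEImage(image_bytes, _subtype="png")
    # Content-Location ties the part to the URL used in the HTML.
    image.add_header("Content-Location", image_url)
    root.attach(image)
    return root.as_bytes()


def read_mhtml(data: bytes) -> list[tuple[str, bytes]]:
    """Return (content type, decoded payload) for every leaf part."""
    message = message_from_bytes(data)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        parts.append((part.get_content_type(), part.get_payload(decode=True)))
    return parts


html = '<html><body><img src="http://example.com/logo.png"></body></html>'
image_bytes = b"placeholder image bytes"  # stand-in, not a real PNG
archive = build_mhtml(html, image_bytes, "http://example.com/logo.png")
parts = read_mhtml(archive)
```

Because the whole archive is text, it can be inspected in any editor, at the cost of the size overhead that Base64 adds to each binary resource.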

References

  1. ^ "Firefox Addon: MAF – Mozilla Archive Format". Archived from the original on 2 November 2017. Retrieved 9 September 2013.
  2. ^ "About the MAFF file format". maf.mozdev.org. 2011. Retrieved 16 November 2011.
  3. ^ "Mozilla Archive Format". maf.mozdev.org. 2011. Retrieved 16 November 2011.
  4. ^ "Known issues with the Mozilla Archive Format add-on". Retrieved 13 August 2017.
  5. ^ Salomon, David; Motta, Giovanni; Bryant, David (CON) (2009). Handbook of Data Compression (5th, illustrated ed.). Springer. ISBN 978-1-84882-902-2.
  6. ^ "What is MHT/MHTML File Extension?". SysTools. Retrieved 3 November 2019.
  7. ^ "ARC_IA, Internet Archive ARC file format". www.digitalpreservation.gov. 14 February 2008. Retrieved 9 May 2015.