{{Use dmy dates|date=August 2012}}
{{Redirect|unzip|the program|Info-ZIP|the inverse to convolution|Convolution (computer science)}}
{{cleanup|date=September 2010}}
{{Infobox file format
| name = Zip
| icon =
| screenshot =
| caption =
| extension = <code>.zip</code><br /><code>.zipx</code>&nbsp;(newer compression algorithms)
| mime = application/zip<ref name=iana>{{citation |url=http://www.iana.org/assignments/media-types/application/zip |title=Registration of a new MIME Content-Type/Subtype - application/zip |publisher=[[Internet Assigned Numbers Authority|IANA]] |date=20 July 1993 |accessdate=2012-01-05}}</ref>
| type code =
| uniform type = com.pkware.zip-archive
| magic = none, though <code>PK\x03\x04</code>, <code>PK\x05\x06</code> (empty archive), or <code>PK\x07\x08</code> (spanned archive) are common.
| owner = [[Phil Katz]], [[PKWARE]]
| released = 1989<!-- {{Start date and age|YYYY|mm|dd|df=yes/no}} -->
| latest release version = 6.3.3
| latest release date = {{Start date and age|2012|09|01|df=yes}}
| genre = [[Data compression]]
| container for =
| contained by =
| extended from =
| extended to = [[JAR (file format)|JAR]] <small>([[EAR (file format)|EAR]], [[Resource Adapter|RAR (Java)]], [[WAR (Sun file format)|WAR]])</small><br />[[Office Open XML]] (Microsoft)<br />[[Open Packaging Conventions]]<br />[[OpenDocument]] (ODF)<br />[[XPI]] (Mozilla extensions)
| standard = [http://www.pkware.com/documents/casestudies/APPNOTE.TXT APPNOTE] from PKWARE
}}

'''Zip''' is a [[file format]] used for [[data compression]] and [[file archiver|archiving]]. A zip file contains one or more files that have been compressed, to reduce file size, or stored as is. The zip file format permits a number of compression [[algorithms]]. The format was originally created in 1989 by [[Phil Katz]], and was first implemented in [[PKWARE]]'s [[PKZIP]] utility,<ref>{{cite news | title = Phillip Katz, Computer Software Pioneer, 37 | publisher = The New York Times | date = 1 May 2000 | url = http://www.nytimes.com/2000/05/01/us/phillip-katz-computer-software-pioneer-37.html | accessdate = 2009-06-14}}</ref> as a replacement for the previous [[ARC (file format)|ARC]] compression format by [[Thom Henderson]]. The zip format is now supported by many software utilities other than PKZIP. Microsoft has included built-in zip support (under the name "compressed folders") in versions of [[Microsoft Windows]] since 1998. Apple has included built-in zip support in [[Mac&nbsp;OS&nbsp;X]] 10.3 (via BOMArchiveHelper, now [[Archive Utility]]) and later.

Zip files generally use the [[file extension]]s ".zip" or ".ZIP" and the [[MIME]] media type <code>application/zip</code>.<ref name=iana/> Zip is used as a base file format by many programs, usually under a different name. Zip files are often [[icon (computing)|represented]] by a document or other object prominently featuring a [[zipper]].

== History ==

The zip file format was created by Phil Katz of [[PKWARE, Inc|PKWARE]]. He created the format after Systems Enhancement Associates (SEA) filed [[ARC (file format)#Lawsuits|lawsuits]] against his company, claiming that his archiving products were derivatives of SEA's [[ARC (file format)|ARC]] archiving system. The name "zip" (meaning "move at high speed") was suggested by Katz's friend, Robert Mahoney. They wanted to imply that their product would be faster than [[ARC (file format)|ARC]] and other compression formats of the time. The earliest known version of the ''.ZIP File Format Specification'' was published in 1989 as the file APPNOTE.TXT in the [[PKZIP]] 0.9 package.

The zip file format was released into the [[public domain]],<ref>{{citation |url=http://brianlivingston.com/eweek/article2/0,4149,1257562,00.html |title=PKZip Must Open Up |author=Brian Livingston |quote=The ZIP file format is given freely into the public domain and can be claimed neither legally nor morally by any individual, entity or company |date=8 September 2003 |accessdate=2012-01-05}}</ref><ref>{{citation |url=http://www.idcnet.us/ziphistory.html |title=WHERE DID ZIP FILES COME FROM ANYWAY ? |publisher=Infinity Design Concepts, Inc. |accessdate=2012-01-05}}</ref><ref>{{citation |url=http://cd.textfiles.com/pcmedic9310/MAIN/MISC/COMPRESS/ZIP.PRS |title=PRESS RELEASE |year=1989 |accessdate=2012-01-05}}</ref><ref>{{citation |url=http://www.pkware.com/about-us/phil-katz |title=Our Founder - Phil Katz |publisher=PKWARE |accessdate=2012-01-05}}</ref><ref>{{citation |url=http://mailman.vse.cz/pipermail/sc34wg1study/2010-November/000082.html |title=sc34-wg1 |author=Gareth Horton, Rob Weir, Alex Brown |date=2 November 2010 |accessdate=2012-01-05}}</ref> but some ZIP features are covered by patents or pending patents.<ref>{{citation |url=http://www.pkware.com/support/zip-app-note/ |title=.ZIP Application Note |publisher=PKWARE |quote=Some ZIP technology is covered by patents or pending patents. |accessdate=2012-01-05}}</ref>

=== Version history ===

The .ZIP File Format Specification has its own version number, which does not necessarily correspond to the version numbers for the PKZIP tool, especially with PKZIP 6 or later. At various times, PKWARE has added preliminary features that allow PKZIP products to extract archives using advanced features, but PKZIP products that create such archives are not made available until the next major release. Other companies or organizations support the PKWARE specifications at their own pace.

The ZIP file format specification is formally named "APPNOTE - .ZIP File Format Specification" and has been published on the PKWARE.com website since the late 1990s.<ref>{{citation |url=http://www.pkware.com/support/zip-app-note/ |title= .ZIP Application Note |accessdate=2012-07-20}}</ref> Several versions of the specification were not published. Specifications of some features, such as BZIP2 compression and strong encryption, were published by PKWARE a few years after their creation. The URL of the online specification has changed several times on the PKWARE website.

A summary of key advances in various versions of the PKWARE specification:
*2.0: (1993)<ref name=iana/> File entries can be compressed with DEFLATE and use traditional PKWARE encryption.
*2.1: (1996) Deflate64 compression
*4.5: (2001)<ref>{{citation |url=http://www.pkware.com/documents/APPNOTE/APPNOTE-4.5.0.txt |title=APPNOTE.TXT - .ZIP File Format Specification, Version: 4.5 |date=1 November 2001 |accessdate=2012-01-05}}</ref><ref>{{citation |url=http://www.pkware.com/support/appnote.txt |archiveurl=http://web.archive.org/web/20011203085830/http://www.pkware.com/support/appnote.txt |archivedate=3 December 2001 |title=File: APPNOTE.TXT - .ZIP File Format Specification Version: 4.5 Revised: 11/01/2001 |date=3 December 2001 |accessdate=2012-04-21}}</ref> Documented 64-bit zip format.
*4.6: (2001) BZIP2 compression (not published online until the publication of APPNOTE 5.2)
*5.0: (2002) DES, Triple DES, RC2, RC4 supported for encryption (not published online until the publication of APPNOTE 5.2<ref>{{citation |url=http://www.pkware.com/products/enterprise/white_papers/appnote.html |archiveurl=http://web.archive.org/web/20030417163916/http://www.pkware.com/products/enterprise/white_papers/appnote.html |archivedate=17 April 2003 |title=File: APPNOTE.TXT - .ZIP File Format Specification Version: 4.5 Revised: 11/01/2001 |date=17 April 2003 |accessdate=2012-04-21}}</ref>)
*5.2: (2003)<ref>{{citation |url=http://www.pkware.com/documents/APPNOTE/APPNOTE-5.2.0.txt |title=APPNOTE.TXT - .ZIP File Format Specification, Version: 5.2 - NOTIFICATION OF CHANGE |date=16 July 2003 |accessdate=2012-01-05}}</ref><ref>{{citation |url=http://pkware.com/products/enterprise/white_papers/appnote.html |archiveurl=http://web.archive.org/web/20030702014023/http://pkware.com/products/enterprise/white_papers/appnote.html |archivedate=2 July 2003 |title=File: APPNOTE.TXT - .ZIP File Format Specification Version: 5.2 - NOTIFICATION OF CHANGE Revised: 06/02/2003 |date=2 July 2003 |accessdate=2012-04-21}}</ref> AES encryption support (defined in APPNOTE 5.1 that was not published online), corrected version of RC2-64 supported for encryption.
*6.1: (2004)<ref>{{citation |url=http://www.pkware.com/company/standards/appnote/ |archiveurl=http://web.archive.org/web/20040819182806/http://www.pkware.com/company/standards/appnote/ |archivedate=19 August 2004 |title=File: APPNOTE - .ZIP File Format Specification Version: 6.1.0 - NOTIFICATION OF CHANGE Revised: 01/20/2004 |date=19 August 2004 |accessdate=2012-04-21}}</ref> Documented certificate storage.
*6.2.0: (2004)<ref>{{citation |url=http://www.pkware.com/documents/APPNOTE/APPNOTE-6.2.0.txt |title=APPNOTE.TXT - .ZIP File Format Specification, Version: 6.2.0 - NOTIFICATION OF CHANGE |date=26 April 2004 |accessdate=2012-01-05}}</ref> Documented Central Directory Encryption.
*6.3.0: (2006)<ref>{{citation |url=http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT |title=APPNOTE.TXT - .ZIP File Format Specification, Version: 6.3.0 |date=29 September 2006 |accessdate=2012-01-05}}</ref> Documented Unicode (UTF-8) filename storage. Expanded list of supported hash, compression (LZMA, PPMd+), encryption algorithms.
*6.3.1: (2007)<ref>{{citation |url=http://www.pkware.com/documents/casestudies/APPNOTE.TXT |archiveurl=http://web.archive.org/web/20070514200623/http://www.pkware.com/documents/casestudies/APPNOTE.TXT |archivedate=14 May 2007 |title=File: APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.1 Revised: April 11, 2007 |date=14 May 2007 |accessdate=2012-04-21}}</ref> Corrected standard hash values for SHA-256/384/512.
*6.3.2: (2007)<ref>{{citation |url=http://www.pkware.com/documents/casestudies/APPNOTE.TXT |archiveurl=http://web.archive.org/web/20070928174718/http://www.pkware.com/documents/casestudies/APPNOTE.TXT |archivedate=28 September 2007 |title=File: APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.2 Revised: September 28, 2007 |date=28 September 2007 |accessdate=2012-04-21}}</ref> Documented compression method 97 ([[WavPack]]).
*6.3.3: (2012)<ref>{{citation |url=http://www.pkware.com/documents/casestudies/APPNOTE.TXT |title=File: APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.3 Revised: September 01, 2012 |date=01 September 2012 }}</ref> Document formatting changes to facilitate referencing the PKWARE Application Note from other standards using methods such as the JTC 1 Referencing Explanatory Report (RER) as directed by JTC 1/SC 34 N 1621.

[[WinZip]], starting with version 12.1, uses the extension <tt>.zipx</tt> for zip files that use compression methods newer than DEFLATE; specifically, the methods BZip2, LZMA, PPMd, JPEG and WavPack. The last two are applied to appropriate file types when "Best method" compression is selected.<ref>{{cite web | url = http://www.winzip.com/comp_info.htm | title = Additional Compression Methods Specification | work = WinZip | publisher = [[WinZip]] Computing, S.L | location = [[Mansfield, CT]] | date = 19 May 2009 | accessdate = 2009-05-24}}</ref><ref>{{cite web |url=http://kb.winzip.com/kb/entry/7/ |title=What is a Zipx File? |work=Winzip: Knowledgebase |publisher=[[WinZip]] Computing, S.L |location=[[Mansfield, CT]] |date=13 August 2010 |accessdate=17 August 2010}}</ref>

=== Standardization ===

In April 2010, [[JTC 1|ISO/IEC JTC 1]] initiated a ballot to determine whether a project should be initiated to create an ISO/IEC International Standard format compatible with zip.<ref>http://www.itscj.ipsj.or.jp/sc34/open/1414.pdf</ref> The proposed project, entitled ''Document Packaging'', envisaged a zip-compatible 'minimal compressed archive format' suitable for use with a number of existing standards including [[OpenDocument]], [[Office Open XML]] and [[EPUB]].

In July, 2010 the ballot for initiating this project failed to pass an international vote and was rejected through ISO/IEC JTC 1/SC 34 N 1461. Comments against this project cited the recognition that an existing published work on the zip format has been in existence for over 18 years in the form of the PKWARE APPNOTE, recommending instead "for JTC 1 to approve the ZIP Application Note as a Referenced Specification (RS) per Annex N of the currently published JTC 1 Directives".

This ballot did approve a request for the formation of a study period for the purpose of seeking wider input regarding this core technology. The study period, which began in October 2010, brought together a number of international experts to discuss using ZIP within international standards. In March, 2011 this group presented to JTC 1 a new recommendation on how to incorporate ZIP within international standards.

Acknowledging the broad interoperability that the ZIP format has achieved, the study group concluded in its recommendation that "the best way to achieve our technical objectives is to have PKWARE continue its maintenance of the ZIP Application Note." The recommendations drafted by this study group were presented for balloting as ISO/IEC JTC 1/SC 34 N 1621<ref>http://www.itscj.ipsj.or.jp/sc34/open/1621.pdf</ref> in July 2011 and were approved by an international vote.

Proposal N 1621 directs international standards that use ZIP to "not duplicate or contradict the provisions of PKWARE's ZIP Application Note, [and to] reference the ZIP Application Note's capabilities via an external normative reference to the latest version of the ZIP Application Note." Standards using ZIP should include a JTC 1 Referencing Explanatory Report (RER) when referencing the PKWARE Application Note.

A provision of N 1621 included an option for drafting a profile standard for referencing ZIP. This profile could be used by other international standards that use ZIP to avoid having to write their own RER document where similar use of ZIP may exist. At this time, no standards that use ZIP have requested this profile.

There is a new proposed standard in ISO/IEC JTC1 [[International Organization for Standardization#Standardization process|standardization process]] under the name ''ISO/IEC NP 21320-1 - Information technology -- Document Container File -- Part 1: Core''.<ref>{{citation |url=http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=60101 |title=ISO/IEC NP 21320-1 - Information technology -- Document Container File -- Part 1: Core |date=2 August 2011 |accessdate=2012-01-05}}</ref>

== Design ==

Zip is a simple archive format that stores multiple files. Zip allows contained files to be compressed using many different methods, as well as simply storing a file without compressing it. Each file is stored separately, allowing different files in the same archive to be compressed using different methods.
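
As an illustration of per-file compression methods, the following minimal sketch uses Python's standard <code>zipfile</code> module to place one stored and one DEFLATE-compressed member in the same archive. The member names and payloads are hypothetical and chosen only for the example.

```python
import io
import zipfile

# Hypothetical member names and payloads, chosen only for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Stored as is (compression method 0).
    zf.writestr("stored.bin", b"\x00" * 1000, compress_type=zipfile.ZIP_STORED)
    # Compressed with DEFLATE (compression method 8).
    zf.writestr("deflated.txt", b"zip " * 250, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    # Each member records its own method and its own pair of sizes.
    methods = {info.filename: info.compress_type for info in zf.infolist()}
    sizes = {info.filename: (info.file_size, info.compress_size)
             for info in zf.infolist()}

print(methods)
print(sizes)
```

The stored member keeps its original size, while the compressed size of the DEFLATE member depends on the data and on the zlib version in use.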

A directory is placed at the end of a zip file. It identifies which files are in the zip and where in the zip each file is located. This allows zip readers to load the list of files without reading the entire zip archive. Zip archives can also include extra data that is not related to the zip archive. This allows a zip archive to be made into a self-extracting archive, an application that decompresses its contained data, by prepending the program code to the zip archive and marking the file as executable. On the other hand, it also allows an innocuous-looking file, such as a GIF image file, to hide malicious code by making the file a zip archive as well.

The zip format uses a 32-bit CRC algorithm and includes two copies of the directory structure of the archive to provide greater protection against data loss.
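
The CRC-32 recorded for a member can be recomputed from its uncompressed data. The sketch below assumes Python's <code>zipfile</code> and <code>zlib</code> modules; the sample payload and member name are hypothetical.

```python
import io
import zipfile
import zlib

payload = b"hello, zip"  # hypothetical sample data

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", payload)

with zipfile.ZipFile(buf) as zf:
    # CRC-32 value recorded in the archive for this member.
    stored_crc = zf.getinfo("hello.txt").CRC
    # testzip() re-reads every member and returns the first name whose
    # CRC check fails, or None when all members verify.
    first_bad = zf.testzip()

# CRC-32 recomputed independently from the uncompressed data.
computed_crc = zlib.crc32(payload) & 0xFFFFFFFF
print(stored_crc == computed_crc, first_bad)
```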

=== Structure ===

A zip file is identified by the presence of a ''central directory'' which is located at the end of the structure in order to allow the appending of new files. The central directory stores a list of the names of the entries (files or directories) stored in the zip file, along with other metadata about the entry, and an offset into the zip file, pointing to the actual entry data. This allows a file listing of the archive to be performed relatively quickly, as the entire archive does not have to be read to see the list of files. The entries in the zip file also include this information for redundancy.

The order of the file entries in the directory need not coincide with the order of file entries in the archive.
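
A file listing built from the central directory might look like the following sketch, which uses Python's <code>zipfile</code> module. The member names are hypothetical. The <code>header_offset</code> attribute is the offset of each entry's local header as recorded in the central directory.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")        # hypothetical members
    zf.writestr("dir/b.txt", b"second")

with zipfile.ZipFile(buf) as zf:
    # infolist() is populated from the central directory when the archive
    # is opened; no member data is decompressed to build this listing.
    listing = [(info.filename, info.header_offset, info.file_size)
               for info in zf.infolist()]

for name, offset, size in listing:
    print(name, offset, size)
```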

Each entry is introduced by a local header with information about the file such as the compression method, file size and file name, followed by optional "extra" data fields, and then the possibly compressed, possibly encrypted file data. The "Extra" data fields are the key to the extensibility of the zip format. "Extra" fields are exploited to support the ZIP64 format, WinZip-compatible AES encryption, file attributes, and higher-resolution NTFS or Unix file timestamps. Other extensions are possible via the "Extra" field. Zip tools are required by the specification to ignore Extra fields they do not recognize.

[[File:ZIP-64 Internal Layout.svg|thumb|ZIP-64 Internal Layout]]

The zip format uses specific 4-byte "signatures" to denote the various structures in the file. Each file entry is marked by a specific signature. The beginning of the central directory is indicated with a different signature, and each entry in the central directory is marked with yet another particular 4-byte signature.

There is no BOF or EOF marker in the zip specification. Often the first thing in a zip file is a zip entry, which can be identified easily by its signature. However, a zip file does not necessarily begin with a zip entry, and the zip specification does not require it to.

Tools that correctly read zip archives must locate the zip central directory by scanning for its signatures, and then use the offsets it records to find each entry. They must not scan for entry signatures, because only the directory specifies where a file chunk starts. Scanning could lead to false positives, as the format does not forbid other data between chunks, nor does it forbid file data streams from containing such signatures.

Most of the signatures have the short integer 0x4b50 as their low-order 16 bits, which is stored first in little-endian order as the bytes 50 4B. Viewed as an ASCII string, these bytes read "PK", the initials of the inventor Phil Katz. As a result, when a ZIP file is viewed in a text editor, the first two bytes of the file are usually "PK". (A self-extracting ZIP has an [[EXE]] before the ZIP, so it would start with "MZ".)
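
The relationship between the signature values and the bytes stored in the file can be checked with a short sketch. It assumes Python's <code>struct</code> module and uses the signature values given in the header tables of this article; the dictionary names are illustrative only.

```python
import struct

# Signature values as listed in the header tables of this article.
SIGNATURES = {
    "local file header": 0x04034B50,
    "data descriptor": 0x08074B50,
    "central directory file header": 0x02014B50,
    "end of central directory": 0x06054B50,
}

# "<I" packs an unsigned 32-bit integer in little-endian byte order,
# so the low-order bytes 0x50 0x4B ("PK") are written first.
on_disk = {name: struct.pack("<I", value) for name, value in SIGNATURES.items()}

for name, raw in on_disk.items():
    print(f"{name}: {raw!r}")
```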

The zip specification also supports spreading archives across multiple filesystem files. Originally intended for storage of large zip files across multiple 1.44 MB [[floppy disk]]s, this feature is now used for sending zip archives in parts over email, or over other transports or removable media.

The [[File Allocation Table|FAT filesystem]] of DOS has a timestamp resolution of only two seconds; zip file records mimic this. As a result, the built-in timestamp resolution of files in a zip archive is only two seconds, though extra fields can be used to store more accurate timestamps. The zip format has no notion of [[time zone]], so timestamps are only meaningful if it is known what time zone they were created in.
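
The two-second resolution follows from the MS-DOS packing of date and time into two 16-bit fields, in which only five bits are available for the seconds. The following is a minimal sketch of that packing; the function names are hypothetical.

```python
def encode_dos_datetime(year, month, day, hour, minute, second):
    """Pack a timestamp into the 16-bit MS-DOS date and time fields."""
    # Date: 7 bits for years since 1980, 4 bits for month, 5 bits for day.
    dos_date = ((year - 1980) << 9) | (month << 5) | day
    # Time: 5 bits for hour, 6 bits for minute, 5 bits for seconds / 2.
    dos_time = (hour << 11) | (minute << 5) | (second // 2)
    return dos_date, dos_time


def decode_dos_datetime(dos_date, dos_time):
    """Unpack the 16-bit MS-DOS date and time fields."""
    year = (dos_date >> 9) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2  # odd seconds cannot be represented
    return year, month, day, hour, minute, second


# An odd number of seconds is rounded down to the previous even second.
round_trip = decode_dos_datetime(*encode_dos_datetime(2012, 9, 1, 13, 37, 59))
print(round_trip)
```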

In September 2006, PKWARE released a revision of the zip specification (version 6.3.0) that contains a provision to store file names using [[UTF-8]], finally adding Unicode compatibility to zip.<ref name="appnote">http://www.pkware.com/documents/casestudies/APPNOTE.TXT</ref>

=== File headers ===

All multi-byte values in the header are stored in [[little-endian]] byte order. All length fields count the length in bytes.

{|class="wikitable"
|+ Local file header
|-
! Offset !! Bytes !! Description<ref name="appnote"/>
|-
|  0 || 4 || Local file header signature = 0x04034b50 (read as a little-endian number)
|-
|  4 || 2 || Version needed to extract (minimum)
|-
|  6 || 2 || General purpose bit flag
|-
|  8 || 2 || Compression method
|-
| 10 || 2 || File last modification time
|-
| 12 || 2 || File last modification date
|-
| 14 || 4 || CRC-32
|-
| 18 || 4 || Compressed size
|-
| 22 || 4 || Uncompressed size
|-
| 26 || 2 || File name length (''n'')
|-
| 28 || 2 || Extra field length (''m'')
|-
| 30 || ''n'' || File name
|-
| 30+''n'' || ''m'' || Extra field
|}

The extra field contains a variety of optional data such as OS-specific attributes. It is divided into chunks, each with a 16-bit ID code and a 16-bit length.

This is immediately followed by the compressed data.
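
The local file header layout above can be read with a fixed-size little-endian structure. The following is a minimal sketch using Python's <code>struct</code> module; the function name, the returned dictionary keys and the sample member are hypothetical, and the sample archive is produced with <code>zipfile</code> only to have bytes to parse.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of the local file header, little-endian: signature,
# version needed, flags, method, time, date, CRC-32, compressed size,
# uncompressed size, file name length (n), extra field length (m).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def parse_local_header(data, offset=0):
    """Parse one local file header starting at the given offset."""
    (sig, version, flags, method, mtime, mdate, crc,
     csize, usize, n, m) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != 0x04034B50:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    return {
        "flags": flags,
        "method": method,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "name": data[name_start:name_start + n],
        "extra": data[name_start + n:name_start + n + m],
        # The file data begins right after the name and the extra field.
        "data_start": name_start + n + m,
    }


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:          # stored (uncompressed) member
    zf.writestr("hello.txt", b"hello")
raw = buf.getvalue()

header = parse_local_header(raw)
start = header["data_start"]
print(header["name"], raw[start:start + header["compressed_size"]])
```

This sketch assumes the archive begins with a local file header at offset 0 and that the sizes are present in the local header; neither is guaranteed by the format.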

If bit 3 (0x08) of the general-purpose flags field is set, then the CRC-32 and file sizes are not known when the header is written. The fields in the local header are filled with zero, and the CRC-32 and size are appended in a 12-byte structure (optionally preceded by a 4-byte signature) immediately after the compressed data:

{|class="wikitable"
|+ Data descriptor
|-
! Offset !! Bytes !! Description<ref name="appnote"/>
|-
|  0 || 0/4 || ''Optional'' data descriptor signature = 0x08074b50
|-
|  0/4 || 4 || CRC-32
|-
|  4/8 || 4 || Compressed size
|-
|  8/12 || 4 || Uncompressed size
|}
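
Because the signature is optional, a reader has to accept both forms of the data descriptor. The sketch below shows one way this might be handled; the function name and the sample values are hypothetical. A CRC-32 that happens to equal the signature value would be misread by this simple approach.

```python
import struct

DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(data, offset=0):
    """Return (crc32, compressed_size, uncompressed_size, bytes_consumed)."""
    (first,) = struct.unpack_from("<I", data, offset)
    consumed = 0
    # Assumption: a leading value equal to the signature is treated as the
    # optional signature rather than as a CRC-32 with the same value.
    if first == DESCRIPTOR_SIGNATURE:
        offset += 4
        consumed = 4
    crc, csize, usize = struct.unpack_from("<III", data, offset)
    return crc, csize, usize, consumed + 12


# Hypothetical descriptor values for illustration.
body = struct.pack("<III", 0x12345678, 7, 7)
without_signature = parse_data_descriptor(body)
with_signature = parse_data_descriptor(struct.pack("<I", DESCRIPTOR_SIGNATURE) + body)
print(without_signature, with_signature)
```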

The central directory entry is an expanded form of the local header:
{|class="wikitable"
|+ Central directory file header
|-
! Offset !! Bytes !! Description<ref name="appnote"/>
|-
|  0 || 4 || Central directory file header signature = 0x02014b50
|-
|  4 || 2 || Version made by
|-
|  6 || 2 || Version needed to extract (minimum)
|-
|  8 || 2 || General purpose bit flag
|-
| 10 || 2 || Compression method
|-
| 12 || 2 || File last modification time
|-
| 14 || 2 || File last modification date
|-
| 16 || 4 || CRC-32
|-
| 20 || 4 || Compressed size
|-
| 24 || 4 || Uncompressed size
|-
| 28 || 2 || File name length (''n'')
|-
| 30 || 2 || Extra field length (''m'')
|-
| 32 || 2 || File comment length (''k'')
|-
| 34 || 2 || Disk number where file starts
|-
| 36 || 2 || Internal file attributes
|-
| 38 || 4 || External file attributes
|-
| 42 || 4 || Relative offset of local file header. This is the number of bytes between the start of the first disk on which the file occurs, and the start of the local file header. This allows software reading the central directory to locate the position of the file inside the ZIP file.
|-
| 46 || ''n'' || File name
|-
| 46+''n'' || ''m'' || Extra field
|-
| 46+''n''+''m'' || ''k'' || File comment
|}
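
The central directory can be walked by reading each 46-byte fixed header and skipping the three variable-length fields that follow it. The following sketch assumes Python's <code>struct</code> module; the function name and sample members are hypothetical, and the sample archive is assumed to have no archive comment, so that the end of central directory record occupies the last 22 bytes.

```python
import io
import struct
import zipfile

# Fixed 46-byte part of the central directory file header, little-endian.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")


def iter_central_directory(data, offset, count):
    """Yield (name, local_header_offset, uncompressed_size) per entry."""
    for _ in range(count):
        fields = CENTRAL_HEADER.unpack_from(data, offset)
        if fields[0] != 0x02014B50:
            raise ValueError("not a central directory file header")
        n, m, k = fields[10], fields[11], fields[12]
        name = data[offset + CENTRAL_HEADER.size:offset + CENTRAL_HEADER.size + n]
        yield name, fields[16], fields[9]
        # Skip the file name, extra field and file comment.
        offset += CENTRAL_HEADER.size + n + m + k


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
raw = buf.getvalue()

# Assumption: no archive comment, so the end record is the last 22 bytes.
eocd = len(raw) - 22
(total,) = struct.unpack_from("<H", raw, eocd + 10)
(cd_offset,) = struct.unpack_from("<I", raw, eocd + 16)
entries = list(iter_central_directory(raw, cd_offset, total))

# Cross-check the parsed offsets against the zipfile module's own view.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    expected_offsets = [info.header_offset for info in zf.infolist()]
print(entries, expected_offsets)
```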

After all the central directory entries comes the end of central directory record, which marks the end of the ZIP file:

{|class="wikitable"
|+ End of central directory record
|-
! Offset !! Bytes !! Description<ref name="appnote"/>
|-
|  0 || 4 || End of central directory signature = 0x06054b50
|-
|  4 || 2 || Number of this disk
|-
|  6 || 2 || Disk where central directory starts
|-
|  8 || 2 || Number of central directory records on this disk
|-
| 10 || 2 || Total number of central directory records
|-
| 12 || 4 || Size of central directory (bytes)
|-
| 16 || 4 || Offset of start of central directory, relative to start of archive
|-
| 20 || 2 || Comment length (''n'')
|-
| 22 || ''n'' || Comment
|}

This ordering allows a zip file to be created in one pass, but it is usually decompressed by first reading the central directory at the end.
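
Reading therefore usually starts by searching backwards from the end of the file for the end of central directory record. The following is a minimal sketch of such a search; the function name and the sample archive are hypothetical, and the sketch assumes that nothing follows the archive comment.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<IHHHHIIH")   # 22-byte fixed part of the record
EOCD_SIGNATURE = b"PK\x05\x06"


def find_eocd(data):
    """Return (position, fields) of the end of central directory record."""
    # The comment length is a 16-bit field, so the record cannot start
    # earlier than 22 + 65535 bytes before the end of the file.
    lowest = max(0, len(data) - EOCD.size - 0xFFFF)
    pos = data.rfind(EOCD_SIGNATURE, lowest)
    while pos != -1:
        if pos + EOCD.size <= len(data):
            fields = EOCD.unpack_from(data, pos)
            # Assumption: the record and its comment end exactly at the
            # end of the file; this rejects signatures inside the comment.
            if pos + EOCD.size + fields[7] == len(data):
                return pos, fields
        # Continue with the next occurrence closer to the start.
        pos = data.rfind(EOCD_SIGNATURE, lowest, pos + 3)
    raise ValueError("end of central directory record not found")


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
    zf.comment = b"archive comment"
raw = buf.getvalue()

pos, fields = find_eocd(raw)
total_records, cd_offset, comment = fields[4], fields[6], raw[pos + EOCD.size:]
print(total_records, cd_offset, comment)
```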

=== Compression methods ===

The .ZIP File Format Specification documents the following compression methods: stored (no compression), Shrunk, Reduced (methods 1-4), Imploded, Tokenizing, Deflated, Deflate64, [[bzip2]], [[LZMA]] (EFS), [[WavPack]], [[Prediction by Partial Matching|PPMd]]. The most commonly used compression method is [[DEFLATE]], which is described in IETF RFC 1951.

Compression methods mentioned, but not documented in detail in the specification include: PKWARE Data Compression Library (DCL) Imploding (old IBM TERSE), IBM TERSE (new), IBM LZ77 z Architecture (PFS).

=== Encryption ===

Zip supports a simple [[password]]-based [[symmetric-key algorithm|symmetric encryption]] system which is documented in the zip specification, and known to be seriously flawed. In particular it is vulnerable to [[known-plaintext attack]]s which are in some cases made worse by poor implementations of [[random number generator]]s.<ref>Stay, Michael. "ZIP Attacks with Reduced Known Plaintext". http://math.ucr.edu/~mike/zipattacks.pdf</ref>

New features including new [[Data compression|compression]] and [[encryption]] (e.g. [[Advanced Encryption Standard|AES]]) methods have been documented in the .ZIP File Format Specification since version 5.2. A [[WinZip]]-developed AES-based standard is used also by [[7-Zip]], XCeed, and DotNetZip, but some vendors use other formats.<ref>[http://www.winzip.com/aes_info.htm AES Encryption Information: Encryption Specification AE-1 and AE-2]</ref> PKWARE SecureZIP also supports RC2, RC4, DES, Triple DES encryption methods, Digital Certificate-based encryption and authentication ([[X.509]]), and archive header encryption.<ref name="pkware">[http://www.pkware.com/support/zip-app-note/ Application Note on the .ZIP file format]</ref>

[[File name]] [[encryption]] was introduced in version 6.2 of the .ZIP File Format Specification. It encrypts the metadata stored in the Central Directory portion of an archive, but the Local Header sections remain unencrypted. A compliant archiver can falsify the Local Header data when using Central Directory Encryption. As of version 6.2 of the specification, the Compression Method and Compressed Size fields within the Local Header are not yet masked.

=== ZIP64 ===

The original zip format had a 4&nbsp;GiB limit on various things (uncompressed size of a file, compressed size of a file and total size of the archive), as well as a limit of 65535 entries in a zip archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, raising the size limits to 16&nbsp;[[EiB]] (2<sup>64</sup> bytes).<!--fixme: more detail needed on support from vendors other than PKWARE-->

The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista does. Likewise, some libraries, such as DotNetZip and IO::Compress::Zip in Perl, support ZIP64. Java's built-in java.util.zip supports ZIP64 from version [[Java Dolphin|Java 7]].<ref>{{cite web | url=http://blogs.sun.com/xuemingshen/entry/zip64_support_for_4g_zipfile | title=ZIP64, The Format for > 4G Zipfile, Is Now Supported | accessdate=2010-09-27 | last=Shen | first = Xueming | date=17 April 2009 | work=Xueming Shen's Blog | publisher= [[Sun Microsystems]] }}</ref>

=== Combination with other file formats ===

The zip file format allows for a comment containing any data to occur at the end of the file after the central directory.<ref name="appnote"/> Also, because the central directory specifies the offset of each file in the archive with respect to the start, it is possible in practice for the first file entry to start at an offset other than zero.

This allows arbitrary data to occur in the file both before and after the zip archive data, and for the archive to still be read by a zip application. A side-effect of this is that it is possible to author a file that is both a working zip archive and another format, provided that the other format tolerates arbitrary data at its end, beginning, or middle. [[Self-extracting archives]] (SFX), of the form supported by WinZip and DotNetZip, take advantage of this—they are .exe files that conform to the PKZIP AppNote.txt specification and can be read by compliant zip tools or libraries.
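
The tolerance for leading data can be demonstrated with a short sketch using Python's <code>zipfile</code> module, which compensates for data placed before the archive. The prefix below is a hypothetical stand-in for another format's content or a self-extractor stub; it is not a valid GIF image.

```python
import io
import zipfile

archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("payload.txt", b"still readable")   # hypothetical member

# Hypothetical leading bytes; they only imitate the start of a GIF file.
prefix = b"GIF89a" + b"\x00" * 64
combined = prefix + archive.getvalue()

# The reader locates the central directory from the end of the file and
# adjusts the recorded offsets for the prepended bytes.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    recovered = zf.read("payload.txt")

print(combined[:6], recovered)
```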

This property of the zip format, and of the JAR format which is a variant of zip, can be exploited to hide harmful Java classes inside a seemingly harmless file, such as a GIF image uploaded to the web. This so-called [[GIFAR]] exploit has been demonstrated as an effective attack against web applications such as Facebook.<ref>[http://www.infoworld.com/article/08/08/01/A_photo_that_can_steal_your_online_credentials_1.html A photo that can steal your online credentials]</ref>

=== Limits ===

The minimum size of a zip file is 22 bytes.

The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (2<sup>32</sup>−1 bytes, or 4 GiB minus 1 byte) for standard ZIP, and 18,446,744,073,709,551,615 bytes (2<sup>64</sup>−1 bytes, or 16 EiB minus 1 byte) for ZIP64.<ref name="ziplimit">[http://www.artpol-software.com/ZipArchive/KB/0610051629.aspx Limits of ZIP file: Standard versus ZIP64.]</ref>
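
The 22-byte minimum corresponds to an empty archive that consists only of an end of central directory record with no comment. The sketch below, which assumes Python's <code>zipfile</code> module, produces such an archive and also spells out the two size limits; the constant names are illustrative only.

```python
import io
import zipfile

# An archive created and closed without members contains only the
# end of central directory record.
buf = io.BytesIO()
zipfile.ZipFile(buf, "w").close()
empty_archive = buf.getvalue()

MAX_STANDARD = 2**32 - 1   # largest value of a 32-bit size or offset field
MAX_ZIP64 = 2**64 - 1      # largest value of a 64-bit ZIP64 field

print(len(empty_archive), MAX_STANDARD, MAX_ZIP64)
```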

===Proprietary extensions===
====Extra field====
The .ZIP file format includes an extra field facility within file headers, which can be used to store extra data not defined by existing .ZIP specifications, and which allows compliant archivers that do not recognize the fields to safely skip them. Header IDs 0-31 are reserved for use by PKWARE. The remaining IDs can be used by third-party vendors for proprietary usage.
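
Since every chunk in an extra field carries a 16-bit ID followed by a 16-bit length, a reader can step over chunks it does not understand. The following is a minimal sketch; the function name is hypothetical, and the ID <code>0x7A7A</code> in the sample data is made up for illustration and not a registered value.

```python
import struct


def iter_extra_fields(extra):
    """Yield (header_id, payload) for each chunk in an extra field."""
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        payload = extra[offset + 4:offset + 4 + size]
        if len(payload) != size:
            raise ValueError("truncated extra field chunk")
        yield header_id, payload
        # The length field lets a reader skip chunks it does not recognize.
        offset += 4 + size


# Hypothetical extra field: a 16-byte chunk with ID 0x0001 (used for ZIP64)
# followed by a 2-byte chunk with a made-up third-party ID.
sample = (struct.pack("<HH", 0x0001, 16) + b"\x00" * 16
          + struct.pack("<HH", 0x7A7A, 2) + b"hi")
chunks = list(iter_extra_fields(sample))
print(chunks)
```

In Python's <code>zipfile</code> module, the raw extra field bytes of a member are available as the <code>extra</code> attribute of its <code>ZipInfo</code> object.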

====Strong encryption controversy====
When the [[WinZip]] 9.0 public beta was released in 2003, WinZip introduced its own [[AES-256]] encryption, using a different file format, along with the documentation for the new specification.<ref>[http://www.winzip.com/aes_info.htm WinZip - AES Encryption Information<!-- Bot generated title -->]</ref> The encryption standards themselves were not [[Proprietary software|proprietary]], but PKWARE had not updated APPNOTE.TXT to include the Strong Encryption Specification (SES) since 2001, which had been used by PKZIP versions 5.0 and 6.0. WinZip technical consultant Kevin Kearney and [[StuffIt]] product manager Mathew Covington accused PKWARE of withholding SES, but PKWARE chief technology officer Jim Peterson claimed that certificate-based encryption was still incomplete.

To overcome this shortcoming, contemporary products such as [[PentaZip]] implemented strong zip encryption by encrypting zip archives into a different file format.<ref>[http://www.infoworld.com/article/03/06/10/HNzipsplinters_1.html The .zip standard splinters | InfoWorld | News | 2003-06-10 | By Lincoln Spector, PC World.com<!-- Bot generated title -->]</ref>

In another controversial move, PKWare applied for a patent on 2003-07-16 describing a method for combining zip and strong encryption to create a secure file.<ref>[http://www.infoworld.com/article/03/07/25/HNpkware_1.html PKWare seeks patent for .zip file format | InfoWorld | News | 2003-07-25 | By Robert McMillan, IDG News Service<!-- Bot generated title -->]</ref>

In the end, PKWARE and WinZip agreed to support each other's products. On 2004-01-21, PKWARE announced support for the WinZip-based AES encryption format.<ref>[http://www.news.com/2100-1012_3-5145491.html?tag=fd_nbs_ent Software makers patch Zip tiff - CNET News.com<!-- Bot generated title -->]</ref> A later WinZip beta was able to support SES-based zip files.<ref>http://www.theregister.co.uk/2004/01/21/zip_file_encryption_compromise_thrashed/</ref> PKWARE eventually released version 5.2 of the .ZIP File Format Specification to the public, which documented SES. The [[Free Software]] project [[7-Zip]] also supports AES in zip files (as does its [[POSIX]] [[Porting|port]] [[p7zip]]).

When using AES encryption under WinZip, compression method is always set to 99, with actual compression method stored in AES extra data field.<ref>[http://www.winzip.com/win/en/aes_info.htm AES Encryption Information: Encryption Specification AE-1 and AE-2]</ref> In contrast, Strong Encryption Specification stores compression method in the basic file header segment of Local Header and Central Directory, unless Central Directory Encryption is used to mask/encrypt metadata.

== Advantages and disadvantages ==

Compressing files separately, as is done in zip files, allows for [[random access]]: individual files can be retrieved without reading through other data. It may allow better overall compression by using different algorithms for different files. Even when confining the possibility to DEFLATE compression, the use of different compression dictionaries for each file may result in a smaller archive overall.

This approach is less well-suited, in general, to archival of a large number of small files. In the zip archive format, the metadata for each entry—the information about each individual entry—is not compressed. This limits the maximum achievable compression ratio, especially as the size of the individual entries diminishes and approaches the size of the metadata for the entry.
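
The effect can be estimated with a short experiment. The sketch below assumes the fixed header sizes given in the PKWARE specification (30 bytes for a local file header and 46 bytes for a central directory header, each followed by the file name) and measures the per-entry overhead for many tiny stored files. The helper name <code>zip_sizes</code> and the file names are illustrative only.

```python
import io
import zipfile


def zip_sizes(count, payload):
    """Return (archive_size, total_payload_size) for `count` stored entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for i in range(count):
            zf.writestr(f"f{i:04d}.txt", payload)
    return len(buf.getvalue()), count * len(payload)


archive_size, payload_size = zip_sizes(1000, b"x" * 10)
overhead_per_entry = (archive_size - payload_size) / 1000
print(f"payload: {payload_size} bytes, archive: {archive_size} bytes")
print(f"uncompressed metadata per entry: about {overhead_per_entry:.0f} bytes")
```

With 10-byte entries the metadata is several times larger than the data it describes, so no choice of compression method for the file contents can make the archive small.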

An alternative approach is used in a [[Tar (file format)|compressed tar archive (<code>.tar.gz</code> or <code>.tgz</code>)]], in which the file data and metadata are compressed as a unit using [[gzip]]. The downside of this approach is the loss of random access. The same approach can be used with zip to emulate a solid archive: first create a zip archive in which the individual files are uncompressed (STORE method), then compress that archive into a second zip file that contains it. As in the case of compressed tar archives, random access is not possible.
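
The zip-inside-zip technique can be sketched with Python's standard <code>zipfile</code> module. The function names <code>solid_like_zip</code> and <code>plain_zip</code>, the inner archive name <code>inner.zip</code> and the sample files are assumptions made for the example.

```python
import io
import zipfile


def plain_zip(files):
    """Ordinary zip: every entry in `files` (name -> bytes) deflated on its own."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return out.getvalue()


def solid_like_zip(files):
    """Stored inner zip wrapped in a deflated outer zip."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # File data and per-entry metadata are now compressed as one unit.
        zf.writestr("inner.zip", inner.getvalue())
    return outer.getvalue()


files = {f"f{i:04d}.txt": f"line of text {i}\n".encode() for i in range(500)}
plain = plain_zip(files)
solid = solid_like_zip(files)
print(len(plain), len(solid))

# Reading one file back means decompressing the whole inner archive first.
with zipfile.ZipFile(io.BytesIO(solid)) as zf:
    inner_bytes = zf.read("inner.zip")
with zipfile.ZipFile(io.BytesIO(inner_bytes)) as zf:
    first = zf.read("f0000.txt")
```

For many small, similar files the wrapped archive should come out much smaller, at the cost of having to decompress the whole inner archive before any single file can be read.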

== Implementation ==

There are numerous zip tools available, and numerous zip libraries for various programming environments; licenses used include commercial and [[open source]]. For instance, [[WinZip]] is a well-known zip tool that runs on Windows, while [[WinRAR]], [[IZArc]], Info-ZIP, [[7-Zip]], [[PeaZip]] and DotNetZip are other tools, available on various platforms. Some of those tools have library or programmatic interfaces.

Development libraries available under open-source licenses include the [[GNU]] [[gzip]] project and [[Info-ZIP]]. For Java, the [[Java Platform, Standard Edition]] contains the package "java.util.zip" to handle standard zip files; the Zip64File library specifically supports large files (larger than 4&nbsp;GB) and treats zip files using random access; and the [[Apache Ant]] tool contains a more complete implementation released under the [[Apache Software License]].

For .NET applications, there is a no-cost open-source library called DotNetZip<ref>{{cite web|url=http://www.codeplex.com/DotNetZip |title=DotNetZip Library |publisher=Codeplex.com |date=}}</ref> available in source and binary form under the Microsoft Public License.<ref>{{cite web|url=http://www.codeplex.com/DotNetZip/license |title=DotNetZip Library |publisher=Codeplex.com |date=}}</ref> It supports many zip features, including passwords for traditional zip encryption or WinZip-compatible AES encryption, Unicode, ZIP64, zip comments, spanned archives, and self-extracting archives. The [[.NET Framework|Microsoft .NET 3.5]] runtime library includes a class System.IO.Packaging.Package<ref>{{cite web|url=http://msdn.microsoft.com/en-us/library/system.io.packaging.package.aspx |title=Package Class (System.IO.Packaging) |publisher=Msdn.microsoft.com |date=}}</ref> that supports the zip format. It is primarily designed for document formats using the [[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] international standard [[Open Packaging Conventions]].

The [[Info-ZIP]] implementations of the zip format add support for Unix filesystem features, such as user and group IDs, file permissions, and symbolic links. The [[Apache Ant]] implementation is aware of these to the extent that it can create files with predefined Unix permissions. The Info-ZIP implementations also know how to use the error correction capabilities built into the zip compression format. Some programs (such as [[IZArc]]) do not, and will fail on a file that has errors.
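
As an illustration of how Unix permissions travel in a zip entry, the sketch below relies on the Info-ZIP convention of placing the Unix mode bits in the upper 16 bits of the entry's external file attributes, with the "made by" system set to 3 (Unix). The convention is not spelled out in this article, and the file name <code>run.sh</code> is made up. Python's standard <code>zipfile</code> module exposes both fields on <code>ZipInfo</code>.

```python
import io
import stat
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("run.sh")
    info.create_system = 3  # 3 = Unix in the "version made by" field
    # Regular file with mode 0755, stored in the high 16 bits.
    info.external_attr = (stat.S_IFREG | 0o755) << 16
    zf.writestr(info, "#!/bin/sh\necho hi\n")

with zipfile.ZipFile(buf) as zf:
    mode = zf.getinfo("run.sh").external_attr >> 16
    print(stat.filemode(mode))  # prints -rwxr-xr-x
```

An extractor that ignores these bits will still unpack the data correctly but will create the file with default permissions.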

The Info-ZIP Windows tools also support [[NTFS]] [[filesystem]] permissions, and will attempt to translate from NTFS permissions to Unix permissions or vice versa when extracting files. This can result in potentially unintended combinations, e.g. [[.exe]] files being created on NTFS volumes with executable permission denied.

Versions of Microsoft Windows have included support for zip compression in Explorer since the Plus! pack was released for Windows 98.<ref>{{cite web|author=Česky |url=http://en.wikipedia.org/wiki/Windows_Millennium |title=Windows Me - Wikipedia, the free encyclopedia |publisher=En.wikipedia.org |date=}}</ref> Microsoft calls this feature "Compressed Folders". Not all zip features are supported by the Windows Compressed Folders capability. For example, AES encryption, split or spanned archives, and Unicode entry encoding are not known to be readable or writable by the Compressed Folders feature in Windows XP or Windows Vista.

== Legacy ==

There are numerous other standards and formats using "zip" as part of their name. Phil Katz stated that he wanted to allow the "zip" name for any archive type.{{Citation needed|date=May 2009}} For example, zip is distinct from [[gzip]], and the latter is defined in an [[IETF]] [[Request for Comments|RFC]] (RFC 1952). Both zip and gzip primarily use the [[DEFLATE]] algorithm for compression. Likewise, the [[ZLIB]] format (IETF RFC 1950) also uses the DEFLATE compression algorithm, but specifies different headers for error and consistency checking. Other common, similarly named formats and programs with different native formats include [[7-Zip]], [[bzip2]], and [[rzip]].
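
The relationship between the three formats can be seen with Python's standard <code>zlib</code> and <code>gzip</code> modules. This is only a sketch with arbitrary sample data: it produces a raw DEFLATE stream (the form stored inside a zip entry), a ZLIB stream and a gzip stream from the same input, and shows that they differ in their wrappers rather than in the compression algorithm.

```python
import gzip
import zlib

data = b"zip, gzip and zlib all wrap DEFLATE " * 20

# Raw DEFLATE stream, as stored inside a zip entry (negative wbits = no wrapper).
co = zlib.compressobj(wbits=-15)
raw = co.compress(data) + co.flush()

zlib_stream = zlib.compress(data)  # RFC 1950: 2-byte header, Adler-32 trailer
gzip_stream = gzip.compress(data)  # RFC 1952: 10-byte header, CRC-32 and size trailer

# ZLIB streams start with 0x78 at the default window size; gzip with 0x1f 0x8b.
print(zlib_stream[:1].hex(), gzip_stream[:2].hex())  # prints 78 1f8b

# All three decode back to the same bytes.
assert zlib.decompress(raw, wbits=-15) == data
assert zlib.decompress(zlib_stream) == data
assert gzip.decompress(gzip_stream) == data
```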

== See also ==

* [[Comparison of file archivers]]
* [[Comparison of archive formats]]
* [[List of archive formats]]
* [[LZW]]

== References ==

{{refs|30em}}

== External links ==
* [http://www.pkware.com/support/zip-app-note/ .ZIP Application Note] contains links to .ZIP File Format Specification
* [ftp://ftp.info-zip.org/pub/infozip/doc/ Technical specifications of the PKZIP file formats annotated by Info-ZIP] + [http://web.archive.org/web/*/http://www.info-zip.org/pub/infozip/doc/ archived version]
* [http://www.bbsdocumentary.com/library/CONTROVERSY/LAWSUITS/SEA/judgment.txt Judgment in favor of SEA in ''SEA v. PKWARE and Phil Katz'']
* [http://www.pkware.com/documents/casestudies/APPNOTE.TXT Current file format specification from PKWARE (including many recent features that are not widely supported)]
* [http://www.c10n.info/archives/430 18 Years of ZIP format: Happy Birthday] at The Data Compression News Blog
* [http://rlwpx.free.fr/WPFF/comploc.htm Comparison of the performance of various methods of data compression (in French)]
* [http://www.dlugosz.com/ZIP2/index.html ZIP2 file format specification]
* [http://research.swtch.com/2010/03/zip-files-all-way-down.html Zip Files All The Way Down]
* [http://www.steike.com/code/useless/zip-file-quine/ ZIP File Quine]
* [http://www.unziponline.net Free tool to zip and unzip files online]
* [http://mindprod.com/jgloss/zip.html#GOTCHAS Limitations of <code>java.util.zip</code>]

{{Archive formats}}

{{DEFAULTSORT:Zip (File Format)}}
[[Category:Archive formats]]
[[Category:American inventions]]

[[bg:ZIP]]
[[ca:ZIP]]
[[cs:ZIP (souborový formát)]]
[[cy:ZIP (fformat ffeil)]]
[[de:ZIP-Dateiformat]]
[[es:Formato de compresión ZIP]]
[[eo:ZIP (densigilo)]]
[[fa:زدآی‌پی]]
[[fr:ZIP (format de fichier)]]
[[ko:ZIP (파일 포맷)]]
[[id:ZIP (format berkas)]]
[[it:ZIP (formato di file)]]
[[he:ZIP (פורמט)]]
[[lb:.zip]]
[[mr:झिप]]
[[nl:ZIP (bestandstype)]]
[[ja:ZIP (ファイルフォーマット)]]
[[no:ZIP]]
[[uz:ZIP]]
[[pl:ZIP]]
[[pt:ZIP]]
[[ru:ZIP]]
[[simple:ZIP (file format)]]
[[sl:Datotečni format ZIP]]
[[sr:ZIP]]
[[sh:ZIP]]
[[fi:ZIP]]
[[sv:Zip]]
[[tr:ZIP]]
[[uk:Zip]]
[[wuu:Zip]]
[[zh:ZIP格式]]

Revision as of 17:21, 24 September 2012
