HTTP compression

From Wikipedia, the free encyclopedia

HTTP compression is a capability that can be built into web servers and web clients to make better use of available bandwidth and provide faster transmission between them.[1] HTTP data is compressed before it is sent from the server: compliant browsers announce to the server which methods they support before downloading the data in a matching format; browsers that do not support a compliant compression method download uncompressed data. The most common compression schemes are gzip and deflate; a full list of available schemes is maintained by IANA.[2] Additionally, third parties develop new methods and include them in their products (e.g. the Google SDCH scheme implemented in the Google Chrome browser and used on certain Google servers).

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to increased page load times when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, when proxies are used (with overcautious web browsers), when servers are misconfigured, and when browser bugs prevent compression from being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy, a common configuration in corporate environments, was the mainstream browser most prone to falling back to uncompressed HTTP.[3]

Client/Server compression scheme negotiation

In most cases, excluding SDCH, the negotiation is done in two steps, described in RFC 2616:

1. The web client includes an Accept-Encoding field in the HTTP request, listing the names of the compression schemes it supports (called content-coding tokens), separated by commas.

GET /encrypted-area HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate

2. If the server supports one or more of these compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. In that case, the server adds a Content-Encoding field to the HTTP response naming the schemes used, separated by commas.

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip

The web server is by no means obligated to use any compression method; this depends on the internal settings of the web server and may also depend on the internal architecture of the website in question.
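
The two negotiation steps above can be sketched in Python as follows. This is a minimal illustration rather than how any particular web server implements it: the function names `parse_accept_encoding` and `negotiate`, and the server preference order, are assumptions made for the example, and only a simplified form of quality values (`;q=0` disables a token) is handled.

```python
import gzip
import zlib


def parse_accept_encoding(header):
    """Return the set of content-coding tokens the client accepts.

    Simplified: a token is dropped only when it carries q=0.
    """
    accepted = set()
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:
            accepted.add(token)
    return accepted


def negotiate(body, accept_encoding, server_preference=("gzip", "deflate")):
    """Pick the first server-preferred coding that the client also accepts.

    Returns (payload, extra_headers). If nothing matches, the body is sent
    unchanged and no Content-Encoding header is added.
    """
    accepted = parse_accept_encoding(accept_encoding)
    for coding in server_preference:
        if coding not in accepted:
            continue
        if coding == "gzip":
            return gzip.compress(body), {"Content-Encoding": "gzip"}
        if coding == "deflate":
            # zlib.compress produces the zlib format (RFC 1950).
            return zlib.compress(body), {"Content-Encoding": "deflate"}
    return body, {}


page = b"<html>" + b"hello " * 100 + b"</html>"
payload, headers = negotiate(page, "gzip, deflate")
print(headers, len(page), len(payload))
```

A client that sends no Accept-Encoding field, or only tokens the server does not support, receives the original bytes without a Content-Encoding field.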

In the case of SDCH, a dictionary negotiation is also required, which may involve additional steps, such as downloading a proper dictionary from an external server.

Content-coding tokens

  • compress - UNIX "compress" program method
  • deflate - despite its name, the zlib format (RFC 1950) should be used, wrapping a deflate-compressed stream (RFC 1951), as described in RFC 2616. Real-world implementations, however, vary between the zlib format and the raw deflate format.[4][5] Due to this confusion, gzip has established itself as the more reliable default method (as of March 2011).
  • exi - W3C Efficient XML Interchange
  • gzip - GNU zip format (described in RFC 1952). This method is the most broadly supported as of March 2011.[6]
  • identity - No transformation is used. This is the default value for content coding.
  • pack200-gzip - Network Transfer Format for Java Archives [7]
  • sdch - Google Shared Dictionary Compression for HTTP
  • bzip2 - free and open source lossless data compression algorithm
  • peerdist - Microsoft Peer Content Caching and Retrieval (described in MS-PCCRPT)
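
The difference between the zlib, raw deflate and gzip formats mentioned in the list above can be made concrete with Python's standard `zlib` module, where the `wbits` argument selects the wrapper. The helper names `compress_with` and `decode_deflate` are hypothetical, and the fallback strategy shown (try zlib first, then raw deflate) is one possible way for a client to tolerate both interpretations of the deflate token, not a prescribed one.

```python
import gzip
import zlib

data = b"HTTP compression example. " * 20


def compress_with(wbits):
    """Compress `data` using the container format selected by wbits."""
    compressor = zlib.compressobj(wbits=wbits)
    return compressor.compress(data) + compressor.flush()


zlib_stream = compress_with(15)    # zlib format (RFC 1950): header + Adler-32 trailer
raw_deflate = compress_with(-15)   # raw deflate stream (RFC 1951): no wrapper
gzip_stream = compress_with(31)    # gzip format (RFC 1952): 16 + 15


def decode_deflate(payload):
    """Decode a body labelled 'deflate', accepting either interpretation."""
    try:
        return zlib.decompress(payload)        # zlib-wrapped, as RFC 2616 describes
    except zlib.error:
        return zlib.decompress(payload, -15)   # fall back to raw deflate


print(len(zlib_stream), len(raw_deflate), len(gzip_stream))
print(decode_deflate(zlib_stream) == decode_deflate(raw_deflate) == data)
```

The three outputs carry the same deflate-compressed data and differ only in the bytes wrapped around it, which is why a receiver has to guess when a server labels a raw stream as deflate.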

Servers that support HTTP compression

HTTP compression can also be achieved by using the functionality of server-side scripting languages such as PHP, or of programming languages such as Java.
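
As an illustration of compression performed by application code rather than by the web server itself, the following sketch uses Python's WSGI interface in place of PHP or Java. The `application` callable and the page content are made up for the example, and quality values in Accept-Encoding are ignored for brevity.

```python
import gzip


def application(environ, start_response):
    """WSGI application that gzips its own output when the client allows it."""
    body = b"<html><body>" + b"Hello, world. " * 50 + b"</body></html>"
    headers = [("Content-Type", "text/html; charset=UTF-8")]

    # WSGI exposes the Accept-Encoding request field under this key.
    accept = environ.get("HTTP_ACCEPT_ENCODING", "")
    tokens = [item.split(";")[0].strip().lower() for item in accept.split(",")]

    if "gzip" in tokens:
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
        # Tell caches that the response depends on the request's Accept-Encoding.
        headers.append(("Vary", "Accept-Encoding"))

    # Content-Length must describe the bytes actually sent.
    headers.append(("Content-Length", str(len(body))))
    start_response("200 OK", headers)
    return [body]
```

The callable can be served for local testing with `wsgiref.simple_server.make_server` from the standard library.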

References

  1. ^ "Using HTTP Compression (IIS 6.0)". Microsoft Corporation. Retrieved 9 February 2010.
  2. ^ RFC 2616, Section 3.5: "The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens."
  3. ^ http://code.google.com/speed/articles/use-compression.html
  4. ^ "Compression Tests". Verve Studios, Co. Retrieved 23 March 2011.
  5. ^ "Frequently Asked Questions about zlib - What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?". Greg Roelofs, Jean-loup Gailly and Mark Adler. Retrieved 23 March 2011.
  6. ^ "Compression Tests: Results". Verve Studios, Co. Retrieved 23 March 2011.
  7. ^ JSR 200: Network Transfer Format for Java Archives.
  8. ^ "Compression Tests". Verve Studios, Co. Retrieved 23 March 2011.
  9. ^ "HOWTO: Use Apache mod_deflate To Compress Web Content (Accept-Encoding: gzip) - Mark S. Kolich". Mark S. Kolich. Retrieved 23 March 2011.

External links