HTTP compression
HTTP compression is a capability that can be built into web servers and web clients to make better use of available bandwidth and to provide faster transmission between them.[1] HTTP data is compressed before it is sent from the server: compliant browsers announce to the server which methods they support before downloading the data in the matching format; browsers that do not support a compliant compression method download uncompressed data. The most common compression schemes are gzip and deflate; a full list of available schemes is maintained by IANA.[2] Additionally, third parties develop new methods and include them in their products (e.g. the Google SDCH scheme, implemented in the Google Chrome browser and used on certain Google servers).
A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to page load time increases when users do not receive compressed content. This occurs where anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overcautious web browsers), where servers are misconfigured, and where browser bugs stop compression from being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy, a common configuration in corporate environments, was the mainstream browser most prone to falling back to uncompressed HTTP.[3]
Client/Server compression scheme negotiation
In most cases, excluding SDCH, the negotiation is done in two steps, described in RFC 2616:
1. The web client includes an Accept-Encoding field in the HTTP request, listing the names of the compression schemes it supports (called content-coding tokens), separated by commas.
    GET /encrypted-area HTTP/1.1
    Host: www.example.com
    Accept-Encoding: gzip, deflate
2. If the server supports one or more of the listed compression schemes, it may compress the outgoing data with one or more methods supported by both parties. In that case, the server adds a Content-Encoding field to the HTTP response naming the schemes used, separated by commas.
    HTTP/1.1 200 OK
    Date: Mon, 23 May 2005 22:38:34 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    Etag: "3f80f-1b6-3e1cb03b"
    Accept-Ranges: bytes
    Content-Length: 438
    Connection: close
    Content-Type: text/html; charset=UTF-8
    Content-Encoding: gzip
The web server is by no means obligated to use any compression method; this depends on the internal settings of the web server and may also depend on the internal architecture of the website in question.
With SDCH, a dictionary negotiation is also required, which may involve additional steps such as downloading the proper dictionary from an external server.
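The two-step negotiation can be sketched from the server's side in Python. This is a minimal illustration rather than a complete implementation: the names `SUPPORTED`, `parse_accept_encoding`, and `encode_body` are hypothetical, the server's preference order is an assumption, and quality values are only checked for `q=0`. For the deflate token the sketch emits the zlib format (RFC 1950), as RFC 2616 prescribes.

```python
import gzip
import zlib

# Codings this hypothetical server can produce, in its own order of preference.
SUPPORTED = ("gzip", "deflate")


def parse_accept_encoding(header_value):
    """Return the content-coding tokens listed by the client, lowercased.

    Simplification: quality values are ignored, except that a token
    marked q=0 is treated as not acceptable and dropped.
    """
    tokens = []
    for part in header_value.split(","):
        name, _, params = part.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                if float(params[2:]) == 0:
                    continue
            except ValueError:
                pass  # malformed q-value: keep the token
        tokens.append(name)
    return tokens


def encode_body(body, accept_encoding):
    """Step 2: pick a coding both parties support; return (headers, payload)."""
    offered = parse_accept_encoding(accept_encoding or "")
    for coding in SUPPORTED:
        if coding in offered:
            if coding == "gzip":
                payload = gzip.compress(body)
            else:
                payload = zlib.compress(body)  # zlib format, RFC 1950
            headers = {
                "Content-Encoding": coding,
                "Content-Length": str(len(payload)),
            }
            return headers, payload
    # No common coding: send the body unchanged (identity).
    return {"Content-Length": str(len(body))}, body
```

Calling `encode_body(page, "gzip, deflate")` returns a gzip payload together with a `Content-Encoding: gzip` header, while an empty or missing Accept-Encoding value yields the uncompressed body with no Content-Encoding field.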
Content-coding tokens
- compress - UNIX "compress" program method
- deflate - despite its name, RFC 2616 defines this coding as the zlib format (RFC 1950) wrapped around a deflate-compressed stream (RFC 1951). Real-world implementations, however, vary between sending zlib-wrapped data and sending a raw deflate stream.[4][5] Due to this confusion, gzip has positioned itself as the more reliable default method (March 2011).
- exi - W3C Efficient XML Interchange
- gzip - GNU zip format (described in RFC 1952). This method is the most broadly supported as of March 2011.[6]
- identity - No transformation is used. This is the default value for content coding.
- pack200-gzip - Network Transfer Format for Java Archives[7]
- sdch - Google Shared Dictionary Compression for HTTP
- bzip2 - free and open source lossless data compression algorithm
- peerdist - Microsoft Peer Content Caching and Retrieval (described in MS-PCCRPT)
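The difference between the gzip token and the two interpretations of the deflate token can be made concrete with Python's standard `zlib` and `gzip` modules. The sketch below is illustrative only: the sample data and the helper name `decode_deflate` are assumptions, and the try-then-fall-back strategy is one plausible way for a client to tolerate both variants, not a method prescribed by the RFCs.

```python
import gzip
import zlib

data = b"HTTP compression example. " * 40

# zlib format (RFC 1950): 2-byte header, deflate stream, Adler-32 trailer.
zlib_wrapped = zlib.compress(data)

# Raw deflate stream (RFC 1951): a negative wbits value tells zlib to
# omit the zlib header and trailer.
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_deflate = compressor.compress(data) + compressor.flush()

# gzip format (RFC 1952): begins with the magic bytes 0x1f 0x8b.
gzipped = gzip.compress(data)


def decode_deflate(payload):
    """Decode a "deflate" body that may be zlib-wrapped or a raw stream."""
    try:
        return zlib.decompress(payload)  # expects the zlib wrapper
    except zlib.error:
        # Fall back to interpreting the payload as a raw deflate stream.
        return zlib.decompress(payload, wbits=-zlib.MAX_WBITS)
```

A receiver that only attempts one of the two interpretations fails on servers that send the other, which is the interoperability problem described for the deflate token above.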
Servers that support HTTP compression
- Microsoft IIS: built-in or using 3rd-party module
- Apache HTTP Server, via mod_deflate (which, despite its name, currently supports only gzip[8][9]) or mod_gzip
- Sun Java System Web Server
- Zeus Web Server
- Lighttpd, via mod_compress and the newer mod_deflate (1.5.x)
- Nginx - built-in
- Geoserver
HTTP compression can also be achieved by using the functionality of server-side programming languages, such as PHP or Java.
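As an illustration of application-level compression, here is a minimal sketch written in Python instead of PHP or Java, as a WSGI middleware. The names `gzip_middleware` and `demo_app` are hypothetical, and the sketch makes simplifying assumptions: it buffers the whole response body, ignores quality values, and does not support the legacy WSGI `write()` callable.

```python
import gzip


def gzip_middleware(app):
    """Wrap a WSGI application and gzip its response if the client accepts it."""

    def wrapped(environ, start_response):
        accepted = environ.get("HTTP_ACCEPT_ENCODING", "").lower()
        tokens = [t.split(";")[0].strip() for t in accepted.split(",")]
        if "gzip" not in tokens:
            # Client did not offer gzip: pass the response through untouched.
            return app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Record status and headers so they can be adjusted later.
            captured["status"] = status
            captured["headers"] = list(headers)

        result = app(environ, capture)
        try:
            body = b"".join(result)  # buffer the full body (simplification)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        if any(name.lower() == "content-encoding" for name, _ in headers):
            # Already encoded by the application: do not compress twice.
            start_response(captured["status"], headers)
            return [body]

        payload = gzip.compress(body)
        headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
        headers += [
            ("Content-Encoding", "gzip"),
            ("Content-Length", str(len(payload))),
            ("Vary", "Accept-Encoding"),
        ]
        start_response(captured["status"], headers)
        return [payload]

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical application used only to exercise the middleware."""
    body = b"hello world " * 100
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Wrapping is a one-line change, `application = gzip_middleware(demo_app)`, which is why this approach is convenient when the web server's own compression settings cannot be changed.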
References
- ^ "Using HTTP Compression (IIS 6.0)". Microsoft Corporation. Retrieved 9 February 2010.
- ^ RFC 2616, Section 3.5: "The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens."
- ^ http://code.google.com/speed/articles/use-compression.html
- ^ "Compression Tests". Verve Studios, Co. Retrieved 23 March 2011.
- ^ "Frequently Asked Questions about zlib - What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?". Greg Roelofs, Jean-loup Gailly and Mark Adler. Retrieved 23 March 2011.
- ^ "Compression Tests: Results". Verve Studios, Co. Retrieved 23 March 2011.
- ^ JSR 200: Network Transfer Format for Java Archives.
- ^ "Compression Tests". Verve Studios, Co. Retrieved 23 March 2011.
- ^ "HOWTO: Use Apache mod_deflate To Compress Web Content (Accept-Encoding: gzip) - Mark S. Kolich". Mark S. Kolich. Retrieved 23 March 2011.
External links
- RFC 2616: Hypertext Transfer Protocol - HTTP/1.1
- HTTP Content-Coding Values by Internet Assigned Numbers Authority
- Apache: mod_deflate & mod_gzip
- Compression with lighttpd
- Coding Horror: HTTP Compression on IIS 6.0
- 15 Seconds: Web Site Compression
- HTTP Compression: resource page by the founder of VIGOS AG, Constantin Rack
- Using HTTP Compression by Martin Brown of Server Watch
- Using HTTP Compression in PHP
- check http compression
- Dynamic and static HTTP compression with Apache httpd