HTTP data is compressed before it is sent from the server. Compliant browsers announce to the server which compression methods they support before downloading content in the appropriate format. Browsers that do not support any compliant compression method download the data uncompressed. Compression as defined in RFC 2616 can be applied to the response data from the server (but not to the request data from the client).
The most common compression schemes include gzip and deflate; a full list of available schemes is maintained by IANA. Third parties also develop new methods and include them in their products. One example is the Google Shared Dictionary Compression over HTTP (SDCH) scheme, which is implemented in the Google Chrome browser and used on certain Google servers.
Client/Server compression scheme negotiation
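In most cases (SDCH being an exception), the negotiation is done in two steps, as described in RFC 2616:
1. The web client advertises which compression schemes it supports by listing their tokens in the Accept-Encoding field of the HTTP request.
    GET /encrypted-area HTTP/1.1
    Host: www.example.com
    Accept-Encoding: gzip, deflate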
2. If the server supports one or more of those compression schemes, it may compress the outgoing data using one or more methods supported by both parties. If it does, the server adds a Content-Encoding field to the HTTP response. This field lists the applied schemes, separated by commas.
    HTTP/1.1 200 OK
    Date: Wed, 16 Apr 2014 22:38:34 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    Accept-Ranges: bytes
    Content-Length: 438
    Connection: close
    Content-Type: text/html; charset=UTF-8
    Content-Encoding: gzip
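The two steps can be sketched with Python's standard gzip and zlib modules.

- **Server side:** the server picks the first coding from its own preference list that the client accepts with a non-zero q-value. It compresses the body and reports the choice in Content-Encoding.
- **Client side:** the client undoes the listed codings in reverse order, because RFC 2616 requires them to be listed in the order they were applied.

`SERVER_PREFERENCE`, `accepted_codings`, `encode_response` and `decode_body` are hypothetical names. The parser is deliberately simplified: it ignores the `*` wildcard and entries with more than one parameter.

```python
import gzip
import zlib

# Hypothetical server preference order, limited to codings this sketch can produce.
SERVER_PREFERENCE = ["gzip", "deflate"]


def accepted_codings(accept_encoding: str) -> set[str]:
    """Codings the client lists with a non-zero q-value (simplified parse)."""
    accepted = set()
    for item in accept_encoding.split(","):
        token, _, param = item.partition(";")
        name, _, value = param.partition("=")
        try:
            q = float(value) if name.strip().lower() == "q" else 1.0
        except ValueError:
            q = 0.0  # treat a malformed q-value as "not acceptable"
        token = token.strip().lower()
        if token and q > 0:
            accepted.add(token)
    return accepted


def encode_response(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Server side: compress with the first mutually supported coding, if any."""
    accepted = accepted_codings(accept_encoding)
    for coding in SERVER_PREFERENCE:
        if coding in accepted:
            if coding == "gzip":
                encoded = gzip.compress(body)
            else:
                # RFC 2616 "deflate" means zlib-wrapped (RFC 1950) deflate data.
                encoded = zlib.compress(body)
            return encoded, {"Content-Encoding": coding, "Content-Length": str(len(encoded))}
    # No common coding: fall back to the identity (uncompressed) representation.
    return body, {"Content-Length": str(len(body))}


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Client side: undo each listed coding, the last-applied one first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding == "gzip":
            body = gzip.decompress(body)
        elif coding == "deflate":
            body = zlib.decompress(body)
        elif coding != "identity":
            raise ValueError(f"unsupported content-coding: {coding}")
    return body


html = b"<html><body>" + b"Hello, compression! " * 50 + b"</body></html>"
payload, headers = encode_response(html, "gzip, deflate")
print(headers["Content-Encoding"], len(html), "->", len(payload))
assert decode_body(payload, headers["Content-Encoding"]) == html
```

The reversed loop only matters when more than one coding was applied, for example a body that was gzipped and then deflated ("Content-Encoding: gzip, deflate").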
The web server is not obliged to use any compression method. Whether it does depends on its configuration, and may also depend on the architecture of the website in question.
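Whether a response gets compressed is usually a matter of server configuration. A common policy is to skip very small bodies and media types that are already compressed.

The sketch below expresses such a policy in Python. The threshold, the type list and the function name `should_compress` are chosen purely for illustration. They are not taken from any particular server's defaults.

```python
# Illustrative policy values; real servers expose similar knobs in their own configuration.
MIN_LENGTH = 256  # hypothetical: bodies shorter than this are sent uncompressed
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/css",
    "text/plain",
    "application/javascript",
    "application/json",
    "image/svg+xml",
}


def should_compress(content_type: str, content_length: int, already_encoded: bool = False) -> bool:
    """Decide whether a response is worth compressing under this hypothetical policy."""
    if already_encoded:
        # The body already carries a Content-Encoding; leave it as it is.
        return False
    # Ignore parameters such as "; charset=UTF-8" when matching the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in COMPRESSIBLE_TYPES and content_length >= MIN_LENGTH


print(should_compress("text/html; charset=UTF-8", 438))  # True
print(should_compress("image/jpeg", 50_000))             # False: already a compressed format
print(should_compress("application/json", 40))           # False: too small to benefit
```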
In the case of SDCH, dictionary negotiation is also required. This may involve additional steps, such as downloading a suitable dictionary from an external server.
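SDCH itself encodes responses as VCDIFF deltas (RFC 3284) against the negotiated dictionary. The benefit of a dictionary that both sides already hold can be illustrated with a different, simpler mechanism: zlib's preset dictionary, which Python exposes through the `zdict` argument.

This sketch is an analogy, not SDCH. The dictionary contents and the helper names are made up.

```python
import zlib

# Made-up shared dictionary: boilerplate that both sides are assumed to already hold.
SHARED_DICTIONARY = (
    b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Example</title>'
    b'<link rel="stylesheet" href="/static/site.css"></head><body><div class="content">'
)


def compress_with_dictionary(data: bytes, zdict: bytes) -> bytes:
    """Deflate data, allowing back-references into the preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dictionary(data: bytes, zdict: bytes) -> bytes:
    # The receiver must supply the same dictionary, much as SDCH requires dictionary negotiation.
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


page = SHARED_DICTIONARY + b"<p>Hello, world</p></div></body></html>"
plain = zlib.compress(page, 9)
with_dict = compress_with_dictionary(page, SHARED_DICTIONARY)
assert decompress_with_dictionary(with_dict, SHARED_DICTIONARY) == page
print(len(page), len(plain), len(with_dict))
```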
Problems preventing the use of HTTP compression
A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to increased page load times when users do not receive compressed content. This happens in several situations:
- anti-virus software interferes with connections to force them to be uncompressed;
- proxies are used (with overcautious web browsers);
- servers are misconfigured;
- browser bugs prevent compression from being used.
Internet Explorer 6 drops to HTTP 1.0 (without features such as compression or pipelining) when behind a proxy, which is a common configuration in corporate environments. It was the mainstream browser most prone to falling back to uncompressed HTTP.
The most common encoding and compression schemes currently supported in browsers and HTTP libraries (such as cURL) are:
- identity – No transformation is used. This is the default value for content coding.
- gzip – GNU zip format (described in RFC 1952). This method is the most broadly supported as of March 2011.
- deflate – despite its name, RFC 2616 specifies the zlib format (RFC 1950) wrapping deflate-compressed data (RFC 1951). Real-world implementations, however, vary between sending zlib-wrapped and raw deflate data. Because of this confusion, gzip has established itself as the more reliable default method for software implementations (March 2011). Deflate, however, is reliably supported by hardware. This makes it more suitable for server-side compression and for front-end proxies that handle content over high-bandwidth links.
- sdch – Google Shared Dictionary Compression for HTTP, based on RFC 3284. It is supported natively in recent versions of Chrome, Chromium and Android, as well as on Google websites, and may be supported in other popular browsers as well. It is frequently used on websites and portals related to the distribution of audio, video, TV and game media.
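Real servers have labelled both zlib-wrapped and raw deflate streams as "deflate". Tolerant clients therefore commonly try the zlib format (RFC 1950) first and fall back to raw deflate data (RFC 1951).

Here is a minimal sketch using Python's zlib module. The `wbits` argument selects the container:
- a positive value for zlib;
- a negative value for raw deflate;
- 16 plus the window size for gzip.

`raw_deflate` and `decode_deflate` are hypothetical helper names.

```python
import gzip
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Produce a raw RFC 1951 stream (no zlib header or Adler-32 trailer)."""
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def decode_deflate(body: bytes) -> bytes:
    """Decode a "deflate" body that may be either zlib-wrapped or raw deflate."""
    try:
        return zlib.decompress(body)  # RFC 1950: zlib header + deflate data + Adler-32
    except zlib.error:
        return zlib.decompress(body, -zlib.MAX_WBITS)  # raw RFC 1951 data


data = b"deflate or zlib? " * 20
zlib_wrapped = zlib.compress(data)
raw = raw_deflate(data)
gzipped = gzip.compress(data)

# The containers start differently: zlib commonly with 0x78, gzip always with 0x1f 0x8b.
print(zlib_wrapped[:2].hex(), raw[:2].hex(), gzipped[:2].hex())
assert decode_deflate(zlib_wrapped) == data
assert decode_deflate(raw) == data
# wbits = 16 + MAX_WBITS tells zlib to expect a gzip (RFC 1952) wrapper instead.
assert zlib.decompress(gzipped, 16 + zlib.MAX_WBITS) == data
```

Trying zlib first is the safe order: zlib validates its header and checksum, so raw data is rejected with `zlib.error` rather than silently misread.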
Other content codings, which may require plugins or application-specific libraries, or which were used historically:
- compress – UNIX "compress" program method (historic; deprecated in most applications and replaced by gzip or deflate)
- exi – W3C Efficient XML Interchange
- pack200-gzip – Network Transfer Format for Java Archives
- bzip2 – free and open source lossless data compression algorithm
- peerdist – Microsoft Peer Content Caching and Retrieval (described in MS-PCCRPT)
- lzma – ELinks supports LZMA via a compile-time option. Support for LZMA has been proposed for Firefox and Gecko, and a patch has been discussed. This is particularly interesting for smartphones and tablets, where bandwidth is limited, because LZMA has a very high compression ratio compared to gzip.
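Compression ratios such as the LZMA-versus-gzip comparison above depend heavily on the input. Python's standard library ships gzip, bz2 and lzma modules, so it is straightforward to compare them on a sample of one's own content. The generated sample below is arbitrary, and its sizes are not a general benchmark.

```python
import bz2
import gzip
import lzma

# Arbitrary sample content; substitute a real page or asset for meaningful numbers.
sample = b"".join(
    b'<tr><td class="id">%d</td><td class="name">item-%d</td><td>%d</td></tr>\n' % (i, i, i * i)
    for i in range(2000)
)

# (compress, decompress) pairs from the standard library.
schemes = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (compress, decompress) in schemes.items():
    packed = compress(sample)
    assert decompress(packed) == sample  # all three schemes are lossless
    sizes[name] = len(packed)
    print(f"{name:6} {len(sample)} -> {len(packed)} bytes ({len(packed) / len(sample):.1%})")
```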
Servers that support HTTP compression
- SAP NetWeaver
- Microsoft IIS: built-in or via a third-party module
- Apache HTTP Server, via mod_deflate (despite its name currently only supporting gzip) or mod_gzip
- Hiawatha HTTP server: serves pre-compressed files
- Cherokee HTTP server: on-the-fly gzip and deflate compression
- Oracle iPlanet Web Server
- Zeus Web Server
- lighttpd, via mod_compress and the newer mod_deflate (1.5.x)
- nginx – built-in
- Applications based on Tornado, if "gzip" is set to True in the application settings
- Jetty Server – built into default static content serving and available via servlet filter configurations
- Apache Tomcat
- IBM WebSphere
- Ruby Rack, via the Rack::Deflater middleware
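Several of the servers above implement compression as a filter or middleware that wraps the application's response, for example Rack::Deflater and Jetty's servlet filter. The sketch below shows that shape as Python WSGI middleware. It is written in the spirit of Rack::Deflater rather than as a port of it.

`GzipMiddleware` and `hello_app` are hypothetical names. A production version would also need to:
- honour q-values;
- skip small or incompressible responses;
- stream the response instead of buffering it.

```python
import gzip
from wsgiref.util import setup_testing_defaults


class GzipMiddleware:
    """Hypothetical WSGI middleware that gzips buffered responses for gzip-capable clients."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        accept = environ.get("HTTP_ACCEPT_ENCODING", "")
        # Simplified token check; a fuller version would honour q-values such as "gzip;q=0".
        tokens = {part.split(";")[0].strip().lower() for part in accept.split(",")}
        if "gzip" not in tokens:
            return self.app(environ, start_response)

        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return chunks.append  # legacy write() calls are buffered too

        result = self.app(environ, capture)
        try:
            chunks.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        if any(name.lower() == "content-encoding" for name, _ in headers):
            # Already encoded by the application: pass it through untouched.
            start_response(captured["status"], headers)
            return [b"".join(chunks)]

        body = gzip.compress(b"".join(chunks))
        headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
        headers += [
            ("Content-Encoding", "gzip"),
            ("Content-Length", str(len(body))),
            ("Vary", "Accept-Encoding"),  # tells caches the response depends on Accept-Encoding
        ]
        start_response(captured["status"], headers)
        return [body]


def hello_app(environ, start_response):
    body = b"Hello, compressed world! " * 40
    start_response("200 OK", [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))])
    return [body]


environ = {"HTTP_ACCEPT_ENCODING": "gzip, deflate"}
setup_testing_defaults(environ)  # fill in the remaining required WSGI keys
seen = {}


def record_start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, dict(headers)


response_body = b"".join(GzipMiddleware(hello_app)(environ, record_start_response))
assert seen["headers"]["Content-Encoding"] == "gzip"
assert gzip.decompress(response_body) == b"Hello, compressed world! " * 40
```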
Security implications
In 2012, a general attack against the use of data compression, called CRIME, was announced. CRIME could work effectively against a large number of protocols, including but not limited to TLS and application-layer protocols such as SPDY or HTTP. However, only exploits against TLS and SPDY were demonstrated, and these were largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined.
In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS-encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted). It requires the attacker to trick the victim into visiting a malicious web link. All versions of TLS and SSL are at risk from BREACH, regardless of the encryption algorithm or cipher used. Previous instances of CRIME can be defended against by turning off TLS compression or SPDY header compression. BREACH, by contrast, exploits HTTP compression, which cannot realistically be turned off, as virtually all web servers rely on it to improve data transmission speeds for users.
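The core observation behind CRIME and BREACH concerns attacker-controlled text compressed in the same stream as a secret: the compressed length shrinks when the attacker's text repeats part of the secret. The sketch below shows that length difference with zlib.

The page template, token value and guesses are made up. A real attack recovers the secret one character at a time over many requests, observing only the lengths of encrypted responses.

```python
import zlib

SECRET_TOKEN = "csrf_token=4b7e19d2c0a85f36"  # made-up secret embedded in every response


def response_length(reflected_input: str) -> int:
    """Compressed size of a hypothetical page that echoes user input next to a secret."""
    page = (
        "<html><body>"
        f'<form><input type="hidden" value="{SECRET_TOKEN}"></form>'
        f"<p>You searched for: {reflected_input}</p>"
        "</body></html>"
    )
    return len(zlib.compress(page.encode()))


right = response_length("csrf_token=4b7e19d2c0a85f36")
wrong = response_length("csrf_token=9a3f60e1b8d27c54")
print(right, wrong)
# A correct guess becomes one back-reference to the secret, so its response is shorter.
assert right < wrong
```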
References
- "Using HTTP Compression (IIS 6.0)". Microsoft Corporation. Retrieved 9 February 2010.
- RFC 2616, Section 3.5: "The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens."
- "Use compression to make the web faster". Google Developers. Retrieved 22 May 2013.
- "Compression Tests: Results". Verve Studios, Co. Retrieved 19 July 2012.
- "Compression Tests". Verve Studios, Co. Retrieved 19 July 2012.
- "Frequently Asked Questions about zlib – What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?". Greg Roelofs, Jean-loup Gailly and Mark Adler. Retrieved 23 March 2011.
- JSR 200: Network Transfer Format for Java Archives.
- elinks LZMA decompression
- "HOWTO: Use Apache mod_deflate To Compress Web Content (Accept-Encoding: gzip) – Mark S. Kolich". Mark S. Kolich. Retrieved 23 March 2011.
- Extra part of Hiawatha webserver's manual
- Goodin, Dan (1 August 2013). "Gone in 30 seconds: New attack plucks secrets from HTTPS-protected pages". Ars Technica. Condé Nast. Retrieved 2 August 2013.
- Leyden, John (2 August 2013). "Step into the BREACH: New attack developed to read encrypted web data". The Register. Retrieved 2 August 2013.
External links
- RFC 2616: Hypertext Transfer Protocol – HTTP/1.1
- HTTP Content-Coding Values by Internet Assigned Numbers Authority
- Apache: mod_deflate & mod_gzip
- Compression with lighttpd
- Coding Horror: HTTP Compression on IIS 6.0
- 15 Seconds: Web Site Compression at the Wayback Machine (archived July 16, 2011)
- HTTP Compression: resource page by the founder of VIGOS AG, Constantin Rack
- Using HTTP Compression by Martin Brown of Server Watch
- Using HTTP Compression in PHP
- check http compression
- Dynamic and static HTTP compression with Apache httpd
- Browser HTTP Compression Test