Chunked transfer encoding
|This article needs additional citations for verification. (June 2014)|
Chunked transfer encoding is a data transfer mechanism in version 1.1 of the Hypertext Transfer Protocol (HTTP) in which data is sent in a series of "chunks". It uses the Transfer-Encoding HTTP header in place of the Content-Length header, which the earlier version of the protocol would otherwise require. Because the Content-Length header is not used, the sender does not need to know the length of the content before it starts transmitting a response to the receiver. Senders can begin transmitting dynamically-generated content before knowing the total size of that content.
The size of each chunk is sent right before the chunk itself so that the receiver can tell when it has finished receiving data for that chunk. The data transfer is terminated by a final chunk of length zero.
An early form of the chunked encoding was proposed in 1994. Later it was standardized in HTTP 1.1.
The introduction of chunked encoding provided various benefits:
- Chunked transfer encoding allows a server to maintain an HTTP persistent connection for dynamically generated content. In this case the HTTP Content-Length header cannot be used to delimit the content and the next HTTP request/response, as the content size is as yet unknown. Chunked encoding has the benefit that it is not necessary to generate the full content before writing the header, as it allows streaming of content as chunks and explicitly signaling the end of the content, making the connection available for the next HTTP request/response.
- Chunked encoding allows the sender to send additional header fields after the message body. This is important in cases where values of a field cannot be known until the content has been produced such as when the content of the message must be digitally signed. Without chunked encoding, the sender would have to buffer the content until it was complete in order to calculate a field value and send it before the content.
- HTTP servers often use compression (gzip or deflate methods) to optimize transmission. The interaction between chunked and gzip encoding is dictated by the two-staged encoding of HTTP: first the content stream is encoded as (Content-Encoding: gzip), after which the resulting byte stream is encoded for transfer using another encoder (Transfer-Encoding: chunked). This means that in case both compression and chunked encoding are enabled, the chunk encoding itself is not compressed, and the data in each chunk should not be compressed individually. The remote endpoint can decode the incoming stream by first decoding it with the Transfer-Encoding, followed by the specified Content-Encoding.
For version 1.1 of the HTTP protocol, the chunked transfer mechanism is considered to be always and anyways acceptable, even if not listed in the TE (transfer encoding) request header field, and when used with other transfer mechanisms, should always be applied last to the transferred data and never more than one time. This transfer coding method also allows additional entity header fields to be sent after the last chunk if the client specified the "trailers" parameter as an argument of the TE field. The origin server of the response can also decide to send additional entity trailers even if the client did not specify the "trailers" option in the TE request field, but only if the metadata is optional (i.e. the client can use the received entity without them). Whenever the trailers are used, the server should list their names in the Trailer header field; 3 header field types are specifically prohibited from appearing as a trailer field: Transfer-Encoding, Content-Length and Trailer.
If a Transfer-Encoding field with a value of "chunked" is specified in an HTTP message (either a request sent by a client or the response from the server), the body of the message consists of an unspecified number of chunks, a terminating chunk, trailer, and a final CRLF sequence (i.e. carriage return followed by line feed).
Each chunk starts with the number of octets of the data it embeds expressed as a hexadecimal number in ASCII followed by optional parameters (chunk extension) and a terminating CRLF sequence, followed by the chunk data. The chunk is terminated by CRLF.
If chunk extensions are provided, the chunk size is terminated by a semicolon and followed by the parameters, each also delimited by semicolons. Each parameter is encoded as an extension name followed by an optional equal sign and value. These parameters could be used for a running message digest or digital signature, or to indicate an estimated transfer progress, for instance.
The terminating chunk is a regular chunk, with the exception that its length is zero. It is followed by the trailer, which consists of a (possibly empty) sequence of entity header fields. Normally, such header fields would be sent in the message's header; however, it may be more efficient to determine them after processing the entire message entity. In that case, it is useful to send those headers in the trailer.
Header fields that regulate the use of trailers are TE (used in requests), and Trailers (used in responses).
In the following example, every second line is the start of a new chunk, with the chunk size as a hexadecimal number followed by \r\n as a line separator.
4\r\n Wiki\r\n 5\r\n pedia\r\n e\r\n in\r\n\r\nchunks.\r\n 0\r\n \r\n
Note: the chunk size indicates the size of only the chunk data. This does not include the trailing CRLF ("\r\n") at the end of the counted characters. In this particular example, the CRLF following "in" is counted 2 toward the chunk length of 0xE (14), and the CRLF in its own line is also counted 2 toward the chunk length of 0xE (14). The period character at the end of "chunks" is the 14th character, so it is the last character of the chunk of length 0xE (14). The CRLF following the period is the trailing CRLF, so it is not counted toward the chunk length of 0xE (14).
Wikipedia in chunks.