Quoted-printable
Quoted-Printable, or QP encoding, is an encoding using printable ASCII characters (alphanumeric and the equals sign "=") to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean.[1] It is defined as a MIME content transfer encoding for use in e-mail.
QP works by using the equals sign "=" as an escape character. It also limits line length to 76 characters, because some software restricts line length.
Introduction
MIME defines mechanisms for sending other kinds of information in e-mail, including text in languages other than English, using character encodings other than ASCII. However, these encodings often use byte values outside the ASCII range so they need to be encoded further before they are suitable for use in a non-8-bit-clean environment. Quoted-Printable encoding is one method used for mapping arbitrary bytes into sequences of ASCII characters. So, Quoted-Printable is not a character encoding scheme itself, but a data coding layer to be used under some byte-oriented character encoding. QP encoding is reversible, meaning the original bytes and hence the non-ASCII characters they represent can be identically recovered.
Quoted-Printable and Base64 are the two basic MIME content transfer encodings, if a trivial "8bit" encoding is not counted. If the text to be encoded does not contain many non-ASCII characters, then Quoted-Printable results in a fairly readable[2] and compact encoded result. On the other hand, if the input is not mostly ASCII, then Quoted-Printable becomes both unreadable and extremely inefficient. Base64 is not human-readable, but has a uniform overhead for all data and is the more sensible choice for binary formats or text in non-Latin based languages.
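The trade-off can be illustrated with a rough size estimate. The sketch below is an illustration only: `qp_size` is a hypothetical helper that counts one character per literal byte and three per escaped byte, ignoring soft line breaks and the end-of-line whitespace rule, and compares the result with the output length of Python's `base64.b64encode`.

```python
import base64


def qp_size(data: bytes) -> int:
    """Approximate Quoted-Printable length: 1 char per literal byte, 3 per escaped byte."""
    literal = sum(1 for b in data if (33 <= b <= 126 and b != 61) or b in (9, 32))
    return literal + 3 * (len(data) - literal)


ascii_text = "mostly ASCII text".encode("utf-8")  # 17 bytes, all literal
cyrillic_text = "Привет".encode("utf-8")          # 12 bytes, all escaped

for sample in (ascii_text, cyrillic_text):
    print(len(sample), qp_size(sample), len(base64.b64encode(sample)))
```

For the ASCII sample the Quoted-Printable estimate equals the input length, while for the Cyrillic sample it is three times the input length and clearly larger than the Base64 output.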
Quoted-Printable encoding
Any 8-bit byte value may be encoded with 3 characters: an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, an ASCII form feed character (decimal value 12) can be represented by "=0C", and an ASCII equal sign (decimal value 61) must be represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.
All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters would appear at the end of the encoded line. In that case, they must either be escaped as "=09" (tab) or "=20" (space), or be followed by a "=" (soft line break) as the last character of the encoded line. The latter is valid because it prevents the tab or space from being the last character of the encoded line.
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values, whether directly or via "=" escapes. Conversely, if byte values 13 and 10 have meanings other than end of line (in media types,[3] for example), then they must be encoded as =0D and =0A respectively.
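The difference between the two cases can be seen with Python's standard `binascii.b2a_qp`, whose `istext` flag selects whether line breaks are treated as text line endings or as ordinary bytes. This is a small illustration under the assumption that the input uses CR LF line endings; the variable names are chosen for this example only.

```python
import binascii

data = b"line one\r\nline two"

# Text mode (the default): the CR LF pair is kept as a real line break.
as_text = binascii.b2a_qp(data)

# Binary mode: bytes 13 and 10 are escaped like any other non-printable byte.
as_binary = binascii.b2a_qp(data, istext=False)

print(as_text)
print(as_binary)  # b'line one=0D=0Aline two'
```

Both forms decode back to the same bytes with `binascii.a2b_qp`; they differ only in whether the line break survives as a line break in the encoded form.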
Lines of Quoted-Printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text. These soft line breaks also allow encoding text without line breaks (or containing very long lines) for an environment where line size is limited, such as the "1000 characters per line" limit of some SMTP software, as allowed by RFC 2821.
A slightly modified version of Quoted-Printable is used in message headers; see the Encoded-Word section of the MIME article.
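The rules in this section can be sketched as a small encoder. This is an illustration only, not a reference implementation: `encode_byte` and `encode_line` are hypothetical names, the input is assumed to be a single line containing no line breaks, and the sketch inserts soft line breaks only between whole tokens so that an "=XX" escape is never split. Python's standard `quopri` module is used to check the round trip.

```python
import quopri
from typing import List


def encode_byte(b: int) -> str:
    """Return the literal character for a safe byte, otherwise an "=XX" escape."""
    if (33 <= b <= 126 and b != 61) or b in (9, 32):
        return chr(b)
    return f"={b:02X}"


def encode_line(data: bytes, limit: int = 76) -> List[str]:
    """Encode one input line into encoded lines of at most `limit` characters."""
    out: List[str] = []
    current = ""
    for i, b in enumerate(data):
        is_last = i == len(data) - 1
        # A tab or space must not end the encoded line, so escape it there.
        token = f"={b:02X}" if is_last and b in (9, 32) else encode_byte(b)
        # Keep one column free for the "=" of a soft line break.
        if len(current) + len(token) > limit - 1:
            out.append(current + "=")
            current = ""
        current += token
    out.append(current)
    return out


sample = b"If you believe that truth=beauty "
lines = encode_line(sample)
decoded = quopri.decodestring("\r\n".join(lines).encode("ascii"))
print(lines)
```

Joining the returned lines with CR LF gives the encoded form; the soft line breaks disappear again on decoding.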
Example
If you believe that truth=3Dbeauty, then surely mathematics is the most bea=
utiful branch of philosophy.
This encodes the string:
If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy.
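The example can be checked with Python's standard `quopri` module. In this sketch the encoded text is written as two lines joined by a soft line break ("=" followed by CR LF), which is an assumption about how the example is laid out on the wire.

```python
import quopri

encoded = (
    b"If you believe that truth=3Dbeauty, then surely mathematics is the most bea=\r\n"
    b"utiful branch of philosophy."
)

# "=3D" decodes to "=", and the soft line break is removed entirely.
decoded = quopri.decodestring(encoded)
print(decoded.decode("ascii"))
```

The first encoded line is exactly 76 characters long, including the trailing "=".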
Notes
- [1] Historically, e-mail was often referred to as non-8-bit-clean, because messages were transferred over various media, sometimes other than the Internet. Modern ESMTP servers are 8-bit clean in most cases, though; see 8BITMIME.
- [2] This implies that an ASCII-compatible encoding is used. A QP-encoded text in, for example, EBCDIC would of course not be readable.
- [3] Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. November 1996. RFC 2045, section 6.7, Quoted-Printable Content-Transfer-Encoding, part "(4) (Line Breaks)". Retrieved March 18, 2013.
Similar encoding schemes
- Percent-encoding (data encoding in URLs, mostly used for text)
- Numeric character reference (text encoding in SGML, HTML, XML)
- Rich Text Format character encoding (a component of text encoding)
External links
- RFC 1521 (obsolete)
- RFC 2045 (MIME)
- Online Quoted-Printable Decoder