Base32 is one of several base-32 transfer encodings that use a 32-character subset of the twenty-six letters A–Z and the ten digits 0–9. Its closest relative is Base30, which is used by the Natural Area Code.
Base32 is primarily used to encode binary data, but it can also encode text such as ASCII.
Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by computers.
Base32 consists of a symbol set of 32 different characters and an algorithm for encoding arbitrary sequences of 8-bit bytes into the Base32 alphabet. Because more than one 5-bit Base32 symbol is needed to represent each 8-bit input byte, it also specifies requirements on the allowed lengths of Base32 strings, which must represent multiples of 40 bits (8 symbols, or 5 input bytes). The closely related Base64 system, in contrast, uses a set of 64 symbols.
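The regrouping of 8-bit bytes into 5-bit symbols can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the RFC 4648 alphabet (A–Z followed by 2–7) and "=" padding, and the function name `base32_encode` is chosen here for illustration only.

```python
# Assumed alphabet: the RFC 4648 Base32 alphabet (A-Z, then 2-7).
RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def base32_encode(data: bytes) -> str:
    """Encode bytes as padded Base32 text, 5 bits per symbol."""
    out = []
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        # Right-pad a short final chunk with zero bytes to a full 40-bit group.
        group = int.from_bytes(chunk.ljust(5, b"\x00"), "big")
        # Symbols that carry real data: ceil(number_of_bits / 5).
        used = (len(chunk) * 8 + 4) // 5
        # Read the 40-bit group as eight 5-bit values, most significant first.
        symbols = [
            RFC4648_ALPHABET[(group >> (35 - 5 * k)) & 0x1F] for k in range(8)
        ]
        # Replace the symbols that carry no data with padding characters.
        out.append("".join(symbols[:used]) + "=" * (8 - used))
    return "".join(out)


print(base32_encode(b"foobar"))  # MZXW6YTBOI======
```

Each group of up to 5 input bytes always yields exactly 8 output characters, which is where the 40-bit length requirement comes from.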
Base32 has a number of advantages over Base64:
- The resulting character set is all one case (usually represented as uppercase), which can often be beneficial when using a case-insensitive filesystem, spoken language, or human memory.
- The result can be used as a file name because it cannot contain the '/' symbol, which usually acts as the path separator in Unix-based operating systems.
- The alphabet can be selected to avoid similar-looking pairs of different symbols, so the strings can be accurately transcribed by hand. (For example, the RFC 4648 symbol set omits the digits for one, eight and zero, since they could be confused with the letters 'I', 'B', and 'O'.)
- A result excluding padding can be included in a URL without encoding any characters.
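Two of these points can be checked with Python's standard `base64` module. The snippet below is only an illustration; the input bytes are chosen so that the Base64 output consists entirely of '/' characters.

```python
import base64

raw = b"\xff\xff\xff"

# Base64 output may contain '/', which is a path separator on Unix-like systems.
b64 = base64.b64encode(raw).decode("ascii")
# Base32 output (RFC 4648 alphabet) only uses A-Z, 2-7 and '=' padding.
b32 = base64.b32encode(raw).decode("ascii")

print(b64)  # ////
print(b32)  # 77776===

# Because the alphabet is single-case, a decoder can accept lowercase input.
decoded = base64.b32decode("mzxw6ytboi======", casefold=True)
print(decoded)  # b'foobar'
```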
Base32 representation takes roughly 20% more space than Base64. Also, because it encodes 5 bytes to 8 characters (rather than 3 bytes to 4 characters), padding to an 8-character boundary is a greater burden on short messages.
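The size difference follows directly from the group sizes. The helper names below are illustrative, and the lengths are cross-checked against Python's standard `base64` module:

```python
import base64
import math


def base32_length(n_bytes: int) -> int:
    """Padded Base32 length: 8 characters per group of up to 5 bytes."""
    return 8 * math.ceil(n_bytes / 5)


def base64_length(n_bytes: int) -> int:
    """Padded Base64 length: 4 characters per group of up to 3 bytes."""
    return 4 * math.ceil(n_bytes / 3)


for n in (1, 15, 1500):
    data = bytes(n)
    # Sanity check of the formulas against the standard library encoders.
    assert len(base64.b32encode(data)) == base32_length(n)
    assert len(base64.b64encode(data)) == base64_length(n)
    print(n, base32_length(n), base64_length(n))
```

For 1500 bytes this gives 2400 versus 2000 characters, a ratio of 1.2. For a single byte it gives 8 versus 4 characters, which shows the heavier padding cost on short messages.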
The most widely used Base32 alphabet is defined in RFC 4648. It uses an alphabet of A–Z, followed by 2–7. 0 and 1 are skipped due to their similarity with the letters O and I (thus "2" actually has a decimal value of 26).
In some circumstances padding is not required or used. RFC 4648 states that padding must be used unless the specification referring to the RFC explicitly states otherwise. Omitting padding is useful when Base32-encoded data appears in URL tokens or file names, where the padding character could pose a problem.
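One way to work without padding, sketched here with Python's standard `base64` module, is to strip the "=" characters after encoding and restore them before decoding. The function names are illustrative only.

```python
import base64


def encode_unpadded(data: bytes) -> str:
    """Base32-encode and drop the trailing '=' padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    """Restore padding to a multiple of 8 characters, then decode."""
    padded = text + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


token = encode_unpadded(b"foobar")
print(token)                   # MZXW6YTBOI
print(decode_unpadded(token))  # b'foobar'
```

The padding can be reconstructed because a valid padded string always has a length that is a multiple of 8.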
z-base-32 is a Base32 encoding designed to be easier for human use and more compact. It includes 1, 8 and 9 but excludes l, v and 2. It also permutes the alphabet so that the easier characters are the ones that occur more frequently. It compactly encodes bitstrings whose length in bits is not a multiple of 8, and omits trailing padding characters. z-base-32 was used in the Mnet open-source project, and is currently used in Phil Zimmermann's ZRTP protocol and in the Tahoe-LAFS open-source project.
Another alternative design for Base32 was created by Douglas Crockford, who proposes using additional characters for a checksum. It excludes the letters I, L, and O to avoid confusion with digits. It also excludes the letter U to reduce the likelihood of accidental obscenity.
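For whole-byte input, a z-base-32 string can be sketched by re-lettering unpadded RFC 4648 output. The alphabet string below is not given in the text above; it is taken as an assumption from the z-base-32 specification. The sketch does not cover the bit-level compaction for lengths that are not a multiple of 8.

```python
import base64

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# Assumed z-base-32 alphabet (permuted, lowercase, without l, v, 0 and 2).
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"

_TO_ZBASE32 = str.maketrans(RFC4648_ALPHABET, ZBASE32_ALPHABET)


def zbase32_encode_bytes(data: bytes) -> str:
    """Encode whole bytes: same 5-bit values, z-base-32 letters, no padding."""
    standard = base64.b32encode(data).decode("ascii").rstrip("=")
    return standard.translate(_TO_ZBASE32)


print(zbase32_encode_bytes(b"\x00"))  # yy
print(zbase32_encode_bytes(b"\xff"))  # 9h
```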
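| Value | Encode Digit | Decode Digit | Value | Encode Digit | Decode Digit |
|---|---|---|---|---|---|
| 0 | 0 | 0 o O | 16 | G | g G |
| 1 | 1 | 1 i I l L | 17 | H | h H |
| 2 | 2 | 2 | 18 | J | j J |
| 3 | 3 | 3 | 19 | K | k K |
| 4 | 4 | 4 | 20 | M | m M |
| 5 | 5 | 5 | 21 | N | n N |
| 6 | 6 | 6 | 22 | P | p P |
| 7 | 7 | 7 | 23 | Q | q Q |
| 8 | 8 | 8 | 24 | R | r R |
| 9 | 9 | 9 | 25 | S | s S |
| 10 | A | a A | 26 | T | t T |
| 11 | B | b B | 27 | V | v V |
| 12 | C | c C | 28 | W | w W |
| 13 | D | d D | 29 | X | x X |
| 14 | E | e E | 30 | Y | y Y |
| 15 | F | f F | 31 | Z | z Z |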
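The table can be turned into a small encoder and a lenient decoder for non-negative integers. This is a sketch only: it assumes the full alphabet is the digits 0–9 followed by the letters A–Z without I, L, O and U, it leaves out the optional checksum characters, and the function names are illustrative.

```python
# Assumed Crockford alphabet: 0-9, then A-Z without I, L, O and U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Decoding accepts both cases, plus the look-alike letters from the table.
_DECODE = {}
for value, symbol in enumerate(CROCKFORD_ALPHABET):
    _DECODE[symbol] = value
    _DECODE[symbol.lower()] = value
for alias in "oO":
    _DECODE[alias] = 0
for alias in "iIlL":
    _DECODE[alias] = 1


def crockford_encode(number: int) -> str:
    """Encode a non-negative integer using the canonical uppercase digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 32)
        digits.append(CROCKFORD_ALPHABET[remainder])
    return "".join(reversed(digits))


def crockford_decode(text: str) -> int:
    """Decode a string; raises KeyError on symbols outside the table (e.g. U)."""
    number = 0
    for symbol in text:
        number = number * 32 + _DECODE[symbol]
    return number


print(crockford_encode(1234))   # 16J
print(crockford_decode("l6j"))  # 1234
```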
An earlier form of base 32 notation was used by programmers working on the Electrologica X1 to represent machine addresses. The "digits" were represented as decimal numbers from 0 to 31. For example, 12-16 would represent the machine address 400 (= 12*32 + 16).
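The arithmetic behind this notation is a plain positional conversion. The helper names below are hypothetical and only mirror the 12-16 example above; they are not taken from any X1 tooling.

```python
def x1_to_address(notation: str) -> int:
    """Convert hyphen-separated decimal 'digits' (0-31 each) to an address."""
    address = 0
    for part in notation.split("-"):
        digit = int(part)
        if not 0 <= digit <= 31:
            raise ValueError(f"digit out of range: {digit}")
        address = address * 32 + digit
    return address


def address_to_x1(address: int) -> str:
    """Convert a non-negative address back to hyphen-separated base-32 digits."""
    if address < 0:
        raise ValueError("address must be non-negative")
    digits = []
    while True:
        address, digit = divmod(address, 32)
        digits.append(str(digit))
        if address == 0:
            break
    return "-".join(reversed(digits))


print(x1_to_address("12-16"))  # 400
print(address_to_x1(400))      # 12-16
```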
Triacontakaidecimal is another alternative design for base 32 that extends hexadecimal in a more natural way. It was first proposed by Christian Lanctot, a programmer working at Sage Software, in a letter to Dr. Dobb's magazine in March 1999 as a solution to the Y2K bug, where it was referred to as "Double Hex". RFC 4648 uses the name base32hex for this encoding, which was deployed in RFC 2938.
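The base32hex alphabet in RFC 4648 is the digits 0–9 followed by the letters A–V, so its first sixteen symbols coincide with hexadecimal. The sketch below derives base32hex output by re-lettering standard Base32 output from Python's `base64` module; the function name is illustrative.

```python
import base64

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"

# '=' is not in the mapping, so padding characters pass through unchanged.
_TO_BASE32HEX = str.maketrans(RFC4648_ALPHABET, BASE32HEX_ALPHABET)


def base32hex_encode(data: bytes) -> str:
    """Encode with the same 5-bit values as Base32 but base32hex letters."""
    return base64.b32encode(data).decode("ascii").translate(_TO_BASE32HEX)


print(base32hex_encode(b"foobar"))  # CPNMUOJ1E8======
# Python's int() uses the same digit order for base 32.
print(int("CG", 32))                # 400
```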
Unlike many other base 32 notation systems, triacontakaidecimal is contiguous and includes characters that may visually conflict. With the right font it is possible to visually distinguish between 0, O and 1, I. Other fonts are unsuitable because the context that English usually provides is not provided by a notation system that is expressing numbers. However, the choice of font is not controlled by notation or encoding which is why it's risky to assume a distinguishable font will be used.
Before NVRAM became universal, several video games for Nintendo platforms used base 32 numbers for passwords. These systems, like Natural Area Code, omit vowels to prevent the game from accidentally giving a profane password. Thus, the characters are generally some minor variation of the following set: 0–9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks. Games known to use such a system include Mario Is Missing!, Mario's Time Machine, Tetris Blast, and The Lord of the Rings (Super NES).
- Ascii85 (also called Base85)
- Binary-to-text encoding for a comparison of various encoding algorithms