Base32: Difference between revisions
Revision as of 16:04, 3 March 2011
Base32 is a base-32 transfer encoding using the twenty-six letters A-Z and six digits 2-7.
Software
Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by old computer systems that recognize only restricted character sets. It comprises a set of 32 different characters and an algorithm for encoding arbitrary strings of 8-bit characters into that alphabet. Because each Base32 symbol carries only 5 bits, every 8-bit input character needs more than one symbol. The encoding works on groups of 40 bits (five input characters become eight symbols), so padded Base32 strings always represent a multiple of 40 bits. The closely related Base64 system uses a larger set of 64 symbols.
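The 40-bit grouping can be illustrated with a short Python sketch. The helper name `padded_base32_length` is hypothetical and chosen for this example; the cross-check uses `base64.b32encode` from the Python standard library, which emits padded output.

```python
import base64


def padded_base32_length(num_bytes: int) -> int:
    """Number of characters in padded Base32 output for num_bytes of input.

    Input is consumed in 5-byte (40-bit) groups and each group becomes
    eight 5-bit symbols, so a partial final group still occupies 8 characters.
    """
    groups = (num_bytes + 4) // 5  # round up to whole 40-bit groups
    return 8 * groups


# Cross-check against the standard library for a range of input sizes.
for n in range(0, 21):
    assert len(base64.b32encode(bytes(n))) == padded_base32_length(n)
```

For example, both a 1-byte and a 5-byte input produce 8 characters, while a 6-byte input produces 16.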
Advantages
Base32 has three main advantages over Base64:
- The resulting character set is all one case (usually represented as uppercase), which can often be beneficial when using a case-insensitive filesystem, spoken language, or human memory.
- The alphabet was selected to avoid similar-looking pairs of different symbols, so the strings can be efficiently transcribed by hand. (For example, the symbol set omits the symbols for 1, 8 and zero, since they could be confused with the letters 'I', 'B', and 'O'.)
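- The alphabet consists only of letters and digits, so the result can be included in a URL without encoding any characters (the padding character, where it is used, may still pose a problem).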
Disadvantages
Base32 representation takes roughly 20% more space than Base64.
Input | Base64 | Base32 |
---|---|---|
8-bit | 133% | 160% |
7-bit | 117% | 140% |
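The percentages in the table can be reproduced with a little arithmetic. The sketch below assumes that each output symbol occupies one character of the same width as the input characters (8 or 7 bits) while carrying 6 bits of payload for Base64 or 5 bits for Base32, and it ignores padding. The function name `expansion_percent` is hypothetical.

```python
def expansion_percent(char_bits: int, payload_bits: int) -> int:
    """Encoded size as a percentage of the input size, rounded to a whole percent.

    char_bits: width of one stored character (8 or 7).
    payload_bits: data bits carried per symbol (6 for Base64, 5 for Base32).
    """
    return round(100 * char_bits / payload_bits)


for char_bits in (8, 7):
    print(
        f"{char_bits}-bit:",
        f"Base64 {expansion_percent(char_bits, 6)}%,",
        f"Base32 {expansion_percent(char_bits, 5)}%",
    )

# Base32 output relative to Base64 output: 6/5 = 1.2, i.e. roughly 20% more space.
ratio = 6 / 5
```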
RFC 4648 Base32 alphabet
The most widely used Base32 alphabet is defined in RFC 4648. It uses an alphabet of A–Z, followed by 2–7. 0 and 1 are skipped due to their similarity with the letters O and I (thus "2" actually has a numerical value of 26).
In some circumstances padding is not required or used. RFC 4648 states that padding MUST be used unless the specification of the standard referring to the RFC explicitly states otherwise. Omitting padding is useful when Base32-encoded data appears in URL tokens or file names, where the padding character could pose a problem.
Value | Symbol | Value | Symbol | Value | Symbol | Value | Symbol |
---|---|---|---|---|---|---|---|
0 | A | 9 | J | 18 | S | 27 | 3 |
1 | B | 10 | K | 19 | T | 28 | 4 |
2 | C | 11 | L | 20 | U | 29 | 5 |
3 | D | 12 | M | 21 | V | 30 | 6 |
4 | E | 13 | N | 22 | W | 31 | 7 |
5 | F | 14 | O | 23 | X | | |
6 | G | 15 | P | 24 | Y | | |
7 | H | 16 | Q | 25 | Z | | |
8 | I | 17 | R | 26 | 2 | pad | = |
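A minimal Python sketch of encoding and decoding with this alphabet is shown below. It is an illustration rather than a reference implementation: the function names and the `pad` parameter are choices made for this example, the decoder does no validation beyond rejecting unknown symbols, and the results are cross-checked against `base64.b32encode` from the standard library.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def b32_encode(data: bytes, pad: bool = True) -> str:
    """Encode bytes using the RFC 4648 alphabet, 5 bits per symbol."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill the final group so that it holds a full 5 bits.
    bits += "0" * (-len(bits) % 5)
    out = "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))
    if pad:
        # Pad with "=" to a multiple of 8 symbols (one 40-bit group).
        out += "=" * (-len(out) % 8)
    return out


def b32_decode(text: str) -> bytes:
    """Decode Base32 text, accepting input with or without padding."""
    symbols = text.rstrip("=")
    # str.index raises ValueError for any character outside the alphabet.
    bits = "".join(f"{ALPHABET.index(ch):05b}" for ch in symbols)
    # Discard the zero-fill bits that do not make up a whole byte.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


# Cross-check against the standard library and round-trip both forms.
for sample in (b"", b"f", b"fo", b"foo", b"foob", b"fooba", b"foobar"):
    assert b32_encode(sample) == base64.b32encode(sample).decode("ascii")
    assert b32_decode(b32_encode(sample)) == sample
    assert b32_decode(b32_encode(sample, pad=False)) == sample
```

With `pad=False` the output contains only letters and digits, which is the form suited to URL tokens and file names.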
Alternative versions
z-base-32
z-base-32 is a Base32 encoding designed to be easier for human use and more compact. It includes 1, 8 and 9 but excludes l, v, 0 and 2. It also permutes the alphabet so that the easier characters are the ones that occur more frequently. It compactly encodes bitstrings whose length in bits is not a multiple of 8, and omits trailing padding characters. z-base-32 was used in the Mnet open source project, and is currently used in Phil Zimmermann's ZRTP protocol and in the Allmydata-Tahoe open source project.
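The compact handling of bitstrings can be sketched in Python as follows. The alphabet string is an assumption recalled from the z-base-32 design document and should be checked against it before use; the function name and the choice to pass bits as a string of "0" and "1" characters are made only for this example.

```python
# Assumption: permuted alphabet as recalled from the z-base-32 design document.
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32_encode_bits(bits: str) -> str:
    """Encode a string of '0'/'1' characters; no padding characters are emitted."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    # Zero-fill the final group to 5 bits; nothing is appended after encoding.
    bits += "0" * (-len(bits) % 5)
    return "".join(
        ZBASE32_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )


# A 10-bit string needs only two symbols, whereas rounding it up to two
# whole bytes (16 bits) would need four.
assert len(zbase32_encode_bits("0" * 10)) == 2
assert len(zbase32_encode_bits("0" * 16)) == 4
```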
An earlier form of base 32 notation was used by programmers working on the Electrologica X1 to represent machine addresses. The "digits" were represented as decimal numbers from 0 to 31. For example, 12-16 would represent the machine address 400.
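The address example works out as 12 × 32 + 16 = 400. A small Python sketch of the conversion is given below; the function name is hypothetical, and the dash-separated input format is taken from the example above.

```python
def x1_address(notation: str) -> int:
    """Convert dash-separated base-32 digits, each written in decimal, to an integer."""
    value = 0
    for part in notation.split("-"):
        digit = int(part)
        if not 0 <= digit < 32:
            raise ValueError(f"digit out of range 0-31: {digit}")
        # Standard positional notation: shift by one base-32 place, then add.
        value = value * 32 + digit
    return value


assert x1_address("12-16") == 400
```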
Another alternative design for Base32 was created by Douglas Crockford, who proposes using additional characters for a checksum.[1] It excludes the letters I, L, and O to avoid confusion with digits. It also excludes the letter U to reduce the likelihood of accidental obscenity.
Value | Encode Digit | Decode Digit | Value | Encode Digit | Decode Digit |
---|---|---|---|---|---|
0 | 0 | 0 o O | 16 | G | g G |
1 | 1 | 1 i I l L | 17 | H | h H |
2 | 2 | 2 | 18 | J | j J |
3 | 3 | 3 | 19 | K | k K |
4 | 4 | 4 | 20 | M | m M |
5 | 5 | 5 | 21 | N | n N |
6 | 6 | 6 | 22 | P | p P |
7 | 7 | 7 | 23 | Q | q Q |
8 | 8 | 8 | 24 | R | r R |
9 | 9 | 9 | 25 | S | s S |
10 | A | a A | 26 | T | t T |
11 | B | b B | 27 | V | v V |
12 | C | c C | 28 | W | w W |
13 | D | d D | 29 | X | x X |
14 | E | e E | 30 | Y | y Y |
15 | F | f F | 31 | Z | z Z |
Ionisis [2] uses Crockford's Base-32 alphabet.
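The encode and decode columns of Crockford's table can be sketched in Python as below. This is a minimal illustration for non-negative integers only: the function names are hypothetical, and the optional checksum characters mentioned above are deliberately left out.

```python
ENCODE_DIGITS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Decoding is case-insensitive and maps look-alike letters onto digits,
# following the "Decode Digit" column: O -> 0, I and L -> 1.
DECODE_MAP = {symbol: value for value, symbol in enumerate(ENCODE_DIGITS)}
DECODE_MAP.update({"O": 0, "I": 1, "L": 1})


def crockford_encode(number: int) -> str:
    """Encode a non-negative integer using the 'Encode Digit' column."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 32)
        digits.append(ENCODE_DIGITS[remainder])
    return "".join(reversed(digits))


def crockford_decode(text: str) -> int:
    """Decode text, accepting lowercase and the look-alike letters O, I and L."""
    value = 0
    for ch in text.upper():
        if ch not in DECODE_MAP:  # for example the excluded letter U
            raise ValueError(f"invalid symbol: {ch!r}")
        value = value * 32 + DECODE_MAP[ch]
    return value


assert crockford_decode(crockford_encode(1234)) == 1234
```

Because the decoder folds case and look-alikes, a hand-copied string such as "lo" decodes to the same value as "10".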
base32hex
Triacontakaidecimal is another alternative design for Base32 that extends hexadecimal in a more natural way: the digits 0-9 are followed by the letters A-V. RFC 4648 uses base32hex as the name for this encoding, which was deployed in RFC 2938. Note that the alphabet contains both 0 and O and both 1 and I; they look similar but are distinct ASCII characters. Each symbol is normally stored as an 8-bit character, although its value fits in 5 bits; a full 8-bit character corresponds to a digit of a base-256 numbering system.
Triacontakia | Binary | Decimal |
---|---|---|
0 | 00000 | 0 |
1 | 00001 | 1 |
2 | 00010 | 2 |
3 | 00011 | 3 |
4 | 00100 | 4 |
5 | 00101 | 5 |
6 | 00110 | 6 |
7 | 00111 | 7 |
8 | 01000 | 8 |
9 | 01001 | 9 |
A | 01010 | 10 |
B | 01011 | 11 |
C | 01100 | 12 |
D | 01101 | 13 |
E | 01110 | 14 |
F | 01111 | 15 |
G | 10000 | 16 |
H | 10001 | 17 |
I | 10010 | 18 |
J | 10011 | 19 |
K | 10100 | 20 |
L | 10101 | 21 |
M | 10110 | 22 |
N | 10111 | 23 |
O | 11000 | 24 |
P | 11001 | 25 |
Q | 11010 | 26 |
R | 11011 | 27 |
S | 11100 | 28 |
T | 11101 | 29 |
U | 11110 | 30 |
V | 11111 | 31 |
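Since this alphabet simply continues the hexadecimal digits, it coincides with the digit set that Python's built-in `int(text, 32)` accepts. The sketch below relies on that to check the table; the helper `to_base32hex` is a hypothetical name, and it converts integers rather than byte strings.

```python
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def to_base32hex(number: int) -> str:
    """Write a non-negative integer using the digits 0-9 and A-V."""
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = ""
    while True:
        number, remainder = divmod(number, 32)
        digits = BASE32HEX[remainder] + digits
        if number == 0:
            return digits


# Each symbol's position matches its decimal value and 5-bit pattern.
for value, symbol in enumerate(BASE32HEX):
    assert int(symbol, 32) == value
    assert int(format(value, "05b"), 2) == value

assert int(to_base32hex(255), 32) == 255
```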
To avoid the similarity of 0 and O and of 1 and I, one can omit the letters I and O and thus use 0-H for the values 0-17, J-N for 18-22 and P-X for 23-31.
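The resulting alphabet can be derived mechanically, as in the following Python sketch. The constant name `NO_IO_ALPHABET` is hypothetical; the code only filters the digits and uppercase letters and keeps the first 32 symbols.

```python
import string

# Digits and uppercase letters with I and O removed, truncated to 32 symbols.
candidates = string.digits + string.ascii_uppercase
NO_IO_ALPHABET = "".join(ch for ch in candidates if ch not in "IO")[:32]

# Check the ranges stated in the text.
assert NO_IO_ALPHABET[0] == "0" and NO_IO_ALPHABET[17] == "H"
assert NO_IO_ALPHABET[18] == "J" and NO_IO_ALPHABET[22] == "N"
assert NO_IO_ALPHABET[23] == "P" and NO_IO_ALPHABET[31] == "X"
```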
Video games
Before NVRAM became universal, several video games for Nintendo platforms used base 32 numbers for passwords. These systems, like Natural Area Code, omit vowels to prevent the game from accidentally giving a profane password. Thus, the characters are generally some minor variation of the following set: 0-9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks. Games known to use such a system include Mario Is Missing!, Mario's Time Machine, Tetris Blast, and The Lord of the Rings (Super NES).
References
- RFC 4648