Bijective numeration

Bijective numeration is any numeral system in which every non-negative integer can be represented in exactly one way using a finite string of digits. The name refers to the bijection (i.e. one-to-one correspondence) that exists in this case between the set of non-negative integers and the set of finite strings using a finite set of symbols (the "digits").

Most ordinary numeral systems, such as the common decimal system, are not bijective because more than one string of digits can represent the same positive integer. In particular, adding leading zeroes does not change the value represented, so "1", "01" and "001" all represent the number one. Even though only the first is usual, the fact that the others are possible means that the decimal system is not bijective. However, the unary numeral system, with only one digit, is bijective.

A bijective base-k numeration is a bijective positional notation. It uses a string of digits from the set {1, 2, ..., k} (where k ≥ 1) to encode each positive integer; a digit's position in the string defines its value as a multiple of a power of k. Smullyan (1961) calls this notation k-adic, but it should not be confused with the p-adic numbers: bijective numerals are a system for representing ordinary integers by finite strings of nonzero digits, whereas the p-adic numbers are a system of mathematical values that contain the integers as a subset and may need infinite sequences of digits in any numerical representation.

Definition

The base-k bijective numeration system uses the digit-set {1, 2, ..., k} (k ≥ 1) to uniquely represent every non-negative integer, as follows:

The integer zero is represented by the empty string.
The integer represented by the nonempty digit-string

a n a n -1 ... a 1 a 0

is

a n k n + a n -1 k n -1 + ... + a 1 k 1 + a 0 k 0

.

The digit-string representing the integer m > 0 is

a n a n -1 ... a 1 a 0

where

{\begin{aligned}a_{0}&=m-q_{0}k,&q_{0}&=f\left({\frac {m}{k}}\right)&\\a_{1}&=q_{0}-q_{1}k,&q_{1}&=f\left({\frac {q_{0}}{k}}\right)&\\a_{2}&=q_{1}-q_{2}k,&q_{2}&=f\left({\frac {q_{1}}{k}}\right)&\\&\,\,\,\vdots &&\,\,\,\vdots \\a_{n}&=q_{n-1}-0k,&q_{n}&=f\left({\frac {q_{n-1}}{k}}\right)=0\end{aligned}}

and

f(x)=\lceil x\rceil -1,

\lceil x\rceil

being the least integer not less than x (the ceiling function).

In contrast, standard positional notation can be defined with a similar recursive algorithm where

f(x)=\lfloor x\rfloor ,

Extension to integers

For base $k>1$ , the bijective base- $k$ numeration system could be extended to negative integers in the same way as the standard base- $b$ numeral system by use of an infinite number of the digit $d_{k-1}$ , where $f(d_{k-1})=k-1$ , represented as a left-infinite sequence of digits $\ldots d_{k-1}d_{k-1}d_{k-1}={\overline {d_{k-1}}}$ . This is because the Euler summation

g({\overline {d_{k-1}}})=\sum _{i=0}^{\infty }f(d_{k-1})k^{i}=-{\frac {k-1}{k-1}}=-1

meaning that

g({\overline {d_{k-1}}}d_{k})=f(d_{k})\sum _{i=1}^{\infty }f(d_{k-1})k^{i}=1+\sum _{i=0}^{\infty }f(d_{k-1})k^{i}=0

and for every positive number $n$ with bijective numeration digit representation $d$ is represented by ${\overline {d_{k-1}}}d_{k}d$ . For base $k>2$ , negative numbers $n<-1$ are represented by ${\overline {d_{k-1}}}d_{i}d$ with $i<k-1$ , while for base $k=2$ , negative numbers $n<-1$ are represented by ${\overline {d_{k}}}d$ . This is similar to how in signed-digit representations, all integers $n$ with digit representations $d$ are represented as ${\overline {d_{0}}}d$ where $f(d_{0})=0$ . This representation is no longer bijective, as the entire set of left-infinite sequences of digits is used to represent the $k$ -adic integers, of which the integers are only a subset.

Properties of bijective base-k numerals

For a given base $k\geq 2$ ,

the number of digits in the bijective base-k numeral representing a nonnegative integer n is
$\lfloor \log _{k}((n+1)(k-1))\rfloor$ ,^[1] in contrast to $\lceil \log _{k}(n+1)\rceil$ for ordinary base-k numerals;
if k = 1 (i.e., unary), then the number of digits is just n;
the smallest nonnegative integer, representable in a bijective base-k numeral of length $\ell \geq 0$ , is
$\mathrm {min} (\ell )={\frac {k^{\ell }-1}{k-1}}$ ;
the largest nonnegative integer, representable in a bijective base-k numeral of length $\ell \geq 0$ , is
$\mathrm {max} (\ell )={\frac {k^{\ell +1}-k}{k-1}}$ , equivalent to $\mathrm {max} (\ell )=k\times \mathrm {min} (\ell )$ , or $\mathrm {max} (\ell )=\mathrm {min} (\ell +1)-1$ ;
the bijective base-k and ordinary base-k numerals for a nonnegative integer n are identical if and only if the ordinary numeral does not contain the digit 0 (or, equivalently, the bijective numeral is neither the empty string nor contains the digit k).

For a given base $k\geq 1$ ,

there are exactly $k^{\ell }$ bijective base-k numerals of length $\ell \geq 0$ ;^[2]
a list of bijective base-k numerals, in natural order of the integers represented, is automatically in shortlex order (shortest first, lexicographical within each length). Thus, using λ to denote the empty string, the base 1, 2, 3, 8, 10, 12, and 16 numerals are as follows (where the ordinary representations are listed for comparison):

bijective base 1:	`λ`	`1`	`11`	`111`	`1111`	`11111`	`111111`	`1111111`	`11111111`	`111111111`	`1111111111`	`11111111111`	`111111111111`	`1111111111111`	`11111111111111`	`111111111111111`	`1111111111111111`	`...`	(unary numeral system)
bijective base 2:	`λ`	`1`	`2`	`11`	`12`	`21`	`22`	`111`	`112`	`121`	`122`	`211`	`212`	`221`	`222`	`1111`	`1112`	`...`
binary:	`0`	`1`	`10`	`11`	`100`	`101`	`110`	`111`	`1000`	`1001`	`1010`	`1011`	`1100`	`1101`	`1110`	`1111`	`10000`	`...`
bijective base 3:	`λ`	`1`	`2`	`3`	`11`	`12`	`13`	`21`	`22`	`23`	`31`	`32`	`33`	`111`	`112`	`113`	`121`	`...`
ternary:	`0`	`1`	`2`	`10`	`11`	`12`	`20`	`21`	`22`	`100`	`101`	`102`	`110`	`111`	`112`	`120`	`121`	`...`
bijective base 8:	`λ`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`11`	`12`	`13`	`14`	`15`	`16`	`17`	`18`	`...`
octal:	`0`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`10`	`11`	`12`	`13`	`14`	`15`	`16`	`17`	`20`	`...`
bijective base 10:	`λ`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`A`	`11`	`12`	`13`	`14`	`15`	`16`	`...`
decimal:	`0`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`10`	`11`	`12`	`13`	`14`	`15`	`16`	`...`
bijective base 12:	`λ`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`A`	`B`	`C`	`11`	`12`	`13`	`14`	`...`
duodecimal:	`0`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`A`	`B`	`10`	`11`	`12`	`13`	`14`	`...`
bijective base 16:	`λ`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`A`	`B`	`C`	`D`	`E`	`F`	`G`	`...`
hexadecimal:	`0`	`1`	`2`	`3`	`4`	`5`	`6`	`7`	`8`	`9`	`A`	`B`	`C`	`D`	`E`	`F`	`10`	`...`

Examples

34152 (in bijective base-5) = 3×5⁴ + 4×5³ + 1×5² + 5×5¹ + 2×1 = 2427 (in decimal).

119A (in bijective base-10, with "A" representing the digit value ten) = 1×10³ + 1×10² + 9×10¹ + 10×1 = 1200 (in decimal).

A typical alphabetic list with more than 26 elements is bijective, using the order of A, B, C...X, Y, Z, AA, AB, AC...ZX, ZY, ZZ, AAA, AAB, AAC...

The bijective base-10 system

The bijective base-10 system is a base ten positional numeral system that does not use a digit to represent zero. It instead has a digit to represent ten, such as A.

As with conventional decimal, each digit position represents a power of ten, so for example 123 is "one hundred, plus two tens, plus three units." All positive integers that are represented solely with non-zero digits in conventional decimal (such as 123) have the same representation in the bijective base-10 system. Those that use a zero must be rewritten, so for example 10 becomes A, conventional 20 becomes 1A, conventional 100 becomes 9A, conventional 101 becomes A1, conventional 302 becomes 2A2, conventional 1000 becomes 99A, conventional 1110 becomes AAA, conventional 2010 becomes 19AA, and so on.

Addition and multiplication in this system are essentially the same as with conventional decimal, except that carries occur when a position exceeds ten, rather than when it exceeds nine. So to calculate 643 + 759, there are twelve units (write 2 at the right and carry 1 to the tens), ten tens (write A with no need to carry to the hundreds), thirteen hundreds (write 3 and carry 1 to the thousands), and one thousand (write 1), to give the result 13A2 rather than the conventional 1402.

The bijective base-26 system

In the bijective base-26 system one may use the Latin alphabet letters "A" to "Z" to represent the 26 digit values one to twenty-six. (A=1, B=2, C=3, ..., Z=26)

With this choice of notation, the number sequence (starting from 1) begins A, B, C, ..., X, Y, Z, AA, AB, AC, ..., AX, AY, AZ, BA, BB, BC, ...

Each digit position represents a power of twenty-six, so for example, the numeral WI represents the value 23 × 26¹ + 9 × 26⁰ = 607 in base 10.

Many spreadsheets including Microsoft Excel use this system to assign labels to the columns of a spreadsheet, starting A, B, C, ..., Z, AA, AB, ..., AZ, BA, ..., ZZ, AAA, etc. For instance, in Excel 2013, there can be up to 16384 columns (2¹⁴ in binary code), labeled from A to XFD.^[3] Malware variants are also named using this system: for example, the first widespread Microsoft Word macro virus, Concept, is formally named WM/Concept.A, its 26th variant WM/Concept.Z, the 27th variant WM/Concept.AA, et seq. A variant of this system is used to name variable stars.^[4] It can be applied to any problem where a systematic naming using letters is desired, while using the shortest possible strings.

Historical notes

The fact that every non-negative integer has a unique representation in bijective base-k (k ≥ 1) is a "folk theorem" that has been rediscovered many times. Early instances are Foster (1947) for the case k = 10, and Smullyan (1961) and Böhm (1964) for all k ≥ 1. Smullyan uses this system to provide a Gödel numbering of the strings of symbols in a logical system; Böhm uses these representations to perform computations in the programming language P′′. Knuth (1969) mentions the special case of k = 10, and Salomaa (1973) discusses the cases k ≥ 2. Forslund (1995) appears to be another rediscovery, and hypothesizes that if ancient numeration systems used bijective base-k, they might not be recognized as such in archaeological documents, due to general unfamiliarity with this system.

Notes

^ "How many digits are in the bijective base-k numeral for n?". Stackexchange. Retrieved 22 September 2018.
^ Forslund (1995).
^ Harvey, Greg (2013), Excel 2013 For Dummies, John Wiley & Sons, ISBN 9781118550007.
^ Hellier, Coel (2001), "Appendix D: Variable star nomenclature", Cataclysmic Variable Stars - How and Why They Vary, Praxis Books in Astronomy and Space, Springer, p. 197, ISBN 9781852332112.

References

Böhm, C. (July 1964), "On a family of Turing machines and the related programming language", ICC Bulletin, 3: 191.
Forslund, Robert R. (1995), "A logical alternative to the existing positional number system", Southwest Journal of Pure and Applied Mathematics, 1: 27–29, MR 1386376, S2CID 19010664.
Foster, J. E. (1947), "A number system without a zero symbol", Mathematics Magazine, 21 (1): 39–41, doi:10.2307/3029479, JSTOR 3029479.
Knuth, D. E. (1969), The Art of Computer Programming, Vol. 2: Seminumerical Algorithms (1st ed.), Addison-Wesley, Solution to Exercise 4.1-24, p. 195. (Discusses bijective base-10.)
Salomaa, A. (1973), Formal Languages, Academic Press, Note 9.1, pp. 90–91. (Discusses bijective base-k for all k ≥ 2.)
Smullyan, R. (1961), "9. Lexicographical ordering; n-adic representation of integers", Theory of Formal Systems, Annals of Mathematics Studies, vol. 47, Princeton University Press, pp. 34–36.

[1] "How many digits are in the bijective base-k numeral for n?". Stackexchange. Retrieved 22 September 2018.

[FOOTNOTEForslund1995-2] Forslund (1995).

[3] Harvey, Greg (2013), Excel 2013 For Dummies, John Wiley & Sons, ISBN 9781118550007.

[4] Hellier, Coel (2001), "Appendix D: Variable star nomenclature", Cataclysmic Variable Stars - How and Why They Vary, Praxis Books in Astronomy and Space, Springer, p. 197, ISBN 9781852332112.

[1]

[2]

[3]

[4]