Binary lambda calculus
Binary lambda calculus (BLC) is a minimal, pure functional programming language invented by John Tromp in 2004, based on a binary encoding of the untyped lambda calculus in De Bruijn index notation.
BLC is designed to provide a very simple and elegant concrete definition of descriptional complexity (Kolmogorov complexity), where the complexity of an object is the length of its shortest description.
This is made precise by identifying a description method with a computable function that transforms bitstrings (descriptions) into objects. Objects are usually also just bitstrings, but can have additional structure as well, e.g., pairs of strings.
Originally, Turing machines, the best-known formalism for computation, were used for this purpose, but they are somewhat lacking in ease of construction and composability. Another classical computational formalism, the lambda calculus, offers distinct advantages in ease of use. BLC is the result of incorporating a notion of binary I/O into lambda calculus, so as to turn it into an effective description method.
Binary strings in BLC
BLC represents bits 0 and 1 as the standard lambda booleans $B_0$ = True and $B_1$ = False:
- True = $\lambda x.\lambda y.x$
- False = $\lambda x.\lambda y.y$
which can be seen to directly implement the if-then-else operator.
The standard pairing function $\langle M, N\rangle = \lambda z.\,z\,M\,N$, applied to two terms M and N, yields a term that can be applied to a boolean to select the desired component.
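The booleans and the pairing function can be mimicked with curried Python lambdas. This is only a sketch for intuition, assuming the standard definitions True = λx.λy.x, False = λx.λy.y and ⟨M,N⟩ = λz.z M N; the names `TRUE`, `FALSE` and `pair` are illustrative and not part of BLC.

```python
# Church booleans as curried Python functions (illustration only).
TRUE = lambda x: lambda y: x     # True  = λx.λy.x, stands for bit 0
FALSE = lambda x: lambda y: y    # False = λx.λy.y, stands for bit 1

def pair(m, n):
    """Pairing <M,N> = λz. z M N."""
    return lambda z: z(m)(n)

# A boolean behaves like if-then-else: it selects one of two arguments.
assert TRUE("then")("else") == "then"
assert FALSE("then")("else") == "else"

# Applying a pair to a boolean selects one of its components.
p = pair("first", "second")
assert p(TRUE) == "first"
assert p(FALSE) == "second"
```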
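BLC represents a string $s = b_0 b_1 \ldots b_{n-1}$ by repeated pairing as
- $\langle B_{b_0}, \langle B_{b_1}, \ldots \langle B_{b_{n-1}}, z\rangle \ldots \rangle\rangle$, which is denoted as $s{:}z$.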
Here z works as a list continuation, which can be either a nil list (ending the string) or another string (which is thereby appended to the original string).
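The repeated pairing can be sketched in Python as follows. The helper names `encode_bits` and `decode_bits` are hypothetical, and the default continuation is assumed to be False, which the article later calls Nil.

```python
TRUE = lambda x: lambda y: x     # bit 0
FALSE = lambda x: lambda y: y    # bit 1
NIL = FALSE                      # assumed end-of-list marker

def pair(m, n):
    return lambda z: z(m)(n)

def encode_bits(bits, z=NIL):
    """Build the nested pairs for a string of '0'/'1' with continuation z."""
    term = z
    for b in reversed(bits):
        term = pair(TRUE if b == "0" else FALSE, term)
    return term

def decode_bits(term, n):
    """Read n bits from the front of a list; return (bits, remaining list)."""
    out = []
    for _ in range(n):
        head = term(TRUE)             # first component: the bit
        out.append(head("0")("1"))    # a boolean picks its own name
        term = term(FALSE)            # second component: the rest
    return "".join(out), term

# Using another string as the continuation appends it.
tail = encode_bits("11")
both = encode_bits("01", tail)
assert decode_bits(both, 4)[0] == "0111"
```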
Delimited versus undelimited
Descriptional complexity comes in two distinct flavors, depending on whether the input is considered to be delimited.
Knowing the end of your input makes it easier to describe objects. For instance, you can just copy the whole input to output. This flavor is called plain or simple complexity.
But in a sense it is additional information. A file system for instance needs to separately store the length of files. The C language uses the null character to denote the end of a string, but this comes at the cost of not having that character available within strings.
The other flavor is called prefix complexity, named after prefix codes, where the machine needs to figure out, from the input read so far, whether it needs to read more bits. We say that the input is self-delimiting. This works better for communication channels, since one can send multiple descriptions, one after the other, and still tell them apart.
In the I/O model of BLC, the flavor is dictated by the choice of z. If z is kept as a free variable and required to appear as part of the output, the machine must work in a self-delimiting manner. If, on the other hand, z is a lambda term specifically designed to be easy to distinguish from any pairing, the input becomes delimited. BLC chooses False for this purpose but gives it the more descriptive alternative name Nil. Dealing with lists that may be Nil is straightforward: since
- $\mathrm{Nil}\;M\;N = N$, and
- $\langle A, B\rangle\;M\;N = M\;A\;B\;N$,
one can write functions M and N to deal with the two cases, the only caveat being that N will be passed to M as its third argument.
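As a rough illustration of this case analysis, the following Python sketch applies a list to a handler M for the non-empty case and a value N for the empty case. It assumes Nil = False and the standard pairing; `is_empty` and `length` are hypothetical helper names. Note how M takes a third parameter that receives N.

```python
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
NIL = FALSE                      # Nil M N = N

def pair(a, b):
    return lambda z: z(a)(b)     # <A,B> M N = M A B N

def is_empty(lst):
    # M receives head, tail, and then N as its third argument.
    m = lambda head: lambda tail: lambda n: False
    n = True
    return lst(m)(n)

def length(lst):
    # The recursive call only happens once M has all three arguments.
    m = lambda head: lambda tail: lambda n: 1 + length(tail)
    return lst(m)(0)

assert is_empty(NIL)
assert not is_empty(pair(TRUE, NIL))
assert length(pair(TRUE, pair(FALSE, NIL))) == 2
```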
One can find a description method U such that for any other description method D, there is a constant c (depending only on D) such that no object takes more than c extra bits to describe with method U than with method D. BLC is designed to make these constants relatively small. In fact the constant will be the length of a binary encoding of a D-interpreter written in BLC, and U will be a lambda term that parses this encoding and runs this decoded interpreter on the rest of the input. U won't even have to know whether the description is delimited or not; it works the same either way.
BLC not only represents bitstrings as lambda calculus terms, but also represents lambda calculus terms as bitstrings.
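First, lambda terms are written in a particular notation using what is known as De Bruijn indices. The encoding is then defined recursively as follows:
- an abstraction $\lambda\,M$ is encoded as 00 followed by the encoding of M,
- an application $M\,N$ is encoded as 01 followed by the encodings of M and N,
- an index i is encoded as i ones followed by a zero, $1^i 0$.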
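For instance, the pairing function $\lambda x.\lambda y.\lambda z.\,z\,x\,y$ is written $\lambda\,\lambda\,\lambda\;1\;3\;2$ in De Bruijn format, which has encoding 00 00 00 01 01 10 1110 110.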
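A minimal encoder sketch in Python, assuming the usual BLC rules (00 for a lambda, 01 for an application, and index i written as i ones followed by a zero). The tuple representation of terms and the names `lam`, `app` and `encode` are choices made for this example only.

```python
# De Bruijn terms: an int is an index, ("lam", body) an abstraction,
# ("app", f, a) an application.
def lam(body):
    return ("lam", body)

def app(f, a):
    return ("app", f, a)

def encode(term):
    """Return the binary encoding of a De Bruijn term as a string of bits."""
    if isinstance(term, int):
        if term < 1:
            raise ValueError("De Bruijn indices start at 1")
        return "1" * term + "0"
    if term[0] == "lam":
        return "00" + encode(term[1])
    if term[0] == "app":
        return "01" + encode(term[1]) + encode(term[2])
    raise ValueError("unknown term: %r" % (term,))

IDENTITY = lam(1)                              # λ 1
PAIRING = lam(lam(lam(app(app(1, 3), 2))))     # λ λ λ 1 3 2

assert encode(IDENTITY) == "0010"
assert encode(PAIRING) == "00" * 3 + "01" + "01" + "10" + "1110" + "110"
```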
A closed lambda term is one in which all variables are bound, i.e. without any free variables. In De Bruijn format, this means that an index i can only appear within at least i nested lambdas. The number of closed terms of size n bits is given by sequence A114852 of the On-Line Encyclopedia of Integer Sequences.
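Closed terms of a given size can be counted with a short recursion over the three term forms. The sketch below assumes the size measure in which a lambda and an application each cost 2 bits and index i costs i+1 bits; `count_terms` is a hypothetical name, and its second parameter tracks how many enclosing lambdas are available to bind indices.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_terms(n, k=0):
    """Number of terms of exactly n bits whose free indices are all <= k."""
    if n < 2:
        return 0
    # A lone index i takes i+1 bits, so an n-bit index is i = n-1.
    total = 1 if 1 <= n - 1 <= k else 0
    # Abstraction: 2 bits, and the body sees one more enclosing lambda.
    total += count_terms(n - 2, k + 1)
    # Application: 2 bits, the rest split between function and argument.
    total += sum(count_terms(i, k) * count_terms(n - 2 - i, k)
                 for i in range(n - 1))
    return total

# Closed terms correspond to k = 0.
print([count_terms(n) for n in range(11)])
```

For sizes 4 to 10 bits this gives 1, 0, 1, 1, 2, 1, 6, which can be checked by hand: for example, the two closed terms of 8 bits are λ λ λ 1 and λ (1 1).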
The shortest possible closed term is the identity function $\lambda\,1$, with encoding 0010. In delimited mode, this machine just copies its input to its output.
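Because the encoding is prefix-free, a parser can tell where a term ends without any end marker, which is what allows a program to be followed directly by its input. The following Python sketch assumes the 00/01/1…10 encoding rules; `parse` is a hypothetical name and this is not the universal machine itself, only the parsing step.

```python
def parse(bits, pos=0):
    """Parse one term starting at bits[pos]; return (term, next position).

    Terms are ints (indices), ("lam", body) or ("app", f, a).
    """
    if bits.startswith("00", pos):
        body, pos = parse(bits, pos + 2)
        return ("lam", body), pos
    if bits.startswith("01", pos):
        f, pos = parse(bits, pos + 2)
        a, pos = parse(bits, pos)
        return ("app", f, a), pos
    # Otherwise expect an index: a run of ones closed by a zero.
    end = bits.find("0", pos)
    if end == -1 or end == pos:
        raise ValueError("truncated or malformed term at position %d" % pos)
    return end - pos, end + 1

# The identity program 0010 followed by one input bit:
term, nxt = parse("00100")
assert term == ("lam", 1)
assert "00100"[nxt:] == "0"      # the unread remainder is the input
```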
The universal machine U in BLC is then, in De Bruijn format (all indices are single digit):
This is in binary:
- (only 232 bits (29 bytes) long)
A detailed analysis of machine U may be found in.
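In general, the complexity of an object can be conditional on several other objects that are provided as additional arguments to the universal machine. BLC defines plain (or simple) complexity KS and prefix complexity KP by
- $KS(x \mid y_1,\ldots,y_k) = \min\{\ell(p) \mid U\,(p{:}\mathrm{Nil})\,y_1 \ldots y_k = x\}$
- $KP(x \mid y_1,\ldots,y_k) = \min\{\ell(p) \mid U\,(p{:}z)\,y_1 \ldots y_k = \langle x, z\rangle\}$
where $\ell(p)$ denotes the length of p in bits.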
The identity program 0010 proves that
- $KS(x) \le \ell(x) + 4$
The program proves that
where $\overline{x}$ is the Levenshtein code for x defined by
- $\overline{0} = 0$
- $\overline{n+1} = 1\,\overline{\ell(n)}\,n$
in which we identify numbers and bitstrings according to lexicographic order. This code has the nice property that for all k,
Furthermore, it makes lexicographic order of delimited numbers coincide with numeric order.
|n||bitstring||code|
|4||01||1110 0 00|
|5||10||1110 0 01|
|6||11||1110 0 10|
|7||000||1110 0 11|
|8||001||1110 1 000|
|9||010||1110 1 001|
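The table rows can be reproduced with a short recursive function. This is a sketch assuming the recursion "code of 0 is 0; code of n+1 is a 1, then the code of the length of n, then n itself", with numbers identified with bitstrings by listing strings in order of length and then lexicographically. The function names are illustrative.

```python
def to_bitstring(n):
    """The n-th bitstring: 0 -> '', 1 -> '0', 2 -> '1', 3 -> '00', ..."""
    return bin(n + 1)[3:]    # binary of n+1 with its leading 1 removed

def levenshtein_code(n):
    """Self-delimiting code for the number n."""
    if n == 0:
        return "0"
    s = to_bitstring(n - 1)
    return "1" + levenshtein_code(len(s)) + s

for n in range(4, 10):
    print(n, to_bitstring(n), levenshtein_code(n))

# Lexicographic order of the codes agrees with numeric order.
codes = [levenshtein_code(n) for n in range(200)]
assert codes == sorted(codes)
```

Spaces in the table only group the bits for readability; the function returns the code without them.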
The halting probability of the prefix universal machine is defined as the probability it will output any term that has a closed normal form (this includes all translated strings):
With some effort, we can determine the first 4 bits of this particular number of wisdom:
where probability $.0001_2 = 2^{-4}$ is already contributed by programs 00100 and 00101 for terms True and False.
BLC8: byte sized I/O
While bit streams are nice in theory, they fare poorly in interfacing with the real world. The language BLC8 is a more practical variation on BLC in which programs operate on a stream of bytes, where each byte is represented as a delimited list of 8 bits in big-endian order.
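The byte-to-bits convention can be sketched as below. The helper names are hypothetical, and Python lists of 0/1 stand in for the delimited lambda-term lists that BLC8 actually uses.

```python
def byte_to_bits(byte):
    """Bits of one byte in big-endian order (most significant bit first)."""
    if not 0 <= byte <= 255:
        raise ValueError("not a byte: %r" % (byte,))
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

def bits_to_byte(bits):
    """Inverse of byte_to_bits for a list of exactly 8 bits."""
    if len(bits) != 8:
        raise ValueError("expected 8 bits")
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def bytes_to_bit_lists(data):
    """A byte stream as a list of 8-bit lists, one per byte."""
    return [byte_to_bits(b) for b in data]

assert byte_to_bits(ord("A")) == [0, 1, 0, 0, 0, 0, 0, 1]
assert bits_to_byte(byte_to_bits(200)) == 200
```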
BLC in the IOCCC 2012
- John Tromp, Binary Lambda Calculus and Combinatory Logic, in Randomness And Complexity, from Leibniz To Chaitin, ed. Cristian S. Calude, World Scientific Publishing Company, October 2008. (The last reference, to an initial Haskell implementation, is dated 2004.)