Variable-length quantity

From Wikipedia, the free encyclopedia
Jump to: navigation, search

A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer. It was defined for use in the standard MIDI file format[1] to save additional space for a resource constrained system, and is also used in the later Extensible Music Format (XMF). A VLQ is essentially a base-128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes. See the example below.

Base-128 is also used in ASN.1 BER encoding to encode tag numbers and Object Identifiers.[2] It is also used in the WAP environment, where it is called variable length unsigned integer or uintvar. The DWARF debugging format[3] defines a variant called LEB128 (or ULEB128 for unsigned numbers), where the least significant group of 7 bits are encoded in the first byte and the most significant bits are in the last byte (so effectively it is the little-endian analog of a variable-length quantity). Google's protocol buffers use a similar format to have compact representation of integer values,[4] as does Oracle's Portable Object Format (POF)[5] and the Microsoft .NET Framework's "7-bit encoded int" in the BinaryReader and BinaryWriter classes.[6]

It's also used extensively in web browsers for source mapping - which contain a lot of integer line & column number mappings - to keep the size of the map to a minimum.[7]

General structure[edit]

The encoding assumes an octet (an eight-bit byte) where the most significant bit (MSB), also commonly known as the sign bit, is reserved to indicate whether another VLQ octet follows.

VLQ Octet
7 6 5 4 3 2 1 0
27 26 25 24 23 22 21 20
A Bn

If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.

B is a 7-bit number [0x00, 0x7F] and n is the position of the VLQ octet where B0 is the least significant. The VLQ octets are arranged most significant first in a stream.

Removing Redundancy[edit]

With the VLQ encoding described above, any number that can be encoded with N octets can also be encoding with more than N octets simply by prepending additional 0x80 octets. For example, the number 358 can be written as a 2-octet VLQ 0x8166 or as the 3-octet VLQ 0x808166 or as a 4-octet VLQ 0x80808166 and so forth. The VLQ format used in Git removes this redundancy and extends the representable range of smaller VLQs by adding an offsets to 2-octet and larger VLQs so that the lowest possible value for an (N+1)-octet VLQ is exactly one more than the maximum possible value for an N-octet VLQ. Since a 1-octet VLQ can store a maximum value of 127, the minimum 2-octet VLQ (0x8000) is assigned the value 128 instead of 0. This means that the maximum value 2-octet VLQ (0xff7f) is 16511 instead of just 16383. The minimum 3-octet VLQ (0x808000) has a value of 16512 instead of zero, which means that the maximum 3-octet VLQ (0xffff7f) is 2113663 instead of just 2097151. And so forth.

Other variants[edit]

In the data format for Unreal Packages used by the Unreal Engine, a variable length quantity scheme called Compact Indices[8] is used. The only difference in this encoding is that the first VLQ has the sixth binary digit reserved to indicate whether the encoded integer is positive or negative.

First VLQ Octet
7 6 5 4 3 2 1 0
27 26 25 24 23 22 21 20
A B Cn

If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.

If B is 0, then the VLQ represents a positive integer. If B is 1, then the VLQ represents a negative number.

C is a 6-bit number [0x00, 0x3F] and n is the position of the VLQ octet where C0 is the least significant. The VLQ octets are arranged most significant first in a stream.

Any consecutive VLQ octet follows the general structure.

Examples[edit]

Diagram showing how to convert 106,903 from decimal to uintvar representation

Here is a worked out example for the decimal number 137:

  • Represent the value in binary notation (e.g. 137 as 10001001)
  • Break it up in groups of 7 bits starting from the lowest significant bit (e.g. 137 as 0000001 0001001). This is equivalent to representing the number in base 128.
  • Take the lowest 7 bits and that gives you the least significant byte (0000 1001). This byte comes last.
  • For all the other groups of 7 bits (in the example, this is 000 0001), set the MSB to 1 (which gives 1000 0001 in our example). Thus 137 becomes 1000 0001 0000 1001 where the bits in boldface are something we added. These added bits denote if there is another byte to follow or not. Thus, by definition, the very last byte of a variable length integer will have 0 as its MSB.

Another way to look at this is to represent the value in base-128, and then set the MSB of all but the last base-128 digit to 1.

The Standard MIDI File format specification gives more examples:[1][9]

Integer Variable-length quantity
0x00000000 0x00
0x0000007F 0x7F
0x00000080 0x81 0x00
0x00002000 0xC0 0x00
0x00003FFF 0xFF 0x7F
0x00004000 0x81 0x80 0x00
0x001FFFFF 0xFF 0xFF 0x7F
0x00200000 0x81 0x80 0x80 0x00
0x08000000 0xC0 0x80 0x80 0x00
0x0FFFFFFF 0xFF 0xFF 0xFF 0x7F

References[edit]

External links[edit]