Binary scaling

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

Binary scaling is a computer programming technique used typically in embedded C, DSP and assembler programs to implement non-integer operations by using the native integer arithmetic of the processor.


A representation of a value using binary scaling is more precise than a floating-point representation occupying the same number of bits, but typically represents values of a more limited range, therefore more easily leading to arithmetic overflow during computation. Implementation of operations using integer arithmetic instructions is often (but not always) faster than the corresponding floating-point instructions.

A position for the 'binary point' is chosen for each variable to be represented, and binary shifts associated with arithmetic operations are adjusted accordingly. The binary scaling corresponds in Q (number format) to the first digit, i.e. Q1.15 is a 16 bit integer scaled with one bit as integer and fifteen as fractional. A Bscal 1 or Q1.15 number would represent approximately 0.999 to −1.0.

To give an example, a common way to use integer arithmetic to simulate floating point, using 32-bit numbers, is to multiply the coefficients by 65536.

Using binary scientific notation, this will place the binary point at B16. That is to say, the most significant 16 bits represent the integer part the remainder are represent the fractional part. This means, as a signed two's complement integer B16 number can hold a highest value of and a lowest value of −32768.0. Put another way, the B number, is the number of integer bits used to represent the number which defines its value range. Remaining low bits (i.e. the non-integer bits) are used to store fractional quantities and supply more accuracy.

For instance, to represent 1.2 and 5.6 as B16 one multiplies them by 216, giving 78643 and 367001 as the closest integers.

Multiplying these together gives


To convert it back to B16, divide it by 216.

This gives 440400B16, which when converted back to a floating-point number (by dividing again by 216, but holding the result as floating point) gives 6.71997 approximately. The correct result is 6.72.

Re-scaling after multiplication[edit]

The example above for a B16 multiplication is a simplified example. Re-scaling depends on both the B scale value and the word size. B16 is often used in 32 bit systems because it works simply by multiplying and dividing by 65536 (or shifting 16 bits).

Consider the Binary Point in a signed 32 bit word thus:

0 1 2 3 4 5 6 7 8 9
 S X X X X X X X   X X X X X X X X   X X X X X X X X   X X X X X X X X

where S is the sign bit and X are the other bits.

Placing the binary point at

  • 0 gives a range of −1.0 to 0.999999.
  • 1 gives a range of −2.0 to 1.999999
  • 2 gives a range of −4.0 to 3.999999 and so on.

When using different B scalings and/or word sizes the complete B scaling conversion formula must be used.

Consider a 32 bit word size, and two variables, one with a B scaling of 2 and the other with a scaling of 4.

1.4 @ B2 is 1.4 * (2 ^ (wordsize-2-1)) == 1.4 * 2 ^ 29 == 0x2CCCCCCD

Note that here the 1.4 values is very well represented with 30 fraction bits. A 32 bit floating-point number has 23 bits to store the fraction in. This is why B scaling is always more accurate than floating point of the same word size. This is especially useful in integrators or repeated summing of small quantities where rounding error can be a subtle but very dangerous problem when using floating point.

Now a larger number 15.2 at B4.

15.2 @ B4 is 15.2 * (2 ^ (wordsize-4-1)) == 15.2 * 2 ^ 27 == 0x7999999A

The number of bits to store the fraction is 28 bits. Multiplying these 32 bit numbers give the 64 bit result 0x1547AE14A51EB852

This result is in B7 in a 64 bit word. Shifting it down by 32 bits gives the result in B7 in 32 bits.


To convert back to floating point, divide this by (2^(wordsize-7-1)) == 21.2800000099

Various scalings may be used. B0 for instance can be used to represent any number between −1 and 0.999999999.

Binary angles[edit]

Binary scaling (B0) representation of angles. Black is traditional degrees representation, green is BAM as a decimal number and red is hexadecimal 32 bit representation of the BAM.

Binary angles are mapped using B0, with 0 as 0 degrees, 0.5 as 90° (or ), −1.0 or 0.9999999 as 180° (or π rad), and −0.5 as 270° (or ). When these binary angles are added using normal two's complement mathematics, the rotation of the angles is correct, even when crossing the sign boundary; this conveniently does away with checks for angles ≥ 360° when handling ordinary angles[1] (data that allow angles with more than one rotation must use some other encoding).

The terms binary angular measurement (BAM)[2] and binary angular measurement system (BAMS)[3] as well as brads (binary radians or binary degree) refer to implementations of binary angles. They find use in robotics, navigation,[4] computer games,[5] and digital sensors.[6]

No matter what bit-pattern is stored in a binary angle, when it is multiplied by 180° (or π rad) using standard signed fixed-point arithmetic, the result is always a valid angle in the range of −180 degrees (−π radians) to +180 degrees ( radians). In some cases, it is convenient to use unsigned multiplication (rather than signed multiplication) on a binary angle, which gives the correct angle in the range of 0 to +360 degrees (+2π radians or +1 turn). Compared to storing angles in a binary angle format, storing angles in any other format inevitably results in some bit patterns giving "angles" outside that range, requiring extra steps to range-reduce the value to the desired range, or results in some bit patterns that are not valid angles at all (NaN), or both.

Application of binary scaling techniques[edit]

Binary scaling techniques were used in the 1970s and 1980s for real-time computing that was mathematically intensive, such as flight simulation and in Nuclear Power Plant control algorithms since the late 1960s. The code was often commented with the binary scalings of the intermediate results of equations.

Binary scaling is still used in many DSP applications and custom made microprocessors are usually based on binary scaling techniques. The Binary angular measurement is used in the STM32G4 series built in CORDIC co-processors.

Binary scaling is currently used in the DCT used to compress JPEG images in utilities such as GIMP.

Although floating point has taken over to a large degree, where speed and extra accuracy are required, binary scaling works on simpler hardware and is more accurate when the range of values is known in advance and is sufficiently limited. This is because all the bits in a binary scaled integer are used for the precision of the value (though there may be leading zeros if the range of values is large), whereas in floating point, some bits are used to define the scaling.

See also[edit]


  1. ^ Hargreaves, Shawn. "Angles, integers, and modulo arithmetic". Archived from the original on 2019-06-30. Retrieved 2019-08-05.
  2. ^ "Binary angular measurement". Archived from the original on 2009-12-21.
  3. ^ "Binary Angular Measurement System". acronyms.thefreedictionary.
  4. ^ LaPlante, Phillip A. (2004). "Chapter 7.5.3, Binary Angular Measure". Real-Time Systems Design and Analysis.
  5. ^ Sanglard, Fabien (2010-01-13). "Doom 1993 code review - Section "Walls"".
  6. ^ "Hitachi HM55B Compass Module (#29123)" (PDF). Parallax Digital Compass Sensor (#29123). Parallax, Inc. May 2005. Archived from the original (PDF) on 2011-07-11 – via