Q (number format)

Q is a fixed-point number format where the number of fractional bits (and optionally the number of integer bits) is specified. For example, a Q15 number has 15 fractional bits; a Q1.14 number has 1 integer bit and 14 fractional bits. Q format is often used in hardware that does not have a floating-point unit and in applications that require constant resolution.

Characteristics

Because Q format numbers are fixed point, they can be stored and operated on as integers. The number of fractional bits and the underlying integer size are chosen on an application-specific basis, depending on the range and resolution needed.

The notation used is Qm.n, where:

Q designates that the number is in Q format notation — the Texas Instruments representation for signed fixed-point numbers.
m (optional; default=0) is the number of bits used to designate the two's complement integer portion of the number, exclusive of the sign bit.
n is the number of bits used to designate the two's complement fractional portion of the number, i.e. the number of bits to the right of the binary point.

Note that the most significant bit is always designated as the sign bit (the number is stored as a two's complement number). Representing a signed fixed-point data type in Q format therefore always requires m+n+1 bits to account for the sign.

For a given Qm.n format, using an m+n+1 bit signed integer container with n fractional bits:

its range is [-2^m, 2^m - 2^-n]
its resolution is 2^-n

For example, a Q14.1 format number:

requires 14+1+1 = 16 bits
its range is [-2^14, 2^14 - 2^-1] = [-16384.0, +16383.5], stored as the 16-bit two's complement integers 0x8000 (-16384.0) through 0xFFFF (-0.5) and 0x0000 (0.0) through 0x7FFF (+16383.5)
its resolution is 2^-1 = 0.5

Unlike floating point, the resolution will remain constant over the entire range.
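
The range and resolution formulas can be checked numerically. The following is a minimal sketch in C; the q_limits struct and the q_format_limits function are names invented here for illustration, and the sketch assumes m + n <= 52 so that every limit is exactly representable as a double.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the limits of a signed Qm.n format. */
typedef struct {
    int    bits;        /* container width: m + n + 1 (one sign bit) */
    double min;         /* -2^m */
    double max;         /* 2^m - 2^-n */
    double resolution;  /* 2^-n */
} q_limits;

/* Compute the limits of Qm.n from the formulas given in the text. */
q_limits q_format_limits(int m, int n)
{
    /* Assumption of this sketch: keeps all values exact in a double. */
    assert(m >= 0 && n >= 0 && m + n <= 52);

    q_limits lim;
    lim.bits       = m + n + 1;
    lim.resolution = 1.0 / (double)((int64_t)1 << n);   /* 2^-n */
    lim.min        = -(double)((int64_t)1 << m);        /* -2^m */
    lim.max        = (double)((int64_t)1 << m) - lim.resolution;
    return lim;
}
```

For Q14.1, q_format_limits(14, 1) gives a 16-bit container, a range of -16384.0 to 16383.5 and a resolution of 0.5, matching the example above.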

Conversion

Float to Q

To convert a number from floating point to Qm.n format:

Multiply the floating point number by 2ⁿ
Round to the nearest integer
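
The two steps above can be sketched in C as follows. The function name float_to_q, the 16-bit container, the choice to round halfway cases away from zero, and the clamping of out-of-range inputs are all assumptions made for this illustration; the text itself only prescribes scaling and rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Convert x to a Q value with n fractional bits, stored in an int16_t. */
int16_t float_to_q(double x, int n)
{
    assert(n >= 0 && n <= 15);

    /* Step 1: multiply by 2^n. */
    double scaled = x * (double)(1 << n);

    /* Step 2: round to the nearest integer. Adding +/-0.5 and then
       truncating rounds halfway cases away from zero. */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;

    /* Assumption of this sketch: clamp values that do not fit. */
    if (scaled > 32767.0)  scaled = 32767.0;
    if (scaled < -32768.0) scaled = -32768.0;

    return (int16_t)scaled;  /* the conversion truncates toward zero */
}
```

With n = 8, float_to_q(1.5, 8) returns 384. With n = 15, an input of 1.0 cannot be represented and is clamped to 32767 in this sketch.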

Q to Float

To convert a number from Qm.n format to floating point:

Convert the number directly to floating point
Divide by 2ⁿ
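
The reverse conversion is equally short. This is a minimal sketch; the function name q_to_float and the 16-bit container are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a Q value with n fractional bits back to floating point. */
double q_to_float(int16_t q, int n)
{
    assert(n >= 0 && n <= 15);

    /* Convert the stored integer directly, then divide by 2^n. */
    return (double)q / (double)(1 << n);
}
```

Every 16-bit integer and every power of two up to 2^15 is exactly representable as a double, so this direction introduces no rounding error.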

Math operations

Q numbers are a ratio of two integers: the numerator is kept in storage, the denominator is equal to 2ⁿ.

Consider the following example:

The Q8 denominator equals 2⁸ = 256

1.5 equals 384/256

384 is stored, 256 is inferred because it is a Q8 number.

If the Q number's base is to be maintained (n remains constant), the math operations must keep the denominator constant:

\frac{num1}{d} + \frac{num2}{d} = \frac{num1 + num2}{d}

\frac{num1}{d} - \frac{num2}{d} = \frac{num1 - num2}{d}

\left( \frac{num1}{d} \times \frac{num2}{d} \right) \times d = \frac{num1 \times num2}{d}

\left( \frac{num1}{d} / \frac{num2}{d} \right) / d = \frac{num1 / num2}{d}

Because the denominator is a power of two, multiplication by d can be implemented as an arithmetic shift to the left and division by d as an arithmetic shift to the right; on many processors shifts are faster than multiplication and division.

To maintain accuracy, the intermediate multiplication and division results must be held at double the operand width (for example, a 32-bit intermediate for 16-bit operands), and care must be taken in rounding the intermediate result before converting back to the desired Q number.
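
As a worked illustration with numbers chosen here (they are not from the original text), take Q8, so d = 256, and the values 1.5 and 2.5. The stored numerators are 384 and 640, and the stored result of each operation is the numerator of a fraction over 256:

```latex
\begin{aligned}
1.5 &= \tfrac{384}{256}, \qquad 2.5 = \tfrac{640}{256} \\[4pt]
1.5 + 2.5 &: \quad 384 + 640 = 1024, \qquad \tfrac{1024}{256} = 4.0 \\[4pt]
1.5 \times 2.5 &: \quad \frac{384 \times 640}{256} = \frac{245760}{256} = 960, \qquad \tfrac{960}{256} = 3.75 \\[4pt]
1.5 \,/\, 2.5 &: \quad \frac{384 \times 256}{640} = \frac{98304}{640} = 153.6 \approx 154, \qquad \tfrac{154}{256} \approx 0.6016
\end{aligned}
```

The intermediate values 245760 and 98304 do not fit in 16 bits even though the operands and the final results do, which is why the intermediate needs the extra width. The division result also shows the rounding step: 153.6 is rounded to 154, so the stored quotient represents about 0.6016 rather than exactly 0.6.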

Using C, the operations are as follows (note that here Q refers to the number of fractional bits):

Addition

signed int a, b, result;
result = a + b;   // operands and result share the same Q format; the sum can overflow

Subtraction

signed int a, b, result;
result = a - b;   // operands and result share the same Q format; the difference can overflow

Multiplication

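#include <stdint.h>

// precomputed value: half of one unit in the last place
#define K   (1 << (Q - 1))

int16_t  a, b, result;
int32_t  temp;   // twice the operand width, so the full product fits
temp = (int32_t)a * (int32_t)b; // cast first: the product has the type of its operands
// Rounding; mid values are rounded up
temp += K;
// Correct by dividing by base (right-shifting a negative value is
// implementation-defined in C; an arithmetic shift is assumed here)
result = (int16_t)(temp >> Q);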

Division

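#include <stdint.h>

int16_t a, b, result;
int32_t temp;
// pre-multiply by the base; widen first so the scaled value cannot overflow
temp = (int32_t)a * (1 << Q);
// So the result will be rounded; mid values are rounded away from zero.
// The correction b/2 is added when temp and b have the same sign
// and subtracted when they differ.
if ((temp >= 0) == (b >= 0))
    temp += b / 2;
else
    temp -= b / 2;
result = (int16_t)(temp / b);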
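
The four fragments above can be collected into self-contained functions. This is a sketch under stated assumptions, not a reference implementation: the names q_add, q_sub, q_mul and q_div are invented here, the number of fractional bits is passed as a parameter n instead of the macro Q, operands are 16 bits with a 32-bit intermediate, results that overflow 16 bits are not detected or saturated, and right shifts of negative values are assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Addition and subtraction work directly on the stored integers. */
int16_t q_add(int16_t a, int16_t b) { return (int16_t)(a + b); }
int16_t q_sub(int16_t a, int16_t b) { return (int16_t)(a - b); }

/* Multiply two values that both have n fractional bits. */
int16_t q_mul(int16_t a, int16_t b, int n)
{
    assert(n >= 1 && n <= 15);
    int32_t temp = (int32_t)a * (int32_t)b;  /* product has 2n fractional bits */
    temp += (int32_t)1 << (n - 1);           /* add half an LSB for rounding */
    return (int16_t)(temp >> n);             /* divide by the base 2^n */
}

/* Divide two values that both have n fractional bits. */
int16_t q_div(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    assert(b != 0);
    int32_t temp = (int32_t)a * ((int32_t)1 << n);  /* pre-multiply by the base */
    /* Round to nearest: move temp by half the divisor, away from zero. */
    if ((temp >= 0) == (b >= 0))
        temp += b / 2;
    else
        temp -= b / 2;
    return (int16_t)(temp / b);
}
```

For example, with n = 8 the values 1.5 and 2.5 are stored as 384 and 640; q_mul(384, 640, 8) returns 960 (3.75) and q_div(384, 640, 8) returns 154 (about 0.6016).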

External links

Fixed Point Representation And Fractional Math (Note: the accuracy of the article is in dispute; see discussion.)