Scale factor (computer science)

From Wikipedia, the free encyclopedia

In computer science, a scale factor is a number used as a multiplier to represent a number on a different scale, functioning similarly to an exponent in mathematics. A scale factor is used when a real-world set of numbers needs to be represented on a different scale in order to fit a specific number format. Although using a scale factor extends the range of representable values, it also decreases the precision, resulting in rounding error for certain calculations.


Certain number formats may be chosen for an application for convenience in programming, or because of certain advantages offered by the hardware for that number format. For instance, early processors did not natively support floating-point arithmetic for representing fractional values, so integers were used to store representations of the real world values by applying a scale factor to the real value. Similarly, because hardware arithmetic has a fixed width (commonly 16, 32, or 64 bits, depending on the data type), scale factors allow representation of larger numbers (by manually multiplying or dividing by the specified scale factor), though at the expense of precision.[1] By necessity, this was done in software, since the hardware did not support fractional value. Scale factors are also used in floating-point numbers, and most commonly are powers of two. For example, the double-precision format sets aside 11 bits for the scaling factor (a binary exponent) and 53 bits for the significand, allowing various degrees of precision for representing different ranges of numbers, and expanding the range of representable numbers beyond what could be represented using 64 explicit bits (though at the cost of precision).[2]

As an example of where precision is lost, a 16-bit unsigned integer (uint16) can only hold a value as large as 65,53510. If unsigned 16-bit integers are used to represent values from 0 to 131,07010, then a scale factor of 12 would be introduced, such that the scaled values correspond exactly to the real-world even integers. As a consequence, for example, the number 3 cannot be represented, because a stored 1 represents a real-world 2, and a stored 2 represents a real-world 4; there are not enough bits available to avoid this error in this representation.

Operations on scaled values[edit]

Once the scaled representation of a real value is stored, the scaling can often be ignored until the value needs to come back into the "real world". For instance, adding two scaled values is just as valid as unscaling the values, adding the real values, and then scaling the result, and the former is much easier and faster. In either approach, though, the two added numbers must be scaled the same.[3] For other operations, the scaling is very important.

Multiplication, for instance, needs to take into account that both numbers are scaled. As an example, consider two real world values A and B. The real world multiplication of these real world values is:

A * B = P

If they are instead represented with a scale factor of Z, and these scaled representations are subsequently multiplied, the result is the following:

AZ * BZ = Q

AZ is the scaled real world value of A, or simply the product of A * Z, and likewise, BZ is the scaled representation of B. After the scaled multiplication, the answer is not written PZ, because the value stored in PZ is not the answer. This can be seen by rearranging the statement, where each line in the following is equivalent:

AZ * BZ = Q
A * Z * B * Z = Q
(A * B) * Z * Z = Q
P * Z * Z = Q
PZ * Z = Q

In line 4, P substitutes A * B. It follows that the result of AZ * BZ (which is Q) is not PZ, but rather PZ * Z. If PZ were the answer, it could be stored directly since it has the scale factor built in, as is the case with addition and subtraction. For multiplication, however, the product of two scaled values has an extra scaling built in. As long as this is taken into account, there is still no need to convert AZ and BZ into A and B before performing the operation; the result must be divided by Z before storing it back. After this, PZ will be stored as the result of the multiplication, which is indeed the scaled representation of the result of A * B (the desired answer) rather than the result of AZ * BZ (which is still scaled).

Common scaling scenarios[edit]

Fractional values scaled to integers[edit]

As previously described, many older processors (and possibly some current ones) do not natively support fractional arithmetic. In this case, fractional values can be scaled into integers by multiplying them by ten to the power of whatever decimal precision is desired. In other words, to preserve n digits to the right of the decimal point, it is necessary to multiply the entire number by 10n. In computers, which perform calculations in binary, the real number is multiplied by 2m to preserve m digits to the right of the binary point; alternatively, one can bit shift the value m places to the left. For example, in the following set of real world fractional values, all have three digits to the right of the decimal point:

15.400, 0.133, 4.650, 1.000, 8.001

To save all of that information (in other words, not lose any precision), these numbers must be multiplied by 103 (1,000), giving integer values of:

15400, 133, 4650, 1000, 8001

Because of the value of the scaled numbers, they cannot be stored in 8bit integers; they will require at least 14 unsigned bits, or, more realistically, 16.

Integer values to fractions[edit]

Certain processors, particularly DSPs common in the embedded system industry, have built in support for the fixed-point arithmetic, such as Q and IQ formats.

Since the fractional part of a number takes up some bits in the field, the range of values possible in a fixed9point value is less than the same number of bits would provide to an integer.[4] For instance, in an 8 bit field, an unsigned integer can store values from [0, 255], but an unsigned fixed-point with 5 bits allocated to the fractional part only has 3 bits left over for the integer value, and so can only store integer values from [0, 7]. (The number of distinct values that the two fields can store is the same, 28 = 256, because the fixed-point field can also store 32 fractional values for each integer value.) It is therefore common that a scaling factor is used to store real world values that may be larger than the maximum value of the fixed-point format.

As an example, when using an unsigned 8-bit fixed-point format (which has 4 integer bits and 4 fractional bits), the highest representable integer value is 15, and the highest representable mixed value is 15.9375 (0xF.F or 1111.1111b). If the desired real world values are in the range [0,160], they must be scaled to fit within this fixed-point representation. A scale factor of 110 cannot be used here, because scaling 160 by 110 gives 16, which is greater than the greatest value that can be stored in this fixed-point format. However, 111 will work as a scale factor, because the maximum scaled value, 16011 = 14.54, fits within this range. Given this set:

154, 101, 54, 3, 0, 160

Scaling these with the scale factor 111 gives the following values:

154/11 = 14
101/11 = 9.1818...
54/11 = 4.9090...
3/11 = 0.2727...
0/11 = 0
160/11 = 14.5454...

Many of these values have been truncated because they contain repeating decimals, which follows from the chosen scale factor (elevenths do not terminate in decimal). When storing these in our fixed-point format, some precision will be lost (in contrast to the exact values of the original integers). This is also a problem because an 8-bit format can store 256 different values, but the numbers in this set are from a range with only 161 possible values (0 through 160). As it turns out, the problem was the scale factor, 111, which introduced unnecessary precision requirements and rounding error (when approximating a real value with the nearest representable value).[5] To avoid or resolve this problem, one must choose a better scale factor.

Choosing a scale factor[edit]

The example above illustrates how certain scale factors can cause unnecessary precision loss or rounding error, highlighting the importance of choosing the right scale factor. Using the scale factor of 111 and converting to binary representations, the following values are obtained:

154/11 = 14 = 1110.0
101/11 = 9.1818... = 1001.00101110...
54/11 = 4.9090... = 100.111010...
3/11 = 0.2727... = 0.010010...
0/11 = 0 = 0.0
160/11 = 14.5454... = 1110.10010...

Several of the binary fractions require more than the four fractional bits provided by the set fixed-point format. (This is in part because elevenths do not terminate in binary either.) To fit them into the fields (4 integer and 4 fractional bits), it is possible to truncate the remaining bits, giving the following stored representations:


Or in decimal:


When they are called back into the real world, they are divided by the scale factor, 111. This is the inverse of the original scaling, giving the following "real world" values:


These values are not equivalent to the originals (before scaling down and fitting into this 8-bit representation). Most noticeably, they are not all integers anymore, immediately indicating that an error was introduced in the storage, due to a poor choice of scaling factor.

Choosing a better scale factor[edit]

Most data sets will not have a perfect scale factor; most likely, there will be some error introduced by the scaling process. However, it may be possible to choose a better scale factor. The ideal scale factor may not be the smallest, but rather one that preserves as much precision as possible.

Dividing a number by a power of two is the same as shifting all the bits to the right once for each power of two. (This is the binary equivalent to shifting all decimal digits to the left or right when, respectively, multiplying or dividing by powers of ten.) The pattern of bits does not change, it just moves the number of places equal to the binary exponent (for instance, 3 places to the right when dividing by 8 = 23). On the other hand, when dividing by a number that is not an integer power of two in binary, the bit pattern changes. This is likely to produce a bit pattern with more bits to the right of the binary point, artificially introducing required precision. This is especially true when the fractional part has a denominator that is not a power of two, as all fractions not reciprocals of powers of two recur in binary.[6] Therefore, it is almost always preferable to use a scale factor that is a power of two. It may still be possible to lose bits that get shifted right off the end of the field as a result of truncation, but this avoids introducing new bits that will be imprecise (due to rounding error) or truncated.[6]

As an illustration the use of powers of two in the scale factor, a scale factor of 116 can be applied to the above data set. The binary values for the original data set are given below:

154 = 1001 1010
101 = 0110 0101
54 =  0011 0110
3 =   0000 0011
0 =   0000 0000
160 = 1010 0000

Being integers between 0 and 255, these all can be represented precisely with 8 bits. Scaling these by 116 is the same as dividing by 16, which is the same as shifting the bits 4 places to the right. In this case, scaling is done by inserting a binary point between the first 4 bits and last 4 bits of each number. That happens to equal the predetermined format of this representation. Consequently, since all these numbers do not require more than 8 bits to represent them as integers, no more than 8 bits are required to scale them down and store them in a fixed-point format.

See also[edit]


  1. ^ Linz & Wang 2003, pp. 12–13.
  2. ^ Linz & Wang 2003, pp. 14–15.
  3. ^ Yates 2013, p. 6.
  4. ^ Yates 2013, pp. 4–5.
  5. ^ Linz & Wang 2003, p. 18.
  6. ^ a b "Binary Fractions". Retrieved 6 July 2020.