Saturation arithmetic

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Saturation arithmetic is a version of arithmetic in which all operations such as addition and multiplication are limited to a fixed range between a minimum and maximum value.

If the result of an operation is greater than the maximum, it is set ("clamped") to the maximum; if it is below the minimum, it is clamped to the minimum. The name comes from how the value becomes "saturated" once it reaches the extreme values; further additions to a maximum or subtractions from a minimum will not change the result.

For example, if the valid range of values is from -100 to 100, the following operations produce the following values:

  • 60 + 30 = 90
  • 60 + 43 = 100
  • (60 + 43) − (75 + 75) = -47
  • 10 × 11 = 100
  • 99 × 99 = 100
  • 30 × (5 − 1) = 100
  • 30 × 5 − 30 × 1 = 70

As can be seen from these examples, familiar properties like associativity and distributivity may fail in saturation arithmetic.[1] This makes it unpleasant to deal with in abstract mathematics, but it has an important role to play in digital hardware and algorithms.

Modern use[edit]

Typically, general-purpose microprocessors do not implement integer arithmetic operations using saturation arithmetic; instead, they use the easier-to-implement modular arithmetic, in which values exceeding the maximum value "wrap around" to the minimum value, like the hours on a clock passing from 12 to 1. In hardware, modular arithmetic with a minimum of zero and a maximum of rn-1, where r is the radix can be implemented by simply discarding all but the lowest n digits. For binary hardware, which the vast majority of modern hardware is, the radix is 2 and the digits are bits.

However, although more difficult to implement, saturation arithmetic has numerous practical advantages. The result is as numerically close to the true answer as possible; for 8-bit binary signed arithmetic, when the correct answer is 130, it is considerably less surprising to get an answer of 127 from saturating arithmetic than to get an answer of −126 from modular arithmetic. Likewise, for 8-bit binary unsigned arithmetic, when the correct answer is 258, it is less surprising to get an answer of 255 from saturating arithmetic than to get an answer of 2 from modular arithmetic.

Saturation arithmetic also enables overflow of additions and multiplications to be detected consistently without an overflow bit or excessive computation, by simple comparison with the maximum or minimum value (provided the datum is not permitted to take on these values).

Additionally, saturation arithmetic enables efficient algorithms for many problems, particularly in digital signal processing. For example, adjusting the volume level of a sound signal can result in overflow, and saturation causes significantly less distortion to the sound than wrap-around. In the words of researchers G. A. Constantinides et al.:

When adding two numbers using two’s complement representation, overflow results in a ‘wrap-around’ phenomenon. The result can be a catastrophic loss in signal-to-noise ratio in a DSP system. Signals in DSP designs are therefore usually either scaled appropriately to avoid overflow for all but the most extreme input vectors, or produced using saturation arithmetic components.[2]

Saturation arithmetic operations are available on many modern platforms, and in particular was one of the extensions made by the Intel MMX platform, specifically for such signal processing applications. This functionality is also available in wider versions in the SSE2 and AVX2 integer instruction sets.

Saturation arithmetic for integers has also been implemented in software for a number of programming languages including C, C++, such as the GNU Compiler Collection,[3] and Eiffel. This helps programmers anticipate and understand the effects of overflow better. On the other hand, saturation is challenging to implement efficiently in software on a machine with only modular arithmetic operations, since simple implementations require branches that create huge pipeline delays. However, it is possible to implement saturating addition and subtraction in software without branches, using only modular arithmetic and bitwise logical operations that are available on all modern CPUs and their predecessors, including all x86 CPUs (back to the original Intel 8086) and some popular 8-bit CPUs (some of which, such as the Zilog Z80, are still in production). (However, on simple 8-bit and 16-bit CPUs, a branching algorithm might actually be faster if programmed in assembly, since there are no pipelines to stall and each instruction always takes multiple clock cycles.)

Although saturation arithmetic is less popular for integer arithmetic in hardware, the IEEE floating-point standard, the most popular abstraction for dealing with approximate real numbers, uses a form of saturation in which overflow is converted into "infinity" or "negative infinity", and any other operation on this result continues to produce the same value. This has the advantage over simple saturation that later operations which decrease the value will not end up producing a misleadingly "reasonable" result, such as in the computation . Alternatively, there may be special states such as "exponent overflow" (and "exponent underflow") that will similarly persist through subsequent operations, or cause immediate termination, or be tested for as in IF ACCUMULTOR OVERFLOW ... as in FORTRAN for the IBM704 (October 1956)

DSP & GPU Support[edit]

  • The VideoCore GPU system used on many mobile telephones implements saturation arithmetic. The support is mainly for video decoding, so as to avoid visual defects.


  1. ^ In fact, non-saturation arithmetic can also suffer associativity and distributivity failures in limited-precision environments, but such failures tend to be less obvious.
  2. ^ G. A. Constantinides, P. Y. K. Cheung, and W. Luk. Synthesis of Saturation Arithmetic Architectures
  3. ^ "GNU Compiler Collection (GCC) Internals: Arithmetic". 

External links[edit]