A carry-save adder[nb 1] is a type of digital adder, used to efficiently compute the sum of three or more binary numbers. It differs from other digital adders in that it outputs two (or more) numbers, and the answer of the original summation can be achieved by adding these outputs together. A carry save adder is typically used in a binary multiplier, since a binary multiplier involves addition of more than two binary numbers after multiplication. A big adder implemented using this technique will usually be much faster than conventional addition of those numbers.

## Motivation

Consider the sum:

```   12345678
+  87654322
= 100000000
```

Using basic arithmetic, we calculate right to left, "8 + 2 = 0, carry 1", "7 + 2 + 1 = 0, carry 1", "6 + 3 + 1 = 0, carry 1", and so on to the end of the sum. Although we know the last digit of the result at once, we cannot know the first digit until we have gone through every digit in the calculation, passing the carry from each digit to the one on its left. Thus adding two n-digit numbers has to take a time proportional to n, even if the machinery we are using would otherwise be capable of performing many calculations simultaneously.

In electronic terms, using bits (binary digits), this means that even if we have n one-bit adders at our disposal, we still have to allow a time proportional to n to allow a possible carry to propagate from one end of the number to the other. Until we have done this,

1. We do not know the result of the addition.
2. We do not know whether the result of the addition is larger or smaller than a given number (for instance, we do not know whether it is positive or negative).

A carry look-ahead adder can reduce the delay. In principle the delay can be reduced so that it is proportional to log n, but for large numbers this is no longer the case, because even when carry look-ahead is implemented, the distances that signals have to travel on the chip increase in proportion to n, and propagation delays increase at the same rate. Once we get to the 512-bit to 2048-bit number sizes that are required in public-key cryptography, carry look-ahead is not of much help.

## The basic concept

The idea of delaying carry resolution until the end, or saving carries, is due to John von Neumann.

Here is an example of a binary sum of 3 long binary numbers:

```  10111010101011011111000000001101 (a)
+ 11011110101011011011111011101111 (b)
+ 10010010101101110101001101010010 (c)
```

Conventional way to do it would be to first compute (a+b), and then compute ((a+b)+c). Carry-save arithmetic works by abandoning any kind of carry propagation. It computes the sum digit by digit, as:

```  10111010101011011111000000001101
+ 11011110101011011011111011101111
+ 00010010101101110101001101010010
= 21132130303123132223112112112222
```

The notation is unconventional, but the result is still unambiguous. Here, the result will be described as the sum of 2 binary numbers:

```  01110110101101110001110110110000 and
100110101010110111110010010011110
```

Now these 2 numbers can be sent to a carry-propagate adder which will output the result.

This was very advantageous from a delay (computation-time) perspective. If you were to add these 3 numbers using conventional methods, it would take you 2 carry-propagate adder delays to get to the answer. If you use the carry-save technique, you require one only 1 carry-propagate adder delay and 1 full-adder delay (which is much lower than a carry-propagate delay) and. Thus, CSA adders are typically very fast.

## Carry-save accumulators

Supposing that we have two bits of storage per digit, we can use a redundant binary representation, storing the values 0, 1, 2, or 3 in each digit position. It is therefore obvious that one more binary number can be added to our carry-save result without overflowing our storage capacity: but then what?

The key to success is that at the moment of each partial addition we add three bits:

• 0 or 1, from the number we are adding.
• 0 if the digit in our store is 0 or 2, or 1 if it is 1 or 3.
• 0 if the digit to its right is 0 or 1, or 1 if it is 2 or 3.

To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous calculation and not the current one. In each clock cycle, carries only have to move one step along, and not n steps as in conventional addition.

Because signals don't have to move as far, the clock can tick much faster. ..

There is still a need to convert the result to binary at the end of a calculation, which effectively just means letting the carries travel all the way through the number just as in a conventional adder. But if we have done 512 additions in the process of performing a 512-bit multiplication, the cost of that final conversion is effectively split across those 512 additions, so each addition bears 1/512 of the cost of that final "conventional" addition.

## Drawbacks

At each stage of a carry-save addition,

1. We know the result of the addition at once.
2. We still do not know whether the result of the addition is larger or smaller than a given number (for instance, we do not know whether it is positive or negative).

This latter point is a drawback when using carry-save adders to implement modular multiplication (multiplication followed by division, keeping the remainder only). If we cannot know whether the intermediate result is greater or less than the modulus, how can we know whether to subtract the modulus?

Montgomery multiplication, which depends on the rightmost digit of the result, is one solution; though rather like carry-save addition itself, it carries a fixed overhead, so that a sequence of Montgomery multiplications saves time but a single one does not. Fortunately exponentiation, which is effectively a sequence of multiplications, is the most common operation in public-key cryptography.

Careful error analysis allows a choice to be made about subtracting the modulus even though we don't know for certain whether the result of the addition is big enough to warrant the subtraction. For this to work, it is necessary for the circuit design to be able to add −2, −1, 0, +1 or +2 times the modulus. The advantage over Montgomery multiplication is that there is no fixed overhead attached to each sequence of multiplications.

## Technical details

The carry-save unit consists of n full adders, each of which computes a single sum and carry bit based solely on the corresponding bits of the three input numbers. Given the three n-bit numbers a, b, and c, it produces a partial sum ps and a shift-carry sc:

$ps_{i}=a_{i}\oplus b_{i}\oplus c_{i},$ $sc_{i}=(a_{i}\wedge b_{i})\vee (a_{i}\wedge c_{i})\vee (b_{i}\wedge c_{i}).$ The entire sum can then be computed by:

1. Shifting the carry sequence sc left by one place.
2. Appending a 0 to the front (most significant bit) of the partial sum sequence ps.
3. Using a ripple carry adder to add these two together and produce the resulting (n + 1)-bit value.