# Wallace tree

[Figure: Basic principle, known from manual multiplication]
[Figure: Example of reduction on an 8×8 multiplier]

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers, devised by Australian computer scientist Chris Wallace in 1964.[1]

The Wallace tree has three steps:

1. Multiply each bit of one argument by each bit of the other (for single bits, multiplication is a logical AND), yielding $n^2$ results. Depending on the positions of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of $a_2 b_3$ has weight 32 (see the explanation of weights below).
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.[2]

The second phase works as follows. As long as there are three or more wires with the same weight, add a further layer:

• Take any three wires of the same weight and feed them into a full adder. For each group of three input wires, the result is one output wire of the same weight (the sum) and one output wire of the next higher weight (the carry).
• If there are two wires of the same weight left, input them into a half adder.
• If there is just one wire left, connect it to the next layer.
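
The three steps and the layer rule above can be sketched as a bit-level software model. This is a minimal illustration, not a hardware description: the function names (`full_adder`, `half_adder`, `reduce_layer`, `wallace_multiply`) and the list-of-columns representation are assumptions made for this example, with `columns[k]` holding the bits of weight $2^k$.

```python
def full_adder(x, y, z):
    """Return (sum, carry) for three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def reduce_layer(columns):
    """Apply one reduction layer; columns[k] lists the bits of weight 2**k."""
    out = [[] for _ in range(len(columns) + 1)]
    for k, wires in enumerate(columns):
        i = 0
        # Full adders consume wires three at a time.
        while len(wires) - i >= 3:
            s, c = full_adder(wires[i], wires[i + 1], wires[i + 2])
            out[k].append(s)        # sum keeps the same weight
            out[k + 1].append(c)    # carry has the next higher weight
            i += 3
        if len(wires) - i == 2:     # two wires left: half adder
            s, c = half_adder(wires[i], wires[i + 1])
            out[k].append(s)
            out[k + 1].append(c)
        elif len(wires) - i == 1:   # one wire left: pass it through
            out[k].append(wires[i])
    if not out[-1]:                 # drop an unused top column
        out.pop()
    return out


def wallace_multiply(a, b, n):
    """Multiply two n-bit unsigned integers; return (product, layers used)."""
    # Step 1: AND every bit of a with every bit of b, grouped by weight.
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append((a >> i) & (b >> j) & 1)
    # Step 2: add layers while some weight still has three or more wires.
    layers = 0
    while any(len(wires) >= 3 for wires in columns):
        columns = reduce_layer(columns)
        layers += 1
    # Step 3: group the remaining wires into two numbers and add them.
    x = sum(wires[0] << k for k, wires in enumerate(columns) if wires)
    y = sum(wires[1] << k for k, wires in enumerate(columns) if len(wires) == 2)
    return x + y, layers


print(wallace_multiply(13, 11, 4))  # (143, 2)
```

The final `x + y` stands in for the conventional adder of step 3; the model counts layers but does not model gate or wire delay.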

The benefit of the Wallace tree is that there are only $O(\log n)$ reduction layers, and each layer has $O(1)$ propagation delay. Since generating the partial products takes $O(1)$ time and the final addition takes $O(\log n)$, the whole multiplication takes only $O(\log n)$ time, not much slower than addition (although it is much more expensive in gate count). Naively adding the partial products with regular adders (a tree of $O(\log n)$ levels of adders, each with $O(\log n)$ delay) would require $O(\log^2 n)$ time. From a complexity-theoretic perspective, the Wallace tree algorithm puts multiplication in the class NC$^1$.
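
A rough way to see the $O(\log n)$ layer count (an approximation that ignores rounding and the exact effect of half adders): each full adder replaces three wires with two, so one layer shrinks the tallest column from height $h$ to about $\tfrac{2}{3}h$. Writing $h_k$ for the maximum column height after $k$ layers (a symbol introduced here for illustration):

$$h_0 = n, \qquad h_{k+1} \approx \tfrac{2}{3}\,h_k \quad\Longrightarrow\quad h_k \approx n\left(\tfrac{2}{3}\right)^{k}.$$

Reduction stops once $h_k \le 2$, that is, after about

$$k \approx \log_{3/2}\frac{n}{2} = O(\log n)$$

layers. For $n = 4$ this estimate gives $\log_{3/2} 2 \approx 1.7$, consistent with the two reduction layers in the example below.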

These estimates consider only gate delays and ignore wire delays, which can also be very substantial.

The Wallace tree can also be represented by a tree of 3/2 or 4/2 adders.

It is sometimes combined with Booth encoding.[3][4]

## Weights explained

The weight of a wire is the place value (a power of 2) of the binary digit that the wire carries. In general, the partial product $a_n b_m$ has indices $n$ and $m$; since $2^n 2^m = 2^{n + m}$, the weight of $a_n b_m$ is $2^{n + m}$.
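
The weights follow from expanding the product of the two binary numbers. As a short derivation (using $i$ and $j$ as bit indices, chosen here to avoid clashing with the operand width $n$):

$$a \cdot b = \left(\sum_{i=0}^{n-1} a_i\,2^{i}\right)\left(\sum_{j=0}^{n-1} b_j\,2^{j}\right) = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i b_j\,2^{i+j}.$$

Each term $a_i b_j$ is a single bit, and its coefficient $2^{i+j}$ is the weight of the wire that carries it. For instance, $a_2 b_3$ has weight $2^{2+3} = 32$.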

## Example

$n=4$, multiplying $a_3a_2a_1a_0$ by $b_3b_2b_1b_0$:

1. First we multiply every bit by every bit:
• weight 1 - $a_0b_0$
• weight 2 - $a_0b_1$, $a_1b_0$
• weight 4 - $a_0b_2$, $a_1b_1$, $a_2b_0$
• weight 8 - $a_0b_3$, $a_1b_2$, $a_2b_1$, $a_3b_0$
• weight 16 - $a_1b_3$, $a_2b_2$, $a_3b_1$
• weight 32 - $a_2b_3$, $a_3b_2$
• weight 64 - $a_3b_3$
2. Reduction layer 1:
• Pass the only weight-1 wire through, output: 1 weight-1 wire
• Add a half adder for weight 2, outputs: 1 weight-2 wire, 1 weight-4 wire
• Add a full adder for weight 4, outputs: 1 weight-4 wire, 1 weight-8 wire
• Add a full adder for weight 8, and pass the remaining wire through, outputs: 2 weight-8 wires, 1 weight-16 wire
• Add a full adder for weight 16, outputs: 1 weight-16 wire, 1 weight-32 wire
• Add a half adder for weight 32, outputs: 1 weight-32 wire, 1 weight-64 wire
• Pass the only weight-64 wire through, output: 1 weight-64 wire
3. Wires at the output of reduction layer 1:
• weight 1 - 1
• weight 2 - 1
• weight 4 - 2
• weight 8 - 3
• weight 16 - 2
• weight 32 - 2
• weight 64 - 2
4. Reduction layer 2:
• Add a full adder for weight 8, and half adders for weights 4, 16, 32, 64
5. Outputs:
• weight 1 - 1
• weight 2 - 1
• weight 4 - 1
• weight 8 - 2
• weight 16 - 2
• weight 32 - 2
• weight 64 - 2
• weight 128 - 1
6. Group the wires into a pair of integers and use an adder to add them.
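
The wire counts in this example can be checked with a short script that tracks only how many wires each weight has, not their values. This is a sketch under the layer rule described earlier; the function names `layer_counts` and `trace` are hypothetical, and `heights[k]` is the number of wires of weight $2^k$.

```python
def layer_counts(heights):
    """Apply one reduction layer to per-weight wire counts.

    Returns (new_heights, full_adders_used, half_adders_used).
    """
    out = [0] * (len(heights) + 1)
    full = half = 0
    for k, h in enumerate(heights):
        f, r = divmod(h, 3)               # f full adders, r wires left over
        full += f
        out[k] += f + (1 if r else 0)     # sums, plus one wire from the leftover
        out[k + 1] += f                   # one carry per full adder
        if r == 2:                        # leftover pair goes through a half adder
            half += 1
            out[k + 1] += 1
    if out[-1] == 0:                      # drop an unused top weight
        out.pop()
    return out, full, half


def trace(n):
    """List (heights, full adders, half adders) for every layer of an n x n multiplier."""
    # Initial counts 1, 2, ..., n, ..., 2, 1 for weights 2**0 .. 2**(2n-2).
    heights = [min(k + 1, 2 * n - 1 - k) for k in range(2 * n - 1)]
    steps = []
    while max(heights) > 2:
        heights, full, half = layer_counts(heights)
        steps.append((heights, full, half))
    return steps


for number, (heights, full, half) in enumerate(trace(4), start=1):
    print(f"layer {number}: {heights} ({full} full, {half} half adders)")
# layer 1: [1, 1, 2, 3, 2, 2, 2] (3 full, 2 half adders)
# layer 2: [1, 1, 1, 2, 2, 2, 2, 1] (1 full, 4 half adders)
```

The two printed rows match the wire counts listed for the outputs of reduction layers 1 and 2.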

VHDL code:

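```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Adder cells. The entity names half_adder and full_adder are assumptions:
-- the original listing kept the component ports but not their names.
entity half_adder is
  Port ( a, b : in STD_LOGIC; sum, carry : out STD_LOGIC);
end half_adder;

architecture Behavioral of half_adder is
begin
  sum   <= a xor b;
  carry <= a and b;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity full_adder is
  Port ( a, b, c : in STD_LOGIC; sum, carry : out STD_LOGIC);
end full_adder;

architecture Behavioral of full_adder is
begin
  sum   <= a xor b xor c;
  carry <= (a and b) or (a and c) or (b and c);
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- 4x4 unsigned Wallace tree multiplier
entity ppr1 is
  Port ( A    : in  STD_LOGIC_VECTOR (3 downto 0);
         B    : in  STD_LOGIC_VECTOR (3 downto 0);
         PROD : out STD_LOGIC_VECTOR (7 downto 0));
end ppr1;

architecture Behavioral of ppr1 is
  component full_adder is
    Port ( a     : in  STD_LOGIC;
           b     : in  STD_LOGIC;
           c     : in  STD_LOGIC;
           sum   : out STD_LOGIC;
           carry : out STD_LOGIC);
  end component;

  component half_adder is
    Port ( a     : in  STD_LOGIC;
           b     : in  STD_LOGIC;
           sum   : out STD_LOGIC;
           carry : out STD_LOGIC);
  end component;

  signal s11,s12,s13,s14,s15,s22,s23,s24,s25,s26,s32,s34,s35,s36,s37 : STD_LOGIC;
  signal c11,c12,c13,c14,c15,c22,c23,c24,c25,c26,c32,c34,c35,c36,c37 : STD_LOGIC;
  signal p0,p1,p2,p3 : STD_LOGIC_VECTOR(3 downto 0);
begin
  -- partial products generation stage: pj(i) = A(i) and B(j), weight 2**(i+j)
  process(A,B)
  begin
    for i in 0 to 3 loop
      p0(i) <= A(i) and B(0);
      p1(i) <= A(i) and B(1);
      p2(i) <= A(i) and B(2);
      p3(i) <= A(i) and B(3);
    end loop;
  end process;

  -- NOTE: the adder instantiations below are missing from the original listing.
  -- They are a reconstruction chosen to match the signal names, the PROD
  -- assignments and the worked example above; other wirings are possible.

  -- first partial products reduction stage
  ha11 : half_adder port map (p0(1), p1(0), s11, c11);
  fa12 : full_adder port map (p0(2), p1(1), p2(0), s12, c12);
  fa13 : full_adder port map (p0(3), p1(2), p2(1), s13, c13);
  fa14 : full_adder port map (p1(3), p2(2), p3(1), s14, c14);
  ha15 : half_adder port map (p2(3), p3(2), s15, c15);

  -- second partial products reduction stage
  ha22 : half_adder port map (s12, c11, s22, c22);
  fa23 : full_adder port map (s13, c12, p3(0), s23, c23);
  ha24 : half_adder port map (s14, c13, s24, c24);
  ha25 : half_adder port map (s15, c14, s25, c25);
  ha26 : half_adder port map (p3(3), c15, s26, c26);

  -- third stage: ripple-carry addition of the two remaining rows
  ha32 : half_adder port map (s23, c22, s32, c32);
  fa34 : full_adder port map (s24, c23, c32, s34, c34);
  fa35 : full_adder port map (s25, c24, c34, s35, c35);
  fa36 : full_adder port map (s26, c25, c35, s36, c36);
  ha37 : half_adder port map (c26, c36, s37, c37);  -- c37 is always '0'

  -- final output (8 bits)
  PROD(0) <= p0(0);
  PROD(1) <= s11;
  PROD(2) <= s22;
  PROD(3) <= s32;
  PROD(4) <= s34;
  PROD(5) <= s35;
  PROD(6) <= s36;
  PROD(7) <= s37;
end Behavioral;
```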