Arithmetic logic unit

From Wikipedia, the free encyclopedia

In digital electronics, an arithmetic logic unit (ALU) is a digital circuit that performs arithmetic and bitwise logical operations on integer binary numbers. It is a fundamental building block of the central processing unit (CPU) found in many computers. This is in contrast to a floating-point unit (FPU), which is a digital circuit that operates on floating-point numbers with the aid of one or more internal ALUs. Powerful and complex ALUs are often used in modern, high-performance CPUs, FPUs and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs.

The inputs to an ALU are the data to be operated on (called operands) and a code indicating the operation to be performed; the ALU's output is the result of the performed operation. In many designs, the ALU also exchanges additional information with a status register, which relates to the result of the current or previous operations.

Mathematician John von Neumann proposed the ALU concept in 1945 in a report on the foundations for a new computer called the EDVAC.[1]


A symbolic representation of an ALU and its input and output signals, indicated by arrows pointing into or out of the ALU, respectively. Each arrow represents one or more signals.

An ALU has a variety of input and output nets, which are the shared electrical connections used to convey digital signals between the ALU and external circuitry. When an ALU is operating, external circuits apply signals to the ALU inputs and, in response, the ALU produces and conveys signals to external circuitry via its outputs.

A basic ALU has three parallel data buses consisting of two input operands (A and B) and a result output (Y). Each data bus is a group of signals that conveys one binary integer number. Typically, the A, B and Y bus widths (the number of signals comprising each bus) are identical and match the native word size of the encapsulating CPU (or other processor).

The opcode input is a parallel bus that conveys to the ALU an operation selection code, which is an enumerated value that specifies the desired arithmetic or logic operation to be performed by the ALU. The opcode size (its bus width) is related to the number of different operations the ALU can perform; for example, a four-bit opcode can specify up to sixteen different ALU operations. Generally, an ALU opcode is not the same as a machine language opcode, though in some cases it may be directly encoded as a bit field within a machine language opcode.
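The following Python sketch models an ALU selected by a four-bit opcode. The opcode assignments, the 8-bit word size and the choice of operations are hypothetical and chosen only for illustration; real devices such as the 74181 use their own encodings. All three data buses (A, B and Y) share the same width, as described above.

```python
WIDTH = 8                     # assumed native word size; A, B and Y are all this wide
MASK = (1 << WIDTH) - 1

# Hypothetical 4-bit opcode table: 2**4 = 16 codes are possible, only 8 are assigned here.
OPS = {
    0b0000: lambda a, b: a + b,   # ADD
    0b0001: lambda a, b: a - b,   # SUB (wraps modulo 2**WIDTH once masked)
    0b0010: lambda a, b: a & b,   # AND
    0b0011: lambda a, b: a | b,   # OR
    0b0100: lambda a, b: a ^ b,   # XOR
    0b0101: lambda a, b: ~a,      # NOT A (B is ignored)
    0b0110: lambda a, b: a << 1,  # shift A left by one bit
    0b0111: lambda a, b: a >> 1,  # logical shift A right by one bit
}


def alu(opcode: int, a: int, b: int) -> int:
    """Return the Y output for a 4-bit opcode, truncated to the bus width."""
    if not 0 <= opcode < 16:
        raise ValueError("a 4-bit opcode must be in the range 0..15")
    if opcode not in OPS:
        raise ValueError(f"opcode {opcode:04b} is unassigned in this sketch")
    return OPS[opcode](a & MASK, b & MASK) & MASK
```

For example, `alu(0b0000, 200, 100)` returns 44, because the 9-bit sum 300 does not fit on the 8-bit Y bus; the lost bit is what a carry-out status signal would report.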

The status input allows additional information (for example, a carry-in bit from a previous ALU operation) to be made available to the ALU when performing an operation.
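One common use of a carry-in is multi-word arithmetic: the carry-out of one addition is fed back as the carry-in of the next, so a narrow ALU can add wider numbers. The sketch below, which assumes a hypothetical 8-bit ALU, adds two 16-bit numbers in two steps, low byte first.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def add_with_carry(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One 8-bit ALU addition; returns (sum, carry_out)."""
    total = (a & MASK) + (b & MASK) + (carry_in & 1)
    return total & MASK, total >> WIDTH


def add16(x: int, y: int) -> int:
    """Add two 16-bit numbers with two 8-bit additions chained through the carry."""
    lo, carry = add_with_carry(x & 0xFF, y & 0xFF, 0)  # low bytes, carry-in of 0
    hi, _ = add_with_carry(x >> 8, y >> 8, carry)      # previous carry-out becomes carry-in
    return (hi << 8) | lo
```

`add16(0x01FF, 0x0001)` gives `0x0200`: the low-byte addition overflows, and its carry-out is absorbed by the high-byte addition.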

Finally, one or more status outputs produce supplemental indicators about the result of an ALU operation (for example, carry-out, parity, overflow, zero or negative indicators) that could be useful in future ALU operations or for controlling conditional branching.
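As an illustration, the sketch below computes the status outputs named above for an 8-bit addition. The flag names and the even-parity convention (parity set when the result has an even number of 1 bits, as on x86) are assumptions; processors differ in which flags they provide and how they define them.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)  # most significant bit of the word


def add_flags(a: int, b: int, carry_in: int = 0) -> tuple[int, dict[str, bool]]:
    """Add two 8-bit operands and return (Y, status flags)."""
    a &= MASK
    b &= MASK
    total = a + b + (carry_in & 1)
    y = total & MASK
    flags = {
        "carry": total > MASK,          # unsigned result did not fit in WIDTH bits
        "zero": y == 0,
        "negative": bool(y & SIGN),     # sign bit of the result
        # Signed overflow: both operands have the same sign but the result's sign differs.
        "overflow": bool(~(a ^ b) & (a ^ y) & SIGN),
        "parity": bin(y).count("1") % 2 == 0,  # assumed even-parity convention
    }
    return y, flags
```

Adding 0x7F and 0x01 yields 0x80 with the negative and overflow flags set but no carry: as unsigned numbers the sum fits, while as signed numbers 127 + 1 has wrapped to −128.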

Circuit operation

The combinational logic circuitry of a simple four-bit ALU, the 74181 integrated circuit

An ALU is a combinational logic circuit, meaning that its outputs will change asynchronously in response to input changes. In normal operation, stable signals are applied to all of the ALU inputs and, when enough time (known as the "propagation delay") has passed for the signals to propagate through the ALU circuitry, the result of the ALU operation appears at the ALU outputs. The external circuitry connected to the ALU is responsible for ensuring the stability of ALU input signals throughout the operation, and for allowing sufficient time for the signals to propagate through the ALU before acquiring the ALU result.

In general, external circuitry controls an ALU by applying signals to its inputs. Typically, the external circuitry employs sequential logic to control the ALU operation, which is paced by a clock signal of a sufficiently low frequency to ensure enough time for the ALU outputs to settle under worst-case conditions.
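Why the worst case matters can be seen in one simple adder design, the ripple-carry adder, modelled below one full-adder stage at a time. This design is assumed here only for illustration; many ALUs use faster carry-lookahead logic. The function also reports the longest run of consecutive stages that produced a carry, a rough proxy for how long the outputs take to settle: adding 1 to an all-ones word makes the carry ripple through every stage.

```python
def ripple_carry_add(a: int, b: int, width: int = 8) -> tuple[int, int, int]:
    """Model a ripple-carry adder bit by bit.

    Returns (sum, carry_out, longest_chain), where longest_chain is the largest
    number of consecutive stages that produced a carry.
    """
    carry = 0
    total = 0
    chain = longest = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i                  # full-adder sum bit
        carry_out = (ai & bi) | (carry & (ai ^ bi))      # full-adder carry bit
        # A stage that produces a carry extends the chain if it also received one.
        chain = (chain + 1 if carry else 1) if carry_out else 0
        longest = max(longest, chain)
        carry = carry_out
    return total, carry, longest
```

`ripple_carry_add(0xFF, 0x01)` returns `(0, 1, 8)`, the worst case for eight bits, whereas `ripple_carry_add(3, 4)` returns `(7, 0, 0)` because no stage generates a carry. The clock period has to accommodate the former even though most additions settle sooner.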

For example, a CPU begins an ALU addition operation by routing operands from their sources (which are usually registers) to the ALU's operand inputs, while the control unit simultaneously applies a value to the ALU's opcode input, configuring it to perform addition. At the same time, the CPU also routes the ALU result output to a destination register that will receive the sum. The ALU's input signals, which are held stable until the next clock, are allowed to propagate through the ALU and to the destination register while the CPU waits for the next clock. When the next clock arrives, the destination register stores the ALU result and, since the ALU operation has completed, the ALU inputs may be set up for the next ALU operation.
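The sequence just described can be sketched as a toy datapath. The class, its register names and its two-operation ALU are hypothetical; the point is the split between setting up stable inputs during a cycle and latching the combinational result into the destination register at the clock edge.

```python
class ToyDatapath:
    """Hypothetical 8-bit datapath: registers feed an ALU whose result is latched on a clock edge."""

    def __init__(self) -> None:
        self.regs = {"R0": 0, "R1": 0, "R2": 0}
        self.alu_inputs = None  # (opcode, src_a, src_b, dest), held stable for the whole cycle

    def setup(self, opcode: str, src_a: str, src_b: str, dest: str) -> None:
        """Between clock edges: route operands, apply the opcode, select the destination."""
        self.alu_inputs = (opcode, src_a, src_b, dest)

    def alu_output(self) -> int:
        """Combinational ALU: the output depends only on the current inputs."""
        opcode, src_a, src_b, _ = self.alu_inputs
        a, b = self.regs[src_a], self.regs[src_b]
        if opcode == "ADD":
            return (a + b) & 0xFF
        if opcode == "SUB":
            return (a - b) & 0xFF
        raise ValueError(f"unknown opcode {opcode!r}")

    def clock_edge(self) -> None:
        """At the clock edge the destination register stores the settled ALU result."""
        if self.alu_inputs is None:
            raise RuntimeError("no operation was set up for this cycle")
        *_, dest = self.alu_inputs
        self.regs[dest] = self.alu_output()
        self.alu_inputs = None  # the inputs may now be set up for the next operation
```

Usage: after `dp = ToyDatapath()`, setting R0 to 20 and R1 to 22, then calling `dp.setup("ADD", "R0", "R1", "R2")` and `dp.clock_edge()`, leaves 42 in R2. Because the result is computed before it is stored, a register can be both a source and the destination in the same cycle.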

Numerical representations

Cascadable 8-bit ALU, Texas Instruments SN74AS888

An ALU must process numbers using the same formats as the rest of the digital circuit. For modern processors, the format is almost always the two's complement binary number representation. Early computers used a wide variety of number systems, including ones' complement, two's complement, sign-magnitude format, and even true decimal systems, with various[NB 2] representations of the digits.
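To make the three binary formats concrete, the sketch below encodes a signed integer in each of them at an assumed width of eight bits. The function names are hypothetical helpers, not part of any library.

```python
def sign_magnitude(n: int, width: int = 8) -> int:
    """Top bit is the sign; the remaining bits hold the magnitude |n|."""
    limit = 1 << (width - 1)
    if abs(n) >= limit:
        raise ValueError(f"{n} does not fit in {width}-bit sign-magnitude")
    return (limit | -n) if n < 0 else n


def ones_complement(n: int, width: int = 8) -> int:
    """A negative number is the bitwise inversion of its magnitude."""
    if abs(n) >= 1 << (width - 1):
        raise ValueError(f"{n} does not fit in {width}-bit ones' complement")
    return ~(-n) & ((1 << width) - 1) if n < 0 else n


def twos_complement(n: int, width: int = 8) -> int:
    """A negative number is the inversion of its magnitude plus one, i.e. n mod 2**width."""
    if not -(1 << (width - 1)) <= n < 1 << (width - 1):
        raise ValueError(f"{n} does not fit in {width}-bit two's complement")
    return n & ((1 << width) - 1)
```

For −5 these give the bit patterns 10000101, 11111010 and 11111011 respectively (print them with `f"{value:08b}"`), while +5 is 00000101 in all three formats.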

The ones' complement and two's complement number systems allow subtraction to be performed by adding the negative of a number, which removes the need for a specialized subtraction circuit. In ones' complement the negative is simply the bitwise inversion of the number; in two's complement, calculating the negative also requires adding one to the low-order bit and propagating the carry. An alternative way to perform the two's complement subtraction A−B is to present a one to the carry input of the adder and use ¬B rather than B as the second input. Arithmetic, logic and shift circuits can be combined into one ALU with a common operation-selection input.
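The carry-input technique can be sketched with a single 8-bit adder (the width is an assumption): the adder computes A + ¬B + 1, with the +1 supplied through the carry input rather than by a separate negation step. In this scheme a carry-out of 1 means A ≥ B when both are read as unsigned numbers; architectures differ in how they expose that bit as a flag.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Plain binary adder with a carry input; returns (sum, carry_out)."""
    total = (a & MASK) + (b & MASK) + (carry_in & 1)
    return total & MASK, total >> WIDTH


def subtract(a: int, b: int) -> tuple[int, int]:
    """Compute A - B as A + (NOT B) + 1 using the same adder."""
    not_b = ~b & MASK             # bitwise inversion of B
    return adder(a, not_b, 1)     # the "+1" enters through the carry input
```

`subtract(7, 5)` returns `(2, 1)`, and `subtract(5, 7)` returns `(254, 0)`, where 254 is the 8-bit two's complement pattern for −2.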

Complex operations

Engineers can design an arithmetic logic unit to calculate most operations. The more complex the operation, the more expensive the ALU is, the more space it uses in the processor, and the more power it dissipates. Therefore, engineers compromise. They make the ALU powerful enough to make the processor fast, yet not so complex as to become prohibitive. For example, computing the square root of a number might use:

  • Calculation in a single clock: an extraordinarily complex ALU that calculates the square root of any number in a single step.
  • Calculation pipeline: a very complex ALU that calculates the square root of any number in several steps. The intermediate results pass through a series of circuits arranged like a factory production line. The ALU can accept new numbers to calculate before it has finished the previous ones, so once the pipeline is full it produces results as fast as a single-clock ALU, although the results start to flow out of the ALU only after an initial delay.
  • Iterative calculation: a complex ALU that calculates the square root through several steps. This usually relies on control from a complex control unit with built-in microcode.
  • Co-processor: a simple ALU in the processor, plus a specialized and costly processor, sold separately, that the customer can install beside it and that implements one of the options above.
  • Software libraries: instead of relying on a co-processor or on emulation, the software itself calculates square roots using separate algorithms.
  • Software emulation: emulation of the presence of a co-processor. Whenever a program attempts to perform the square root calculation, the processor checks whether a co-processor is present and uses it if there is one; if there is not, the processor interrupts the program and invokes the operating system to perform the square root calculation through a software algorithm.

The options above are ordered from the fastest and most expensive to the slowest and least expensive. Therefore, while even the simplest computer can calculate the most complicated formula, it will usually take a long time to do so, because the calculation must be broken into many simpler steps.
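As an illustration of breaking a complex operation into simple steps (the iterative and software-library options), the sketch below computes an integer square root using only shifts, additions, subtractions and comparisons, all of which a simple ALU provides. The binary digit-by-digit method used here is one choice among several (Newton's method is another); the article does not name a specific algorithm.

```python
def isqrt_shift_subtract(n: int) -> int:
    """Integer square root (floor) by the binary digit-by-digit method."""
    if n < 0:
        raise ValueError("square root of a negative number")
    # Start from the highest power of four that is <= n (0 when n == 0).
    bit = 1 << ((n.bit_length() - 1) & ~1) if n > 0 else 0
    root = 0
    while bit != 0:
        if n >= root + bit:       # compare
            n -= root + bit       # subtract
            root = (root >> 1) + bit
        else:
            root >>= 1            # shift
        bit >>= 2
    return root
```

Each loop iteration settles one bit of the result, so a 32-bit argument needs at most 16 iterations; a single-clock ALU would instead perform all of this in one step, at far greater hardware cost.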

Powerful processors such as Intel Core and AMD64 processors implement option #1 for several simple operations, #2 for the most common complex operations, and #3 for extremely complex operations.

Notes


  1. IBM and UNIVAC used the term biquinary with different meanings.
  2. Including Binary-Coded Decimal (BCD) in 4 bits, 2-out-of-5 coding in 5 bits,[2] 5-bit biquinary[NB 1] encoding,[3] and 2-out-of-7 biquinary[NB 1] encoding in 7 bits.[4]

