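Fan-out of 4 (FO4) is a delay metric used in digital CMOS technologies. The absolute FO4 delay depends on the process, but delays expressed as multiples of FO4 can be compared across processes.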
Fan-out = Cload / Cin, where
- Cload = total MOS gate capacitance driven by the logic gate under consideration
- Cin = the input (MOS gate) capacitance of the logic gate under consideration
As a delay metric, one FO4 is the delay of an inverter that is driven by an inverter 4x smaller than itself and that drives an inverter 4x larger than itself. Both conditions are necessary because the delay depends on the input signal's rise/fall time as well as on the output loading.
FO4 is used as a delay metric because such a load is typical of tapered buffers driving large loads, and is approximately what any logic gate sees in a logic path sized for minimum delay. Also, for most technologies the optimum fan-out for such buffers lies between 2.7 and 5.3.
A fan-out of 4 is the answer to the following canonical problem: given a fixed-size inverter that is small in comparison to a fixed large load, minimize the delay in driving the load. It can be shown that the minimum delay is achieved when the load is driven by a chain of N inverters, each successive inverter about 4x larger than the previous one, with N ~ log4(Cload/Cin).
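The chain sizing described above can be sketched in a few lines of Python. This is only an illustration: the function name `size_inverter_chain`, its parameters, and the choice to round N to the nearest integer are assumptions made here, not part of any standard tool. The sketch also ignores signal polarity (an odd N inverts the signal).

```python
import math

def size_inverter_chain(c_in, c_load, target_fanout=4.0):
    """Size a tapered inverter chain (hypothetical helper).

    c_in   -- input capacitance of the first (fixed) inverter
    c_load -- capacitance of the large load to be driven
    Returns (n_stages, per_stage_fanout, input_caps), where input_caps[i]
    is the input capacitance of stage i.
    """
    ratio = c_load / c_in
    if ratio <= 1.0:
        # The load is no larger than the first inverter: a single stage suffices.
        return 1, ratio, [c_in]
    # N ~ log4(Cload/Cin), rounded to a whole number of stages (at least 1).
    n = max(1, round(math.log(ratio) / math.log(target_fanout)))
    # Spread the total capacitance ratio evenly so every stage sees the same fan-out.
    f = ratio ** (1.0 / n)
    caps = [c_in * f ** i for i in range(n)]
    return n, f, caps

# Example: a load 256 times larger than the first inverter.
n, f, caps = size_inverter_chain(1.0, 256.0)
print(n, f, caps)
```

With a capacitance ratio of 256 the sketch chooses four stages, each about 4x larger than the one before it, so every stage (including the last one, which drives the load) sees a fan-out of about 4.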
In the absence of parasitic capacitances (drain diffusion capacitance and wire capacitance), the result is "a fan-out of e", and N ~ ln(Cload/Cin).
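The effect of parasitics on the optimum can be illustrated numerically. The sketch below assumes a simple linear delay model that is not stated in this article: each stage has a delay proportional to f + p, where f is the per-stage fan-out and p is the ratio of the stage's parasitic output capacitance to its input capacitance. A chain with N = ln(Cload/Cin) / ln(f) stages then has a total delay proportional to (f + p) / ln(f), which is minimized where f (ln f - 1) = p. The name `optimal_fanout` is hypothetical.

```python
import math

def optimal_fanout(p, tol=1e-12):
    """Solve f * (ln f - 1) = p for f >= e by bisection.

    p is the assumed parasitic ratio (parasitic output capacitance divided
    by input capacitance); p = 0 models the parasitic-free case.
    """
    if p < 0:
        raise ValueError("parasitic ratio must be non-negative")

    def g(f):
        return f * (math.log(f) - 1.0) - p

    # g is increasing for f > 1 and g(e) = -p <= 0, so the root is at or above e.
    lo = hi = math.e
    while g(hi) < 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

print(optimal_fanout(0.0))  # about 2.718, i.e. e
print(optimal_fanout(1.0))  # about 3.59
```

Under this assumed model, the optimum moves from e toward 4 as the parasitic ratio approaches 1. The delay curve is also fairly flat near its minimum, so a fan-out of 4 costs little even when it is not exactly optimal.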
If the load itself is not large, then scaling successive logic stages by a fan-out of 4 offers no benefit. In these cases, minimum-sized transistors may be faster.
Because scaled technologies are inherently faster (in absolute terms), circuit performance can be more fairly compared using the fan-out of 4 as a metric. For example, given two 64-bit adders, one implemented in a 0.5 µm technology and the other in a 90 nm technology, it would be unfair to say the 90 nm adder is better from a circuits and architecture standpoint just because it has less latency. The 90 nm adder might be faster only because of its inherently faster devices. To compare the adder architecture and circuit design, it is fairer to normalize each adder's latency to the delay of one FO4 inverter in the same technology.
Some examples of high-frequency CPUs with long pipelines and low stage delay: the IBM Power6 is designed with a cycle delay of 13 FO4, and the clock period of Intel's Pentium 4 at 3.4 GHz is estimated at 16.3 FO4.
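The normalization can be sketched as follows. All numbers in this example are made up purely for illustration; they are not measurements of any real adder or process, and the names `latency_in_fo4`, `adder_a` and `adder_b` are hypothetical.

```python
def latency_in_fo4(latency_ps, fo4_delay_ps):
    """Express an absolute latency as a multiple of the process's FO4 delay."""
    return latency_ps / fo4_delay_ps

# Hypothetical values (NOT real data): absolute latency of each adder and the
# FO4 inverter delay of the technology it is built in, both in picoseconds.
adders = {
    "adder_a (older process)": {"latency_ps": 2400.0, "fo4_ps": 200.0},
    "adder_b (newer process)": {"latency_ps": 450.0, "fo4_ps": 30.0},
}

normalized = {
    name: latency_in_fo4(d["latency_ps"], d["fo4_ps"]) for name, d in adders.items()
}
for name, fo4_count in normalized.items():
    print(f"{name}: {fo4_count:.1f} FO4")
```

With these invented numbers, adder_b is faster in absolute terms (450 ps against 2400 ps), yet it takes 15 FO4 against 12 FO4 for adder_a, so adder_a would be the better design from a circuits and architecture standpoint.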
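As a quick arithmetic check, a clock frequency and a cycle delay in FO4 together imply an absolute FO4 delay. The sketch below applies this to the Pentium 4 figures quoted above; the function names are hypothetical, and the result is only what those two figures imply, not a measured value.

```python
def clock_period_ps(freq_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz

def implied_fo4_ps(freq_ghz, cycle_fo4):
    """FO4 delay (ps) implied by a clock frequency and a cycle delay in FO4."""
    return clock_period_ps(freq_ghz) / cycle_fo4

period = clock_period_ps(3.4)      # about 294.1 ps
fo4 = implied_fo4_ps(3.4, 16.3)    # about 18.0 ps
print(f"period = {period:.1f} ps, implied FO4 delay = {fo4:.1f} ps")
```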