Amdahl's law

In computer architecture, Amdahl's law (or Amdahl's argument^[1]) gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.

Amdahl's law can be formulated in the following way:

S_{\text{latency}}(s)={\frac {1}{(1-p)+{\frac {p}{s}}}}

where

S_latency is the theoretical speedup in latency of the execution of the whole task;
s is the speedup in latency of the execution of the part of the task that benefits from the improvement of the resources of the system;
p is the proportion of the execution time of the whole task, measured before the improvement, that is spent in the part that benefits from the improvement of the resources of the system (a fraction between 0 and 1).

Furthermore,

{\begin{cases}S_{\text{latency}}(s)\leq {\dfrac {1}{1-p}}\\[8pt]\lim \limits _{s\to \infty }S_{\text{latency}}(s)={\dfrac {1}{1-p}}.\end{cases}}

show that the theoretical speedup of the execution of the whole task increases with the improvement of the resources of the system and that, regardless of the magnitude of the improvement, the theoretical speedup is always limited by the part of the task that cannot benefit from the improvement.
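The formula and its bound can be checked numerically. The following is a minimal sketch; the function names `amdahl_speedup` and `speedup_limit` and the value p = 0.9 are illustrative assumptions, not part of the original formulation.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the original run time is sped up s times."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def speedup_limit(p: float) -> float:
    """Upper bound 1 / (1 - p), approached as s grows without bound."""
    if not 0.0 <= p < 1.0:
        raise ValueError("the bound is finite only for 0 <= p < 1")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    p = 0.9  # illustrative value, not taken from the text
    for s in (1, 2, 10, 100, 1_000_000):
        print(f"s = {s:>9}: speedup = {amdahl_speedup(p, s):.4f}")
    print(f"limit 1/(1-p) = {speedup_limit(p):.4f}")
```

As s grows, the printed speedup rises toward the limit but never reaches it.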

Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program that takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this reason, parallel computing with many processors is useful only for highly parallelizable programs.
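The 20-hour example can be tabulated with a short sketch. It assumes, as an idealization, that the 19 parallelizable hours divide perfectly among the processors (so s equals the processor count); the function name `execution_time_hours` is hypothetical.

```python
def execution_time_hours(serial_hours: float, parallel_hours: float, processors: int) -> float:
    """Run time when only the parallel part is divided evenly among the processors."""
    if processors < 1:
        raise ValueError("at least one processor is required")
    return serial_hours + parallel_hours / processors


if __name__ == "__main__":
    serial, parallel = 1.0, 19.0  # hours, from the example above
    baseline = execution_time_hours(serial, parallel, 1)
    for n in (1, 2, 8, 64, 1024, 65536):
        t = execution_time_hours(serial, parallel, n)
        print(f"{n:>6} processors: {t:7.3f} h, speedup {baseline / t:6.2f}")
```

However large the processor count, the run time stays above the one serial hour and the speedup stays below 20.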

Derivation

A task executed by a system whose resources are improved compared to an initial similar system can be split up into two parts:

a part that does not benefit from the improvement of the resources of the system;
a part that benefits from the improvement of the resources of the system.

Example. — A computer program that processes files from disk. A part of that program may scan the directory of the disk and create a list of files internally in memory. After that, another part of the program passes each file to a separate thread for processing. The part that scans the directory and creates the file list cannot be sped up on a parallel computer, but the part that processes the files can.

The execution time of the whole task before the improvement of the resources of the system is denoted T. It includes the execution time of the part that does not benefit from the improvement of the resources and the execution time of the part that benefits from it. The fraction of the execution time of the task that benefits from the improvement of the resources is denoted p. The fraction concerning the part that does not benefit from it is therefore 1 − p. Then

T=(1-p)T+pT.

Only the execution of the part that benefits from the improvement of the resources is sped up, by the factor s. Consequently, the execution time of the part that does not benefit remains (1 − p)T, while the execution time of the part that benefits becomes

{\frac {p}{s}}T.

The theoretical execution time T(s) of the whole task after the improvement of the resources is then

T(s)=(1-p)T+{\frac {p}{s}}T.

Amdahl's law gives the theoretical speedup in latency of the execution of the whole task at fixed workload W, which yields

S_{\text{latency}}(s)={\frac {TW}{T(s)W}}={\frac {T}{T(s)}}={\frac {1}{1-p+{\frac {p}{s}}}}.
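The derivation can be retraced numerically: compute the improved time from its two parts, divide the original time by it, and compare with the closed form. This is a sketch only; the function names and the sample values are assumptions chosen for illustration.

```python
def improved_time(T: float, p: float, s: float) -> float:
    """T(s): the unaffected part keeps its time, the affected part is divided by s."""
    unaffected = (1.0 - p) * T
    affected = p * T / s
    return unaffected + affected


def speedup_from_times(T: float, p: float, s: float) -> float:
    """Speedup computed directly as T / T(s)."""
    return T / improved_time(T, p, s)


def speedup_closed_form(p: float, s: float) -> float:
    """Closed form 1 / (1 - p + p / s), which no longer depends on T."""
    return 1.0 / (1.0 - p + p / s)


if __name__ == "__main__":
    p, s = 0.6, 4.0  # illustrative values, not from the text
    for T in (1.0, 20.0, 3600.0):
        print(T, speedup_from_times(T, p, s), speedup_closed_form(p, s))
```

The ratio is the same for every T, which is why T cancels out of the final expression.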

Examples

Example 1

If 30% of the execution time can be sped up, p will be 0.3; if the improvement makes the affected part twice as fast, s will be 2. Amdahl's law states that the overall speedup from applying the improvement will be

S_{\text{latency}}={\frac {1}{1-p+{\frac {p}{s}}}}={\frac {1}{1-0.3+{\frac {0.3}{2}}}}=1.18.

Example 2

We are given a serial task which is split into four consecutive parts, whose fractions of the execution time are p1 = 0.11, p2 = 0.18, p3 = 0.23, and p4 = 0.48 respectively. Then we are told that the 1st part is not sped up, so s1 = 1, while the 2nd part is sped up 5 times, so s2 = 5, the 3rd part is sped up 20 times, so s3 = 20, and the 4th part is sped up 1.6 times, so s4 = 1.6. By using Amdahl's law, the overall speedup is

S_{\text{latency}}={\frac {1}{{\frac {p1}{s1}}+{\frac {p2}{s2}}+{\frac {p3}{s3}}+{\frac {p4}{s4}}}}={\frac {1}{{\frac {0.11}{1}}+{\frac {0.18}{5}}+{\frac {0.23}{20}}+{\frac {0.48}{1.6}}}}=2.19.

Notice how the 5 times and 20 times speedups on the 2nd and 3rd parts respectively do not have much effect on the overall speedup when the 4th part (48% of the execution time) is sped up only 1.6 times.
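Both examples are instances of the same multi-part formula, which can be sketched as below. The function name `overall_speedup` is hypothetical, and the check that the fractions sum to 1 is an added safeguard rather than part of the law itself.

```python
from typing import Sequence


def overall_speedup(fractions: Sequence[float], speedups: Sequence[float]) -> float:
    """Overall speedup 1 / sum(p_i / s_i) for consecutive parts of a serial task."""
    if len(fractions) != len(speedups):
        raise ValueError("each part needs exactly one speedup factor")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("the fractions of execution time must sum to 1")
    return 1.0 / sum(p / s for p, s in zip(fractions, speedups))


if __name__ == "__main__":
    # Example 1: 70% of the time is untouched, 30% becomes twice as fast.
    print(round(overall_speedup([0.7, 0.3], [1, 2]), 2))  # 1.18
    # Example 2: four consecutive parts with individual speedups.
    print(round(overall_speedup([0.11, 0.18, 0.23, 0.48], [1, 5, 20, 1.6]), 2))  # 2.19
```

Changing only the last speedup factor in Example 2 moves the result far more than changing the second or third, because the 4th part dominates the sum in the denominator.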

Relation to law of diminishing returns

Amdahl's law is often conflated with the law of diminishing returns, whereas only a special case of applying Amdahl's law demonstrates the law of diminishing returns. If one picks optimally (in terms of the achieved speedup) what to improve, then one will see monotonically decreasing improvements as one improves. If, however, one picks non-optimally, then after improving a sub-optimal component and moving on to improve a more optimal component, one can see an increase in the return. Note that it is often rational to improve a system in an order that is "non-optimal" in this sense, given that some improvements are more difficult or require more development time than others.

Amdahl's law does represent the law of diminishing returns when one considers the return obtained by adding more processors to a machine that runs a fixed-size computation able to use all available processors to their capacity. Each new processor added to the system contributes less usable power than the previous one. Each time the number of processors is doubled, the relative gain in speedup shrinks, as the overall speedup heads toward the limit of 1/(1 − p).
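The shrinking return from each doubling can be made concrete with a small sketch. It assumes that the parallel part scales perfectly with the processor count n (so s = n) and uses p = 0.95 purely as an illustrative value; the helper names are hypothetical.

```python
def speedup(p: float, n: int) -> float:
    """Amdahl speedup with n processors, assuming the parallel part scales perfectly."""
    return 1.0 / ((1.0 - p) + p / n)


def doubling_gains(p: float, doublings: int) -> list[tuple[int, float, float]]:
    """For n = 2, 4, 8, ... return (n, speedup, speedup relative to n / 2)."""
    rows = []
    previous = speedup(p, 1)
    for k in range(1, doublings + 1):
        n = 2 ** k
        current = speedup(p, n)
        rows.append((n, current, current / previous))
        previous = current
    return rows


if __name__ == "__main__":
    for n, s, gain in doubling_gains(0.95, 12):
        print(f"n = {n:>5}: speedup = {s:6.2f}, gain from last doubling = {gain:.3f}x")
```

The gain column falls toward 1 with every doubling while the speedup column approaches 1/(1 − p) = 20.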

This analysis neglects other potential bottlenecks such as memory bandwidth and I/O bandwidth, if they do not scale with the number of processors; however, taking into account such bottlenecks would tend to further demonstrate the diminishing returns of only adding processors.

Speedup in a serial program

Assume that a task has two independent parts, A and B. Part B takes roughly 25% of the time of the whole computation. By working very hard, one may be able to make this part 5 times faster, but this reduces the time of the whole computation only slightly. In contrast, one may need to perform less work to make part A twice as fast. This makes the computation much faster than optimizing part B does, even though part B's speedup factor is greater (5 times versus 2 times).

For example, with a serial program in two parts A and B for which T_A = 3 s and T_B = 1 s,

if part B is made to run 5 times faster, that is s = 5 and p = T_B/(T_A + T_B) = 0.25, then

S_{\text{latency}}={\frac {1}{1-0.25+{\frac {0.25}{5}}}}=1.25;

if part A is made to run 2 times faster, that is s = 2 and p = T_A/(T_A + T_B) = 0.75, then

S_{\text{latency}}={\frac {1}{1-0.75+{\frac {0.75}{2}}}}=1.60.

Therefore, making part A run 2 times faster is better than making part B run 5 times faster. The percentage improvement, expressed as the reduction in execution time, can be calculated as

{\text{percentage improvement}}=100\left(1-{\frac {1}{S_{\text{latency}}}}\right).

Improving part A by a factor of 2 increases overall program speed by a factor of 1.60, which cuts the execution time by 37.5% relative to the original computation.
However, improving part B by a factor of 5, which presumably requires more effort, achieves an overall speedup factor of only 1.25, which cuts the execution time by 20%.
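The comparison between the two optimizations can be reproduced from the part times T_A = 3 s and T_B = 1 s. This is a minimal sketch; the function names are illustrative assumptions.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the run time is sped up s times."""
    return 1.0 / ((1.0 - p) + p / s)


def percentage_improvement(speedup: float) -> float:
    """Reduction in execution time, in percent: 100 * (1 - 1 / S)."""
    return 100.0 * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    t_a, t_b = 3.0, 1.0  # seconds, from the example above
    total = t_a + t_b

    speedup_b = amdahl_speedup(t_b / total, 5.0)  # part B made 5 times faster
    speedup_a = amdahl_speedup(t_a / total, 2.0)  # part A made 2 times faster

    print(f"optimize B: speedup {speedup_b:.2f}, time reduced by {percentage_improvement(speedup_b):.1f}%")
    print(f"optimize A: speedup {speedup_a:.2f}, time reduced by {percentage_improvement(speedup_a):.1f}%")
```

The smaller factor applied to the larger part wins because p weights how much of the total time each factor can affect.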

Limitations

Amdahl's law only applies to cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this case, Gustafson's law gives a less pessimistic and more realistic assessment of parallel performance.^[2]
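To illustrate the contrast, the sketch below compares the fixed-size speedup of Amdahl's law with the scaled speedup commonly attributed to Gustafson's law, S = (1 − p) + pN. That formula is not stated in this article and is included here as an assumption; note also that in Gustafson's formulation p is the fraction of time spent in the parallel part as measured on the N-processor system, so the two values of p are not the same quantity.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Fixed-size speedup: the workload stays the same as processors are added."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Scaled speedup (assumed form of Gustafson's law): the parallel work grows with n."""
    return (1.0 - p) + p * n


if __name__ == "__main__":
    p = 0.95  # illustrative value; it means something slightly different in each law
    for n in (1, 16, 256, 1024):
        print(f"n = {n:>4}: Amdahl {amdahl_speedup(p, n):7.2f}, Gustafson {gustafson_speedup(p, n):8.2f}")
```

Under the fixed-size assumption the speedup saturates, whereas under the scaled-workload assumption it keeps growing roughly linearly with the processor count.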

Notes

^ (Rodgers 1985, p. 226)
^ Michael McCool; James Reinders; Arch Robison (2013). Structured Parallel Programming: Patterns for Efficient Computation. Elsevier. p. 61.

References

Rodgers, David P. (June 1985). "Improvements in multiprocessor system design". ACM SIGARCH Computer Architecture News. 13 (3). New York, NY, USA: ACM: 225–231. doi:10.1145/327070.327215. ISBN 0-8186-0634-7. ISSN 0163-5964.

External links

Cases where Amdahl's law is inapplicable
Oral history interview with Gene M. Amdahl Charles Babbage Institute, University of Minnesota. Amdahl discusses his graduate work at the University of Wisconsin and his design of WISC. Discusses his role in the design of several computers for IBM including the STRETCH, IBM 701, and IBM 704. He discusses his work with Nathaniel Rochester and IBM's management of the design process. Mentions work with Ramo-Wooldridge, Aeronutronic, and Computer Sciences Corporation
A simple interactive Amdahl's Law calculator
"Amdahl's Law" by Joel F. Klein, Wolfram Demonstrations Project, 2007.
Amdahl's Law in the Multicore Era
Blog Post: "What the $#@! is Parallelism, Anyhow?"
Amdahl's Law applied to OS system calls on multicore CPU
Evaluation of the Intel Core i7 Turbo Boost feature, by James Charles, Preet Jassi, Ananth Narayan S, Abbas Sadat and Alexandra Fedorova
Calculation of the acceleration of parallel programs as a function of the number of threads, by George Popov, Valeri Mladenov and Nikos Mastorakis

