Talk:Gustafson's law


Strong vs. weak scaling[edit]

In the high-performance computing community, scaling according to Amdahl's Law, i.e. under the assumption of fixing the total problem size, is commonly referred to as "strong" scaling. Conversely, scaling under conditions where the problem size per computing unit is kept constant, is referred to as "weak" scaling. Depending on the application and use-case at hand, the developer/user might aim to increase strong or weak scalability of the code of interest. — Preceding unsigned comment added by 141.58.5.7 (talk) 07:00, 29 August 2012 (UTC)[reply]
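
A minimal Python sketch of the two regimes, under the usual textbook assumption that a fixed fraction alpha of the original work is serial (the formulas and variable names below are illustrative, not from the comment above):

  # Strong scaling (Amdahl): fix the total problem size and add processors.
  def strong_scaling_speedup(alpha, n):
      return 1.0 / (alpha + (1.0 - alpha) / n)

  # Weak scaling (Gustafson): fix the problem size *per processor*, so the
  # total problem grows with the number of processors n.
  def weak_scaling_speedup(alpha, n):
      return alpha + (1.0 - alpha) * n

  for n in (1, 8, 64, 512):
      print(n, strong_scaling_speedup(0.05, n), weak_scaling_speedup(0.05, n))

With alpha = 0.05, strong scaling saturates near 20x while weak scaling keeps growing with n, which is the distinction the comment above draws.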

Priority for Shi[edit]

Since Shi's manuscript is unpublished, Wikipedia cannot make statements about it being the first to point out something. AshtonBenson (talk) 18:32, 8 November 2009 (UTC)[reply]

Other[edit]

"The impact of the law was the shift in research" --which law? Amdahl's or Gustafson's? 24.124.29.17 (talk) 14:52, 7 April 2009 (UTC)[reply]


Maybe change the "miles" and "mph" to "km" and "km/h"?

Whatever; km/h is only vaguely better. The appropriate SI unit is, and always will be, m·s⁻¹. —Preceding unsigned comment added by 219.89.194.176 (talk) 01:15, 7 April 2009 (UTC)[reply]

Driving Metaphor[edit]

I'm not convinced by this driving metaphor; it seems tenuous at best. Driving at 150 mph for an extra hour would indeed give a higher average speed... but the other city is only 60 miles away, and you're already halfway there. Driving for another hour at 150 mph would see you overshoot your destination... —Preceding unsigned comment added by 82.69.41.89 (talk) 09:07, 11 June 2008 (UTC)[reply]

That's why it begins "Given enough time and distance to travel". MoraSique (talk) 17:12, 6 April 2009 (UTC)[reply]

This so-called metaphor breaks down if any of the following are also considered:

  1. maximum speed (speed limits or maximum practical 'processing' speed)
  2. proximity to the end of the journey, where the necessary speed increase approaches light speed, with its corresponding infinite mass/infinite energy requirement
  3. the 'given enough time and distance to travel' requirement, which restricts the validity of this so-called 'law'

Copied from an in-article comment by 86.162.58.177. MoraSique (talk) 17:16, 6 April 2009 (UTC)[reply]

Except that that's the whole point. Perhaps a better metaphor would be that Amdahl's law is about finding the quickest way to get a multitude of goods to a city in a day. Gustafson's law is about the realization that, if I can get the whole load delivered in a day, and then optimize the process so it only takes an hour, I'm going to deliver even more goods to the next city, because now I can.
In my opinion, Gustafson's law is a good answer to the question that people always ask about, "If computers are so much faster now than they were in 1995, why does it still take two minutes to load Windows?" The answer is simple: because people want Windows to take two minutes to load, and in that two minutes, they want to get as many features as possible. If Windows 7 were run on a machine designed for Windows 95 (even assuming no hardware compatibility issues), the startup time would be an hour or more -- completely unacceptable. And the features would be so maddeningly slow that the public would demand what it got in 1995: Windows 95. Jsharpminor (talk) 15:51, 11 June 2011 (UTC)[reply]

I think the point the metaphor is trying to make is that even though the current attempt failed (via Amdahl's law), if we speed up the rest of the current trip, the driver can attain the goal by averaging two consecutive trips (say a round trip, and a later trip to the same city). The problem, of course, is whether a second trip would be needed at some point or not... --DLWormwood (talk) 18:07, 6 April 2009 (UTC)[reply]

To summarize, then, Gustafson's law focuses not on solving the problem you *actually* have, but instead on solving a larger, possibly inappropriate problem which you *don't have*. Great --Jim —Preceding unsigned comment added by 219.89.194.176 (talk) 01:18, 7 April 2009 (UTC)[reply]

Well, if you want to put it like that, yes. Amdahl's law is "there is no free lunch"; Gustafson's is "if you're hungry enough, you can spend little per unit". Balabiot (talk) 13:29, 7 April 2009 (UTC)[reply]

Scaled speedup[edit]

I think using the term "speedup" here, without differentiating it from its use in Amdahl's law, is dangerously misleading.
Gustafson clearly shows in his paper that he does NOT use the same definition of speedup as Amdahl: he introduces the term "scaled speedup" to refer to his definition, the difference being -- as I understand it -- that:

  • "speedup" as used by Amdahl refers to the decrease in computation time for a parallelized application compared to a sequential one -- the reference is the sequential performance
  • "scaled speedup" as defined by Gustafson refers to the inverse of the increase in computation time for a sequential application compared to a parallel one -- the reference is the parallel performance

See fig. 2a & 2b in Gustafson's paper if you prefer to visualise things (I don't know if these figures can be used in the Wiki page?).
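
A small numeric sketch in Python of the two reference points described above (the values and variable names are made up for illustration):

  P = 8     # processors
  s = 0.2   # serial fraction measured on the *parallel* run (Gustafson)

  # Scaled speedup: normalize the parallel run to 1 time unit and ask how
  # long the same work would take on one processor -- the reference is the
  # parallel performance.
  scaled_speedup = (s + (1 - s) * P) / 1.0     # 6.6

  # Amdahl-style speedup: normalize the serial run of a *fixed* problem to 1
  # and ask how long it takes with the parallel part split P ways -- the
  # reference is the sequential performance.
  a = 0.2   # serial fraction measured on the sequential run (Amdahl)
  amdahl_speedup = 1.0 / (a + (1 - a) / P)     # 1/0.3 = 3.33...

  print(scaled_speedup, amdahl_speedup)

The same "20% serial" figure yields 6.6 under one definition and about 3.3 under the other, which is why conflating the two uses of "speedup" is misleading.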


Also, IMO Gustafson's law does not contradict Amdahl's law: it's merely a different approach to parallelism, and both are valid.
As explained in the last paragraph of Gustafson's paper, Gustafson's point has to be placed back in the context of a misuse of Amdahl's law that caused overrated skepticism about the practical value of massive parallelism (quoting Gustafson, a "mental block" in the research community).


I'm just pointing out things and not actually changing anything because I'm not an expert in the field, and above all I'm not an expert Wikipedian. However if someone more experienced deems what I said worthy of inclusion in the Wiki page, feel free to proceed -- or to confirm that I should proceed -- and update the page. — Preceding unsigned comment added by Laomai Weng (talkcontribs) 20:04, 15 April 2011 (UTC)[reply]

The "Gustafson's law" as presented in this article is wrong (actually, the presentation was misleading)[edit]

I can't believe that so many people can be so blind. Is

  S(P) = P − α(P − 1)

really John L. Gustafson and his colleague Edwin H. Barsis's law? Or is there a strange error by some Wikipedian?

Note: later I saw that it is interpreted differently from what would be a natural interpretation. Wlod (talk) 01:46, 10 February 2012 (UTC)[reply]

Let T be the sequential time. We may assume a unit of time such that T = 1. Let t be the time of the computation, when part of the calculation (a fraction 1 − α of it) was done P times faster (say by P processors which got the ideal speedup P). The rest was done sequentially, the old way. Thus

  t = α + (1 − α)/P

It follows that the speedup is equal to:

  S = T/t = 1/(α + (1 − α)/P)

For instance, for α = 1/2 and P = 2 the speedup according to the stated Gustafson's law would be 3/2, while in reality it's 4/3. Indeed, if the sequential algorithm took one unit of time, then the (partially) parallel one took 1/2 on the sequential part, and 1/4 on the parallelizable part, hence 3/4 of the time unit for the total algorithm. Thus the speedup is indeed 4/3. (If it would take P=6 to obtain the 3/2 speedup). Wlod (talk) —Preceding undated comment added 08:56, 31 January 2012 (UTC).[reply]
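
A quick numeric check of the example above (an illustrative Python sketch, not part of the original comment):

  alpha, P = 0.5, 2
  t = alpha + (1 - alpha) / P        # 0.5 + 0.25 = 0.75 time units
  print(1 / t)                       # 1.333... = 4/3, the "real" speedup
  print(P - alpha * (P - 1))         # 1.5 = 3/2, the stated law's value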

Still a bit simpler derivation[edit]

We may assume that the sequential time is T := P. Thus

  t = αP + (1 − α)

It follows that:

  T/t = P/(αP + 1 − α)

hence the speedup is:

  S = 1/(α + (1 − α)/P)

Wlod (talk) —Preceding undated comment added 09:13, 31 January 2012 (UTC).[reply]

Two interpretations (the cause behind the confusion)[edit]

In the article α stands for the proportion of sequential time during the process in which the parallel portion is performed by P processors. Thus actually it's not just a constant but a function α(P). However, it was natural for me (perhaps for many readers who care at all?) that α was the fraction of the time spent on the sequential part when the whole computation is done sequentially -- then indeed α is a constant (at least with respect to P). Wlod (talk) 02:03, 10 February 2012 (UTC)[reply]
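
The two readings give different numbers; an illustrative Python sketch, with assumed values:

  P = 4

  # Article's reading: alpha is the serial fraction of the run on P
  # processors, so it implicitly depends on P; the stated law gives
  alpha_par = 0.5
  print(alpha_par + (1 - alpha_par) * P)          # 2.5

  # "Natural" reading: alpha is the serial fraction of the all-sequential
  # run, constant in P; the speedup is then Amdahl's
  alpha_seq = 0.5
  print(1 / (alpha_seq + (1 - alpha_seq) / P))    # 1.6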

Pitfalls of "scaling up the problem"[edit]

Among researchers in parallel computing, there seems to be a lack of understanding of computational complexity. Over and over again, we see people ditching a serial O(n log n) algorithm for a parallelizable O(n²) algorithm because they can show greater speedup relative to the WRONG BASELINE. For instance, Bellman-Ford is preferable to Dijkstra's ONLY if you have negative weights in the graph. Otherwise, for even moderate problem sizes, Dijkstra's will always be faster on a single processor than Bellman-Ford on a supercomputer. See "Dispelling the Myths of Parallel Computing" and also "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers". Sure, there are lots of embarrassingly or sufficiently parallel problems, but researchers are fixated on toy problems that are not parallelizable in any way that has intellectual merit. — Preceding unsigned comment added by 69.204.203.196 (talk) 12:48, 20 September 2014 (UTC)[reply]
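
A toy Python cost model of the wrong-baseline pitfall described above (the constants are invented for illustration):

  import math

  n = 1_000_000   # problem size
  p = 1_000       # processors

  serial_nlogn = n * math.log2(n)   # O(n log n) algorithm, one processor
  parallel_n2 = n * n / p           # O(n^2) algorithm, perfectly parallel

  # Speedup relative to the WRONG baseline (the O(n^2) algorithm itself):
  print((n * n) / parallel_n2)      # 1000x -- looks impressive
  # Against the right baseline the "parallel win" evaporates:
  print(serial_nlogn, parallel_n2)  # ~2.0e7 vs 1.0e9 abstract steps

Even with a thousand processors, the O(n²) algorithm here is roughly fifty times slower than the serial O(n log n) one.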

related to Lump of Labor Fallacy?[edit]

"Lump of labor fallacy is the contention that the amount of work available to labourers is fixed." — Preceding unsigned comment added by Intellec7 (talkcontribs) 17:47, 20 December 2015 (UTC)[reply]

Somewhat. The basis of Gustafson's theory (calling anything this obvious a "law" is ridiculous) is the fact that with more workers you can do more refined jobs, which would indeed fit our actual civilization, made of so many constructions purposed to underpin previous ones, allowing for a nation-wide Ponzi system (potentially) able to fake an actual government by basing it only on the I/O's of (relatively) small-scale companies, whether they have real purposes or not (government-directed management data encoded in operation results, employee numbers, wages, sales, capital itself...). Big data has similar implications for profiling a population of individuals, by the way. Realistically, the example given is one of the worst ideas anyone could ever have, for the simplest of all reasons: it requires everyone to move from home to a distant working place, which at our population level, and because of how politics have been done in the recent past, means excessive energy use in transport leading to pollution (not only gaseous; asphalt and over-concentrated cities are pollution too: try to dispatch your nation's population across the whole territory -- it's simply disgusting here, and actually this doesn't give us any benefit) and other non-obvious but evident consequences. Note that this kind of computer-driveable government could be considered democratic, as everyone would contribute to diverse cross-regulations (truly dependent people aside), which by itself is such a joke that I can't stop contemplating the (totally stupid) idea. --2A01:CB11:13:D700:D186:445D:41E4:DA1D (talk) 13:26, 24 October 2018 (UTC)[reply]

Confusing[edit]

The article is terribly confusing: "speedup in latency of the execution of a task at fixed execution time". What is a "speedup in latency" if we are assuming the total execution time is fixed? Jess_Riedel (talk) 17:07, 1 November 2020 (UTC)[reply]

Serial runtime is wrong[edit]

The article currently reads:

> The execution time on the serial system is:

which does not make any sense, because for a serial system N = 1. Also, typically N designates the problem size, not the number of processors. Kosarev7 (talk) 21:21, 31 October 2023 (UTC)[reply]