History of supercomputing
Seymour Cray, the designer of the CDC 6600, is generally given the main credit for the supercomputer. The history of supercomputing goes back to the early 1920s in the United States, with the IBM tabulators at Columbia University, and continues with a series of computers at Control Data Corporation (CDC) that Seymour Cray designed to use innovative designs and parallelism to achieve superior peak computational performance. The CDC 6600, released in 1964, is generally considered the first supercomputer. However, some earlier computers were considered supercomputers for their day, such as the 1954 IBM NORC, the 1960 UNIVAC LARC, and the IBM 7030 Stretch and the Atlas, both in 1962.
While the supercomputers of the 1980s used only a few processors, in the 1990s, machines with thousands of processors began to appear both in the United States and in Japan, setting new computational performance records.
By the end of the 20th century, massively parallel supercomputers with thousands of "off-the-shelf" processors similar to those found in personal computers were constructed and broke through the teraflop computational barrier.
Progress in the first decade of the 21st century was dramatic, and supercomputers with over 60,000 processors appeared, reaching petaflop performance levels.
Beginnings: 1950s and 1960s
In 1957, a group of engineers left Sperry Corporation to form Control Data Corporation (CDC) in Minneapolis, Minnesota. Seymour Cray left Sperry a year later to join his colleagues at CDC. In 1960, Cray completed the CDC 1604, one of the first solid-state computers, and the fastest computer in the world at a time when vacuum tubes were found in most large computers.
Around 1960, Cray decided to design a computer that would be the fastest in the world by a large margin. After four years of experimentation with Jim Thornton, Dean Roush, and about 30 other engineers, Cray completed the CDC 6600 in 1964. Cray switched from germanium to silicon transistors built by Fairchild Semiconductor using the planar process, which did not have the drawbacks of mesa silicon transistors. He ran them very fast, and the speed-of-light restriction forced a very compact design with severe overheating problems, which were solved by introducing refrigeration designed by Dean Roush. Because the 6600 outran all other computers of the time by about a factor of 10, it was dubbed a supercomputer, and it defined the supercomputing market when two hundred computers were sold at $9 million each.
The 6600 gained speed by "farming out" work to peripheral computing elements, freeing the CPU (central processing unit) to process actual data. The Minnesota FORTRAN compiler for the machine was developed by Liddiard and Mundstock at the University of Minnesota, and with it the 6600 could sustain 500 kiloflops on standard mathematical operations. In 1968, Cray completed the CDC 7600, again the fastest computer in the world. At 36 MHz, the 7600 had 3.6 times the clock speed of the 6600, and it ran significantly faster still because of other technical innovations. CDC sold only about 50 of the 7600s, not quite a failure. Cray left CDC in 1972 to form his own company. Two years after his departure, CDC delivered the STAR-100, which at 100 megaflops was three times the speed of the 7600. Along with the Texas Instruments ASC, the STAR-100 was one of the first machines to use vector processing, an idea inspired around 1964 by the APL programming language.
In 1956, a team at Manchester University in the United Kingdom began development of MUSE (a name derived from "microsecond engine") with the aim of eventually building a computer that could operate at processing speeds approaching one microsecond per instruction, about one million instructions per second. Mu (or µ) is a prefix in the SI and other systems of units denoting a factor of 10⁻⁶ (one millionth).
At the end of 1958, Ferranti agreed to collaborate with Manchester University on the project, and the computer was shortly afterwards renamed Atlas, with the joint venture under the control of Tom Kilburn. The first Atlas was officially commissioned on 7 December 1962, nearly three years before Cray's CDC 6600 supercomputer was introduced, as one of the world's first supercomputers. It was considered the most powerful computer in England and, for a very short time, one of the most powerful computers in the world, equivalent to four IBM 7094s. It was said that whenever Atlas went offline, half of the United Kingdom's computer capacity was lost. The Atlas pioneered the use of virtual memory and paging as a way to extend its working memory by combining its 16,384 words of primary core memory with an additional 96K words of secondary drum memory. Atlas also pioneered the Atlas Supervisor, "considered by many to be the first recognizable modern operating system".
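The idea of combining a small core store with a larger drum through paging can be illustrated with a minimal sketch. It is not a reconstruction of the Atlas hardware or of its Supervisor. The 512-word page size, the reading of "96K" as 96 × 1024 words, the first-in-first-out replacement policy and all names (`OneLevelStore`, `PAGE_WORDS`, and so on) are assumptions made for illustration only.

```python
from collections import OrderedDict

PAGE_WORDS = 512                          # assumed page size (illustrative)
CORE_FRAMES = 16_384 // PAGE_WORDS        # 32 page frames fit in core
DRUM_PAGES = (96 * 1024) // PAGE_WORDS    # 192 pages fit on the drum


class OneLevelStore:
    """Toy demand-paged store: programs see one flat address space."""

    def __init__(self, core_frames=CORE_FRAMES):
        self.core_frames = core_frames
        self.resident = OrderedDict()     # page number -> words held in core
        self.drum = {}                    # page number -> words held on drum
        self.page_faults = 0

    def _load(self, page):
        """Return the words of `page`, bringing it into core if needed."""
        if page in self.resident:
            return self.resident[page]
        self.page_faults += 1
        if len(self.resident) >= self.core_frames:
            # Core is full: evict the oldest resident page to the drum.
            # FIFO is a stand-in policy, not the one Atlas used.
            victim, words = self.resident.popitem(last=False)
            self.drum[victim] = words
        # Fetch the page from the drum, or create a zeroed page on first use.
        words = self.drum.pop(page, [0] * PAGE_WORDS)
        self.resident[page] = words
        return words

    def read(self, address):
        page, offset = divmod(address, PAGE_WORDS)
        return self._load(page)[offset]

    def write(self, address, value):
        page, offset = divmod(address, PAGE_WORDS)
        self._load(page)[offset] = value


store = OneLevelStore()
last_address = (CORE_FRAMES + DRUM_PAGES) * PAGE_WORDS - 1
store.write(0, 7)
store.write(last_address, 9)              # far beyond the size of core alone
print(store.read(0), store.read(last_address), store.page_faults)
```

The program addresses the whole combined store by word number and never names the drum directly. The store decides which pages are resident, which is the essence of the approach described above.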
The Cray era: mid-1970s and 1980s
Four years after leaving CDC, Cray delivered the 80 MHz Cray-1 in 1976, and it became one of the most successful supercomputers in history. The Cray-1 used integrated circuits with two gates per chip and was a vector processor that introduced a number of innovations, such as chaining, in which scalar and vector registers generate interim results that can be used immediately, without the additional memory references that reduce computational speed. The Cray X-MP (designed by Steve Chen) was released in 1982 as a 105 MHz shared-memory parallel vector processor with better chaining support and multiple memory pipelines. All three floating-point pipelines on the X-MP could operate simultaneously. By 1983, Cray and Control Data were the supercomputer leaders; despite its lead in the overall computer market, IBM was unable to produce a profitable competitor.
The Cray-2, released in 1985, was a four-processor liquid-cooled computer totally immersed in a tank of Fluorinert, which bubbled as it operated. It could perform at up to 1.9 gigaflops and was the world's second-fastest supercomputer after the M-13 (2.4 gigaflops) until 1990, when the ETA-10G from CDC overtook both. The Cray-2 was a totally new design: it did not use chaining and had high memory latency, but it used extensive pipelining and was ideal for problems that required large amounts of memory. The software costs of developing a supercomputer should not be underestimated: in the 1980s, the cost of software development at Cray came to equal what was spent on hardware. That trend was partly responsible for a move away from the in-house Cray Operating System to UNICOS, which is based on Unix.
The Cray Y-MP, also designed by Steve Chen, was released in 1988 as an improvement on the X-MP and could have eight vector processors at 167 MHz, with a peak performance of 333 megaflops per processor. In the late 1980s, Cray's experiment with gallium arsenide semiconductors in the Cray-3 did not succeed. Seymour Cray began to work on a massively parallel computer in the early 1990s, but died in a car accident in 1996 before it could be completed. Cray Research did, however, produce such computers.
Massive processing: the 1990s
The Cray-2, which set the frontiers of supercomputing in the mid to late 1980s, had only 8 processors. In the 1990s, supercomputers with thousands of processors began to appear. Another development at the end of the 1980s was the arrival of Japanese supercomputers, some of which were modeled after the Cray-1.
The SX-3/44R was announced by NEC Corporation in 1989 and a year later earned the title of fastest in the world with a four-processor model. However, Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994. It had a peak speed of 1.7 gigaflops per processor. The Hitachi SR2201, on the other hand, obtained a peak performance of 600 gigaflops in 1996 by using 2048 processors connected via a fast three-dimensional crossbar network.
In the same timeframe, the Intel Paragon could have 1000 to 4000 Intel i860 processors in various configurations, and it was ranked the fastest in the world in 1993. The Paragon was a MIMD machine that connected processors via a high-speed two-dimensional mesh, allowing processes to execute on separate nodes and communicate via the Message Passing Interface. By 1995, Cray was also shipping massively parallel systems, e.g. the Cray T3E with over 2,000 processors, using a three-dimensional torus interconnect.
The Paragon architecture soon led to the Intel ASCI Red supercomputer in the United States, which held the top supercomputing spot to the end of the 20th century as part of the Accelerated Strategic Computing Initiative (ASCI). This was also a mesh-based MIMD massively parallel system, with over 9,000 compute nodes and well over 12 terabytes of disk storage, but it used off-the-shelf Pentium Pro processors that could be found in everyday personal computers. In 1996, ASCI Red became the first system to break through the 1 teraflop barrier on the MP-Linpack benchmark, and it eventually reached 2 teraflops.
Petascale computing in the 21st century
Significant progress was made in the first decade of the 21st century. The efficiency of supercomputers continued to increase, but not dramatically so. The Cray C90 used 500 kilowatts of power in 1991, while by 2003 the ASCI Q used 3,000 kW while being 2,000 times faster, increasing the performance per watt roughly 300-fold.
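The performance-per-watt figure follows from dividing the speedup by the growth in power consumption. A short sketch of that arithmetic, using only the numbers quoted above (the function name is illustrative):

```python
def perf_per_watt_gain(speedup, old_power_kw, new_power_kw):
    """Factor by which performance per watt improved between two machines."""
    power_ratio = new_power_kw / old_power_kw   # how much more power is drawn
    return speedup / power_ratio


# Cray C90 (500 kW, 1991) versus ASCI Q (3,000 kW, 2,000 times faster, 2003)
gain = perf_per_watt_gain(speedup=2000, old_power_kw=500, new_power_kw=3000)
print(round(gain))  # 333
```

A sixfold rise in power against a 2,000-fold rise in speed gives about 333, consistent with the rounded figure of roughly 300.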
In 2004, the Earth Simulator supercomputer built by NEC at the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) reached 35.9 teraflops, using 640 nodes, each with eight proprietary vector processors. By comparison, as of 2020, three Nvidia RTX 2080 Ti graphics cards could deliver comparable performance, at 14 teraflops per card.
The IBM Blue Gene supercomputer architecture found widespread use in the early part of the 21st century, and 27 of the computers on the TOP500 list used that architecture. The Blue Gene approach is somewhat different in that it trades processor speed for low power consumption, so that a larger number of processors can be used at air-cooled temperatures. It can use over 60,000 processors, with 2048 processors "per rack", and connects them via a three-dimensional torus interconnect.
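In a three-dimensional torus, each node is linked to its nearest neighbours along three axes, and the links wrap around at the edges so that no node sits on a boundary. The sketch below illustrates that addressing scheme only. The function name and the 8 × 8 × 8 grid are hypothetical and do not describe the dimensions or routing of any actual Blue Gene or Cray system.

```python
def torus_neighbors(node, dims):
    """Return the six nearest neighbours of `node` in a 3D torus of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (1, -1):
            coords = list(node)
            # Modulo arithmetic makes the far edge wrap around to the near edge.
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbors.append(tuple(coords))
    return neighbors


# A corner node of a hypothetical 8 x 8 x 8 torus still has six neighbours.
print(torus_neighbors((0, 0, 0), (8, 8, 8)))
```

Because of the wrap-around links, a node at a corner of the grid is connected in the same way as a node in the middle.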
Progress in China has been rapid: China placed 51st on the TOP500 list in June 2003, 14th in November 2003, 10th in June 2004, and 5th during 2005, before gaining the top spot in 2010 with the 2.5 petaflop Tianhe-I supercomputer.
In July 2011, the 8.1 petaflop Japanese K computer became the fastest in the world, using over 60,000 SPARC64 VIIIfx processors housed in over 600 cabinets. The fact that the K computer was more than 200 times faster than the Earth Simulator, and that the Earth Simulator ranked as the 68th system in the world seven years after holding the top spot, demonstrates both the rapid increase in top performance and the widespread growth of supercomputing technology worldwide. By 2014, the Earth Simulator had dropped off the list, and by 2018 the K computer had dropped out of the top 10. By 2018, Summit had become the world's most powerful supercomputer, at 200 petaflops.
Export controls
The CoCom and its later replacement, the Wassenaar Arrangement, legally regulated the export of high-performance computers (HPCs) to certain countries, either by requiring licensing, approval, and record-keeping or by banning exports entirely. Such controls have become harder to justify, leading to a loosening of these regulations. Some have argued that these regulations were never justified.
See also
- Green 500
- Instructions per second
- Quasi-opportunistic supercomputing
- Supercomputer architecture
- Supercomputing in China
- Supercomputing in Europe
- Supercomputing in India
- Supercomputing in Japan
- Supercomputing in Pakistan