Floating point operations per second

Computer Performance
Name	FLOPS
megaflop	10⁶
gigaflop	10⁹
teraflop	10¹²
petaflop	10¹⁵
exaflop	10¹⁸
zettaflop	10²¹
yottaflop	10²⁴

In computing, FLOPS (or flops or flop/s) is an acronym for FLoating-point Operations Per Second. Like instructions per second, FLOPS is a measure of a computer's performance. It is especially relevant in scientific computing, which makes heavy use of floating-point calculations. Because the final S stands for "second", conservative speakers treat "FLOPS" as both the singular and the plural of the term, although the singular "FLOP" is frequently encountered. Alternatively, the singular FLOP (or flop) is used as an abbreviation for "FLoating-point OPeration", and a flop count is a count of these operations (e.g., the number required by a given algorithm or computer program). In this context, "flops" is simply the plural rather than a rate.
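To make the distinction between a flop count and a rate concrete, the Python sketch below counts the floating-point operations in a dot product of two length-n vectors. It assumes the common convention that each addition and each multiplication counts as one flop (conventions vary, for example for fused multiply-add), and the function name `dot_with_flop_count` is illustrative only. A length-n dot product then costs n multiplications and n − 1 additions, or 2n − 1 flops.

```python
def dot_with_flop_count(x, y):
    """Return (dot product, floating-point operations performed).

    Convention assumed here: every add and every multiply is one flop.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not x:
        return 0.0, 0
    total = x[0] * y[0]  # first product: 1 multiply
    flops = 1
    for a, b in zip(x[1:], y[1:]):
        total += a * b  # 1 multiply + 1 add
        flops += 2
    return total, flops


value, count = dot_with_flop_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(value, count)  # 32.0 5
```

Dividing a count like this by the elapsed time gives a rate in FLOPS.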

NEC's SX-9 supercomputer has the world's first vector processor to exceed 100 gigaFLOPS per core. IBM's Blue Gene/P supercomputer is designed eventually to operate at three petaFLOPS.^[1] However, IBM's Roadrunner is the first supercomputer to sustain one petaFLOPS.^[2]

A basic calculator performs relatively few FLOPS. Each calculation request to a typical calculator requires only a single operation, so there is rarely any need for it to respond faster than its operator can perceive. Any response time below 0.1 second is perceived as instantaneous by a human operator,^[3] so a simple calculator needs only about 10 FLOPS (one operation per 0.1 second).
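The calculator estimate is simply the number of operations per request divided by the acceptable response time. A minimal sketch, using a hypothetical helper `required_flops`:

```python
def required_flops(operations_per_request, response_time_s):
    """Minimum rate (FLOPS) that completes a request within response_time_s seconds."""
    return operations_per_request / response_time_s


# One operation per request, finished within the ~0.1 s "instantaneous" threshold
print(required_flops(1, 0.1))  # 10.0
```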

Measuring performance

In order for FLOPS to be useful as a measure of floating-point performance, a standard benchmark must be available on all computers of interest. One example is the LINPACK benchmark.
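As a rough illustration of how any FLOPS figure is obtained (count the operations a workload performs, then divide by the time it takes), the sketch below times a naive dense matrix multiply written in pure Python. It is not LINPACK. The helpers `matmul` and `measure_flops` are hypothetical, and the count assumes one multiply plus one add per inner-loop step, i.e. 2n³ flops for an n × n product.

```python
import random
import time


def matmul(a, b):
    """Naive n x n matrix product of two lists of lists (i-k-j loop order)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ci = c[i]
        for k in range(n):
            aik = a[i][k]
            bk = b[k]
            for j in range(n):
                ci[j] += aik * bk[j]  # one multiply + one add = 2 flops
    return c


def measure_flops(n=100, repeats=3):
    """Estimate achieved FLOPS from the best of `repeats` timed multiplies."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    flop_count = 2 * n ** 3  # n**3 inner steps, 2 flops each
    return flop_count / max(best, 1e-9)  # guard against a zero timer reading


print(f"{measure_flops() / 1e6:.1f} MFLOPS (pure Python, naive algorithm)")
```

Because interpreter overhead dominates, the figure this reports is far below the hardware's peak, and it would change again with a different algorithm or language. That sensitivity is why ratings are only comparable when every machine runs the same standard benchmark.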

There are many factors in computer performance other than raw floating-point computation speed, such as I/O performance, interprocessor communication, cache coherence, and the memory hierarchy. This means that supercomputers are in general only capable of a small fraction of their "theoretical peak" FLOPS throughput (obtained by adding together the theoretical peak FLOPS performance of every element of the system). Even when operating on large, highly parallel problems, their performance will be bursty, mostly due to the residual effects of Amdahl's law. Real benchmarks therefore measure both the actual peak and the sustained FLOPS performance.
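The gap between theoretical peak and delivered performance can be sketched with two small formulas: the peak is a plain sum over elements, while Amdahl's law, S = 1 / ((1 − p) + p/N), bounds the speedup on N processors when only a fraction p of the work can run in parallel. The machine below (1,000 elements of 10 GFLOPS each, with 99% of the work parallelizable) is hypothetical, and the "Amdahl bound" optimistically assumes every element runs at its own peak.

```python
def theoretical_peak(element_peaks):
    """'Theoretical peak' of a system: the sum of every element's peak FLOPS."""
    return sum(element_peaks)


def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: S = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical machine: 1,000 elements rated at 10 GFLOPS each
elements = 1000
peak = theoretical_peak([10e9] * elements)  # 1e13 FLOPS = 10 TFLOPS
speedup = amdahl_speedup(0.99, elements)    # about 91x, not 1000x
bounded = peak / elements * speedup         # optimistic Amdahl bound
print(f"peak {peak:.2e} FLOPS; Amdahl bound {bounded:.2e} FLOPS "
      f"({bounded / peak:.1%} of peak)")
```

Even with only 1% serial work, this idealized machine cannot exceed roughly 9% of its summed peak.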

For ordinary (non-scientific) applications, integer operations (measured in MIPS) are far more common. A floating-point rating therefore does not accurately predict how a processor will perform on an arbitrary workload. For many scientific jobs, such as data analysis, however, a FLOPS rating is an effective measure.

Historically, the earliest reliably documented serious use of the floating-point operation as a metric appears to be the AEC's justification to Congress for purchasing a Control Data CDC 6600 in the mid-1960s.

The terminology remains confusing: until April 24, 2006, U.S. export controls were based on a measurement of "Composite Theoretical Performance" (CTP) in millions of "Theoretical Operations Per Second" (MTOPS). On that date, however, the U.S. Department of Commerce's Bureau of Industry and Security amended the Export Administration Regulations to base controls on Adjusted Peak Performance (APP) in Weighted TeraFLOPS (WT).

Records

On August 12, 2008, AMD released the ATI Radeon HD 4870 X2 graphics card, whose two Radeon RV770 GPUs total 2.4 teraFLOPS.

In June 2008, AMD released the ATI Radeon HD 4800 series, reported to be the first GPUs to reach the one-teraFLOPS scale.

On May 25, 2008, an American military supercomputer built by IBM reached the computing milestone of one petaFLOPS by processing more than 1.026 quadrillion calculations per second.^[4] The computer is named Roadrunner, after the state bird of New Mexico.^[5]

On February 4, 2008, the NSF and the University of Texas began full-scale research runs on Ranger, a Sun supercomputer built with AMD processors and the most powerful supercomputing system in the world for open science research, with a peak speed of about half a petaFLOPS.

On October 25, 2007, NEC Corporation of Japan issued a press release^[6] announcing the SX-9 model of its SX series, claiming it to be the world's fastest vector supercomputer with a peak processing performance of 839 teraFLOPS. The SX-9 features the first CPU capable of a peak vector performance of 102.4 gigaFLOPS per core.

On June 26, 2007, IBM announced the second generation of its top supercomputer, dubbed Blue Gene/P and designed to continuously operate at speeds exceeding one petaFLOPS. When configured to do so, it can reach speeds in excess of three petaFLOPS.

In June 2007, Top500.org reported the fastest computer in the world to be the IBM Blue Gene/L supercomputer, measuring a peak of 596 TFLOPS^[7]. The Cray XT4 hit second place with 101.7 TFLOPS.

In June 2006, the Japanese research institute RIKEN announced a new computer, the MDGRAPE-3. Its performance tops out at one petaFLOPS, almost twice that of Blue Gene/L. However, MDGRAPE-3 is not a general-purpose computer, which is why it does not appear in the Top500.org list; it has special-purpose pipelines for simulating molecular dynamics.

Distributed computing uses the Internet to link personal computers to achieve a similar effect:

As of July 2008, Folding@Home is sustaining over 3.17 PFLOPS,^[8] the first computing project of any kind to cross the three-petaFLOPS milestone. This level of performance is primarily enabled by the cumulative effort of a vast array of PlayStation 3 consoles and powerful GPUs.
BOINC as a whole averages over 1.1 PFLOPS as of August 4, 2008.^[9]
SETI@Home averages more than 528 TFLOPS.^[10]
Einstein@Home is crunching more than 150 TFLOPS.^[11]
As of August 2008, GIMPS is sustaining 27 TFLOPS.^[12]

In 2007, Intel Corporation unveiled the experimental 80-core Polaris chip, which achieves 1 TFLOPS at 3.2 GHz. The chip can increase this to 1.8 TFLOPS at 5.6 GHz, although its thermal dissipation at that frequency exceeds 260 watts.

As of 2008, the fastest PC processors (quad-core) perform over 51 GFLOPS (QX9775).^[13] GPUs in PCs are considerably more powerful in raw FLOPS. For example, in the GeForce 8 Series, the Nvidia 8800 Ultra performs around 576 GFLOPS on 128 processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 GFLOPS per core for Blue Gene/L. Note, however, that the 8800 series performs only single-precision calculations, and that while GPUs are highly efficient at calculations, they are not as flexible as a general-purpose CPU.
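The peak figures quoted above are products of three factors: the number of processing units, the clock rate, and the floating-point operations each unit can complete per cycle. In the sketch below, the clock rates for the QX9775 (3.2 GHz) and the 8800 Ultra's shader units (1.5 GHz), and all of the flops-per-cycle values, are assumptions not stated in the text. They were chosen because they reproduce the quoted 51 GFLOPS, 576 GFLOPS, 1 TFLOPS and 1.8 TFLOPS figures.

```python
def peak_flops(units, clock_hz, flops_per_cycle):
    """Theoretical peak = processing units x clock rate x flops per unit per cycle."""
    return units * clock_hz * flops_per_cycle


# (units, clock in Hz, flops per unit per cycle); per-cycle values are assumptions
chips = {
    "Core 2 Extreme QX9775 (double precision)": (4, 3.2e9, 4),
    "GeForce 8800 Ultra (single precision)": (128, 1.5e9, 3),
    "Intel Polaris at 3.2 GHz": (80, 3.2e9, 4),
    "Intel Polaris at 5.6 GHz": (80, 5.6e9, 4),
}

for name, (units, clock, per_cycle) in chips.items():
    total = peak_flops(units, clock, per_cycle)
    print(f"{name}: {total / 1e9:.1f} GFLOPS, "
          f"{total / units / 1e9:.2f} GFLOPS per unit")
```

Dividing each total by its unit count gives the per-element comparison made in the text (about 4.5 GFLOPS per 8800 Ultra processing element). Polaris's 1.8 TFLOPS at 5.6 GHz follows from clock scaling alone, with the unit count and per-cycle throughput held fixed.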

As of June 2008, the TOP500 list of the most powerful supercomputers (excluding grid computers) is headed by the IBM Roadrunner system, with just over one petaFLOPS of processing power.^[14]

Future developments

In May 2008, a collaboration was announced between NASA, SGI and Intel to build a 1-petaFLOPS computer in 2009, scaling up to 10 PFLOPS by 2012.^[15]

Given the current speed of progress, supercomputers are projected to reach one exaFLOPS in 2019.^[16] Erik P. DeBenedictis of Sandia National Laboratories theorizes that a zettaFLOPS computer is required to accomplish full weather modeling that accurately covers a two-week time span.^[17] Such systems might be built around 2030.

Cost of computing

Hardware costs

1961: about US$1,100,000,000,000 ($1.1 trillion) per GFLOPS (= US$1,100 per FLOPS), using about 17 million IBM 1620 units at $64,000 each, with a multiplication taking 17.7 ms^[18]
1984: about US$15,000,000 per GFLOPS; Cray X-MP
1997: about US$30,000 per GFLOPS; with two 16-Pentium-Pro–processor Beowulf cluster computers^[19]
2000, April: $1,000 per GFLOPS; Bunyip, Australian National University. First sub-US$1/MFLOPS system and Gordon Bell Prize 2000.
2000, May: $640 per GFLOPS, KLAT2, University of Kentucky
2003, August: $82 per GFLOPS, KASY0, University of Kentucky
2006, February: about $1 per GFLOPS in an ATI PC add-in graphics card (X1900 architecture); these figures are disputed, as they refer to highly parallelized GPU power
2007, March: about $0.42 per GFLOPS in Ambric AM2045^[20]
2007, October: about $0.20 per GFLOPS with the cheapest retail Sony PS3 console, at US$400, which runs at a claimed 2 teraFLOPS; this figure represents the processing power of the GPU. The seven CPUs collectively run at a lower 218 GFLOPS.^[21]
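The 1961 and 2007 entries can be checked with the same arithmetic: the number of units needed for one GFLOPS, times the price per unit. The sketch assumes, as the 1961 entry implies, that one multiplication per 17.7 ms is the IBM 1620's FLOPS rate. It also takes the PS3's claimed 2 teraFLOPS at face value. `cost_per_gflops` is a hypothetical helper.

```python
def cost_per_gflops(unit_price_usd, unit_flops):
    """Dollars for one GFLOPS of aggregate capacity built from identical units."""
    units_needed = 1e9 / unit_flops
    return units_needed * unit_price_usd


# 1961: one IBM 1620 multiplication takes 17.7 ms, i.e. about 56.5 FLOPS per unit
ibm_1620_flops = 1 / 17.7e-3
print(f"IBM 1620 units per GFLOPS: {1e9 / ibm_1620_flops:.3g}")            # about 1.77e+07
print(f"1961: ${cost_per_gflops(64_000, ibm_1620_flops):.3g} per GFLOPS")  # about 1.13e+12

# 2007: a US$400 PS3 at its claimed 2 teraFLOPS
print(f"2007: ${cost_per_gflops(400, 2e12):.2f} per GFLOPS")               # 0.20
```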

This trend toward ever lower cost for the same computing power broadly follows Moore's law.
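A crude way to quantify the trend is to assume a steady exponential decline between the first and last entries of the hardware list ($1.1 trillion per GFLOPS in 1961, $0.20 in 2007) and solve for the halving time. Two endpoints say nothing about how smooth the decline actually was, so the result is an order-of-magnitude check rather than a fit. `halving_time_years` is a hypothetical helper.

```python
import math


def halving_time_years(cost_start, cost_end, years):
    """Years per halving, assuming a steady exponential decline between two points."""
    halvings = math.log2(cost_start / cost_end)
    return years / halvings


t = halving_time_years(1.1e12, 0.20, 2007 - 1961)
print(f"cost per GFLOPS halved about every {t:.2f} years ({t * 12:.0f} months)")
```

With these inputs, the cost per GFLOPS halves roughly every 13 months.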

Operation costs

In terms of energy cost, according to the Green500 list, as of 2007 the most efficient CPU runs at 357.23 MFLOPS per watt. This translates to an energy requirement of 2.8 watts per GFLOPS; less efficient CPUs require much more.
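The conversion is a reciprocal: 357.23 MFLOPS per watt is 0.35723 GFLOPS per watt, so one GFLOPS needs 1/0.35723 ≈ 2.8 W. Because a watt is one joule per second, the same figure also gives the energy per operation, about 2.8 nJ per flop. A minimal sketch with hypothetical helper names:

```python
def watts_per_gflops(mflops_per_watt):
    """Power needed for one GFLOPS, given an efficiency in MFLOPS per watt."""
    return 1000.0 / mflops_per_watt


def joules_per_flop(mflops_per_watt):
    """Energy per operation: 1 W = 1 J/s, so J per flop = 1 / (FLOPS per watt)."""
    return 1.0 / (mflops_per_watt * 1e6)


print(f"{watts_per_gflops(357.23):.2f} W per GFLOPS")       # 2.80
print(f"{joules_per_flop(357.23) * 1e9:.2f} nJ per flop")   # 2.80
```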

Hardware costs for low-cost supercomputers may be less significant than energy costs when running continuously for several years. A PlayStation 3 (PS3) 40 GB (65 nm Cell) costs $399 and consumes 135 watts,^[22] or $118 of electricity each year if operated 24 hours per day, conservatively assuming U.S. national average residential electric rates of $0.10/kWh^[23] (0.135 kW × 24 h × 365 d × 0.10 $/kWh = $118.26). The operating cost of electricity for 3.5 years (about $414) is more than the cost of the PS3. Additional operating costs include air conditioning, space and lighting.
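The electricity arithmetic above generalizes to a small helper. `annual_energy_cost` is hypothetical and, like the text, it assumes continuous operation at a flat $0.10/kWh. The last line computes the break-even point at which cumulative electricity spending overtakes the $399 purchase price.

```python
def annual_energy_cost(watts, usd_per_kwh, hours_per_day=24, days=365):
    """Electricity cost per year for a device drawing `watts` continuously."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh * usd_per_kwh


yearly = annual_energy_cost(135, 0.10)
print(f"${yearly:.2f} per year")                                     # $118.26
print(f"3.5 years: ${yearly * 3.5:.2f}")                             # $413.91
print(f"break-even vs. $399 price after {399 / yearly:.2f} years")   # 3.37
```

At these rates, electricity overtakes the purchase price after about 3.4 years.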

References

^ IBM Press Release (2007-06-26). "IBM Triples Performance of World's Fastest, Most Energy-Efficient Supercomputer". IBM. Retrieved 2008-01-30.
^ "Military supercomputer sets record". CNET News.com.
^ Nielsen, Jakob. "Response Times: The Three Important Limits". Retrieved 2008-06-11.
^ Gaudin, Sharon (2008-06-09). "IBM's Roadrunner smashes 4-minute mile of supercomputing". Computerworld. Retrieved 2008-06-10.
^ Fildes, Jonathan (2008-06-09). "Supercomputer sets petaflop pace". BBC News. Retrieved 2008-07-08.
^ "NEC Launches World's Fastest Vector Supercomputer, SX-9". NEC. 2007-10-25. Retrieved 2008-07-08.
^ "29th TOP500 List of World's Fastest Supercomputers Released". Top500.org. 2007-06-23. Retrieved 2008-07-08.
^ "Client statistics by OS". Folding@Home. 2008-07-08. Retrieved 2008-07-08.
^ "Credit overview". BOINC. Retrieved 2008-08-04.
^ "SETI@Home Credit overview". BOINC. Retrieved 2008-08-04.
^ "Server Status". Einstein@Home. Retrieved 2008-07-08.
^ Internet PrimeNet Server Parallel Technology for the Great Internet Mersenne Prime Search
^ "2007 CPU Charts". Tom's Hardware. 2007-07-16. Retrieved 2008-07-08.
^ "June 2008". TOP500. Retrieved 2008-07-08.
^ "NASA collaborates with Intel and SGI on forthcoming petaflops super computers". Heise online. 2008-05-09.
^ Thibodeau, Patrick (2008-06-10). "IBM breaks petaflop barrier". InfoWorld.
^ DeBenedictis, Erik P. (2005). "Reversible logic for supercomputing". Proceedings of the 2nd conference on Computing frontiers. pp. 391–402. ISBN 1595930191.
^ IBM 1961 BRL Report
^ Loki and Hyglac
^ Halfhill, Tom R. (2006-10-10). "Ambric's New Parallel Processor". Microprocessor Report. Reed Electronics Group: 1–9. Retrieved 2008-07-08.
^ Hermida, Alfred (2005-05-17). "Sony shows off new PlayStation 3". BBC News. Retrieved 2008-07-08.
^ Quilty-Harper, Conrad (2007-10-30). "40 GB PS3 features 65 nm chips, lower power consumption". Engadget. Retrieved 2008-07-08.
^ "Average Retail Price of Electricity to Ultimate Customers by End-Use Sector, by State". Energy Information Administration. 2008-06-10. Retrieved 2008-07-08.

External links

Current Einstein@Home benchmark
BOINC projects global benchmark
Current GIMPS throughput
Top500.org
LinuxHPC.org Linux High Performance Computing and Clustering Portal
WinHPC.org Windows High Performance Computing and Clustering Portal
Oscar Linux-cluster ranking list by CPUs/types and respective FLOPS
Information on how to calculate "Composite Theoretical Performance" (CTP)
Information on the Oak Ridge National Laboratory Cray XT system.
Infiscale Cluster Portal - Free GPL HPC
Source code, pre-compiled versions and results for PCs - Linpack, Livermore Loops, Whetstone MFLOPS
PC CPU Performance Comparisons %MFLOPS/MHz - CPU, Caches and RAM

