Blue Gene
Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the PFLOPS (petaFLOPS) range with low power consumption. The project created three generations of supercomputers: Blue Gene/L, Blue Gene/P, and Blue Gene/Q. Blue Gene systems led the TOP500 ranking of the most powerful supercomputers for several years and have been deployed in many supercomputing centers. The project was awarded the 2008 National Medal of Technology and Innovation, which U.S. President Barack Obama bestowed on October 7, 2009.[1]
History
In December 1999, IBM announced a $100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding.[2] The project had two main goals: to advance the understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost through novel machine architectures. The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. The initial research and development work was pursued at the IBM T.J. Watson Research Center.
In 1999, Alan Gara moved from Columbia University, where he had been leading work on the QCDOC architecture,[3] to the IBM T.J. Watson Research Center. QCDOC was a special-purpose computer for quantum chromodynamics (QCD) computations; it used a chip with an embedded PowerPC core. At IBM, Gara started working on an extension of the QCDOC architecture into a more general-purpose supercomputer: the 4D nearest-neighbor interconnection network was replaced by a network supporting routing of messages from any node to any other, and a parallel I/O subsystem was added. The U.S. Department of Energy (DOE) started funding the development of this system, and it became known as Blue Gene/L (L for Light); development of the original Blue Gene system continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.
In November 2001, Lawrence Livermore National Laboratory (LLNL) joined IBM as a research partner for Blue Gene. Development proceeded at the IBM T.J. Watson Research Center and at IBM Rochester with the goal of delivering a system to LLNL.
Blue Gene/L
In November 2004, a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list with a Linpack performance of 70.72 TFLOPS. It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007, the Blue Gene/L installation at LLNL[4] gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL Blue Gene/L installation held the first position in the TOP500 list for 3.5 years, until June 2008, when it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, the first system to surpass the 1 PFLOPS mark.
While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. In November 2006, there were 27 computers on the TOP500 list using the Blue Gene/L architecture, all listed with the architecture "eServer Blue Gene Solution". For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.
While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer ever to sustain over 100 TFLOPS on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD) simulating the solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize.
In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox).[5] At Supercomputing 2006,[6] Blue Gene/L won the HPC Challenge awards in all classes.[7] In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of one second (the network was run at 1/10 of normal speed for 10 seconds).[8]
Major features
The Blue Gene/L supercomputer was distinguished by the following features:[9]
- Trading the speed of processors for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve a better performance-to-power ratio for applications that could use large numbers of nodes.
- Dual processors per node with two working modes: co-processor mode, in which one processor handles computation and the other handles communication; and virtual-node mode, in which both processors are available to run user code and share both the computation and the communication load.
- System-on-a-chip design. All node components were embedded on one chip, with the exception of the 512 MB of external DRAM.
- A large number of nodes (scalable in increments of 1024 up to at least 65,536)
- Three-dimensional torus interconnect with auxiliary networks for global communications (broadcast and reductions), I/O, and management
- Lightweight OS per node for minimum system overhead (system noise).
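The practical difference between the two node modes can be illustrated by counting MPI tasks and memory per task for a partition. The following is a minimal Python sketch: the function name `tasks_and_memory` is hypothetical, and the even division of the node's 512 MB between the two tasks in virtual-node mode is an assumption made for illustration, not a figure given in this article.

```python
# Hypothetical sketch: MPI tasks per partition in Blue Gene/L's two node modes.
# Assumption (not from the article): in virtual-node mode the node's memory
# is divided evenly between the two tasks.

NODE_MEMORY_MB = 512  # external DRAM per node, as stated above

def tasks_and_memory(nodes: int, mode: str) -> tuple:
    """Return (number of MPI tasks, MB of memory per task) for a partition."""
    if mode == "coprocessor":
        # One CPU computes, the other handles communication: one task per node.
        return nodes, NODE_MEMORY_MB
    if mode == "virtual-node":
        # Both CPUs run user code: two tasks per node sharing the node's memory.
        return 2 * nodes, NODE_MEMORY_MB // 2
    raise ValueError(f"unknown mode: {mode}")

for mode in ("coprocessor", "virtual-node"):
    tasks, mem = tasks_and_memory(1024, mode)
    print(f"{mode}: {tasks} tasks, {mem} MB each")
```

Virtual-node mode doubles the number of tasks for the same hardware, at the cost of less memory per task and of each CPU also carrying its share of communication work.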
Architecture
The Blue Gene/L architecture was an evolution of the QCDSP and QCDOC architectures. Each Blue Gene/L compute or I/O node was a single ASIC with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline, double-precision floating-point unit (FPU), a cache subsystem with a built-in DRAM controller, and the logic to support multiple communication subsystems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS). The two CPUs were not cache coherent with one another.
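The 5.6 GFLOPS figure follows from simple arithmetic. The sketch below assumes that each of the two FPU pipelines per core completes one fused multiply-add (two floating-point operations) per clock cycle; that assumption is not stated in the article, but it reproduces the quoted peak.

```python
# Peak-FLOPS arithmetic for one Blue Gene/L node.
# Assumption: each FPU pipeline completes one fused multiply-add
# (2 floating-point operations) per clock cycle.

CORES_PER_NODE = 2
FPU_PIPES_PER_CORE = 2     # the "double-pipeline" FPU
FLOPS_PER_PIPE_CYCLE = 2   # one fused multiply-add = multiply + add
CLOCK_MHZ = 700

def node_peak_gflops() -> float:
    """Theoretical peak of one node in GFLOPS."""
    flops_per_cycle = CORES_PER_NODE * FPU_PIPES_PER_CORE * FLOPS_PER_PIPE_CYCLE  # 8
    return flops_per_cycle * CLOCK_MHZ / 1000  # MFLOPS -> GFLOPS

print(f"{node_peak_gflops():.1f} GFLOPS")  # 5.6 GFLOPS
```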
Compute nodes were packaged two per compute card, with 16 compute cards plus up to 2 I/O nodes per node board. There were 32 node boards per cabinet/rack.[10] By integrating all essential subsystems on a single chip and using low-power logic, each compute or I/O node dissipated only about 17 watts, including DRAM. This allowed very aggressive packaging of up to 1,024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits of electrical power supply and air cooling. The resulting FLOPS per watt, FLOPS per m² of floor space, and FLOPS per unit cost allowed scaling up to very high performance. With so many nodes, component failures were inevitable; the system was able to electrically isolate a row of faulty components to allow the machine to continue to run.
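The packaging figures can be multiplied out as a quick consistency check against the 596 TFLOPS peak quoted for the 104-rack LLNL installation. In this Python sketch the constant and function names are illustrative, and the power figure counts only the compute nodes at roughly 17 W each, ignoring I/O nodes, networks and cooling.

```python
# Packaging arithmetic for Blue Gene/L, using the figures in the text.
NODES_PER_CARD = 2
CARDS_PER_BOARD = 16
BOARDS_PER_RACK = 32
NODE_PEAK_GFLOPS = 5.6
NODE_POWER_W = 17          # approximate, including DRAM

nodes_per_rack = NODES_PER_CARD * CARDS_PER_BOARD * BOARDS_PER_RACK

def system_peak_tflops(racks: int) -> float:
    """Theoretical peak of a system with the given number of racks."""
    return racks * nodes_per_rack * NODE_PEAK_GFLOPS / 1000

# Compute nodes only: excludes I/O nodes, network hardware and cooling.
rack_node_power_kw = nodes_per_rack * NODE_POWER_W / 1000

print(nodes_per_rack)                                # 1024
print(round(system_peak_tflops(104), 1))             # 596.4
print(round(478 / system_peak_tflops(104), 2))       # 0.8 (Linpack / peak)
print(round(rack_node_power_kw, 1))                  # 17.4
```

The 104-rack result matches the quoted peak, and the 478 TFLOPS Linpack figure corresponds to roughly 80% of that peak.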
Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for fast barriers. The I/O nodes, which ran the Linux operating system, provided communication to storage and external hosts via an Ethernet network and handled filesystem operations on behalf of the compute nodes. Finally, a separate, private Ethernet network provided access to any node for configuration, booting and diagnostics. To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least 2⁵ = 32 nodes. To run a program on Blue Gene/L, a partition of the computer was first reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition nodes were released for future programs to use.
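Two of the rules above, the partition-size constraint and nearest-neighbor links in a 3D torus, can be sketched in a few lines of Python. The function names, the (x, y, z) coordinate scheme and the 8 × 8 × 8 torus (512 nodes) are hypothetical choices for illustration, not Blue Gene/L system-software APIs.

```python
# Hypothetical sketch of two rules described above:
#  1. a partition must contain 2**k nodes with k >= 5 (at least 32 nodes);
#  2. in a 3D torus each node links to six neighbors, with wraparound.

def is_valid_partition(nodes: int) -> bool:
    """True if `nodes` is a power of two no smaller than 2**5 = 32."""
    return nodes >= 32 and (nodes & (nodes - 1)) == 0

def torus_neighbors(coord, dims):
    """Return the six nearest neighbors of `coord` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # wrap around the torus edge
            neighbors.append(tuple(n))
    return neighbors

print(is_valid_partition(512), is_valid_partition(16), is_valid_partition(96))
# True False False
print(torus_neighbors((0, 0, 0), (8, 8, 8)))
# [(7, 0, 0), (1, 0, 0), (0, 7, 0), (0, 1, 0), (0, 0, 7), (0, 0, 1)]
```

The wraparound is what distinguishes a torus from a plain mesh: a node on the "edge" at coordinate 0 is directly linked to the node at coordinate 7, so no node has fewer than six links.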
Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode, or one process per CPU in virtual-node mode. Programmers needed to implement green threads in order to simulate local concurrency. Application development was usually performed in C, C++, or Fortran, using MPI for communication. However, some scripting languages, such as Ruby[11] and Python,[12] have been ported to the compute nodes.
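Green threads are user-level threads scheduled cooperatively inside a single OS process, which is why they can provide concurrency on a node that runs only one process. The following minimal Python sketch illustrates the general idea with generators and a round-robin scheduler; Python is used here only for brevity, and the names `worker` and `run_round_robin` are hypothetical, not part of any Blue Gene/L software.

```python
# Illustration of green (user-level, cooperative) threads: a round-robin
# scheduler interleaves generator-based tasks inside one OS process.
from collections import deque

def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler

def run_round_robin(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # not finished: requeue it
        except StopIteration:
            pass                # task finished; drop it

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Because tasks switch only at explicit `yield` points, no preemption or kernel support is required, at the cost that a task which never yields blocks all others.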
Blue Gene/P
In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers, designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory's Leadership Computing Facility.[13]
Design
The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P compute chip contains four PowerPC 450 processor cores running at 850 MHz. The cores are cache coherent, and the chip can operate as a 4-way symmetric multiprocessor (SMP). The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB of DRAM, forming a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. Thirty-two compute cards are plugged into an air-cooled node board, and a rack contains 32 node boards (thus 1,024 nodes and 4,096 processor cores).[14] By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation; at 371 MFLOPS/W, Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007-2008.[15]
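The Blue Gene/P node and rack figures can be reproduced the same way. This sketch assumes each core retires 4 floating-point operations per cycle (a dual-pipeline FPU with one fused multiply-add per pipeline), an assumption not stated in the article but one that yields the quoted 13.6 GFLOPS; the 72-rack line corresponds to JUGENE after its 2009 upgrade.

```python
# Peak-performance arithmetic for Blue Gene/P from the figures above.
# Assumption: 4 floating-point operations per core per cycle.
CORES_PER_NODE = 4
FLOPS_PER_CORE_CYCLE = 4
CLOCK_MHZ = 850
NODES_PER_RACK = 32 * 32   # 32 node boards x 32 compute cards

node_gflops = CORES_PER_NODE * FLOPS_PER_CORE_CYCLE * CLOCK_MHZ / 1000
rack_tflops = node_gflops * NODES_PER_RACK / 1000

print(node_gflops)              # 13.6
print(round(rack_tflops, 2))    # 13.93
print(round(72 * rack_tflops))  # 1003 TFLOPS, i.e. about 1 PFLOPS for 72 racks
```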
Installations
The following is an incomplete list of Blue Gene/P installations. As of November 2009, the TOP500 list contained 15 Blue Gene/P installations of 2 racks (2,048 nodes, 8,192 processor cores, 23.86 TFLOPS Linpack) or larger.[16]
- On November 12, 2007, the first Blue Gene/P installation, JUGENE, with 16 racks (16,384 nodes, 65,536 processor cores), was running at Forschungszentrum Jülich in Germany with a performance of 167 TFLOPS.[17] When inaugurated, it was the fastest supercomputer in Europe and the sixth fastest in the world. In 2009, JUGENE was upgraded to 72 racks (73,728 nodes, 294,912 processor cores) with 144 terabytes of memory and 6 petabytes of storage, and achieved a peak performance of 1 PFLOPS. This configuration incorporated new air-to-water heat exchangers between the racks, substantially reducing the cooling cost.[18]
- The first laboratory in the United States to receive a Blue Gene/P was Argonne National Laboratory. At completion, the 40-rack (40,960 nodes, 163,840 processor cores) "Intrepid" system was ranked #3 on the June 2008 TOP500 list.[19] The Intrepid system is one of the major resources of the INCITE program, in which processor hours are awarded to "grand challenge" science and engineering projects in a peer-reviewed competition.
- Lawrence Livermore National Laboratory installed a 36-rack Blue Gene/P installation, "Dawn", in 2009.
- The King Abdullah University of Science and Technology (KAUST) installed a 16-rack Blue Gene/P installation, "Shaheen", in 2009.
- A Blue Gene/P system is the central processor for the Low Frequency Array for Radio astronomy (LOFAR) project in the Netherlands and surrounding European countries. This application uses the streaming data capabilities of the machine.
- A 2-rack Blue Gene/P was installed on September 9, 2008, in Sofia, the capital of Bulgaria, and is operated by the Bulgarian Academy of Sciences and Sofia University.[20]
- In 2010, a Blue Gene/P was installed at the University of Melbourne for the Victorian Life Sciences Computation Initiative.[21]
Applications
- Veselin Topalov, the challenger for the World Chess Championship title in 2010, confirmed in an interview that he had used a Blue Gene/P supercomputer during his preparation for the match.[22]
- The Blue Gene/P computer has been used to simulate approximately one percent of a human cerebral cortex, containing 1.6 billion neurons with approximately 9 trillion connections.[23]
- The IBM Kittyhawk project team has ported Linux to the compute nodes and demonstrated generic Web 2.0 workloads running at scale on a Blue Gene/P. Their paper, published in ACM Operating Systems Review, describes a kernel driver that tunnels Ethernet over the tree network, which results in all-to-all TCP/IP connectivity.[24][25] Running standard Linux software such as MySQL, the team obtained SPECjbb results that rank among the highest on record.[citation needed]
- In 2011 a Rutgers University / IBM / University of Texas team linked the KAUST Shaheen installation together with a Blue Gene/P installation at the IBM Watson Research Center into a "federated high performance computing cloud", winning the IEEE SCALE 2011 challenge with an oil reservoir optimization application.[26]
Blue Gene/Q
The third supercomputer design in the Blue Gene series, Blue Gene/Q, aims to reach 20 PFLOPS in the 2012 time frame. It continues to expand and enhance the Blue Gene/L and /P architectures.
Design
- The Blue Gene/Q compute chip is an 18-core chip. The 64-bit PowerPC A2 processor cores are 4-way simultaneously multithreaded and run at 1.6 GHz. Each processor core has a quad SIMD double-precision floating-point unit. The processor cores are linked by a crossbar switch to a 32 MB eDRAM L2 cache operating at half core speed. The L2 cache is multi-versioned, supporting transactional memory and speculative execution, and has hardware support for atomic operations.[27] L2 cache misses are handled by two built-in DDR3 memory controllers running at 1.33 GHz. The chip also integrates logic for chip-to-chip communications in a 5D torus configuration, with 2 GB/s chip-to-chip links. Sixteen processor cores are used for computing, and a 17th core for operating-system assist functions such as interrupts, asynchronous I/O, MPI pacing and RAS. The 18th core is a spare, used in case one of the other cores is permanently damaged (for example, in manufacturing); it is normally shut down. The Blue Gene/Q chip is manufactured on IBM's copper SOI process at 45 nm, and will deliver 205 GFLOPS at 1.6 GHz while drawing 55 watts. It measures about 19 × 19 mm (359.5 mm²) and comprises 1.47 billion transistors. The chip is mounted on a compute card along with 16 GB of DDR3 DRAM (i.e., 1 GB for each user processor core).[28]
- A Q32[29] compute drawer will have 32 compute cards, each water cooled and connected into a 5D network torus.[30]
- Racks will have 32 compute drawers for a total of 1024 compute nodes, 16,384 user cores and 16 TB RAM.[30]
- Separate I/O drawers will be air cooled and contain 8 compute cards and 8 PCIe expansion slots for InfiniBand or 10 Gigabit Ethernet networking.[30]
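The chip and rack figures listed above are mutually consistent, as a short calculation shows. This sketch assumes the quad SIMD FPU executes one 4-wide fused multiply-add per cycle per core (8 floating-point operations), an assumption that reproduces the quoted 205 GFLOPS; the final GFLOPS-per-watt value is for the chip alone and excludes memory, network and cooling.

```python
# Blue Gene/Q arithmetic from the figures above.
# Assumption: one 4-wide fused multiply-add per core per cycle (8 flops).
USER_CORES_PER_CHIP = 16        # the 17th and 18th cores do not run user code
FLOPS_PER_CORE_CYCLE = 4 * 2    # 4 SIMD lanes x (multiply + add)
CLOCK_GHZ = 1.6
NODES_PER_RACK = 32 * 32        # 32 compute drawers x 32 compute cards
MEMORY_PER_NODE_GB = 16
CHIP_POWER_W = 55

chip_gflops = USER_CORES_PER_CHIP * FLOPS_PER_CORE_CYCLE * CLOCK_GHZ

print(round(chip_gflops, 1))                       # 204.8, quoted as 205 GFLOPS
print(NODES_PER_RACK * USER_CORES_PER_CHIP)        # 16384 user cores per rack
print(NODES_PER_RACK * MEMORY_PER_NODE_GB / 1024)  # 16.0 TB per rack
print(round(chip_gflops / CHIP_POWER_W, 2))        # 3.72 GFLOPS/W, chip only
```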
Performance
At the time of the Blue Gene/Q system announcement in November 2011, an initial 4-rack Blue Gene/Q system (4,096 nodes, 65,536 user processor cores) achieved #17 in the TOP500 list[31] with 677.1 TFLOPS Linpack, outperforming the original 2007 104-rack Blue Gene/L installation described above. The same 4-rack system achieved the top position in the Graph500 list[32] with over 250 GTEPS (giga traversed edges per second). Blue Gene/Q systems also topped the Green500 list of most energy-efficient supercomputers with about 2 GFLOPS/W.[33]
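The generational gap is clearer per rack. The sketch below divides the Linpack figures quoted in this article by rack count, and estimates the 4-rack system's Linpack efficiency under the assumption of 204.8 GFLOPS peak per node (16 user cores, 8 flops per cycle, 1.6 GHz); the variable names are illustrative.

```python
# Comparing the Linpack figures quoted in this article (TFLOPS).
bgq_linpack, bgq_racks = 677.1, 4      # Blue Gene/Q, November 2011
bgl_linpack, bgl_racks = 478.0, 104    # LLNL Blue Gene/L, 2007

bgq_per_rack = bgq_linpack / bgq_racks  # about 169.3 TFLOPS per rack
bgl_per_rack = bgl_linpack / bgl_racks  # about 4.6 TFLOPS per rack
print(round(bgq_per_rack / bgl_per_rack))  # 37

# Assumption: 204.8 GFLOPS peak per node, 1024 nodes per rack.
bgq_peak = bgq_racks * 1024 * 204.8 / 1000  # about 838.9 TFLOPS
print(round(bgq_linpack / bgq_peak, 2))     # 0.81
```

Per rack, the Blue Gene/Q system delivers roughly 37 times the Linpack performance of the Blue Gene/L installation, while sustaining a similar fraction (about 80%) of its theoretical peak.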
Installations
The flagship Blue Gene/Q system, Sequoia, will be installed at Lawrence Livermore National Laboratory in 2012 as part of the Advanced Simulation and Computing Program, running nuclear simulations and advanced scientific research. It will consist of 98,304 compute nodes comprising 1.6 million processor cores and 1.6 PB of memory in 96 racks covering an area of about 3,000 square feet (280 m²), drawing 6 megawatts of power.[34]
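Sequoia's headline numbers follow directly from the per-rack Blue Gene/Q figures. This sketch assumes 1,024 nodes per rack, 16 user cores and 16 GB per node, and 204.8 GFLOPS peak per node, and treats 1 PB as 10⁶ GB.

```python
# Sequoia totals from per-rack Blue Gene/Q figures.
RACKS = 96
nodes = RACKS * 1024
cores = nodes * 16
memory_pb = nodes * 16 / 1e6      # 16 GB per node, 1 PB = 1e6 GB
peak_pflops = nodes * 204.8 / 1e6  # 204.8 GFLOPS per node

print(nodes)                  # 98304
print(cores)                  # 1572864, i.e. about 1.6 million
print(round(memory_pb, 2))    # 1.57, i.e. about 1.6 PB
print(round(peak_pflops, 1))  # 20.1, consistent with the 20 PFLOPS target
```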
A Blue Gene/Q system called Mira will be installed at Argonne National Laboratory in the Argonne Leadership Computing Facility early in 2012. It will consist of 49,152 compute nodes, with 70 PB of disk storage (470 GB/s I/O bandwidth).[35][36]
References
- ^ Harris, Mark (September 18, 2009). "Obama honours IBM supercomputer". Techradar. http://www.techradar.com/news/computing/obama-honours-ibm-supercomputer-636869. Retrieved 2009-09-18.
- ^ "Blue Gene: A Vision for Protein Science using a Petaflop Supercomputer". IBM Systems Journal, Special Issue on Deep Computing for the Life Sciences, 40 (2). http://www.research.ibm.com/journal/sj/402/allen.pdf.
- ^ Boyle, P. A., Chen, D., Christ, N. H., Clark, M. A., Cohen, S. D., Cristian, C., Dong, Z., Gara, A., Joo, B., Jung, C., Kim, C., Levkova, L. A., Liao, X., Liu, G., Mawhinney, R. D., Ohta, S., Petrov, K., Wettig, T. and Yamaguchi, A. (March 2005). "Overview of the QCDSP and QCDOC computers". IBM Journal of Research and Development 49 (2/3): 351–365.
- ^ "Lawrence Livermore National Laboratory: BlueGene/L". http://asc.llnl.gov/computing_resources/bluegenel/.
- ^ hpcwire.com
- ^ SC06
- ^ hpcchallenge.org
- ^ bbc.co.uk
- ^ "Blue Gene". IBM Journal of Research and Development 49 (2/3). 2005. http://www.research.ibm.com/journal/rd49-23.html.
- ^ Bluegene/L Configuration https://asc.llnl.gov/computing_resources/bluegenel/configuration.html
- ^ ece.iastate.edu
- ^ William Scullin (March 12, 2011). "Python for High Performance Computing". Atlanta, GA. http://us.pycon.org/2011/home/.
- ^ "IBM Triples Performance of World's Fastest, Most Energy-Efficient Supercomputer". 2007-06-27. http://www-03.ibm.com/press/us/en/pressrelease/21791.wss. Retrieved 2011-12-24.
- ^ "Overview of the IBM Blue Gene/P project". IBM Journal of Research and Development. Jan 2008. http://dx.doi.org/10.1147/rd.521.0199.
- ^ "The Green500 List". http://www.green500.org.
- ^ "Top500 List, November 2009". http://www.top500.org/lists/2009/11.
- ^ "Supercomputing: Jülich Amongst World Leaders Again". IDG News Service. 2007-11-12. http://www.pressebox.de/pressemeldungen/ibm-deutschland-gmbh-4/boxid-136200.html.
- ^ "IBM Press room - 2009-02-10 New IBM Petaflop Supercomputer at German Forschungszentrum Juelich to Be Europe's Most Powerful". www-03.ibm.com. 2009-02-10. http://www-03.ibm.com/press/us/en/pressrelease/26657.wss. Retrieved 2011-03-11.
- ^ "Argonne's Supercomputer Named World’s Fastest for Open Science, Third Overall"
- ^ "We Now Have a Supercomputer Too" (Bulgarian: Вече си имаме и суперкомпютър). Dir.bg, 9 September 2008.
- ^ "IBM Press room - 2010-02-11 IBM to Collaborate with Leading Australian Institutions to Push the Boundaries of Medical Research - Australia". 03.ibm.com. 2010-02-11. http://www-03.ibm.com/press/au/en/pressrelease/29383.wss. Retrieved 2011-03-11.
- ^ "Topalov training with super computer Blue Gene P". Chessdom. http://players.chessdom.com/veselin-topalov/topalov-blue-gene-p. Retrieved 21 May 2010.
- ^ Kaku, Michio. Physics of the Future (New York: Doubleday, 2011), 91.
- ^ "Project Kittyhawk: A Global-Scale Computer". http://www.research.ibm.com/kittyhawk/.
- ^ Project Kittyhawk: building a global-scale computer
- ^ "Rutgers-led Experts Assemble Globe-Spanning Supercomputer Cloud". news.rutgers.edu. 2011-07-06. http://news.rutgers.edu/medrel/special-content/summer-2011/rutgers-led-experts-20110706. Retrieved 2011-12-24.
- ^ "Memory Speculation of the Blue Gene/Q Compute Chip". http://wands.cse.lehigh.edu/IBM_BQC_PACT2011.ppt. Retrieved 2011-12-23.
- ^ "The Blue Gene/Q Compute chip". http://www.hotchips.org/archives/hc23/HC23-papers/HC23.18.1-manycore/HC23.18.121.BlueGene-IBM_BQC_HC23_20110818.pdf. Retrieved 2011-12-23.
- ^ IBM Blue Gene/Q supercomputer delivers petascale computing for high-performance computing applications
- ^ a b c "IBM uncloaks 20 petaflops BlueGene/Q super". The Register. 2010-11-22. http://www.theregister.co.uk/2010/11/22/ibm_blue_gene_q_super/. Retrieved 2010-11-25.
- ^ http://www.top500.org/list/2011/11/100
- ^ "The Graph500 List - November 2011". http://www.graph500.org/nov2011.html.
- ^ "The Green500 List - November 2011". http://www.green500.org/lists/2011/11/top/list.php.
- ^ Feldman, Michael (2009-02-03). "Lawrence Livermore Prepares for 20 Petaflop Blue Gene/Q". HPCwire. http://www.hpcwire.com/features/Lawrence-Livermore-Prepares-for-20-Petaflop-Blue-GeneQ-38948594.html. Retrieved 2011-03-11.
- ^ http://www.er.doe.gov/ascr/ASCAC/Meetings/Nov09/Nov09Minutes.pdf
- ^ http://workshops.alcf.anl.gov/gs10/files/2010/01/betsy_riley.pdf