HPC Challenge Benchmark
Original author(s): Innovative Computing Laboratory, University of Tennessee
The HPC Challenge Benchmark combines several benchmarks to test a number of independent performance attributes of high-performance computing (HPC) systems. The project has been co-sponsored by the DARPA High Productivity Computing Systems program, the United States Department of Energy, and the National Science Foundation.
The performance of complex applications on HPC systems can depend on a variety of independent performance attributes of the hardware. The HPC Challenge Benchmark is an effort to improve visibility into this multidimensional space by combining the measurement of several of these attributes into a single program.
Although the performance attributes of interest are not specific to any particular computer architecture, the reference implementation of the HPC Challenge Benchmark in C and MPI assumes that the system under test is a cluster of shared-memory multiprocessor systems connected by a network. Because of this assumed hierarchical system structure, most of the tests are run in several different modes of operation. Following the notation used by the benchmark reports, results labeled "single" mean that the test was run on one randomly chosen processor in the system. Results labeled "star" mean that an independent copy of the test was run concurrently on each processor in the system. Results labeled "global" mean that all the processors worked in coordination to solve a single problem, with data distributed across the nodes of the system.
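The three modes can be illustrated with a minimal Python sketch. It is not taken from the reference implementation, which uses C and MPI: the "processors" here are simulated one after another inside a single process, the `kernel` function is a toy stand-in for a real test, and all function names are hypothetical.

```python
import random

def kernel(values):
    """Toy stand-in for a benchmark kernel: sum of squares."""
    return sum(v * v for v in values)

def run_single(num_procs, local_data, rng):
    """'single': one randomly chosen processor runs the test alone."""
    rank = rng.randrange(num_procs)
    return {rank: kernel(local_data)}

def run_star(num_procs, local_data):
    """'star': every processor runs its own independent copy of the test.

    A real run executes the copies concurrently; this sketch runs them in turn.
    """
    return {rank: kernel(local_data) for rank in range(num_procs)}

def run_global(num_procs, global_data):
    """'global': all processors cooperate on one problem with distributed data."""
    # Each simulated rank owns a strided slice of the single global data set.
    partials = [kernel(global_data[rank::num_procs]) for rank in range(num_procs)]
    # Summing the partial results stands in for a reduction across processors.
    return sum(partials)
```

In the "single" and "star" cases each processor works only on its own data, so the result is reported per processor. In the "global" case there is one result for the whole system.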
The benchmark currently consists of seven tests (with the modes of operation indicated for each):
- HPL (High Performance LINPACK) - measures performance of a solver for a dense system of linear equations (global).
- DGEMM - measures performance for matrix-matrix multiplication (single, star).
- STREAM - measures sustainable bandwidth to and from memory (single, star).
- PTRANS - measures the rate at which the system can transpose a large array (global).
- RandomAccess - measures the rate of 64-bit updates to randomly selected elements of a large table (single, star, global).
- FFT - performs a Fast Fourier Transform on a large one-dimensional vector using the generalized Cooley-Tukey algorithm (single, star, global).
- Communication Bandwidth and Latency - MPI-centric performance measurements based on the b_eff bandwidth/latency benchmark.
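As an illustration of one of the kernels listed above, the following Python sketch mimics the RandomAccess idea: pseudo-random 64-bit values are XORed into pseudo-randomly selected slots of a table whose size is a power of two. The shift-and-XOR generator, the seed, the update count, and the function names are assumptions chosen for illustration. The reference implementation's exact generator and rules may differ.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback term of the illustrative shift-and-XOR generator

def next_random(ran):
    """Advance a 64-bit shift-and-XOR pseudo-random sequence by one step."""
    high_bit_set = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if high_bit_set else ran

def random_access(table, num_updates, seed=1):
    """XOR pseudo-random 64-bit values into pseudo-randomly selected slots."""
    size = len(table)
    if size == 0 or size & (size - 1) != 0:
        raise ValueError("table size must be a power of two")
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        # The low bits of the random value select which element to update.
        table[ran & (size - 1)] ^= ran
    return table

def gups(num_updates, seconds):
    """Update rate in giga-updates per second (GUPS)."""
    return num_updates / seconds / 1e9
```

Because XOR is its own inverse, replaying the same update sequence a second time restores the original table, which gives a simple way to check that no updates were lost.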
At a high level, the tests are intended to provide coverage of four important attributes of performance: double-precision floating-point arithmetic (DGEMM and HPL), local memory bandwidth (STREAM), network bandwidth for "large" messages (PTRANS, RandomAccess, FFT, b_eff), and network bandwidth for "small" messages (RandomAccess, b_eff). Some of the codes are more complex than others and can have additional performance sensitivities. For example, in some systems HPL performance can be limited by network bandwidth and/or network latency.
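The link between a kernel and the attribute it stresses comes from counting work. A Triad pass over arrays of n double-precision elements reads two arrays and writes one, so it moves 24n bytes, while an n-by-n matrix multiplication performs about 2n^3 floating-point operations. The Python sketch below shows this arithmetic. The function names are hypothetical, and timing a pure Python loop measures interpreter overhead rather than hardware memory bandwidth, so the printed figure is only a placeholder for what a compiled kernel would report.

```python
import time

def triad(a, b, c, scalar):
    """STREAM-style Triad: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]

def triad_bytes(n, word_bytes=8):
    """Bytes moved by one Triad pass: read b, read c, write a."""
    return 3 * n * word_bytes

def dgemm_flops(n):
    """Multiply-add operation count for an n-by-n times n-by-n matrix product."""
    return 2 * n ** 3

def timed_triad_gbytes_per_s(n=100_000):
    """Time one Triad pass and convert bytes moved into GB/s."""
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    return triad_bytes(n) / elapsed / 1e9

if __name__ == "__main__":
    print(f"Triad: {timed_triad_gbytes_per_s():.3f} GB/s (pure Python, not representative)")
    print(f"DGEMM with n=1000 performs {dgemm_flops(1000):.2e} floating-point operations")
```

Dividing the byte count or operation count by the measured run time yields the bandwidth or floating-point rate that a test of this kind reports.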
The annual HPC Challenge Award Competition at the Supercomputing Conference focuses on four of the most challenging benchmarks in the suite:
- Global HPL
- Global RandomAccess (OR BSS Random Access Benchmark)
- EP STREAM (Triad) per system (EP stands for "embarrassingly parallel" and corresponds to the "star" mode)
- Global FFT
There are two classes of awards:
- Class 1: Best performance on a base or optimized run submitted to the HPC Challenge website.
- Class 2: Most "elegant" implementation of four or five computational kernels including three or more of the HPC Challenge benchmarks.
Note: The benchmark is designed to allow replacement of a limited set of functions with more highly optimized versions while remaining a "base" run. Additional (but still limited) modifications are allowed under the category of "optimized" runs.