HPC Challenge Benchmark

Original author(s)	Innovative Computing Laboratory, University of Tennessee
Stable release	1.5.0a
Platform	Cross-platform
License	BSD
Website	http://icl.cs.utk.edu/hpcc/

The HPC Challenge Benchmark combines several benchmarks to test a number of independent performance attributes of high-performance computing (HPC) systems. The project has been co-sponsored by the DARPA High Productivity Computing Systems program, the United States Department of Energy and the National Science Foundation.^[1]

Context

The performance of complex applications on HPC systems can depend on a variety of independent performance attributes of the hardware. The HPC Challenge Benchmark is an effort to improve visibility into this multidimensional space by combining the measurement of several of these attributes into a single program.

Although the performance attributes of interest are not specific to any particular computer architecture, the reference implementation of the HPC Challenge Benchmark in C and MPI assumes that the system under test is a cluster of shared-memory multiprocessor systems connected by a network. Because of this assumed hierarchical system structure, most of the tests are run in several different modes of operation. Following the notation used by the benchmark reports, results labeled "single" mean that the test was run on one randomly chosen processor in the system, results labeled "star" mean that an independent copy of the test was run concurrently on each processor in the system, and results labeled "global" mean that all the processors worked in coordination to solve a single problem (with data distributed across the nodes of the system).
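The three modes can be illustrated with a minimal sketch. It is not taken from the reference implementation (which uses C and MPI): the function names, the stand-in `kernel` workload and the strided data split are assumptions made for illustration, and the "processors" are simulated sequentially in one Python process.

```python
import random

def kernel(data):
    """Hypothetical stand-in workload: sum of squares of a list."""
    return sum(x * x for x in data)

def run_single(num_procs, data, rng):
    """'single': one randomly chosen processor runs the test alone."""
    chosen = rng.randrange(num_procs)
    return {chosen: kernel(data)}

def run_star(num_procs, data):
    """'star': every processor runs its own independent copy of the test.

    A real run executes the copies concurrently; this loop only models
    the fact that each rank produces its own separate result.
    """
    return {rank: kernel(data) for rank in range(num_procs)}

def run_global(num_procs, data):
    """'global': one problem, with the data distributed across processors."""
    # Each simulated rank works only on its own slice of the data.
    partial = [kernel(data[rank::num_procs]) for rank in range(num_procs)]
    # Summing the partial results models a reduction across processors.
    return sum(partial)
```

In "single" and "star" mode each result describes one processor working on its own data, whereas in "global" mode there is a single result for the whole machine.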

Components

The benchmark currently consists of seven tests (with the modes of operation indicated for each):

HPL^[2] (High Performance LINPACK) - measures performance of a solver for a dense system of linear equations (global).
DGEMM - measures performance for matrix-matrix multiplication (single, star).
STREAM^[3] - measures sustainable memory bandwidth (single, star).
PTRANS - measures the rate at which the system can transpose a large array (global).
RandomAccess - measures the rate of 64-bit updates to randomly selected elements of a large table (single, star, global).
FFT - performs a Fast Fourier Transform on a large one-dimensional vector using the generalized Cooley-Tukey algorithm (single, star, global).
Communication Bandwidth and Latency - MPI-centric performance measurements based on the b_eff^[4] bandwidth/latency benchmark.
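Two of the kernels listed above are small enough to sketch directly. The code below is an illustration in Python rather than the reference C code. The shift-register generator with feedback value `0x7`, the seed of 1 and the function names are assumptions of this sketch; the reference implementation chooses its starting values differently.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback value assumed for the pseudo-random stream

def next_random(ran):
    """Advance a 64-bit shift-register sequence by one step."""
    feedback = POLY if (ran >> 63) & 1 else 0
    return ((ran << 1) & MASK64) ^ feedback

def random_access(table, num_updates, seed=1):
    """RandomAccess-style kernel: XOR 64-bit values into pseudo-random slots."""
    size = len(table)
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        # The low bits of the random value select the table entry to update.
        table[ran & (size - 1)] ^= ran
    return table

def stream_triad(b, c, scalar):
    """STREAM Triad: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]

def triad_bytes(n, word_bytes=8):
    """Bytes counted for one Triad pass: two loads and one store per element."""
    return 3 * word_bytes * n
```

Because XOR is its own inverse, applying the same update stream a second time restores the original table, which gives a simple correctness check:

```python
table = list(range(1 << 10))
random_access(table, 4 * len(table))
random_access(table, 4 * len(table))
assert table == list(range(1 << 10))
```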

Performance Attributes

At a high level, the tests are intended to provide coverage of four important attributes of performance: double-precision floating-point arithmetic (DGEMM and HPL), local memory bandwidth (STREAM), network bandwidth for "large" messages (PTRANS, RandomAccess, FFT, b_eff), and network bandwidth for "small" messages (RandomAccess, b_eff). Some of the codes are more complex than others and can have additional performance sensitivities. For example, in some systems HPL performance can be limited by network bandwidth and/or network latency.
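Each attribute is usually reported as a rate, obtained by dividing a nominal amount of work by the measured run time. The sketch below uses the conventional operation counts for each kernel; the function names are hypothetical, and the counts should be read as assumptions rather than as the exact accounting used by the reference implementation.

```python
import math

def hpl_gflops(n, seconds):
    """Dense n-by-n linear solve, credited with 2/3 n^3 + 3/2 n^2 operations."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / seconds / 1e9

def dgemm_gflops(n, seconds):
    """n-by-n matrix-matrix multiply, credited with 2 n^3 operations."""
    return 2.0 * n**3 / seconds / 1e9

def fft_gflops(n, seconds):
    """Complex 1-D FFT of length n, credited with 5 n log2(n) operations."""
    return 5.0 * n * math.log2(n) / seconds / 1e9

def stream_gbytes_per_s(bytes_moved, seconds):
    """Memory bandwidth in GB/s from the number of bytes read and written."""
    return bytes_moved / seconds / 1e9

def random_access_gups(num_updates, seconds):
    """Giga-updates per second (GUPS) for the RandomAccess test."""
    return num_updates / seconds / 1e9
```

For example, under these counts a 1000-by-1000 matrix multiplication that takes 2 seconds corresponds to `dgemm_gflops(1000, 2.0)`, that is, 1 Gflop/s.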

Competition

The annual HPC Challenge Award Competition at the Supercomputing Conference focuses on four of the most challenging benchmarks in the suite:

Global HPL
Global RandomAccess (or BSS Random Access Benchmark)
EP STREAM (Triad) per system
Global FFT

There are two classes of awards:

Class 1: Best performance on a base or optimized run submitted to the HPC Challenge website.^[5]
Class 2: Most "elegant" implementation of four or five computational kernels including three or more of the HPC Challenge benchmarks.^[6]

References

[1] "Cray X1 Supercomputer Has Highest Reported Scores on Government-Sponsored HPC Challenge Benchmark Tests". 2004-06-14. Retrieved 2010-01-22.
[2] "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers". Innovative Computing Laboratory, University of Tennessee at Knoxville. Retrieved 2015-06-10.
[3] "STREAM: Sustainable Memory Bandwidth in High Performance Computers". Retrieved 2015-06-10.
[4] "Effective Bandwidth (b_eff) Benchmark". High Performance Computing Center Stuttgart. Retrieved 2015-06-10.
[5] The benchmark is designed to allow replacement of a limited set of functions with more highly optimized versions while remaining a "base" run. Additional (but still limited) modifications are allowed under the category of "optimized" runs.
[6] "HPC Challenge Award Competition". DARPA HPCS Program. Retrieved 2010-01-23.

External links

HPC Challenge Benchmark Official Website
HPC Challenge Award Competition Official Website
BSS Random Access Benchmark: "Performance Evaluation and Optimization of Random Memory Access on Multicores with High Productivity" (Best Paper Award), ACM/IEEE HiPC 2010
