In high-performance computing, a burst buffer is a fast, intermediate storage layer positioned between the front-end compute processes and the back-end storage systems. It emerged as a storage solution to bridge the ever-widening performance gap between the processing speed of compute nodes and the input/output (I/O) bandwidth of storage systems. A burst buffer is built from arrays of high-performance storage devices, such as NVRAM and SSDs, and typically offers one to two orders of magnitude higher I/O bandwidth than the back-end storage systems.
The emergence of burst buffers has fostered a wide variety of solutions to accelerate scientific data movement on supercomputers. For example, the life cycle of a scientific application typically alternates between computation phases and I/O phases: after each round of computation, all computing processes concurrently write their intermediate data to the back-end storage systems. With a burst buffer, processes can quickly write their data to the burst buffer after a round of computation and immediately proceed to the next round; the data are then flushed asynchronously from the burst buffer to the storage systems while the next round of computation runs. In this way, the I/O time seen by the application is reduced, because the data flush overlaps with computation. In addition, buffering data in the burst buffer gives applications ample opportunity to reshape the data traffic to the back-end storage systems so that their bandwidth is used efficiently. In another common use case, scientific applications stage their intermediate data in and out of the burst buffer without interacting with the slower storage systems. Bypassing the storage systems lets applications reap most of the performance benefit of the burst buffer.
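The overlap of computation and asynchronous flushing described above can be sketched in Python. This is a minimal, single-process model under stated assumptions: in-memory dictionaries stand in for the burst buffer and the back-end storage, a background thread plays the role of the flush service, and the names (`BurstBufferSketch`, `compute`, `run`, `flush_delay`) are hypothetical. It illustrates only the control flow, not the interface of any production burst buffer such as DataWarp.

```python
import queue
import threading
import time


class BurstBufferSketch:
    """Toy two-tier store: writes land in a fast tier and are flushed to a slow
    tier by a background thread (hypothetical model, not a real burst buffer API)."""

    def __init__(self, flush_delay=0.0):
        self.buffer = {}                  # fast tier (burst buffer stand-in)
        self.backend = {}                 # slow tier (back-end storage stand-in)
        self.flush_delay = flush_delay    # simulated cost of one back-end write, seconds
        self._pending = queue.Queue()     # keys waiting to be flushed
        self._lock = threading.Lock()     # guards self.buffer
        self._flusher = threading.Thread(target=self._drain, daemon=True)
        self._flusher.start()

    def write(self, key, data):
        # Fast path: store in the burst buffer, schedule a flush, return immediately.
        with self._lock:
            self.buffer[key] = data
        self._pending.put(key)

    def _drain(self):
        # Background flush loop; runs concurrently with the caller's next compute phase.
        while True:
            key = self._pending.get()
            if key is None:               # shutdown sentinel
                return
            with self._lock:
                data = self.buffer[key]
            time.sleep(self.flush_delay)  # pretend the back-end write is slow
            self.backend[key] = data

    def close(self):
        # Block until every queued flush has reached the back-end.
        self._pending.put(None)
        self._flusher.join()


def compute(step):
    # Placeholder for one round of computation.
    return sum(i * step for i in range(10_000))


def run(steps, flush_delay=0.0):
    bb = BurstBufferSketch(flush_delay)
    for step in range(steps):
        result = compute(step)            # computation phase
        bb.write(f"ckpt-{step}", result)  # I/O phase: absorbed by the burst buffer
    bb.close()                            # drain outstanding flushes
    return bb


if __name__ == "__main__":
    bb = run(5, flush_delay=0.01)
    print(sorted(bb.backend))
```

Because `write` only stores the data and enqueues a key, each I/O phase returns as soon as the data are in the fast tier; `close` then waits for the background thread to drain the remaining flushes, which models the final flush an application still has to wait for at the end of its run.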
Representative burst buffer architectures
There are two representative burst buffer architectures in high-performance computing: node-local burst buffers and remote shared burst buffers. In the node-local architecture, burst buffer storage is located on each individual compute node, so the aggregate burst buffer bandwidth grows linearly with the number of compute nodes. This scalability benefit has been well documented in the literature. However, it also requires a scalable metadata management strategy to maintain a global namespace for data distributed across all the burst buffers. In the remote shared architecture, burst buffer storage resides on a smaller number of I/O nodes positioned between the compute nodes and the back-end storage systems, so data moving between the compute nodes and the burst buffer must traverse the network. Placing the burst buffer on the I/O nodes allows the burst buffer service to be developed, deployed, and maintained independently. Hence, several commercial software products have been developed to manage this type of burst buffer, such as DataWarp and Infinite Memory Engine.
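Two properties of node-local burst buffers mentioned above, linear bandwidth scaling and the need for a global namespace, can be illustrated with a small Python sketch. It assumes idealized scaling (no network or contention effects) and a simple hash-partitioning scheme in which each file path is mapped to the node that stores its metadata, in the spirit of a distributed hash table. The names (`aggregate_bandwidth`, `metadata_owner`, `GlobalNamespaceSketch`) and the per-node figures are hypothetical and do not describe any particular system.

```python
import hashlib
from collections import defaultdict


def aggregate_bandwidth(per_node_gb_s, num_nodes):
    # Idealized node-local scaling: every added node contributes its own device bandwidth.
    return per_node_gb_s * num_nodes


def metadata_owner(path, num_nodes):
    # Deterministically map a path to the node holding its metadata (hash partitioning).
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class GlobalNamespaceSketch:
    """Hypothetical global namespace: metadata is sharded across all nodes,
    while file data stays on whichever node wrote it."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.shards = defaultdict(dict)   # node id -> {path: metadata record}

    def create(self, path, data_node, size):
        # Record where the data lives on the node that owns this path's metadata.
        owner = metadata_owner(path, self.num_nodes)
        self.shards[owner][path] = {"data_node": data_node, "size": size}
        return owner

    def lookup(self, path):
        # Any node can find the metadata by recomputing the same hash.
        owner = metadata_owner(path, self.num_nodes)
        return self.shards[owner].get(path)


if __name__ == "__main__":
    print(aggregate_bandwidth(2.0, 1000))  # hypothetical 2 GB/s per node, 1000 nodes
    ns = GlobalNamespaceSketch(num_nodes=8)
    ns.create("/bb/ckpt/rank0", data_node=0, size=4096)
    print(ns.lookup("/bb/ckpt/rank0"))
```

Hashing spreads metadata over all nodes, so metadata capacity and request throughput can also grow with the node count. A plain modulo mapping, however, reassigns most paths when the node count changes, which is one reason a real system might prefer consistent hashing or a similar scheme.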
Supercomputers deployed with burst buffer
Burst buffers have been widely deployed on leadership-scale supercomputers. For example, node-local burst buffers have been installed on the DASH supercomputer at the San Diego Supercomputer Center, the Tsubame supercomputers at the Tokyo Institute of Technology, the Theta and Aurora supercomputers at Argonne National Laboratory, the Summit supercomputer at Oak Ridge National Laboratory, and the Sierra supercomputer at Lawrence Livermore National Laboratory. Remote shared burst buffers have been adopted by the Tianhe-2 supercomputer at the National Supercomputer Center in Guangzhou, the Trinity supercomputer at Los Alamos National Laboratory, and the Cori supercomputer at Lawrence Berkeley National Laboratory.
References
- "On the Role of Burst Buffers in Leadership-Class Storage Systems" (PDF). IEEE. April 2012.
- "A Case of System-Wide Power Management for Scientific Applications" (PDF). IEEE. September 2013.
- "BurstMem: A High-Performance Burst Buffer System for Scientific Applications" (PDF). IEEE. October 2014.
- "Jitter-Free Co-Processing on a Prototype Exascale Storage Stack" (PDF). IEEE. April 2012.
- "TRIO: Burst Buffer Based I/O Orchestration" (PDF). IEEE. September 2015.
- "Leveraging Burst Buffer Coordination to Prevent I/O Interference" (PDF). IEEE. March 2017.
- "An Ephemeral Burst-Buffer File System for Scientific Applications" (PDF). IEEE. November 2016.
- "BurstFS: A Distributed Burst Buffer File System for Scientific Applications" (PDF). November 2015.
- "Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System" (PDF). ACM. November 2010.
- "A 1 PB/s File System to Checkpoint Three Million MPI Tasks" (PDF). ACM. June 2013.
- "FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems" (PDF). IEEE. October 2014.
- "MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers" (PDF). IEEE. May 2017.
- "ZHT: A Light-Weight Reliable Persistent Dynamic Scalable Zero-Hop Distributed Hash Table" (PDF). IEEE. May 2013.
- "DASH: a Recipe for a Flash-based Data Intensive Supercomputer" (PDF). ACM. November 2010.
External links
- Cray DataWarp, a production burst buffer system developed by Cray.
- Infinite Memory Engine, a production burst buffer system developed by DataDirect Networks.
- Theta supercomputer, a supercomputer hosted at Argonne National Laboratory.
- Summit supercomputer, a supercomputer hosted at Oak Ridge National Laboratory.
- Sierra supercomputer, a supercomputer hosted at Lawrence Livermore National Laboratory.
- Trinity supercomputer, a supercomputer hosted at Los Alamos National Laboratory.
- Cori supercomputer, a supercomputer hosted at Lawrence Berkeley National Laboratory.