Memory bandwidth is the rate at which data can be read from or stored into a semiconductor memory by a processor. Memory bandwidth is usually expressed in units of bytes/second, though this can vary for systems with natural data sizes that are not a multiple of the commonly used 8-bit bytes.
Memory bandwidth that is advertised for a given memory or system is usually the maximum theoretical bandwidth. In practice the observed memory bandwidth will be less than (and is guaranteed not to exceed) the advertised bandwidth. A variety of computer benchmarks exist to measure sustained memory bandwidth using a variety of access patterns. These are intended to provide insight into the memory bandwidth that a system should sustain on various classes of real applications.
Perhaps surprisingly, there are three different conventions for measuring the quantity of data transferred to and from memory.
- The bcopy convention: counts the amount of data copied from one location in memory to another location per unit time. For example, copying 1 million bytes from one location in memory to another location in memory in one second would be counted as 1 million bytes per second. The bcopy convention is self-consistent, but is not easily extended to cover cases with more complex access patterns, for example three reads and one write.
- The stream convention: sums the amount of data that the application code explicitly reads plus the amount of data that the application code explicitly writes. Using the previous 1 million byte copy example, the stream bandwidth would be counted as 1 million bytes read plus 1 million bytes written in one second, for a total of 2 million bytes per second. The stream convention is most directly tied to the user code, but may not count all the data traffic that the hardware is actually required to perform.
- The hardware convention: counts the actual amount of data read or written by the hardware, whether the data motion was explicitly requested by the user code or not. Using the same 1 million byte copy example, the hardware bandwidth on computer systems with a write allocate cache policy would include an additional 1 million bytes of traffic because the hardware reads the target array from memory into cache before performing the stores. This gives a total of 3 million bytes per second actually transferred by the hardware. The hardware convention is most directly tied to the hardware, but may not represent the minimum amount of data traffic required to implement the user's code.
- For example, some computer systems have the ability to avoid write allocate traffic using special instructions, leading to the possibility of misleading comparisons of bandwidth based on different amounts of data traffic performed.
Bandwidth Computation and Nomenclature
- Base DRAM clock frequency
- Number of lines per clock: Two, in the case of "double data rate" (DDR, DDR2, DDR3) memory.
- Memory bus (interface) width: Each DDR, DDR2, or DDR3 memory interface is 64 bits wide. Those 64 bits are sometimes referred to as a "line."
- Number of interfaces: Modern personal computers typically use two memory interfaces (dual-channel mode) for an effective 128-bit bus width.
For example, a computer with dual-channel memory and one DDR2-800 module per channel running at 400 MHz would have a theoretical maximum memory bandwidth of:
- 400,000,000 clocks per second × 2 lines per clock × 64 bits per line × 2 interfaces =
- 102,400,000,000 (102.4 billion) bits per second (in bytes, 12,800 MB/s or 12.8 GB/s)
This theoretical maximum memory bandwidth is referred to as the "burst rate," which may not be sustainable.
The naming convention for DDR, DDR2 and DDR3 modules specifies either a maximum speed (e.g., DDR2-800) or a maximum bandwidth (e.g., PC2-6400). The speed rating (800) is not the maximum clock speed, but twice that (because of the doubled data rate). The specified bandwidth (6400) is the maximum megabytes transferred per second using a 64-bit width. In a dual-channel mode configuration, this is effectively a 128-bit width. Thus, the memory configuration in the example can be simplified as: two DDR2-800 modules running in dual-channel mode.
Two memory interfaces per module is a common configuration for PC system memory, but single-channel configurations are common in older, low-end, or low-power devices. Some personal computers and most modern graphics cards use more than two memory interfaces (e.g., four in the Mac Pro and six in the nVidia 8800 GTX). High-performance graphics cards running many interfaces in parallel can attain very high total memory bus width (e.g., 384 bits in the Nvidia Titan).
Multi-channel configurations with a standard 64-bit line size require at least one physical memory chip for each 64 bits of bus width. Thus, a 256-bit memory bus requires at least four separate memory chips per module.
In systems with error-correcting (ECC) memory, the additional width of the interfaces (typically 72 rather than 64 bits) is not counted in bandwidth specifications because the extra bits are unavailable to store user data. ECC bits are better thought of as part of the memory hardware rather than as information stored in that hardware.
- Major factors in actual DRAM bandwidth:
- STREAM Benchmark FAQ: Counting Bytes and FLOPS: http://www.cs.virginia.edu/stream/ref.html#counting