The prefetch buffer takes advantage of the specific characteristics of memory accesses to DRAM. Typical DRAM memory operations involve three phases: bitline precharge, row access, and column access. Row access is the heart of a read operation, as it involves the careful sensing of the tiny signals in DRAM memory cells; it is the slowest phase of memory operation. However, once a row is read, subsequent column accesses to that same row can be very quick, as the sense amplifiers also act as latches. For reference, a row of a 1 Gb DDR3 device is 2,048 bits wide, so internally 2,048 bits are read into 2,048 separate sense amplifiers during the row access phase. A row access might take 50 ns, depending on the speed of the DRAM, whereas a column access to an open row takes less than 10 ns.
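The gap between row and column access times can be illustrated with a toy latency model. This is only a sketch: it uses the 50 ns and 10 ns figures quoted above as fixed costs, ignores precharge time and any overlap between operations, and the function name `total_latency_ns` is a hypothetical helper, not part of any real tool.

```python
ROW_ACCESS_NS = 50     # illustrative row access time from the text
COLUMN_ACCESS_NS = 10  # upper bound for a column access to an open row

def total_latency_ns(accesses):
    """Sum the latency of (row, column) accesses served in order.

    Assumption: a row stays open until a different row is requested,
    and precharge time is ignored.
    """
    open_row = None
    total = 0
    for row, _column in accesses:
        if row != open_row:
            total += ROW_ACCESS_NS  # open (sense and latch) the new row
            open_row = row
        total += COLUMN_ACCESS_NS   # read one dataword from the open row
    return total

same_row = [(0, column) for column in range(4)]
different_rows = [(row, 0) for row in range(4)]
print(total_latency_ns(same_row))        # 90
print(total_latency_ns(different_rows))  # 240
```

Under these assumptions, four reads from one open row cost well under half as much as four reads that each open a new row.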
Traditional DRAM architectures have long supported fast column access to bits on an open row. For an 8-bit-wide memory chip with a 2,048-bit-wide row, accesses to any of the 256 datawords (2,048/8) on the row can be very quick, provided no intervening accesses to other rows occur.
The drawback of the older fast column access method was that a new column address had to be sent for each additional dataword on the row. The address bus had to operate at the same frequency as the data bus. A prefetch buffer simplifies this process by allowing a single address request to result in multiple data words.
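The saving in address traffic can be sketched with a little arithmetic. The example below counts how many column address requests are needed to read every dataword on the 2,048-bit row of an 8-bit-wide chip, with and without a prefetch buffer. The helper names are hypothetical, and the count assumes each request returns one full burst of adjacent datawords.

```python
import math

def datawords_per_row(row_bits, io_width_bits):
    """Number of datawords stored on one row."""
    return row_bits // io_width_bits

def column_requests(num_words, prefetch_depth=1):
    """Column address requests needed to read num_words adjacent datawords.

    prefetch_depth=1 models the older method (one address per dataword).
    """
    return math.ceil(num_words / prefetch_depth)

words = datawords_per_row(2048, 8)
print(words)                       # 256
print(column_requests(words))      # 256
print(column_requests(words, 8))   # 32
```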
In a prefetch buffer architecture, when a memory access occurs to a row, the buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence on the IO pins, without the need for individual column address requests. This assumes the CPU wants adjacent datawords in memory, which in practice is very often the case. For instance, when a 64-bit CPU accesses a 16-bit-wide DRAM chip, it will need 4 adjacent 16-bit datawords to make up the full 64 bits. A 4n prefetch buffer would accomplish this exactly ("n" refers to the IO width of the memory chip; it is multiplied by the burst depth "4" to give the size in bits of the full burst sequence). An 8n prefetch buffer on an 8-bit-wide DRAM would also accomplish a 64-bit transfer.
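The "n" notation reduces to a single multiplication, which the following sketch makes explicit. The function names are hypothetical and chosen only for illustration.

```python
def burst_size_bits(prefetch_depth, io_width_bits):
    """Bits delivered by one burst: prefetch depth times the chip's IO width (n)."""
    return prefetch_depth * io_width_bits

def bursts_needed(request_bits, prefetch_depth, io_width_bits):
    """Whole bursts required to deliver request_bits (rounded up)."""
    burst = burst_size_bits(prefetch_depth, io_width_bits)
    return -(-request_bits // burst)  # ceiling division

print(burst_size_bits(4, 16))    # 64: 4n prefetch on a 16-bit-wide chip
print(burst_size_bits(8, 8))     # 64: 8n prefetch on an 8-bit-wide chip
print(bursts_needed(64, 2, 16))  # 2: a 2n buffer needs two bursts for 64 bits
```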
The prefetch buffer depth can also be thought of as the ratio between the IO frequency and the core memory frequency. In an 8n prefetch architecture (such as DDR3), the IOs operate 8 times faster than the memory core (each memory access results in a burst of 8 datawords on the IOs). Thus a 200 MHz memory core is combined with IOs that each operate eight times faster (1,600 megabits/second). If the memory has 16 IOs, the total read bandwidth would be 200 MHz × 8 datawords/access × 16 IOs = 25.6 gigabits/second (Gbit/s), or 3.2 gigabytes/second (GB/s). Modules with multiple DRAM chips can provide correspondingly higher bandwidth.
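The bandwidth figures above can be reproduced in a few lines. This is a sketch of the arithmetic only; it gives peak read bandwidth and assumes every core access yields a full burst. The function names are hypothetical.

```python
def per_pin_rate_bps(core_hz, prefetch_depth):
    """Data rate of one IO pin in bits per second."""
    return core_hz * prefetch_depth

def chip_bandwidth_bps(core_hz, prefetch_depth, io_count):
    """Peak read bandwidth of one chip in bits per second."""
    return per_pin_rate_bps(core_hz, prefetch_depth) * io_count

core_hz = 200_000_000  # 200 MHz memory core
bits_per_second = chip_bandwidth_bps(core_hz, prefetch_depth=8, io_count=16)
bytes_per_second = bits_per_second // 8

print(per_pin_rate_bps(core_hz, 8))  # 1600000000  (1,600 Mbit/s per pin)
print(bits_per_second)               # 25600000000 (25.6 Gbit/s)
print(bytes_per_second)              # 3200000000  (3.2 GB/s)
```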
Each generation of SDRAM has a different prefetch buffer size:
- DDR SDRAM's prefetch buffer size is 2n (two datawords per memory access)
- DDR2 SDRAM's prefetch buffer size is 4n (four datawords per memory access)
- DDR3 SDRAM's prefetch buffer size is 8n (eight datawords per memory access)
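To see what these depths mean in practice, the sketch below computes the per-pin data rate each generation would reach from the same memory core frequency. The 200 MHz core is an illustrative assumption carried over from the example above; real parts in each generation span a range of core speeds.

```python
# Prefetch depth (datawords per memory access) for each generation listed above.
PREFETCH_DEPTH = {"DDR": 2, "DDR2": 4, "DDR3": 8}

def per_pin_rate_mbps(generation, core_mhz):
    """Per-pin data rate in Mbit/s for a given core frequency in MHz."""
    return PREFETCH_DEPTH[generation] * core_mhz

for generation in PREFETCH_DEPTH:
    print(generation, per_pin_rate_mbps(generation, 200), "Mbit/s per pin")
```

With a 200 MHz core, the three generations reach 400, 800 and 1,600 Mbit/s per pin respectively.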
The speed of memory has not historically increased in line with CPU improvements. To increase bandwidth without speeding up the memory core, the prefetch buffer fetches several datawords from the memory array in parallel on each access. This is similar to striping across a RAID array in the storage world. It is also similar to the concept of dual-channel memory, except that the extra parallelism is internal to each chip. Sequential access bandwidth is markedly improved using prefetch buffers, but random access is mostly unchanged.