Disk buffer

From Wikipedia, the free encyclopedia
Not to be confused with page cache.
The disk buffer sits on the controller board of the hard drive.
500 GB hard disk with a 16 MB buffer

In computer storage, a disk buffer (often ambiguously called a disk cache or cache buffer) is the embedded memory in a hard disk drive that acts as a buffer between the rest of the computer and the physical hard disk platter used for storage. Modern hard disks come with 8 to 128 MiB of such memory, and solid-state drives come with up to 512 MB or even 1 GB of cache memory.

Since the late 1980s, nearly all disks sold have embedded microcontrollers and either an ATA, Serial ATA, SCSI, or Fibre Channel interface. The drive circuitry usually has a small amount of memory, used to store the bits going to and coming from the disk platter.

The disk buffer is physically distinct from, and used differently than, the page cache that the operating system typically keeps in the computer's main memory. The disk buffer is controlled by the microcontroller in the hard disk drive, while the page cache is controlled by the computer to which the disk is attached. The disk buffer is usually quite small, typically 8 to 128 MiB, whereas the page cache generally occupies all unused physical memory. While data in the page cache is reused multiple times, the data in the disk buffer is rarely reused.[citation needed] In this sense, the terms disk cache and cache buffer are misnomers; the embedded controller's memory is more appropriately called the disk buffer.

Note that disk array controllers, as opposed to disk controllers, usually have normal cache memory of around 0.5–8 GiB.

Uses

Read-ahead/read-behind

When executing a read from the disk, the disk arm moves the read/write head to (or near) the correct track, and after some settling time the read head begins to pick up bits. Usually, the first sectors to be read are not the ones that have been requested by the operating system. The disk's embedded computer typically saves these unrequested sectors in the disk buffer, in case the operating system requests them later.
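
The following Python sketch is a toy model, not drive firmware. It assumes (hypothetically) that the head lands at some `landing_lba` on the correct track and reads every sector from there up to the requested one, keeping all of them in a small buffer with oldest-first eviction. A later request for one of the "accidentally" read sectors is then served without touching the platter.

```python
from collections import OrderedDict


class ReadBuffer:
    """Toy model of a drive buffer that keeps sectors the head passes over.

    Hypothetical simplification: the head lands at `landing_lba` on the
    right track and reads every sector from there up to the requested one.
    """

    def __init__(self, platter, capacity):
        self.platter = platter        # list: index = LBA, value = sector data
        self.capacity = capacity      # maximum number of sectors buffered
        self.buffer = OrderedDict()   # LBA -> data, oldest entry first
        self.media_reads = 0          # number of simulated platter accesses

    def _keep(self, lba, data):
        self.buffer[lba] = data
        self.buffer.move_to_end(lba)
        while len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)  # evict the oldest sector

    def read(self, lba, landing_lba):
        if lba in self.buffer:
            return self.buffer[lba]   # hit: no mechanical access needed
        self.media_reads += 1
        # Sectors between the landing point and the target were not requested,
        # but they pass under the head anyway, so they are saved too.
        for passed in range(landing_lba, lba + 1):
            self._keep(passed, self.platter[passed])
        return self.buffer[lba]


platter = [f"sector-{i}" for i in range(100)]
drive = ReadBuffer(platter, capacity=16)
drive.read(10, landing_lba=4)   # head lands on LBA 4 and reads 4..10
drive.read(7, landing_lba=0)    # LBA 7 was already buffered by the first read
print(drive.media_reads)        # 1
```

Real drives decide what to keep using their own (undocumented) policies; the oldest-first eviction here is only an illustrative choice.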

Speed matching

The speed of the disk's I/O interface to the computer almost never matches the speed at which the bits are transferred to and from the hard disk platter. The disk buffer is used so that both the I/O interface and the disk read/write head can operate at full speed.
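
A back-of-the-envelope Python sketch of why the buffer helps: during a write burst the buffer fills at the difference between the interface rate and the media rate, so its size bounds how long the host can transfer at full interface speed. The 600 MB/s interface rate (roughly the payload rate of 6 Gbit/s SATA) and the 150 MB/s sustained media rate are assumed example figures, not values from the article.

```python
def burst_capacity_seconds(buffer_bytes, interface_rate, media_rate):
    """Seconds a write burst at `interface_rate` can be absorbed by the buffer.

    Rates are in bytes per second. If the media can keep up with the
    interface, the buffer never fills and the result is infinite.
    """
    if interface_rate <= media_rate:
        return float("inf")
    # The buffer gains (interface_rate - media_rate) bytes every second.
    return buffer_bytes / (interface_rate - media_rate)


t = burst_capacity_seconds(16 * 2**20, 600e6, 150e6)  # assumed 16 MiB buffer
print(f"{t * 1000:.1f} ms")  # 37.3 ms
```

Once the buffer is full, the host is effectively throttled to the media rate; the buffer smooths bursts rather than raising sustained throughput.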

Write acceleration

The disk's embedded microcontroller may signal the main computer that a disk write is complete immediately after receiving the write data, before the data is actually written to the platter. This early signal allows the main computer to continue working even though the data has not actually been written yet. This can be somewhat dangerous: if power is lost before the data is permanently fixed in the magnetic media, the data is lost from the disk buffer, and the file system on the disk may be left in an inconsistent state.
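
A minimal Python model of the hazard, under the assumption of a drive with no battery backup: writes are acknowledged as soon as they reach the volatile buffer, and `destage` (a hypothetical name) moves them to the platter later. If power fails in between, acknowledged data disappears, here leaving an inode update on disk without its matching directory entry.

```python
class WriteBackDrive:
    """Toy drive that acknowledges writes once they reach its volatile buffer."""

    def __init__(self):
        self.buffer = {}   # LBA -> data acknowledged but not yet on the platter
        self.platter = {}  # LBA -> data that survives power loss

    def write(self, lba, data):
        self.buffer[lba] = data
        return "complete"  # early signal: the data is only in volatile memory

    def destage(self, count):
        """Move up to `count` buffered writes to the platter, oldest first."""
        for lba in list(self.buffer)[:count]:
            self.platter[lba] = self.buffer.pop(lba)

    def power_loss(self):
        self.buffer.clear()  # buffer contents vanish without battery backup


drive = WriteBackDrive()
drive.write(1, "inode update")
drive.write(2, "directory entry")
drive.destage(1)          # only the first write reached the platter
drive.power_loss()
print(drive.platter)      # {1: 'inode update'}
```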

On some disks, this vulnerable period between signaling the write complete and fixing the data can be arbitrarily long, as the write can be deferred indefinitely by newly arriving requests. For this reason, the use of write acceleration can be controversial. Consistency can be maintained, however, by using a battery-backed memory system for caching data, although this is typically only found in high-end RAID controllers.

Alternatively, caching can simply be turned off when data integrity is deemed more important than write performance. Another option is to send data to the disk in a carefully managed order and to issue "cache flush" commands at the right points, which is usually referred to as implementing write barriers.
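
From an application's point of view, the usual way to request this ordering is to call `fsync` between dependent writes. The Python sketch below (the helper names `write_all` and `ordered_commit` are hypothetical) makes a payload durable before writing a commit record. Whether `os.fsync` actually results in a cache-flush command reaching the drive depends on the operating system, the file system and its mount options, and on the drive honoring the command; this is an assumption, not a guarantee.

```python
import os
import tempfile


def write_all(fd, data):
    """os.write may write fewer bytes than requested, so loop until done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def ordered_commit(path, payload, commit_record):
    """Make `payload` durable before `commit_record` is written (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, payload)
        os.fsync(fd)  # ordering point: the payload must be stable first
        write_all(fd, commit_record)
        os.fsync(fd)  # now the commit record itself is stable
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.log")
    ordered_commit(path, b"record-1\n", b"COMMIT\n")
    with open(path, "rb") as f:
        data = f.read()
```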

Command queuing

Newer SATA and most SCSI disks can accept multiple commands while any one command is in operation through "command queuing" (see NCQ and TCQ). These commands are stored by the disk's embedded controller until they are completed. One benefit is that the commands can be re-ordered to be processed more efficiently, so that commands affecting the same area of a disk are grouped together. Should a read reference the data at the destination of a queued write, the to-be-written data will be returned. Command queuing is different from write acceleration in that the main computer's operating system is notified when data is actually written onto the magnetic media. The OS can use this information to keep the file system consistent through rescheduled writes.
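
A simplified Python sketch of both behaviours described above. It assumes a one-directional elevator (C-LOOK) ordering by LBA, which is a swapped-in textbook policy: real NCQ/TCQ firmware also accounts for rotational position and its own heuristics. A read that targets the destination of a queued write is answered from the newest pending write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    op: str                      # "read" or "write"
    lba: int
    data: Optional[str] = None   # payload for writes


class CommandQueue:
    def __init__(self):
        self.pending = []

    def submit(self, cmd):
        self.pending.append(cmd)

    def lookup_pending_write(self, lba):
        """Return the newest queued write data for `lba`, if any."""
        for cmd in reversed(self.pending):
            if cmd.op == "write" and cmd.lba == lba:
                return cmd.data
        return None

    def service_order(self, head_lba):
        """C-LOOK: ascending LBAs from the head, then wrap to the lowest LBA.

        sorted() is stable, so commands to the same LBA keep submission order.
        """
        ahead = sorted((c for c in self.pending if c.lba >= head_lba), key=lambda c: c.lba)
        behind = sorted((c for c in self.pending if c.lba < head_lba), key=lambda c: c.lba)
        return ahead + behind


q = CommandQueue()
q.submit(Command("read", 900))
q.submit(Command("write", 120, "new"))
q.submit(Command("read", 500))
q.submit(Command("read", 130))
order = [c.lba for c in q.service_order(head_lba=400)]
print(order)                      # [500, 900, 120, 130]
print(q.lookup_pending_write(120))  # new
```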

Cache control from the host

Cache flushing

Data accepted into a disk device's write cache will eventually be written to the disk platters, provided that no starvation condition occurs as a result of a firmware flaw and that the disk's power supply is not interrupted before the cached writes are forced to the platters. To control the write cache, the ATA specification includes the FLUSH CACHE (E7h) and FLUSH CACHE EXT (EAh) commands. These commands cause the disk to finish writing the data held in its cache, and the disk returns good status only after the data in the write cache has been written to the disk media.[1] In addition, on at least some disks a cache flush can also be initiated by issuing a soft reset or a STANDBY (IMMEDIATE) command.[1]
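
The Python sketch below models the flush semantics: the command reports good status only after the cache has been drained to the media. The opcodes come from the paragraph above; the rule of preferring FLUSH CACHE EXT when the device reports support for it is an assumption modelled loosely on common host behaviour, and `supports_flush_ext`, `choose_flush_opcode`, and `ToyCachedDrive` are hypothetical names.

```python
FLUSH_CACHE = 0xE7       # ATA FLUSH CACHE opcode
FLUSH_CACHE_EXT = 0xEA   # ATA FLUSH CACHE EXT opcode


def choose_flush_opcode(supports_flush_ext):
    """Prefer the EXT form when the device reports support for it (assumed flag)."""
    return FLUSH_CACHE_EXT if supports_flush_ext else FLUSH_CACHE


class ToyCachedDrive:
    """Hypothetical model: writes land in `cache`; a flush moves them to `media`."""

    def __init__(self):
        self.cache = {}
        self.media = {}

    def write(self, lba, data):
        self.cache[lba] = data

    def execute(self, opcode):
        if opcode not in (FLUSH_CACHE, FLUSH_CACHE_EXT):
            raise ValueError(f"unsupported opcode {opcode:#04x}")
        # Drain the whole cache before reporting completion.
        while self.cache:
            lba, data = self.cache.popitem()
            self.media[lba] = data
        return "good"


drive = ToyCachedDrive()
drive.write(5, "x")
drive.write(6, "y")
status = drive.execute(choose_flush_opcode(supports_flush_ext=True))
print(status, sorted(drive.media))  # good [5, 6]
```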

Linux uses mandatory cache flushing to implement write barriers in some file systems (for example, ext4), together with Force Unit Access (FUA) write commands for journal commit blocks.[2]
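
A Python sketch of that pattern on a toy drive: journal blocks are written through the cache, a flush acts as the barrier, and the commit block is then written with FUA so it is on stable media when the write completes. This is a simplified model of the sequence described above, not kernel code; `ToyDrive` and `commit_transaction` are hypothetical names.

```python
class ToyDrive:
    """Hypothetical drive with a volatile cache and FUA support."""

    def __init__(self):
        self.cache = {}
        self.media = {}

    def write(self, lba, data, fua=False):
        if fua:
            self.media[lba] = data
            # Drop any stale cached copy so a later flush cannot overwrite it.
            self.cache.pop(lba, None)
        else:
            self.cache[lba] = data

    def flush(self):
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()


def commit_transaction(drive, journal_blocks, commit_lba):
    for lba, data in journal_blocks.items():
        drive.write(lba, data)                   # ordinary cached writes
    drive.flush()                                # barrier: journal blocks now durable
    drive.write(commit_lba, "COMMIT", fua=True)  # durable as soon as it completes


drive = ToyDrive()
commit_transaction(drive, {10: "txn data A", 11: "txn data B"}, commit_lba=12)
drive.power_loss()
print(sorted(drive.media))  # [10, 11, 12]
```

Because the flush precedes the FUA write, the commit block can never be durable while the journal blocks it vouches for are still only in the cache.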

Force Unit Access (FUA)

Force Unit Access (FUA) is an I/O write command option that forces written data all the way to stable storage. Unlike the corresponding commands without FUA, FUA write commands (WRITE DMA FUA EXT – 3Dh, WRITE DMA QUEUED FUA EXT – 3Eh, WRITE MULTIPLE FUA EXT – CEh) write data directly to the media, regardless of whether write caching in the device is enabled. A FUA write command does not complete until the data is written to the media, so data written by a completed FUA write command is on permanent media even if the device is powered off before a FLUSH CACHE command is issued.[3][4]

FUA appeared in the SCSI command set and was later adopted by SATA with NCQ. FUA is more fine-grained because it forces only a single write operation to stable media, so it has a smaller overall performance impact than commands that flush the entire disk cache, such as the ATA FLUSH CACHE family of commands.[4][5]
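
A small Python sketch of the cost difference, assuming (hypothetically) that several unrelated dirty blocks are already sitting in the drive's cache when one urgent write must become durable. With FUA only the urgent block is forced out; with an ordinary cached write followed by a full cache flush, every dirty block must reach the media first.

```python
def lbas_forced_before_ack(dirty_lbas, urgent_lba, use_fua):
    """Which LBAs must reach stable media before the urgent write is acknowledged?

    Hypothetical model: `dirty_lbas` are unrelated writes already in the
    drive's cache. FUA forces out only the urgent write; a cached write
    followed by a full cache flush forces out everything.
    """
    if use_fua:
        return {urgent_lba}
    return set(dirty_lbas) | {urgent_lba}


dirty = {100, 205, 310, 4000}
print(len(lbas_forced_before_ack(dirty, 7, use_fua=True)))   # 1
print(len(lbas_forced_before_ack(dirty, 7, use_fua=False)))  # 5
```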

Windows Vista and later support FUA as part of Transactional NTFS, but only for SCSI or Fibre Channel disks, where FUA support is common.[6] Whether a SATA drive that supports FUA write commands will actually honor them and write the data to the platters as instructed is not known;[citation needed] thus, Windows 8 and Windows Server 2012 instead send commands to flush the disk write cache after certain write operations.[7]

Although the Linux kernel gained support for NCQ around 2007, it did not enable SATA/NCQ FUA until 2012, citing lack of support in early SATA drives.[8][9] The Linux kernel supports FUA at the block layer.[10]
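
On Linux, the block layer's view of a disk can be inspected through sysfs. The Python sketch below assumes a reasonably recent kernel that exposes `/sys/block/<dev>/queue/write_cache` (containing "write back" or "write through") and `/sys/block/<dev>/queue/fua` (containing 0 or 1); older kernels may lack these files, and the device name `sda` is only an example. These values describe what the kernel believes, not whether the drive actually honors flushes or FUA.

```python
from pathlib import Path


def parse_write_cache(text):
    """Map the sysfs write_cache value to True (write back) or False (write through)."""
    value = text.strip()
    if value == "write back":
        return True
    if value == "write through":
        return False
    raise ValueError(f"unexpected write_cache value: {value!r}")


def parse_fua(text):
    """The fua attribute is assumed to read "1" when FUA writes are supported."""
    return text.strip() == "1"


def describe(dev, sysfs_root="/sys/block"):
    queue = Path(sysfs_root) / dev / "queue"
    write_back = parse_write_cache((queue / "write_cache").read_text())
    fua_file = queue / "fua"
    fua = parse_fua(fua_file.read_text()) if fua_file.exists() else None  # None: unknown
    return {"write_back_cache": write_back, "fua": fua}


if Path("/sys/block/sda/queue/write_cache").exists():  # example device name
    print(describe("sda"))
```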

References

  1. Hitachi (2006). Deskstar 7K80 Disk Drive Specification, 4th Edition (Revision 1.6), 12 September 2006, Final. Hitachi Global Storage Technologies. pp. 130–131.
  2. Christoph Hellwig; Theodore Ts'o. "Does ext4 send FUA to flush disk cache". spinics.net. Retrieved 2014-03-18.
  3. "Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS)". T13/1699-D, revision 6-a, 2008-09-06. American National Standards Institute, Inc. Retrieved 2014-03-18.
  4. Gregory Smith (2010). PostgreSQL 9.0: High Performance. Packt Publishing Ltd. p. 78. ISBN 978-1-84951-031-8.
  5. Bruce Jacob; Spencer Ng; David Wang (2010). Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann. p. 734. ISBN 978-0-08-055384-9.
  6. "Deploying Transactional NTFS (Windows)". msdn.microsoft.com. 2013-12-05. Retrieved 2014-01-24.
  7. "Forced Unit Access | Working Hard In IT". workinghardinit.wordpress.com. 2012-10-12. Retrieved 2014-01-24.
  8. "Enabling FUA for SATA drives (was Re: [RFC][PATCH] libata: enable SATA disk fua detection on default) (Linux SCSI)". spinics.net. 2012-08-17. Retrieved 2014-01-24.
  9. Robert Hancock. "Linux-Kernel Archive: [PATCH RFC] libata: FUA updates". lkml.indiana.edu. Retrieved 2014-01-24.
  10. "Documentation/block/writeback_cache_control.txt". Linux kernel documentation. kernel.org. 2013-08-12. Retrieved 2014-01-24.