|Developer(s)||Kent Overstreet and others|
|Type||Linux kernel features|
bcache (abbreviated from block cache) is a cache in the Linux kernel's block layer, which is used for accessing secondary storage devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices, such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides performance improvements.
Designed around the nature and performance characteristics of SSDs, bcache also minimizes write amplification by avoiding random writes and turning them into sequential writes instead. This merging of I/O operations is performed for both the cache and the primary storage, which helps extend the lifetime of flash-based devices used as caches and improves the performance of write-sensitive primary storage, such as RAID 5 sets.
bcache is licensed under the GNU General Public License (GPL), with Kent Overstreet as its primary developer.
Using bcache makes it possible to place SSDs as an additional level within the data storage access path, improving overall performance by using fast flash-based SSDs as caches for slower mechanical hard disk drives (HDDs) with rotational magnetic media. That way, the gap between SSDs and HDDs can be bridged: the speed of costly SSDs is combined with the cheap storage capacity of traditional HDDs.
Caching is implemented by using SSDs to store data associated with random reads and random writes, taking advantage of near-zero seek times as the most prominent feature of SSDs. Sequential I/O is not cached, which avoids rapid SSD cache invalidation by operations that HDDs already handle well; going around the cache for big sequential writes is known as the write-around policy. Not caching sequential I/O also helps extend the lifetime of SSDs used as caches. Write amplification is avoided by not performing random writes to SSDs; instead, all random writes to SSD caches are combined into block-level writes, so that only complete erase blocks are rewritten on the SSDs.
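The sequential-bypass idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not bcache's actual implementation: the class name `SequentialDetector`, the `cutoff_bytes` parameter, and the single-stream tracking are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SequentialDetector:
    """Toy single-stream detector (hypothetical, not bcache's code)."""

    cutoff_bytes: int      # assumed threshold after which a run counts as sequential
    next_offset: int = -1  # offset at which a contiguous request would start
    run_bytes: int = 0     # bytes seen so far in the current contiguous run

    def should_bypass(self, offset: int, length: int) -> bool:
        """Return True if this request should go around the cache."""
        if offset == self.next_offset:
            self.run_bytes += length  # request continues the current run
        else:
            self.run_bytes = length   # a seek starts a new run
        self.next_offset = offset + length
        return self.run_bytes >= self.cutoff_bytes


if __name__ == "__main__":
    detector = SequentialDetector(cutoff_bytes=4 * 4096)
    for offset in (0, 4096, 8192, 12288, 1_000_000):
        print(offset, detector.should_bypass(offset, 4096))
```

In this model, short or scattered requests keep resetting the run and are therefore cached, while a long contiguous stream crosses the threshold and is sent straight to the slower device.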
Both write-back and write-through (which is the default) policies are supported for caching write operations. With the write-back policy, written data is stored in the SSD caches first and propagated to the HDDs later in batches, using seek-friendly operations; this makes bcache also act as an I/O scheduler. With the write-through policy, no write operation is marked as finished until the data has reached both the SSDs and the HDDs, so performance improvements are limited to caching the written data for subsequent reads.
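The difference between the write policies can be sketched with an in-memory toy model. The class `ToyCache` and its dictionaries are hypothetical stand-ins for the fast and slow devices; the sketch only shows when data reaches each device, and it assumes nothing about bcache's internal data structures.

```python
class ToyCache:
    """Toy model of write policies (illustrative only, not bcache's code)."""

    MODES = ("writethrough", "writeback", "writearound")

    def __init__(self, mode="writethrough"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.cache = {}     # block number -> data on the fast device
        self.backing = {}   # block number -> data on the slow device
        self.dirty = set()  # cached blocks not yet written to the slow device

    def write(self, block, data):
        if self.mode == "writearound":
            self.backing[block] = data   # go around the cache
            self.cache.pop(block, None)  # drop any stale cached copy
        elif self.mode == "writethrough":
            self.cache[block] = data     # both devices are updated
            self.backing[block] = data   # before the write completes
        else:  # writeback
            self.cache[block] = data     # completes once the cache has it
            self.dirty.add(block)

    def flush(self):
        """Write dirty blocks to the slow device in sorted block order."""
        order = sorted(self.dirty)
        for block in order:
            self.backing[block] = self.cache[block]
        self.dirty.clear()
        return order


if __name__ == "__main__":
    wb = ToyCache("writeback")
    wb.write(7, b"a")
    wb.write(2, b"b")
    print(wb.backing)  # slow device untouched until flush
    print(wb.flush())  # dirty blocks leave in sorted order
```

Flushing in sorted order is what makes the deferred writes seek-friendly in this model: the slow device sees ascending block numbers regardless of the order in which the writes arrived.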
The write-back policy with batched writes to HDDs provides additional benefits to write-sensitive redundant array of independent disks (RAID) layouts such as RAID 5 and RAID 6, which perform actual write operations as atomic read-modify-write sequences. That way, the performance penalties of small random writes are reduced or avoided for such RAID layouts, by grouping the writes together and performing them as batched sequential writes.
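The read-modify-write cost can be made concrete with single-parity (RAID 5 style) XOR arithmetic. The sketch below is a generic illustration of parity maintenance, not code from bcache or from the Linux RAID drivers; the helper names are hypothetical. A small write has to read the old data and old parity before it can compute the new parity, whereas a full-stripe write can compute parity from the new data alone.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after a small write; needs two reads (old data, old parity)."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # three data blocks
    parity = xor_blocks(*stripe)

    new_block = b"\xaa\xbb"  # overwrite the second data block
    small_write = rmw_parity(parity, stripe[1], new_block)

    # A full-stripe write derives the same parity without reading anything back.
    full_stripe = xor_blocks(stripe[0], new_block, stripe[2])
    print(small_write == full_stripe)
```

Both paths yield the same parity; the difference is that the small write needs extra reads, which is the penalty that batching writes into full stripes avoids.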
Caching performed by bcache operates at the block device level, which makes it file system–agnostic as long as the file system provides an embedded universally unique identifier (UUID); this requirement is satisfied by virtually all standard Linux file systems, as well as by swap partitions. The logical blocks used internally by bcache as caching extents can be as small as a single HDD sector.
bcache was first announced by Kent Overstreet in July 2010 as a fully working Linux kernel module, though at an early beta stage. Development continued for almost two years, until May 2012, at which point bcache reached a production-ready state.
As of version 3.10 of the Linux kernel, the following features are provided by bcache:
- The same cache device can be used for caching an arbitrary number of the primary storage devices
- Runtime attaching and detaching of primary storage devices from their caches, while mounted and in use (running in passthrough mode when not cached)
- Automated recovery from unclean shutdowns – writes are not completed until the cache is consistent with respect to the primary storage device; internally, bcache makes no distinction between clean and unclean shutdowns
- Transparent handling of I/O errors generated by the cache devices
- Write barriers and associated cache flushes are properly handled
- Write-through (which is the default), write-back and write-around policies
- Sequential I/O is detected and bypassed, with configurable thresholds; bypassing can also be disabled
- Throttling of the I/O to the SSD if it becomes congested, as detected by measured latency of the SSD's I/O operations exceeding a configurable threshold; useful for configurations having one SSD providing caching for many HDDs
- Readahead on a cache miss (disabled by default)
- Highly efficient write-back implementation – dirty data is always written out in sorted order, and background write-back can optionally be throttled smoothly so as to keep a configured percentage of the cache dirty
- High-performance B+ trees are used internally – bcache is capable of around 1,000,000 IOPS on random reads, if the hardware is fast enough
- Various runtime statistics and configuration options are exposed through sysfs
The following improvements were planned for future releases of bcache:
- Awareness of data striping in RAID 5 and RAID 6 layouts – adding awareness of the stripe layout to the write-back policy, so that caching decisions give preference to already "dirty" stripes and background flushes write out complete stripes first
- Handling cache misses with already full B+ tree nodes – as of the bcache version in Linux kernel 3.10, splits of the internally used B+ tree nodes happen on writes, which makes initial cache warm-ups hard to achieve
- Multiple SSDs in a cache set – only dirty data (for the write-back policy) and metadata would be mirrored, without wasting SSD space on clean data and read caches
- Data checksumming
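Since runtime configuration is exposed through sysfs, a script can inspect the active caching policy. The sketch below assumes that the policy attribute lists every choice and marks the active one with square brackets, and that it lives at `/sys/block/bcache0/bcache/cache_mode`; both the path and the device name `bcache0` are assumptions that should be checked against the kernel documentation for the running kernel. The helper names are hypothetical.

```python
from pathlib import Path


def active_choice(text: str) -> str:
    """Return the bracketed entry from a sysfs-style list of choices."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no bracketed choice found")


def read_cache_mode(device: str = "bcache0") -> str:
    """Read the active policy for one device (the path is an assumption)."""
    path = Path(f"/sys/block/{device}/bcache/cache_mode")
    return active_choice(path.read_text())


if __name__ == "__main__":
    # Parsing a sample string, so the sketch runs without a bcache device.
    sample = "[writethrough] writeback writearound none\n"
    print(active_choice(sample))
```

Keeping the parsing separate from the file access lets the logic be exercised on machines that have no bcache device at all.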
See also:
- dm-cache – a Linux kernel's device mapper target that allows creation of hybrid volumes
- Flashcache – a disk cache component for the Linux kernel, initially developed by Facebook
- Hybrid drive – a storage device that combines flash-based and spinning magnetic media storage technologies
- ReadyBoost – a disk caching software component of Windows Vista and later Microsoft operating systems
- Smart Response Technology (SRT) – a proprietary disk storage caching mechanism, developed by Intel for its chipsets
- Design overview and details (bcache documentation), May 9, 2013
- LSFMM: Caching – dm-cache and bcache, LWN.net, May 1, 2013, by Jake Edge
- Linux Block Caching Choices in Stable Upstream Kernel (PDF), Dell, December 2013
- Testing bcache series: Throughput, IOPS, Metadata, and Large Files and a Wrap-Up, Linux Magazine, August–September 2010, by Jeffrey B. Layton
- Performance Comparison among EnhanceIO, bcache and dm-cache, LKML, June 11, 2013
- EnhanceIO, Bcache & DM-Cache Benchmarked, Phoronix, June 11, 2013, by Michael Larabel