RAID: Difference between revisions

From Wikipedia, the free encyclopedia
It really sucks when the OWNER of a page proudly (and betcha persistently) puts back mistakes, typo, and cruft. See Template:Cite_web#Examples for usage of website=.
Article-level consistency is more important, please see Talk:RAID § Reference websites and other stuff; Improved the wording a bit and fixed a small grammar mistake
Line 13: Line 13:
| date = October 2010 | accessdate = 2015-01-18
| date = October 2010 | accessdate = 2015-01-18
| author = Randy H. Katz | publisher = IEEE Computer Society
| author = Randy H. Katz | publisher = IEEE Computer Society
| website = EECS @ University of Michigan | format = PDF
| website = eecs.umich.edu | format = PDF
| quote = We were not the first to think of the idea of replacing what Patterson described as a slow large expensive disk (SLED) with an array of inexpensive disks. For example, the concept of disk mirroring, pioneered by Tandem, was well known, and some storage products had already been constructed around arrays of small disks.
| quote = We were not the first to think of the idea of replacing what Patterson described as a slow large expensive disk (SLED) with an array of inexpensive disks. For example, the concept of disk mirroring, pioneered by Tandem, was well known, and some storage products had already been constructed around arrays of small disks.
}}</ref> including the following:
}}</ref> including the following:
Line 92: Line 92:
Some advanced [[file system]]s are designed to organize data across multiple storage devices directly (without needing the help of a third-party logical volume manager):
Some advanced [[file system]]s are designed to organize data across multiple storage devices directly (without needing the help of a third-party logical volume manager):


* [[ZFS]] supports equivalents of RAID&nbsp;0, RAID&nbsp;1, RAID&nbsp;5 (RAID-Z), RAID&nbsp;6 (RAID-Z2) and a triple-parity version RAID-Z3. As it always stripes over top-level vdevs, it supports equivalents of the 1+0, 5+0, and 6+0 nested RAID levels (as well as striped triple-parity sets) but not other nested combinations. ZFS is the native file system on [[Solaris (operating system)|Solaris]] and also available on FreeBSD and Linux.<ref>{{cite web|url=http://docs.oracle.com/cd/E23823_01/html/819-5461/gaypw.html |title=Creating and Destroying ZFS Storage Pools - Oracle Solaris ZFS Administration Guide |publisher=[[Oracle Corporation]] |date=2012-04-01 |accessdate=2014-07-27}}</ref><ref>{{cite web|url=http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/filesystems-zfs.html |title=20.2. The Z File System (ZFS) |website=The FreeBSD Project |accessdate=2014-07-27}}</ref><ref>{{cite web|url=http://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gcviu/index.html |title=Double Parity RAID-Z (raidz2) (Solaris ZFS Administration Guide) |publisher=[[Oracle Corporation]] |accessdate=2014-07-27}}</ref><ref>{{cite web|url=http://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/givdn/index.html |title=Triple Parity RAIDZ (raidz3) (Solaris ZFS Administration Guide) |publisher=[[Oracle Corporation]] |accessdate=2014-07-27}}</ref>
* [[ZFS]] supports equivalents of RAID&nbsp;0, RAID&nbsp;1, RAID&nbsp;5 (RAID-Z), RAID&nbsp;6 (RAID-Z2) and a triple-parity version RAID-Z3. As it always stripes over top-level vdevs, it supports equivalents of the 1+0, 5+0, and 6+0 nested RAID levels (as well as striped triple-parity sets) but not other nested combinations. ZFS is the native file system on [[Solaris (operating system)|Solaris]] and also available on FreeBSD and Linux.<ref>{{cite web|url=http://docs.oracle.com/cd/E23823_01/html/819-5461/gaypw.html |title=Creating and Destroying ZFS Storage Pools - Oracle Solaris ZFS Administration Guide |publisher=[[Oracle Corporation]] |date=2012-04-01 |accessdate=2014-07-27}}</ref><ref>{{cite web|url=http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/filesystems-zfs.html |title=20.2. The Z File System (ZFS) |website=freebsd.org |accessdate=2014-07-27}}</ref><ref>{{cite web|url=http://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gcviu/index.html |title=Double Parity RAID-Z (raidz2) (Solaris ZFS Administration Guide) |publisher=[[Oracle Corporation]] |accessdate=2014-07-27}}</ref><ref>{{cite web|url=http://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/givdn/index.html |title=Triple Parity RAIDZ (raidz3) (Solaris ZFS Administration Guide) |publisher=[[Oracle Corporation]] |accessdate=2014-07-27}}</ref>
* [[IBM General Parallel File System|GPFS]], initially developed by IBM for media streaming and scalable analytics, supports declustered RAID protection schemes up to n+3. A particularity is the dynamic rebuilding priority which runs with low impact in the background until a data chunk hits n+0 redundancy, in which case this chunk is quickly rebuilt to at least n+1. On top, GPFS supports metro-distance RAID&nbsp;1.<ref>{{Cite web|title=General Parallel File System (GPFS) Native RAID|url=http://www.usenix.org/events/lisa11/tech/slides/deenadhayalan.pdf|first=Veera|last=Deenadhayalan|publisher=[[IBM]]|website=UseNix.org|year=2011|accessdate=2014-09-28}}</ref>
* [[IBM General Parallel File System|GPFS]], initially developed by IBM for media streaming and scalable analytics, supports declustered RAID protection schemes up to n+3. A particularity is the dynamic rebuilding priority which runs with low impact in the background until a data chunk hits n+0 redundancy, in which case this chunk is quickly rebuilt to at least n+1. On top, GPFS supports metro-distance RAID&nbsp;1.<ref>{{Cite web|title=General Parallel File System (GPFS) Native RAID|url=http://www.usenix.org/events/lisa11/tech/slides/deenadhayalan.pdf|first=Veera|last=Deenadhayalan|publisher=[[IBM]]|website=UseNix.org|year=2011|accessdate=2014-09-28}}</ref>
* [[Btrfs]] supports RAID&nbsp;0, RAID&nbsp;1 and RAID&nbsp;10 (RAID&nbsp;5 and 6 are under development).<ref>{{cite web|title = Btrfs Wiki: Feature List|date = 2012-11-07|accessdate = 2012-11-16|url = https://btrfs.wiki.kernel.org/index.php/Main_Page#Features}}</ref><ref>{{cite web|title = Btrfs Wiki: Changelog|date = 2012-10-01|accessdate = 2012-11-14|url = https://btrfs.wiki.kernel.org/index.php/Changelog}}</ref><!--Although in Wiki format, this is documentation and changelog used by btrfs, a GPL project-->
* [[Btrfs]] supports RAID&nbsp;0, RAID&nbsp;1 and RAID&nbsp;10 (RAID&nbsp;5 and 6 are under development).<ref>{{cite web|title = Btrfs Wiki: Feature List|date = 2012-11-07|accessdate = 2012-11-16|url = https://btrfs.wiki.kernel.org/index.php/Main_Page#Features}}</ref><ref>{{cite web|title = Btrfs Wiki: Changelog|date = 2012-10-01|accessdate = 2012-11-14|url = https://btrfs.wiki.kernel.org/index.php/Changelog}}</ref><!--Although in Wiki format, this is documentation and changelog used by btrfs, a GPL project-->
Line 111: Line 111:
Software-implemented RAID is not always compatible with the system's boot process, and it is generally impractical for desktop versions of Windows. However, hardware RAID controllers are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do not contain a dedicated RAID controller chip, but simply a standard drive controller chip with proprietary firmware and drivers. During early bootup, the RAID is implemented by the firmware and, once the operating system has been more completely loaded, the drivers take over control. Consequently, such controllers may not work when driver support is not available for the host operating system.<ref>{{cite web|title=SATA RAID FAQ|url=https://ata.wiki.kernel.org/index.php/SATA_RAID_FAQ|publisher=Ata.wiki.kernel.org |date=2011-04-08 |accessdate=2012-08-26}}</ref> An example is [[Intel Matrix RAID]], implemented on many consumer-level motherboards.<ref>[https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s1-raid-approaches.html Red Hat Enterprise Linux - Storage Administrator Guide - RAID Types]</ref><ref name="RusselCrawford2011">{{cite book|author1=Charlie Russel|author2=Sharon Crawford|author3=Andrew Edney|title=Working with Windows Small Business Server 2011 Essentials|url=http://books.google.com/books?id=R2gJ9kcX2ywC&pg=PA90|year=2011|publisher=O'Reilly Media, Inc.|isbn=978-0-7356-5670-3|page=90}}</ref>
Software-implemented RAID is not always compatible with the system's boot process, and it is generally impractical for desktop versions of Windows. However, hardware RAID controllers are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do not contain a dedicated RAID controller chip, but simply a standard drive controller chip with proprietary firmware and drivers. During early bootup, the RAID is implemented by the firmware and, once the operating system has been more completely loaded, the drivers take over control. Consequently, such controllers may not work when driver support is not available for the host operating system.<ref>{{cite web|title=SATA RAID FAQ|url=https://ata.wiki.kernel.org/index.php/SATA_RAID_FAQ|publisher=Ata.wiki.kernel.org |date=2011-04-08 |accessdate=2012-08-26}}</ref> An example is [[Intel Matrix RAID]], implemented on many consumer-level motherboards.<ref>[https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s1-raid-approaches.html Red Hat Enterprise Linux - Storage Administrator Guide - RAID Types]</ref><ref name="RusselCrawford2011">{{cite book|author1=Charlie Russel|author2=Sharon Crawford|author3=Andrew Edney|title=Working with Windows Small Business Server 2011 Essentials|url=http://books.google.com/books?id=R2gJ9kcX2ywC&pg=PA90|year=2011|publisher=O'Reilly Media, Inc.|isbn=978-0-7356-5670-3|page=90}}</ref>


Because some minimal hardware support is involved, this implementation approach is also called "hardware-assisted software RAID",<ref>{{cite web|author=Warren Block |url=http://www.freebsd.org/doc/handbook/geom-graid.html |title=19.5. Software RAID Devices |website=The FreeBSD Project |accessdate=2014-07-27}}</ref><ref name="KrutzConley2007">{{cite book|author1=Ronald L. Krutz|author2=James Conley|title=Wiley Pathways Network Security Fundamentals|url=http://books.google.com/books?id=Gdux_6ckDYwC&pg=PA422|year=2007|publisher=John Wiley & Sons|isbn=978-0-470-10192-6|page=422}}</ref><ref name="AdaptecWP" /> "hybrid model" RAID,<ref name="AdaptecWP" /> or even "fake RAID".<ref name="Smith2010">{{cite book|author=Gregory Smith|title=PostgreSQL 9.0: High Performance|url=http://books.google.com/books?id=OWOAu0GcsqoC&pg=PT72|year=2010|publisher=Packt Publishing Ltd|isbn=978-1-84951-031-8|page=31}}</ref> If RAID&nbsp;5 is supported, the hardware may provide a hardware XOR accelerator. An advantage of this model over the pure software RAID is that—if using a redundancy mode—the boot drive is protected from failure (due to the firmware) during the boot process even before the operating systems drivers take over.<ref name="AdaptecWP">[http://www.adaptec.com/nr/rdonlyres/14b2fd84-f7a0-4ac5-a07a-214123ea3dd6/0/4423_sw_hwraid_10.pdf Hardware RAID vs. Software RAID: Which Implementation is Best for my Application? Adaptec Whitepaper]</ref>
Because some minimal hardware support is involved, this implementation approach is also called "hardware-assisted software RAID",<ref>{{cite web|author=Warren Block |url=http://www.freebsd.org/doc/handbook/geom-graid.html |title=19.5. Software RAID Devices |website=freebsd.org |accessdate=2014-07-27}}</ref><ref name="KrutzConley2007">{{cite book|author1=Ronald L. Krutz|author2=James Conley|title=Wiley Pathways Network Security Fundamentals|url=http://books.google.com/books?id=Gdux_6ckDYwC&pg=PA422|year=2007|publisher=John Wiley & Sons|isbn=978-0-470-10192-6|page=422}}</ref><ref name="AdaptecWP" /> "hybrid model" RAID,<ref name="AdaptecWP" /> or even "fake RAID".<ref name="Smith2010">{{cite book|author=Gregory Smith|title=PostgreSQL 9.0: High Performance|url=http://books.google.com/books?id=OWOAu0GcsqoC&pg=PT72|year=2010|publisher=Packt Publishing Ltd|isbn=978-1-84951-031-8|page=31}}</ref> If RAID&nbsp;5 is supported, the hardware may provide a hardware XOR accelerator. An advantage of this model over the pure software RAID is that—if using a redundancy mode—the boot drive is protected from failure (due to the firmware) during the boot process even before the operating systems drivers take over.<ref name="AdaptecWP">[http://www.adaptec.com/nr/rdonlyres/14b2fd84-f7a0-4ac5-a07a-214123ea3dd6/0/4423_sw_hwraid_10.pdf Hardware RAID vs. Software RAID: Which Implementation is Best for my Application? Adaptec Whitepaper]</ref>


== {{Anchor|SCRUBBING}}Integrity ==
== {{Anchor|SCRUBBING}}Integrity ==
Line 125: Line 125:


===Correlated failures===
===Correlated failures===
In practice, the drives are often the same age (with similar wear) and subject to the same environment. Since many drive failures are due to mechanical issues (which are more likely on older drives), this violates the assumptions of independent, identical rate of failure amongst drives; failures are in fact statistically correlated.<ref name="Patterson_1994" /> In practice, the chances of a second failure before the first has been recovered (causing data loss) is higher than for random failures. In a study of about 100,000 drives, the probability of two drives in the same cluster failing within one hour was four times larger than predicted by the [[exponential distribution|exponential statistical distribution]]—which characterizes processes in which events occur continuously and independently at a constant average rate. The probability of two failures in the same 10-hour period was twice as large as predicted by an exponential distribution.<ref name="schroeder">[http://www.usenix.org/events/fast07/tech/schroeder.html Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?] Bianca Schroeder and [[Garth A. Gibson]]</ref>
In practice, the drives are often the same age (with similar wear) and subject to the same environment. Since many drive failures are due to mechanical issues (which are more likely on older drives), this violates the assumptions of independent, identical rate of failure amongst drives; failures are in fact statistically correlated.<ref name="Patterson_1994" /> In practice, the chances for a second failure before the first has been recovered (causing data loss) are higher than the chances for random failures. In a study of about 100,000 drives, the probability of two drives in the same cluster failing within one hour was four times larger than predicted by the [[exponential distribution|exponential statistical distribution]]—which characterizes processes in which events occur continuously and independently at a constant average rate. The probability of two failures in the same 10-hour period was twice as large as predicted by an exponential distribution.<ref name="schroeder">[http://www.usenix.org/events/fast07/tech/schroeder.html Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?] Bianca Schroeder and [[Garth A. Gibson]]</ref>


=== {{Anchor|URE|UBE|LSE}}Unrecoverable read errors during rebuild ===
=== {{Anchor|URE|UBE|LSE}}Unrecoverable read errors during rebuild ===

Revision as of 22:26, 20 April 2015

RAID (originally redundant array of inexpensive disks; now commonly redundant array of independent disks) is a data storage virtualization technology that combines multiple disk drive components into a logical unit for the purposes of data redundancy or performance improvement.[1]

Data is distributed across the drives in one of several ways, referred to as RAID levels, depending on the specific level of redundancy and performance required. The different schemes or architectures are named by the word RAID followed by a number (e.g. RAID 0, RAID 1). Each scheme provides a different balance between the key goals: reliability, availability, performance, and capacity. RAID levels greater than RAID 0 provide protection against unrecoverable (sector) read errors, as well as whole disk failure.

History

The term "RAID" was invented by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. In their June 1988 paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", presented at the SIGMOD conference, they argued that the top performing mainframe disk drives of the time could be beaten on performance by an array of the inexpensive drives that had been developed for the growing personal computer market. Although failures would rise in proportion to the number of drives, by configuring for redundancy, the reliability of an array could far exceed that of any large single drive.[2]

Although not yet using that terminology, the technologies of the five levels of RAID named in the paper were used in various products prior to the paper's publication,[3] including the following:

  • Around 1983, DEC began shipping subsystem mirrored RA8X disk drives (now known as RAID 1) as part of its HSC50 subsystem.[4]
  • Around 1988, the Thinking Machines' DataVault used error correction codes (now known as RAID 2) in an array of disk drives.[5] A similar approach was used in the early 1960s on the IBM 353.[6][7]
  • In 1977, Norman Ken Ouchi at IBM filed a patent disclosing what was subsequently named RAID 4.[8]
  • In 1986, Clark et al. at IBM filed a patent disclosing what was subsequently named RAID 5.[9]

Industry RAID manufacturers later tended to interpret the acronym as standing for "redundant array of independent disks".[10][11][12][13]

Overview

Many RAID levels employ an error protection scheme called "parity", a widely used method in information technology to provide fault tolerance in a given set of data. Most use simple XOR, but RAID 6 uses two separate parities based respectively on addition and multiplication in a particular Galois field or Reed–Solomon error correction.[14]
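As an illustration of simple XOR parity, the following minimal Python sketch computes a parity block over a stripe of equally sized data blocks and uses it to rebuild one missing block. The function names `xor_parity` and `rebuild_missing` are hypothetical and chosen for this example only; real implementations work on fixed-size disk blocks and are considerably more involved.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover a single missing data block from the survivors and the parity.

    XOR is its own inverse, so XOR-ing the parity with every surviving
    block leaves exactly the block that was lost.
    """
    return xor_parity(list(surviving_blocks) + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]      # three data blocks of one stripe
parity = xor_parity(data)                # stored on a fourth drive
recovered = rebuild_missing([data[0], data[2]], parity)
assert recovered == data[1]              # the lost block is reconstructed
```

Because a single XOR parity block yields only one equation per byte position, this scheme can recover at most one missing block per stripe; tolerating two failures requires a second, independent parity as in RAID 6.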

RAID can also provide data security with solid-state drives (SSDs) without the expense of an all-SSD system. For example, a fast SSD can be mirrored with a mechanical drive. For this configuration to provide a significant speed advantage an appropriate controller is needed that uses the fast SSD for all read operations. Adaptec calls this "hybrid RAID".[15]

Standard levels

A number of standard schemes have evolved. These are called levels. Originally, there were five RAID levels, but many variations have evolved—notably several nested levels and many non-standard levels (mostly proprietary). RAID levels and their associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard:[16][17]

RAID 0
RAID 0 consists of striping, without mirroring or parity. The capacity of a RAID 0 volume is the sum of the capacities of the disks in the set, the same as with a spanned volume. There is no added redundancy for handling disk failures, just as with a spanned volume. Thus, failure of one disk causes the loss of the entire RAID 0 volume, with reduced possibilities of data recovery when compared to a broken spanned volume. Striping distributes the contents of files roughly equally among all disks in the set, which makes concurrent read or write operations on the multiple disks almost inevitable and results in performance improvements. The concurrent operations make the throughput of most read and write operations equal to the throughput of one disk multiplied by the number of disks. Increased throughput is the main benefit of RAID 0 over a spanned volume.[11]
RAID 1
RAID 1 consists of data mirroring, without parity or striping. Data is written identically to two (or more) drives, thereby producing a "mirrored set" of drives. Thus, any read request can be serviced by any drive in the set. If a request is broadcast to every drive in the set, it can be serviced by the drive that accesses the data first (depending on its seek time and rotational latency), improving performance. Sustained read throughput, if the controller or software is optimized for it, approaches the sum of throughputs of every drive in the set, just as for RAID 0. Actual read throughput of most RAID 1 implementations is slower than the fastest drive. Write throughput is always slower because every drive must be updated, and the slowest drive limits the write performance. The array continues to operate as long as at least one drive is functioning.[11]
RAID 2
RAID 2 consists of bit-level striping with dedicated Hamming-code parity. All disk spindle rotation is synchronized and data is striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive.[11] This level is of historical significance only; although it was used on some early machines (for example, the Thinking Machines CM-2),[18] as of 2014 it is not used by any of the commercially available systems.[19]
RAID 3
RAID 3 consists of byte-level striping with dedicated parity. All disk spindle rotation is synchronized and data is striped such that each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive.[11] Although implementations exist,[20] RAID 3 is not commonly used in practice.
RAID 4
RAID 4 consists of block-level striping with dedicated parity. This level was previously used by NetApp, but has now been largely replaced by a proprietary implementation of RAID 4 with two parity disks, called RAID-DP.[21]
RAID 5
RAID 5 consists of block-level striping with distributed parity. Unlike in RAID 4, parity information is distributed among the drives. It requires that all drives but one be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost. RAID 5 requires at least three disks.[11] RAID 5 is seriously affected by the general trends regarding array rebuild time and the chance of drive failure during rebuild.[22] Rebuilding an array requires reading all data from all disks, opening a chance for a second drive failure and the loss of the entire array. In August 2012, Dell posted an advisory against the use of RAID 5 in any configuration and RAID 50 with "Class 2 7200 RPM drives of 1 TB and higher capacity" for business-critical data.[23]
RAID 6
RAID 6 consists of block-level striping with double distributed parity. Double parity provides fault tolerance up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems, as large-capacity drives take longer to restore. RAID 6 requires a minimum of four disks. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced.[11] With a RAID 6 array, using drives from multiple sources and manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The larger the drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead of RAID 5.[24] RAID 10 also minimizes these problems.[25]
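
The trade-off between capacity and redundancy in the levels above can be sketched as a small Python function. This is an illustration under stated assumptions, not a definitive model: it assumes all drives are the same size, takes two drives as the minimum for RAID 0 and RAID 1, and uses the common rule that single and double parity consume the capacity of one and two drives respectively. The name `raid_summary` is hypothetical.

```python
def raid_summary(level, drives, drive_size_tb):
    """Return (usable_capacity_tb, tolerated_drive_failures).

    Assumes `drives` identical drives of `drive_size_tb` each.
    """
    # Minimum drive counts; the values for RAID 0 and RAID 1 are assumptions.
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")

    if level == 0:
        # Striping only: full capacity, no redundancy.
        return drives * drive_size_tb, 0
    if level == 1:
        # Every drive holds the same data; one surviving drive is enough.
        return drive_size_tb, drives - 1
    # RAID 5 spends one drive's worth of space on parity, RAID 6 two.
    parity_drives = 1 if level == 5 else 2
    return (drives - parity_drives) * drive_size_tb, parity_drives


for level in (0, 1, 5, 6):
    usable, tolerated = raid_summary(level, drives=4, drive_size_tb=2)
    print(f"RAID {level}: {usable} TB usable, survives {tolerated} failed drive(s)")
```

With four 2 TB drives, the sketch shows usable capacity falling from RAID 0 to RAID 6 as the number of tolerated failures rises.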

Nested (hybrid) RAID

In what was originally termed hybrid RAID,[26] many storage controllers allow RAID levels to be nested. The elements of a RAID may be either individual drives or arrays themselves. Arrays are rarely nested more than one level deep.[27]

The final array is known as the top array. When the top array is RAID 0 (such as in RAID 1+0 and RAID 5+0), most vendors omit the "+" (yielding RAID 10 and RAID 50, respectively).

  • RAID 0+1: creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the data on the RAID system is lost.
  • RAID 1+0: creates a striped set from a series of mirrored drives. The array can sustain multiple drive losses so long as no mirror loses all its drives.[28]
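
The difference in fault tolerance between the two nestings can be checked by enumeration. The Python sketch below is a simplified model, assuming four drives arranged as two groups of two and treating survival purely as a function of which drives have failed; the function names are hypothetical.

```python
from itertools import combinations


def survives_raid01(groups, failed):
    """RAID 0+1: each group is a striped set and the groups mirror each other.

    The array survives while at least one striped set has no failed drive.
    """
    return any(not (set(group) & failed) for group in groups)


def survives_raid10(groups, failed):
    """RAID 1+0: each group is a mirror and the mirrors are striped together.

    The array survives while every mirror keeps at least one working drive.
    """
    return all(set(group) - failed for group in groups)


def count_survivable(survives, groups, failures):
    """Count the failure combinations of the given size that the array survives."""
    drives = [drive for group in groups for drive in group]
    return sum(survives(groups, set(combo)) for combo in combinations(drives, failures))


groups = [(0, 1), (2, 3)]
print(count_survivable(survives_raid01, groups, 2))  # out of 6 two-drive failures
print(count_survivable(survives_raid10, groups, 2))  # out of 6 two-drive failures
```

In this four-drive model, RAID 0+1 survives two of the six possible two-drive failures, while RAID 1+0 survives four of them.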

Non-standard levels

Many configurations other than the basic numbered RAID levels are possible, and many companies, organizations, and groups have created their own non-standard configurations, in many cases designed to meet the specialized needs of a small niche group. Such configurations include the following:

  • Linux MD RAID 10 provides a general RAID driver that in its "near" layout defaults to a standard RAID 1 with two drives, and a standard RAID 1+0 with four drives; however, it can include any number of drives, including odd numbers. With its "far" layout, MD RAID 10 can run both striped and mirrored, even with only two drives in f2 layout; this runs mirroring with striped reads, giving the read performance of RAID 0. Regular RAID 1, as provided by Linux software RAID, does not stripe reads, but can perform reads in parallel.[28][29][30]
  • Hadoop has a RAID system that generates a parity file by xor-ing a stripe of blocks in a single HDFS file.[31]

Implementations

The distribution of data across multiple drives can be managed either by dedicated computer hardware or by software. A software solution may be part of the operating system, part of the firmware and drivers supplied with a standard drive controller (so-called "hardware-assisted software RAID"), or it may reside entirely within the hardware RAID controller.

Software-based

Software RAID implementations are now provided by many operating systems. Software RAID can be implemented as:

  • A layer that abstracts multiple devices, thereby providing a single virtual device (e.g. Linux's md)
  • A more generic logical volume manager (provided with most server-class operating systems, e.g. Veritas or LVM)
  • A component of the file system (e.g. ZFS, GPFS or Btrfs)
  • A layer that sits above any file system and provides parity protection to user data (e.g. RAID-F)[32]

Some advanced file systems are designed to organize data across multiple storage devices directly (without needing the help of a third-party logical volume manager):

  • ZFS supports equivalents of RAID 0, RAID 1, RAID 5 (RAID-Z), RAID 6 (RAID-Z2) and a triple-parity version RAID-Z3. As it always stripes over top-level vdevs, it supports equivalents of the 1+0, 5+0, and 6+0 nested RAID levels (as well as striped triple-parity sets) but not other nested combinations. ZFS is the native file system on Solaris and also available on FreeBSD and Linux.[33][34][35][36]
  • GPFS, initially developed by IBM for media streaming and scalable analytics, supports declustered RAID protection schemes up to n+3. A particularity is the dynamic rebuilding priority which runs with low impact in the background until a data chunk hits n+0 redundancy, in which case this chunk is quickly rebuilt to at least n+1. On top, GPFS supports metro-distance RAID 1.[37]
  • Btrfs supports RAID 0, RAID 1 and RAID 10 (RAID 5 and 6 are under development).[38][39]

Many operating systems include basic RAID implementations:

  • Apple's OS X and OS X Server support RAID 0, RAID 1, and RAID 1+0.[40][41]
  • FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all nestings via GEOM modules and ccd.[42][43][44]
  • Linux's md supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, and all nestings.[45] Certain reshaping/resizing/expanding operations are also supported.[46]
  • Microsoft's server operating systems support RAID 0, RAID 1, and RAID 5. Some of the Microsoft desktop operating systems support RAID. For example, Windows XP Professional supports RAID level 0, in addition to spanning multiple drives, but only if using dynamic disks and volumes. Windows XP can be modified to support RAID 0, 1, and 5.[47] Windows 8 and Windows Server 2012 introduce a RAID-like feature known as Storage Spaces, which also allows users to specify mirroring, parity, or no redundancy on a folder-by-folder basis.[48]
  • NetBSD supports RAID 0, 1, 4, and 5 via its software implementation, named RAIDframe.[49]

If a boot drive fails, the system has to be sophisticated enough to be able to boot off the remaining drive or drives. For instance, consider a computer whose disk is configured as RAID 1 (mirrored drives); if the first drive in the array fails, then a first-stage boot loader might not be sophisticated enough to attempt loading the second-stage boot loader from the second drive as a fallback. The second-stage boot loader for FreeBSD is capable of loading a kernel from such an array.[50]

Firmware- and driver-based

A SATA 3.0 controller, which provides RAID functionality through proprietary firmware and drivers

Software-implemented RAID is not always compatible with the system's boot process, and it is generally impractical for desktop versions of Windows. However, hardware RAID controllers are expensive and proprietary. To fill this gap, cheap "RAID controllers" were introduced that do not contain a dedicated RAID controller chip, but simply a standard drive controller chip with proprietary firmware and drivers. During early bootup, the RAID is implemented by the firmware and, once the operating system has been more completely loaded, the drivers take over control. Consequently, such controllers may not work when driver support is not available for the host operating system.[51] An example is Intel Matrix RAID, implemented on many consumer-level motherboards.[52][53]

Because some minimal hardware support is involved, this implementation approach is also called "hardware-assisted software RAID",[54][55][56] "hybrid model" RAID,[56] or even "fake RAID".[57] If RAID 5 is supported, the hardware may provide a hardware XOR accelerator. An advantage of this model over pure software RAID is that, when a redundancy mode is used, the firmware protects the boot drive from failure during the boot process, even before the operating system's drivers take over.[56]

Integrity

Data scrubbing (referred to in some environments as patrol read) involves periodic reading and checking by the RAID controller of all the blocks in an array, including those not otherwise accessed. This detects bad blocks before use.[58] Data scrubbing checks for bad blocks on each storage device in an array, but also uses the redundancy of the array to recover bad blocks on a single drive and to reassign the recovered data to spare blocks elsewhere on the drive.[59]

Frequently, a RAID controller is configured to "drop" a component drive (that is, to assume a component drive has failed) if the drive has been unresponsive for eight seconds or so; this might cause the array controller to drop a good drive because that drive has not been given enough time to complete its internal error recovery procedure. Consequently, using consumer-marketed drives in RAID can be risky, and so-called "enterprise class" drives limit this error recovery time to reduce risk.[citation needed] Western Digital's desktop drives used to have a specific fix. A utility called WDTLER.exe limited a drive's error recovery time. The utility enabled TLER (time limited error recovery), which limits the error recovery time to seven seconds. Around September 2009, Western Digital disabled this feature in their desktop drives (e.g. the Caviar Black line), making such drives unsuitable for use in RAID configurations.[60] However, Western Digital enterprise class drives are shipped from the factory with TLER enabled. Similar technologies are used by Seagate, Samsung, and Hitachi. Conversely, for non-RAID usage, an enterprise class drive with a short error recovery timeout that cannot be changed is less suitable than a desktop drive.[60] In late 2010, the Smartmontools program began supporting the configuration of ATA Error Recovery Control, allowing the tool to configure many desktop class hard drives for use in RAID setups.[60]

While RAID may protect against physical drive failure, the data is still exposed to operator, software, hardware, and virus destruction. Many studies cite operator fault as the most common source of malfunction,[61] such as a server operator replacing the incorrect drive in a faulty RAID, and disabling the system (even temporarily) in the process.[62]

An array can be overwhelmed by catastrophic failure that exceeds its recovery capacity, and the entire array is at risk of physical damage by fire, natural disaster, and human forces, whereas backups can be stored off site. An array is also vulnerable to controller failure because it is not always possible to migrate it to a new, different controller without data loss.[63]

Weaknesses

Correlated failures

In practice, the drives are often the same age (with similar wear) and subject to the same environment. Since many drive failures are due to mechanical issues (which are more likely on older drives), this violates the assumption of independent, identical failure rates among drives; failures are in fact statistically correlated.[11] As a result, the chance of a second failure before the first has been recovered (causing data loss) is higher than it would be for random failures. In a study of about 100,000 drives, the probability of two drives in the same cluster failing within one hour was four times larger than predicted by the exponential statistical distribution, which characterizes processes in which events occur continuously and independently at a constant average rate. The probability of two failures in the same 10-hour period was twice as large as predicted by an exponential distribution.[64]
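The exponential baseline that the study compares against can be sketched as follows. This is only an illustration of the independent-failure model, not the study's method; the MTTF value and the array size are assumptions chosen for the example.

```python
import math

def p_any_failure_within(hours: float, mttf_hours: float, drives: int = 1) -> float:
    """P(at least one of `drives` fails within `hours`), assuming independent
    exponential lifetimes with the same mean time to failure."""
    # 1 - exp(-drives * hours / mttf), computed without cancellation error
    return -math.expm1(-drives * hours / mttf_hours)

MTTF_HOURS = 1_000_000  # assumed datasheet-style figure, not a measured value
SURVIVORS = 7           # hypothetical 8-drive array after the first failure

p_1h = p_any_failure_within(1, MTTF_HOURS, SURVIVORS)
p_10h = p_any_failure_within(10, MTTF_HOURS, SURVIVORS)
print(f"predicted second failure within 1 h: {p_1h:.2e}, within 10 h: {p_10h:.2e}")
```

Under correlated failures, the observed probabilities exceed such predictions, which is why a reliability estimate built on this model alone tends to be optimistic.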

Unrecoverable read errors during rebuild

Unrecoverable read errors (URE) present as sector read failures, also known as latent sector errors (LSE). The associated media assessment measure, unrecoverable bit error (UBE) rate, is typically specified at one bit in 10^15 for enterprise-class drives (SCSI, FC or SAS), and one bit in 10^14 for desktop-class drives (IDE/ATA/PATA or SATA). Increasing drive capacities and large RAID 5 instances have led to an increasing inability to successfully rebuild a RAID set after a drive failure and occurrence of an unrecoverable sector on the remaining drives.[11][65] When rebuilding, parity-based schemes such as RAID 5 are particularly prone to the effects of UREs as they affect not only the sector where they occur, but also reconstructed blocks using that sector for parity computation. Thus, a URE during a RAID 5 rebuild typically leads to a complete rebuild failure.[66]
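A back-of-the-envelope estimate shows how the specified bit error rates translate into rebuild risk. The sketch below assumes independent bit errors at the datasheet rate, which is a simplification (real errors occur per sector and may be clustered), and the array geometry is hypothetical.

```python
import math

def rebuild_ure_probability(bytes_read: float, bit_error_rate: float) -> float:
    """Probability of at least one unrecoverable bit error while reading
    `bytes_read` bytes, assuming independent errors at `bit_error_rate` per bit."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed stably for very small p
    return -math.expm1(bits * math.log1p(-bit_error_rate))

# Hypothetical RAID 5 set of four 2 TB drives: a rebuild reads the three survivors in full.
surviving_bytes = 3 * 2e12
desktop = rebuild_ure_probability(surviving_bytes, 1e-14)     # one bit in 10^14
enterprise = rebuild_ure_probability(surviving_bytes, 1e-15)  # one bit in 10^15
print(f"desktop-class: {desktop:.2f}, enterprise-class: {enterprise:.3f}")
```

Under these assumptions the desktop-class array has a substantially higher chance of hitting a URE during the rebuild than the enterprise-class one, and the risk grows with the amount of data that must be read.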

Double-protection parity-based schemes, such as RAID 6, attempt to address this issue by providing redundancy that allows double-drive failures; as a downside, such schemes suffer from elevated write penalty. Schemes that duplicate (mirror) data in a drive-to-drive manner, such as RAID 1 and RAID 10, have a lower risk from UREs than those using parity computation or mirroring between striped sets.[25][67] Data scrubbing, as a background process, can be used to detect and recover from UREs, effectively reducing the risk of them happening during RAID rebuilds and causing double-drive failures. The recovery of UREs involves remapping of affected underlying disk sectors, utilizing the drive's sector remapping pool; in case of UREs detected during background scrubbing, data redundancy provided by a fully operational RAID set allows the missing data to be reconstructed and rewritten to a remapped sector.[68][69]
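The scrubbing repair described above can be sketched for a single-parity layout. This is a toy model, not a controller implementation: a stripe is a list of data blocks plus one XOR parity block, and `None` stands in for a sector that returns an unrecoverable read error; all names are illustrative.

```python
from functools import reduce
from typing import List, Optional

Block = Optional[bytes]  # None models a sector with an unrecoverable read error

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub(stripes: List[List[Block]]) -> int:
    """Read every block of every stripe; rebuild a single unreadable block
    from the remaining blocks of its stripe. Returns the number repaired."""
    repaired = 0
    for stripe in stripes:
        bad = [i for i, block in enumerate(stripe) if block is None]
        if len(bad) > 1:
            raise ValueError("single parity cannot recover two blocks in one stripe")
        if bad:
            # XOR of all blocks in a stripe (data and parity) is zero,
            # so the missing block equals the XOR of the others.
            stripe[bad[0]] = xor_blocks([b for b in stripe if b is not None])
            repaired += 1
    return repaired

stripes = [
    [b"ab", b"cd", xor_blocks([b"ab", b"cd"])],
    [b"ef", None, xor_blocks([b"ef", b"gh"])],  # latent error on the second block
]
repaired_count = scrub(stripes)
```

Because the repair runs while the set is fully redundant, the latent error is removed before a later drive failure could turn it into data loss.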

Increasing rebuild time and failure probability

Drive capacity has grown at a much faster rate than transfer speed, and error rates have only fallen a little in comparison. Therefore, larger capacity drives may take hours, if not days, to rebuild. Rebuild speed is further limited if the array remains in operation, at reduced capacity, during the rebuild.[70] Given an array with only one drive of redundancy (RAIDs 3, 4, and 5), a second failure would cause complete failure of the array. Even though individual drives' mean time between failures (MTBF) has increased over time, this increase has not kept pace with the increased storage capacity of the drives. The time to rebuild the array after a single drive failure, as well as the chance of a second failure during a rebuild, have increased over time.[22]
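A lower bound on rebuild time follows directly from capacity divided by sustained transfer rate. The drive size and throughput figures below are hypothetical, chosen only to illustrate the scale.

```python
def min_rebuild_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Lower bound on rebuild time: every byte of the replacement drive
    has to be written once at the given sustained rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600

# Hypothetical 4 TB drive, rebuilt on an otherwise idle array at 100 MB/s
idle_hours = min_rebuild_hours(4e12, 100e6)
# Same drive, with the rebuild throttled to 25 MB/s because the array is serving I/O
busy_hours = min_rebuild_hours(4e12, 25e6)
print(f"idle array: {idle_hours:.1f} h, busy array: {busy_hours:.1f} h")
```

The array has no remaining redundancy (for single-parity levels) for the whole of this window, so a longer rebuild directly lengthens the period in which a second failure is fatal.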

Some commentators have declared that RAID 6 is only a "band aid" in this respect, because it only kicks the problem a little further down the road.[22] However, according to a 2006 NetApp study by Berriman et al., the chance of failure decreases by a factor of about 3,800 (relative to RAID 5) for a proper implementation of RAID 6, even when using commodity drives.[71] Nevertheless, if the currently observed technology trends remain unchanged, in 2019 a RAID 6 array will have the same chance of failure as its RAID 5 counterpart had in 2010.[65][71]

Mirroring schemes such as RAID 10 have a bounded recovery time as they require the copy of a single failed drive, compared with parity schemes such as RAID 6, which require the copy of all blocks of the drives in an array set. Triple parity schemes, or triple mirroring, have been suggested as one approach to improve resilience to an additional drive failure during this large rebuild time.[71]

Atomicity: including parity inconsistency due to system crashes

A system crash or other interruption of a write operation can result in states where the parity is inconsistent with the data due to non-atomicity of the write process, such that the parity cannot be used for recovery in the case of a disk failure (the so-called RAID 5 write hole).[11] The RAID write hole is a known data corruption issue in older and low-end RAIDs, caused by interrupted destaging of writes to disk.[72]
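The effect of a stale parity block can be illustrated with a toy XOR-parity stripe. This is a minimal sketch of the failure mode, not a model of any particular controller; block contents and names are made up for the example.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# A consistent stripe: three data blocks and their parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)
recovered_ok = xor_blocks(d0, d2, parity)       # rebuilds d1 correctly

# Interrupted write: the new d0 reaches disk, the matching parity update does not.
d0_new = b"ZZZZ"
stale_parity = parity

# Later the drive holding d1 fails and is rebuilt from the inconsistent stripe.
recovered_bad = xor_blocks(d0_new, d2, stale_parity)
print(recovered_ok == d1, recovered_bad == d1)
```

The second reconstruction silently yields wrong data for a block that was never being written, which is what makes the write hole hard to notice until a drive fails.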

This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple" during the early days of relational database commercialization.[73]

Write-cache reliability

A concern about write-cache reliability exists, specifically regarding devices equipped with a write-back cache—a caching system that reports the data as written as soon as it is written to cache, as opposed to the non-volatile medium.[74]

See also

References

  1. ^ Arpaci-Dusseau, Remzi H.; Arpaci-Dusseau, Andrea C. (2014), Operating Systems: Three Easy Pieces [Chapter: RAID] (PDF), Arpaci-Dusseau Books
  2. ^ Patterson, David (1988). A Case for Redundant Arrays of Inexpensive Disks (RAID) (PDF). SIGMOD Conferences. Retrieved 2006-12-31.
  3. ^ Randy H. Katz (October 2010). "RAID: A Personal Recollection of How Storage Became a System" (PDF). eecs.umich.edu. IEEE Computer Society. Retrieved 2015-01-18. We were not the first to think of the idea of replacing what Patterson described as a slow large expensive disk (SLED) with an array of inexpensive disks. For example, the concept of disk mirroring, pioneered by Tandem, was well known, and some storage products had already been constructed around arrays of small disks.
  4. ^ "HSC50/70 Hardware Technical Manual" (PDF). DEC. July 1986. pp. 29, 32. Retrieved 2014-01-03.
  5. ^ US patent 4899342, David Potter et al., "Method and Apparatus for Operating Multi-Unit Array of Memories", issued 1990-02-06. See also The Connection Machine (1988)
  6. ^ "IBM 7030 Data Processing System: Reference Manual" (PDF). bitsavers.trailing-edge.com. IBM. 1960. p. 157. Retrieved 2015-01-17. Since a large number of bits are handled in parallel, it is practical to use error checking and correction (ECC) bits, and each 39 bit byte is composed of 32 data bits and seven ECC bits. The ECC bits accompany all data transferred to or from the high-speed disks, and, on reading, are used to correct a single bit error in a byte and detect double and most multiple errors in a byte.
  7. ^ "IBM Stretch (aka IBM 7030 Data Processing System)". brouhaha.com. 2009-06-18. Retrieved 2015-01-17. A typical IBM 7030 Data Processing System might have been comprised of the following units: [...] IBM 353 Disk Storage Unit – similar to IBM 1301 Disk File, but much faster. 2,097,152 (2^21) 72-bit words (64 data bits and 8 ECC bits), 125,000 words per second
  8. ^ US patent 4092732, Norman Ken Ouchi, "System for Recovering Data Stored in Failed Memory Unit", issued 1978-05-30 
  9. ^ US patent 4761785, Brian E. Clark, et al., "Parity Spreading to Enhance Storage Access", issued 1988-08-02 
  10. ^ "Originally referred to as Redundant Array of Inexpensive Disks, the concept of RAID was first developed in the late 1980s by Patterson, Gibson, and Katz of the University of California at Berkeley. (The RAID Advisory Board has since substituted the term Inexpensive with Independent.)" Storage Area Network Fundamentals; Meeta Gupta; Cisco Press; ISBN 978-1-58705-065-7; Appendix A.
  11. ^ a b c d e f g h i j Chen, Peter; Lee, Edward; Gibson, Garth; Katz, Randy; Patterson, David (1994). "RAID: High-Performance, Reliable Secondary Storage". ACM Computing Surveys. 26: 145–185.
  12. ^ Donald, L. (2003). "MCSA/MCSE 2006 JumpStart Computer and Network Basics" (2nd ed.). Glasgow: SYBEX.
  13. ^ Howe, Denis (ed.). Redundant Arrays of Independent Disks from FOLDOC. Imperial College Department of Computing. Retrieved 2011-11-10.
  14. ^ Dawkins, Bill and Jones, Arnold. "Common RAID Disk Data Format Specification" [Storage Networking Industry Association] Colorado Springs, 28 July 2006. Retrieved on 22 February 2011.
  15. ^ "Adaptec Hybrid RAID Solutions" (PDF). Adaptec.com. Adaptec. 2012. Retrieved 2013-09-07.
  16. ^ "Common RAID Disk Drive Format (DDF) standard". SNIA.org. SNIA. Retrieved 2012-08-26.
  17. ^ "SNIA Dictionary". SNIA.org. SNIA. Retrieved 2010-08-24.
  18. ^ Andrew S. Tanenbaum. Structured Computer Organization 6th ed. p. 95.
  19. ^ Hennessy, John; Patterson, David (2006). Computer Architecture: A Quantitative Approach, 4th ed. p. 362. ISBN 978-0123704900.
  20. ^ "FreeBSD Handbook, Chapter 20.5 GEOM: Modular Disk Transformation Framework". Retrieved 2012-12-20.
  21. ^ White, Jay; Lueth, Chris (May 2010). "RAID-DP:NetApp Implementation of Double Parity RAID for Data Protection. NetApp Technical Report TR-3298". Retrieved 2013-03-02.
  22. ^ a b c Newman, Henry (2009-09-17). "RAID's Days May Be Numbered". EnterpriseStorageForum. Retrieved 2010-09-07.
  23. ^ Peltoniemi, Mikko (2012-08-07). "New RAID level recommendations from Dell". Retrieved 2012-12-01.
  24. ^ "Why RAID 6 stops working in 2019". ZDNet. 22 February 2010.
  25. ^ a b Scott Lowe (2009-11-16). "How to protect yourself from RAID-related Unrecoverable Read Errors (UREs). Techrepublic". Retrieved 2012-12-01.
  26. ^ Vijayan, S. (1995). "Dual-Crosshatch Disk Array: A Highly Reliable Hybrid-RAID Architecture". Proceedings of the 1995 International Conference on Parallel Processing: Volume 1. CRC Press. pp. I–146ff. ISBN 0-8493-2615-X.
  27. ^ "What is RAID (Redundant Array of Inexpensive Disks)?". StoneFly.com. Retrieved 2014-11-20.
  28. ^ a b Jeffrey B. Layton: "Intro to Nested-RAID: RAID-01 and RAID-10", Linux Magazine, January 6, 2011
  29. ^ "Performance, Tools & General Bone-Headed Questions". tldp.org. Retrieved 2013-12-25.
  30. ^ "Main Page - Linux-raid". osdl.org. 2010-08-20. Retrieved 2010-08-24.
  31. ^ "Hdfs Raid". Hadoopblog.blogspot.com. 2009-08-28. Retrieved 2010-08-24.
  32. ^ "RAID over File System". Retrieved 2014-07-22.
  33. ^ "Creating and Destroying ZFS Storage Pools - Oracle Solaris ZFS Administration Guide". Oracle Corporation. 2012-04-01. Retrieved 2014-07-27.
  34. ^ "20.2. The Z File System (ZFS)". freebsd.org. Retrieved 2014-07-27.
  35. ^ "Double Parity RAID-Z (raidz2) (Solaris ZFS Administration Guide)". Oracle Corporation. Retrieved 2014-07-27.
  36. ^ "Triple Parity RAIDZ (raidz3) (Solaris ZFS Administration Guide)". Oracle Corporation. Retrieved 2014-07-27.
  37. ^ Deenadhayalan, Veera (2011). "General Parallel File System (GPFS) Native RAID" (PDF). UseNix.org. IBM. Retrieved 2014-09-28.
  38. ^ "Btrfs Wiki: Feature List". 2012-11-07. Retrieved 2012-11-16.
  39. ^ "Btrfs Wiki: Changelog". 2012-10-01. Retrieved 2012-11-14.
  40. ^ "Mac OS X: How to combine RAID sets in Disk Utility". Retrieved 2010-01-04.
  41. ^ "Apple Mac OS X Server File Systems". Retrieved 2008-04-23.
  42. ^ "FreeBSD System Manager's Manual page for GEOM(8)". Retrieved 2009-03-19.
  43. ^ "freebsd-geom mailing list - new class / geom_raid5". Retrieved 2009-03-19.
  44. ^ "FreeBSD Kernel Interfaces Manual for CCD(4)". Retrieved 2009-03-19.
  45. ^ "The Software-RAID HowTo". Retrieved 2008-11-10.
  46. ^ "mdadm(8) - Linux man page". Linux.Die.net. Retrieved 2014-11-20.
  47. ^ "Using Windows XP to Make RAID 5 Happen". Tomshardware.com. Retrieved 2010-08-24.
  48. ^ Sinofsky, Steven. "Virtualizing storage for scale, resiliency, and efficiency". Microsoft.
  49. ^ Metzger, Perry (1999-05-12). "NetBSD 1.4 Release Announcement". NetBSD.org. The NetBSD Foundation. Retrieved 2013-01-30.
  50. ^ "FreeBSD Handbook". Chapter 19 GEOM: Modular Disk Transformation Framework. Retrieved 2009-03-19.
  51. ^ "SATA RAID FAQ". Ata.wiki.kernel.org. 2011-04-08. Retrieved 2012-08-26.
  52. ^ Red Hat Enterprise Linux - Storage Administrator Guide - RAID Types
  53. ^ Charlie Russel; Sharon Crawford; Andrew Edney (2011). Working with Windows Small Business Server 2011 Essentials. O'Reilly Media, Inc. p. 90. ISBN 978-0-7356-5670-3.
  54. ^ Warren Block. "19.5. Software RAID Devices". freebsd.org. Retrieved 2014-07-27.
  55. ^ Ronald L. Krutz; James Conley (2007). Wiley Pathways Network Security Fundamentals. John Wiley & Sons. p. 422. ISBN 978-0-470-10192-6.
  56. ^ a b c Hardware RAID vs. Software RAID: Which Implementation is Best for my Application? Adaptec Whitepaper
  57. ^ Gregory Smith (2010). PostgreSQL 9.0: High Performance. Packt Publishing Ltd. p. 31. ISBN 978-1-84951-031-8.
  58. ^ Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein. Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, ISCSI, InfiniBand and FCoE. John Wiley and Sons, 2009. p.39
  59. ^ Dell Computers, Background Patrol Read for Dell PowerEdge RAID Controllers, By Drew Habas and John Sieber, Reprinted from Dell Power Solutions, February 2006 http://www.dell.com/downloads/global/power/ps1q06-20050212-Habas.pdf
  60. ^ a b c "Error Recovery Control with Smartmontools". Retrieved 2011.
  61. ^ These studies are: Gray, J (1990), Murphy and Gent (1995), Kuhn (1997), and Enriquez P. (2003).
  62. ^ Patterson, D., Hennessy, J. (2009), 574.
  63. ^ "The RAID Migration Adventure". Retrieved 2010-03-10.
  64. ^ Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? Bianca Schroeder and Garth A. Gibson
  65. ^ a b Harris, Robin (2010-02-27). "Does RAID 6 stop working in 2019?". StorageMojo.com. TechnoQWAN. Retrieved 2013-12-17.
  66. ^ J.L. Hafner, V. Deenadhayalan, K. Rao, and J.A. Tomlin. "Matrix methods for lost data reconstruction in erasure codes". USENIX Conference on File and Storage Technologies, pp. 15-30, Dec. 13-16, 2005.
  67. ^ Art S. Kagel (March 2, 2011). "RAID 5 versus RAID 10 (or even RAID 3, or RAID 4)". miracleas.com. Retrieved October 30, 2014.
  68. ^ M. Baker, M. Shah, D.S.H. Rosenthal, M. Roussopoulos, P. Maniatis, T. Giuli, and P. Bungale. "A fresh look at the reliability of long-term digital storage". EuroSys2006, Apr. 2006.
  69. ^ L.N. Bairavasundaram, G.R. Goodson, S. Pasupathy, J. Schindler. "An analysis of latent sector errors in disk drives" (PDF). Proceedings of SIGMETRICS'07, June 12-16, 2007.
  70. ^ Patterson, D., Hennessy, J. (2009). Computer Organization and Design. New York: Morgan Kaufmann Publishers. pp 604-605.
  71. ^ a b c Leventhal, Adam (2009-12-01). "Triple-Parity RAID and Beyond. ACM Queue, Association of Computing Machinery". Retrieved 2012-11-30.
  72. ^ ""Write hole" in RAID5, RAID6, RAID1, and other arrays". ZAR team. Retrieved 15 February 2012.
  73. ^ Jim Gray: The Transaction Concept: Virtues and Limitations (Invited Paper) VLDB 1981: 144-154
  74. ^ "Definition of write-back cache at SNIA dictionary".