Oracle ZFS

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 99.236.14.188 (talk) at 13:58, 19 December 2013 (Limitations: Change external link to a reference.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

ZFS
Developer(s): Oracle Corporation
Full name: ZFS
Introduced: November 2005 with OpenSolaris
Structures
Directory contents: Extensible hash table
Limits
Max volume size: 256 zebibytes (2^78 bytes)
Max file size: 16 exbibytes (2^64 bytes)
Max no. of files: 2^48
Max filename length: 255 bytes
Features
Forks: Yes (called "extended attributes", but they are full-fledged streams)
Attributes: POSIX
File system permissions: POSIX, NFSv4 ACLs
Transparent compression: Yes
Transparent encryption: Yes[1]
Data deduplication: Yes
Other
Supported operating systems: Solaris, OpenSolaris, illumos distributions, OpenIndiana, FreeBSD, Mac OS X Server 10.5, NetBSD, Linux via 3rd party kernel module[2] or ZFS-FUSE

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

ZFS is implemented as open-source software, licensed under the Common Development and Distribution License (CDDL). The ZFS name is registered as a trademark of Oracle Corporation.[3][4]

OpenZFS is an umbrella project, serving as the open source successor to the ZFS project.[5][6]

History

ZFS was designed and implemented by a team at Sun led by Jeff Bonwick and Matthew Ahrens. It was announced on September 14, 2004,[7] but development started in 2001.[8] Source code for ZFS was integrated into the main trunk of Solaris development on October 31, 2005[9] and released as part of build 27 of OpenSolaris on November 16, 2005. Sun announced that ZFS was included in the 6/06 update to Solaris 10 in June 2006, one year after the opening of the OpenSolaris community.[10]

The name originally stood for "Zettabyte File System",[11] but today does not stand for anything.[12] A ZFS file system can store up to 256 zebibytes (ZiB).

OS X

The first indication of Apple Inc.'s interest in ZFS was an April 2006 post on the opensolaris.org zfs-discuss mailing list, in which an Apple employee mentioned being interested in porting ZFS to the OS X operating system.[13] In the release version of Mac OS X 10.5, ZFS was available in read-only mode from the command line, without the ability to create zpools or write to them.[14] Before the 10.5 release, Apple released the "ZFS Beta Seed v1.1", which allowed read-write access and the creation of zpools;[15] however, the installer for the "ZFS Beta Seed v1.1" has been reported to work only on version 10.5.0, and has not been updated for version 10.5.1 and above.[16] In August 2007, Apple opened a ZFS project on their Mac OS Forge web site. On that site, Apple provided the source code and binaries of their port of ZFS, which included read-write access, but there was no installer available[17] until a third-party developer created one.[18] In October 2009, Apple announced a shutdown of the ZFS project on Mac OS Forge, ending its own hosting of and involvement in ZFS. No explanation was given, just the following statement: "The ZFS project has been discontinued. The mailing list and repository will also be removed shortly." Apple would eventually release the legally required, CDDL-derived portion of the source code of their final public beta of ZFS, code named "10a286". Complete ZFS support was once advertised as a feature of Snow Leopard Server (Mac OS X Server 10.6).[19] However, by the time the operating system was released, all references to this feature had been silently removed from its features page.[20] Apple has not commented regarding the omission.

Apple's "10a286" source code release, and versions of the previously released source and binaries, have been preserved, and new development has been adopted by a group of enthusiasts.[21][22] The MacZFS project acted quickly to mirror the public archives of Apple's project before the materials disappeared from the internet, and then to resume its development elsewhere. The MacZFS community has curated and matured the project, supporting ZFS for all Mac OS releases since 10.5. The project has an active mailing list. As of July 2012, MacZFS implements zpool version 8 and ZFS version 2, from the October 2008 release of Solaris. Additional historical information and commentary can be found on the MacZFS web site and Frequently Asked Questions page.

Ten's Complement, LLC was founded in 2010.[23][24] Principal Software Engineer/Developer/Architect Don Brady[25] was formerly a Senior Software Engineer at Apple (twenty years with the company; technical lead on the original HFS Plus team; co-founder of Apple's original skunkworks ZFS project;[26] Primary Engineer/Architect for ZFS on Mac OS X from August 2006 until August 2009). In January 2012, Ten's Complement released ZEVO Silver Edition, with a GUI for basic features and command line use of some other features. Other product plans for 2012 included Gold and Platinum editions[27] and a more advanced Developer edition. In June, around the time of WWDC 2012, Ten's Complement ceased selling ZEVO and published a "making changes" banner at the products page.

In July 2012, Ten's Complement announced the transfer of ZEVO to GreenBytes, Inc.[28] GreenBytes announced the addition of Don Brady to their development team[29] and set a 2012-09-15 launch date for the Community Edition of ZEVO.[30] Customers who purchased the Silver edition before the transfer were encouraged to install the freely available update to 1.0.3 (2012-03-21), which will load in Mountain Lion,[31] and to continue with support in the Ten's Complement area.[32] ZEVO Community Edition 1.1 (Build 2012.09.14) was released on schedule; CDDL-related source code for that build was shared on GitHub[33] and announced[34] around five days later. In June 2013 it was reported that GreenBytes sought a new home for ZEVO.[35] On 10 September 2013, GreenBytes announced[36] that the product would be updated for compatibility with the forthcoming OS X 10.9, Mavericks.

The 17th September 2013 launch of OpenZFS included ZFS-OSX, which will become a new version of MacZFS, as the distribution for Darwin.[37]

Release history

With ZFS in Oracle Solaris: as new features are introduced, the version numbers of the pool and file system are incremented to designate the format and features available. Features that are available in specific file system versions require a specific pool version.[38][39]

Distributed development of OpenZFS involves feature flags[40] and pool version 1000, an unchanging number that is expected to never conflict with version numbers given by Oracle. Legacy version numbers still exist for pool versions 1–28, implied by the version 1000.[41] Illumos uses pool version 5000 for this purpose.[42][43] Future on-disk format changes are enabled / disabled independently via feature flags.

ZFS Filesystem Version Number | Release date | Significant changes
1 | OpenSolaris Nevada[44] build 36 | First release
2 | OpenSolaris Nevada b69 | Enhanced directory entries. In particular, directory entries now store the object type (for example, file, directory, named pipe, and so on) in addition to the object number.
3 | OpenSolaris Nevada b77 | Support for sharing ZFS file systems over SMB. Case insensitivity support. System attribute support. Integrated anti-virus support.
4 | OpenSolaris Nevada b114 | Properties: userquota, groupquota, userused and groupused
5 | OpenSolaris Nevada b137 | System attributes; symlinks now their own object type
6 | Solaris 11.1 | Multilevel file system support

ZFS Pool Version Number | Release date | Significant changes
1 | OpenSolaris Nevada[44] b36 | First release
2 | OpenSolaris Nevada b38 | Ditto Blocks
3 | OpenSolaris Nevada b42 | Hot spares, double-parity RAID-Z (raidz2), improved RAID-Z accounting
4 | OpenSolaris Nevada b62 | zpool history
5 | OpenSolaris Nevada b62 | gzip compression for ZFS datasets
6 | OpenSolaris Nevada b62 | "bootfs" pool property
7 | OpenSolaris Nevada b68 | ZIL: adds the capability to specify a separate Intent Log device or devices
8 | OpenSolaris Nevada b69 | Ability to delegate zfs(1M) administrative tasks to ordinary users
9 | OpenSolaris Nevada b77 | CIFS server support, dataset quotas
10 | OpenSolaris Nevada b77 | Devices can be added to a storage pool as "cache devices"
11 | OpenSolaris Nevada b94 | Improved zpool scrub / resilver performance
12 | OpenSolaris Nevada b96 | Snapshot properties
13 | OpenSolaris Nevada b98 | Properties: usedbysnapshots, usedbychildren, usedbyrefreservation, and usedbydataset
14 | OpenSolaris Nevada b103 | passthrough-x aclinherit property support
15 | OpenSolaris Nevada b114 | Properties: userquota, groupquota, userused and groupused; also required FS v4
16 | OpenSolaris Nevada b116 | STMF property support
17 | OpenSolaris Nevada b120 | Triple-parity RAID-Z
18 | OpenSolaris Nevada b121 | ZFS snapshot holds
19 | OpenSolaris Nevada b125 | ZFS log device removal
20 | OpenSolaris Nevada b128 | zle compression algorithm that is needed to support the ZFS deduplication properties in ZFS pool version 21, which were released concurrently
21 | OpenSolaris Nevada b128 | Deduplication
22 | OpenSolaris Nevada b128 | zfs receive properties
23 | OpenSolaris Nevada b135 | Slim ZIL
24 | OpenSolaris Nevada b137 | System attributes. Symlinks now their own object type. Also requires FS v5.
25 | OpenSolaris Nevada b140 | Improved pool scrubbing and resilvering statistics
26 | OpenSolaris Nevada b141 | Improved snapshot deletion performance
27 | OpenSolaris Nevada b145 | Improved snapshot creation performance (particularly recursive snapshots)
28 | OpenSolaris Nevada b147 | Multiple virtual device replacements
29 | Solaris Nevada b148 | RAID-Z/mirror hybrid allocator
30 | Solaris Nevada b149 | ZFS encryption
31 | Solaris Nevada b150 | Improved 'zfs list' performance
32 | Solaris Nevada b151 | One MB block support
33 | Solaris Nevada b163 | Improved share support
34 | Solaris 11.1 (0.5.11-0.175.1.0.0.24.2) | Sharing with inheritance

Note: The Solaris version under development by Sun since the release of Solaris 10 in 2005 was codenamed 'Nevada', and was derived from what was the OpenSolaris codebase. 'Solaris Nevada' is the codename for the next-generation Solaris OS to eventually succeed Solaris 10 and this new code was then pulled successively into new OpenSolaris 'Nevada' snapshot builds.[44] OpenSolaris is now discontinued and OpenIndiana forked from it.[45][46] A final build (b134) of OpenSolaris was published by Oracle (2010-Nov-12) as an upgrade path to Solaris 11 Express.

Features

Data integrity

One major feature that distinguishes ZFS from other file systems is that ZFS is designed with a focus on data integrity. That is, it is designed to protect the user's data on disk against silent data corruption caused by bit rot, current spikes, bugs in disk firmware, phantom writes (the write is dropped on the floor), misdirected reads/writes (the disk accesses the wrong block), DMA parity errors between the array and server memory or from the driver (since the checksum validates data inside the array), driver errors (data winds up in the wrong buffer inside the kernel), accidental overwrites (such as swapping to a live file system), etc.

Data integrity is a high priority in ZFS because recent research shows that none of the currently widespread file systems — such as UFS, Ext,[47] XFS, JFS, or NTFS — nor hardware RAID provide sufficient protection against such problems.[48][49][50][51][52] It is well known that hardware RAID has some issues with data integrity. Initial research indicates that ZFS clearly protects data better than earlier efforts.[53][54] While it is also faster than UFS or DragonFly BSD's HAMMER file system, it can be seen as the successor to UFS.[55][56]

Error rates in hard disks

A modern hard disk devotes a significant portion of its capacity to the error detection data. For example, a typical 1 TB hard disk with 512-byte sectors provides additional capacity of about 93 GB for the ECC data.[57] Many errors occur during normal usage, but are corrected by the disk's firmware, and thus are not visible to the host software. Only a tiny fraction of the detected errors ends up as not correctable.

For example, the specification for an enterprise SAS disk (a model from 2013) estimates this fraction to be one uncorrected error in every 10^16 bits,[58] and another SAS enterprise disk from 2013 specifies similar error rates.[59] Another modern (as of 2013) enterprise SATA disk specifies an error rate of less than 10 non-recoverable read errors in every 10^16 bits.[60] An enterprise disk with a Fibre Channel interface, which uses 520-byte sectors to support the Data Integrity Field standard to combat data corruption, specified similar error rates in 2005.[61]

Silent data corruption

The worst type of errors are those that go unnoticed, and are not even detected by the disk firmware or the host operating system. This is known as "silent corruption". A real-life study of 1.5 million HDDs in the NetApp database found that, on average, 1 in 90 SATA drives will have silent corruption which is not caught by the hardware RAID verification process; for a RAID-5 system, that works out to one undetected error for every 67 TB of data read.[62][63] However, there are many error sources other than the disk itself. For instance, the disk cable might be slightly loose, the power supply might be unreliable,[64] the disk might be exposed to external vibrations such as a loud sound,[65] the Fibre Channel switch might be faulty,[66] or cosmic radiation and other sources of soft errors might flip bits. In 39,000 storage systems that were analyzed, firmware bugs accounted for 5–10% of storage failures.[67] All in all, the error rates observed by a CERN study on silent corruption are far higher than one in every 10^16 bits.[68] The online retailer Amazon.com confirms these high data corruption rates.[69]

The main problem is that hard disk capacities have grown substantially while their error rates have remained roughly unchanged. Old disks stored so little data that the probability of encountering corruption on any one of them was very small, so silent data corruption was not considered a problem that required a solution while storage devices remained relatively small and slow. Modern disks are no safer per bit, but they store far more data, so the probability of corruption per disk is much larger. With the advent of larger drives and very fast RAID setups, users are capable of transferring 10^16 bits in a reasonably short time, thus easily reaching the data corruption thresholds.[70]
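
The scale of that threshold can be made concrete with a little arithmetic. The sketch below converts 10^16 bits into bytes and estimates how long it takes to read that much data; the two throughput figures are hypothetical values chosen only for illustration and do not come from the cited sources.

```python
# Rough arithmetic: how long does it take to read 10**16 bits?
BITS_PER_ERROR = 10**16                 # one uncorrectable error per 10^16 bits
BYTES_PER_ERROR = BITS_PER_ERROR // 8   # 1.25 * 10^15 bytes, i.e. 1.25 PB


def days_to_read(bytes_total: int, throughput_bytes_per_s: float) -> float:
    """Return the number of days needed to read bytes_total at a sustained rate."""
    seconds = bytes_total / throughput_bytes_per_s
    return seconds / 86_400


# Hypothetical sustained throughputs, assumed for this example only.
single_disk = days_to_read(BYTES_PER_ERROR, 100e6)   # 100 MB/s
fast_array = days_to_read(BYTES_PER_ERROR, 2e9)      # 2 GB/s

print(f"single disk: {single_disk:.0f} days")
print(f"fast array:  {fast_array:.1f} days")
```

Under these assumed rates, a single slow disk needs months to move that much data, while a fast array gets there in about a week.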

In particular, ZFS creator Jeff Bonwick stated that the fast database at Greenplum — a database software company located in San Mateo, California specializing in enterprise data cloud products for large-scale data warehousing and analytics — faces silent corruption every 15 minutes,[71] which is one of the reasons that Greenplum now base their fast database offering on ZFS. These large and fast RAID setups require new file systems that focus on data integrity. This is one of the design goals of ZFS, as explained by Jeff Bonwick.[71]

ZFS data integrity

For ZFS, data integrity is achieved by using a (Fletcher-based) checksum or a (SHA-256) hash throughout the file system tree.[72] Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system's data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree.[72] In-flight data corruption or phantom reads/writes (the data written/read checksums correctly but is actually wrong) are undetectable by most filesystems as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so the entire pool self-validates.[73]
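
The idea of keeping each checksum in the parent's pointer can be illustrated with a toy model. The following is a minimal sketch, not ZFS's on-disk format: `Disk`, `BlockPointer` and `write_indirect` are hypothetical names, and SHA-256 is used throughout for simplicity.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Hypothetical pointer: holds the child's checksum next to its address."""
    address: int
    checksum: bytes


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Disk:
    """Toy block store: address -> raw bytes."""

    def __init__(self):
        self.blocks = {}
        self.next_address = 0

    def write(self, data: bytes) -> BlockPointer:
        address = self.next_address
        self.next_address += 1
        self.blocks[address] = data
        # The checksum lives in the pointer, not beside the data.
        return BlockPointer(address, sha256(data))

    def read(self, pointer: BlockPointer) -> bytes:
        data = self.blocks[pointer.address]
        if sha256(data) != pointer.checksum:
            raise IOError(f"checksum mismatch at block {pointer.address}")
        return data


def write_indirect(disk: Disk, children: list[BlockPointer]) -> BlockPointer:
    """Store a parent block whose content is its children's pointers."""
    payload = b"".join(
        c.address.to_bytes(8, "big") + c.checksum for c in children
    )
    return disk.write(payload)


disk = Disk()
leaves = [disk.write(b"hello"), disk.write(b"world")]
root = write_indirect(disk, leaves)   # the root pointer covers the whole tree

# Simulate silent corruption of a leaf: the data changes, the pointer does not.
disk.blocks[leaves[0].address] = b"hellp"
```

Reading `leaves[0]` now raises an error because the stored data no longer matches the checksum held one level up, whereas a checksum stored next to the data could have been overwritten along with it.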

When a block is accessed, regardless of whether it is data or meta-data, its checksum is calculated and compared with the stored checksum value of what it "should" be. If the checksums match, the data are passed up the programming stack to the process that asked for it. If the values do not match, then ZFS can heal the data if the storage pool has redundancy via ZFS mirroring or RAID.[74] If the storage pool consists of a single disk, it is possible to provide such redundancy by specifying "copies=2" (or "copies=3"), which means that data will be stored twice (thrice) on the disk, effectively halving (or, for "copies=3", reducing to one third) the storage capacity of the disk.[75] If redundancy exists, ZFS will fetch a copy of the data (or recreate it via a RAID recovery mechanism), and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update the faulty copy with known-good data so that redundancy can be restored.
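
The read-verify-repair cycle described above can be sketched as follows. This is an illustrative model only: `read_with_repair` is a hypothetical helper, and the list of byte arrays stands in for the copies held by a mirror or by "copies=2".

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def read_with_repair(copies: list[bytearray], expected: bytes) -> bytes:
    """Return the first copy matching `expected` and overwrite bad copies with it.

    `copies` stands in for the same block stored on each side of a mirror
    (or each of the copies=N duplicates); all names here are hypothetical.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("no valid copy: corruption detected but not repairable")
    for c in copies:
        if bytes(c) != good:
            c[:] = good          # heal the faulty copy from known-good data
    return good


original = b"important data"
stored_checksum = checksum(original)          # kept in the parent block pointer
mirror = [bytearray(b"important dbta"), bytearray(original)]  # first copy is corrupt

data = read_with_repair(mirror, stored_checksum)
```

After the call, both entries in `mirror` hold the original bytes again. Without any valid copy the function can only report the corruption, which mirrors the single-disk case without "copies".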

ZFS and hardware RAID

If the disks are connected to a RAID controller, it is most efficient to configure it in JBOD mode (i.e. turn off RAID functionality). If there is a hardware RAID card used, ZFS always detects all data corruption but cannot always repair data corruption because the hardware RAID card will interfere. Therefore the recommendation is to not use a hardware RAID card, or to flash a hardware RAID card into JBOD/IT mode. For ZFS to be able to guarantee data integrity, it needs to either have access to a RAID set (so all data is copied to at least two disks), or if one single disk is used, ZFS needs to enable redundancy (copies) which duplicates the data on the same logical drive. Using ZFS copies is a good feature to use on notebooks and desktop computers, since the disks are large and it at least provides some limited redundancy with just a single drive.

There are several reasons why it is better to rely solely on ZFS by using several independent disks and RAID-Z or mirroring. For example, a ZFS volume with RAID-0 volumes even with "copies=2" can be failure prone, as the RAID-0 volumes will fail in the event of any disk failures. Thus, storing data on RAID-0 with a ZFS volume and "copies=2" enabled does not increase data reliability; instead, it reduces it.

When using hardware RAID, the controller usually adds controller-dependent data to the drives which prevents software RAID from accessing the user data. While it is possible to read the data with a compatible hardware RAID controller, this inconveniences consumers as a compatible controller usually isn't readily available. Using the JBOD/RAID-Z combination, any disk controller can be used to resume operation after a controller failure.

Note that hardware RAID configured as JBOD may still detach disks that do not respond in time (like green hard drives), and as such, may require TLER/CCTL/ERC-enabled disks to prevent drive dropouts.[76]

Software RAID using ZFS

ZFS offers software RAID through its RAID-Z and mirroring organization schemes. RAID-Z is invulnerable to the write hole error, which other types of RAIDs suffer from. There are three different RAID modes: RAID-Z1 is similar to RAID 5 (allows one disk to fail), RAID-Z2 is similar to RAID 6 (allows two disks to fail) and RAID-Z3 (allows three disks to fail). The need for RAID-Z3 arose recently because RAID configurations with future disks (say 6–10 TB) may take a long time to repair, the worst case being weeks. During those weeks, the rest of the disks in the RAID are stressed more because of the additional intensive repair process and might subsequently fail, too. By using RAID-Z3, the risk involved with disk replacement is reduced.
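
The single-parity case, which the text compares to RAID 5, rests on XOR parity. The sketch below shows only that principle with fixed-size blocks; it does not model RAID-Z's actual on-disk layout or the double and triple parity used by RAID-Z2 and RAID-Z3, and the helper name is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe spread over three data disks
parity = xor_blocks(data_disks)            # stored on a fourth disk

# Disk 1 fails: rebuild its contents from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block. One parity block can recover from one lost disk only, which is why additional parity is needed to survive two or three failures.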

Mirroring, the other ZFS RAID option, is essentially the same as RAID 1. The difference is that ZFS allows any number of disks in the mirror; for instance, a mirror could consist of three disks, or even eleven disks.[77]

Resilvering and scrub

ZFS has no fsck repair tool equivalent, common on Unix filesystems, which does file system validation and file system repair.[78] Instead, ZFS has a repair tool called "scrub" which examines and repairs silent corruption and other problems. Some differences are:

  • fsck must be run on an offline filesystem, which means the filesystem must be unmounted and is not usable while being repaired.
  • scrub does not need the ZFS filesystem to be taken offline; scrub is designed to be used on a mounted, live filesystem.
  • fsck usually only checks metadata (such as the journal log) but never checks the data itself. This means, after an fsck, the data might still be corrupt.
  • scrub checks everything, including metadata and the data. The effect can be observed by comparing fsck to scrub times — sometimes a fsck on a large RAID completes in a few minutes, which means only the metadata was checked. Traversing all metadata and data on a large RAID takes many hours, which is exactly what scrub does.

The official recommendation from Sun/Oracle is to scrub once every month with Enterprise disks, because they have much higher reliability than cheap commodity disks. If using cheap commodity disks, scrub every week.[79][80]

Snapshots

However, no system is immune to bugs or hardware that does not follow standards.

"...For example: FLUSH CACHE should only return, when the cache is flushed. But there are dirt cheap converter chips that sends the FLUSH CACHE to disk, but returns a successful FLUSH CACHE in the same moment back to the OS (of course without having NVRAM on disk or in a controller as this would allow to ignore CACHE FLUSH). Or interface converters reordering commands in really funny ways. By such reordering it may happen, that the uberblock is written to disk, before the rest of the structure has been written to disk..."[81]

Thus, there are known cases where ZFS has had problems. As an extra safety measure, it is therefore possible to go back in time by using the "-F" flag with the "zpool" command. ZFS uses copy-on-write, which means old data is not altered. Whenever data is edited and updated, the old data is always left intact and the edits are stored at a new (unused) location on the disk. This means that every change can be traced back in time, allowing the user to discard the latest change which caused the problem and revert to an earlier, functional state. This is also how ZFS snapshots work.

Storage pools

Unlike traditional file systems which reside on single devices and thus require a volume manager to use more than one device, ZFS filesystems are built on top of virtual storage pools called zpools. A zpool is constructed of virtual devices (vdevs), which are themselves constructed of block devices: files, hard drive partitions, or entire drives, with the latter being the recommended usage.[82] Block devices within a vdev may be configured in different ways, depending on needs and space available: non-redundantly (similar to RAID 0), as a mirror (RAID 1) of two or more devices, as a RAID-Z (similar to RAID-5) group of three or more devices, or as a RAID-Z2 (similar to RAID-6) group of four or more devices.[83] In July 2009, triple-parity RAID-Z3 was added to OpenSolaris.[84][85] RAID-Z is a data-protection technology featured by ZFS in order to reduce the block overhead in mirroring.[86]

Thus, a zpool (ZFS storage pool) is vaguely similar to a computer's RAM. The total RAM pool capacity depends on the number of RAM memory sticks and the size of each stick. Likewise, a zpool consists of one or more vdevs. Each vdev can be viewed as a group of hard disks (or partitions, or files, etc.). Each vdev should have redundancy, because if a vdev is lost, then the whole zpool is lost. Thus, each vdev should be configured as RAID-Z1, RAID-Z2, mirror, etc. It is not possible to change the number of drives in an existing vdev (Block Pointer Rewrite will allow this, and also allow defragmentation), but it is always possible to increase storage capacity by adding a new vdev to a zpool. It is possible to swap a drive for a larger drive and resilver (repair) the zpool. If this procedure is repeated for every disk in a vdev, then the zpool will grow in capacity when the last drive is resilvered. Every drive in a vdev contributes only as much capacity as the smallest drive in the group. For instance, a vdev consisting of three 500 GB drives and one 700 GB drive will have a capacity of 4×500 GB.
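
The sizing rule in the example above can be written as a one-line calculation. This sketch computes only the base capacity before any redundancy overhead is subtracted; the function name is hypothetical.

```python
def vdev_base_capacity_gb(drive_sizes_gb: list[int]) -> int:
    """Base capacity before redundancy overhead: every drive counts as the smallest."""
    return len(drive_sizes_gb) * min(drive_sizes_gb)


# The example from the text: three 500 GB drives and one 700 GB drive.
before = vdev_base_capacity_gb([500, 500, 500, 700])   # 4 x 500 GB

# After each 500 GB drive has been swapped for a 700 GB drive and resilvered.
after = vdev_base_capacity_gb([700, 700, 700, 700])    # 4 x 700 GB
```

The extra 200 GB on the larger drive stays unused until the last small drive has been replaced, which is when the capacity grows.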

In addition, pools can have hot spares to compensate for failing disks. When mirroring, block devices can be grouped according to physical chassis, so that the filesystem can continue in the case of the failure of an entire chassis.

Storage pool composition is not limited to similar devices, but can consist of ad-hoc, heterogeneous collections of devices, which ZFS seamlessly pools together, subsequently doling out space to diverse filesystems as needed. Arbitrary storage device types can be added to existing pools to expand their size at any time.[87]

The storage capacity of all vdevs is available to all of the file system instances in the zpool. A quota can be set to limit the amount of space a file system instance can occupy, and a reservation can be set to guarantee that space will be available to a file system instance.

ZFS cache: ARC (L1), L2ARC, ZIL

ZFS uses different layers of disk cache to speed up read and write operations. Ideally, all data should be stored in RAM, but that is too expensive. Therefore, data is automatically cached in a hierarchy to optimize performance vs cost.[88] Frequently accessed data is stored in RAM, and less frequently accessed data can be stored on slower media, such as SSD disks. Data that is not often accessed is not cached and left on the slow hard drives. If old data is suddenly read a lot, ZFS will automatically move it to SSD disks or to RAM.

The first level of disk cache is RAM, which uses a variant of the ARC algorithm. It is similar to a level 1 CPU cache. RAM will always be used for caching, thus this level is always present. There are claims that ZFS servers must have huge amounts of RAM, but that is not true. It is a misinterpretation of the desire to have large ARC disk caches. The ARC is very clever and efficient, which means disks will often not be touched at all, provided the ARC size is sufficiently large. In the worst case, if the RAM size is very small (say, 1 GB), there will hardly be any ARC at all; in this case, ZFS always needs to reach for the disks. This means read performance degrades to disk speed.

The second level of disk cache consists of SSD disks. This level is optional, and is easy to add or remove during live usage, as there is no need to shut down the zpool. There are two different caches: one cache for reads, and one for writes.

  • The read SSD cache is called L2ARC and is similar to a level 2 CPU cache. The L2ARC will also considerably speed up Deduplication if the entire Dedup table can be cached in L2ARC. It can take several hours to fully populate the L2ARC (before it has decided which data are "hot" and should be cached). If the L2ARC device is lost, all reads will go out to the disks which slows down performance, but nothing else will happen (no data will be lost).
  • The write SSD cache is called the Log Device, and it is used by the ZIL (ZFS Intent Log). ZIL basically turns synchronous writes into asynchronous writes, which helps e.g. NFS or databases.[89] All data is written to the ZIL like a journal log, but only read after a crash. Thus, the ZIL data is normally never read. Every once in a while, the ZIL will flush the data to the zpool; this is called Transaction Group Commit. In case there is no separate log device added to the zpool, a part of the zpool will automatically be used as ZIL, thus there is always a ZIL on every zpool. It is important that the log device use a disk with low latency. For superior performance, a disk consisting of battery backed up RAM such as the ZeusRAM should be used. Because the log device is written to often, an SSD disk will eventually be worn out, but a RAM disk will not. If the log device is lost, it is possible to lose the latest writes, therefore the log device should be mirrored. In earlier versions of ZFS, loss of the log device could result in loss of the entire zpool, therefore one should upgrade ZFS if planning to use a separate log device.
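
The read side of this hierarchy can be sketched as a two-level cache in front of a slow disk. This is a simplified model under stated assumptions: plain LRU eviction stands in for the real ARC policy, blocks evicted from RAM are handed to the SSD level, and `TieredCache` is a hypothetical name.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-level read cache; plain LRU stands in for the real ARC policy."""

    def __init__(self, ram_slots: int, ssd_slots: int, disk: dict):
        self.ram = OrderedDict()     # level 1 (the ARC in the text)
        self.ssd = OrderedDict()     # level 2 (the L2ARC in the text)
        self.ram_slots, self.ssd_slots = ram_slots, ssd_slots
        self.disk = disk
        self.disk_reads = 0

    def _insert(self, level: OrderedDict, slots: int, key, value):
        level[key] = value
        level.move_to_end(key)
        if len(level) > slots:
            return level.popitem(last=False)   # evict least recently used
        return None

    def read(self, key):
        if key in self.ram:                    # fastest path: RAM hit
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:                    # second level: SSD hit
            value = self.ssd.pop(key)
        else:                                  # miss: go to the slow disk
            value = self.disk[key]
            self.disk_reads += 1
        evicted = self._insert(self.ram, self.ram_slots, key, value)
        if evicted and self.ssd_slots:
            self._insert(self.ssd, self.ssd_slots, *evicted)
        return value


disk = {n: f"block{n}" for n in range(10)}
pattern = [0, 1, 2, 0, 1, 2]

with_ssd = TieredCache(ram_slots=2, ssd_slots=4, disk=disk)
no_ssd = TieredCache(ram_slots=2, ssd_slots=0, disk=disk)
for key in pattern:
    with_ssd.read(key)
    no_ssd.read(key)
```

With this access pattern the working set is larger than RAM, so the instance without an SSD level goes to disk on every read, while the one with an SSD level reads each block from disk only once.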

Capacity

ZFS is a 128-bit file system,[90] so it can address 1.84 × 10^19 times more data than 64-bit systems such as Btrfs. The limitations of ZFS are designed to be so large that they should not be encountered in the foreseeable future.

Some theoretical limits in ZFS are:

  • 2^48: number of entries in any individual directory[91]
  • 16 exbibytes (2^64 bytes): maximum size of a single file
  • 16 exbibytes: maximum size of any attribute
  • 256 zebibytes (2^78 bytes): maximum size of any zpool
  • 2^56: number of attributes of a file (actually constrained to 2^48 for the number of files in a ZFS file system)
  • 2^64: number of devices in any zpool
  • 2^64: number of zpools in a system
  • 2^64: number of file systems in a zpool
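
The limits above are powers of two (2^78 bytes for a pool, 2^64 bytes for a file), and the unit conversions can be checked with integer arithmetic. The sketch below uses only standard binary prefixes; the variable names are chosen for this example.

```python
ZIB = 2**70   # one zebibyte in bytes
EIB = 2**60   # one exbibyte in bytes

max_pool_bytes = 2**78
max_file_bytes = 2**64

pool_in_zib = max_pool_bytes // ZIB   # pool limit expressed in ZiB
file_in_eib = max_file_bytes // EIB   # file limit expressed in EiB

# Ratio between a 128-bit and a 64-bit address space.
ratio = 2**128 // 2**64

print(pool_in_zib, file_in_eib, f"{ratio:.3g}")
```

The ratio equals 2^64, or about 1.84 × 10^19, which is the figure quoted in the comparison with 64-bit file systems.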

Copy-on-write transactional model

ZFS uses a copy-on-write transactional object model. All block pointers within the filesystem contain a 256-bit checksum or 256-bit hash (currently a choice between Fletcher-2, Fletcher-4, or SHA-256)[92] of the target block, which is verified when the block is read. Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and ZIL (intent log) write cache is used when synchronous write semantics are required. The blocks are arranged in a tree, as are their checksums (see Merkle signature scheme).
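
The "never overwrite in place" rule can be illustrated with an immutable tree. In this minimal sketch, which leaves out checksums, transaction groups and the ZIL, updating a leaf allocates a new leaf and new copies of its ancestors up to the root; `Node` and `cow_update` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Immutable block: either leaf data or a tuple of child nodes."""
    data: bytes = b""
    children: tuple = ()


def cow_update(root: Node, path: list[int], new_data: bytes) -> Node:
    """Return a new root with the leaf at `path` replaced; nothing is overwritten."""
    if not path:
        return Node(data=new_data)             # newly allocated leaf
    index, rest = path[0], path[1:]
    children = list(root.children)
    children[index] = cow_update(children[index], rest, new_data)
    return Node(children=tuple(children))      # reallocated parent


leaf_a, leaf_b, leaf_c = Node(b"a"), Node(b"b"), Node(b"c")
old_root = Node(children=(Node(children=(leaf_a, leaf_b)), Node(children=(leaf_c,))))

new_root = cow_update(old_root, [0, 1], b"B")
```

The old root still describes the previous, consistent state, and every block that was not on the modified path is shared between the two roots rather than copied.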

Snapshots and clones

An advantage of copy-on-write is that, when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are created very quickly, since all the data composing the snapshot is already stored. They are also space efficient, since any unchanged data is shared among the file system and its snapshots.

Writeable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is an implementation of the Copy-on-write principle.
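
Block sharing between a file system and its snapshots or clones can be modelled with a table of block pointers. The sketch below is a toy illustration, not ZFS's data structures: `STORE`, `allocate`, `Dataset` and `blocks_in_use` are hypothetical names, and each file is reduced to a single block.

```python
import itertools

_block_ids = itertools.count()
STORE = {}   # block id -> bytes; blocks are never modified once written


def allocate(data: bytes) -> int:
    """Write data to a fresh block and return its id."""
    block_id = next(_block_ids)
    STORE[block_id] = data
    return block_id


class Dataset:
    """Toy file system: maps file names to block ids."""

    def __init__(self, pointers=None):
        self.pointers = dict(pointers or {})

    def write(self, name: str, data: bytes) -> None:
        self.pointers[name] = allocate(data)   # old block is left untouched

    def read(self, name: str) -> bytes:
        return STORE[self.pointers[name]]

    def clone(self) -> "Dataset":
        return Dataset(self.pointers)          # copies pointers, not data


def blocks_in_use(*datasets: Dataset) -> int:
    """Number of distinct blocks referenced by the given datasets."""
    return len({bid for d in datasets for bid in d.pointers.values()})


fs = Dataset()
fs.write("a.txt", b"alpha")
fs.write("b.txt", b"beta")
snap = fs.clone()             # taken without copying any data
fs.write("a.txt", b"ALPHA")   # the live file system diverges
```

After the change, the two datasets together reference three blocks rather than four, because the unchanged block for `b.txt` is shared.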

Sending and receiving snapshots

ZFS file systems can be moved to other pools, also on remote hosts over the network, as the send command creates a stream representation of the file system's state. This stream can either describe complete contents of the file system at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g. for synchronizing offsite backups or high availability mirrors of a pool.
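
An incremental stream can be thought of as the set of blocks that differ between two snapshots. The following sketch models snapshots as dictionaries of block number to contents and finds the delta by direct comparison, which is a simplification made for this example; `send_delta` and `receive_delta` are hypothetical names and not the actual stream format.

```python
def send_delta(old_snapshot: dict, new_snapshot: dict) -> dict:
    """Describe new_snapshot relative to old_snapshot."""
    return {
        "write": {k: v for k, v in new_snapshot.items() if old_snapshot.get(k) != v},
        "free": [k for k in old_snapshot if k not in new_snapshot],
    }


def receive_delta(base: dict, delta: dict) -> dict:
    """Apply a delta to a copy of the base snapshot held by the receiver."""
    result = {k: v for k, v in base.items() if k not in delta["free"]}
    result.update(delta["write"])
    return result


monday = {0: b"a", 1: b"b", 2: b"c"}
tuesday = {0: b"a", 1: b"B", 3: b"d"}

delta = send_delta(monday, tuesday)
restored = receive_delta(monday, delta)
```

The delta carries only the two written blocks and one freed block, so its size tracks the amount of change rather than the size of the file system.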

Dynamic striping

Dynamic striping across all devices to maximize throughput means that as additional devices are added to the zpool, the stripe width automatically expands to include them; thus, all disks in a pool are used, which balances the write load across them.

Variable block sizes

ZFS uses variable-sized blocks of up to 1024 kibibytes. The currently available code allows the administrator to tune the maximum block size used, as certain workloads do not perform well with large blocks. If data compression (LZJB) is enabled, variable block sizes are used: if a block can be compressed to fit into a smaller block size, the smaller size is used on disk, which uses less storage and improves I/O throughput (at the cost of increased CPU use for the compression and decompression operations).
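The rule that a compressed block is stored in a smaller size only when it actually fits can be sketched in Python. Two assumptions are made here: `zlib` stands in for LZJB, which is not in the Python standard library, and the 512-byte allocation granularity `SECTOR` is an illustrative value.

```python
import os
import zlib

SECTOR = 512  # assumed allocation granularity, in bytes


def round_up(n):
    """Round n up to a whole number of sectors."""
    return -(-n // SECTOR) * SECTOR


def on_disk_size(data):
    """Space allocated for one logical block in this toy model."""
    raw = round_up(len(data))
    packed = round_up(len(zlib.compress(data)))
    # Keep the compressed form only if it fits a smaller allocation.
    return packed if packed < raw else raw


zeros = bytes(128 * 1024)   # highly compressible block
noise = os.urandom(4096)    # effectively incompressible block

zeros_size = on_disk_size(zeros)
noise_size = on_disk_size(noise)
```

The compressible block is stored in a fraction of its logical size, while the incompressible one keeps its original allocation because compressing it gains nothing.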

Lightweight filesystem creation

In ZFS, filesystem manipulation within a storage pool is easier than volume manipulation within a traditional filesystem; the time and effort required to create or expand a ZFS filesystem is closer to that of making a new directory than it is to volume manipulation in some other systems.

Cache management

ZFS also uses the Adaptive Replacement Cache (ARC), a new method for read cache management, instead of the traditional Solaris virtual memory page cache. For write caching, ZFS employs the ZFS Intent Log (ZIL). Both can be backed by separate virtual devices to improve total IOPS: a "cache" vdev for read operations and a "log" vdev for write operations.[93]

Adaptive endianness

Pools and their associated ZFS file systems can be moved between different platform architectures, including systems implementing different byte orders. The ZFS block pointer format stores filesystem metadata in an endian-adaptive way; individual metadata blocks are written with the native byte order of the system writing the block. When reading, if the stored endianness does not match the endianness of the system, the metadata is byte-swapped in memory.

This does not affect the stored data; as is usual in POSIX systems, files appear to applications as simple arrays of bytes, so applications creating and reading data remain responsible for doing so in a way independent of the underlying system's endianness.
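The write-native, swap-on-read handling of metadata described above can be sketched in Python with the `struct` module. The record layout, the `MAGIC` marker and the function names are assumptions made for illustration; they are not the ZFS block pointer format.

```python
import struct
import sys

MAGIC = 0x00BAB10C  # marker value assumed for this sketch


def write_meta(value, byteorder=sys.byteorder):
    """Serialize a metadata record in the writer's native byte order."""
    prefix = "<" if byteorder == "little" else ">"
    return struct.pack(prefix + "IQ", MAGIC, value)


def read_meta(buf):
    """Decode a record written in either byte order."""
    for prefix in ("<", ">"):
        magic, value = struct.unpack(prefix + "IQ", buf)
        if magic == MAGIC:
            # The marker reads correctly, so this is the stored byte order.
            return value
    raise ValueError("unrecognised byte order or corrupt record")


from_little = write_meta(42, "little")
from_big = write_meta(42, "big")
```

The two byte strings differ on disk, yet `read_meta` returns 42 for both, so a reader on either architecture sees the same metadata.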

Deduplication

Data deduplication capabilities were added to the ZFS source repository at the end of October 2009,[94] and relevant OpenSolaris ZFS development packages have been available since December 3, 2009 (build 128).

Effective use of deduplication may require large RAM capacity. Recommendations range from 1 GB for every extra 1 TB of storage[95] to 2 GB for every TB of storage.[96][97] Insufficient physical memory or lack of ZFS cache can result in virtual memory thrashing, which can either lower performance or result in complete memory starvation.[citation needed] Solid-state drives (SSDs) can be used to cache deduplication tables, thereby speeding up deduplication performance.[citation needed]
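A rough Python sketch shows why memory matters: a deduplicating store keeps one table entry per unique block and consults the table on every write. `DedupTable` and `dedup_ram_range_gb` are hypothetical names, and the estimate simply applies the 1 GB and 2 GB per TB recommendations quoted above.

```python
import hashlib


class DedupTable:
    """Hypothetical dedup table: block checksum -> reference count."""

    def __init__(self):
        self.refcount = {}

    def write(self, block):
        """Record a write; return True if a new physical block was needed."""
        key = hashlib.sha256(block).digest()
        is_new = key not in self.refcount
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return is_new


def dedup_ram_range_gb(pool_tb):
    """(low, high) RAM estimate from the 1 GB/TB and 2 GB/TB rules of thumb."""
    return (1.0 * pool_tb, 2.0 * pool_tb)


table = DedupTable()
stored = sum(table.write(b) for b in [b"aaaa", b"bbbb", b"aaaa", b"aaaa"])
low_gb, high_gb = dedup_ram_range_gb(10)   # a hypothetical 10 TB pool
```

Four logical writes need only two physical blocks here, but the table gains an entry for every unique block, which is the structure that grows with pool size.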

Other storage vendors use modified versions of ZFS to achieve very high compression ratios. Two examples in 2012 were GreenBytes[98] and Tegile.[99]

Encryption

With Oracle Solaris, the encryption capability in ZFS[100] is embedded into the I/O pipeline. During writes, a block may be compressed, encrypted, checksummed and then deduplicated, in that order. The policy for encryption is set at the dataset level when datasets (file systems or ZVOLs) are created. The wrapping keys provided by the user/administrator can be changed at any time without taking the file system offline. The default behaviour is for the wrapping key to be inherited by any child datasets. The data encryption keys are randomly generated at dataset creation time. Only descendant datasets (snapshots and clones) share data encryption keys.[101] A command is provided to switch to a new data encryption key, either for a clone or at any time; this does not re-encrypt existing data, relying instead on an encrypted master-key mechanism.

Clustering and high availability

ZFS is not a clustered filesystem. However, clustered ZFS is available from third parties.

Additional capabilities

  • Explicit I/O priority with deadline scheduling.
  • Claimed globally optimal I/O sorting and aggregation.
  • Multiple independent prefetch streams with automatic length and stride detection.
  • Parallel, constant-time directory operations.
  • End-to-end checksumming, using a kind of "Data Integrity Field", allowing data corruption detection (and recovery if the pool has redundancy).
  • Transparent filesystem compression. Supports LZJB and gzip.[102]
  • Intelligent scrubbing and resilvering (resyncing).[103]
  • Load and space usage sharing among disks in the pool.[104]
  • Ditto blocks: Configurable data replication per filesystem, with zero, one or two extra copies requested per write for user data, and with that same base number of copies plus one or two for metadata (according to metadata importance).[105] If the pool has several devices, ZFS tries to replicate over different devices. Ditto blocks are primarily an additional protection against corrupted sectors, not against total disk failure.[106]
  • ZFS design (copy-on-write + superblocks) is safe when using disks with write cache enabled, if they honor the write barriers. This feature provides safety and a performance boost compared with some other filesystems.
  • On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.
  • Per-user and per-group quotas support.[107]
  • Filesystem encryption since Solaris 11 Express[1]
  • Pools can be imported in read-only mode
  • It is possible to recover data by rolling back entire transactions at the time of importing the zpool.
  • Planned features:
    • The so-called Block Pointer Rewrite functionality is due to be added in the same time frame, paving the way for resizing pools, defragmentation, (re-)applying compression on filesystems and so on.[108]

Limitations

Note: Most of these limitations are due to RAID, not ZFS per se. However, they also apply to ZFS.

  • Capacity expansion is normally achieved by adding groups of disks as a top-level vdev: simple device, RAID-Z, RAID-Z2, RAID-Z3, or mirrored. Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to heal itself — the heal time will depend on the amount of stored information, not the disk size. The new free space will not be available until all the disks have been swapped.[citation needed]
  • As of Solaris 10 Update 9, it is not possible to reduce the number of top-level vdevs in a pool nor otherwise reduce pool capacity.[109] This functionality was said to be in development already in 2007.[110]
  • It is not possible to add a disk as a column to a RAID-Z, RAID-Z2, or RAID-Z3 vdev. This feature depends on the block pointer rewrite functionality due to be added soon. One can however create a new RAID-Z vdev and add it to the zpool.[111]
  • Vdevs cannot be nested, so a mirror or RAID-Z top-level vdev can only contain files or disks. Mirrors of mirrors (or other combinations) are not allowed.[citation needed]
  • Reconfiguring the number of devices in a top-level vdev requires copying data offline, destroying the pool, and recreating the pool with the new top-level vdev configuration. There are two exceptions: extra redundancy can be added to an existing mirror at any time, and, if all top-level vdevs are mirrors with sufficient redundancy, the zpool split[112] command can be used to remove a vdev from each top-level vdev in the pool, creating a second pool with identical data.
  • Resilvering (repairing) a failed disk in a ZFS RAID takes a long time, as it does in one way or another with all types of RAID. Future large disks, say 5 TB or 6 TB, can therefore take several days to repair. For this reason raidz1 (similar to RAID-5) should be avoided: repairing the array puts additional stress on the remaining disks, which might cause one of them to fail, and with raidz1 a second failure loses all data in the storage pool. With large disks, one should therefore use raidz2 (which tolerates two failed disks) or raidz3 (which tolerates three).[113] Unlike conventional RAID solutions, however, ZFS reconstructs only live data and metadata when replacing a disk, not the entirety of the disk including blank and garbage blocks, so replacing a member disk in a pool that is only partially full takes proportionately less time than with conventional RAID.[114]
  • IOPS performance of a ZFS storage pool can suffer if the ZFS RAID is not appropriately configured, as is true in one way or another of all types of RAID. If the zpool consists of only one group of disks configured as, say, eight disks in raidz2, then the write IOPS performance will be that of a single disk, while read IOPS will be the sum of the eight individual disks. To get high write IOPS, the zpool should therefore consist of several vdevs, since each vdev provides the write IOPS of a single disk. The problem can also be mitigated in other ways, for instance by adding SSDs as ZIL cache, which can boost IOPS into the hundreds of thousands.[115] In short, a zpool should consist of several vdevs, each of 8–12 disks. Creating a zpool with a single large vdev of, say, 20 disks is not recommended, because write IOPS performance will be that of a single disk, which also means that resilver time will be very long (possibly weeks with future large drives).
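The write IOPS rule of thumb in the last item can be turned into a small Python calculation. `pool_write_iops` is a hypothetical helper, and the figure of 100 IOPS per disk is an assumed value chosen only to make the comparison concrete.

```python
def pool_write_iops(vdevs, disk_iops):
    """Rule-of-thumb write IOPS: each vdev contributes one disk's worth."""
    return vdevs * disk_iops


DISK_IOPS = 100  # hypothetical figure for a single disk

# The same 24 disks arranged in two different ways.
one_wide_vdev = pool_write_iops(vdevs=1, disk_iops=DISK_IOPS)   # 1 x 24 disks
three_vdevs = pool_write_iops(vdevs=3, disk_iops=DISK_IOPS)     # 3 x 8 disks
```

Under this rule the three-vdev layout offers three times the write IOPS of the single wide vdev, although both use the same number of disks.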

Platforms

Open-source collaboration

OpenZFS brings together developers from the illumos, Linux, FreeBSD and OS X platforms, and a wide range of companies.[5][6] High-level goals are:

  • to raise awareness of the quality, utility, and availability of open source implementations of ZFS
  • to encourage open communication about ongoing efforts to improve open source ZFS
  • to ensure consistent reliability, functionality, and performance of all distributions of ZFS.

Illumos, which derived from OpenSolaris, provides upstream source code for other ZFS implementations.[5] The OpenZFS site lists differences between the Illumos ZFS codebase and other open source implementations of ZFS.[116] To ease sharing of code, OpenZFS is strategically reducing the platform-related differences.

Founder members of OpenZFS include Matt Ahrens, one of the main architects of ZFS.

Solaris

Solaris 10

ZFS is part of Sun's own Solaris operating system and is thus available on both SPARC and x86-based systems. Since the code for ZFS is open source, a port to other operating systems and platforms can be produced without Sun's involvement.

Solaris 11

After Oracle's Solaris 11 Express release, the OS/Net consolidation (the main OS code) was made proprietary and closed-source, and further ZFS upgrades and implementations inside Solaris (such as encryption) are not compatible with other non-proprietary implementations which use previous versions of ZFS.

When creating a new ZFS pool, to retain the ability to access the pool from other non-proprietary Solaris-based distributions, it is recommended to upgrade to Solaris 11 Express from OpenSolaris (snv_134b), and thereby stay at ZFS version 28.

OpenSolaris

OpenSolaris 2008.05 and 2009.06 use ZFS as their default filesystem. There are over a dozen 3rd-party distributions, of which nearly a dozen are mentioned here. (OpenIndiana and illumos are two new distributions not included on the OpenSolaris distribution reference page.)

OpenIndiana

OpenIndiana 148 and 151 use ZFS version 28, as implemented in Illumos.

By upgrading from OpenSolaris snv_134 to both OpenIndiana and Solaris 11 Express, one also has the ability to upgrade and separately boot Solaris 11 Express on the same ZFS pool, but one should not install Solaris 11 Express first because of ZFS incompatibilities introduced by Oracle past ZFS version 28.[117]

BSD

OS X

Two implementations of ZFS are community-maintained and free:

  • MacZFS stable (from open source provided by Apple)
    • ZFS file system version 2
    • ZFS pool version 8
  • ZFS-OSX (ported from ZFS on Linux) – an OpenZFS development towards the next generation of MacZFS
    • ZFS pool version 5000
    • feature flags.[40]

A proprietary implementation of ZFS is available at no cost from GreenBytes, Inc.:

  • ZEVO Community Edition 1.1.1
    • ZFS file system version 5
    • ZFS pool version 28.[118]

NetBSD

The NetBSD ZFS port was started as a part of the 2007 Google Summer of Code and in August 2009, the code was merged into NetBSD's source tree.[119]

FreeBSD

Paweł Jakub Dawidek ported ZFS to FreeBSD, and it has been part of FreeBSD since version 7.0.[120] This includes zfsboot, which allows booting FreeBSD directly from a ZFS volume.[121][122]

FreeBSD's ZFS implementation is fully functional; the only missing features are kernel CIFS server and iSCSI, but at least the latter can be added using externally available packages.[123] Samba can be used to provide a userspace CIFS server.

FreeBSD 7-STABLE (the branch to which updates to the 7.x series are committed) uses zpool version 6.

FreeBSD version 8 includes a much-updated implementation of ZFS, and zpool version 13 is supported in FreeBSD release 8.0.[124] zpool version 14 support was added to the 8-STABLE branch on January 11, 2010,[125] and is included in FreeBSD release 8.1. zpool version 15 is supported in release 8.2.[126] The 8-STABLE branch gained support for zpool version 28 and zfs version 5 in early June 2011.[127] These changes were released in mid-April 2012 with FreeBSD 8.3.[128]

FreeBSD 9.0-RELEASE uses ZFS Pool version 28.[129][130]

MidnightBSD

MidnightBSD, a desktop operating system derived from FreeBSD, supports ZFS storage pool version 6 as of 0.3-RELEASE. This was derived from code included in FreeBSD 7.0-RELEASE. An update to storage pool 28 is in progress in 0.4-CURRENT and based on 9-STABLE sources around FreeBSD 9.1-RELEASE code.

PC-BSD

PC-BSD is a desktop version of FreeBSD, which inherits FreeBSD's ZFS support, similarly to FreeNAS. It also allows installation with disk encryption using geli. Its graphical installer can handle / (root) on ZFS, RAID-Z pools and Gnome installs right from the start. The current PC-BSD 9.0+ "Isotope Edition" has ZFS filesystem version 5 and ZFS storage pool version 28.

FreeNAS

FreeNAS, an embedded open source network-attached storage (NAS) distribution based on FreeBSD, has the same ZFS support as FreeBSD and PC-BSD.

NAS4Free

NAS4Free, an embedded open source network-attached storage (NAS) distribution based on FreeBSD 9.0, has the same ZFS support as FreeBSD 9.0, with ZFS storage pool version 28. This project is a continuation of the FreeNAS 7 series project.[131]

Debian GNU/kFreeBSD

Being based on the FreeBSD kernel, Debian GNU/kFreeBSD has ZFS support from the kernel. However, additional userland tools are required.[132] ZFS can be used as the root or /boot file system,[133] in which case the required GRUB configuration has been performed by the Debian installer since the Wheezy release.[134]

As of 31 January 2013, the ZPool version available is 14 for the Squeeze release, and 28 for the Wheezy-9 release.[135]

Linux

ZFS has several Linux implementations, despite the fact that the GNU General Public License (GPL), which governs the Linux kernel, is incompatible with Sun's Common Development and Distribution License (CDDL), under which ZFS is distributed.[136] Under these licensing models, a single derived work of both projects cannot be legally distributed, as it is not possible to meet both licenses' requirements simultaneously.[137] To include ZFS in the Linux kernel, it would have to be cleanly reimplemented, and patents may hamper this.[138]

This problem is being worked around by providing the kernel facilities through a separate kernel module, a technical solution for a legal problem that is also being employed by vendors and distributors of proprietary hardware drivers.

Native ZFS on Linux

A native port of ZFS for Linux produced by the Lawrence Livermore National Laboratory (LLNL) was released in March 2013,[139][140] with the following key events:[141]

  • 2008: prototype to determine viability
  • 2009: initial ZVOL and Lustre support
  • 2010: development moved to GitHub
  • 2011: POSIX layer added
  • 2011: community of early adopters
  • 2012: production usage of ZFS
  • 2013: stable GA release.

Of the major distributions, Ubuntu and Gentoo have very good support for ZFS on Linux, meaning that required packages can be installed from their own package repositories, and configuring a ZFS root filesystem is well documented.[142][143]

Linux FUSE

Another solution to this problem was to port ZFS to Linux's FUSE system so the filesystem runs in userspace instead, where it is not considered a derived work of the kernel. A project to do this was sponsored by Google's Summer of Code program in 2006.[144]

KQ InfoTech

Another native port for Linux was developed by KQ InfoTech in 2010.[145][146] This port used the zvol implementation from the Lawrence Livermore National Laboratory as a starting point. A release supporting zpool v28 was announced in January 2011.[147] In April 2011, KQ Infotech was acquired by sTec, Inc., and their work on ZFS ceased.[148] Source code of this port can be found on GitHub.[149]

The work of KQ InfoTech was pulled back into the native port of ZFS for Linux, produced by the Lawrence Livermore National Laboratory.[148]

Comparisons

List of operating systems, distributions and add-ons that support ZFS, with the zpool version each supports and the Solaris build each is based on (if any):

OS | Zpool version | Sun/Oracle Build # | Comments
Oracle Solaris 11 2011.11 | 34 | b175 |
Oracle Solaris Express 11 2010.11 | 31 | b151a | licensed for testing only
OpenSolaris 2009.06 | 14 | b111b |
OpenSolaris (last dev) | 22 | b134 |
OpenIndiana | 5000 | b147 | OpenIndiana creates a name clash with naming their code b151a
Nexenta Core 3.0.1 | 26 | b134+ | GNU userland
NexentaStor Community 3.0.1 | 26 | b134+ | up to 18 TB, web admin
NexentaStor Community 3.1.0 | 28 | b134+ | GNU userland
NexentaStor Enterprise | 28 | b134+ | not free, web admin
GNU/kFreeBSD "Squeeze" (as of 1/31/2013) | 14 | | Requires package "zfsutils"
GNU/kFreeBSD "Wheezy-9" (as of 2/21/2013) | 28 | | Requires package "zfsutils"
FreeBSD 8.3-RELEASE / 9.1-RELEASE | 28 | |
FreeBSD 8.4-RELEASE / 9.1-STABLE / 10.0-CURRENT | 5000 | |
zfs-fuse 0.7.0 | 23 | | low performance
LLNL's ZFS on Linux 0.6.1 | 5000 | | 0.6.0 release candidate has POSIX layer
KQ Infotech's ZFS on Linux | 28 | | Now dead and has been rolled into LLNL.
Belenix 0.8b1 | 14 | b111 |
Schillix 0.7.2 | 28 | b147 |
StormOS "hail" | | | based on Nexenta
Jaris | | | Japanese
MilaX 0.5 | 20 | b128a | small size
FreeNAS 8.0.2 / 8.2 | 15 | |
FreeNAS 8.3.0 | 28 | | based on FreeBSD 8.3
FreeNAS 9.1.0 | 5000 | | based on FreeBSD 9.1
NAS4Free 9.1.0.1 | 28 | | based on FreeBSD 9.1
Korona 4.5.0 | 22 | b134 | KDE
EON NAS (v0.6) | 22 | b130 | embedded NAS
EON NAS (v1.0beta) | 28 | b151a | embedded NAS
napp-it | 28/5000 | Illumos/Solaris | Storage appliance with Web-UI built on Nexenta Illumian, OpenIndiana, OmniOS or Solaris 11.1
OmniOS | 28/5000 | Illumos b151+ | minimal storage server distribution based on Illumos
SmartOS | 28/5000 | Illumos b151+ | minimal live distribution based on Illumos (boots from USB/CD) suited for cloud and hypervisor use (KVM)
OS X 10.5, 10.6, 10.7, and 10.8 | 8 | | MacZFS
OS X 10.6, 10.7 and 10.8 | 28 | | ZEVO
NetBSD | 13 | |
MidnightBSD | 6 | |

References

  1. ^ a b "What's new in Solaris 11 Express 2010.11" (PDF). Oracle. Retrieved November 17, 2010.
  2. ^ "1.1 What about the licensing issue?". Retrieved November 18, 2010.
  3. ^ "Sun Trademarks – ZFS". Sun Microsystems.
  4. ^ "Status Information for Serial Number 85901629 (ZFS)". United States Patent and Trademark Office. Retrieved October 21, 2013.
  5. ^ a b c "OpenZFS". OpenZFS. Retrieved September 19, 2013.
  6. ^ a b "OpenZFS Announcement". OpenZFS. September 17, 2013. Retrieved September 19, 2013.
  7. ^ "ZFS: the last word in file systems". Sun Microsystems. September 14, 2004. Archived from the original on April 28, 2006. Retrieved April 30, 2006.
  8. ^ Matthew Ahrens (November 1, 2011). "ZFS 10 year anniversary". Retrieved July 24, 2012.
  9. ^ Jeff Bonwick (October 31, 2005). "ZFS: The Last Word in Filesystems". Jeff Bonwick's Blog. Retrieved June 13, 2013.
  10. ^ "Sun Celebrates Successful One-Year Anniversary of OpenSolaris". Sun Microsystems. June 20, 2006.
  11. ^ "ZFS FAQ at OpenSolaris.org". Sun Microsystems. Retrieved May 18, 2011. The largest SI prefix we liked was 'zetta' ('yotta' was out of the question)
  12. ^ Jeff Bonwick (May 3, 2006). "You say zeta, I say zetta". Jeff Bonwick's Blog. Retrieved April 23, 2012.
  13. ^ "Porting ZFS to OSX". zfs-discuss. April 27, 2006. Retrieved April 30, 2006.
  14. ^ "Apple: Leopard offers limited ZFS read-only". MacNN. June 12, 2007. Retrieved June 23, 2007.
  15. ^ "Apple delivers ZFS Read/Write Developer Preview 1.1 for Leopard". Ars Technica. October 7, 2007. Retrieved October 7, 2007.
  16. ^ Ché Kristo (November 18, 2007). "ZFS Beta Seed v1.1 will not install on Leopard.1 (10.5.1) " ideas are free". Retrieved December 30, 2007.[dead link]
  17. ^ ZFS.macosforge.org
  18. ^ "Alblue.blogspot.com". http://alblue.blogspot.com/2008/11/zfs-119-on-mac-os-x.html
  19. ^ "Snow Leopard (archive.org cache)". July 21, 2008.
  20. ^ "Snow Leopard". June 9, 2009. Retrieved June 10, 2008.
  21. ^ "maczfs – Official Site for the Free ZFS for Mac OS – Google Project Hosting". Google. Retrieved July 30, 2012.
  22. ^ "zfs-macos | Google Groups". Google. Retrieved November 4, 2011.
  23. ^ "Press Center « Ten's Complement LLC". Retrieved August 10, 2012.
  24. ^ "Ten's Complement LLC: Overview | LinkedIn". Retrieved August 10, 2012.
  25. ^ "Don J Brady (TensComplement) on Twitter". Retrieved August 10, 2012.
  26. ^ "How ZFS is slowly making its way to Mac OS X". March 18, 2011. Retrieved March 19, 2011.
  27. ^ "Z E V O: ZFS For Mac is Slowly Coming Back · Col's Tech". Retrieved August 10, 2012.
  28. ^ "Our Products « Ten's Complement LLC". http://tenscomplement.com/our-products
  29. ^ "GreenBytes Welcomes ZEVO and Don Brady". http://www.getgreenbytes.com/blog/bid/80758/GreenBytes-Welcomes-ZEVO-and-Don-Brady
  30. ^ "#GreenBytes and the Future of #ZEVO". http://www.getgreenbytes.com/blog/bid/80923/GreenBytes-and-the-Future-of-ZEVO
  31. ^ "Please upgrade to 1.0.3 : ZEVO Support". Retrieved July 27, 2012.
  32. ^ "Our Products « Ten's Complement LLC". Retrieved August 10, 2012.
  33. ^ "roddi / ZCE-CDDL-FILES". Retrieved September 22, 2012.
  34. ^ "zevo.getgreenbytes.com • View topic - Source Code?".
  35. ^ Harris, Robin (June 20, 2013). "Can Mac ZFS be saved?". StorageMojo. TechnoQWAN LLC. Retrieved June 20, 2013.
  36. ^ Greenbytes, Inc. (September 10, 2013). "Twitter / GetGreenBytes: @ylluminate George, we're happy…". Twitter. Retrieved September 10, 2013.
  37. ^ "Distribution – OpenZFS". OpenZFS. Retrieved September 17, 2013.
  38. ^ "Solaris ZFS Administration Guide, Appendix A ZFS Version Descriptions". Oracle Corporation. 2010. Retrieved February 11, 2011.
  39. ^ "Oracle Solaris ZFS Version Descriptions". Oracle Corporation. Retrieved September 23, 2013.
  40. ^ a b "Features – OpenZFS – Feature flags". OpenZFS. Retrieved September 22, 2013.
  41. ^ Siden, Christopher (2012). "ZFS Feature Flags" (PDF). Illumos Meetup. Delphix. p. 4. Retrieved September 22, 2013.
  42. ^ "/usr/src/uts/common/sys/fs/zfs.h (line 338)". illumos (GitHub). Retrieved November 16, 2013.
  43. ^ "/usr/src/uts/common/fs/zfs/zfeature.c (line 89)". illumos (GitHub). Retrieved November 16, 2013.
  44. ^ a b c "While under Sun Microsystems' control, there were bi-weekly snapshots of Solaris Nevada (the codename for the next-generation Solaris OS to eventually succeed Solaris 10) and this new code was then pulled into new OpenSolaris preview snapshots available at Genunix.org. The stable releases of OpenSolaris are based off of these Nevada builds." Larabel, Michael. "It Looks Like Oracle Will Stand Behind OpenSolaris". Phoronix Media. Retrieved November 21, 2012.
  45. ^ Ljubuncic, Igor (May 23, 2011). "OpenIndiana — there's still hope". DistroWatch.
  46. ^ "Welcome to Project OpenIndiana!". Project OpenIndiana. September 10, 2010. Retrieved September 14, 2010.
  47. ^ The Extended file system (Ext) has metadata structure copied from UFS. "Rémy Card (Interview, April 1998)". April Association. April 19, 1999. Retrieved February 8, 2012. (In French)
  48. ^ Vijayan Prabhakaran (2006). "IRON FILE SYSTEMS" (PDF). Doctor of Philosophy in Computer Sciences. University of Wisconsin-Madison. Retrieved June 9, 2012.
  49. ^ "Parity Lost and Parity Regained".
  50. ^ "An Analysis of Data Corruption in the Storage Stack" (PDF).
  51. ^ "Impact of Disk Corruption on Open-Source DBMS" (PDF).
  52. ^ "Baarf.com". Baarf.com. Retrieved November 4, 2011.
  53. ^ Kadav, Asim; Rajimwale, Abhishek. "Reliability Analysis of ZFS" (PDF).
  54. ^ Yupu Zhang, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. "End-to-end Data Integrity for File Systems: A ZFS Case Study" (PDF). Madison: Computer Sciences Department, University of Wisconsin. p. 14. Retrieved December 6, 2010.
  55. ^ Larabel, Michael. "Benchmarking ZFS and UFS On FreeBSD vs. EXT4 & Btrfs On Linux". Phoronix Media 2012. Retrieved November 21, 2012.
  56. ^ Larabel, Michael. "Can DragonFlyBSD's HAMMER Compete With Btrfs, ZFS?". Phoronix Media 2012. Retrieved November 21, 2012.
  57. ^ Curtis E. Stevens (2011). "Advanced Format in Legacy Infrastructures: More Transparent than Disruptive" (PDF). idema.org. Retrieved November 5, 2013.
  58. ^ "Enterprise Performance 15K HDD: Data Sheet" (PDF). Seagate. 2013. Retrieved October 24, 2013.
  59. ^ "WD Xe: Datacenter hard drives" (PDF). Western Digital. 2013. Retrieved October 24, 2013.
  60. ^ "WD Re: Datacenter Capacity HDD" (PDF). Western Digital. 2013. Retrieved November 14, 2013.
  61. ^ "Cheetah 10K.7 FC Product Manual" (PDF). Seagate. August 5, 2005. Archived from the original (PDF) on June 12, 2009. Retrieved August 29, 2013.
  62. ^ Lakshmi N. Bairavasundaram (2007). "An Analysis of Latent Sector Errors in Disk Drives". Proceedings of the International Conference on Measurements and Modeling of Computer Systems (SIGMETRICS'07). San Diego, California, United States: ACM: 289–300. doi:10.1145/1254882.1254917. Retrieved June 9, 2012.
  63. ^ David S. H. Rosenthal (October 1, 2010). "Keeping Bits Safe: How Hard Can It Be?". ACM Queue. Retrieved August 29, 2013.
  64. ^ Eric Lowe (16). "ZFS saves the day(-ta)!" (Blog). Oracle - Core Dumps of a Kernel Hacker's Brain - Eric Lowe's Blog. Oracle. Retrieved 9 June 2012.
  65. ^ bcantrill (31). "Shouting in the Datacenter" (Video file). YouTube. Google. Retrieved 9 June 2012.
  66. ^ jforonda (31). "Faulty FC port meets ZFS" (Blog). Blogger - Outside the Box. Google. Retrieved 9 June 2012.
  67. ^ http://www.usenix.org/event/fast08/tech/full_papers/jiang/jiang.pdf
  68. ^ Bernd Panzer-Steindel (8). "Draft 1.3". Data integrity. CERN. Retrieved 9 June 2012.
  69. ^ "Observations on Errors, Corrections, & Trust of Dependent Systems".
  70. ^ "Silent data corruption in disk arrays: A solution" (PDF). NEC. 2009. Retrieved October 24, 2013.
  71. ^ a b "A Conversation with Jeff Bonwick and Bill Moore". Association for Computing Machinery. November 15, 2007. Retrieved December 6, 2010.
  72. ^ a b Bonwick, Jeff (December 9, 2005). "ZFS End-to-End Data Integrity".
  73. ^ Bonwick, Jeff (December 8, 2005). "ZFS End-to-End Data Integrity". Retrieved September 19, 2013.
  74. ^ Cook, Tim (November 16, 2009). "Demonstrating ZFS Self-Healing".
  75. ^ Ranch, Richard (May 4, 2007). "ZFS, copies, and data protection".
  76. ^ "Difference between Desktop edition and RAID (Enterprise) edition drives".
  77. ^ "Actually it's a n-way mirror". c0t0d0s0.org. September 4, 2013. Retrieved November 19, 2013.
  78. ^ "No fsck utility equivalent exists for ZFS. This utility has traditionally served two purposes, those of file system repair and file system validation." "Checking ZFS File System Integrity". Oracle. Retrieved November 25, 2012.
  79. ^ "If you have consumer-quality drives, consider a weekly scrubbing schedule. If you have datacenter-quality drives, consider a monthly scrubbing schedule. " "ZFS Scrubs". freenas.org. Retrieved November 25, 2012.
  80. ^ "You should also run a scrub prior to replacing devices or temporarily reducing a pool's redundancy to ensure that all devices are currently operational." "ZFS Best Practices Guide". solarisinternals.com. Retrieved November 25, 2012.
  81. ^ David Cummins (4). "No, ZFS really doesn't need a fsck" (Blog). No, ZFS really doesn't need... C0T0D0S0.ORG. Oracle. Retrieved 9 June 2012.
  82. ^ "Solaris ZFS Administration Guide". Oracle Corporation. Retrieved February 11, 2011.
  83. ^ "ZFS Best Practices Guide". Solaris Performance Wiki. Retrieved October 2, 2007.
  84. ^ Leventhal, Adam. "Bug ID: 6854612 triple-parity RAID-Z". Sun Microsystems. Retrieved July 17, 2009.
  85. ^ Leventhal, Adam (July 16, 2009). "6854612 triple-parity RAID-Z". zfs-discuss (Mailing list). Retrieved July 17, 2009.
  86. ^ "WHEN TO (AND NOT TO) USE RAID-Z". Oracle. Retrieved May 13, 2013.
  87. ^ "Solaris ZFS Enables Hybrid Storage Pools—Shatters Economic and Performance Barriers" (PDF). Sun.com. September 7, 2010. Retrieved November 4, 2011.
  88. ^ "Brendan's blog » ZFS L2ARC". Dtrace.org. Retrieved October 5, 2012.
  89. ^ "Solaris ZFS Performance Tuning: Synchronous Writes and the ZIL". Constantin.glez.de. July 20, 2010. Retrieved October 5, 2012.
  90. ^ Jeff Bonwick (October 31, 2005). "ZFS: The Last Word in Filesystems". Jeff Bonwick's Blog. Retrieved June 22, 2013.
  91. ^ "Solaris ZFS Administration Guide". Oracle Corporation. Retrieved February 11, 2011.
  92. ^ "ZFS On-Disk Specification" (PDF). Sun Microsystems, Inc. 2006. See section 2.4.
  93. ^ "Unix.com". Unix.com. November 13, 2007. Retrieved November 4, 2011.
  94. ^ "ZFS Deduplication".
  95. ^ Gary Sims (4). "Building ZFS Based Network Attached Storage Using FreeNAS 8" (Blog). TrainSignal Training. TrainSignal, Inc. Retrieved 9 June 2012.
  96. ^ Ray Van Dolson (May 2011). "[zfs-discuss] Summary: Deduplication Memory Requirements". zfs-discuss mailing list.
  97. ^ "ZFSTuningGuide".
  98. ^ Chris Mellor (October 12, 2012). "GreenBytes brandishes full-fat clone VDI pumper". The Register. Retrieved August 29, 2013.
  99. ^ Chris Mellor (June 1, 2012). "Newcomer gets out its box, plans to sell it cheaply to all comers". The Register. Retrieved August 29, 2013.
  100. ^ "Encrypting ZFS File Systems".
  101. ^ "Having my secured cake and Cloning it too (aka Encryption + Dedup with ZFS)".
  102. ^ "Solaris ZFS Administration Guide". Chapter 6 Managing ZFS File Systems. Retrieved March 17, 2009.[dead link]
  103. ^ "Smokin' Mirrors". Jeff Bonwick's Weblog. May 2, 2006. Retrieved February 23, 2007.
  104. ^ "ZFS Block Allocation". Jeff Bonwick's Weblog. November 4, 2006. Retrieved February 23, 2007.
  105. ^ "Ditto Blocks — The Amazing Tape Repellent". Flippin' off bits Weblog. May 12, 2006. Retrieved March 1, 2007.
  106. ^ "Adding new disks and ditto block behaviour". Retrieved October 19, 2009.
  107. ^ "OpenSolaris.org". Sun Microsystems. Retrieved May 22, 2009.
  108. ^ "Jeff Bonwick Keynote at Kernel Conference Australia 2009". Blogs.sun.com. September 28, 2009. Retrieved November 4, 2011.
  109. ^ "Bug ID 4852783: reduce pool capacity". OpenSolaris Project. Retrieved March 28, 2009.
  110. ^ Goebbels, Mario (April 19, 2007). "Permanently removing vdevs from a pool". zfs-discuss (Mailing list).
  111. ^ "Expand-O-Matic RAID-Z". Adam Leventhal. April 7, 2008.
  112. ^ "zpool(1M)". Download.oracle.com. June 11, 2010. Retrieved November 4, 2011.
  113. ^ http://dtrace.org/blogs/ahl/2009/07/21/triple-parity-raid-z/
  114. ^ Bonwick, Jeff. "Smokin' Mirrors (1 May 2006)". Jeff Bonwick's Blog. Oracle. Retrieved February 13, 2012.
  115. ^ brendan (December 2, 2008). "A quarter million NFS IOPS". Oracle Sun. Retrieved January 28, 2012.
  116. ^ "Platform code differences". OpenZFS. Retrieved September 20, 2013.
  117. ^ "Upgrading from OpenSolaris". Retrieved September 24, 2011.
  118. ^ "ZEVO Wiki Site/ZFS Pool And Filesystem Versions". GreenBytes, Inc. September 15, 2012. Retrieved September 22, 2013.
  119. ^ "NetBSD Google Summer of Code projects: ZFS".
  120. ^ Dawidek, Paweł (April 6, 2007). "ZFS committed to the FreeBSD base". Retrieved April 6, 2007.
  121. ^ "Revision 192498". May 20, 2009. Retrieved May 22, 2009.
  122. ^ "ZFS v13 in 7-STABLE". May 21, 2009. Retrieved May 22, 2009.
  123. ^ "iSCSI target for FreeBSD". Retrieved August 6, 2011.
  124. ^ "FreeBSD 8.0-RELEASE Release Notes". FreeBSD. Retrieved November 27, 2009.
  125. ^ "FreeBSD 8.0-STABLE Subversion logs". FreeBSD. Retrieved February 5, 2010.
  126. ^ "FreeBSD 8.2-RELEASE Release Notes". FreeBSD. Retrieved March 9, 2011.
  127. ^ "HEADS UP: ZFS v28 merged to 8-STABLE". June 6, 2011. Retrieved June 11, 2011.
  128. ^ "FreeBSD 8.3-RELEASE Announcement". Retrieved June 11, 2012.
  129. ^ Pawel Jakub Dawidek. "ZFS v28 is ready for wider testing". Retrieved August 31, 2010.
  130. ^ "FreeBSD 9.0-RELEASE Release Notes". FreeBSD. Retrieved January 12, 2012.
  131. ^ "NAS4Free - The Free Network Attached Storage Project". Project web site. Retrieved August 29, 2013.
  132. ^ "Debian GNU/kFreeBSD FAQ". Is there ZFS support?. Retrieved September 24, 2013.
  133. ^ "Debian GNU/kFreeBSD FAQ". Can I use ZFS as root or /boot file system?. Retrieved September 24, 2013.
  134. ^ "Debian GNU/kFreeBSD FAQ". What grub commands are necessary to boot Debian/kFreeBSD from a zfs root?. Retrieved September 24, 2013.
  135. ^ Larabel, Michael (September 10, 2010). "Debian GNU/kFreeBSD Becomes More Interesting". Phoronix. Retrieved September 24, 2013.
  136. ^ Aditya Rajgarhia and Ashish Gehani (November 23, 2012). "Performance and Extension of User Space File Systems" (PDF).
  137. ^ "Linus on GPLv3 and ZFS". Lwn.net. June 12, 2007. Retrieved November 4, 2011.
  138. ^ Jeremy Andrews (April 19, 2007). "Linux: ZFS, Licenses and Patents". Archived from the original on June 12, 2011. Retrieved April 21, 2007.
  139. ^ Behlendorf, Brian (May 28, 2013). "spl/zfs-0.6.1 released". zfs-announce mailing list. Retrieved October 9, 2013.
  140. ^ "ZFS on Linux". Retrieved August 29, 2013.
  141. ^ Matt Ahrens; Brian Behlendorf (September 17, 2013). "LinuxCon 2013: OpenZFS" (PDF). linuxfoundation.org. Retrieved November 13, 2013.
  142. ^ "Ubuntu Wiki". ZFS. ubuntu.com. Retrieved October 9, 2013.
  143. ^ "Gentoo Wiki". ZFS. gentoo.org. Retrieved October 9, 2013.
  144. ^ Ricardo Correia (September 13, 2008). "ZFS on FUSE/Linux". Retrieved November 13, 2013.
  145. ^ Darshin (August 24, 2010). "ZFS Port to Linux (all versions)". Retrieved August 31, 2010.
  146. ^ "Where can I get the ZFS for Linux source code?". Archived from the original on October 8, 2011. Retrieved August 29, 2013.
  147. ^ Phoronix (November 22, 2010). "Running The Native ZFS Linux Kernel Module, Plus Benchmarks". Retrieved December 7, 2010.
  148. ^ a b "KQ ZFS Linux Is No Longer Actively Being Worked On". June 10, 2011.
  149. ^ "zfs-linux / zfs".

Bibliography