Disk encryption theory: Difference between revisions

Revision as of 12:53, 3 September 2008

Disk encryption is a special case of data at rest protection when the storage media is a sector-addressable device (e.g., a hard disk). This article presents cryptographic aspects of the problem. For discussion of different software packages and hardware devices devoted to this problem see disk encryption software and disk encryption hardware.

Problem definition

Implementation of encrypted data storage on a sector-level–random-access device faces several constraints:

implementation shall efficiently encrypt and decrypt data in any sector,
implementation shall use only constant amount of additional storage for a device of arbitrary size.

The strongest definition of security is as follows: An implementation is secure if an adversary who can observe the raw device, provide plaintexts to be stored, and modify some ciphertexts cannot deduce any information about the plaintext of any sector, except whether sector $i$ at time $t_{0}$ held the same data as the same sector $i$ at time $t_{1}$ . (This information cannot be hidden if each plain sector update is to modify only a single encrypted sector.)

The above definition addresses only the confidentiality requirement. The integrity requirement is that any unauthorized modification of the raw device shall be noticed before the modified data is used. Unfortunately, it is impossible to implement this requirement in its totality, because an adversary can roll back the device to one of its previous states. Ignoring rollback attacks on separate sectors, this requirement is straightforward to implement by storing and verifying a MAC tag for each sector: $t_{i}=M(i\|d_{i})$ , where $i$ is the number of the sector and $d_{i}$ is the data stored in the sector. In real life this scheme is almost never implemented, arguably because advanced disk encryption methods (such as XEX, LRW, or CMC / EME) support pseudo-integrity: an adversary can replace some block (for XEX or LRW) or sector (for CMC / EME) with one of its previous values, but if he substitutes any other data (e.g., a previous value of some other block or sector) then the decryption will be random data which the adversary cannot predict. Note that preventing inconsistent rollbacks (an attack where some sectors are replaced with data they used to contain whereas other sectors are not changed) requires the use of a hash tree with a MAC of the root.
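
The per-sector tag $t_{i}=M(i\|d_{i})$ can be sketched as follows. This is a minimal illustration only: HMAC-SHA-256 stands in for the unspecified MAC $M$ , and the names `sector_tag`, `verify_sector` and `mac_key` are hypothetical. It shows that binding the sector number into the tag detects a sector copied to another position, while a rollback of a sector together with its old tag still verifies.

```python
import hashlib
import hmac


def sector_tag(mac_key: bytes, index: int, data: bytes) -> bytes:
    """Compute t_i = M(i || d_i); HMAC-SHA-256 is an assumed stand-in for M."""
    message = index.to_bytes(8, "big") + data
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_sector(mac_key: bytes, index: int, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sector_tag(mac_key, index, data), tag)


mac_key = b"hypothetical MAC key"
old_data = b"A" * 512
new_data = b"B" * 512
old_tag = sector_tag(mac_key, 3, old_data)
new_tag = sector_tag(mac_key, 3, new_data)

# Sector 3 (data and tag) copied over sector 4: the index no longer matches.
moved_ok = verify_sector(mac_key, 4, old_data, old_tag)
# Sector 3 rolled back to its own earlier data and tag: this is not detected.
rollback_ok = verify_sector(mac_key, 3, old_data, old_tag)
```

Here `moved_ok` is false and `rollback_ok` is true, which matches the limitation described above: a per-sector MAC alone cannot detect a rollback.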

Simple approaches

Any implementation can be “secure” depending on the threat model. For example, to “protect” a PC disk partition from a “kid sister” it is enough to change the type of the partition (see MBR) using any low-level disk editor, which makes the partition unreadable by standard tools. Of course, any computer-savvy attacker will easily break such “protection,” but an attacker who does not know what an MBR is, or how to edit it, will not be able to read any information from the “hidden” partition. At the other extreme, an implementation may be designed to prevent traitor tracing, that is, to protect against an adversary who only wants to confirm that some particular file (crafted by the adversary) is actually stored somewhere on the user's computer.

The minimal addressable part of a disk is called a sector. On many systems each sector has 512 bytes, although there are some exceptions, for example, AS/400 uses 520-byte sectors. A block cipher operates on a single block (commonly, 8 or 16 bytes), and thus encrypting all $k$ blocks in the sector requires some mode of operation ( $k=64$ or $k=32$ in a 512-byte sector). Since the ECB mode always encrypts the same plaintext block into the same ciphertext, it reveals data patterns and is thus vulnerable to watermarking attacks. Other simple modes of operation (CBC, CFB, OFB and CTR) require an IV (Initialization Vector—an auxiliary random input) for each chunk of blocks which are to be encrypted independently.

Although it is possible in practice to use the counter (CTR) mode with a single IV per volume, such use is insecure if an adversary is able to gather several encrypted versions of the same sector (e.g., snapshots taken at different times). In this mode of operation, the ciphertext of each fixed sector is the plaintext XORed with a fixed value ( $C=P\oplus V$ ), so given two ciphertexts of the same sector the adversary knows that $C\oplus C'=P\oplus P'$ , and thus (if enough is known about the possible plaintexts, e.g., that they come from a file with English text) cryptanalysis is straightforward. It is not always easy to tell whether this threat model applies. For example, such a scheme can protect the hard disk of a laptop so that, if it is stolen (only once!), no data can be recovered. However, even this is not foolproof: modern hard disks can often anticipate the failure of a sector, remap it to a new one, and stop using the damaged sector, which then retains an older ciphertext. Similarly, if the encrypted volume is stored as a file, the inner workings of journaling file systems may leave several versions of (some sectors of) the encrypted volume available to an adversary.
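
The relation $C\oplus C'=P\oplus P'$ can be checked with a small sketch. The keystream below is derived with SHA-256 purely as a stand-in for a block cipher in counter mode; the function names, the key and the sample plaintexts are assumptions made for illustration.

```python
import hashlib


def keystream(key: bytes, sector: int, length: int) -> bytes:
    """Per-sector keystream V; it depends only on the key and the sector number."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = key + sector.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return out[:length]


def ctr_encrypt(key: bytes, sector: int, plaintext: bytes) -> bytes:
    """C = P xor V, with the same V every time this sector is written."""
    stream = keystream(key, sector, len(plaintext))
    return bytes(p ^ v for p, v in zip(plaintext, stream))


key = b"hypothetical volume key"
old = b"attack at dawn!!"
new = b"attack at dusk!!"

c_old = ctr_encrypt(key, 7, old)  # snapshot taken before the update
c_new = ctr_encrypt(key, 7, new)  # snapshot taken after the update

# The adversary never sees the key, yet learns P xor P' exactly.
leak = bytes(a ^ b for a, b in zip(c_old, c_new))
```

Every zero byte in `leak` marks a position where the two plaintext versions agree, and the nonzero bytes give the XOR of the differing characters.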

CBC-based approaches

Despite its deficiencies (described below), the CBC (Cipher Block Chaining) mode is still the most commonly used mode for disk encryption. Since no auxiliary storage is available for a per-sector IV, the IV is derived from the sector number, the sector content, and some static information. Several such methods have been proposed and used.

The simplest method is to encrypt each sector in CBC mode ( $C_{i}=E(C_{i-1}\oplus P_{i})$ ) using the (padded) sector number as the initialization vector (IV): $C_{-1}^{(n)}=n\|0$ . Here the IV is not secret, and thus this scheme is vulnerable to a watermarking attack: if, for example, sector 6 has $P_{0}^{(6)}=0$ and the next sector has $P_{0}^{(7)}=1$ , then $C_{0}^{(6)}=E(6\oplus 0)=E(7\oplus 1)=C_{0}^{(7)}$ . So if the user stores a specially crafted file (sent to him by an adversary), the adversary obtains proof that the file is indeed stored on the device.
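
The watermark can be reproduced with a short sketch. Since the Python standard library has no AES, a toy 4-round Feistel permutation built from SHA-256 stands in for the block cipher $E$ ; it is an assumption for illustration and not a secure cipher. The padding of the sector number (big-endian, 16 bytes) is likewise an assumed convention.

```python
import hashlib

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for a real block cipher."""
    left, right = block[:8], block[8:]
    for r in range(4):
        f = hashlib.sha256(key + bytes([r]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def cbc_encrypt_sector(key: bytes, n: int, plaintext: bytes) -> bytes:
    """CBC with the public IV C_{-1} = padded sector number n."""
    prev = n.to_bytes(BLOCK, "big")
    out = []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt(key, xor(prev, plaintext[i:i + BLOCK]))
        out.append(prev)
    return b"".join(out)


key = b"hypothetical key"
filler = b"\x00" * (512 - BLOCK)

# Crafted plaintexts: P_0 = 0 in sector 6 and P_0 = 1 in sector 7.
c6 = cbc_encrypt_sector(key, 6, (0).to_bytes(BLOCK, "big") + filler)
c7 = cbc_encrypt_sector(key, 7, (1).to_bytes(BLOCK, "big") + filler)

# 6 xor 0 == 7 xor 1, so both first blocks are E(6) and the ciphertexts collide.
watermark_visible = c6[:BLOCK] == c7[:BLOCK]
```

Because the remaining blocks of the two crafted sectors are identical, the collision in the first block propagates and the two sectors encrypt to the same ciphertext, which an observer of the raw device can spot without knowing the key.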

In order to prevent this attack, ESSIV was introduced: $C_{-1}^{({\textrm {Sector}})}=E_{\textrm {Salt}}({\textrm {Sector}})$ , where the salt is a hash of the encryption key. Unfortunately, since $C_{i}$ does not depend on any later plaintext block $P_{i+1},\ldots ,P_{k-1}$ , it follows that if only a block at the end of a sector is changed then all the preceding ciphertext blocks stay the same, and thus an adversary who sees the same sector before and after such a change knows that only part of it was changed.
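
The partial-change leak of CBC with an ESSIV-style IV can be illustrated as follows. As a sketch under stated assumptions, the IV is computed as the encryption of the sector number under a key derived by hashing the volume key, SHA-256 is the assumed hash, and a toy Feistel permutation stands in for the block cipher.

```python
import hashlib

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for a real block cipher."""
    left, right = block[:8], block[8:]
    for r in range(4):
        f = hashlib.sha256(key + bytes([r]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def essiv(key: bytes, n: int) -> bytes:
    """Secret IV: the sector number encrypted under a hash of the key."""
    salt = hashlib.sha256(key).digest()
    return toy_encrypt(salt, n.to_bytes(BLOCK, "big"))


def cbc_essiv_encrypt(key: bytes, n: int, plaintext: bytes) -> bytes:
    prev = essiv(key, n)
    out = []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt(key, xor(prev, plaintext[i:i + BLOCK]))
        out.append(prev)
    return b"".join(out)


key = b"hypothetical key"
before = b"\x00" * 512
after = before[:-BLOCK] + b"\x01" * BLOCK  # only the last block is modified

c_before = cbc_essiv_encrypt(key, 6, before)
c_after = cbc_essiv_encrypt(key, 6, after)

prefix_unchanged = c_before[:-BLOCK] == c_after[:-BLOCK]
last_block_changed = c_before[-BLOCK:] != c_after[-BLOCK:]
```

Both flags are true: an observer comparing the two snapshots sees that only the final 16 bytes of the sector were rewritten.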

It is possible to prevent this attack by deriving the IV from the data stored in the sector. One approach is to use a hash of all the blocks starting from the second one (counting from zero it has number 1): $C_{-1}=H({\textrm {Salt}}\|n\|P_{1}\|\ldots \|P_{k-1})$ . Since in order to decrypt $P_{1}$ one only needs $C_{0}$ and $C_{1}$ , one can decrypt $P_{1},\ldots ,P_{k-1}$ , calculate $C_{-1}$ and then decrypt $P_{0}$ . With this method, a change of any plaintext block inside a sector can change all the ciphertexts. Unfortunately, this method is about twice as slow as the previous one: each block (except the first one) has to be processed twice. And still there is an attack against it (and against all the CBC-based approaches): suppose that an attacker is allowed to read some files on the device (but not all of them) and can modify the ciphertext. Using these capabilities he can read $P_{1},\ldots ,P_{k-1}$ of any sector: he overwrites the ciphertext of a sector he is allowed to read with the ciphertext of the target sector and asks the system to decrypt his sector. If $C_{-1}$ depends on $n$ then the first block decrypts to garbage, but all the other blocks depend only on the ciphertext, and thus he receives the original plaintext.
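
The content-derived IV and the ciphertext-relocation attack can be sketched together. The toy Feistel permutation, the use of SHA-256 for $H$ , the 64-byte sector size and all names below are assumptions chosen to keep the illustration short.

```python
import hashlib

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, r: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([r]) + half).digest()[:8]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for a real block cipher."""
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, xor(left, _f(key, r, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = xor(right, _f(key, r, left)), left
    return left + right


def derive_iv(salt: bytes, n: int, tail: bytes) -> bytes:
    """C_{-1} = H(Salt || n || P_1 || ... || P_{k-1})."""
    return hashlib.sha256(salt + n.to_bytes(8, "big") + tail).digest()[:BLOCK]


def encrypt_sector(key: bytes, salt: bytes, n: int, plaintext: bytes) -> bytes:
    prev = derive_iv(salt, n, plaintext[BLOCK:])
    out = []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt(key, xor(prev, plaintext[i:i + BLOCK]))
        out.append(prev)
    return b"".join(out)


def decrypt_sector(key: bytes, salt: bytes, n: int, ciphertext: bytes) -> bytes:
    c = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    # P_1 .. P_{k-1} are recovered from the ciphertext alone.
    tail = b"".join(xor(toy_decrypt(key, c[i]), c[i - 1]) for i in range(1, len(c)))
    iv = derive_iv(salt, n, tail)  # only P_0 depends on the sector number
    return xor(toy_decrypt(key, c[0]), iv) + tail


key, salt = b"hypothetical key", b"hypothetical salt"
secret = b"TOP SECRET DATA!" * 4  # stored in sector 5, unreadable to the attacker
ct = encrypt_sector(key, salt, 5, secret)

# The attacker copies the raw ciphertext into sector 9, which he may read.
leaked = decrypt_sector(key, salt, 9, ct)
tail_recovered = leaked[BLOCK:] == secret[BLOCK:]
```

In this sketch `tail_recovered` is true: every block except the first is returned to the attacker in the clear, while the first block is XORed with a different IV and so no longer matches the original.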

LRW

In order to prevent such elaborate attacks, different modes of operation were introduced: tweakable narrow-block encryption (LRW and XEX) and wide-block encryption (CMC and EME).

Whereas the purpose of a usual block cipher $E_{K}$ is to mimic a random permutation for any secret key $K$ , the purpose of tweakable encryption $E_{K}^{T}$ is to mimic a random permutation for any secret key $K$ and any known tweak $T$ . The tweakable narrow-block encryption (LRW)^[1] is an instantiation of the mode of operation introduced by Liskov, Rivest, and Wagner^[2] (see Theorem 2). This mode uses two keys: $K$ is the key for the block cipher and $F$ is an additional key of the same size as the block. For example, for AES with a 256-bit key, $K$ is a 256-bit number and $F$ is a 128-bit number. Encrypting block $P$ with logical index (tweak) $I$ uses the following formula: $C=E_{K}(P\oplus X)\oplus X$ , where $X=F\otimes I$ . Here multiplication $\otimes$ and addition $\oplus$ are performed in the finite field ( ${\textrm {GF}}(2^{128})$ for AES). With some precomputation, only a single multiplication per sector is required (note that addition in a binary finite field is simply bitwise xor): $F\otimes I=F\otimes (I_{0}\oplus \delta )=F\otimes I_{0}\oplus F\otimes \delta$ , where $F\otimes \delta$ is precomputed for all possible values of $\delta$ . This mode of operation needs only a single encryption per block and protects against all the above attacks except a minor leak: if the user changes a single plaintext block in a sector then only a single ciphertext block changes. (Note that this is not the same leak the ECB mode has: with LRW mode, equal plaintexts in different positions are encrypted to different, random-looking ciphertexts.)
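
The precomputation identity $F\otimes (I_{0}\oplus \delta )=F\otimes I_{0}\oplus F\otimes \delta$ can be checked numerically. The sketch below assumes the reduction polynomial $x^{128}+x^{7}+x^{2}+x+1$ and a plain polynomial-basis encoding in which bit $i$ of an integer is the coefficient of $x^{i}$ ; real implementations fix a specific bit and byte order, which is not modeled here. The values of `F` and `I0` are arbitrary examples.

```python
# x^128 + x^7 + x^2 + x + 1, written with bit i as the coefficient of x^i
REDUCTION = (1 << 128) | 0x87


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^128) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is xor
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> 128:
            a ^= REDUCTION       # reduce modulo the field polynomial
    return result


F = 0x0123456789ABCDEF0123456789ABCDEF  # example secondary key
I0 = 0x1000                             # index of the first block in a sector

# One full multiplication per sector, plus a small table shared by all sectors.
base = gf_mul(F, I0)
table = {delta: gf_mul(F, delta) for delta in range(16)}

fast = [base ^ table[delta] for delta in range(16)]
slow = [gf_mul(F, I0 ^ delta) for delta in range(16)]
identity_holds = fast == slow
```

The table lookup replaces a field multiplication per block with a single xor, which is the saving the precomputation is meant to provide.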

Some security concerns exist with LRW, and this mode of operation has now been replaced by XTS.

LRW is employed by the FreeOTFE, Bestcrypt and dm-crypt disk encryption systems.

XEX

Another tweakable encryption mode, XEX (Xor-Encrypt-Xor), was designed by Rogaway^[3] to allow very efficient processing of consecutive blocks. The key $K$ is divided into two parts of equal size: $K=K_{1}\|K_{2}$ . The tweak is represented as a combination of the sector address and the index of the block inside the sector (the original XEX mode proposed by Rogaway^[3] allows several indexes). To encrypt block $j$ in sector $I$ , the following formula is used: $C=E_{K_{1}}(P\oplus X)\oplus X$ , where $X=E_{K_{2}}(I)\otimes \alpha ^{j}$ and $\alpha$ is the primitive element of ${\textrm {GF}}(2^{128})$ defined by polynomial $x$ (i.e., 0x2 in hexadecimal).
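
The efficiency for consecutive blocks comes from the fact that moving from block $j$ to block $j+1$ only multiplies the mask $X$ by $\alpha$ , i.e., doubles it in the field. A minimal sketch follows, assuming a toy Feistel permutation in place of AES, a little-endian mapping between blocks and integers, and the reduction polynomial $x^{128}+x^{7}+x^{2}+x+1$ ; the key values are hypothetical.

```python
import hashlib

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for a real block cipher."""
    left, right = block[:8], block[8:]
    for r in range(4):
        f = hashlib.sha256(key + bytes([r]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def gf_double(x: int) -> int:
    """Multiply by alpha (the polynomial x) in GF(2^128)."""
    x <<= 1
    if x >> 128:
        x ^= (1 << 128) | 0x87
    return x


def xex_encrypt_sector(k1: bytes, k2: bytes, sector: int, plaintext: bytes) -> bytes:
    # X for block 0 is E_{K2}(I); each following block doubles the mask.
    mask = int.from_bytes(toy_encrypt(k2, sector.to_bytes(BLOCK, "little")), "little")
    out = []
    for j in range(0, len(plaintext), BLOCK):
        x = mask.to_bytes(BLOCK, "little")
        out.append(xor(toy_encrypt(k1, xor(plaintext[j:j + BLOCK], x)), x))
        mask = gf_double(mask)
    return b"".join(out)


k1, k2 = b"hypothetical K1", b"hypothetical K2"
sector_data = b"\x00" * 64                        # four identical blocks
edited = sector_data[:BLOCK] + b"\x01" * BLOCK + sector_data[2 * BLOCK:]

c = xex_encrypt_sector(k1, k2, 9, sector_data)
c_edited = xex_encrypt_sector(k1, k2, 9, edited)

equal_blocks_hidden = c[:BLOCK] != c[BLOCK:2 * BLOCK]
only_block_1_changed = (c[:BLOCK] == c_edited[:BLOCK]
                        and c[2 * BLOCK:] == c_edited[2 * BLOCK:]
                        and c[BLOCK:2 * BLOCK] != c_edited[BLOCK:2 * BLOCK])
```

Identical plaintext blocks at different positions encrypt differently, while editing one block changes exactly one ciphertext block, which is the narrow-block leak noted for LRW.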

The basic blocks of the LRW mode (AES cipher and Galois field multiplication) are the same as the ones used in the Galois/Counter Mode (GCM) thus permitting a compact implementation of the universal LRW/XEX/GCM hardware.

XTS

XTS is XEX-based Tweaked CodeBook mode (TCB) with CipherText Stealing (CTS). Although XEX-TCB-CTS should be abbreviated as XTC, “C” was replaced with “S” (for “stealing”) to avoid confusion with ecstasy, a well-known drug that is illegal in most countries. Ciphertext stealing provides support for sectors whose size is not divisible by the block size, for example, 520-byte sectors and 16-byte blocks. XTS-AES was standardized on 2007-12-19 as IEEE P1619 Standard for Cryptographic Protection of Data on Block-Oriented Storage Devices.

As of August 2008, XTS is supported by dm-crypt, FreeOTFE, TrueCrypt and OpenBSD softraid disk encryption software.

The XTS proof yields strong security guarantees as long as the same key is not used to encrypt much more than 1 terabyte of data. Up to this point, no attack can succeed with probability better than approximately one in eight quadrillion. However, this security guarantee deteriorates as more data is encrypted with the same key. With a petabyte the bound on the attack success probability weakens to *at most* eight in a trillion; with an exabyte, it weakens to *at most* eight in a million.

This means that using XTS with one key for more than a few hundred terabytes of data opens up the possibility of attacks. This is not mitigated by using a larger AES key size, so using a 256-bit key does not change this.

The decision on the maximum amount of data to be encrypted with a single key using XTS should consider the above together with the practical implication of the attack (which is the ability of the adversary to modify the plaintext of a specific block, where the position of this block may not be under the adversary's control).

CMC and EME

CMC and EME protect even against the minor leak mentioned above. Unfortunately, the price is a twofold degradation of performance: each block must be encrypted twice; many consider this to be too high a cost, since the same leak on a sector level is unavoidable anyway.

CMC, introduced by Halevi and Rogaway, stands for CBC-mask-CBC: the whole sector is encrypted in CBC mode (with $C_{-1}=E_{A}(I)$ ), the ciphertext is masked by xoring with $2(C'_{0}\oplus C'_{k-1})$ , and the result is decrypted in CBC mode starting from the last block. When the underlying block cipher is a strong pseudorandom permutation (PRP), the scheme is a tweakable PRP on the sector level. One problem is that in order to decrypt $P_{0}$ one must sequentially pass over all the data twice.

In order to solve this problem, Halevi and Rogaway introduced a parallelizable variant called EME (ECB-mask-ECB). It works in the following way:

the plaintexts are xored with $L=E_{K}(0)$ multiplied by a different power of 2 for each block (a left shift in the finite field), and are encrypted: $P'_{i}=E_{K}(P_{i}\oplus 2^{i}L)$ ;
the mask is calculated: $M=M_{P}\oplus M_{C}$ , where $M_{P}=I\;\oplus \;\bigoplus P'_{i}$ and $M_{C}=E_{K}(M_{P})$ ;
intermediate ciphertexts are masked: $C'_{i}=P'_{i}\oplus 2^{i}M$ for $i=1,\ldots ,k-1$ and $C'_{0}=M_{C}\oplus I\oplus \bigoplus _{i=1}^{k-1}C'_{i}$ ;
the final ciphertexts are calculated: $C_{i}=E_{K}(C'_{i})\oplus 2^{i}L$ for $i=0,\ldots ,k-1$ .
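
The four steps above can be sketched as follows. This is an illustration of the steps exactly as listed here, not a reference implementation of EME: a toy Feistel permutation stands in for $E_{K}$ , multiplication by $2^{i}$ is taken in ${\textrm {GF}}(2^{128})$ with the assumed polynomial $x^{128}+x^{7}+x^{2}+x+1$ and big-endian block encoding, and all names are hypothetical.

```python
import hashlib
from functools import reduce

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, r: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([r]) + half).digest()[:8]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for a real block cipher."""
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, xor(left, _f(key, r, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = xor(right, _f(key, r, left)), left
    return left + right


def times2(block: bytes, i: int) -> bytes:
    """Multiply a block by 2^i in GF(2^128) using i doublings."""
    x = int.from_bytes(block, "big")
    for _ in range(i):
        x <<= 1
        if x >> 128:
            x ^= (1 << 128) | 0x87
    return x.to_bytes(BLOCK, "big")


def xor_all(blocks) -> bytes:
    return reduce(xor, blocks, bytes(BLOCK))


def eme_encrypt(key: bytes, tweak: bytes, p: list) -> list:
    big_l = toy_encrypt(key, bytes(BLOCK))                            # L = E_K(0)
    pp = [toy_encrypt(key, xor(b, times2(big_l, i))) for i, b in enumerate(p)]
    mp = xor(tweak, xor_all(pp))                                      # M_P
    mc = toy_encrypt(key, mp)                                         # M_C
    m = xor(mp, mc)                                                   # M
    cc = [None] + [xor(pp[i], times2(m, i)) for i in range(1, len(p))]
    cc[0] = xor(xor(mc, tweak), xor_all(cc[1:]))
    return [xor(toy_encrypt(key, b), times2(big_l, i)) for i, b in enumerate(cc)]


def eme_decrypt(key: bytes, tweak: bytes, c: list) -> list:
    big_l = toy_encrypt(key, bytes(BLOCK))
    cc = [toy_decrypt(key, xor(b, times2(big_l, i))) for i, b in enumerate(c)]
    mc = xor(xor(cc[0], tweak), xor_all(cc[1:]))
    mp = toy_decrypt(key, mc)
    m = xor(mp, mc)
    pp = [None] + [xor(cc[i], times2(m, i)) for i in range(1, len(c))]
    pp[0] = xor(xor(mp, tweak), xor_all(pp[1:]))
    return [xor(toy_decrypt(key, b), times2(big_l, i)) for i, b in enumerate(pp)]


key = b"hypothetical key"
tweak = (42).to_bytes(BLOCK, "big")               # sector number I
sector = [bytes([i]) * BLOCK for i in range(4)]
changed = sector[:-1] + [b"\xff" * BLOCK]         # edit only the last block

ct = eme_encrypt(key, tweak, sector)
ct_changed = eme_encrypt(key, tweak, changed)

round_trip = eme_decrypt(key, tweak, ct) == sector
all_blocks_differ = all(a != b for a, b in zip(ct, ct_changed))
```

Both block-cipher layers operate on each block independently, so they can run in parallel; the mask $M$ is what couples the blocks, which is why a change to one plaintext block is expected to alter every ciphertext block of the sector.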

Note that unlike LRW and CMC there is only a single key $K$ .

CMC and EME were considered for standardization by SISWG. CMC was rejected for technical considerations.^{[citation needed]} EME is patented, and so is not favored to be a primary supported mode.^[4]

ESSIV

Encrypted Salt-Sector Initialization Vector (ESSIV)^[1] is a method for generating initialization vectors for block encryption to use in disk encryption.

The usual methods for generating IVs produce predictable sequences of numbers based on, for example, a time stamp or sector number, and permit certain attacks such as a watermarking attack.

ESSIV prevents such attacks by generating IVs from a combination of the sector number with the hash of the key. It is the combination with the key in form of a hash that makes the IV unpredictable.

$IV({\textrm {Sector}})=E_{s}({\textrm {Sector}})$ , where $s={\textrm {Hash}}(K)$

ESSIV was designed by Clemens Fruhwirth and has been integrated into the Linux kernel since version 2.6.10, though a similar scheme has been used to generate IVs for OpenBSD's swap encryption since 2000 ^[2]. It is employed by the dm-crypt and FreeOTFE disk encryption systems to increase security.

Sources

References

As I can't seem to delete it, I wanted to add a warning about the Fruhwirth reference cited here. In Proposition 2 on page 5 the author claims that the (ring of) polynomials in one indeterminate over a field is also a field. The definition of a Galois field on page 6 could have been lifted from something written by Mike Rosing and is typical of the trashing standard mathematical language (not to mention English!) receives in this work. In other words, beware!

Endnotes

^ Latest SISWG and IEEE P1619 drafts and meeting information are on the P1619 home page [5].
^ M. Liskov, R. Rivest, and D. Wagner. Tweakable block ciphers [6], CRYPTO '02 (LNCS, volume 2442), 2002.
^ ^a P. Rogaway, Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes OCB and PMAC [7].
^ P. Rogaway, Block cipher mode of operation for constructing a wide-blocksize block cipher from a conventional block cipher, US Patent Application 20040131182 A1, [8]

Papers

S. Halevi and P. Rogaway, A Tweakable Enciphering Mode, CRYPTO '03 (LNCS, volume 2729), 2003.
S. Halevi and P. Rogaway, A Parallelizable Enciphering Mode [9], 2003.
Standard Architecture for Encrypted Shared Storage Media, IEEE Project 1619 (P1619), PAR FORM.
SISWG, Draft Proposal for Key Backup Format [10], 2004.
SISWG, Draft Proposal for Tweakable Wide-block Encryption [11], 2004.
James Hughes, Encrypted Storage — Challenges and Methods [12]
J. Alex Halderman, Seth D. Schoen, Nadia Heninger, William Clarkson, William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and Edward W. Felten (2008-02-21). "Lest We Remember: Cold Boot Attacks on Encryption Keys" (PDF). Princeton University.
Niels Ferguson (August 2006). "AES-CBC + Elephant Diffuser: A Disk Encryption Algorithm for Windows Vista" (PDF). Microsoft.

External links

Security in Storage Working Group SISWG.

[1] New Methods in Hard Disk Encryption (PDF)

[2] Encrypting Virtual Memory (Postscript)

@@ Line 45: / Line 45: @@
 As of August 2008, XTS is supported by [[dm-crypt]], [[FreeOTFE]], [[TrueCrypt]] and [[OpenBSD]] softraid disk encryption software.
+The [[XTS proof]]<!--http://grouper.ieee.org/groups/1619tmp/1619-2007-NIST-Submission.pdf--> yields strong security guarantees as long as the same key is not used to encrypt much more than 1 terabyte of data. Up until this point, no attack can  succeed with probability better than approximately one in eight quadrillion. However this security guarantee deteriorates as more data is encrypted with the same key. With a petabyte the attack success probability rate decreases to *at most* eight in a trillion, with an exabyte, the success probability is reduced to *at most* eight in a million.
+This means that using XTS, with one key for more than a few hundred terabytes of data opens up the possibility of attacks (and is not mitigated by using a larger AES key size, so using a 256-bit key doesn't change this).
+The decision on the maximum amount to data to be encrypted with a single key using XTS should consider the above together with the practical implication of the attack (which is the ability of the adversary to modiy plaintext of a specific block, where the position of this block may not be under the advisary's control).
 ==CMC and EME==