Rainbow table

A rainbow table is an efficient way to store data that has been computed in advance to facilitate cracking passwords. To protect stored passwords from compromise in case of a data breach, organizations avoid storing them directly, instead transforming them using a scrambling function – typically a cryptographic hash. One line of attack against this protection is to precompute the hashes of likely or possible passwords, and then store them in a dataset. However, such a dataset can become too big as the range of possible passwords grows. Rainbow tables address this problem by storing chains of possible passwords to save space. Undoing the chains takes significant computation time, but overall this tradeoff makes certain classes of attacks practical.

Rainbow tables partition a function (the hash), whose domain is a set of values and whose codomain is a set of keys derived from those values, into chains such that each chain is an alternating sequence of values and keys, followed by a final value. Each item in the chain is derived from the previous item so that the chain may be algorithmically reproduced from the first value in the chain: A key derivation function produces a key from a preceding value, and a reduction function produces a value from a preceding key. The first value and last value of each chain are precomputed and stored, making the chain a row in a virtual table where each even-numbered field contains a value and each odd-numbered field contains the corresponding key. Such a table may be used to discover a secret value (password) given its associated key. It is a practical example of a space–time tradeoff, requiring less processing but using more storage than a brute-force attack which calculates a key on each iteration, but requiring more processing and less storage than a simple table. Use of a key derivation function that employs a salt makes rainbow tables infeasible for recovering a secret value from a key.

Rainbow tables were invented by Philippe Oechslin[1] as an application of an earlier, simpler algorithm by Martin Hellman.[2]

Background

For user authentication, a password is stored either as plaintext or as a key produced from an algorithm that usually involves a hash function. Since a password stored as plaintext may be easily stolen by an actor who gains access to storage, a key corresponding to the password is typically stored instead. Thus, no one – including the authentication system – can learn a password merely by looking at the stored value.

When a user enters a password for authentication, a key is computed for it and then compared to the key stored for that user. Authentication fails if the two keys do not match; moreover, authentication would equally fail if a key were entered directly as a password, since a different key would be computed for it.

To learn a password from a key is to find a string which, when input into the key derivation function, creates that same key. This is the same as inverting the key derivation function.

Although a brute-force attack (e.g. dictionary attack) might be used to attempt to invert a key derivation function, brute-force becomes infeasible when the set of possible passwords is large enough. An alternative to brute-force is to use precomputed hash chain tables. Rainbow tables are a special kind of such table that overcome certain technical difficulties.

Etymology

The term rainbow tables was first used in Oechslin's initial paper. It refers to the way different reduction functions are used to increase the success rate of the attack. The original method by Hellman uses many small tables, each with a different reduction function. Rainbow tables are much bigger and use a different reduction function in each column. When colors are used to represent the reduction functions, a rainbow appears in the rainbow table. Figure 2 of Oechslin's paper contains a black-and-white graphic that illustrates how these sections are related. For his presentation at the Crypto 2003 conference, Oechslin added color to the graphic in order to make the rainbow association clearer. The enhanced graphic that was presented at the conference is shown to the right.

Precomputed key chains

Given a key derivation function K and a finite set of secret values V, the goal is to precompute a map that, given any output k of the key derivation function, may be used to locate an element v in V such that K(v) = k, or determine that there is no such v in V. The simplest way to do this is to compute K(v) for all v in V, but then storing the table requires Θ(|V|n) bits of space, where |V| is the size of the set V and n is the size of an output of K, which is prohibitive for large |V|. Chains are a technique for decreasing this space requirement. To create a chain, a function called a reduction function is used. Given a key, it produces a value in V. It is not necessary that the reduction function produce exactly the inverse map of the key derivation function. It is an independent function with the domain and codomain of the key derivation function swapped. A chain of alternating values and keys is formed. For example, where V is the set of lowercase alphabetic 6-character values and keys are 32 bits long, a chain might look like this:

${\displaystyle {\color {Red}{\mathtt {aaaaaa}}}\,{\xrightarrow[{\;K\;}]{}}\,{\mathtt {281DAF40}}\,{\xrightarrow[{\;R\;}]{}}\,{\mathtt {sgfnyd}}\,{\xrightarrow[{\;K\;}]{}}\,{\mathtt {920ECF10}}\,{\xrightarrow[{\;R\;}]{}}\,{\color {Violet}{\mathtt {kiebgt}}}}$

To generate the table, choose a random set of initial values from V, compute chains having some fixed length t, storing the first and last value of each chain as the starting point and endpoint, respectively. In the example chain above, "aaaaaa" is the starting point and "kiebgt" is the endpoint.
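The table-generation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `K` is a stand-in key derivation function (the first 32 bits of MD5) and `R` is one arbitrary choice of reduction function (six base-26 digits of the key). Neither is prescribed by the method, the names `chain_endpoint` and `build_table` are hypothetical, and the values produced will not match the illustrative chain shown above.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LENGTH = 6  # V = lowercase alphabetic 6-character values


def K(value: str) -> str:
    """Stand-in key derivation (assumption): first 32 bits of MD5, as 8 hex digits."""
    return hashlib.md5(value.encode()).hexdigest()[:8].upper()


def R(key: str) -> str:
    """Stand-in reduction (assumption): six base-26 digits of the key, mapped to a-z."""
    n = int(key, 16)
    chars = []
    for _ in range(LENGTH):
        n, digit = divmod(n, 26)
        chars.append(ALPHABET[digit])
    return "".join(chars)


def chain_endpoint(start: str, t: int) -> str:
    """Apply K then R a total of t times and return the final value."""
    value = start
    for _ in range(t):
        value = R(K(value))
    return value


def build_table(starts, t):
    """Store only (endpoint -> starting point); intermediate values are discarded."""
    return {chain_endpoint(start, t): start for start in starts}


table = build_table(["aaaaaa", "passwd", "qwerty"], t=1000)
for endpoint, start in table.items():
    print(start, "->", endpoint)
```

Only the starting point and endpoint of each chain are kept, so each stored pair stands in for up to t values of the virtual table.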

Now, given a key k to invert (that is, to find the corresponding secret value for), compute its own chain starting with k by applying R, then K, then R, and so on, searching at each step for a match between the current value and an endpoint in the table. When a match is found, k is almost surely in that chain, which may be reproduced from its starting point, and the value immediately preceding k is the target secret value v. This works because, assuming that all possible values are represented in the virtual table whose rows are the chains, the chain formed from k corresponds to one of those chains, and iterating through it eventually produces the value that is the endpoint for that chain in the table.

For example, given the key 920ECF10, its chain can be computed by first applying R:

${\displaystyle {\mathtt {920ECF10}}\,{\xrightarrow[{\;R\;}]{}}\,{\color {Violet}{\mathtt {kiebgt}}}}$

Since "kiebgt" is one of the endpoints in our table, the chain may be followed from the corresponding starting point "aaaaaa" until 920ECF10 is reached:

${\displaystyle {\color {Red}{\mathtt {aaaaaa}}}\,{\xrightarrow[{\;K\;}]{}}\,{\mathtt {281DAF40}}\,{\xrightarrow[{\;R\;}]{}}\,{\mathtt {sgfnyd}}\,{\xrightarrow[{\;K\;}]{}}\,{\mathtt {920ECF10}}}$

Thus, the password is "sgfnyd" (or a different password that has the same key).

Note, however, that this chain does not always contain the key k; it may happen that the chain starting at k merges with a chain having a different starting point. For example, the chain of key FB107E70 also leads to kiebgt:

${\displaystyle {\mathtt {FB107E70}}\,{\xrightarrow[{\;R\;}]{}}\,{\mathtt {bvtdll}}\,{\xrightarrow[{\;K\;}]{}}\,{\mathtt {0EE80890}}\,{\xrightarrow[{\;R\;}]{}}\,{\color {Violet}{\mathtt {kiebgt}}}}$

But FB107E70 is not in the chain starting at "aaaaaa". This is called a false alarm. In this case, the match is ignored and the chain of k is extended looking for another match. If the chain of k gets extended to length t with no good matches, then the password was never produced in any of the chains.
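The lookup procedure, including the handling of false alarms, can be sketched as follows. This is an illustrative sketch rather than a reference implementation: `K` (the first 32 bits of MD5) and `R` (six base-26 digits of the key) are stand-in functions chosen for the example, and `build_table` and `lookup` are hypothetical names.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def K(value):
    """Stand-in key derivation (assumption): first 32 bits of MD5 as hex."""
    return hashlib.md5(value.encode()).hexdigest()[:8].upper()


def R(key):
    """Stand-in reduction (assumption): six base-26 digits of the key."""
    n = int(key, 16)
    out = []
    for _ in range(6):
        n, digit = divmod(n, 26)
        out.append(ALPHABET[digit])
    return "".join(out)


def build_table(starts, t):
    """Map each chain's endpoint to its starting point."""
    table = {}
    for start in starts:
        value = start
        for _ in range(t):
            value = R(K(value))
        table.setdefault(value, start)  # keep the first chain seen for an endpoint
    return table


def lookup(table, key, t):
    """Return a value v with K(v) == key, or None if no chain yields one."""
    value = R(key)
    for _ in range(t):
        start = table.get(value)
        if start is not None:
            # Endpoint matched: replay the stored chain and look for the key.
            candidate = start
            for _ in range(t):
                if K(candidate) == key:
                    return candidate
                candidate = R(K(candidate))
            # The key was not in that chain: a false alarm, so keep extending.
        value = R(K(value))
    return None


T = 50
table = build_table(["passwd", "aaaaaa", "qwerty"], T)

# Pick a secret that lies inside the first chain, then try to recover it from its key.
secret = "passwd"
for _ in range(10):
    secret = R(K(secret))
recovered = lookup(table, K(secret), T)
print(secret, K(secret), recovered)
```

The recovered value is guaranteed only to have the same key as the secret; it may be a different value if K happens to collide.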

It is not necessary to invert keys to create the table, which is created once and then repeatedly used for the lookups. Increasing the length of the chain decreases the size of the table. However, it also increases the time required to perform lookups, and this is the time-memory trade-off of the rainbow table. In a simple case of one-item chains, the lookup is very fast, but the table is very big. Once chains get longer, the lookup slows, but the table size goes down.

Simple hash chains have several flaws, the most serious of which is that if two chains collide (produce the same value), they will merge and consequently the table will not cover as many passwords despite having the same computational cost to generate. Because chains are not stored in their entirety, this is impossible to detect efficiently. For example, if the third value in chain 3 matches the second value in chain 7, the two chains will cover almost the same sequence of values, but their final values will not be the same. The key derivation function K is unlikely to produce collisions as it is usually considered an important security feature not to do so, but the reduction function R, because of its need to correctly cover the likely plaintexts, cannot be collision resistant.
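The effect of a merge can be illustrated with a small sketch. Here the collision is forced artificially by starting a second chain at the third value of the first one; `K`, `R`, and `chain_values` are stand-in names and functions assumed only for this example.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def K(value):
    """Stand-in key derivation (assumption): first 32 bits of MD5 as hex."""
    return hashlib.md5(value.encode()).hexdigest()[:8].upper()


def R(key):
    """Stand-in reduction (assumption): six base-26 digits of the key."""
    n = int(key, 16)
    out = []
    for _ in range(6):
        n, digit = divmod(n, 26)
        out.append(ALPHABET[digit])
    return "".join(out)


def chain_values(start, t):
    """Return all t + 1 values of a chain, including start and endpoint."""
    values = [start]
    for _ in range(t):
        values.append(R(K(values[-1])))
    return values


chain_a = chain_values("aaaaaa", 5)
# Force a collision: chain_b begins at the third value of chain_a.
chain_b = chain_values(chain_a[2], 5)

# With a single reduction function the chains stay merged from the collision on,
# but at different offsets, so their endpoints normally differ.
shared = chain_b[:4] == chain_a[2:]
print("merged after collision:", shared)
print("endpoints:", chain_a[-1], chain_b[-1])
```

Because only endpoints are stored, a table holding both chains would give no indication that most of their values are duplicated.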

Other difficulties result from the importance of choosing the correct function for R. Picking R to be the identity is little better than a brute-force approach. Only when the attacker has a good idea of what the likely plaintexts will be can they choose a function R that ensures time and space are used only for likely plaintexts, not the entire space of possible passwords. In effect, R shepherds the results of prior key calculations back to likely plaintexts, but this benefit comes with a drawback: R likely won't produce every possible plaintext in the class the attacker wishes to check, which denies the attacker certainty that no passwords came from their chosen class. Also, it can be difficult to design the function R to match the expected distribution of plaintexts.[2]

Rainbow tables

Rainbow tables effectively solve the problem of collisions with ordinary hash chains by replacing the single reduction function R with a sequence of related reduction functions R1 through Rk. In this way, for two chains to collide and merge they must hit the same value on the same iteration: consequently, the final values in these chains will be identical. A final postprocessing pass can sort the chains in the table and remove any "duplicate" chains that have the same final values as other chains. New chains are then generated to fill out the table. These chains are not collision-free (they may overlap briefly) but they will not merge, drastically reducing the overall number of collisions.[citation needed]

Using sequences of reduction functions changes how lookup is done: because the key of interest may be found at any location in the chain, it is necessary to generate k different chains. The first chain assumes the key is in the last key position and just applies Rk; the next chain assumes the key is in the second-to-last key position and applies Rk−1, then K, then Rk; and so on until the last chain, which applies all the reduction functions, alternating with K. This creates a new way of producing a false alarm: an incorrect "guess" of the position of the key may needlessly evaluate a chain.
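A rainbow table build and lookup can be sketched as follows. This is a minimal sketch under stated assumptions: the family of reduction functions is obtained by adding the column index to the key before reducing, which is one simple choice among many, `K` is a stand-in (the first 32 bits of MD5), columns are numbered from 0, and all function names are hypothetical.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def K(value):
    """Stand-in key derivation (assumption): first 32 bits of MD5 as hex."""
    return hashlib.md5(value.encode()).hexdigest()[:8].upper()


def R(key, i):
    """Stand-in i-th reduction (assumption): mix in the column index, then base 26."""
    n = int(key, 16) + i
    out = []
    for _ in range(6):
        n, digit = divmod(n, 26)
        out.append(ALPHABET[digit])
    return "".join(out)


def build_table(starts, t):
    """Build chains using reduction i in column i; drop duplicate endpoints."""
    table = {}
    for start in starts:
        value = start
        for i in range(t):
            value = R(K(value), i)
        table.setdefault(value, start)  # a repeated endpoint means a merged chain
    return table


def lookup(table, key, t):
    """Guess the key's column from last to first; return v with K(v) == key or None."""
    for pos in range(t - 1, -1, -1):
        # Assume the key sits in column pos and finish the chain from there.
        value = R(key, pos)
        for i in range(pos + 1, t):
            value = R(K(value), i)
        start = table.get(value)
        if start is None:
            continue
        # Replay the stored chain to confirm; otherwise this guess was a false alarm.
        candidate = start
        for i in range(t):
            if K(candidate) == key:
                return candidate
            candidate = R(K(candidate), i)
    return None


T = 20
table = build_table(["passwd", "aaaaaa", "qwerty"], T)

# Take the value in column 7 of the first chain as the secret to recover.
secret = "passwd"
for i in range(7):
    secret = R(K(secret), i)
recovered = lookup(table, K(secret), T)
print(secret, K(secret), recovered)
```

Each guess of the column costs a partial chain evaluation, which is why a lookup performs on the order of the square of the chain length in steps, compared with a single walk for a simple hash chain table.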

Although rainbow tables have to follow more chains, they make up for this by having fewer tables: simple hash chain tables cannot grow beyond a certain size without rapidly becoming inefficient due to merging chains; to deal with this, they maintain multiple tables, and each lookup must search through each table. Rainbow tables can achieve similar performance with tables that are k times larger, allowing them to perform a factor of k fewer lookups.

Example

1. Starting from the key ("re3xes") in the image below, one computes the last reduction used in the table and checks whether the password appears in the last column of the table (step 1).
2. If the test fails (rambo doesn't appear in the table), one computes a chain with the two last reductions (these two reductions are represented at step 2)
Note: If this new test fails again, one continues with 3 reductions, 4 reductions, etc. until the password is found. If no chain contains the password, then the attack has failed.
3. If this test is positive (step 3, linux23 appears at the end of the chain and in the table), the password is retrieved at the beginning of the chain that produces linux23. Here we find passwd at the beginning of the corresponding chain stored in the table.
4. At this point (step 4), one generates a chain and compares at each iteration the key with the target key. The test is valid and we find the key re3xes in the chain. The current value (culture) is the one that produced the target key (re3xes): The attack is successful.

Rainbow tables use a refined algorithm with a different reduction function for each "link" in a chain, so that when there is a key collision in two or more chains, the chains will not merge as long as the collision doesn't occur at the same position in each chain. This increases the probability of a correct crack for a given table size, at the cost of squaring the number of steps required per lookup, as the lookup routine now also needs to iterate through the index of the first reduction function used in the chain.[1]

Rainbow tables are specific to the key function they were created for; e.g., MD5 tables can crack only MD5 hashes. The theory of this technique was invented by Philippe Oechslin[3] as a fast form of time/memory tradeoff,[1] which he implemented in the Windows password cracker Ophcrack. The more powerful RainbowCrack program, developed later, can generate and use rainbow tables for a variety of character sets and hashing algorithms, including LM hash, MD5, and SHA-1.

In the simple case where the reduction function and the key derivation function have no collisions, and given a complete rainbow table (one that is guaranteed to find the corresponding password for any hash), the size of the password set |P|, the time T needed to compute the table, the length of the table L, and the average time t needed to find a value matching a given key are directly related:[citation needed]

${\displaystyle T=|P|}$
${\displaystyle t={\frac {|P|}{2L}}}$

Thus the case of 8-character lowercase alphanumeric passwords (|P| ≃ 3×10¹²) would be easily tractable with a personal computer, while the case of 16-character lowercase alphanumeric passwords (|P| ≃ 10²⁵) would be completely intractable.
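The magnitudes of these password sets can be checked with a short calculation. The sketch below assumes an alphabet of 26 lowercase letters plus 10 digits; the table length used in the last step is a hypothetical figure chosen only to illustrate the relation t = |P| / (2L).

```python
# Lowercase alphanumeric alphabet: 26 letters + 10 digits (assumption).
SYMBOLS = 26 + 10


def password_set_size(length: int) -> int:
    """|P| for passwords of exactly `length` symbols."""
    return SYMBOLS ** length


def average_lookup_steps(p: int, table_length: int) -> float:
    """t = |P| / (2L) for a complete, collision-free table."""
    return p / (2 * table_length)


p8 = password_set_size(8)
p16 = password_set_size(16)
print(f"|P| for 8 characters:  {p8:.2e}")
print(f"|P| for 16 characters: {p16:.2e}")

L = 10 ** 9  # hypothetical table length, for illustration only
print(f"average lookup steps for 8 characters with L = {L}: "
      f"{average_lookup_steps(p8, L):.0f}")
```

Doubling the password length squares the size of the password set, which is why the 16-character case is out of reach even though the 8-character case is not.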

Defense against rainbow tables

A rainbow table is ineffective against one-way hashes that include large salts. For example, consider a password hash that is generated using the following function (where "+" is the concatenation operator):

saltedhash(password) = hash(password + salt)

Or

saltedhash(password) = hash(hash(password) + salt)

The salt value is not secret and may be generated at random and stored with the password hash. A large salt value prevents precomputation attacks, including rainbow tables, by ensuring that each user's password is hashed uniquely. This means that two users with the same password will have different password hashes (assuming different salts are used). In order to succeed, an attacker needs to precompute tables for each possible salt value. The salt must be large enough, otherwise an attacker can make a table for each salt value. For older Unix passwords which used a 12-bit salt this would require 4096 tables, a significant increase in cost for the attacker, but not impractical with terabyte hard drives. The SHA2-crypt and bcrypt methods—used in Linux, BSD Unixes, and Solaris—have salts of 128 bits.[4] These larger salt values make precomputation attacks against these systems infeasible for almost any length of a password. Even if the attacker could generate a million tables per second, they would still need billions of years to generate tables for all possible salts.
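The first salting scheme above can be sketched as follows. This is an illustration only, not a recommended password-storage design: SHA-256 stands in for the unspecified hash, and the names `salted_hash`, `register`, and `verify` are hypothetical. The final lines estimate how long it would take to build one table per salt value at the rate mentioned above.

```python
import hashlib
import hmac
import os


def salted_hash(password: str, salt: bytes) -> bytes:
    """hash(password + salt), with SHA-256 as a stand-in hash (assumption)."""
    return hashlib.sha256(password.encode() + salt).digest()


def register(password: str):
    """Generate a random 128-bit salt and store it alongside the hash."""
    salt = os.urandom(16)
    return salt, salted_hash(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    return hmac.compare_digest(salted_hash(password, salt), stored)


# Two users with the same password end up with different stored hashes.
alice = register("hunter2")
bob = register("hunter2")
print(alice[1].hex())
print(bob[1].hex())
print(verify("hunter2", *alice), verify("hunter3", *alice))

# Time to precompute one table per 128-bit salt at a million tables per second.
seconds_per_year = 365 * 24 * 3600
years = 2 ** 128 / 1_000_000 / seconds_per_year
print(f"about {years:.1e} years")
```

Because the salt is an input to the hash, a table precomputed for one salt value is useless for any other.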

Another technique that helps prevent precomputation attacks is key stretching. When stretching is used, the salt, password, and some intermediate hash values are run through the underlying hash function multiple times to increase the computation time required to hash each password.[5] For instance, MD5-Crypt uses a 1000 iteration loop that repeatedly feeds the salt, password, and current intermediate hash value back into the underlying MD5 hash function.[4] The user's password hash is the concatenation of the salt value (which is not secret) and the final hash. The extra time is not noticeable to users because they have to wait only a fraction of a second each time they log in. On the other hand, stretching reduces the effectiveness of brute-force attacks in proportion to the number of iterations because it reduces the number of attempts an attacker can perform in a given time frame. This principle is applied in MD5-Crypt and in bcrypt.[6] It also greatly increases the time needed to build a precomputed table, but in the absence of salt, this need only be done once.
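The general idea of stretching can be sketched with a simple loop. This is not the MD5-Crypt algorithm, whose details differ; it is a minimal sketch that assumes SHA-256 as the underlying hash and uses the hypothetical name `stretch`. Standardized constructions such as PBKDF2 (available in Python as `hashlib.pbkdf2_hmac`) serve the same purpose.

```python
import hashlib
import os
import time


def stretch(password: str, salt: bytes, iterations: int = 1000) -> bytes:
    """Feed the password, salt, and running digest back into the hash repeatedly."""
    pw = password.encode()
    digest = b""
    for _ in range(iterations):
        digest = hashlib.sha256(digest + pw + salt).digest()
    return digest


salt = os.urandom(16)

# Cost per guess grows roughly linearly with the iteration count.
for iterations in (1, 1000, 100_000):
    started = time.perf_counter()
    stretch("hunter2", salt, iterations)
    elapsed = time.perf_counter() - started
    print(f"{iterations:>7} iterations: {elapsed:.6f} s")

# The stored record is the (non-secret) salt together with the final hash.
record = salt + stretch("hunter2", salt)
print(record.hex())
```

A legitimate login pays this cost once, while an attacker pays it for every candidate password.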

An alternative approach, called key strengthening, extends the key with a random salt, but then (unlike in key stretching) securely deletes the salt. This forces both the attacker and legitimate users to perform a brute-force search for the salt value.[7] Although the paper that introduced key stretching[8] referred to this earlier technique and intentionally chose a different name, the term "key strengthening" is now often (arguably incorrectly) used to refer to key stretching.
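Key strengthening can be sketched as follows. This is a rough illustration of the idea rather than the scheme from the cited paper: the 12-bit salt size, the use of SHA-256, and the names `strengthen` and `verify` are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

SALT_BITS = 12  # hypothetical size; larger values make every check slower


def _hash(password: str, salt: int) -> bytes:
    """Hash the password with a small integer salt (SHA-256 is an assumption)."""
    return hashlib.sha256(password.encode() + salt.to_bytes(2, "big")).digest()


def strengthen(password: str) -> bytes:
    """Hash with a random salt, then discard the salt: only the hash is stored."""
    salt = secrets.randbelow(1 << SALT_BITS)
    return _hash(password, salt)


def verify(password: str, stored: bytes) -> bool:
    """The salt is unknown, so every possible salt value must be tried."""
    return any(
        hmac.compare_digest(_hash(password, salt), stored)
        for salt in range(1 << SALT_BITS)
    )


stored = strengthen("correct horse")
print(verify("correct horse", stored))
print(verify("wrong guess", stored))
```

A correct password is confirmed after trying half of the salt values on average, while each wrong guess costs the attacker the full search.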

Rainbow tables and other precomputation attacks do not work against passwords that contain symbols outside the range presupposed, or that are longer than those precomputed by the attacker. However, tables can be generated that take into account common ways in which users attempt to choose more secure passwords, such as adding a number or special character. Because of the sizable investment in computation required, rainbow tables for passwords beyond fourteen characters in length are not yet common. So, choosing a password that is longer than fourteen characters may force an attacker to resort to brute-force methods.[citation needed]

Tables produced by intensive efforts focused specifically on LM hash, an older hash algorithm used by Microsoft, are publicly available. LM hash is particularly vulnerable because passwords longer than 7 characters are broken into two sections, each of which is hashed separately. Choosing a password that is fifteen characters or longer guarantees that an LM hash will not be generated.[9]

Common uses

Nearly all distributions and variations of Unix, Linux, and BSD use hashes with salts, though many applications use just a hash (typically MD5) with no salt. The Microsoft Windows NT/2000 family uses the LAN Manager and NT LAN Manager hashing methods (the latter based on MD4), which are also unsalted, making them among the most popular targets for generated tables. Rainbow tables have seen reduced usage as of 2020, as salting is more common and GPU-based brute-force attacks have become more practical. However, rainbow tables are available for eight- and nine-character NTLM passwords.[10]

Notes

1. ^ a b c Oechslin, P. (2003). "Making a Faster Cryptanalytic Time-Memory Trade-Off" (PDF). Advances in Cryptology - CRYPTO 2003. LNCS. Vol. 2729. pp. 617–630. doi:10.1007/978-3-540-45146-4_36. ISBN 978-3-540-40674-7.
2. ^ a b Hellman, M. (1980). "A cryptanalytic time-memory trade-off" (PDF). IEEE Transactions on Information Theory. 26 (4): 401–406. CiteSeerX 10.1.1.120.2463. doi:10.1109/TIT.1980.1056220. ISSN 0018-9448.
3. ^ "LASEC - Security and Cryptography Laboratory: Dr Philippe Oechslin - Research". Faculté I&C - School of Computer and Communication Sciences. March 2004.
4. ^ a b Alexander, Steven (June 2004). "Password Protection for Modern Operating Systems" (PDF). Login. USENIX Association. 29 (3).
5. ^ Ferguson, Niels; Bruce Schneier (2003). Practical Cryptography. Indianapolis: John Wiley & Sons. ISBN 978-0-471-22357-3.
6. ^ Provos, Niels; Mazières, David (June 6, 1999). "A Future-Adaptable Password Scheme" (PDF). Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference. Monterey, CA, USA: USENIX Association.
7. ^ Manber, U. (1996). "A simple scheme to make passwords based on one-way functions much harder to crack" (PDF). Computers & Security. 15 (2): 171–176. CiteSeerX 10.1.1.102.2597. doi:10.1016/0167-4048(96)00003-X.
8. ^ Kelsey, J.; Schneier, B.; Hall, C.; Wagner, D. (1998). "Secure applications of low-entropy keys" (PDF). Information Security. LNCS. Vol. 1396. p. 121. doi:10.1007/BFb0030415. ISBN 978-3-540-64382-1.
9. ^
10. ^ "A Case for Modern Rainbow Table Usage". rainbowcrackalack.com. Positron Security. 26 February 2021.