# RSA (cryptosystem)

(Redirected from RSA algorithm)
RSA
General
First published1977
CertificationPKCS#1, ANSI X9.31, IEEE 1363
Cipher detail
Key sizes2,048 to 4,096 bit typical
Rounds1
Best public cryptanalysis
General number field sieve for classical computers;
Shor's algorithm for quantum computers.
An 829-bit key has been broken.

RSA (Rivest–Shamir–Adleman) is a public-key cryptosystem, one of the oldest, that is widely used for secure data transmission. The acronym "RSA" comes from the surnames of Ron Rivest, Adi Shamir and Leonard Adleman, who publicly described the algorithm in 1977. An equivalent system was developed secretly in 1973 at Government Communications Headquarters (GCHQ) (the British signals intelligence agency) by the English mathematician Clifford Cocks. That system was declassified in 1997.

In a public-key cryptosystem, the encryption key is public and distinct from the decryption key, which is kept secret (private). An RSA user creates and publishes a public key based on two large prime numbers, along with an auxiliary value. The prime numbers are kept secret. Messages can be encrypted by anyone, via the public key, but can only be decoded by someone who knows the prime numbers.

The security of RSA relies on the practical difficulty of factoring the product of two large prime numbers, the "factoring problem". Breaking RSA encryption is known as the RSA problem. Whether it is as difficult as the factoring problem is an open question. There are no published methods to defeat the system if a large enough key is used.

RSA is a relatively slow algorithm. Because of this, it is not commonly used to directly encrypt user data. More often, RSA is used to transmit shared keys for symmetric-key cryptography, which are then used for bulk encryption–decryption.

## History

The idea of an asymmetric public-private key cryptosystem is attributed to Whitfield Diffie and Martin Hellman, who published this concept in 1976. They also introduced digital signatures and attempted to apply number theory. Their formulation used a shared-secret-key created from exponentiation of some number, modulo a prime number. However, they left open the problem of realizing a one-way function, possibly because the difficulty of factoring was not well-studied at the time.

Ron Rivest, Adi Shamir, and Leonard Adleman at the Massachusetts Institute of Technology made several attempts over the course of a year to create a function that was hard to invert. Rivest and Shamir, as computer scientists, proposed many potential functions, while Adleman, as a mathematician, was responsible for finding their weaknesses. They tried many approaches, including "knapsack-based" and "permutation polynomials". For a time, they thought what they wanted to achieve was impossible due to contradictory requirements. In April 1977, they spent Passover at the house of a student and drank a good deal of wine before returning to their homes at around midnight. Rivest, unable to sleep, lay on the couch with a math textbook and started thinking about their one-way function. He spent the rest of the night formalizing his idea, and he had much of the paper ready by daybreak. The algorithm is now known as RSA – the initials of their surnames in same order as their paper.

Clifford Cocks, an English mathematician working for the British intelligence agency Government Communications Headquarters (GCHQ), described an equivalent system in an internal document in 1973. However, given the relatively expensive computers needed to implement it at the time, it was considered to be mostly a curiosity and, as far as is publicly known, was never deployed. His discovery, however, was not revealed until 1997 due to its top-secret classification.

Kid-RSA (KRSA) is a simplified, insecure public-key cipher published in 1997, designed for educational purposes. Some people feel that learning Kid-RSA gives insight into RSA and other public-key ciphers, analogous to simplified DES.

## Patent

A patent describing the RSA algorithm was granted to MIT on 20 September 1983: U.S. Patent 4,405,829 "Cryptographic communications system and method". From DWPI's abstract of the patent:

The system includes a communications channel coupled to at least one terminal having an encoding device and to at least one terminal having a decoding device. A message-to-be-transferred is enciphered to ciphertext at the encoding terminal by encoding the message as a number M in a predetermined set. That number is then raised to a first predetermined power (associated with the intended receiver) and finally computed. The remainder or residue, C, is... computed when the exponentiated number is divided by the product of two predetermined prime numbers (associated with the intended receiver).

A detailed description of the algorithm was published in August 1977, in Scientific American's Mathematical Games column. This preceded the patent's filing date of December 1977. Consequently, the patent had no legal standing outside the United States. Had Cocks's work been publicly known, a patent in the United States would not have been legal either.

When the patent was issued, terms of patent were 17 years. The patent was about to expire on 21 September 2000, but RSA Security released the algorithm to the public domain on 6 September 2000.

## Operation

The RSA algorithm involves four steps: key generation, key distribution, encryption, and decryption.

A basic principle behind RSA is the observation that it is practical to find three very large positive integers e, d, and n, such that with modular exponentiation for all integers m (with 0 ≤ m < n):

$(m^{e})^{d}\equiv m{\pmod {n}}$ and that knowing e and n, or even m, it can be extremely difficult to find d. The triple bar (≡) here denotes modular congruence (which is to say that when you divide (me)d by n and m by n, they both have the same remainder).

In addition, for some operations it is convenient that the order of the two exponentiations can be changed and that this relation also implies

$(m^{d})^{e}\equiv m{\pmod {n}}.$ RSA involves a public key and a private key. The public key can be known by everyone and is used for encrypting messages. The intention is that messages encrypted with the public key can only be decrypted in a reasonable amount of time by using the private key. The public key is represented by the integers n and e, and the private key by the integer d (although n is also used during the decryption process, so it might be considered to be a part of the private key too). m represents the message (previously prepared with a certain technique explained below).

### Key generation

The keys for the RSA algorithm are generated in the following way:

1. Choose two large prime numbers p and q.
• To make factoring harder, p and q should be chosen at random, be both large and have a large difference. For choosing them the standard method is to choose random integers and use a primality test until two primes are found.
• p and q should be kept secret.
2. Compute n = pq.
• n is used as the modulus for both the public and private keys. Its length, usually expressed in bits, is the key length.
• n is released as part of the public key.
3. Compute λ(n), where λ is Carmichael's totient function. Since n = pq, λ(n) = lcm(λ(p), λ(q)), and since p and q are prime, λ(p) = φ(p) = p − 1, and likewise λ(q) = q − 1. Hence λ(n) = lcm(p − 1, q − 1).
• The lcm may be calculated through the Euclidean algorithm, since lcm(ab) = |ab|/gcd(ab).
• λ(n) is kept secret.
4. Choose an integer e such that 2 < e < λ(n) and gcd(e, λ(n)) = 1; that is, e and λ(n) are coprime.
• e having a short bit-length and small Hamming weight results in more efficient encryption – the most commonly chosen value for e is 216 + 1 = 65537. The smallest (and fastest) possible value for e is 3, but such a small value for e has been shown to be less secure in some settings.
• e is released as part of the public key.
5. Determine d as de−1 (mod λ(n)); that is, d is the modular multiplicative inverse of e modulo λ(n).
• This means: solve for d the equation de ≡ 1 (mod λ(n)); d can be computed efficiently by using the extended Euclidean algorithm, since, thanks to e and λ(n) being coprime, said equation is a form of Bézout's identity, where d is one of the coefficients.
• d is kept secret as the private key exponent.

The public key consists of the modulus n and the public (or encryption) exponent e. The private key consists of the private (or decryption) exponent d, which must be kept secret. p, q, and λ(n) must also be kept secret because they can be used to calculate d. In fact, they can all be discarded after d has been computed.

In the original RSA paper, the Euler totient function φ(n) = (p − 1)(q − 1) is used instead of λ(n) for calculating the private exponent d. Since φ(n) is always divisible by λ(n), the algorithm works as well. The possibility of using Euler totient function results also from Lagrange's theorem applied to the multiplicative group of integers modulo pq. Thus any d satisfying de ≡ 1 (mod φ(n)) also satisfies de ≡ 1 (mod λ(n)). However, computing d modulo φ(n) will sometimes yield a result that is larger than necessary (i.e. d > λ(n)). Most of the implementations of RSA will accept exponents generated using either method (if they use the private exponent d at all, rather than using the optimized decryption method based on the Chinese remainder theorem described below), but some standards such as FIPS 186-4 (Section B.3.1) may require that d < λ(n). Any "oversized" private exponents not meeting this criterion may always be reduced modulo λ(n) to obtain a smaller equivalent exponent.

Since any common factors of (p − 1) and (q − 1) are present in the factorisation of n − 1 = pq − 1 = (p − 1)(q − 1) + (p − 1) + (q − 1), it is recommended that (p − 1) and (q − 1) have only very small common factors, if any, besides the necessary 2.[failed verification][failed verification]

Note: The authors of the original RSA paper carry out the key generation by choosing d and then computing e as the modular multiplicative inverse of d modulo φ(n), whereas most current implementations of RSA, such as those following PKCS#1, do the reverse (choose e and compute d). Since the chosen key can be small, whereas the computed key normally is not, the RSA paper's algorithm optimizes decryption compared to encryption, while the modern algorithm optimizes encryption instead.

### Key distribution

Suppose that Bob wants to send information to Alice. If they decide to use RSA, Bob must know Alice's public key to encrypt the message, and Alice must use her private key to decrypt the message.

To enable Bob to send his encrypted messages, Alice transmits her public key (n, e) to Bob via a reliable, but not necessarily secret, route. Alice's private key (d) is never distributed.

### Encryption

After Bob obtains Alice's public key, he can send a message M to Alice.

To do it, he first turns M (strictly speaking, the un-padded plaintext) into an integer m (strictly speaking, the padded plaintext), such that 0 ≤ m < n by using an agreed-upon reversible protocol known as a padding scheme. He then computes the ciphertext c, using Alice's public key e, corresponding to

$c\equiv m^{e}{\pmod {n}}.$ This can be done reasonably quickly, even for very large numbers, using modular exponentiation. Bob then transmits c to Alice. Note that at least nine values of m will yield a ciphertext c equal to m, but this is very unlikely to occur in practice.

### Decryption

Alice can recover m from c by using her private key exponent d by computing

$c^{d}\equiv (m^{e})^{d}\equiv m{\pmod {n}}.$ Given m, she can recover the original message M by reversing the padding scheme.

### Example

Here is an example of RSA encryption and decryption. The parameters used here are artificially small, but one can also use OpenSSL to generate and examine a real keypair.

1. Choose two distinct prime numbers, such as
$p=61$ and $q=53$ .
2. Compute n = pq giving
$n=61\times 53=3233.$ 3. Compute the Carmichael's totient function of the product as λ(n) = lcm(p − 1, q − 1) giving
$\lambda (3233)=\operatorname {lcm} (60,52)=780.$ 4. Choose any number 1 < e < 780 that is coprime to 780. Choosing a prime number for e leaves us only to check that e is not a divisor of 780.
Let $e=17$ .
5. Compute d, the modular multiplicative inverse of e (mod λ(n)), yielding
$d=413,$ as $1=(17\times 413){\bmod {7}}80.$ The public key is (n = 3233, e = 17). For a padded plaintext message m, the encryption function is

{\begin{aligned}c(m)&=m^{e}{\bmod {n}}\\&=m^{17}{\bmod {3}}233.\end{aligned}} The private key is (n = 3233, d = 413). For an encrypted ciphertext c, the decryption function is

{\begin{aligned}m(c)&=c^{d}{\bmod {n}}\\&=c^{413}{\bmod {3}}233.\end{aligned}} For instance, in order to encrypt m = 65, one calculates

$c=65^{17}{\bmod {3}}233=2790.$ To decrypt c = 2790, one calculates

$m=2790^{413}{\bmod {3}}233=65.$ Both of these calculations can be computed efficiently using the square-and-multiply algorithm for modular exponentiation. In real-life situations the primes selected would be much larger; in our example it would be trivial to factor n = 3233 (obtained from the freely available public key) back to the primes p and q. e, also from the public key, is then inverted to get d, thus acquiring the private key.

Practical implementations use the Chinese remainder theorem to speed up the calculation using modulus of factors (mod pq using mod p and mod q).

The values dp, dq and qinv, which are part of the private key are computed as follows:

{\begin{aligned}d_{p}&=d{\bmod {(}}p-1)=413{\bmod {(}}61-1)=53,\\d_{q}&=d{\bmod {(}}q-1)=413{\bmod {(}}53-1)=49,\\q_{\text{inv}}&=q^{-1}{\bmod {p}}=53^{-1}{\bmod {6}}1=38\\&\Rightarrow (q_{\text{inv}}\times q){\bmod {p}}=38\times 53{\bmod {6}}1=1.\end{aligned}} Here is how dp, dq and qinv are used for efficient decryption (encryption is efficient by choice of a suitable d and e pair):

{\begin{aligned}m_{1}&=c^{d_{p}}{\bmod {p}}=2790^{53}{\bmod {6}}1=4,\\m_{2}&=c^{d_{q}}{\bmod {q}}=2790^{49}{\bmod {5}}3=12,\\h&=(q_{\text{inv}}\times (m_{1}-m_{2})){\bmod {p}}=(38\times -8){\bmod {6}}1=1,\\m&=m_{2}+h\times q=12+1\times 53=65.\end{aligned}} ### Signing messages

Suppose Alice uses Bob's public key to send him an encrypted message. In the message, she can claim to be Alice, but Bob has no way of verifying that the message was from Alice, since anyone can use Bob's public key to send him encrypted messages. In order to verify the origin of a message, RSA can also be used to sign a message.

Suppose Alice wishes to send a signed message to Bob. She can use her own private key to do so. She produces a hash value of the message, raises it to the power of d (modulo n) (as she does when decrypting a message), and attaches it as a "signature" to the message. When Bob receives the signed message, he uses the same hash algorithm in conjunction with Alice's public key. He raises the signature to the power of e (modulo n) (as he does when encrypting a message), and compares the resulting hash value with the message's hash value. If the two agree, he knows that the author of the message was in possession of Alice's private key and that the message has not been tampered with since being sent.

This works because of exponentiation rules:

$h=\operatorname {hash} (m),$ $(h^{e})^{d}=h^{ed}=h^{de}=(h^{d})^{e}\equiv h{\pmod {n}}.$ Thus the keys may be swapped without loss of generality, that is, a private key of a key pair may be used either to:

1. Decrypt a message only intended for the recipient, which may be encrypted by anyone having the public key (asymmetric encrypted transport).
2. Encrypt a message which may be decrypted by anyone, but which can only be encrypted by one person; this provides a digital signature.

## Proofs of correctness

### Proof using Fermat's little theorem

The proof of the correctness of RSA is based on Fermat's little theorem, stating that ap − 1 ≡ 1 (mod p) for any integer a and prime p, not dividing a.[note 1]

We want to show that

$(m^{e})^{d}\equiv m{\pmod {pq}}$ for every integer m when p and q are distinct prime numbers and e and d are positive integers satisfying ed ≡ 1 (mod λ(pq)).

Since λ(pq) = lcm(p − 1, q − 1) is, by construction, divisible by both p − 1 and q − 1, we can write

$ed-1=h(p-1)=k(q-1)$ for some nonnegative integers h and k.[note 2]

To check whether two numbers, such as med and m, are congruent mod pq, it suffices (and in fact is equivalent) to check that they are congruent mod p and mod q separately.[note 3]

To show medm (mod p), we consider two cases:

1. If m ≡ 0 (mod p), m is a multiple of p. Thus med is a multiple of p. So med ≡ 0 ≡ m (mod p).
2. If m $\not \equiv$ 0 (mod p),
$m^{ed}=m^{ed-1}m=m^{h(p-1)}m=(m^{p-1})^{h}m\equiv 1^{h}m\equiv m{\pmod {p}},$ where we used Fermat's little theorem to replace mp−1 mod p with 1.

The verification that medm (mod q) proceeds in a completely analogous way:

1. If m ≡ 0 (mod q), med is a multiple of q. So med ≡ 0 ≡ m (mod q).
2. If m $\not \equiv$ 0 (mod q),
$m^{ed}=m^{ed-1}m=m^{k(q-1)}m=(m^{q-1})^{k}m\equiv 1^{k}m\equiv m{\pmod {q}}.$ This completes the proof that, for any integer m, and integers e, d such that ed ≡ 1 (mod λ(pq)),

$(m^{e})^{d}\equiv m{\pmod {pq}}.$ 