## RSA setup

Recall the setup for RSA encryption given in the previous post.

- Select two very large prime numbers
*p*and*q*. - Compute
*n*=*pq*and φ(*n*) = (*p*– 1)(*q*– 1). - Choose an encryption key
*e*relatively prime to φ(*n*). - Calculate the decryption key
*d*such that*ed*= 1 (mod φ(*n*)). - Publish
*e*and*n*, and keep*d*,*p*, and*q*secret.

φ is Euler’s totient function, defined here.

There’s a complication in the first step. Maybe you *think* the numbers *p* and *q* are prime, but they’re not. In that case the calculation of φ in step 2 is wrong.

## Pseudoprimes

The numbers *p* and *q* need to be “very large,” where the definition of what constitutes large changes over time due to Moore’s law and progress with factoring algorithms. Currently *p* and *q* would typically have at least 2048 bits each. It’s easy to find numbers that large that are *probably* prime, but it’s difficult to be certain.

At the moment, the fastest way to test for primes has a small chance of making a false positive error, but no chance of a false negative. That is, if the test says a number is composite, it is certainly composite. But if the test says a number may be prime, there is a small chance that it is not prime. (Details here.) So in practice RSA starts by finding two large **probable primes** or **pseudoprimes**.

Discussions of RSA encryption often point out that large pseudoprimes are very rare, so it isn’t a problem that RSA starts with pseudoprimes. But what does that mean? Is there a one in a trillion chance that your private key won’t work and nobody can send you messages? Or that you can receive some messages and not others?

## Encryption and decryption

RSA encryption works by taking a message *m* and raising it to the exponent *e* modulo *n* where *e* and *n* are defined at the top of the post. To decrypt the message, you raise it to the exponent *d* modulo *n* where *d* is your private decryption key. Because *d* was computed to satisfy

*ed* = 1 (mod φ(*n*)),

Euler’s theorem says that we’ll get our message back. We’ll give an example below.

## What if *p* or *q* are pseudoprime?

If *p* and *q* are prime, then φ(*n*) = (*p* − 1)(*q* − 1). But if we’re wrong in our assumption that one of these factors is prime, our calculation of φ(*n*) will be wrong. Will our encryption and decryption process work anyway? Maybe.

We’ll do three examples below, all using small numbers. We’ll use *e* = 257 as our public encryption exponent and *m* = 42 as the message to encrypt in all examples.

In the first example *p* and *q* are indeed prime, and everything works as it should. In the next two examples we will replace *p* with a pseudoprime. In the second example everything works despite our error, but in the third example decryption fails.

### Example 1: *p* and *q* prime

We’ll start with *p* = 337 and *q* = 283. Both are prime. The Python code below shows that *d* = 60833 and the encrypted message is 45431. Everything works as advertised.

### Example 2: *p* pseudoprime

Now we use *p* = 341 and *q* = 283. Here *p* is a pseudoprime for base 2, i.e.

2^{340} = 1 mod 341

and so 341 passes Fermat’s primality test [1] for *b* = 2. Now *d* = 10073 and the encrypted message is 94956. Decrypting the encrypted message works, even though *p* is not prime and our calculation for φ is wrong. In fact, the process works not just for our message *m* = 42 but for every message.

### Example 3: *p* pseudoprime

Here again we let *p* be the pseudoprime 341 but set *q* to the prime 389. Now *d* = 6673, the encrypted message is 7660, but decrypting the encrypted message returns 55669, not 42 as we started with. Decryption fails for all other messages as well.

If we use the correct value for φ(*pq*) the example works. That is, if we use

φ(341*389) = φ(11*31*389) = 10*30*388

rather than the incorrect value 340*388, the decryption process recovers our original message.

## What can go wrong

The examples above show that if we mistakenly think one of our numbers is a prime when it is only a pseudoprime, decryption might succeed or it might fail. In either case, we assume *n* = *pq* has two prime factors when in fact it has more, and so *n* is easier to factor than we thought.

## Python code

Here’s the code for the examples above.

from sympy import gcd, mod_inverse message = 42 e = 257 def example(p, q): n = p*q phi = (p-1)*(q-1) assert( gcd(e, phi) == 1 ) d = mod_inverse(e, phi) assert( d*e % phi == 1 ) encrypted = pow(message, e, n) decrypted = pow(encrypted, d, n) return (message == decrypted) print(example(337, 283)) print(example(341, 283)) print(example(341, 389))

## Related posts

[1] Fermat’s primality test is explained here. In our example we only tried one base, *b* = 2. If we had also tried *b* = 3 we would have discovered that 341 is not prime. In practice one would try several different bases. Using the heuristic that failures are independent for each base, it’s very unlikely that a composite number would be a pseudoprime for each of, say, 5o different bases.

So you have one example where all messages correctly decrypt and one where all fail.

Is that general? If so it would be easy to add an additional test when picking the modulus. How hard is it to find a modulus where some (or most) messages decrypt correctly, but other messages fail?