The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product.
Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p and r, and q ≠ r. (This has happened .) Then you can easily find p. All three primes can be obtained in less than a millisecond as illustrated by the Python script below.
In a way it would be better to share both your primes with someone else than to share just one. If both your primes are the same as someone else’s, you could read each other’s messages, but a third party attacker could not. If you’re lucky, the person you share primes with doesn’t know that you share primes or isn’t interested in reading your mail. But if that person’s private key is ever disclosed, so is yours.
Nearly all the time required to run the script is devoted to finding the primes, so we time just the factorization.
from secrets import randbits from sympy import nextprime, gcd from timeit import default_timer as timer numbits = 2048 p = nextprime(randbits(numbits)) q = nextprime(randbits(numbits)) r = nextprime(randbits(numbits)) m = p*q n = p*r t0 = timer() g = gcd(m, n) assert(p == g) assert(q == m//p) assert(r == n//p) t1 = timer() # Print time in milliseconds print(1000*(t1 - t0))
There are a few noteworthy things about the Python code.
First, it uses a cryptographic random number generator, not the default generator, to create 2048-bit random numbers.
Second, it uses the portable
default_timer which chooses the appropriate timer on different operating systems. Note that it returns wall clock time, not CPU time.
Finally, it uses integer division
// to recover q and r. If you use the customary division operator Python will carry out integer division and attempt to cast the result to a float, resulting in the error message “OverflowError: integer division result too large for a float.”
GCD vs factoring
If you wanted to find the greatest common divisor of two small numbers by hand, you’d probably start by factoring the two numbers. But as Euclid knew, you don’t have to factor two numbers before you can find the greatest common factor of the numbers. For large numbers, the Euclidean algorithm is orders of magnitude faster than factoring.
Nobody has announced being able to factor a 1024 bit number yet. The number m and n above have four times as many bits. The largest of the RSA numbers factored so far has 768 bits. This was done in 2009 and took approximately two CPU millennia, i.e. around 2000 CPU years.
Fast Euclidean algorithm
The classic Euclidean algorithm takes O(n²) operations to find the greatest common divisor of two integers of length n. But there are modern sub-quadratic variations on the Euclidean algorithm that take
O(n log²(n) log(log(n)))
operations, which is much faster for very large numbers. I believe SymPy is using the classic Euclidean algorithm, but it could transparently switch to using the fast Euclidean algorithm for large inputs.
 As noted in the comments, this has happened due to faulty software implementation, but it shouldn’t happen by chance. By the prime number theorem, the number of n-bit primes is approximately 2n / (n log 2). If M people choose 2M primes each n bits long, the probability of two of these primes being the same is roughly
22-n M n log 2.
If M = 7.5 billion (the current world population) and n = 1024, the probability of two primes being the same is about 10-295. This roughly the probability of winning the Powerball jackpot 40 times in a row.