Time needed to factor large integers

The optimal choice of factoring algorithm depends on the size of the number you want to factor. For numbers larger than a googol (10^100), the GNFS (general number field sieve) is the fastest known factoring algorithm, making GNFS the algorithm of choice for trying to factor public keys for RSA encryption.

The expected time required to factor a number n using the GNFS is proportional to

exp( f(n) g(n) )

where f(n) is relatively constant and g(n) varies more strongly with n. More specifically

f(n) = (64/9)^{1/3} + o(1)

and

g(n) = (\log n)^{1/3} (\log \log n)^{2/3}.

The notation o(1) means a term that goes to zero as n increases. Very often in practice you can ignore o(1) terms when your value of n is large. In cryptographic applications n is certainly large, at least 2^1024, and yet the o(1) term is still significant for values of n seen in practice. I’ve seen one source say that for keys used in practice the o(1) term is around 0.27.

Security levels

It is widely stated that factoring a 1024-bit private key is comparable in difficulty to breaking an 80-bit symmetric encryption key, i.e. that 1024-bit keys provide 80-bit security.

To find the security level of a 1024-bit RSA key, the size of the o(1) term matters. But given that, if we want to find the security levels of other RSA key sizes, it doesn’t matter how big the o(1) term is. It only matters that the term is roughly constant over the range of key sizes we are interested in.

If f(n) is relatively constant, then the log of the time to factor n is proportional to g(n), and the log of the time to break an encryption with security level k is proportional to k. So the security level of a key n should be proportional to g(n). Let’s see if that’s the case, using the reported security levels of various key sizes.

import numpy as np

# RSA key sizes and security levels
data = {
    1024 : 80,
    2048 : 112,
    3072 : 128,
    4096 : 140,
    7680 : 192,
}
for d in data:
    x = d*np.log(2) # natural log of 2^d
    y = x**(1/3) * np.log(x)**(2/3) # g(n) for an n with d bits
    print(f"{data[d]/y:.4f}")

This prints the following:

2.5580
2.6584
2.5596
2.4819
2.6237

So the ratio is fairly constant, as expected. All the reported security levels are multiples of 4, which suggests there was some rounding done before reporting them. This would account for some of the variation above.

The output above suggests that the security level of an RSA key with b bits is roughly

2.55 \, x^{1/3} (\log x)^{2/3}

where x = log(2) b.
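
Here is the fit packaged as a small function. This is a sketch of the empirical approximation above, not an exact formula, and the function name is just for illustration:

from math import log

def rsa_security_level(bits):
    x = bits * log(2)  # natural log of 2^bits
    # empirical fit: security level ≈ 2.55 x^(1/3) (log x)^(2/3)
    return 2.55 * x**(1/3) * log(x)**(2/3)

print(rsa_security_level(1024)) # ≈ 80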

Aside on RSA security

RSA encryption is not broken by factoring keys but by exploiting implementation flaws.

Factoring a 2048-bit RSA key would require more energy than the world produces in a year, as explained here.

log log x

This post will include some simple but useful calculations.

Throughout this post, a and b are real numbers greater than 1.

All logs are proportional

For any two bases a and b, log base a and log base b are proportional, and in fact the proportionality constant is logb a. We have

\log_b(x) = \log_b(a) \log_a(x)

or to put it another way

\frac{\log_b(x)}{\log_a(x)} = \log_b(a)

As a corollary, O( loga(x) ) = O( logb(x) ) as x → ∞. So if you say something is O(log n) and someone asks “log to what base?” the answer is “it doesn’t matter.”
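
Here’s a quick numerical check of this identity, in the spirit of the verification code at the end of this post:

from math import log
# Note that log(x, b) = log_b(x)

a, b, x = 3, 5, 2025
assert abs(log(x, b) - log(a, b) * log(x, a)) < 1e-12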

Double logs are asymptotically proportional

Since loga(x) is proportional to logb(x), is loga( loga(n) ) proportional to logb( logb(n) )? No, but sorta. A little algebra shows that

\frac{\log_b( \log_b(x) )}{\log_a( \log_a(x) )} = \log_b(a) + \frac{\log_a(\log_b (a)) \log_b (a)}{ \log_a(\log_a(x))}

The second term on the right hand side goes to zero as x → ∞, and so we can write

\frac{\log_b( \log_b(x) )}{\log_a( \log_a(x) )} = \log_b(a) + o(1)

The double logs to different bases are not proportional, but they become more nearly proportional as x grows larger. As a corollary, O( loga( loga(x) ) )= O( logb( logb(x) ) ).

Note that the o(1) term goes to zero very slowly, like c / log (log x). If you’re proving a theorem in which x → ∞, you may be able to ignore this term. But for finite x, the o(1) term may be substantial, especially when a and b are far apart.

Mixed logs

I was reading a paper the other day that frequently iterated logs to different bases, which was kind of confusing. The mix could be eliminated with the following equation.

\log_a(\log_b(x)) = \log_a(b) \log_b(\log_b(x))

So mixed logs are proportional to applying the log to the inner base twice.

Checking the algebra

The algebra above is a little tedious and error-prone. Here’s some code to give complementary validation of the calculations.

from math import log
# Note that log(x, b) = log_b(x)

a, b = 3, 5

for x in [9, 29, 2025]:
    u = log(log(x, b), b) / log(log(x, a), a)
    v = log(a, b) + log(log(a, b), a)*log(a, b) / log(log(x, a), a)
    assert( abs(u - v) < 1e-15 )

    u = log(log(x, b), a)
    v = log(log(x, b), b) * log(b, a)
    assert( abs(u - v) < 1e-15 )


Fitting a double exponential function to three points

The previous post needed to fit a double exponential function to data, a function of the form

a \exp(b \exp(cx))

By taking logs, we have a function of the form

f(x) = a + b \exp(cx)

where the a in the latter equation is the log of the a in the former.

In the previous post I used a curve fitting function from SciPy to find the parameters a, b, and c. The solution in the post works, but I didn’t show all the false starts along the way. I ran into problems with overflow and with the solution method not converging before I came up with the right formulation of the problem and a good enough initial guess at the solution.

This post will look at fitting the function more analytically. I’ll still need to solve an equation numerically, but an equation in only one variable, not three.

If you need to do a regression, fitting the function to noisy data at more than three points, you could use the code below with three chosen data points to find a good starting point for a curve-fitting method.

Solution

Suppose we have x1 < x2 < x3 and we want to find a, b, and c such that

\begin{align*} y_1 &= a + b \exp(c x_1) \\ y_2 &= a + b \exp(c x_2) \\ y_3 &= a + b \exp(c x_3) \\ \end{align*}

where y1 < y2 < y3.

Subtract the equations pairwise to eliminate a:

\begin{align*} y_2 - y_1 &= b (\exp(c x_2) - \exp(c x_1)) \\ y_3 - y_1 &= b (\exp(c x_3) - \exp(c x_1)) \\ \end{align*}

Divide these to eliminate b:

\frac{y_2 - y_1}{y_3 - y_1} = \frac{\exp(c x_2) - \exp(c x_1)}{\exp(c x_3) - \exp(c x_1)}

Factor out exp(cx1) from the right side:

\frac{y_2 - y_1}{y_3 - y_1} = \frac{1 - \exp(c (x_2 - x_1))}{1 - \exp(c (x_3 - x_1))}

Now the task is to solve

L = \frac{1 - \exp(c d_2)}{1 - \exp(c d_3)}

where

\begin{align*} L &= \frac{y_2 - y_1}{y_3 - y_1} \\ d_2 &= x_2 - x_1 \\ d_3 &= x_3 - x_1 \end{align*}

Note that L, d2, and d3 are all positive.

Once we find c numerically,

b = \frac{y_2 - y_1}{\exp(c x_2) - \exp(c x_1)}

and

a = y_1 - b \exp(c x_1)

Python code

Here’s a Python implementation to illustrate and test the derivation above.

from numpy import exp
from scipy.optimize import newton

def fit(x1, x2, x3, y1, y2, y3):
    assert(x1 < x2 < x3)
    assert(y1 < y2 < y3)

    # reduce the problem to a single equation in c
    d2 = x2 - x1
    d3 = x3 - x1
    L = (y2 - y1)/(y3 - y1)
    g = lambda x: L - (1 - exp(x*d2))/(1 - exp(x*d3))
    c = newton(g, 0.315)

    # back-substitute to find b, then a
    b = (y2 - y1)/(exp(c*x2) - exp(c*x1))
    a = y1 - b*exp(c*x1)
    return a, b, c

a, b, c = fit(9, 25, 28, 42, 55, 92)
print([a + b*exp(c*x) for x in [9, 25, 28]]) # should print values close to [42, 55, 92]

Extrapolating quantum factoring

In 2001, a quantum computer was able to factor 15.

In 2012, a quantum computer was able to factor 21, sorta. They cheated by not implementing gates they knew a priori would not be used. To this day, no quantum computer has factored 21 without taking some shortcuts. Some reasons why are given here.

Extrapolating from two data points is kinda ridiculous, but we only have two data points, and we’re going to do it anyway.

xkcd 605: Extrapolating

Some may object that extrapolating the size of the numbers that can be factored isn’t the right approach. Instead we should be extrapolating the number of qubits or something else. Fine, but that’s not what we’re doing in this post.

Linear extrapolation

Using linear extrapolation, a quantum computer wouldn’t be able to factor a cryptographically relevant number before the heat death of the universe.

Exponential extrapolation

Let’s switch to exponential extrapolation. Instead of assuming the size of the numbers grows linearly, we’ll assume the number of bits in the numbers grows linearly, meaning the numbers grow exponentially.

In about 10 years, the size of number that could be factored increased by

21/15 = 1.4 ≈ √2.

This means the size of the numbers increased by about half a bit in a decade. By now we should be able to factor 35.

RSA keys have at least 1024 bits, so at half a bit per decade, quantum computers should be able to factor RSA keys 20,000 years from now.
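
Here’s the arithmetic behind these two estimates as a quick sketch:

from math import log2

bits_per_decade = log2(21) - log2(15) # ≈ 0.49 bits gained per decade
print(10 * 1024 / bits_per_decade)    # ≈ 21,000 years to reach 1024 bits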

Double exponential extrapolation

Now let’s assume the number of bits in a number that a quantum computer can factor is itself growing exponentially, meaning that the numbers are growing doubly exponentially, doubling every decade. Then we should be able to factor 9-bit numbers by now, and in 100 years we will be able to factor RSA keys.

Breaking RSA in 10 years

The estimate above is kinda arbitrary because a double exponential function has three degrees of freedom and so we’d need three data points, which we don’t have.

We can fit a double exponential by making up a third data point. Some researchers speculate that a quantum computer will be able to factor 1024-bit RSA keys 10 years from now. If so, that gives us a third data point.

To make things easier, let’s work in terms of bits and set x to be the number of years since 2000. Then we have

f(x) = a + b \, 2^{cx}

with f measured in bits: f(1) = log2(15), f(12) = log2(21), and f(35) = 1024. We can fit this function using Python [1].

from math import log2
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a + b * 2**(c*x)

x_data = [1.0, 12.0, 35.0]          # years since 2000
y_data = [log2(15), log2(21), 1024] # bits factored
initial_guess = [4, 0.01, 0.3]

popt, pcov = curve_fit(f, x_data, y_data, p0=initial_guess, maxfev=10000)
a, b, c = popt
print(f"Fitted parameters: {a=}, {b=}, {c=}")

The parameter values are a = 3.893887005243357, b = 0.0093347992761344, c = 0.4782191085655488.
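
Evaluating the fitted curve at x = 25, i.e. the year 2025, gives the 40-bit figure mentioned below:

print(f(25, *popt)) # ≈ 41 bits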

Here’s a plot of the fitted function f.

If we’re going to be able to factor 1024-bit integers by 2035, according to a double exponential model, we’re already behind schedule. We should be factoring 40-bit numbers by now, but so far we’ve only factored a 4-bit number.

Or to put it another way, those who believe we will be able to factor 1024-bit integers by 2035 are implicitly assuming a future rate of growth faster than double exponential. And maybe they’re right.

If we’re going to break 1024-bit RSA keys by 2035, at some point we’ll have to do better than the curve above, and so far we’re doing worse.

Bookmark this post, and when a new quantum factoring record is set, come back and rerun the code with the new data.


[1] The curve_fit function needed a lot of help to work. See the next post for a better way to fit the function.

Post-quantum RSA with gargantuan keys

If and when practical scalable quantum computers become available, RSA encryption will be broken, at least for key sizes currently in use. A quantum computer could use Shor’s algorithm to factor n-bit numbers in time on the order of n². The phrase “quantum leap” is misused and overused, but this would legitimately be a quantum leap.

That said, Shor’s method isn’t instantaneous, even on a hypothetical machine that does not yet exist and may never exist. Daniel Bernstein estimates that RSA encryption with terabyte public keys would be secure even in a post-quantum world.

Bernstein said on a recent podcast that he isn’t seriously suggesting using RSA with terabyte keys. Computing the necessary key size is an indirect way of illustrating how impractical post-quantum RSA would be.


Conway’s pinwheel tiling

John Conway discovered a right triangle that can be partitioned into five similar triangles. The sides are in proportion 1 : 2 : √5.

You can make a larger similar triangle by making the entire triangle the central (green) triangle of a new triangle.

Here’s the same image with the small triangles filled in as in the original.

Repeating this process creates an aperiodic tiling of the plane.

The tiling was discovered by Conway, but Charles Radin was the first to describe it in a publication [1]. Radin attributes the tiling to Conway.

Alternate visualization

It would be easiest to illustrate the tiling if we were standing together in a room and placing new triangles on the floor, watching the tiling expand. Given the limitations of a screen, it may be easier to visualize subdividing the triangle rather than tiling the plane.

Imagine the smallest triangles are a constant size and at each step we’re viewing the process from further away. We see a constant size outer triangle at each step, but the triangle is growing and covering the plane.

Here’s an animated GIF of the process.


[1] Charles Radin. “The Pinwheel Tilings of the Plane.” Annals of Mathematics, vol. 139, no. 3, 1994, pp. 661–702.

Silent Payments

Bitcoin transactions appear to be private because names are not attached to accounts. But that is not sufficient to ensure privacy; if it were, much of my work in data privacy would be unnecessary. It’s quite possible to identify people in data that does not contain any direct identifiers.

I hesitate to use the term pseudonymization because people define it multiple ways, but I’d say Bitcoin addresses provide pseudonymization but not necessarily deidentification.

Privacy and public ledgers are in tension. The Bitcoin ledger is superficially private because it does not contain user information, only addresses [1]. However, transaction details are publicly recorded on the blockchain.

To add some privacy to Bitcoin, addresses are typically used only once. Wallet software generates new addresses for each transaction, and so it’s not trivial to track all the money flowing between two people. But it’s not impossible either. Forensic tools make it possible to identify parties behind blockchain transactions via metadata and correlating information on the blockchain with information available off-chain.

Silent Payments

Suppose you want to take donations via Bitcoin. If you put a Bitcoin address on your site, that address has to be permanent, and it’s publicly associated with you. It would be trivial to identify (the addresses of) everyone who sends you a donation.

Silent payments provide a way to work around this problem. Alice could send donations to Bob repeatedly without it being obvious who either party is, and Bob would not have to change his site. To be clear, there would be no way to tell from the addresses that funds are moving between Alice and Bob. The metadata vulnerabilities remain.

Silent payments are not implemented in Bitcoin Core, but there are partial implementations out there. See BIP 352.

The silent payment proposal depends on ECDH (elliptic curve Diffie-Hellman) key exchange, so I’ll do a digression on ECDH before returning to silent payments per se.

ECDH

The first thing to know about elliptic curves, as far as cryptography is concerned, is that there is a way to add two points on an elliptic curve and obtain a third point. This operation, together with a point at infinity serving as the identity, turns the curve into an Abelian group.

You can bootstrap this addition to create a multiplication operation: given a point g on an elliptic curve and an integer n, the product ng means g added to itself n times. Multiplication can be implemented efficiently even when n is an enormous number. However, and this is key, multiplication cannot be undone efficiently. Given g and n, you can compute ng quickly, but it’s practically impossible to start with knowledge of ng and g and recover n. To put it another way, multiplication is efficient but division is practically impossible [2].

Suppose Alice and Bob both agree on an elliptic curve and a point g on the curve. This information can be public. ECDH lets Alice and Bob share a secret as follows. Alice generates a large random integer a, her private key, and computes a public key A = ag. Similarly, Bob generates a large random integer b and computes a public key B = bg. Their shared secret is

aB = bA.

Alice can compute aB because she (alone) knows a and B is public information. Similarly Bob can compute bA. The two points on the curve aB and bA are the same because

aB = a(bg) = b(ag) = bA.
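
Here’s a toy numerical sketch of ECDH. The curve below, y² = x³ + 2x + 2 over the integers mod 17 with base point (5, 1), is a standard textbook example, absurdly small compared to curves used in practice, but it shows the shared secret working out.

p = 17     # field prime
A_COEF = 2 # the coefficient a in y^2 = x^3 + ax + b

def add(P, Q):
    # Add two points on the curve; None plays the role of the point at infinity.
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3*x1*x1 + A_COEF) * pow(2*y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m*m - x1 - x2) % p
    return (x3, (m*(x1 - x3) - y1) % p)

def mult(n, P):
    # Compute nP by double-and-add; efficient even for enormous n.
    R = None
    while n:
        if n & 1:
            R = add(R, P)
        P = add(P, P)
        n >>= 1
    return R

g = (5, 1)                      # public base point
a, b = 7, 11                    # Alice's and Bob's private keys
A, B = mult(a, g), mult(b, g)   # public keys
assert mult(a, B) == mult(b, A) # the shared secret aB = bA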

Back to silent payments

ECDH lets Alice and Bob share a secret, a point on a (very large) elliptic curve. This is the heart of silent payments.

In its simplest form, Alice can send Bob funds using the address

P = B + hash(aB) g.

A slightly more complicated form lets Alice send Bob funds multiple times. The kth time she sends money to Bob she uses the address

P = B + hash(aB || k) g

where || denotes concatenation.
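
Here’s a hypothetical sketch of the address derivation, reusing the toy add, mult, and g from the ECDH example above. The hashing and serialization below are stand-ins; BIP 352 specifies the actual curve (secp256k1) and encoding rules.

import hashlib

def payment_point(a, B, k):
    # ECDH shared secret aB, known only to Alice and Bob
    shared = mult(a, B)
    # hash(aB || k); repr() is a stand-in for the real point serialization
    data = repr(shared).encode() + k.to_bytes(4, "big")
    t = int.from_bytes(hashlib.sha256(data).digest(), "big")
    # P = B + hash(aB || k) g
    return add(B, mult(t, g))

print(payment_point(a, B, k=0)) # the point behind the first payment address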

But how do you know k? You have to search the blockchain to determine k, much like the way hierarchical wallets search the blockchain. Is the address corresponding to k = 0 already on the blockchain? If so, try k = 1, and keep going until you find the first unused k. Making this process more efficient is one of the things being worked on to fully implement silent payments.

Note that Bob needs to make B public, but B is not a Bitcoin address per se; it’s information needed to generate addresses via the process described above. Actual addresses are never reused.


[1] Though if you obtain your Bitcoin through an exchange, KYC laws require them to save a lot of private information.

[2] Recovering n from ng is known as the discrete logarithm problem. It would be more logical to call it the discrete division problem, but if you write the group operation on an elliptic curve as multiplication rather than addition, then it’s a discrete logarithm, i.e. trying to solve for an unknown exponent. If and when a large-scale quantum computer exists, the discrete logarithm problem will be practical to solve, but presumably not until then.

Inverting series that are flat at zero

The previous post looked at solving the equation

2\theta + \sin(2\theta) = \pi \sin(\varphi)

which arises from the Mollweide map projection. Newton’s method works well unless φ is near π/2. Using a modified version of Newton’s method makes the convergence faster when φ = π/2, which is kinda useless because we know the solution is θ = π/2 there. When φ is near π/2, the modified Newton’s method may diverge. I ended the previous post by saying a series solution would work better when φ is sufficiently close to π/2. This post will flesh that out.

Let x = π − 2θ. Now the task is to solve

x - \sin(x) = y

for small positive values of y.

The left side is

\frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots

and so for very small values of y, and thus very small values of x, we have

x = (6y)^{1/3}

If this solution is not sufficiently accurate, we can invert the power series above to get a power series in y that gives the solution x. However, the Lagrange inversion theorem does not apply because the series has a zero derivative at 0. Instead, we have to use Puiseux series inversion, looking for a series in y1/3 rather than a series in y. From the Puiseux series we can see that

x = (6y)^{1/3} + y/10

is a more accurate solution. For even more accuracy, you can compute more terms of the Puiseux series.
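
Here’s a quick numerical check of this two-term approximation against a root found by a standard solver:

import numpy as np
from scipy.optimize import brentq

for y in [1e-6, 1e-3, 0.1]:
    # x - sin(x) - y changes sign on (0, pi) for these values of y
    exact = brentq(lambda x: x - np.sin(x) - y, 0, np.pi)
    approx = (6*y)**(1/3) + y/10
    print(f"{y:.0e}  {exact:.12f}  {approx:.12f}")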

Mollweide map projection and Newton’s method

Mollweide projection

Karl Brandan Mollweide (1774-1825) designed an equal-area map projection, mapping the surface of the earth to an ellipse with an aspect ratio of 2:1. If you were looking at the earth from a distance, the face you’re looking at would correspond to a circle in the middle of the Mollweide map [1]. The part of the earth that you can’t see is mapped to the rest of the ellipse.

The lines of latitude are not quite evenly spaced; some distortion is required to achieve equal area. Instead, latitude φ corresponds to θ on the map according to the equation

2\theta + \sin(2\theta) = \pi \sin( \varphi)

There is no closed-form solution for θ as a function of φ and so the equation must be solved numerically. For each φ we have to find the root of

f(\theta) = 2\theta + \sin(2\theta) - \pi \sin(\varphi)

Here’s a plot of θ as a function of φ.

Newton’s method

Newton’s method works efficiently (converges quadratically) except when φ is close to π/2. The reason Newton’s method slows down for high latitudes is that f and its derivative are both zero at θ = π/2, i.e. f has a double root there.

Newton’s method slows down from quadratic to linear convergence near a double root, but there is a minor modification that restores quadratic convergence.

x_{n+1} = x_n - m\frac{f(x_n)}{f^\prime(x_n)}

When m = 1 this is the usual form of Newton’s method. Setting m = 2 tunes the method for double roots.

The modified version of Newton’s method with m = 2 works when the root you’re trying to find is exactly a double root. However, if you’re trying to find a root near a double root, setting m = 2 can cause the method to diverge, so you have to be careful about changing m.

Illustration

Here’s a version of Newton’s method, specialized for the Mollweide equation. We can specify the value of m, and the initial estimate of the root defaults to θ = φ.

from numpy import sin, cos, pi

def mynewton(phi, m, x0 = None):
    # initial estimate of the root defaults to theta = phi
    x = phi if x0 is None else x0
    delta = 1
    iterations = 0

    while abs(delta) > 1e-12:
        fx = 2*x + sin(2*x) - pi*sin(phi)
        fprime = 2 + 2*cos(2*x)
        delta = m*fx/fprime
        x -= delta
        iterations += 1

    return (x, iterations)

Far from the double root

Say φ = 0.25 and m = 1. When we use the default initial estimate Newton’s method converges in 5 iterations.

Near the double root

Next, say φ = π/2 − 0.001. Again we set m = 1 and use the default initial estimate. Now Newton’s method takes 16 iterations. The convergence is slower because we’re getting close to the double root at π/2.

But if we switch to m = 2, Newton’s method diverges. Slow convergence is better than no convergence, so we’re better off sticking with m = 1.

On the double root

Now let’s say φ = π/2. In this case the default guess is exactly correct, but the code divides by zero and crashes. So let’s take our initial estimate to be π/2 − 0.001. When m = 1, Newton’s method takes 15 steps to converge. When m = 2, it converges in 6 steps, which is clearly faster. However, in both cases the returned result is only good to 6 decimal places, even though the code is looking for 12 decimal places.
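
For reference, here are the calls behind the experiments above. (The divergent case, m = 2 near but not at the double root, is omitted.)

print(mynewton(0.25, 1))               # far from the double root
print(mynewton(pi/2 - 0.001, 1))       # near the double root: many more iterations
print(mynewton(pi/2, 1, pi/2 - 0.001)) # at the double root with m = 1
print(mynewton(pi/2, 2, pi/2 - 0.001)) # at the double root with m = 2: fewer iterations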

This shows that as we get close to φ = π/2, we have both a speed and an accuracy issue.

A better method

I haven’t worked this out, but here’s how I imagine I would fix this. I’d tune Newton’s method, preventing it from taking steps that are too large, so that the method is accurate for φ in the range [0, π/2 − ε] and use a different method, such as power series inversion, for φ in [π/2 − ε, π/2].

Update: The next post works out the series solution alluded to above.


[1] Someone pointed out on Hacker News that this statement could be confusing. The perspective you’d see from a distance is not the Mollweide projection—that would not be equal area—but it corresponds to the same region. As someone put it on HN “a circle in the center corresponds to one half of the globe.”

1420 MHz

There are four manmade objects that have left our solar system: Pioneer 10, Pioneer 11, Voyager 1, and Voyager 2.

Yesterday I wrote about the Golden Record, which is on both of the Voyager probes. Some of the graphics on the cover of the Golden Record also appear on plaques attached to the Pioneer probes. In particular, the two plaques and the two record covers have the following image.

Pioneer plaque hydrogen diagram

This image is essential to decoding the rest of the images. It represents a “spin flip transition” of a neutral hydrogen atom. This event has a frequency of 1420.405 MHz, corresponding to a wavelength of about 21 cm. The little hash mark is meant to indicate that this is a unit of measurement, both of length (21 cm) and time (the time it takes light to travel 21 cm).
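
As a quick check of the numbers:

# wavelength = speed of light / frequency
print(299792458 / 1420.405e6) # ≈ 0.211 m, about 21 cm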

The thought behind the unit of measurement was that it might make sense to an extraterrestrial being. Frequencies associated with hydrogen atoms are more universal than seconds (which ultimately derive from the period of Earth’s rotation) or meters (which derive from the circumference of the Earth).

The Wow! signal

The frequency 1420 MHz is special, so special that scientists believed advanced civilizations would recognize it. That’s why an astronomer, Jerry R. Ehman, was excited to observe a strong signal at that frequency on August 15, 1977. (Incidentally, this was five days before Voyager 2 was launched.)

Ehman wrote “Wow!” on a computer printout of the signal, and the signal has become known as “the Wow! signal.”

Ehman thought this signal might have come from an extraterrestrial intelligence for the same reason we were using its frequency in our message to potential extraterrestrials.

Wow! signal printout showing the sequence 6EQUJ5
