NP vs small P

The P vs NP conjecture has always seemed a little odd to me. Or rather the interpretation of the conjecture that any problem in P is tractable. How reassuring is it to know a problem can be solved in time less than some polynomial function of the size of the input if that polynomial has extremely high degree?

But this evening I ran across a fascinating comment by Avi Wigderson [1] that puts the P vs NP conjecture in a different light:

Very few known polynomial time algorithms for natural problems have exponents above 3 or 4 (even though at discovery the initial exponent may have been 30 or 40).

Problems in P may be more tractable in practice than in (current) theory. Wigderson’s comment suggests that if you can find any polynomial time algorithm, your chances are improved that you can find a small-order polynomial time algorithm. It seems there’s something deep going on here that would be hard to formalize.

[1] From his new book Mathematics and Computation

Inverse trig function implementations

Programming languages differ in the names they use for inverse trig functions, and in some cases differ in their return values for the same inputs.

Choice of range

If sin(θ) = x, then θ is the inverse sine of x, right? Maybe. Depends on how you define the inverse sine. If -1 ≤ x ≤ 1, then there are infinitely many θ’s whose sine is x. So you have to make a choice.

The conventional choice is for inverse sine to return a value of θ in the closed interval

[-π/2, π/2].

Inverse tangent uses the same choice. However, the end points aren’t possible outputs, so inverse tangent typically returns a value in the open interval

(-π/2, π/2).

For inverse cosine, the usual choice is to return values in the closed interval

[0, π].

Naming conventions

Aside from how to define inverse trig functions, there’s the matter of what to call them. The inverse of tangent, for example, can be written tan⁻¹, arctan, or atan. You might even see arctg or atn or some other variation.

Most programming languages I’m aware of use atan etc. Python uses atan, but SciPy uses arctan. Mathematica uses ArcTan.

Software implementations typically follow the conventions above, and will return the same value for the same inputs, as long as inputs and outputs are real numbers.

Arctangent with two arguments

Many programming languages have a function atan2 that takes two arguments, y and x, and returns an angle whose tangent is y/x. Note that the y argument to atan2 comes first: top-down order, numerator then denominator, not alphabetical order.

However, unlike the one-argument atan function, the return value may not be in the interval (-π/2, π/2). Instead, atan2 returns an angle in the same quadrant as the point (x, y). For example,

    atan2(-1, 1)

returns -π/4; because the point (1, -1) is in the 4th quadrant, it returns an angle in the 4th quadrant. But

    atan2(1, -1)

returns 3π/4. Here the point (-1, 1) and the angle 3π/4 are in the 2nd quadrant.
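Here’s a quick check in Python, one point per quadrant:

    from math import atan2

    print(atan2(-1,  1))   # -π/4, 4th quadrant
    print(atan2( 1, -1))   #  3π/4, 2nd quadrant
    print(atan2( 1,  1))   #  π/4, 1st quadrant
    print(atan2(-1, -1))   # -3π/4, 3rd quadrant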

In Common Lisp, there’s no function named atan2; you just call atan with two arguments. Mathematica is similar: you call ArcTan with two arguments. However, Mathematica uses the opposite convention and takes x as its first argument and y as its second.

Complex values

Some software libraries support complex numbers and some do not. And among those that do support complex numbers, the return values may differ. This is because, as above, you have choices to make when extending these functions to complex inputs and outputs.

For example, in Python,

    math.asin(2)

raises a domain error and

    scipy.arcsin(2)

returns 1.5707964 + 1.3169578i.

In Common Lisp,

    (asin 2)

returns 1.5707964 – 1.3169578i. Both SciPy and Common Lisp return complex numbers whose sine equals 2, but they follow different conventions for which number to choose. That is, both

    scipy.sin(1.5707964 - 1.3169578j)
    scipy.sin(1.5707964 + 1.3169578j)

in Python and

    (sin #C(1.5707964 -1.3169578))
    (sin #C(1.5707964 +1.3169578))

in Common Lisp both return 2, aside from rounding error.

In C99, the function casin (complex arc sine) from <complex.h>

    double complex z = casin(2);

returns 1.5707964 + 1.3169578i. The Mathematica call

    ArcSin[2.]

returns 1.5707964 – 1.3169578i.

Branch cuts

The Common Lisp standardization committee did a very careful job of defining math functions for complex arguments. I’ve written before about this here. You don’t need to know anything about Lisp to read that post.

The committee ultimately decided to first rigorously define the two-argument arctangent function and bootstrap everything else from there. The post above explains how.

Other programming languages have made different choices and so may produce different values, as demonstrated above. I mention the Common Lisp convention because they did a great job of considering every detail, such as how to handle +0 and -0 in IEEE floating point.

The hardest logarithm to compute

Suppose you want to compute the natural logarithms of every floating point number, correctly truncated to a floating point result. Here by floating point number we mean an IEEE standard 64-bit float, what C calls a double. Which logarithm is hardest to compute?

We’ll get to the hardest logarithm shortly, but we’ll first start with a warm up problem. Suppose you needed to compute the first digit of a number z. You know that z equals 2 + ε where ε is either 10^-100 or -10^-100. If ε is positive, you should return 2. If ε is negative, you should return 1. You know z to very high precision a priori, but to return the first digit correctly, you need to compute z to 100 decimal places.

In this post we’re truncating floating point numbers, i.e. rounding down, but similar considerations apply when rounding to nearest. For example, if you wanted to compute 0.5 + ε rounded to the nearest integer.

In general, it may not be possible to know how accurately you need to compute a value in order to round it correctly. William Kahan called this the table maker’s dilemma.

Worst case for logarithm

Lefèvre and Muller [1] found that the floating point number whose logarithm is hardest to compute is

x = 1.0110001010101000100001100001001101100010100110110110 × 2^678

where the fractional part is written in binary. The log of x, again written in binary, is

111010110.0100011110011110101110100111110010010111000100000000000000000000000000000000000000000000000000000000000000000111 …

The first 53 significant bits, all the bits a floating point number can represent [2], are followed by 65 zeros. We have to compute the log with a precision of more than 118 bits in order to know whether the last bit of the fraction part of the float representation should be a 0 or a 1. Usually you don’t need nearly that much precision, but this is the worst case.

Mathematica code

Here is Mathematica code to verify the result above. I don’t know how to give Mathematica floating point numbers in binary, but I know how to give it integers, so I’ll multiply the fraction part by 2^52 and take 52 from the exponent.

n = FromDigits[
  "10110001010101000100001100001001101100010100110110110", 2]
x = n  2^(678 - 52)
BaseForm[N[Log[x], 40], 2]

This computes the log of x to 40 significant decimal figures, between 132 and 133 bits, and gives the result above.
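If you’d like to check this outside of Mathematica, here is a sketch using Python’s mpmath library (assuming you have it installed). It computes the log with about 200 bits of working precision and prints the bits after the binary point, which should reproduce the long run of zeros above.

    from mpmath import mp, mpf, log, floor

    mp.dps = 60   # about 200 bits of working precision
    n = int("10110001010101000100001100001001101100010100110110110", 2)
    x = mpf(n) * mpf(2)**(678 - 52)
    y = log(x)

    frac = y - floor(y)    # fractional part of the log
    bits = []
    for _ in range(125):   # extract 125 bits after the binary point
        frac *= 2
        b = int(floor(frac))
        bits.append(str(b))
        frac -= b
    print("".join(bits))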

Related posts

A few days ago I wrote about another example by Jean-Michel Muller: A floating point oddity.

William Kahan also came up recently in my post on summing an array of floating point numbers.

***

[1] Vincent Lefèvre, Jean-Michel Muller. Worst Cases for Correct Rounding of the Elementary Functions in Double Precision. November 2000. Research Report 2000-35. Available here.

[2] There are 52 bits in the fraction of an IEEE double. The first bit of the fraction is 1, unless the number is subnormal, so the first bit is left implicit, squeezing out one extra bit of precision. That is, storing 52 bits gives us 53 bits of precision.

Permutation distance

If you have two sequences, and one is a permutation of the other, how do you measure how far apart the two sequences are?

This post was motivated by the previous post on anagram statistics. For a certain dictionary, the code used in that post found that the longest anagram pairs were the following:

certifications, rectifications
impressiveness, permissiveness
teaspoonsful, teaspoonfuls

Anagrams are more amusing when the two words are more dissimilar, at least in my opinion.

There are several ways to measure (dis)similarity. The words in the first two pairs above are dissimilar in meaning, but the words in the last pair are synonyms [1]. The pairs of words above are not that dissimilar as character strings.

It would be hard to quantify how dissimilar two words are in meaning, but easier to quantify how dissimilar they are as sequences of characters. We’ll look at two approaches: Hamming distance and transposition distance.

Hamming distance

The Hamming distance between two words is the number of positions in which they differ. We can see the Hamming distances between the words above by viewing them in a fixed-width font.

certifications
rectifications

impressiveness
permissiveness

teaspoonsful
teaspoonfuls

The Hamming distances above are 2, 5, and 4.

The anagrams with maximum Hamming distance in my dictionary are “imperfections” and “perfectionism.” (There’s something poetic about these words being anagrams.) The Hamming distance is 13 because these 13-letter words differ in every position.

imperfections
perfectionism
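Hamming distance is a one-liner in Python. Here it is applied to the pairs above:

    def hamming(a, b):
        # number of positions where equal-length strings a and b differ
        return sum(1 for x, y in zip(a, b) if x != y)

    print(hamming("certifications", "rectifications"))  # 2
    print(hamming("impressiveness", "permissiveness"))  # 5
    print(hamming("teaspoonsful", "teaspoonfuls"))      # 4
    print(hamming("imperfections", "perfectionism"))    # 13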

Transposition distance

Another way to measure the distance between two permutations is how many elements would have to be swapped to turn one list into the other. This is called the transposition distance. For example, the transposition distance between “certifications” and “rectifications” is 1, because you only need to swap the first and third characters to turn one into the other.

It’s not so obvious what the transposition distance is between “impressiveness” and “permissiveness”. We can find an upper bound on the distance by experimentation. The two words only differ in the first five letters, so the distance between “impressiveness” and “permissiveness” is no larger than the distance between “impre” and “permi”. The latter pair of words are no more than 4 transpositions apart as the following sequence shows:

impre
pmire
peirm
perim
permi

But is there a shorter sequence of transpositions? Would it help if we used the rest of the letters “ssiveness” as working space?

Computing transposition distance is hard, as in NP-hard. This is unfortunate since transposition has more practical applications than just word games. It is used, for example, in genetic research. It may be practical to compute for short sequences, but in general one must rely on bounds and approximations.
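For words this short, though, you can settle such questions by brute force. Here’s a sketch in Python that does a breadth-first search over swaps; it’s only practical when the strings are short or the distance is small.

    from collections import deque

    def transposition_distance(a, b):
        # minimum number of swaps of two positions (not necessarily
        # adjacent) needed to turn string a into string b
        if sorted(a) != sorted(b):
            raise ValueError("the strings must be anagrams")
        start, target = tuple(a), tuple(b)
        dist = {start: 0}
        queue = deque([start])
        while queue:
            s = queue.popleft()
            if s == target:
                return dist[s]
            for i in range(len(s)):
                for j in range(i + 1, len(s)):
                    t = list(s)
                    t[i], t[j] = t[j], t[i]
                    t = tuple(t)
                    if t not in dist:
                        dist[t] = dist[s] + 1
                        queue.append(t)

    print(transposition_distance("certifications", "rectifications"))  # 1
    print(transposition_distance("impre", "permi"))  # answers the question above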

Note that transposition distance allows swapping any two elements. If we only allowed swapping consecutive elements, the problem is much easier, but the results are not the same. When restricted to consecutive swaps, the distance between “certifications” and “rectifications” is 3 rather than 1. We can no longer simply swap the “c” and the “r” to turn “cer” into “rec”; we have to do something like

cer
ecr
erc
rec

I don’t know a name for the distance where you allow rotations. The words “teaspoonsful” and “teaspoonfuls” differ by one rotation, turning “sful” into “fuls.” While this can be done in one rotation, it would take several swaps.

[1] Although “teaspoonfuls” is far more common, I remember a schoolmarm teaching us that this is incorrect.

And I still recoil at “brother in laws” and say “brothers in law” instead; the majority is on my side on this one.

Summing an array of floating point numbers

Adding up an array of numbers is simple: just loop over the array, adding each element to an accumulator.

Usually that’s good enough, but sometimes you need more precision, either because your application needs a high-precision answer, or because your problem is poorly conditioned and you need a moderate-precision answer [1].

There are many algorithms for summing an array, which may be surprising if you haven’t thought about this before. This post will look at Kahan’s algorithm, one of the first non-obvious algorithms for summing a list of floating point numbers.

We will sum a list of numbers a couple ways using C’s float type, a 32-bit floating point type. Some might object that this is artificial. Does anyone use floats any more? Didn’t moving from float to double get rid of all our precision problems? No and no.

In fact, the use of 32-bit float types is on the rise. Moving data in and out of memory is now the bottleneck, not arithmetic operations per se, and float lets you pack more data into a given volume of memory. Not only that, there is growing interest in floating point types even smaller than 32-bit floats. See the following posts for examples:

And while sometimes you need less precision than 32 bits, sometimes you need more precision than 64 bits. See, for example, this post.

Now for some code. Here is the obvious way to sum an array in C:

    float naive(float x[], int N) {
        float s = 0.0;
        for (int i = 0; i < N; i++) {
            s += x[i];
        }
        return s;
    }

Kahan’s algorithm is not much more complicated, but it looks a little mysterious and provides more accuracy.

    float kahan(float x[], int N) {
        float s = x[0];          // running sum
        float c = 0.0;           // running compensation for lost low-order bits
        for (int i = 1; i < N; i++) {
            float y = x[i] - c;  // correct the next term by the previous error
            float t = s + y;     // add it; low-order bits of y may be lost here
            c = (t - s) - y;     // (t - s) is what was actually added; c is the error
            s = t;
        }
        return s;
    }

Now to compare the accuracy of these two methods, we’ll use a third method that uses a 64-bit accumulator. We will assume its value is accurate to at least 32 bits and use it as our exact result.

    double dsum(float x[], int N) {
        double s = 0.0;
        for (int i = 0; i < N; i++) {
            s += x[i];
        }
        return s;
    }

For our test problem, we’ll use the reciprocals of the first 100,000 positive integers rounded to 32 bits as our input. This problem was taken from [2].

    #include <stdio.h>
    int main() {
        int N = 100000;
        float x[N];
        for (int i = 1; i <= N; i++)
            x[i-1] = 1.0/i;
  
        printf("%.15g\n", naive(x, N));
        printf("%.15g\n", kahan(x, N));  
        printf("%.15g\n", dsum(x, N));
    }

This produces the following output.

    12.0908508300781
    12.0901460647583
    12.0901461953972

The naive method is correct to 6 significant figures while Kahan’s method is correct to 9 significant figures. The error in the former is 5394 times larger. In this example, Kahan’s method is as accurate as possible using 32-bit numbers.

Note that in our example we were not exactly trying to find the Nth harmonic number, the sum of the reciprocals of the first N positive numbers. Instead, we were using these integer reciprocals rounded to 32 bits as our input. Think of them as just data. The data happened to be produced by computing reciprocals and rounding the result to 32-bit floating point accuracy.

But what if we did want to compute the 100,000th harmonic number? There are methods for computing harmonic numbers directly, as described here.
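One such method (a sketch, not necessarily the approach taken in the post linked above) is the Euler–Maclaurin expansion H_n ≈ ln n + γ + 1/(2n) - 1/(12n^2), which is already accurate to full double precision for n = 100,000:

    from math import log

    gamma = 0.5772156649015329   # Euler–Mascheroni constant
    n = 100_000
    H = log(n) + gamma + 1/(2*n) - 1/(12*n*n)
    print(H)                     # roughly 12.0901461298634

Note that this is the exact harmonic number, which differs slightly from the sums above because the inputs there were reciprocals rounded to 32 bits.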

[1] The condition number of a problem is basically a measure of how much floating point errors are multiplied in the process of computing a result. A well-conditioned problem has a small condition number, and an ill-conditioned problem has a large condition number. With ill-conditioned problems, you may have to use extra precision in your intermediate calculations to get a result that’s accurate to ordinary precision.

[2] Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al.

Arbitrary precision is not a panacea

Having access to arbitrary precision arithmetic does not necessarily make numerical calculation difficulties go away. You still need to know what you’re doing.

Before you extend precision, you have to know how far to extend it. If you don’t extend it enough, your answers won’t be accurate enough. If you extend it too far, you waste resources. Maybe you’re OK wasting resources, so you extend precision more than necessary. But how far is more than necessary?

As an example, consider the problem of finding values of n such that tan(n) > n discussed a couple days ago. After writing that post, someone pointed out that these values of n are one of the sequences on OEIS. One of the values on the OEIS site is

k = 1428599129020608582548671.

Let’s verify that tan(k) > k. We’ll use our old friend bc because it supports arbitrary precision. Our k is a 25-digit number, so let’s tell bc we want to work with 30 decimal places to give ourselves a little room. (bc automatically gives you a little more room than you ask for, but we’ll ask for even more.)

bc doesn’t have a tangent function, but it does have s() for sine and c() for cosine, so we’ll compute tangent as sine over cosine.

    $ bc -l
    scale = 30
    k = 1428599129020608582548671
    s(k)/c(k) - k

We expect this to return a positive value, but instead it returns

-1428599362980017942210629.31…

So is OEIS wrong? Or is bc ignoring our scale value? It turns out neither is true.

You can’t directly compute the tangent of a large number. You use range reduction to reduce it to the problem of computing the tangent of a small angle where your algorithms will work. Since tangent has period π, we reduce k mod π by computing k – ⌊k/π⌋π. That is, we subtract off as many multiples of π as we can until we get a number between 0 and π. Going back to bc, we compute

    pi = 4*a(1)
    k/pi

This returns

    454737226160812402842656.500000507033221370444866152761

and so we compute the tangent of

    t = 0.500000507033221370444866152761*pi
      = 1.570797919686740002588270178941

Since t is slightly larger than π/2, the tangent will be negative. We can’t have tan(t) greater than k because tan(t) isn’t even greater than 0. So where did things break down?

The calculation of pi was accurate to 30 significant figures, and the calculation of k/pi was accurate to 30 significant figures, given our value of pi. So far bc has performed as promised.

The computed value of k/pi is correct to about 30 significant figures, 24 before the decimal point and 6 after. So when we take the fractional part, we only have six significant figures, and that’s not enough. That’s where things go wrong. We get a fractional part of k/π that’s greater than 0.5, differing in the seventh decimal place, while the exact fractional part is less than 0.5, differing only in the 25th decimal place. We needed 25 – 6 = 19 more significant figures.

This is the core difficulty with floating point calculation: subtracting nearly equal numbers loses precision. If two numbers agree to m decimal places, you can expect to lose m significant figures in the subtraction. The error in the subtraction will be small relative to the inputs, but not small relative to the result.

Notice that we computed k/pi to about 30 significant figures, and given that output we computed the fractional part exactly. We accurately computed ⌊k/π⌋π, but lost precision when we subtracted that value from k.

Our error in computing k – ⌊k/π⌋π was small relative to k, but not relative to the final result. Our k is on the order of 10^25, and the error in our subtraction is on the order of 10^-7, but the result is on the order of 1. There’s no bug in bc. It carried out every calculation to the advertised accuracy, but it didn’t have enough working precision to produce the result we needed.
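For comparison, here is a sketch of the same calculation in Python using the mpmath library (an assumption; any arbitrary precision package would do) with enough working precision. The analysis above says we need roughly 50 significant figures, so we’ll ask for 60.

    from mpmath import mp, mpf, tan

    mp.dps = 60   # roughly 50 digits needed; 60 gives a margin
    k = mpf("1428599129020608582548671")
    print(tan(k) > k)    # should print True, agreeing with OEIS
    print(tan(k) - k)    # should be a (large) positive number

The point is the working precision, not the particular package: with enough correct digits in the range reduction, the sign of the result comes out right.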

Formatting numbers at the command line

The utility numfmt, part of Gnu Coreutils, formats numbers. The main uses are grouping digits and converting to and from unit suffixes like k for kilo and M for mega. This is somewhat useful for individual invocations, but like most command line utilities, the real value is using it as part of a pipeline.

The --grouping option will separate digits according to the rules of your locale. So on my computer

    numfmt --grouping 123456789

returns 123,456,789. On a French computer, it would return 123.456.789 because the French use commas as decimal separators and use periods to group digits [1].

You can also use numfmt to convert between ordinary numbers and numbers with units. Unfortunately, there’s some ambiguity regarding what units like kilo and mega mean. A kilogram is 1,000 grams, but a kilobyte is 2^10 = 1,024 bytes. (Units like kibi and mebi were introduced to remove this ambiguity, but the previous usage is firmly established.)

If you want to convert 2M to an ordinary number, you have to specify whether you mean 2 × 10^6 or 2 × 2^20. For the former, use --from=si (for Système international d’unités) and for the latter use --from=iec (for International Electrotechnical Commission).

    $ numfmt --from=si 2M
    2000000
    $ numfmt --from=iec 2M
    2097152 

One possible gotcha is that the symbol for kilo is capital K rather than lower case k; all units from kilo to Yotta use a capital letter. Another is that there must be no space between the numerals and the suffix, e.g. 2G is legal but 2 G is not.

You can use Ki for kibi, Mi for mebi etc. if you use --from=iec-i.

    $ numfmt --from=iec-i 2Gi  
    2147483648   

To convert from ordinary numbers to numbers with units use the --to option.

    $ numfmt --to=iec 1000000 
    977K  

[1] I gave a presentation in France years ago. Much to my chagrin, the software I was demonstrating had a hard-coded assumption that the decimal separator was a period. I had tested the software on a French version of Windows, but had only entered integers and so I didn’t catch the problem.

To make matters worse, there was redundant input validation. Entering 3.14 rather than 3,14 would satisfy the code my team wrote, but the input was first validated by a Windows API which rejected 3.14 as invalid in the user’s locale.

Linear feedback shift registers

The previous post looked at an algorithm for generating De Bruijn sequences B(k, n) where k is a prime number. These are optimal sequences that contain every possible consecutive sequence of n symbols from an alphabet of size k. As noted near the end of the post, the case k = 2 is especially important in application, i.e. binary sequences.

If we set k = 2, the generating algorithm is an example of a linear feedback shift register (LFSR) sequence.

Here’s the algorithm from the previous post:

  1. If (x1, x2, …, xn) = (0, 0, …, 0), return c1.
  2. If (x1, x2, …, xn) = (1, 0, …, 0), return 0.
  3. Otherwise return c1x1 + c2x2 + … + cnxn mod k.

Note that the last line says to return a linear combination of the previous symbols. That is, we operate on the latest n symbols, saving them to registers. We take the feedback from the previous outputs and compute a linear combination. We then shift the register content over by one and add the new output on the end. Hence the name “linear feedback shift register” sequence.

Note that the LFSR algorithm is a linear operator over the field GF(2), except for the special cases in steps 1 and 2. The algorithm couldn’t be entirely linear because it would get stuck; it would produce nothing but zeros forevermore once it encountered an input sequence of all zeros. So technically an LFSR is a “nearly always linear feedback shift register.” It’s linear for 2^n – 2 inputs and nonlinear for 2 special inputs.

An LFSR is more general than a binary De Bruijn sequence. For some choices of linear coefficients the output is a De Bruijn sequence, but not for others. If you associate the linear coefficients with polynomial coefficients (with a sign change, as noted in the previous post) then the LFSR sequence is a De Bruijn sequence if and only if the polynomial is a primitive polynomial of degree n.
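As a small, concrete example (a sketch in Python, not code from the original post): take n = 4 and the primitive polynomial x^4 + x + 1 over GF(2). After the sign change this gives c1 = c2 = 1 and c3 = c4 = 0, and the algorithm above produces a De Bruijn sequence in B(2, 4).

    def step(x1, x2, x3, x4):
        # the rule above for k = 2 with c1 = c2 = 1, c3 = c4 = 0,
        # i.e. the primitive polynomial x^4 + x + 1 over GF(2)
        if (x1, x2, x3, x4) == (0, 0, 0, 0):
            return 1                # c1
        if (x1, x2, x3, x4) == (1, 0, 0, 0):
            return 0
        return (x1 + x2) % 2        # the linear feedback

    seq = [0, 0, 0, 0]
    while len(seq) < 16:            # B(2, 4) sequences have 2^4 = 16 symbols
        seq.append(step(*seq[-4:]))
    print("".join(map(str, seq)))   # 0000100110101111

Every 4-bit pattern appears in the output exactly once if you wrap the end around.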

A few months ago I wrote about LFSRs in the context of random number generation. LFSRs make very efficient random number generators, but they’re not appropriate for cryptography because their linear structure makes them vulnerable to attack. The idea of shrinking generators is to use one random number generator to sample another generator. The two generators can both be LFSRs, but the combined generator is nonlinear because the sampling mechanism is nonlinear. The net result is that you can combine two efficient but insecure generators to create a new generator that is efficient and secure.

Generating De Bruijn cycles with primitive polynomials

Last week I wrote about a way to use De Bruijn cycles. Today I’ll write about a way to generate De Bruijn cycles.

De Bruijn cycles

Start with an alphabet of k symbols. B(k, n) is the set of cycles that contain every sequence of n symbols from that alphabet and are as short as possible. Since there are k^n possible sequences of n symbols, and every one corresponds to some starting position in the De Bruijn cycle, an element of B(k, n) has to have at least k^n symbols. In fact, the elements of B(k, n) have exactly k^n symbols.

It’s not obvious that B(k, n) should be non-empty for all k and n, but it is. And there are algorithms that will take a k and an n and return an element of B(k, n).

The post from last week gave the example of a sequence in B(4, 3) that contains all triples of DNA base pairs:

AAACAAGAATACCACGACTAGCAGGAGTATCATGATTCCCGCCTCGGCGTCTGCTTGGGTGTTT

Generating De Bruijn cycles

When k is a prime number, i.e. we’re working over an alphabet with a prime number of symbols, it is particularly simple to generate De Bruijn sequences [1]. For example, let’s work over the alphabet {0, 1, 2}. Then the following code will produce the next symbol in a sequence in B(3, 4).

    def B34(a,b,c,d):
        if (a,b,c,d) == (0,0,0,0):
            return 1
        if (a,b,c,d) == (1,0,0,0):
            return 0
        return (a+b) % 3

We can initialize the sequence wherever we like since it produces a cycle, but if we start with 0000 we get the following:

000010011012110021020122101011112220112120002002202122001201021120202222111022121

You can verify that every sequence of four elements from {0, 1, 2} is in there somewhere, provided you wrap the end around. For example, 1000 can be found by starting in the last position.
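Here’s a sketch of that check in Python, using the B34 function above to regenerate the cycle and verifying that the 81 windows of length four, wrapping around the end, are all distinct:

    seq = [0, 0, 0, 0]
    while len(seq) < 81:            # 3^4 = 81 symbols in the full cycle
        seq.append(B34(*seq[-4:]))

    wrapped = seq + seq[:3]         # wrap the end around
    windows = {tuple(wrapped[i:i+4]) for i in range(81)}
    print(len(windows))             # 81, so every 4-tuple appears exactly once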

Where did the algorithm above come from? How would you create an analogous algorithm for other values of k and n?

The algorithm goes back to Willem Mantel in 1894, and can be found, for example, in Knuth’s TAOCP Volume 4. Here is Mantel’s algorithm to generate an element of B(k, n) where k is prime. The function takes the latest n symbols in the De Bruijn cycle and returns the next symbol.

  1. If (x1, x2, …, xn) = (0, 0, …, 0), return c1.
  2. If (x1, x2, …, xn) = (1, 0, …, 0), return 0.
  3. Otherwise return c1x1 + c2x2 + … + cnxn mod k.

In our example above, c1 = c2 = 1 and c3 = c4 = 0, but how do you find the c’s in general?

Primitive polynomials

To find the c’s, first find a primitive polynomial of degree n over the prime field with k elements. Then the c’s are the coefficients of the polynomial, with a sign change, if you write the polynomial in the following form.

x^n - cn x^(n-1) - … - c2 x - c1

In the example above, I started with the polynomial

x^4 + 2x + 2

We can read the coefficients off this polynomial: c1 = c2 = -2 and c3 = c4 = 0. Since -2 and 1 are the same working mod 3, I used c1 = c2 = 1 above.

Backing up a bit, what is a primitive polynomial and how do you find them?

A primitive polynomial of degree n with coefficients in GF(k), the finite field with k elements, has leading coefficient 1 and has a root α that generates the multiplicative group of GF(k^n). That is, every nonzero element of GF(k^n) can be written as a power of α.

In my example, I found a primitive polynomial in GF(3^4) by typing polynomials into Mathematica until I found one that worked.

    In[1]:= PrimitivePolynomialQ[x^4 + 2 x + 2, 3]

    Out[1]= True

Since coefficients can only be 0, 1, or 2 when you’re working mod 3, it only took a few guesses to find a primitive polynomial [2].
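If you don’t have Mathematica handy, you can check primitivity directly by computing the order of x modulo the polynomial: the polynomial is primitive exactly when that order is 3^4 - 1 = 80. Here’s a sketch in Python for x^4 + 2x + 2 over GF(3).

    def times_x(p):
        # multiply p = [c0, c1, c2, c3], meaning c0 + c1 x + c2 x^2 + c3 x^3,
        # by x and reduce mod x^4 + 2x + 2 over GF(3), using x^4 = x + 1
        c0, c1, c2, c3 = p
        return [c3 % 3, (c0 + c3) % 3, c1 % 3, c2 % 3]

    p = [0, 1, 0, 0]   # start with the element x
    order = 1
    while p != [1, 0, 0, 0] and order <= 80:
        p = times_x(p)
        order += 1
    print(order)       # 80 = 3^4 - 1, so x^4 + 2x + 2 is primitive

If the loop printed anything other than 80, the polynomial would not be primitive.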

Brute force guessing works fine when k and n are small, but clearly isn’t practical in general. There are algorithms for searching for primitive polynomials, but I’m not familiar with them.

The case where k = 2 and n may be large is particularly important in applications, and you can find where people have tabulated primitive binary polynomials, primitive polynomials in GF(2^n). It’s especially useful to find primitive polynomials with a lot of zero coefficients because, for example, this leads to less computation when producing De Bruijn cycles.

Finding sparse primitive binary polynomials is its own field of research. See, for example, Richard Brent’s project to find primitive binary trinomials, i.e. primitive binary polynomials with only three non-zero coefficients.

More on binary sequences in the next post on linear feedback shift registers.

***

[1] The same algorithm can be used when k is not prime but a prime power because you can equate a sequence of length n from an alphabet with k = p^m elements with a sequence of length mn from an alphabet with p elements. For example, 4 is not prime, but we could have generated a De Bruijn sequence for DNA basepair triples by looking for binary sequences of length 6 and using 00 = A, 01 = C, 10 = G, and 11 = T.

[2] The number of primitive polynomials in GF(p^n) is φ(p^n – 1)/n where φ is Euler’s totient function. So in our case there were 8 polynomials we could have found.

Quantum supremacy and post-quantum crypto

Google announced today that it has demonstrated “quantum supremacy,” i.e. that they have solved a problem on a quantum computer that could not be solved on a classical computer. Google says

Our machine performed the target computation in 200 seconds, and from measurements in our experiment we determined that it would take the world’s fastest supercomputer 10,000 years to produce a similar output.

IBM disputes this claim. They don’t dispute that Google has computed something with a quantum computer that would take a lot of conventional computing power, only that it “would take the world’s fastest supercomputer 10,000 years” to solve. IBM says it would take 2.5 days.

Scott Aaronson gets into technical details of the disagreement between Google and IBM. He explains that the supercomputer in question is Oak Ridge National Labs’ Summit machine. It covers the area of two basketball courts and has 250 petabytes of disk storage. By exploiting its enormous disk capacity, Summit could simulate Google’s quantum calculation on classical hardware in “only” two and a half days. In a nutshell, it seems Google didn’t consider that you could trade off a gargantuan amount of storage for processor power. But as Aaronson points out, if Google’s machine added just a couple more qubits, even Summit couldn’t keep up on this particular problem.

So does this mean that all the world’s encryption systems are going to fail by the end of the week?

Google selected a problem that is ideal for a quantum computer. And why wouldn’t they? This is the natural thing to do. But they’re not on the verge of rendering public key encryption useless.

Google’s Sycamore processor used 54 qubits. According to this paper, it would take 20,000,000 qubits to factor 2048-bit semiprimes such as used in RSA encryption. So while Google has achieved a major milestone in quantum computing, public key encryption isn’t dead yet.

If and when large-scale quantum computing does become practical, encryption algorithms that depend on the difficulty of factoring integers will be broken. So will algorithms that depend on discrete logarithms, either working over integers or over elliptic curves.

Post-quantum encryption methods, methods that will remain secure even in a world of quantum computing (as far as we know), have been developed but not widely deployed. There’s a push to develop post-quantum methods now so that they’ll be ready by the time they’re needed. Once a new method has been proposed, it takes a long time for researchers to have confidence in it. It also takes a long time to develop efficient implementations that don’t introduce vulnerabilities.

The NSA recommends using existing methods with longer keys for now, then moving to quantum-resistant methods, i.e. not putting any more effort into developing new algorithms that are not quantum-resistant.

Here are some posts I’ve written about post-quantum encryption methods.