Cosmic rays flipping bits

A cosmic ray striking computer memory at just the right time can flip a bit, turning a 0 into a 1 or vice versa. While I knew that cosmic ray bit flips were a theoretical possibility, I didn’t know until recently that there had been documented instances on the ground [1].

Radiolab did an episode on the case of a cosmic bit flip changing the vote tally in a Belgian election in 2003. The error was caught because one candidate got more votes than was logically possible. A recount showed that the person in question got 4096 more votes in the first count than the second count. The difference of exactly 2^12 votes was a clue that there had been a bit flip. All the other counts remained unchanged when they reran the tally.

It’s interesting that the cosmic ray-induced error was discovered presumably because the software quality was high. All software is subject to cosmic bit flipping, but most of it is so buggy that you couldn’t rule out other sources of error.

Cosmic bit flipping is becoming more common because processors have become smaller and more energy efficient: the less energy it takes for a program to set a bit intentionally, the less energy it takes for radiation to set a bit accidentally.

Related post: Six sigma events

[1] Spacecraft are especially susceptible to bit flipping from cosmic rays because they are out from under the radiation shield we enjoy on Earth’s surface.

Strong primes

There are a couple different definitions of a strong prime. In number theory, a strong prime is one that is closer to the next prime than to the previous prime. For example, 11 is a strong prime because it is closer to 13 than to 7.

In cryptography, strong primes are, roughly speaking, primes whose products are likely to be hard to factor. More specifically, though still not too specific, p is a strong prime if

  1. p is large
  2. p – 1 has a large prime factor q
  3. q – 1 has a large prime factor r
  4. p + 1 has a large prime factor s

The meaning of “large” is not precise, and varies over time. In (1), large means large enough that it is suitable for use in cryptography, such as in RSA encryption. This standard increases over time due to increases in computational power and improvements in factoring technology. The meaning of “large” in (2), (3), and (4) is not precise, but makes sense in relative terms. For example in (2), the smaller the ratio (p – 1)/q the better.

Relation between the definitions

The Wikipedia article on strong primes makes the following claim without any details:

A computationally large safe prime is likely to be a cryptographically strong prime.

I don’t know whether this has been proven, or even if it’s true, but I’d like to explore it empirically. (Update: see the section on safe primes below. I misread “safe” above as “strong.” Just as well: it led to an interesting investigation.)

We’ll need some way to quantify whether a prime is strong in the cryptographic sense. This has probably been done before, but for my purposes I’ll use the sum of the logarithms of q, r, and s. We should look at these relative to the size of p, but all the p's I generate will be roughly the same size.

Python code

I’ll generate 100-bit primes just so my script will run quickly. These primes are too small for use in practice, but hopefully the results here will be representative of larger primes.

    from sympy import nextprime, prevprime, factorint, randprime
    import numpy as np

    # largest prime factor
    def lpf(n):
        return max(factorint(n).keys())

    def log2(n):
        return np.log2(float(n))

    num_samples = 100
    data = np.zeros((num_samples, 5))

    bitsize = 100

    for i in range(num_samples):
        p = randprime(2**bitsize, 2**(bitsize+1))
        # 1 if p is strong in the number theory sense, 0 otherwise
        data[i,0] = 2*p > nextprime(p) + prevprime(p)
        q = lpf(p-1)
        r = lpf(q-1)
        s = lpf(p+1)
        data[i,1] = log2(q)
        data[i,2] = log2(r)
        data[i,3] = log2(s)
        data[i,4] = log2(q*r*s)

The columns of our matrix correspond to whether the prime is strong in the number theory sense, the number of bits in q, r, and s, and the total bits in the three numbers. (Technically the log base 2 rather than the number of bits.)

Results

There were 75 strong primes and 25 non-strong primes. Here were the averages:

    |-----+--------+------------|
    |     | strong | not strong |
    |-----+--------+------------|
    | q   |   63.6 |       58.8 |
    | r   |   41.2 |       37.0 |
    | s   |   66.3 |       64.3 |
    | sum |  171.0 |      160.1 |
    |-----+--------+------------|

The numbers are consistently higher for strong primes. However, the differences are small relative to the standard deviations of the values. Here are the standard deviations:

    |-----+--------+------------|
    |     | strong | not strong |
    |-----+--------+------------|
    | q   |   20.7 |       15.6 |
    | r   |   19.8 |       12.3 |
    | s   |   18.7 |       19.9 |
    | sum |   30.8 |       41.9 |
    |-----+--------+------------|

Safe primes

I realized after publishing this post that the Wikipedia quote didn’t say what I thought it did. It said that safe primes are likely to be cryptographically strong primes; I had misread “safe” as “strong.” But the investigation above remains valid: it gives weak evidence that primes that are strong in the number theoretic sense are also strong in the cryptographic sense.

Note that safe does not imply strong; it only implies the second criterion in the definition of strong. Also, strong does not imply safe.

To test empirically whether safe primes are likely to be cryptographically strong, I modified my code to generate safe primes and compute the strength as before, the sum of the logs base 2 of q, r, and s. We should expect the strength to be larger since the largest prime factor of p – 1 will always be as large as possible, namely (p – 1)/2. But there’s no obvious reason why r or s should be large.
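
Here is a sketch of how that modification might look. The safe prime generator below, which draws random primes q until 2q + 1 is also prime, is my guess at an implementation rather than the exact code behind the numbers that follow.

    from sympy import isprime, factorint, randprime
    import numpy as np

    # largest prime factor, as before
    def lpf(n):
        return max(factorint(n).keys())

    def log2(n):
        return np.log2(float(n))

    def random_safe_prime(bitsize):
        # draw primes q until p = 2q + 1 is also prime
        while True:
            q = randprime(2**(bitsize - 1), 2**bitsize)
            p = 2*q + 1
            if isprime(p):
                return p

    p = random_safe_prime(100)
    q = lpf(p - 1)    # equals (p - 1)/2 by construction
    r = lpf(q - 1)
    s = lpf(p + 1)
    print(log2(q) + log2(r) + log2(s))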

For 100-bit safe primes, I got an average strength of 225.4 with standard deviation 22.8, much larger than in my first experiment, and with less variance.


Unifiers and Diversifiers

I saw a couple tweets this morning quoting Freeman Dyson’s book Infinite in All Directions.

Unifiers are people whose driving passion is to find general principles which will explain everything. They are happy if they can leave the universe looking a little simpler than they found it.

Diversifiers are people whose passion is to explore details. They are in love with the heterogeneity of nature … They are happy if they leave the universe a little more complicated than they found it.

Presumably these categories correspond to what Freeman elsewhere calls birds and frogs, or what others call hedgehogs and foxes. I imagine everyone takes pleasure in both unification and diversification, though in different proportions. Some are closer to one end of the spectrum than the other.

The scientific heroes presented to children are nearly always unifiers like Newton or Einstein [1]. You don’t see as many books celebrating, for example, a biologist who discovered that what was thought to be one species is really 37 different species. This creates an unrealistic picture of science since not many people discover grand unifying principles, though more find unifying principles on a small scale. I imagine many are discouraged from a career in science because they believe they have to be a unifier / bird / hedgehog, when in fact there are more openings for a diversifier / frog / fox.

Dyson may be taking a subtle swipe at unifiers by saying they want to leave the world looking a little simpler than they found it. There may be an unspoken accusation that unifiers create the illusion of unity by papering over diversity. True and significant unifying theories like general relativity are hard to come by. It’s much easier to come up with unifying theories that are incomplete or trivial.


[1] Or at least scientists best known for their unifying work. Newton, for example, wasn’t entirely a unifier, but he’s best known for discovering unifying principles of gravity and motion.

Internet privacy as seen from 1975

Science fiction authors set stories in the future, but they don’t necessarily try to predict the future, and so it’s a little odd to talk about what they “got right.” Getting something right implies they were making a prediction rather than imagining a setting of a story.

However, sometimes SF authors do indeed try to predict the future. This seems to have been at least somewhat the case with John Brunner and his 1975 novel The Shockwave Rider because he cites futurist Alvin Toffler in his acknowledgement.

The Shockwave Rider derives in large part from Alvin Toffler’s stimulating study Future Shock, and in consequence I’m much obliged to him.

In light of Brunner’s hat tip to Toffler, I think it’s fair to talk about what he got right, or possibly what Toffler got right. Here’s a paragraph from the dust jacket that seemed prescient.

Webbed in a continental data-net that year by year draws tighter as more and still more information is fed to it, most people are apathetic, frightened, resigned to what ultimately will be a total abolishment of individual privacy. A whole new reason has been invented for paranoia: it is beyond doubt — whoever you are! — that someone, somewhere, knows something about you that you wanted to keep a secret … and you stand no chance of learning what it is.


Impossible to misunderstand

“The goal is not to be possible to understand, but impossible to misunderstand.”

I saw this quote at the beginning of a math book when I was a student and it stuck with me. I would think of it when grading exams. Students often assume it is enough to be possible to understand, possible for an infinitely patient and resourceful reader to reverse engineer the thought process behind a stream of consciousness.

The quote is an aphorism, and so not intended to be taken literally, but I’d like to take the last part literally for a moment. I think the quote would be better advice if it said “unlikely to misunderstand.” This ruins the parallelism and the aesthetics of the quote, but it gets to an important point: trying to be impossible to misunderstand leads to bad writing. It’s appropriate when writing for computers, but not when writing for people.

Trying to please too wide and too critical an audience leads to defensive, colorless writing.

You’ll never use an allusion for fear that someone won’t catch it.

You’ll never use hyperbole for fear that some hyper-literalist will object.

You’ll never leave a qualification implicit for fear that someone will pounce on it.

Social media discourages humor, at least subtle humor. If you say something subtle, you may bring a smile to 10% of your audience, and annoy 0.1%. The former are much less likely to send feedback. And if you have a large enough audience, the feedback of the annoyed 0.1% becomes voluminous.

Much has been said about social media driving us to become partisan and vicious, and certainly that happens. But not enough has been said about an opposite effect that also happens, driving us to become timid and humorless.

Comparing Truncation to Differential Privacy

Traditional methods of data de-identification obscure data values. For example, you might truncate a date to just the year.

Differential privacy obscures query values by injecting enough noise to keep from revealing information on an individual.

Let’s compare two approaches for de-identifying a person’s age: truncation and differential privacy.

Truncation

First consider truncating birth date to year. For example, anyone born between January 1, 1955 and December 31, 1955 would be recorded as being born in 1955. This effectively produces a 100% confidence interval that is one year wide.

Next we’ll compare this to a 95% confidence interval using ε-differential privacy.

Differential privacy

Differential privacy adds noise in proportion to the sensitivity Δ of a query. Here sensitivity means the maximum impact that one record could have on the result. For example, a query that counts records has sensitivity 1.

Suppose people live to a maximum of 120 years. Then in a database with n records [1], one person’s presence in or absence from the database would change the average age by no more than 120/n years, the worst case corresponding to the extremely unlikely event of a database of n-1 newborns and one person 120 years old.

Laplace mechanism and CIs

The Laplace mechanism implements ε-differential privacy by adding noise with a Laplace(Δ/ε) distribution, which in our example means Laplace(120/nε).

A 95% confidence interval for a Laplace distribution with scale b centered at 0 is

[b log 0.05, –b log 0.05]

which is very nearly

[-3b, 3b].

In our case b = 120/nε, and so a 95% confidence interval for the noise we add would be [-360/nε, 360/nε].

When n = 1000 and ε = 1, this means we’re adding noise that’s usually between -0.36 and 0.36, i.e. we know the average age to within about 4 months. But if n = 1, our confidence interval is the true age ± 360. Since this is wider than the a priori bounds of [0, 120], we’d truncate our answer to be between 0 and 120. So we could query for the age of an individual, but we’d learn nothing.
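
Here is a minimal simulation of this setup, a sketch rather than a production implementation; the function name and the simulated ages are my own.

    import numpy as np

    def private_mean_age(ages, epsilon, max_age=120):
        n = len(ages)
        sensitivity = max_age / n     # one record changes the average by at most 120/n
        b = sensitivity / epsilon     # Laplace scale parameter
        return np.mean(ages) + np.random.laplace(0, b)

    np.random.seed(20191216)          # for reproducibility of the sketch only
    ages = np.random.randint(0, 121, size=1000)

    # With n = 1000 and epsilon = 1, about 95% of the added noise
    # falls within +/- 3b = +/- 0.36 years.
    print(np.mean(ages), private_mean_age(ages, epsilon=1))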

Comparison with truncation

The width of our confidence interval for an individual’s age (n = 1) is 720/ε, and so to get a confidence interval one year wide, as we get with truncation, we would have to set ε = 720. Ordinarily ε is much smaller than 720 in applications, say between 1 and 10, which means differential privacy reveals far less information than truncation does.

Even if you truncate age to decade rather than year, this still reveals more information than differential privacy provided ε < 72.


[1] Ordinarily even the number of records in the database is kept private, but we’ll assume here that for some reason we know the number of rows a priori.

Golden ratio primes

The golden ratio is the larger root of the equation

φ² – φ – 1 = 0.

By analogy, golden ratio primes are prime numbers of the form

p = φ² – φ – 1

where φ is an integer. To put it another way, instead of solving the equation

φ² – φ – 1 = 0

over the real numbers, we’re looking for prime numbers p where the equation can be solved in the integers mod p. [1]

Application

When φ is a large power of 2, these prime numbers are useful in cryptography because their special form makes modular multiplication more efficient. (See the previous post on Ed448.) We could look for such primes with the following Python code.

    from sympy import isprime

    for n in range(1000):
        phi = 2**n
        q = phi**2 - phi - 1
        if isprime(q):
            print(n)

This prints 19 results, including n = 224, corresponding to the golden ratio prime in the previous post. This is the only output where n is a multiple of 32, which was useful in the design of Ed448.

Golden ratio primes in general

Of course you could look for golden ratio primes where φ is not a power of 2. It’s just that powers of 2 are the application where I first encountered them.

A prime number p is a golden ratio prime if there exists an integer φ such that

p = φ² – φ – 1

which, by the quadratic formula, is equivalent to requiring that m = 4p + 5 be a square. In that case

φ = (1 + √m)/2.

Here’s some code for seeing which primes less than 1000 are golden ratio primes.

    from sympy import primerange

    def issquare(m):
        return int(m**0.5)**2 == m

    for p in primerange(2, 1000):
        m = 4*p + 5
        if issquare(m):
            phi = (int(m**0.5) + 1) // 2
            assert(p == phi**2 - phi - 1)
            print(p)

By the way, there are faster ways to determine whether an integer is a square. See this post for algorithms.

(Update: Aaron Meurer pointed out in the comments that SymPy has an efficient function sympy.ntheory.primetest.is_square for testing whether a number is a square.)

Instead of looping over primes and testing whether it’s possible to solve for φ, we could loop over φ and test whether φ leads to a prime number.

    for phi in range(1000):
        p = phi**2 - phi - 1
        if isprime(p):     
            print(phi, p)

Examples

The smallest golden ratio prime is p = 5, with φ = 3.

Here’s a cute one: the pi prime 314159 is a golden ratio prime, with φ = 561.

The golden ratio prime that started this rabbit trail was the one with φ = 2^224, which Mike Hamburg calls the Goldilocks prime in his design of Ed448.


[1] If p = φ² – φ – 1 for some integer φ, then φ² – φ – 1 = 0 (mod p). But the congruence can have a solution when p is not a golden ratio prime. The following code shows that the smallest example is p = 31 and φ = 13.

    from sympy import primerange
    from sympy.ntheory.primetest import is_square

    # Find the smallest prime p that is not a golden ratio prime
    # but for which x**2 - x - 1 = 0 (mod p) has a solution.
    for p in primerange(2, 100):
        m = 4*p + 5
        if not is_square(m):
            for x in range(p):
                if (x**2 - x - 1) % p == 0:
                    print(p, x)
                    exit()

Goldilocks and the three multiplications

Illustration by Arthur Rackham, 1918. Public domain.

Mike Hamburg designed an elliptic curve for use in cryptography he calls Ed448-Goldilocks. The prefix Ed refers to the fact that it’s an Edwards curve. The number 448 refers to the fact that the curve is over a prime field where the prime p has size 448 bits. But why Goldilocks?

Golden primes and Goldilocks

The prime in this case is

p = 2^448 – 2^224 – 1,

which has the same form as the NIST primes. Hamburg says in his paper

I call this the “Goldilocks” prime because its form defines the golden ratio φ = 2^224.

That sentence puzzled me. What does this have to do with the golden ratio? The connection is that Hamburg’s prime is of the form

φ² – φ – 1.

The roots of this polynomial are the golden ratio and its conjugate. But instead of looking for real numbers where the polynomial is zero, we’re looking for integers where the polynomial takes on a prime value. (See the followup post on golden ratio primes.)

The particular prime that Hamburg uses is the “Goldilocks” prime by analogy with the fairy tale: the middle term 2^224 is just the right size. He explains

Because 224 = 32*7 = 28*8 = 56*4, this prime supports fast arithmetic in radix 2^28 or 2^32 (on 32-bit machines) or 2^56 (on 64-bit machines). With 16, 28-bit limbs it works well on vector units such as NEON. Furthermore, radix-2^64 implementations are possible with greater efficiency than most of the NIST primes.

Karatsuba multiplication

The title of this post is “Goldilocks and the three multiplications.” Where do the three multiplications come in? It’s an allusion to an algorithm for multi-precision multiplication that lets you get by with three multiplications where the most direct approach would require four. The algorithm is called Karatsuba multiplication [1].

Hamburg says “The main advantage of a golden-ratio prime is fast Karatsuba multiplication” and that if we set φ = 2^224 then

\begin{align*} (a + b\phi)(c + d\phi) &= ac + (ad+bc)\phi + bd\phi^2 \\ &\equiv (ac+bd) + (ad+bc+bd)\phi \pmod{p} \\ &= (ac + bd) +((a+b)(c+d) - ac)\phi \end{align*}

Note that the first line on the right side involves four multiplications, but the bottom line involves three. Since the variables represent 224-bit numbers, removing a multiplication at the expense of an extra addition and subtraction is a net savings [2].

The most important line of the calculation above, and the only one that isn’t routine, is the second. That’s where the special form of p comes in. When you eliminate common terms from both sides, the calculation boils down to showing that

bd(\phi^2 - \phi - 1) \equiv 0 \pmod{p}

which is obviously true since p = φ² – φ – 1.
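
Here is a quick numerical check of the identity, a sketch that uses Python’s arbitrary-precision integers rather than the limb-based arithmetic a real implementation would use.

    from random import randrange

    phi = 2**224
    p = phi**2 - phi - 1                 # the Goldilocks prime

    a, b, c, d = (randrange(phi) for _ in range(4))

    # four multiplications
    direct = ((a + b*phi) * (c + d*phi)) % p

    # three multiplications: a*c, b*d, and (a+b)*(c+d)
    ac, bd = a*c, b*d
    cross = (a + b)*(c + d)              # equals ac + ad + bc + bd
    karatsuba = (ac + bd + (cross - ac)*phi) % p

    assert direct == karatsuba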

Curve Ed448-Goldilocks

Edwards curves only have one free parameter d (besides the choice of field) since they have the form

x² + y² = 1 + d x² y².

Hamburg chose d = -39081 for reasons explained in the paper.

Most elliptic curves used in ECC currently work over 256-bit prime fields, providing 128 bits of security. The motivation for developing Ed448 was much the same as for developing P-384. Both work over larger fields and so provide more bits of security, 224 and 192 bits respectively.

Unlike P-384, Ed448 is a safe curve, meaning that it lends itself to a secure practical implementation.


[1] Here we’ve just applied the Karatsuba algorithm one time. You could apply it recursively to multiply two n-bit numbers in O(n^k) time, where k = log₂ 3 ≈ 1.585. This algorithm, discovered in 1960, was the first multiplication algorithm faster than O(n²).

[2] Addition and subtraction are O(n) operations. And what about multiplication? That’s not an easy question. It’s no worse than O(n^1.585) by virtue of the Karatsuba algorithm. In fact, it’s O(n log n), but only for impractically large numbers. See the discussion here. But in any case, multiplication is slower than addition for multiprecision numbers.

Tricks for arithmetic modulo NIST primes

The US National Institute of Standards and Technology (NIST) originally recommended 15 elliptic curves for use in elliptic curve cryptography [1]. Ten of these are over a field of size 2n. The other five are over prime fields. The sizes of these fields are known as the NIST primes.

The NIST curves over prime fields are named after the number of bits in the prime: the name is “P-” followed by the number of bits. The primes themselves are named p with a subscript for the number of bits.

The five NIST primes are

p192 = 2^192 – 2^64 – 1
p224 = 2^224 – 2^96 + 1
p256 = 2^256 – 2^224 + 2^192 + 2^96 – 1
p384 = 2^384 – 2^128 – 2^96 + 2^32 – 1
p521 = 2^521 – 1

The largest of these, p521, is a Mersenne prime, and the rest are generalized Mersenne primes.

Except for p521, the exponents of 2 in the definitions of the NIST primes are all multiples of 32 or 64. This leads to efficient tricks for arithmetic modulo these primes carried out with 32-bit or 64-bit integers. You can find pseudocode implementations for these tricks in Mathematical routines for the NIST prime elliptic curves.
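
For a flavor of these tricks, here is a toy sketch (my own code, not the NIST pseudocode) of reduction modulo the Mersenne prime p521, using the fact that 2^521 ≡ 1 (mod p521).

    from random import randrange

    P521 = 2**521 - 1

    def reduce_p521(n):
        # fold the bits above position 521 back down, since 2^521 = 1 mod p521
        while n >> 521:
            n = (n & P521) + (n >> 521)
        return 0 if n == P521 else n

    a, b = randrange(P521), randrange(P521)
    assert reduce_p521(a * b) == (a * b) % P521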

The elliptic curve Ed448 “Goldilocks” was not part of the original set of recommended curves from NIST but has been added. It employs a multiplication trick in the same spirit as the routines referenced above, but simpler. Ed448 uses

p = 2^448 – 2^224 – 1

which has the special form φ² – φ – 1 where φ = 2^224. This enables a trick known as Karatsuba multiplication. More on that here.


[1] FIPS PUB 186-4. This publication is dated 2013, but the curve definitions are older. I haven’t found for certain when the curves were defined. I’ve seen one source that says 1997 and another that says 1999.

Elliptic curve P-384

The various elliptic curves used in elliptic curve cryptography (ECC) have different properties, and we’ve looked at several of them before. For example, Curve25519 is implemented very efficiently, and the parameters were transparently chosen. Curve1174 is interesting because it’s an Edwards curve and has a special addition formula.

This post looks at curve P-384. What’s special about this curve? It’s the elliptic curve that the NSA recommends everyone use until post-quantum methods have been standardized. It provides 192 bits of security, whereas more commonly used curves provide 128 bits.

Does the NSA recommend this method because they know how to get around it? Possibly, but they also need to recommend methods that they believe foreign governments cannot break.

The equation of the P-384 curve is

y² = x³ + ax + b

working over the field of integers modulo a prime p. We will go into each of the specific parameters a, b, and p, and discuss how they were chosen.

Modulus p

Consistent with the naming conventions for elliptic curves used in cryptography, the name “P-384” tells you that the curve is over a prime field where the prime is a 384-bit integer. Specifically, the order of the field is

p = 2^384 – 2^128 – 2^96 + 2^32 – 1

For a given number of bits, in this case 384, you want to pick a prime that’s relatively near the maximum size for that number of bits. In our case, p is a prime near 2^384 with a convenient bit pattern. (The special pattern allows implementation tricks that increase efficiency.)

Hasse’s theorem says that the number of points on a curve modulo a large prime is of the same order of magnitude as the prime, so P-384 contains approximately 2^384 points. In fact, the number of points n on the curve is

39402006196394479212279040100143613805079739270465446667946905279627659399113263569398956308152294913554433653942643

or approximately 2^384 – 2^190. The number n is a prime, and so it is the order of P-384 as a group.
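
As a sanity check, a few lines of Python (a sketch using sympy’s primality test) confirm that n is prime and satisfies the Hasse bound |n – (p + 1)| ≤ 2√p.

    from sympy import isprime

    p = 2**384 - 2**128 - 2**96 + 2**32 - 1
    n = 39402006196394479212279040100143613805079739270465446667946905279627659399113263569398956308152294913554433653942643

    assert isprime(p) and isprime(n)

    t = n - (p + 1)
    assert t*t <= 4*p                  # Hasse's theorem: |t| <= 2*sqrt(p)
    print((p - n).bit_length())        # roughly 190, consistent with n being about 2^384 - 2^190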

Linear coefficient a

According to a footnote in the standard defining P-384, FIPS PUB 186-4,

The selection a ≡ -3 for the coefficient of x was made for reasons of efficiency; see IEEE Std 1363-2000.

All the NIST elliptic curves over prime fields use a = -3 because this makes it possible to use special algorithms for elliptic curve arithmetic.

Constant coefficient b

The curve P-384 has Weierstrass form

y² = x³ – 3x + b

where b is

27580193559959705877849011840389048093056905856361568521428707301988689241309860865136260764883745107765439761230575.

The parameter b is between 2^383 and 2^384 but doesn’t have any particular binary pattern:

101100110011000100101111101001111110001000111110111001111110010010011000100011100000010101101011111000111111100000101101000110010001100000011101100111000110111011111110100000010100000100010010000000110001010000001000100011110101000000010011100001110101101011000110010101100011100110001101100010100010111011010001100111010010101010000101110010001110110111010011111011000010101011101111

The specification says that b was chosen at random. How can you convince someone that you chose a parameter at random?

The standard gives a 160-bit seed s, and a hash-based algorithm that s was run through to create a 384-bit parameter c. Then b is the solution to

b²c ≡ -27 (mod p).

The algorithm going from s to c is given in Appendix D.6 and is a sort of key-stretching algorithm. The standard cites ANS X9.62 and IEEE Standard 1363-2000 as the source of the algorithm.

If b was designed to have a back door, presumably a tremendous amount of computation had to go into reverse engineering the seed s.

Koblitz and Menezes wrote a paper in which they suggest a way that the NSA might have picked seeds that lead to weak elliptic curves, but then argue against it.

It is far-fetched to speculate that the NSA would have deliberately selected weak elliptic curves in 1997 for U.S. government usage … confident that no one else would be able to discover the weakness in these curves in the ensuing decades. Such a risky move by the NSA would have been inconsistent with the Agency’s mission.
