Perfectly nonlinear functions

The other day I heard someone suggest that a good cocktail party definition of cryptography is puzzle making. Cryptographers create puzzles that are easy to solve given the key, but ideally impossible without the key.

Linearity is very useful in solving puzzles, and so a puzzle maker would like to create functions that are as far from linear as possible. That is why cryptographers are interested in perfectly nonlinear functions (PNs) and almost perfect nonlinear functions (APNs), which we will explore here.

The functions we are most interested in map a set of bits to a set of bits. To formalize things, we will look at maps between two finite Abelian groups A and B. If we look at lists of bits with addition defined as component-wise XOR [1], we have an Abelian group, but the ideas below apply to other kinds of Abelian groups as well.

We will measure the degree of nonlinearity of a function F from A to B by the size of the sets

\delta(a, b) = \{x \mid F(a + x) - F(x) = b\}

where a and x are in A and b is in B [2].

If F is a linear function, F(a + x) = F(a) + F(x), and so δ(a, b) simplifies to

\delta(a, b) = \{x \mid F(a) = b\}

which equals all of the domain A if F(a) = b, and is empty otherwise. So for linear functions, the set δ(a, b) is potentially as large as possible, i.e. all of A. Perfectly nonlinear functions will be functions where the sets δ(a, b) are as small as possible, and almost perfectly nonlinear functions will be functions where δ(a, b) is as small as possible in a given context.

The uniformity of a function F is defined by

\max_{\substack{a \in A, \, a \ne 0 \\ b \in B}} \, |\delta(a,b)|

The restriction a ≠ 0 is necessary since δ(0, 0) is all of A for every function F, and so otherwise the uniformity of every function would be the same.

For linear functions, the uniformity is as large as possible, i.e. |A|. As we will show below, a nonlinear function can have maximum uniformity as well, but some nonlinear functions have smaller uniformity. So some nonlinear functions are more nonlinear than others.

We will compute the uniformity of a couple of functions, mapping 4 bits to 1 bit, with the following Python code.

    A = range(16)           # the domain: 4-bit values 0 through 15
    B = range(2)            # the range: 1-bit values 0 and 1
    nonZeroA = range(1, 16)

    def delta(f, a, b):
        # the set of x with f(x + a) - f(x) = b, where + and - are both XOR
        return {x for x in A if f(x^a) ^ f(x) == b}

    def uniformity(f):
        # the size of the largest delta set over all nonzero a and all b
        return max(len(delta(f, a, b)) for a in nonZeroA for b in B)

We could look at binary [3] functions with domains and ranges of different sizes by changing the definitions of A, B, and nonZeroA.

Note that in our groups, addition is component-wise XOR, denoted by ^ in the code above, and that subtraction is the same as addition since XOR is its own inverse. Of course in general addition would be defined differently and subtraction would not be the same as addition. When working over a binary field, things are simpler, sometimes confusingly simpler.

We want to compute the uniformity of two functions. Given bits x0, x1, x2, and x3, we define

F(x0, x1, x2, x3) = x0 x1

and

G(x0, x1, x2, x3) = x0 x1 + x2 x3.

We can implement these in Python as follows.

    def F(x):
        # interpret x as four bits x0 x1 x2 x3 and return x0 AND x1
        x = [c == '1' for c in format(x, "04b")]
        return x[0] and x[1]

    def G(x):
        # return (x0 AND x1) XOR (x2 AND x3)
        x = [c == '1' for c in format(x, "04b")]
        return (x[0] and x[1]) ^ (x[2] and x[3])

Next we compute the uniformity of the two functions.

    print( uniformity(F) )
    print( uniformity(G) )

This shows that the uniformity of F is 16 and the uniformity of G is 8. Remember that functions with smaller uniformity are considered more nonlinear. Both functions F and G are nonlinear, but G is more nonlinear in the sense defined here.

To see that F is nonlinear, let x = (1, 0, 0, 0) and y = (0, 1, 0, 0).

1 = F(x + y) ≠ F(x) + F(y) = 0.

It turns out that 8 is the smallest possible uniformity for a function from 4 bits to 1 bit: for any nonzero a, the sets δ(a, 0) and δ(a, 1) partition the domain A, so the larger of the two has at least 8 elements. The function G achieves this minimum and is said to be perfectly nonlinear.
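As a sanity check, here is a brute-force search, not from the original post, over all 2^16 functions from 4 bits to 1 bit, reusing the uniformity function defined above. It confirms that no function does better than 8. (It takes a minute or so to run.)

    def min_uniformity():
        # encode each function from 4 bits to 1 bit as a 16-bit mask,
        # where bit x of the mask is the value of the function at x
        best = len(A)
        for mask in range(2**16):
            f = lambda x: (mask >> x) & 1
            best = min(best, uniformity(f))
        return best

    print( min_uniformity() )  # prints 8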

I haven’t defined perfect nonlinearity or almost perfect nonlinearity precisely, only saying that they have something to do with maximum nonlinearity (minimum uniformity) in some context. I may say more about that context in a future post.

Related posts

[1] XOR stands for exclusive or. For Boolean variables p and q, p XOR q is true if either p is true or q is true, but not both. The component-wise XOR of two binary numbers sets each bit of the result to the XOR of the corresponding input bits. For example, 1001 XOR 1010 = 0011.

[2] This definition comes from Perfect nonlinear functions and cryptography by Céline Blondeau and Kaisa Nyberg.

[3] The next post replaces binary numbers with base 5 and other bases.

Overview of NIST post-quantum encryption finalists

If and when large-scale quantum computing becomes practical, most public key encryption algorithms currently in use would be breakable. Cryptographers have known this since Peter Shor published his quantum factoring algorithm in 1994.

In 2017 researchers submitted 69 algorithms to the NIST Post-Quantum Cryptography Standardization Process. In 2019 NIST chose 26 of these algorithms to advance to the second round of competition. Yesterday NIST announced the finalists for the third round of its post-quantum cryptography competition.

The four finalists for public key encryption and key encapsulation mechanisms (KEMs) are

  • Classic McEliece
  • CRYSTALS-KYBER
  • NTRU
  • SABER

The three finalists for digital signatures are

  • CRYSTALS-DILITHIUM
  • FALCON
  • Rainbow

There were five alternates for public key encryption/KEM

  • BIKE
  • FrodoKEM
  • HQC
  • NTRU Prime
  • SIKE

and three alternates for digital signatures

  • GeMSS
  • Picnic
  • SPHINCS+

Classic McEliece is a code-based encryption method. This terminology makes no sense if you think of codes and encryption as synonymous. Here “code” is being used in the sense of error-correcting codes.

Rainbow is an “unbalanced oil and vinegar” (UOV) algorithm. I wrote an introduction to UOV here.

The rest of the finalist algorithms (CRYSTALS-KYBER, NTRU, SABER, CRYSTALS-DILITHIUM, and FALCON) are all variations on the theme of learning with errors: RLWE (ring learning with errors), MLWE (module learning with errors), MLWR (module learning with rounding), etc. These are all based on the assumption that the analog of linear regression over a discrete ring or module is a hard problem, and would remain hard for quantum computers.
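To make the “noisy linear regression” intuition concrete, here is a toy numerical sketch of the learning with errors idea. It is not any finalist’s actual scheme, and the parameters are made up and far too small to be secure.

    import numpy as np

    rng = np.random.default_rng(0)
    q, n, m = 97, 8, 16             # toy modulus and dimensions
    s = rng.integers(0, q, n)       # secret vector
    A = rng.integers(0, q, (m, n))  # public random matrix
    e = rng.integers(-2, 3, m)      # small random errors
    b = (A @ s + e) % q             # published LWE samples

    # Without e, recovering s from (A, b) is just linear algebra mod q.
    # With the errors added, recovering s is conjectured to be hard,
    # even for quantum computers.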

Among the alternates, SIKE is the only one involving elliptic curves. Shor’s algorithm makes it practical to solve the discrete logarithm problem for elliptic curves, and quantum computers could break traditional elliptic curve cryptography. But SIKE uses isogeny-based encryption. It doesn’t depend on the inner workings of individual elliptic curves but rather the isogenies between different elliptic curves.

The NIST report says that SIKE didn’t make the list of finalists because it was an order of magnitude slower than its competitors. But on the plus side, SIKE uses smaller keys and produces smaller ciphertexts than the other methods. If researchers find ways to speed up SIKE significantly, and if other researchers don’t find weaknesses in the method, it could be widely adopted in the future.

***

More posts on encryption.

How probable is a probable prime?

A probable prime is a number that passes a test that all primes pass and that most composite numbers fail. Specifically, a Fermat probable prime is a number that passes Fermat’s primality test. Fermat’s test is the most commonly used, so that’s nearly always what anyone means by probable prime unless they’re more specific.

A number n is a Fermat probable prime for base b if

b^{n-1} \equiv 1 \pmod{n}

This test isn’t conclusive, but it can be implemented efficiently and it weeds out most composite numbers. To read more on probable primes, see this post.
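Here is a minimal sketch of the Fermat test in Python. The function name and the example values are mine, not from the paper; 341 = 11 × 31 is the classic composite number that passes the test for base 2 but fails for base 3.

    from random import randrange

    def fermat_probable_prime(n, b=None):
        # test whether b^(n-1) = 1 (mod n), picking a random base if none is given
        if b is None:
            b = randrange(2, n - 1)
        return pow(b, n - 1, n) == 1

    print( fermat_probable_prime(341, 2) )  # True, yet 341 = 11 * 31
    print( fermat_probable_prime(341, 3) )  # False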

If a number is a probable prime, how probable is it that it really is prime? This post will briefly summarize some results from a paper [1] that makes this precise. From that paper:

… let P(x) denote the probability that an integer n is composite given that

    1. n is chosen at random with 1 < n ≤ x, n odd,
    2. b is chosen at random with 1 < b < n − 1, and
    3. n is a probable prime to the base b.

The authors give upper bounds on P(x) for x equal to various powers of 2. In particular they report

P(2^250) ≤ 5.876 × 10^−6

and

P(2^260) ≤ 3.412 × 10^−6

and so the chances that a 256-bit probable prime is composite are in the neighborhood of 4 in a million. In practice, one would test with multiple b’s. The tests for various b’s aren’t entirely independent, but running the tests for multiple bases does mean that fewer composite numbers slip through. There are a few pesky numbers, the Carmichael numbers, that are Fermat probable primes for nearly all bases (see footnote [2] here for more details), but these are rare.

I looked through the paper for results for larger powers of 2 to get results that would be applicable to, for example, RSA keys. The largest result explicit in the paper is

P(2^330) ≤ 8.713 × 10^−8

though I would like to know P(2^1024) and P(2^1536) since RSA keys are the products of (probable) primes this large. Presumably the results in [1] could be used to compute P(x) for larger values of x, but I haven’t read the paper closely enough to know how much personal effort or computational effort that would require. I imagine it would be difficult, or else the authors would have reported results for probable primes of the size frequently used in applications.

Related posts

[1] Jared Ducker Lichtman and Carl Pomerance. Improved error bounds for the Fermat primality test on random inputs. Mathematics of Computation. Volume 87, Number 314, November 2018, pages 2871–2890.

Making an invertible function out of non-invertible parts

How can you make an invertible function out of non-invertible parts? Why would you want to?

Encryption functions must be invertible. If the intended recipient can’t decrypt the message then the encryption method is useless.

Of course you want an encryption function to be really hard to invert without the key. It’s hard to think all at once of a function that’s really hard to invert. It’s easier to think of small components that are kinda hard to invert. Ideally you can iterate steps that are kinda hard to invert and create a composite that’s really hard to invert.

So how do we come up with components that are kinda hard to invert? One way is to make small components that are non-linear, and that are in fact impossible to invert [1]. But how can you use functions that are impossible to invert to create functions that are possible to invert? It doesn’t seem like this could be done, but it can. Feistel networks, named after cryptographer Horst Feistel, provide a framework for doing just that.

Many block encryption schemes are based on a Feistel network or a modified Feistel network: DES, Lucifer, GOST, Twofish, etc.

The basic idea of Feistel networks is so simple that it may go by too fast the first time you see it.

You take a block of an even number of bits and split it into two sub-blocks, the left half L and the right half R. The nth round of a Feistel cipher creates new left and right blocks from the left and right blocks of the previous round by

\begin{align*} L_n &= R_{n-1} \\ R_n &= L_{n-1} \oplus f(R_{n-1}, K_n) \end{align*}

Here ⊕ is bitwise XOR (exclusive or) and f(R_{n-1}, K_n) is any function of the previous right sub-block and the key for the nth round. The function f need not be invertible. It could be a hash function. It could even be a constant, crushing all input down to a single value. It is one of the non-invertible parts that the system is made of.

Why is this invertible? Suppose you have L_n and R_n. How could you recover L_{n-1} and R_{n-1}?

Recovering R_{n-1} is trivial: it’s just L_n. How do you recover L_{n-1}? You know R_{n-1} and the key K_n and so you can compute

R_n \oplus f(R_{n-1}, K_n) = L_{n-1} \oplus f(R_{n-1}, K_n)\oplus f(R_{n-1}, K_n) = L_{n-1}

The main idea is that XOR is its own inverse. No matter what f(R_{n-1}, K_n) is, if you XOR it with anything twice, you get that thing back.

At each round, only one sub-block from the previous round is encrypted. But since the roles of left and right alternate each time, the block that was left alone at one round will be encrypted the next round.

When you apply several rounds of a Feistel network, the output of the last round is the encrypted block. To decrypt the block, the receiver reverses each of the rounds in the reverse order.
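Here is a toy sketch of the idea in Python. It operates on a pair of bytes rather than a realistic block size and is not any standardized cipher; the round function f hashes its input down to a single byte and is certainly not invertible, yet decryption recovers the plaintext exactly.

    import hashlib

    def f(right, key):
        # a non-invertible round function: hash the right half and the round key,
        # then crush the result down to a single byte
        return hashlib.sha256(bytes([right, key])).digest()[0]

    def encrypt(left, right, keys):
        for k in keys:
            left, right = right, left ^ f(right, k)
        return left, right

    def decrypt(left, right, keys):
        # undo the rounds in reverse order, using the fact that XOR is its own inverse
        for k in reversed(keys):
            left, right = right ^ f(left, k), left
        return left, right

    round_keys = [13, 7, 42, 99]           # made-up round keys
    block = encrypt(0xAB, 0xCD, round_keys)
    print( decrypt(*block, round_keys) )   # (171, 205), i.e. (0xAB, 0xCD)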

A sketch of DES

The DES (Data Encryption Standard) algorithm may be the best-known application of Feistel networks. It operates on 64-bit blocks of data and carries out 16 rounds. It takes a 56-bit key [2] and derives from it a different 48-bit key for each of the 16 rounds. In the context of DES, the function f described above takes 32 bits of data and a 48-bit key and returns 32 bits. This function has four steps.

  1. Expand the 32 bits of input to 48 bits by duplicating some of the bits.
  2. XOR with the key for that round.
  3. Divide the 48 bits into eight groups of 6 bits and apply an S box to each group.
  4. Permute the result.

The S boxes are nonlinear functions that map 6 bits to 4 bits. The criteria for designing the S boxes were classified when DES became a standard, and there was speculation that the NSA had tweaked the boxes to make them less secure. In fact, the NSA tweaked the boxes to make them more secure. The S boxes were modified to make them more resistant to differential cryptanalysis, a technique that was not publicly known at the time.

More cryptography posts

[1] These functions are impossible to invert in the sense that two inputs may correspond to the same output; there’s no unique inverse. But they’re also computationally difficult to invert relative to their size: for a given output, it’s time consuming to find any or all corresponding inputs.

[2] When DES was designed in the 1970’s researchers objected that 56-bit keys were too small. That’s certainly the case now, and so DES is no longer secure. DES lives on as a component of Triple DES, which uses three 56-bit keys to produce 112-bit security. (Triple DES does not give 168 bits of security because it is vulnerable to a kind of meet-in-the-middle attack.)

ChaCha RNG with fewer rounds

ChaCha is a CSPRNG, a cryptographically secure pseudorandom number generator. When used in cryptography, ChaCha typically carries out 20 rounds of its internal scrambling process. Google’s Adiantum encryption system uses ChaCha with 12 rounds.

The runtime for ChaCha is proportional to the number of rounds, so you don’t want to do more rounds than necessary if you’re concerned with speed. Adiantum was developed for mobile devices and so Google wanted to reduce the number of rounds while maintaining a margin of cryptographic safety.

Cryptographer Jean-Philippe Aumasson suggests 8 rounds of ChaCha is plenty. He reports that there is a known attack on ChaCha with 6 rounds that requires on the order of 2^116 operations, but that ChaCha with 5 rounds is definitely breakable, requiring on the order of only 2^16 operations.

Three rounds of ChaCha are not enough [1], but four rounds are enough to satisfy DIEHARDER, PractRand, and TestU01 [2]. This illustrates the gap between adequate statistical quality and adequate cryptographic quality: Four rounds of ChaCha are apparently enough to produce the former but five rounds are not enough to produce the latter.

There doesn’t seem to be any reason to use ChaCha with four rounds. If you don’t need cryptographic security, then there are faster RNGs you could use. If you do need security, four rounds are not enough.

ChaCha with six rounds seems like a good compromise if you want an RNG that is fast enough for general use and that also has reasonably good cryptographic quality. If you want more safety margin for cryptographic quality, you might want to go up to eight rounds.

What a difference one round makes

One thing I find interesting about random number generation and block encryption is that a single round of obfuscation can make a huge difference. ChaCha(3) fails statistical tests but ChaCha(4) is fine. ChaCha(5) is easily broken but ChaCha(6) is not.

Interesting failures

Often random number generators are either good or bad; they pass the standard test suites or they clearly fail. ChaCha(3) is interesting in that it is somewhere in between. As the results in the footnotes show, DIEHARDER hints at problems, and TestU01’s SmallCrush battery thinks everything is fine, but the full Crush battery does find problems.

The decisive failure of the Fourier tests is understandable: low-quality generators often fail spectral tests. But the results of the simple poker test are harder to understand. What is it about simulating poker that makes ChaCha(3) fail spectacularly? And in both cases, one more round of ChaCha fixes the problems.

Related posts

[1] ChaCha(3) passes DIEHARDER, though five of the tests passed with a “weak” pass. It passes TestU01’s SmallCrush battery but fails the full Crush battery:

========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        32-bit stdin
 Number of statistics:  144
 Total CPU time:   00:39:05.05
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
 25  SimpPoker, d = 64                eps
 26  SimpPoker, d = 64                eps
 27  CouponCollector, d = 4          7.7e-4
 28  CouponCollector, d = 4          7.4e-4
 29  CouponCollector, d = 16          eps
 33  Gap, r = 0                      2.7e-8
 51  WeightDistrib, r = 0             eps
 52  WeightDistrib, r = 8           5.2e-15
 53  WeightDistrib, r = 16           2.1e-5
 55  SumCollector                     eps
 69  RandomWalk1 H (L = 10000)       1.1e-4
 74  Fourier3, r = 0               1.3e-144
 75  Fourier3, r = 20               6.9e-44
 80  HammingWeight2, r = 0           2.8e-6
 83  HammingCorr, L = 300           3.2e-10
 84  HammingCorr, L = 1200          6.1e-13
 ----------------------------------------------
 All other tests were passed

ChaCha(3) also fails PractRand decisively.

RNG_test using PractRand version 0.94
RNG = chacha(3), seed = 0x7221236f
test set = core, folding = standard (32 bit)

rng=chacha(3), seed=0x7221236f
length= 128 megabytes (2^27 bytes), time= 2.0 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/32]BCFN(2+0,13-6,T)         R= +31.0  p =  4.7e-11   VERY SUSPICIOUS
  [Low1/32]BCFN(2+1,13-6,T)         R= +27.4  p =  8.3e-10   VERY SUSPICIOUS
  [Low1/32]BCFN(2+2,13-6,T)         R= +52.7  p =  1.8e-18    FAIL !
  [Low1/32]BCFN(2+3,13-6,T)         R= +47.6  p =  9.5e-17    FAIL !
  [Low1/32]BCFN(2+4,13-7,T)         R= +26.1  p =  1.7e-8   very suspicious
  [Low1/32]DC6-9x1Bytes-1           R= +26.3  p =  1.3e-14    FAIL !
  [Low1/32]FPF-14+6/16:all          R=  +8.6  p =  2.2e-7   very suspicious
  ...and 147 test result(s) without anomalies

[2] ChaCha(4) passed TestU01’s SmallCrush and full Crush batteries. It passed PractRand using up to 512 GB.

Number of forms of a finite field

The number of elements in a finite field must be a prime power, and for every prime power there is only one finite field up to isomorphism.

The finite field with 256 elements, GF(2^8), is important in applications. From one perspective, there is only one such field. But there are a lot of different isomorphic representations of that field, and some are more efficient to work with than others.

Just how many ways are there to represent GF(2^8)? Someone with the handle joriki gave a clear answer on Stack Exchange:

There are 2^8 − 1 different non-zero vectors that you can map the first basis vector to. Then there are 2^8 − 2 different vectors that are linearly independent of that vector that you can map the second basis vector to, and so on. In step k, 2^{k−1} vectors are linear combinations of the basis vectors you already have, so 2^8 − 2^{k−1} aren’t, so the total number of different automorphisms is

\prod_{k=1}^8\left(2^8 - 2^{k-1} \right ) = 5348063769211699200

This argument can be extended to count the number of automorphisms of any finite field.
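Here is a quick check of the number above, along with a hypothetical generalization of the same product to an arbitrary prime power p^n, following the argument quoted above.

    from math import prod  # requires Python 3.8 or later

    def representation_count(p, n):
        # product of (p^n - p^(k-1)) for k = 1 .. n
        q = p**n
        return prod(q - p**(k - 1) for k in range(1, n + 1))

    print( representation_count(2, 8) )  # 5348063769211699200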

More finite field posts

Top cryptography posts of 2019

Toward the end of each year I write a post or two listing the most popular posts by category. This year the categories will be a little different. I’ll start by listing my most popular posts about cryptography this year.

The next categories will be command line tools, privacy, and math.

(When I wrote this, I started with crypto because I didn’t think I’d write any more posts on the topic. Then the announcement about RSA-240 came out, and so I wrote something about it yesterday.)

New RSA factoring challenge solved

How hard is it to factor large numbers? And how secure are encryption methods based on the difficulty of factoring large numbers?

The RSA factoring challenges were set up to address these questions. Last year RSA-230 was factored, and this week RSA-240 was factored. This is a 240 digit (795 bit) number, the product of two primes.

Researchers solved two related problems at the same time, factoring RSA-240 and solving a discrete logarithm problem. Together these problems took about 4,000 core-years to solve. It’s not clear from the announcement how long it would have taken just to factor RSA-240 alone.

If you were to rent the computing power used, I imagine the cost would be somewhere in the six figures.

This makes 2048-bit and 3072-bit RSA keys look very conservative. However, the weakest link in RSA encryption is implementation flaws, not the ability to factor big numbers.

Assume for a moment that breaking RSA encryption requires factoring keys. (This may not be true in theory [*] or in practice.) How long would it take to factor a 2048 or 3072 bit key?

The time required to factor a number n using the number field sieve is proportional to

\exp\left( \left(\sqrt[3]{\frac{64}{9}} + o(1)\right)(\ln n)^{\frac{1}{3}}(\ln \ln n)^{\frac{2}{3}}\right)

Here o(1) roughly means terms that go away as n gets larger. (More on the notation here.) For simplicity we’ll assume we can ignore these terms.

This suggests that factoring a 2048-bit key is 12 orders of magnitude harder than factoring RSA-240, and that factoring a 3072-bit key is 18 orders of magnitude harder.
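Here is a rough back-of-the-envelope check of those numbers, ignoring the o(1) term just as the estimate above does. The function below returns the natural log of the number field sieve work factor for a key of a given bit length.

    from math import log

    def log_nfs_work(bits):
        # natural log of exp((64/9)^(1/3) (ln n)^(1/3) (ln ln n)^(2/3)) for n = 2^bits
        ln_n = bits * log(2)
        return (64/9)**(1/3) * ln_n**(1/3) * log(ln_n)**(2/3)

    for bits in [2048, 3072]:
        orders = (log_nfs_work(bits) - log_nfs_work(795)) / log(10)
        print(bits, round(orders))  # roughly 12 and 18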

However, I don’t think anyone believes that breaking RSA with 2048-bit keys would require a quadrillion core-years. If the NSA believed this, they wouldn’t be recommending that everyone move to 3072-bit keys.

Why such a large discrepancy? Here are a few reasons. As mentioned above, RSA encryption often has exploitable implementation flaws. And even if implemented perfectly, there is no proof that breaking RSA encryption is as hard as factoring. And there could be breakthroughs in factoring algorithms. And large-scale quantum computers may become practical, in which case factoring would become much easier.

***

[*] Factoring is sufficient to break RSA, but there’s no proof that it’s necessary. Michael Rabin’s variation on RSA is provably as hard to break as factoring: decryption would enable you to factor the key. But as far as I know, Rabin’s method isn’t used anywhere. Even if you know your method is as hard as factoring, maybe factoring isn’t as hard as it seems. Lower bounds on computational difficulty are much harder to obtain than upper bounds.

Quantum supremacy and post-quantum crypto

Google announced today that it has demonstrated “quantum supremacy,” i.e. that they have solved a problem on a quantum computer that could not be solved on a classical computer. Google says

Our machine performed the target computation in 200 seconds, and from measurements in our experiment we determined that it would take the world’s fastest supercomputer 10,000 years to produce a similar output.

IBM disputes this claim. They don’t dispute that Google has computed something with a quantum computer that would take a lot of conventional computing power, only that it “would take the world’s fastest supercomputer 10,000 years” to solve. IBM says it would take 2.5 days.

Scott Aaronson gets into technical details of the disagreement between Google and IBM. He explains that the supercomputer in question is Oak Ridge National Labs’ Summit machine. It covers the area of two basketball courts and has 250 petabytes of disk storage. By exploiting its enormous disk capacity, Summit could simulate Google’s quantum calculation on classical hardware in “only” two and a half days. In a nutshell, it seems Google didn’t consider that you could trade off a gargantuan amount of storage for processor power. But as Aaronson points out, if Google’s machine added just a couple more qubits, even Summit couldn’t keep up on this particular problem.

So does this mean that all the world’s encryption systems are going to fail by the end of the week?

Google selected a problem that is ideal for a quantum computer. And why wouldn’t they? This is the natural thing to do. But they’re not on the verge of rendering public key encryption useless.

Google’s Sycamore processor used 54 qubits. According to this paper, it would take 20,000,000 qubits to factor 2048-bit semiprimes such as those used in RSA encryption. So while Google has achieved a major milestone in quantum computing, public key encryption isn’t dead yet.

If and when large-scale quantum computing does become practical, encryption algorithms that depend on the difficulty of factoring integers will be broken. So will algorithms that depend on discrete logarithms, either working over integers or over elliptic curves.

Post-quantum encryption methods, methods that will remain secure even in a world of quantum computing (as far as we know), have been developed but not widely deployed. There’s a push to develop post-quantum methods now so that they’ll be ready by the time they’re needed. Once a new method has been proposed, it takes a long time for researchers to have confidence in it. It also takes a long time to develop efficient implementations that don’t introduce vulnerabilities.

The NSA recommends using existing methods with longer keys for now, then moving to quantum-resistant methods, i.e. not putting any more effort into developing new algorithms that are not quantum-resistant.

Here are some posts I’ve written about post-quantum encryption methods.

Proving that a choice was made in good faith

How can you prove that a choice was made in good faith? For example, if your company selects a cohort of people for random drug testing, how can you convince those who were chosen that they weren’t chosen deliberately? Would a judge find your explanation persuasive? This is something I’ve helped companies with.

It may be impossible to prove that a choice was not deliberate, but you can show good faith by providing evidence that the choice was deliberate by a different criterion than the one in question.

I’ll give four examples, three positive and one negative.

Cliff RNG

My previous three blog posts looked at different aspects of the Cliff random number generator. The generator needs a seed between 0 and 1 to start. Suppose I chose 0.8121086949937715 as my seed. On the one hand, that’s a number with no apparent special features. But you might ask “Hey, why that number?” and you’d be right. I show in the first post in the series how that number was chosen to make the generator start off producing duplicate output.

In the next two posts in the series, I chose π – 3 as my seed. That’s a recognizable number and obviously a deliberate choice. But it has no apparent connection to the random number generator, and so it’s reasonable to assume that the seed wasn’t chosen to make the generator look good or bad.

SHA-2

The SHA-2 cryptographic hash function uses seventy-two 32-bit numbers for its initial state, numbers that needed to be “random” in some sense. But if the values were actually chosen at random, critics would suspect that the values were chosen to provide a back door. And maybe there is a clever way to pick the initial state that provides a non-obvious exploitable weakness.

The designers of SHA-2 chose the square roots of the first consecutive primes to fill one set of constants, and the cube roots of the first consecutive primes to fill another. See code here.
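For example, the initial hash values of SHA-256 are the first 32 bits of the fractional parts of the square roots of the first eight primes. Here is a short sketch that reproduces them using exact integer arithmetic; the cube-root round constants can be computed in the same spirit.

    from math import isqrt

    def frac_sqrt_bits(p, bits=32):
        # first `bits` bits of the fractional part of sqrt(p),
        # computed exactly as floor(sqrt(p * 2^(2*bits))) mod 2^bits
        return isqrt(p << (2 * bits)) & ((1 << bits) - 1)

    primes = [2, 3, 5, 7, 11, 13, 17, 19]
    print( [hex(frac_sqrt_bits(p)) for p in primes] )
    # ['0x6a09e667', '0xbb67ae85', ...], the SHA-256 initial hash values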

The initial state is definitely not random. Someone looking at the state would eventually discover where it came from. So the choice was obviously deliberate, but apparently not designed by any cryptographic criterion.

Curve25519

Daniel Bernstein’s elliptic curve Curve25519 is widely trusted in part because Bernstein made his design choices transparent. The curve is

y² = x³ + 486662x² + x

over the finite field with 2^255 - 19 elements, hence the name.

2^255 - 19 is the largest prime less than 2^255, and being close to 2^255 makes the method efficient to implement. The coefficient 486662 is less obvious. But Bernstein explains in his paper that he took the three smallest possible values of this parameter that met the explicit design criteria, and then rejected two of them on objective grounds described at the bottom of the paper.
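Both facts about the field size are easy to check directly. Here is a small sketch, assuming SymPy is available.

    from sympy import isprime

    p = 2**255 - 19
    print( isprime(p) )                                      # True
    print( any(isprime(2**255 - k) for k in range(1, 19)) )  # False: no prime closer to 2^255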

NIST P-384

The design of elliptic curve NIST P-384 is not as transparent as that of Curve25519, which has led to speculation that NIST may have designed the curve to have a back door.

The curve has Weierstrass form

y² = x³ – 3x + b

over the finite field with p elements where

p = 2^384 − 2^128 − 2^96 + 2^32 − 1.

As with Curve25519, the choice of field size was motivated by efficiency; the pattern of powers of 2 enables some tricks for efficient implementation. Also, there are objective reasons why the linear coefficient is -3. But the last coefficient b is the 383-bit number

27580193559959705877849011840389048093056905856361568521428707301988689241309860865136260764883745107765439761230575

which has raised some eyebrows. NIST says the value was chosen at random, though not directly. More specifically, NIST says it first generated a certain 160-bit random number, then applied a common key stretching algorithm to obtain b above. Researchers are divided over whether they believe this. See more in this post.

Conclusion

Sometimes you can’t prove that a choice wasn’t deliberate. In that case the best you can do is show that the choice was deliberate, but by an innocent criterion, one unrelated to the matter at hand.

I tried to do this in the Cliff RNG blog posts by using π as my seed. I couldn’t actually use π because the seed had to be between 0 and 1, but there’s an obvious way to take π and produce a number between 0 and 1.

The designers of SHA-2 took a similar route. Just as π is a natural choice for a real number, primes are natural choices for integers. They couldn’t use integers directly, but they made complicated patterns from simple integers in a natural way by taking roots.

Daniel Bernstein gained the cryptography community’s trust by making his design criteria transparent. Given his criteria, the design was almost inevitable.

NIST was not as persuasive as Daniel Bernstein. They claim to have chosen a 160-bit number at random, and they may very well have. But there’s no way to know whether they generated a lot of 160-bit seeds until they found one that resulted in a constant term that has some special property. They may have chosen their seed in good faith, but they have not been completely persuasive.

Sometimes it’s not enough to act in good faith; you have to make a persuasive case that you acted in good faith.