All elliptic curves over fields of order 2 and 3

Introductions to elliptic curves often start by saying that elliptic curves have the form

y² = x³ + ax + b.

where 4a³ + 27b² ≠ 0. Then later they say “except over fields of characteristic 2 or 3.”

What does characteristic 2 or 3 mean? The order of a finite field is the number of elements it has. The order is always a prime or a prime power. The characteristic is that prime. So another way to phrase the exception above is to say “except over fields of order 2ⁿ or 3ⁿ.”

If we’re looking at fields not just of characteristic 2 or 3, but order 2 or 3, there can’t be that many of them. Why not just list them? That’s what I plan to do here.

Efficient modular arithmetic technique for Curve25519

Daniel Bernstein’s Curve25519 is the elliptic curve

y² = x³ + 486662x² + x

over the prime field with order p = 2²⁵⁵ − 19. The curve is a popular choice in elliptic curve cryptography because its design choices are transparently justified [1] and because cryptography over the curve can be implemented very efficiently. This post will concentrate on one of the tricks that makes ECC over Curve25519 so efficient.

Curve25519 was designed for fast and secure cryptography. One of the things that make it fast is the clever way Bernstein carries out arithmetic mod 2²⁵⁵ − 19, which he describes here.

Bernstein represents numbers mod 2²⁵⁵ − 19 by polynomials whose value at 1 gives the number. That alone is not remarkable, but his choice of representation seems odd until you learn why it was chosen. Each number is represented as a polynomial of the form

∑ uᵢ xⁱ

where each uᵢ is an integer multiple kᵢ of 2^⌈25.5i⌉, and each kᵢ is an integer between −2²⁵ and 2²⁵ inclusive.
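To make the representation concrete, here is a small Python sketch (my illustration, not Bernstein’s code) that splits a number into ten limbs at the powers 2^⌈25.5i⌉. For simplicity it uses non-negative limbs rather than Bernstein’s signed range.

    from math import ceil

    def to_limbs(n):
        # The exponents ceil(25.5 i) are 0, 26, 51, 77, 102, 128, 153, 179, 204, 230
        exps = [ceil(25.5*i) for i in range(10)] + [255]
        # Limb i holds the bits of n from position exps[i] up to exps[i+1]
        return [(n >> exps[i]) & ((1 << (exps[i+1] - exps[i])) - 1)
                for i in range(10)]

    n = 2**255 - 20   # an example value, -1 mod p
    limbs = to_limbs(n)
    assert n == sum(k << ceil(25.5*i) for i, k in enumerate(limbs))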

Why this limitation on the k’s? Pentium floating point optimization. In Bernstein’s words:

Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or eight 32-bit pieces? Answer: The coefficients of a polynomial product do not fit into the Pentium M’s fp registers if pieces are too large. The cost of handling larger coefficients outweighs the savings of handling fewer coefficients.

And why unevenly spaced powers of 2: 1, 2²⁶, 2⁵¹, 2⁷⁷, …, 2²³⁰? Some consecutive exponents differ by 25 and some by 26. This looks sort of like a base 2²⁵ or base 2²⁶ representation, but is a mixture of both.

Bernstein answers this question as well.

Given that there are 10 pieces, why use radix 2^25.5 rather than, e.g., radix 2²⁵ or radix 2²⁶? Answer: My ring R contains 2²⁵⁵x¹⁰ − 19, which represents 0 in Z/(2²⁵⁵ − 19). I will reduce polynomial products modulo 2²⁵⁵x¹⁰ − 19 to eliminate the coefficients of x¹⁰, x¹¹, etc. With radix 2²⁵, the coefficient of x¹⁰ could not be eliminated. With radix 2²⁶, coefficients would have to be multiplied by 2⁵ · 19 rather than just 19, and the results would not fit into an fp register.

There are a few things to unpack here.

Remember that we’re turning polynomials into numbers by evaluating them at 1. So when x = 1, 2²⁵⁵x¹⁰ − 19 = 2²⁵⁵ − 19 = p, which is zero in the integers mod 2²⁵⁵ − 19.

If we were using base (radix) 2²⁵, the largest number we could represent with a 9th degree polynomial under the restrictions above would be about 2²⁵⁰, so we’d need a 10th degree polynomial; we couldn’t eliminate terms containing x¹⁰.
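A quick sanity check on the arithmetic in the quote: since 2²⁵⁵ ≡ 19 mod p, a coefficient of x¹⁰ folds back into the low-order terms multiplied by just 19, while a radix of 2²⁶ would put that power at 2²⁶⁰ and require multiplying by 2⁵ · 19.

    p = 2**255 - 19
    assert 2**255 % p == 19          # degree-10 terms fold in times 19
    assert 2**260 % p == 2**5 * 19   # with radix 2^26 the factor would be 608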

I don’t yet see why working with radix 226 would overflow an fp register. If you do see why, please leave an explanation in the comments.


[1] When a cryptographic method has an unjustified parameter, it invites suspicion that the parameter was chosen to create an undocumented back door. This is not the case with Curve25519. For example, why does it use p = 2²⁵⁵ − 19? It’s efficient to use a prime close to a large power of 2, and this p is the closest prime to 2²⁵⁵. The coefficient 486662 is not immediately obvious, but Bernstein explains in his paper how it was the smallest integer that met his design criteria.

Public key encryption based on squares and non squares

The RSA encryption algorithm depends indirectly on the assumption that factoring the product of large primes is hard. The algorithm presented here, invented by Shafi Goldwasser and Silvio Micali, depends on the same assumption but in a different way. The Goldwasser-Micali algorithm is more direct than RSA, though it is also less efficient.

One thing that makes GM interesting is that it allows a form of computing on encrypted data that we’ll describe below.

GM in a nutshell

To create a public key, find two large primes p and q and publish N = pq. (There’s one more piece we’ll get to shortly.) You keep p and q private, but publish N, much like with RSA.

Someone can send you a message, one bit at a time, by sending you numbers that either do or do not have a square root mod N.

Sending a 0

If someone wants to send you a 0, they send you a number that has a square root mod N. This is easy to do: they select a number between 1 and N at random, square it mod N, and send you the result.

Determining whether a random number is a square mod N is easy if and only if you know how to factor N. [1]

When you receive the number, you can quickly tell that it is a square because you know how to factor N. The sender knows that it’s a square because he got it by squaring something. You can produce a square without knowing how to factor N, but it’s computationally infeasible to start with a given number and tell whether it’s a square mod N, unless you know the factorization of N.

Sending a 1

Sending a 1 bit is a little more involved. How can someone who cannot factor N produce a number that’s not a square? That’s actually not feasible without some extra information. The public key is not just N. It’s also a number z that is not a square mod N. So the full public key is two numbers, N and z.

To generate a non-square, you first generate a square then multiply it by z.

Example

Suppose you choose p = 314159 and q = 2718281. (Yes, p is a prime. See the post on pi primes. And q comes from the first few digits of e.) In practice you’d choose p and q to be very large, hundreds of digits, and you wouldn’t pick them to have a cute pattern like we did here. You publish N = pq = 853972440679 and imagine it’s too large for anyone to factor (which may be true for someone armed with only pencil and paper).

Next you need to find a number z that is not a square mod N. You do that by trying numbers at random until you find one that is not a square mod p and not a square mod q. You can do that by using Legendre symbols. It turns out z = 400005 will work.

So you tell the world your public key is (853972440679, 400005).

Someone wanting to send you a 0 bit chooses a number between 1 and N = 853972440679, say 731976377724. Then they square it and take the remainder by N to get 592552305778, and so they send you 592552305778. You can tell, using Legendre symbols, that this is a square mod p and mod q, so it’s a square mod N.

If they had wanted to send you a 1, they could have sent you 592552305778 * 400005 mod N = 41827250972, which you could tell isn’t a square mod N.
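Here is a Python sketch of the whole exchange with the numbers above, using SymPy’s legendre_symbol to play the role of the private decryption step:

    from sympy.ntheory import legendre_symbol

    p, q = 314159, 2718281
    N, z = p*q, 400005                # public key (N, z); p and q stay private

    c0 = 731976377724**2 % N          # encrypts a 0: a random square mod N
    c1 = c0 * z % N                   # encrypts a 1: a square times the non-square z

    # Decryption: c is a square mod N iff it is a square mod p and mod q
    def decrypt(c):
        square = legendre_symbol(c % p, p) == 1 and legendre_symbol(c % q, q) == 1
        return 0 if square else 1

    print(decrypt(c0), decrypt(c1))   # prints 0 1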

Homomorphic encryption

Homomorphic encryption lets you compute things on encrypted data without having to first decrypt it. The GM encryption algorithm is homomorphic in the sense that you can compute an encrypted form of the XOR of two bits from an encrypted form of each bit. Specifically, if c₁ and c₂ are encrypted forms of bits b₁ and b₂, then c₁c₂ is an encrypted form of b₁ ⊕ b₂. Let’s see why this is, and where there’s a small wrinkle.

Suppose our two bits are both 0s. Then c₁ and c₂ are squares mod N, and c₁c₂ is a square mod N.

Now suppose one bit is a 0 and the other is a 1. Then either c₁ is a square mod N and c₂ isn’t or vice versa, but in either case their product is not a square mod N.

Finally suppose both our bits are 1s. Since 1 ⊕ 1 = 0, we’d like to say that c₁c₂ is a square mod N. Is it?

The product of two non-squares is not necessarily a square. For example, 2 and 3 are not squares mod 35, and neither is their product 6 [2]. But if we followed the recipe above, and calculated c₁ and c₂ both by multiplying a square by the z in the public key, then we’re OK. That is, if c₁ = x²z and c₂ = y²z, then c₁c₂ = x²y²z², which is a square. So if senders produce non-squares by multiplying a square by z, as the recipe says, you get the homomorphic property. Non-squares found some other way might not work.
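Continuing the sketch above, multiplying ciphertexts XORs the underlying bits:

    print(decrypt(c1 * c1 % N))   # 1 XOR 1 = 0: the z's multiply to a square z²
    print(decrypt(c0 * c1 % N))   # 0 XOR 1 = 1: a square times one factor of z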


[1] As far as we know. There may be an efficient way to tell whether x is a square mod N without factoring N, but no such method has been published. The problem of actually finding modular square roots is equivalent to factoring, but simply telling whether modular square roots exist, without having to produce the roots, may be easier.

If quantum computing becomes practical, then factoring will be efficient and so telling whether numbers are squares modulo a composite number will be efficient.

[2] You could find all the squares mod 35 by hand, or you could let Python do it for you:

>>> set([x*x % 35 for x in range(35)])
{0, 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, 30}

An infinite product challenge

Gil Kalai wrote a blog post yesterday entitled “Test Your Intuition (or knowledge, or programming skills) 36.” The challenge is to evaluate the infinite product

\prod_{p\,\, \mathrm{prime}} \frac{p^2+1}{p^2 - 1}

I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used primerange from SymPy to generate the list of primes up to 200, and cumprod from NumPy to generate the list of partial products.

    from sympy import primerange
    from numpy import cumprod

    cumprod(
        [(p*p+1)/(p*p-1) for p in primerange(1, 200)]
    )

Apparently the product converges to 5/2, and a plot suggests that it converges very quickly.

Plot of partial products

Here’s another plot to look more closely at the rate of convergence. Here we look at the difference between 5/2 and the partial products, on a log scale, for primes less than 2000.

Plot of 2.5 minus partial products, log scale
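For anyone who wants to reproduce the plots, here is a sketch using Matplotlib, assuming the imports from the snippet above:

    import matplotlib.pyplot as plt

    partial = cumprod([(p*p+1)/(p*p-1) for p in primerange(1, 2000)])
    plt.semilogy(2.5 - partial)           # distance from 5/2 on a log scale
    plt.xlabel("number of prime factors")
    plt.show()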


Testing for primes less than a quintillion

The most common way to test whether a large number is prime is the Miller-Rabin test. If the test says a number is composite, it’s definitely composite. Otherwise the number is very likely, but not certain, to be prime. A pseudoprime is a composite number that slips past the Miller-Rabin test. (Actually, a strong pseudoprime. More on that below.)

Miller-Rabin test

The Miller-Rabin test is actually a sequence of tests, one for each prime number. First you run the test associated with 2, then the test associated with 3, then the one associated with 5, etc. If we knew the smallest numbers for which these tests fail, then for smaller numbers we know for certain that they’re prime if they pass. In other words, we can turn the Miller-Rabin test for probable primes into a test for provable primes.

Lower bound on failure

A recent result by Yupeng Jiang and Yingpu Deng finds the smallest number for which the Miller-Rabin test fails for the first nine primes. This number is

N = 3,825,123,056,546,413,051

or more than 3.8 quintillion. So if a number passes the first nine Miller-Rabin tests, and it’s less than N, then it’s prime. Not just a probable prime, but definitely prime. For a number n < N, this will be more efficient than running previously known deterministic primality tests on n.

Python implementation

Let’s play with this in Python. The SymPy library implements the Miller-Rabin test in a function mr.
The following shows that N is composite, and that it is a false positive for the first nine Miller-Rabin tests.

    from sympy.ntheory.primetest import mr

    N = 3825123056546413051
    assert(N == 149491*747451*34233211)   # N factors, so N is composite
    ps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
    print( mr(N, ps) )                    # prints True: N fools all nine tests

This doesn’t prove that N is the smallest number with these properties; we need the proof of Jiang and Deng for that. But assuming their result is right, here’s an efficient deterministic primality test that works for all n less than N.

    def is_prime(n):
        N = 3825123056546413051
        assert(n < N)
        ps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
        return mr(n, ps)

Jiang and Deng assert that N is also the smallest composite number to slip by the first 10 and 11 Miller-Rabin tests. We can show that N is indeed a strong pseudoprime for the 10th and 11th primes, but not for the 12th prime.

    print( mr(N, [29, 31]) )
    print( mr(N, [37]) )

This code prints True for the first test and False for the second. That is, N is a strong pseudoprime for bases 29 and 31, but not for 37.

Pseudoprimes and strong pseudoprimes

Fermat’s little theorem says that if n is prime, then

aⁿ⁻¹ = 1 mod n

for all 0 < a < n. This gives a necessary but not sufficient test for primality. A (Fermat) pseudoprime for base a is a composite number n such that the above holds, an example of where the test is not sufficient.

The Miller-Rabin test refines Fermat’s test by looking at additional necessary conditions for a number being prime. Often a composite number will fail one of these conditions, but not always. The composite numbers that slip by are called strong pseudoprimes or sometimes Miller-Rabin pseudoprimes.

Miller and Rabin’s extra testing starts by factoring n − 1 as 2ˢd where d is odd. If n is prime, then for all 0 < a < n either

aᵈ = 1 mod n

or

a^(2^k d) = −1 mod n

for some k satisfying 0 ≤ k < s. If one of these two conditions holds for a particular a, then n passes the Miller-Rabin test for the base a.

It wouldn’t be hard to write your own implementation of the Miller-Rabin test. You’d need a way to work with large integers and to compute modular exponents, both of which are included in Python without having to use SymPy.
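For example, here is a minimal sketch of the test for a single base a, using nothing beyond the built-in pow:

    def passes_miller_rabin(n, a):
        # Factor n-1 as 2**s * d with d odd
        s, d = 0, n - 1
        while d % 2 == 0:
            s += 1
            d //= 2
        x = pow(a, d, n)              # a^d mod n
        if x == 1 or x == n - 1:
            return True
        for _ in range(s - 1):
            x = pow(x, 2, n)          # successive squarings give a^(2^k d)
            if x == n - 1:
                return True
        return False

    assert passes_miller_rabin(101, 2)       # 101 is prime
    assert not passes_miller_rabin(561, 2)   # 561 is caught by the strong test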

Example

561 is a pseudoprime for base 2. In fact, 561 is a pseudoprime for every base relatively prime to 561, i.e. it’s a Carmichael number. But it is not a strong pseudoprime for base 2, because 560 = 16*35, so d = 35 and

2³⁵ = 263 mod 561,

which is not congruent to 1 or to −1. Repeated squaring gives 2⁷⁰ ≡ 166, 2¹⁴⁰ ≡ 67, and 2²⁸⁰ ≡ 1 mod 561, none of which is −1 either, so 561 fails the strong test. In Python,

    
    >>> pow(2, 560, 561)   # passes the Fermat test
    1
    >>> pow(2, 35, 561)    # but fails the strong test
    263
    >>> [pow(2, 35 * 2**k, 561) for k in range(1, 4)]
    [166, 67, 1]


Regression, modular arithmetic, and PQC

Linear regression

Suppose you have a linear regression with a couple predictors and no intercept term:

β₁x₁ + β₂x₂ = y + ε

where the x’s are inputs, the βs are fixed but unknown, y is the output, and ε is random error.

Given n observations (x₁, x₂, y + ε), linear regression estimates the parameters β₁ and β₂.

I haven’t said, but I implicitly assumed all the above numbers are real. Of course they’re real. It would be strange if they weren’t!

Learning with errors

Well, we’re about to do something strange. We’re going to pick a prime number p and do our calculations modulo p except for the addition of the error ε. Our inputs (x₁, x₂) are going to be pairs of integers. Someone is going to compute

r = β₁x₁ + β₂x₂ mod p

where β₁ and β₂ are secret integers. Then they’re going to tell us

r/p + ε

where ε is a random variable on the interval [0, 1]. We give them n pairs (x₁, x₂) and they give back n values of r/p with noise added. Our job is to infer the βs.

This problem is called learning with errors or LWE. It’s like linear regression, but much harder when the problem size is bigger. Instead of just two inputs, we could have m inputs with m secret coefficients where m is large. Depending on the number of variables m, the number of equations n, the modulus p, and the probability distribution on ε, the problem may be possible to solve but computationally very difficult.

Why is it so difficult? Working mod p is discontinuous. A little bit of error might completely change our estimate of the solution. If n is large enough, we could recover the coefficients anyway, using something like least squares. But how would we carry that out? If m and p are small we can just try all pᵐ possibilities, but that’s not going to be practical if m and p are large.
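Here is a toy sketch of the setup in Python. The parameters and the noise level are deliberately small, chosen only for illustration:

    import numpy as np

    rng = np.random.default_rng(42)
    p, m, n = 257, 4, 12                       # modulus, secrets, observations
    beta = rng.integers(0, p, m)               # secret coefficients
    X = rng.integers(0, p, (n, m))             # public inputs
    observed = (X @ beta % p)/p + rng.uniform(0, 0.01, n)

    # Recovering beta by brute force means trying all p**m candidates,
    # which is hopeless for realistic m and p.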

In linear regression, we assume there is some (approximately) linear process out in the real world that we’re allowed to observe with limited accuracy. Nobody’s playing a game with us; that’s just how data come to us. But with LWE, we are playing a game that someone has designed to be hard. Why? For cryptography. In particular, quantum-resistant cryptography.

Post Quantum Cryptography

Variations on LWE are the basis for several proposed encryption algorithms that are believed to be secure even if an adversary has access to a quantum computer.

The public key encryption systems in common use today would all be breakable if quantum computing becomes practical. They depend on mathematical problems like factoring and discrete logarithms being computationally difficult, which they appear to be with traditional computing resources. But we know that these problems could be solved in polynomial time on a quantum computer with Shor’s algorithm. But LWE is a hard problem, even on a quantum computer. Or so we suspect.

The US government’s National Institute of Standards and Technology (NIST) is holding a competition to identify quantum-resistant encryption algorithms. Last month they announced 26 algorithms that made it to the second round. Many of these algorithms depend on LWE or variations.

One variation is LWR (learning with rounding) which uses rounding rather than adding random noise. There are also ring-based counterparts RLWE and RLWR which add random errors and use rounding respectively. And there are polynomial variations such as poly-LWE which uses a polynomial-based learning with errors problem. The general category for these methods is lattice methods.

Lattice methods

Of the public-key algorithms that made it to the second round of the NIST competition, 9 out of 17 use lattice-based cryptography:

  • CRYSTALS-KYBER
  • FrodoKEM
  • LAC
  • NewHope
  • NTRU
  • NTRU Prime
  • Round5
  • SABER
  • Three Bears

Also, two of the nine digital signature algorithms are based on lattice problems:

  • CRYSTALS-DILITHIUM
  • FALCON

Based purely on the names, and not on the merits of the algorithms, I hope the winner is one of the methods with a science fiction allusion in the name.


Naming elliptic curves used in cryptography

There are an infinite number of elliptic curves, but a small number that are used in elliptic curve cryptography (ECC), and these special curves have names. Apparently there are no hard and fast rules for how the names are chosen, but there are patterns.

The named elliptic curves are over a prime field, i.e. a finite field with a prime number of elements p, denoted GF(p). The number of points on the elliptic curve is on the order of p [1].

The curve names usually contain a number which is the number of bits in the binary representation of p. Let’s see how that plays out with a few named elliptic curves.

Curve name          Bits in p
ANSSI FRP256v1         256
BN(2, 254)             254
brainpoolP256t1        256
Curve1174              251
Curve25519             255
Curve383187            383
E-222                  222
E-382                  382
E-521                  521
Ed448-Goldilocks       448
M-221                  221
M-383                  383
M-511                  511
NIST P-224             224
NIST P-256             256
secp256k1              256

In Curve25519, p = 2²⁵⁵ − 19 and in Curve383187, p = 2³⁸³ − 187. Here the number of bits in p is part of the name but another number is stuck on.

The only mystery on the list is Curve1174 where p has 251 bits. The equation for the curve is

x² + y² = 1 − 1174 x² y²

and so the 1174 in the name comes from a coefficient rather than from the number of bits in p.

Edwards curves

The equation for Curve1174 doesn’t look like an elliptic curve. It doesn’t have the familiar (Weierstrass) form

y² = x³ + ax + b

It is an example of an Edwards curve, named after Harold Edwards. So are all the curves above whose names start with “E”. These curves have the form

x² + y² = 1 + d x² y².

where d is not 0 or 1. So some Edwards curves are named after their d parameter and some are named after the number of bits in p.

It’s not obvious that an Edwards curve can be changed into Weierstrass form, but apparently it’s possible; this paper goes into the details.

The advantage of Edwards curves is that the elliptic curve group addition has a simple, convenient form. Also, when d is not a square in the underlying field, there are no exceptional points to consider for group addition.
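For the form above, the sum of (x₁, y₁) and (x₂, y₂) is

((x₁y₂ + x₂y₁)/(1 + dx₁x₂y₁y₂), (y₁y₂ − x₁x₂)/(1 − dx₁x₂y₁y₂)).

Here is a quick Python sketch of this addition law for Curve1174, computing inverses mod p via Fermat’s little theorem:

    p = 2**251 - 9
    d = -1174 % p

    def edwards_add(P, Q):
        x1, y1 = P
        x2, y2 = Q
        t = d * x1 * x2 * y1 * y2 % p
        inv = lambda u: pow(u, p - 2, p)      # u^(p-2) = u^(-1) mod p, p prime
        x3 = (x1*y2 + x2*y1) * inv(1 + t) % p
        y3 = (y1*y2 - x1*x2) * inv(1 - t) % p
        return (x3, y3)

    assert edwards_add((0, 1), (0, 1)) == (0, 1)   # (0, 1) is the identity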

Is d = −1174 a square in the field underlying Curve1174? For that curve p = 2²⁵¹ − 9, and we can use the Jacobi symbol code from earlier this week to show that d is not a square.

    # jacobi() is the function from the Jacobi symbols post below
    p = 2**251 - 9
    d = p - 1174
    print(jacobi(d, p))

This prints -1, indicating that d is not a square. Note that we set d to p – 1174 rather than -1174 because our code assumes the first argument is positive, and -1174 and p – 1174 are equivalent mod p.

Update: More on addition on Curve1174.

Prefix conventions

A US government publication (FIPS PUB 186-4) mandates the following prefixes:

  • P for curves over a prime field,
  • B for curves over a binary field (i.e. GF(2ⁿ)), and
  • K for Koblitz curves.

The ‘k’ in secp256k1 also stands for Koblitz.

The M prefix above stands for Montgomery.


[1] It is difficult to compute the exact number of points on an elliptic curve over a prime field. However, the number is roughly p ± 2√p. More precisely, Hasse’s theorem says

|\#(E/\mathbb{F}_p) - p - 1| \leq 2\sqrt{p}

Sum-product theorem for finite fields

A week ago I wrote about using some Python code to play with the sum-product theorem of Erdős and Szemerédi and its conjectured refinement. This morning I learned that the Erdős-Szemerédi theorem has been extended to finite fields.

David Johnston left a comment saying that he and his colleagues used this extension to finite fields as part of the construction of μRNG, a remarkably efficient true random number generator. The finite field version of Erdős-Szemerédi leads to a way to combine three physical but non-uniform sources of randomness into a uniformly distributed stream of bits.

Bourgain, Katz, and Tao proved that the result

max{|A+A|, |A*A|} ≥ c|A|^(1+ε)

extends to subsets A from a finite field F with conditions on the field F and the size of A relative to F.

It suffices for F to have prime order. More generally, and importantly for applications, it also holds for fields of order 2ᵖ for prime p.

Given a constant δ > 0, if the size of the set A satisfies

|F|^δ < |A| < |F|^(1−δ)

then the theorem holds where the constant c depends on δ.

Computing Legendre and Jacobi symbols

In an earlier post I introduced the Legendre symbol

\left(\frac{a}{p}\right)

where a is a positive integer and p is prime. It is defined to be 0 if a is a multiple of p, 1 if a has a square root mod p, and -1 otherwise.

The Jacobi symbol is a generalization of the Legendre symbol and uses the same notation. It relaxes the requirement that p be prime and only requires that p is odd.

If m has prime factors pᵢ with exponents eᵢ, then the Jacobi symbol is defined by

\left(\frac{n}{m}\right) = \prod \left(\frac{n}{p_i} \right )^{e_i}

Note that the symbol on the left is a Jacobi symbol while the symbols on the right are Legendre symbols.

The Legendre and Jacobi symbols are not fractions, but they act in some ways like fractions, and so the notation is suggestive. They come up in applications of number theory, so it’s useful to be able to compute them.

Algorithm for computing Jacobi symbols

Since the Legendre symbol is a special case of the Jacobi symbol, we only need an algorithm for computing the latter.

In the earlier post mentioned above, I outline an algorithm for computing Legendre symbols. The code below is more explicit, and more general. It’s Python code, but it doesn’t depend on any libraries or special features of Python, so it could easily be translated to another language. The algorithm is taken from Algorithmic Number Theory by Bach and Shallit. Its execution time is O( (log a)(log n) ).

    def jacobi(a, n):
        # Compute the Jacobi symbol (a/n) for 0 < a < n with n odd
        assert(n > a > 0 and n%2 == 1)
        t = 1
        while a != 0:
            while a % 2 == 0:
                # Pull out factors of 2, using (2/n) = (-1)^((n^2-1)/8)
                a //= 2
                r = n % 8
                if r == 3 or r == 5:
                    t = -t
            # Quadratic reciprocity: flip the sign if a = n = 3 mod 4
            a, n = n, a
            if a % 4 == n % 4 == 3:
                t = -t
            a %= n
        if n == 1:
            return t
        else:
            return 0

Testing the Python code

To test the code we randomly generate positive integers a and odd integers n greater than a. We compare our self-contained Jacobi symbol function to the one in SymPy.

    from random import randrange
    from sympy import jacobi_symbol

    N = 1000
    for _ in range(100):
        a = randrange(1, N)
        n = randrange(a+1, 2*N)
        if n%2 == 0:
            n += 1

        j1 = jacobi_symbol(a, n)
        j2 = jacobi(a, n)
        if j1 != j2:
            print(a, n, j1, j2)

This prints nothing, suggesting that we coded the algorithm correctly.


Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A, then considers the size of the sets A+A and A*A. That is, if we add every element of A to every element of A, how many distinct sums are there? If we take products instead, how many distinct products are there?

Proven results

Erdős and Szemerédi proved that there are constants c and ε > 0 such that

max{|A+A|, |A*A|} ≥ c|A|^(1+ε)

In other words, either A+A or A*A is substantially bigger than A. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |A+A| or |A*A| is bounded below by a fixed multiple of |A|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342808…

but the conjecture remains that the upper limit on ε is 1.

Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size N from the non-negative integers less than R, then computes the sum and product sets using set comprehensions.

    from numpy.random import choice

    def trial(R, N):
        # R = integer range, N = sample size
        x = choice(R, N, replace=False)
        s = {a+b for a in x for b in x}
        p = {a*b for a in x for b in x}
        return (len(s), len(p))

When I first tried this code I thought it had a bug. I called trial 10 times and got the same values for |A+A| and |A*A| every time. That was because I chose R large relative to N. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if R >> N, it is likely that |A+A| and |A*A| will both equal N(N+1)/2. Things get more interesting when N is closer to R.

Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about lower bounds on |A+A| and |A*A| you could ask for the distribution on |A+A| and |A*A| when the sets A are drawn from some probability distribution.

If the set A is drawn from a continuous distribution, then |A+A| and |A*A| both equal N(N+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case R >> N above.

If the set A is an arithmetic sequence then |A+A| is small and |A*A| is large, and the opposite holds if A is a geometric sequence. So it might be interesting to look at the correlation of |A+A| and |A*A| when A comes from a discrete distribution, such as choosing N integers uniformly from [1, R] when N/R is not too small.
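Here is a sketch of such an experiment, reusing the trial function above to estimate that correlation:

    import numpy as np

    R, N, reps = 200, 100, 500
    results = np.array([trial(R, N) for _ in range(reps)])
    print(np.corrcoef(results[:, 0], results[:, 1])[0, 1])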