What is an elliptic curve?

Elliptic curves are pure and applied, concrete and abstract, simple and complex.

Elliptic curves have been studied for many years by pure mathematicians with no intention to apply the results to anything outside math itself. And yet elliptic curves have become a critical part of applied cryptography.

Elliptic curves are very concrete. There are some subtleties in the definition (more on that in a moment), but they’re essentially the set of points satisfying a simple equation. And yet a lot of extremely abstract mathematics has been developed out of necessity to study these simple objects. And while the objects are in some sense simple, the questions that people naturally ask about them are far from simple.

Preliminary definition

A preliminary definition of an elliptic curve is the set of points satisfying

y² = x³ + ax + b.

This is a theorem, not a definition, and it requires some qualifications. The values x, y, a, and b come from some field, and that field is an important part of the definition of an elliptic curve. If that field is the real numbers, then all elliptic curves do have the form above, known as the Weierstrass form. For fields of characteristic 2 or 3, the Weierstrass form isn’t general enough. Also, we require that

4a³ + 27b² ≠ 0.

The other day I wrote about Curve1174, a particular elliptic curve used in cryptography. The points on this curve satisfy

x² + y² = 1 – 1174 x² y²

This equation does not specify an elliptic curve if we’re working over the real numbers. But Curve1174 is defined over the integers modulo p = 2^251 – 9, and there it is an elliptic curve. It is equivalent to a curve in Weierstrass form, though that’s not true when working over the reals. So whether an equation defines an elliptic curve depends on the field that the coefficients and coordinates come from.

Not an ellipse, not a curve

An elliptic curve is not an ellipse, and it may not be a curve in the usual sense.

There is a connection between elliptic curves and ellipses, but it’s indirect. Elliptic curves are related to the integrals you would write down to find the length of a portion of an ellipse.

Working over the real numbers, an elliptic curve is a curve in the geometric sense. Working over a finite field, an elliptic curve is a finite set of points, not a continuum. Working over the complex numbers, an elliptic curve is a two-dimensional surface. The name “curve” is extended by analogy to elliptic curves over general fields.

Final definition

In this section we’ll give the full definition of an elliptic curve, though we’ll be deliberately vague about some of the details.

The definition of an elliptic curve is not in terms of equations of a particular form. It says an elliptic curve is a

  • smooth,
  • projective,
  • algebraic curve,
  • of genus one,
  • having a specified base point.

Working over the real numbers, smoothness can be specified in terms of derivatives. But what does smoothness mean when working over a finite field? You take the derivative equations from the real case and extend them by analogy to other fields. You can “differentiate” polynomials in settings where you can’t take limits by defining derivatives algebraically. (The condition 4a³ + 27b² ≠ 0 above guarantees smoothness.)
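Here is a brute-force sketch of that idea over a small prime field GF(p). It is my own illustration, not code from the original post. It treats F(x, y) = y² – x³ – ax – b as a polynomial and takes its partial derivatives formally, with no limits involved. It then looks for points where F and both derivatives vanish. For a prime p > 3, the list should come back empty exactly when 4a³ + 27b² ≠ 0 mod p. The function names are hypothetical.

```python
def singular_points(a, b, p):
    """Affine points of y^2 = x^3 + a*x + b over GF(p) where F = y^2 - x^3 - a*x - b
    and both of its formal partial derivatives vanish. Assumes p is a prime > 3."""
    points = []
    for x in range(p):
        for y in range(p):
            F = (y*y - x**3 - a*x - b) % p
            dF_dx = (-3*x*x - a) % p   # formal derivative with respect to x
            dF_dy = (2*y) % p          # formal derivative with respect to y
            if F == 0 and dF_dx == 0 and dF_dy == 0:
                points.append((x, y))
    return points

def discriminant_nonzero(a, b, p):
    "True when 4a^3 + 27b^2 is nonzero mod p."
    return (4*a**3 + 27*b**2) % p != 0

# y^2 = x^3 has a cusp at the origin; y^2 = x^3 + x + 1 is smooth mod 7.
print(singular_points(0, 0, 7))   # [(0, 0)]
print(singular_points(1, 1, 7))   # []
```

Looping over every pair (a, b) for a few small primes is a quick way to see that the two tests agree.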

Informally, projective means we add “points at infinity” as necessary to make things more consistent. Formally, we’re not actually working with pairs of coordinates (x, y) but with equivalence classes of triples of coordinates (x, y, z). You can usually think in terms of pairs of values, but the extra value is there when you need it to deal with points at infinity.
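Here is a small sketch of projective coordinates over GF(p), assuming the Weierstrass form. Substituting x/z and y/z for x and y and clearing denominators gives y²z = x³ + axz² + bz³. Any nonzero multiple (λx, λy, λz) of a solution is another solution that names the same point. Triples with z ≠ 0 correspond to the ordinary pairs (x/z, y/z). Setting z = 0 forces x = 0, so the only extra point is (0, 1, 0), the point at infinity. The helper names and the curve y² = x³ + x + 1 mod 7 are arbitrary choices for illustration.

```python
def on_projective_curve(P, a, b, p):
    "Check y^2 z = x^3 + a x z^2 + b z^3 over GF(p) for a triple P = (x, y, z)."
    x, y, z = P
    return (y*y*z - (x**3 + a*x*z*z + b*z**3)) % p == 0

def normalize(P, p):
    """Canonical representative of the class of P: scale so that the
    last nonzero coordinate is 1."""
    x, y, z = (c % p for c in P)
    if z != 0:
        inv = pow(z, -1, p)              # modular inverse, Python 3.8+
        return (x*inv % p, y*inv % p, 1)
    if y != 0:
        inv = pow(y, -1, p)
        return (x*inv % p, 1, 0)
    if x != 0:
        return (1, 0, 0)
    raise ValueError("(0, 0, 0) does not represent a point")

a, b, p = 1, 1, 7                        # the curve y^2 = x^3 + x + 1 over GF(7)
P = (0, 1, 1)                            # the affine point (0, 1)
Q = (0, 3, 3)                            # the same point, scaled by 3
print(on_projective_curve(Q, a, b, p), normalize(Q, p) == normalize(P, p))  # True True
print(on_projective_curve((0, 1, 0), a, b, p))   # True: the point at infinity
```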

An algebraic curve is the set of points satisfying a polynomial equation.

The genus of an algebraic curve is roughly the number of holes it has. Over the complex numbers, the genus of an algebraic curve really is the number of holes. As with so many ideas in algebra, a theorem from a familiar context is taken as a definition in a more general context.

We haven’t talked about the idea of a base point. There’s a way to add points on an elliptic curve, and the choice of a base point is a choice of where to put the additive identity. In the post on Curve1174, we go into the addition in detail, and the base point is (0, 1). In elliptic curve cryptography, the choice of base point can be very important. This post gives an example, specifying the base point on a curve used in the implementation of Bitcoin.

Addition on Curve1174

I’ve written about elliptic curves and alluded to the fact that there’s a special kind of addition for points on the curve. But I haven’t gone into details because it’s more complicated than I wanted to get into.

However, there’s a special case where the details are not complicated: the so-called Edwards curves. I’ll look briefly at Edwards curves in general, then focus on Curve1174, a particular Edwards curve used in cryptography.

The example here could be used in an introductory group theory course with no reference to elliptic curves. Just think of it as a funny way to add pairs of integers.

Naming elliptic curves used in cryptography

There are infinitely many elliptic curves, but only a small number are used in elliptic curve cryptography (ECC), and these special curves have names. Apparently there are no hard and fast rules for how the names are chosen, but there are patterns.

The named elliptic curves are over a prime field, i.e. a finite field with a prime number of elements p, denoted GF(p). The number of points on the elliptic curve is on the order of p [1].

The curve names usually contain a number which is the number of bits in the binary representation of p. Let’s see how that plays out with a few named elliptic curves.

Curve name        Bits in p
ANSSI FRP256v1    256
BN(2, 254)        254
NIST P-224        224
NIST P-256        256

In Curve25519, p = 2^255 – 19, and in Curve383187, p = 2^383 – 187. Here the number of bits in p is part of the name, but another number is stuck on.

The only mystery on the list is Curve1174 where p has 251 bits. The equation for the curve is

x² + y² = 1 – 1174 x² y²

and so the 1174 in the name comes from a coefficient rather than from the number of bits in p.
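Python’s built-in int.bit_length gives a quick check of the bit counts for the primes written out above.

```python
# Field primes of the curves whose p is written out above
primes = {
    "Curve25519": 2**255 - 19,
    "Curve383187": 2**383 - 187,
    "Curve1174": 2**251 - 9,
}
for name, p in primes.items():
    print(name, p.bit_length())   # 255, 383, and 251 respectively
```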

Edwards curves

The equation for Curve1174 doesn’t look like an elliptic curve. It doesn’t have the familiar (Weierstrass) form

y² = x³ + ax + b

It is an example of an Edwards curve, named after Harold Edwards. So are all the curves above whose names start with “E”. These curves have the form

x² + y² = 1 + d x² y².

where d is not 0 or 1. So some Edwards curves are named after their d parameter and some are named after the number of bits in p.

It’s not obvious that an Edwards curve can be changed into Weierstrass form, but apparently it’s possible; this paper goes into the details.

The advantage of Edwards curves is that the elliptic curve group addition has a simple, convenient form. Also, when d is not a square in the underlying field, there are no exceptional points to consider for group addition.
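To show what that simple form looks like, here is a sketch of the standard Edwards addition law

(x₁, y₁) + (x₂, y₂) = ((x₁y₂ + y₁x₂)/(1 + dx₁x₂y₁y₂), (y₁y₂ – x₁x₂)/(1 – dx₁x₂y₁y₂))

applied to Curve1174, with (0, 1) as the identity. The point-finding helper is my own, hypothetical illustration. It solves the curve equation for x² and takes a square root as a single power, which works because p = 2^251 – 9 satisfies p ≡ 3 (mod 4). The code uses pow(x, -1, p) for modular inverses, which needs Python 3.8 or later. It is a toy and is not constant-time.

```python
p = 2**251 - 9        # field prime for Curve1174
d = -1174 % p         # Edwards parameter d = -1174, reduced mod p

def on_curve(P):
    "Check x^2 + y^2 = 1 + d x^2 y^2 (mod p)."
    x, y = P
    return (x*x + y*y - 1 - d*x*x*y*y) % p == 0

def add(P, Q):
    "Edwards addition; the denominators are never 0 mod p because d is not a square."
    x1, y1 = P
    x2, y2 = Q
    t = d*x1*x2*y1*y2 % p
    x3 = (x1*y2 + y1*x2) * pow((1 + t) % p, -1, p) % p
    y3 = (y1*y2 - x1*x2) * pow((1 - t) % p, -1, p) % p
    return (x3, y3)

def find_point(y=2):
    "Return a point on the curve, trying y = 2, 3, ... (hypothetical helper)."
    while True:
        # Solve x^2 = (1 - y^2) / (1 - d y^2) for x.
        u = (1 - y*y) * pow((1 - d*y*y) % p, -1, p) % p
        x = pow(u, (p + 1)//4, p)     # a square root of u, if u is a square
        if x*x % p == u:
            return (x, y)
        y += 1

identity = (0, 1)
P = find_point()
Q = add(P, P)
print(on_curve(P), on_curve(Q))                  # True True
print(add(P, identity) == P)                     # True
print(add(P, ((-P[0]) % p, P[1])) == identity)   # True: -(x, y) = (-x, y)
```

There are no special cases in add. The same two formulas handle doubling, adding the identity, and adding a point to its inverse.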

Is d = -1174 a square in the field underlying Curve1174? For that curve p = 2^251 – 9, and we can use the Jacobi symbol code from earlier this week to show that d is not a square.

    p = 2**251 - 9
    d = p-1174
    print(jacobi(d, p))

This prints -1, indicating that d is not a square. Note that we set d to p – 1174 rather than -1174 because our code assumes the first argument is positive, and -1174 and p – 1174 are equivalent mod p.
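Because p is prime, Euler’s criterion offers a cross-check that needs nothing beyond Python’s built-in three-argument pow. For a not divisible by p, a^((p–1)/2) mod p is 1 when a is a square and p – 1 (that is, –1) when it is not.

```python
p = 2**251 - 9
d = -1174 % p                      # the same value as p - 1174

# Euler's criterion: d^((p-1)/2) is 1 for squares and p - 1 for non-squares.
print(pow(d, (p - 1)//2, p) == p - 1)   # True, so d is not a square mod p
```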

Update: More on addition on Curve1174.


[1] It is difficult to compute the exact number of points on an elliptic curve over a prime field. However, the number is roughly p ± 2√p. More precisely, Hasse’s theorem says

|\#(E/\mathbb{F}_p) - p - 1| \leq 2\sqrt{p}
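Hasse’s bound is easy to check by brute force for small primes. The sketch below is an illustration, not an efficient point-counting algorithm. It counts the points on every smooth curve y² = x³ + ax + b over GF(p), including the single point at infinity, and checks the bound using integers only.

```python
def count_points(a, b, p):
    "Number of points on y^2 = x^3 + a x + b over GF(p), plus the point at infinity."
    sqrt_count = [0] * p               # how many y values square to each residue
    for y in range(p):
        sqrt_count[y*y % p] += 1
    return 1 + sum(sqrt_count[(x**3 + a*x + b) % p] for x in range(p))

def check_hasse(p):
    "Check |#E - p - 1| <= 2 sqrt(p) for every smooth Weierstrass curve mod p."
    for a in range(p):
        for b in range(p):
            if (4*a**3 + 27*b**2) % p == 0:
                continue               # singular, so not an elliptic curve
            t = count_points(a, b, p) - p - 1
            assert t*t <= 4*p, (a, b, t)   # squared form avoids floating point
    return True

print(count_points(1, 1, 5))           # 9
for p in (5, 7, 11, 13, 17, 19, 23):
    print(p, check_hasse(p))
```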

Entropy extractor used in μRNG

Yesterday I mentioned μRNG, a true random number generator (TRNG) that takes physical sources of randomness as input. These sources are independent but non-uniform. This post will present the entropy extractor μRNG uses to take non-uniform bits as input and produce uniform bits as output.

We will present Python code for playing with the entropy extractor. (μRNG is extremely efficient, but the Python code here is not; it’s just for illustration.) The code will show how to use the pyfinite library to do arithmetic over a finite field.

Entropy extractor

The μRNG generator starts with three bit streams—X, Y, and Z—each with at least 1/3 bit min-entropy per bit.

Min-entropy is Rényi entropy with α = ∞. For a Bernoulli random variable that takes on two values, one with probability p and the other with probability 1-p, the min-entropy is

-log2 max(p, 1-p).

So requiring min-entropy of at least 1/3 means both probabilities are at most 2^(-1/3) ≈ 0.7937.
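These numbers are easy to check. This small sketch computes the min-entropy of a Bernoulli variable and the probability threshold. It also confirms the figure of about 0.34 bits for p = 0.79, the value used in the code that follows.

```python
from math import log2

def min_entropy(p):
    "Min-entropy in bits of a Bernoulli(p) variable: -log2 max(p, 1 - p)."
    return -log2(max(p, 1 - p))

threshold = 2**(-1/3)                 # largest probability allowing 1/3 bit of min-entropy
print(round(threshold, 4))            # 0.7937
print(round(min_entropy(0.79), 2))    # 0.34
```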

Take eight bits (one byte) at a time from X, Y, and Z, and interpret each byte as an element of the finite field with 2^8 elements. Then compute

XY + Z

in this field. The resulting stream of bits will be independent and uniformly distributed, or very nearly so.
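For readers without pyfinite, here is a dependency-free sketch of the same computation. Addition in GF(2^8) is bitwise XOR. Multiplication is carry-less polynomial multiplication, reduced by an irreducible polynomial of degree 8. The choice x⁸ + x⁴ + x³ + x + 1 (0x11B, the polynomial AES uses) is my assumption. pyfinite’s default and the polynomial μRNG specifies may differ. Any irreducible degree-8 polynomial gives a field with 2^8 elements, but which output byte you get changes. The function names are hypothetical.

```python
import random

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1 (assumed reduction polynomial)

def gf256_mul(a, b, poly=AES_POLY):
    "Multiply two bytes as elements of GF(2^8) defined by poly."
    result = 0
    while b:
        if b & 1:
            result ^= a        # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly          # reduce when the degree reaches 8
    return result

def extract(x, y, z):
    "Combine three bytes as x*y + z in GF(2^8); field addition is XOR."
    return gf256_mul(x, y) ^ z

def biased_byte(rng, p=0.79):
    "A byte whose bits are independently 1 with probability p."
    return sum((rng.random() < p) << i for i in range(8))

rng = random.Random(2024)
N = 10000
ones = sum(bin(extract(biased_byte(rng), biased_byte(rng), biased_byte(rng))).count("1")
           for _ in range(N))
print(ones / (8*N))   # should come out close to 0.5
```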

Python implementation

We will need the bernoulli class for generating our input bit streams, and the pyfinite library for doing finite field arithmetic on the bits.

    from scipy.stats import bernoulli
    from pyfinite import ffield

And we will need a couple of bit manipulation functions.

    def bits_to_num(a):
        "Convert an array of bits to an integer."
        x = 0
        for i in range(len(a)):
            x += a[i]*2**i
        return x

    def bitCount(n):
        "Count how many bits are set to 1."
        count = 0
        while n:
            n &= n - 1  # clear the lowest set bit
            count += 1
        return count

The following function generates random bytes using the entropy extractor. The input bit streams have p = 0.79, corresponding to min-entropy 0.34.

    def generate_byte():
        "Generate bytes using the entropy extractor."
        b = bernoulli(0.79)
        x = bits_to_num(b.rvs(8))
        y = bits_to_num(b.rvs(8))
        z = bits_to_num(b.rvs(8)) 

        F = ffield.FField(8)
        return F.Add(F.Multiply(x, y), z)

Note that about 79% of the bits produced by the Bernoulli generator will be 1’s. But we can see that the output bytes are about half 1’s and half 0’s.

    s = 0
    N = 1000
    for _ in range(N):
        s += bitCount( generate_byte() )
    print( s/(8*N) )

This returned 0.50375 the first time I ran it and 0.49925 the second time.

For more details see the μRNG paper.


Solving for probability given entropy

If a coin comes up heads with probability p and tails with probability 1-p, the entropy in the coin flip is

S = –p log2 p – (1-p) log2 (1-p).

It’s common to start with p and compute entropy, but recently I had to go the other way around: given entropy, solve for p. It’s easy to come up with an approximate solution.

[Figure: entropy and approximation]

Entropy in this case is approximately quadratic

S ≈ 4p(1-p)

and so

p ≈ (1 ± √(1-S))/2.

This is a good approximation if S is near 0 or 1 but mediocre in the middle. You could solve for p numerically, say with Newton’s method, to get more accuracy if needed.
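Here is a sketch of that approach, using Newton’s method started from the quadratic approximation. It returns the root with p ≤ 1/2; the other root is 1 – p. The derivative of the entropy with respect to p is log2((1 – p)/p). The function names, tolerance, and iteration cap are arbitrary choices.

```python
from math import log2, sqrt

def entropy(p):
    "Entropy in bits of a coin with P(heads) = p, for 0 < p < 1."
    return -p*log2(p) - (1 - p)*log2(1 - p)

def solve_p(S, tol=1e-12, max_iter=50):
    "Find p in (0, 1/2] with entropy(p) = S using Newton's method."
    if S >= 1:
        return 0.5
    if S <= 0:
        return 0.0
    p = (1 - sqrt(1 - S))/2                          # quadratic approximation as a start
    for _ in range(max_iter):
        step = (entropy(p) - S) / log2((1 - p)/p)    # f(p) / f'(p)
        p -= step
        if abs(step) < tol:
            break
    return p

for S in (0.1, 0.5, 0.9):
    p = solve_p(S)
    print(S, p, entropy(p))
```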

Missing information anxiety

A recurring theme in math is that you may not need to do what it looks like you need to do. There may be a shortcut to where you want to go. A special case of this is that you may not need all the information that you think you need.

For example, if you need to know the last digit of a×b, it might seem you need to know a and b so you can multiply them together. But you only have to know the last digits of a and b. In fact, if one of the last digits is 0, that’s all you need to know.

As a math consultant, I often tell clients they don’t need information that they think they need. That news may come as a relief, or it may cause anxiety. I may tell a client, for instance, that missing data cannot change a conclusion, so it’s not worth waiting for. Whether that brings relief or anxiety depends on whether they believe me.

There’s a physics demonstration where you have a heavy ball on a long cable. You pull back the ball like a pendulum and let it touch your chin. Then let the ball go and stand still. If you’re convinced of the physical laws governing the motion of the ball, you can stand there without flinching. You know that just as it left your chin with zero velocity, it will return with zero velocity. In fact, because of friction, it won’t quite return to your chin. But if you’re not fully convinced of this explanation, you’ll be afraid that a wrecking ball is about to smash your face, and understandably so.

When you tell clients that they don’t need something they think they need, it may come across as if you’re asking them to stand still as a wrecking ball swings toward their face. It’s not enough to be correct; you need to be persuasive as well. Examples help. As with the physics demo, you can put your own face on the line before asking them to do the same.

Sum-product theorem for finite fields

A week ago I wrote about using some Python code to play with the sum-product theorem of Erdős and Szemerédi and its conjectured refinement. This morning I learned that the Erdős-Szemerédi theorem has been extended to finite fields.

David Johnston left a comment saying that he and his colleagues used this extension to finite fields as part of the construction of μRNG, a remarkably efficient true random number generator. The finite field version of Erdős-Szemerédi leads to a way to combine three physical but non-uniform sources of randomness into a uniformly distributed stream of bits.

Bourgain, Katz, and Tao proved that the result

max{|A+A|, |A*A|} ≥ c|A|^(1+ε)

extends to subsets A from a finite field F with conditions on the field F and the size of A relative to F.

It suffices for F to have prime order. More generally, and importantly for applications, it also holds for fields of order 2^p for prime p.

Given a constant δ > 0, if the size of the set A satisfies

|F|^δ < |A| < |F|^(1-δ)

then the theorem holds where the constant c depends on δ.
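As with the real-number version, it is easy to experiment. This sketch is my own, with arbitrary choices of p and |A|. It draws random subsets A of the nonzero integers mod a prime p and reports |A+A| and |A*A|, both computed mod p. It also prints the exponent log max(|A+A|, |A*A|) / log |A|. It does not test the theorem itself, since the statement above leaves the constants unspecified.

```python
import random
from math import log

def sum_product_mod_p(A, p):
    "Sizes of A+A and A*A computed in the integers mod p."
    sums = {(a + b) % p for a in A for b in A}
    prods = {(a * b) % p for a in A for b in A}
    return len(sums), len(prods)

p = 10007                                 # a prime
rng = random.Random(1)
for size in (20, 50, 100):
    A = rng.sample(range(1, p), size)     # nonzero residues
    s, m = sum_product_mod_p(A, p)
    print(size, s, m, log(max(s, m)) / log(size))
```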

Computing Legendre and Jacobi symbols

In an earlier post I introduced the Legendre symbol

\left(\frac{a}{p}\right)

where a is a positive integer and p is prime. It is defined to be 0 if a is a multiple of p, 1 if a has a square root mod p, and -1 otherwise.
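The definition translates directly into a brute-force check: list the nonzero squares mod p and see whether a is among them. This is only practical for small p, but it makes a handy reference when testing faster code. The function name is hypothetical.

```python
def legendre_brute_force(a, p):
    "Legendre symbol (a/p) for an odd prime p, computed straight from the definition."
    a %= p
    if a == 0:
        return 0
    squares = {x*x % p for x in range(1, p)}
    return 1 if a in squares else -1

print([legendre_brute_force(a, 7) for a in range(7)])  # [0, 1, 1, -1, 1, -1, -1]
```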

The Jacobi symbol is a generalization of the Legendre symbol and uses the same notation. It relaxes the requirement that p be prime and only requires that p is odd.

If m has prime factors p_i with exponents e_i, then the Jacobi symbol is defined by

\left(\frac{n}{m}\right) = \prod \left(\frac{n}{p_i} \right )^{e_i}

Note that the symbol on the left is a Jacobi symbol while the symbols on the right are Legendre symbols.
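The definition can also be evaluated directly, which is useful for testing. Factor m by trial division, then multiply the Legendre symbols, computing each one with Euler’s criterion: for an odd prime p, a^((p–1)/2) mod p is 1, p – 1, or 0. This is a slow sketch meant for small m only, and the names are hypothetical. The example also shows that a Jacobi symbol equal to 1 does not by itself mean n is a square mod m.

```python
def legendre(a, p):
    "Legendre symbol via Euler's criterion; p must be an odd prime."
    r = pow(a, (p - 1)//2, p)
    return -1 if r == p - 1 else r       # r is 0, 1, or p - 1

def factor(m):
    "Prime factorization of m as a dict {prime: exponent}, by trial division."
    factors = {}
    q = 2
    while q*q <= m:
        while m % q == 0:
            factors[q] = factors.get(q, 0) + 1
            m //= q
        q += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors

def jacobi_by_definition(n, m):
    "Jacobi symbol (n/m) for odd m > 0 as a product of Legendre symbols."
    assert m > 0 and m % 2 == 1
    result = 1
    for p, e in factor(m).items():
        result *= legendre(n, p)**e
    return result

# 5 is not a square mod 21, yet (5/21) = (5/3)(5/7) = (-1)(-1) = 1.
print(jacobi_by_definition(5, 21))   # 1
```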

The Legendre and Jacobi symbols are not fractions, but they act in some ways like fractions, and so the notation is suggestive. They come up in applications of number theory, so it’s useful to be able to compute them.

Algorithm for computing Jacobi symbols

Since the Legendre symbol is a special case of the Jacobi symbol, we only need an algorithm for computing the latter.

In the earlier post mentioned above, I outline an algorithm for computing Legendre symbols. The code below is more explicit, and more general. It’s Python code, but it doesn’t depend on any libraries or special features of Python, so it could easily be translated to another language. The algorithm is taken from Algorithmic Number Theory by Bach and Shallit. Its execution time is O( (log a)(log n) ).

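    def jacobi(a, n):
        assert(n > a > 0 and n%2 == 1)
        t = 1
        while a != 0:
            while a % 2 == 0:
                a //= 2          # integer division keeps a an int
                r = n % 8
                if r == 3 or r == 5:
                    t = -t
            a, n = n, a          # reciprocity: swap the arguments
            if a % 4 == n % 4 == 3:
                t = -t
            a %= n
        if n == 1:
            return t
        else:
            return 0             # a and n share a common factor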

Testing the Python code

To test the code we randomly generate positive integers a and odd integers n greater than a. We compare our self-contained Jacobi symbol function to the one in SymPy.

    from random import randrange
    from sympy import jacobi_symbol

    N = 1000
    for _ in range(100):
        a = randrange(1, N)
        n = randrange(a+1, 2*N)
        if n%2 == 0:
            n += 1
        j1 = jacobi_symbol(a, n)
        j2 = jacobi(a, n)
        if j1 != j2:
            print(a, n, j1, j2)

This prints nothing, suggesting that we coded the algorithm correctly.


RSA implementation flaws

Implementation flaws in RSA encryption make it less secure in practice than in theory.

RSA encryption depends on 5 numbers:

  • Large primes p and q
  • The modulus n = pq
  • Encryption key e
  • Decryption key d

The numbers p, q, and d are kept secret, and the numbers e and n are made public. The encryption method relies on the assumption that in practice one cannot factor n into p and q.

All five numbers should be chosen anew for each certificate [1], but in practice numbers are sometimes reused.

Duplicate primes

The numbers p and q should be unique to each use of the method, but in practice there have been instances of duplicate primes, where two instances may have one of their two primes in common, which lets you factor the modulus using the Euclidean algorithm.
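Here is a toy illustration with small primes. Real RSA primes have hundreds of digits, but the arithmetic is the same. Two moduli that share a prime give up that prime to a single gcd computation, and each cofactor then follows by division.

```python
from math import gcd

p, q1, q2 = 61, 53, 67       # toy primes; p is (wrongly) reused
n1, n2 = p*q1, p*q2          # two public moduli

shared = gcd(n1, n2)         # the Euclidean algorithm recovers the shared prime
print(shared, n1 // shared, n2 // shared)   # 61 53 67
```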

Duplicate encryption keys

The encryption key e does not need to be unique, or so we thought. In practice the encryption exponent e is usually 65537, i.e. 2^16 + 1, because this makes implementation faster. One study found that this exponent was used 99.5% of the time. However, this allows an attack on RSA encryption if any of the bits of p or q can be recovered from computer memory. More on this here.

Duplicate moduli

The author of this post found several instances of certificates with shared moduli. One way this could happen is if a negligent certificate authority recycles p, q, and n, only generating new e and d pairs for each user. If you and someone else share a modulus n, you can use your (e, d) pair to factor n, and from knowing their public key e you can recover their private key d.
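Here is a sketch of that attack with toy numbers that I made up for illustration. Knowing any pair (e, d) means knowing k = ed – 1, a multiple of the order of every unit mod n. The standard randomized method writes k = 2^t r with r odd and looks for a nontrivial square root of 1 mod n. Such a root reveals a factor through a gcd. Once n is factored, the other user’s d is the inverse of their e modulo (p – 1)(q – 1). The modular inverse uses pow(e, -1, m), which needs Python 3.8 or later.

```python
import random
from math import gcd

def factor_from_key_pair(n, e, d, tries=100):
    "Factor n = p*q given a matching RSA key pair (e, d)."
    k = e*d - 1                  # a multiple of the order of every unit mod n
    r, t = k, 0
    while r % 2 == 0:            # write k = 2^t * r with r odd
        r //= 2
        t += 1
    for _ in range(tries):
        g = random.randrange(2, n - 1)
        if gcd(g, n) > 1:        # lucky hit; astronomically unlikely for real keys
            f = gcd(g, n)
            return f, n // f
        y = pow(g, r, n)
        if y in (1, n - 1):
            continue             # this g leads only to trivial square roots of 1
        for _ in range(t):
            x = y*y % n
            if x == 1:           # y is a nontrivial square root of 1
                f = gcd(y - 1, n)
                return f, n // f
            if x == n - 1:
                break
            y = x
    raise ValueError("no factor found; try more random values")

n = 61 * 53                      # shared modulus (toy size)
e_mine, d_mine = 17, 2753        # my key pair: 17 * 2753 = 1 mod 3120
e_theirs = 7                     # the other user's public exponent

p, q = factor_from_key_pair(n, e_mine, d_mine)
d_theirs = pow(e_theirs, -1, (p - 1)*(q - 1))
print(sorted((p, q)), d_theirs)  # [53, 61] 1783
```

Each random g succeeds with probability at least 1/2, so a handful of tries is enough.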

Duplicate moduli cannot happen by chance. As described here, the probability of having one shared prime due to random selection is roughly the probability of winning the Powerball jackpot 40 times in a row. The probability of having two shared primes due to random selection is inconceivably small.


[1] Everyone agrees that p and q, and hence n, should be unique. This also means that d will be unique. But there is disagreement over whether the exponent e should be unique, though reusing e does lead to a possible attack as described here.

Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we take products instead, how many distinct products are there?

Proven results

Erdős and Szemerédi proved that there are constants c and ε > 0 such that

max{|A+A|, |A*A|} ≥ c|A|^(1+ε)

In other words, either A+A or A*A is substantially bigger than A. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |A+A| or |A*A| is bounded below by a fixed multiple of |A|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342808…

but the conjecture remains that the upper limit on ε is 1.

Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size N from the non-negative integers less than R, then computes the sum and product sets using set comprehensions.

    from numpy.random import choice

    def trial(R, N):
        # R = integer range, N = sample size
        x = choice(R, N, replace=False)
        s = {a+b for a in x for b in x}
        p = {a*b for a in x for b in x}
        return (len(s), len(p))

When I first tried this code I thought it had a bug. I called trial 10 times and got the same values for |A+A| and |A*A| every time. That was because I chose R large relative to N. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if R >> N, it is likely that |A+A| and |A*A| will both equal N(N+1)/2. Things get more interesting when N is closer to R.

Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about lower bounds on |A+A| and |A*A| you could ask for the distribution on |A+A| and |A*A| when the sets A are drawn from some probability distribution.

If the set A is drawn from a continuous distribution, then |A+A| and |A*A| both equal N(N+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case R >> N above.

If the set A is an arithmetic sequence then |A+A| is small and |A*A| is large, and the opposite holds if A is a geometric sequence. So it might be interesting to look at the correlation of |A+A| and |A*A| when A comes from a discrete distribution, such as choosing N integers uniformly from [1, R] when N/R is not too small.
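Here is a sketch of both ideas. First it compares an arithmetic and a geometric sequence of the same length. Then it estimates the correlation of |A+A| and |A*A| over random sets. Unlike trial above, it draws from {1, ..., R} rather than {0, ..., R – 1}, so that 0 never collapses the product set. That choice, along with R = 40, N = 10, and the number of trials, is an arbitrary assumption.

```python
import numpy as np

def sizes(A):
    "Return (|A+A|, |A*A|) for a collection of integers A."
    A = [int(a) for a in A]          # Python ints, so products cannot overflow
    return (len({a + b for a in A for b in A}),
            len({a * b for a in A for b in A}))

N = 10
arithmetic = range(1, N + 1)                 # 1, 2, ..., N
geometric = [2**k for k in range(N)]         # 1, 2, 4, ..., 2^(N-1)
print(sizes(arithmetic))   # |A+A| = 2N - 1 = 19; |A*A| is larger
print(sizes(geometric))    # (55, 19): |A+A| = N(N+1)/2, |A*A| = 2N - 1

# Correlation of |A+A| and |A*A| for random N-element subsets of {1, ..., R}
rng = np.random.default_rng(0)
R, trials = 40, 500
samples = np.array([sizes(rng.choice(np.arange(1, R + 1), size=N, replace=False))
                    for _ in range(trials)])
print(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])
```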