Misleading plots of elliptic curves

The elliptic curves used in cryptography are over finite fields. They’re not “curves” at all in the colloquial sense of the word. But they are defined by analogy with continuous curves, and so most discussions of elliptic curves in cryptography start by showing a plot of a real elliptic curve.

Here’s a plot of y² = x³ + 2 for real x and y.

That’s fine as far as it goes. Resources quickly go on to say that the curves they’ll be looking at are discrete, and so they may then add something like the following. Here’s the same elliptic curve as above over the integers mod 11.

And again mod 997.

These plots are informative in a couple ways. First, they show that the elements of the curve are discrete points. Second, they show that the points are kinda randomly distributed, which hints at why elliptic curves might be useful in cryptography.

But the implied density of points is entirely wrong. It implies that the density of elliptic curve points increases with the field size increases, when in fact the opposite is true. The source of the confusion is using dots of constant size. A checkerboard graph would be better. Here’s a checkerboard plot for the curve mod 11.

If we were to do the same for the curve mod 997 we’d see nothing. The squares would be too small and too few to see. Here’s a plot for the curve mod 211 that gives a hint of the curve elements as dust in the plot.

An elliptic curve over the integers modulo a prime p lives in a p by p grid, and the number of points satisfying the equation of an elliptic curve is roughly p. So the density of points in the grid that belong to the curve is 1/p.

The elliptic curves used in cryptography are over fields of size 2255 or larger, and so the probability that a pair of x and y values chosen at random would be on the curve is on the order of 2−255, virtually impossible.

We can be more precise than saying the number of points on the curve is roughly p. If p isn’t too large, we can count the number of points on an elliptic curve. For example, the curve

y² = x³ + 2 mod p

has 12, 199, and 988 points when p = 11, 211, and 997. (If you count the points in the plots above for p = 11 you’ll get 11 points. Elliptic curves always have an extra point at infinity not shown.)

Hasse’s theorem gives upper and lower bounds for the number of points N on an elliptic curve over a field of p elements with p prime:

|N − (p + 1)| ≤ 2 √p.

The heuristic for this is that the right hand side of the equation defining an elliptic curve

x³ + ax + b

is a square mod p above half the time. When it is a square, it corresponds to two values of y and when it is not it corresponds to zero points. So on average an elliptic curve mod p has around p ordinary points and one point at infinity.

Related posts

Memorable Primes

The other day I needed to test some software with a moderately large prime number as input. The first thing that came to mind was 8675309. This would not be a memorable number except it was in the chorus of the song 867-5309/Jenny.

Tommy Tutone 867-5309/Jenny single

It turned out that 8675309 was too large. The software would take too long to run with this input. It occurred to me that while I could quickly come up with a prime number with seven digits, nothing came to mind immediately with four, five, or six digits. This post will share some numbers I came up with.

Symmetry

Symmetry makes things more memorable, so one thing to try is finding primes with a symmetric pattern of digits. The numbers 13331 and 18181 are examples of palendromic primes, prime numbers that read the same forwards and backwards. You can find more palendromic primes here.

This won’t help find memorable primes with four or six digits because all palendromic numbers with an even number of digits are divisible by 11.

Consecutive digits

The number 123457 is a memorable six-digit prime, being the first seven digits with six removed. Maybe there are other variations that are memorable. (Or the you find memorable; memorability is subjective.)

Consecutive digits of π

A variation on looking for primes in consecutive integers is to look for primes in consecutive digits of another number, the most famous being π.

314159 is a pi prime, and it’s memorable if you know the value of π to at least six digits. This won’t help us find a memorable four-digit prime since 3141 is divisible by 9.

Years

The last prime number year that we lived through was 2017, and the next, God willing, will be 2027.

1999 was a prime. It was also the title of a song that came out in 1982, one year after the song 867-5309/Jenny.

Powers of 2

Programmers will likely recognize 65536 as 216, and 65537 is prime, the largest known Fermat prime.

Whether something is memorable depends on what previous memories you can connect it to. Many people are familiar with powers of 2 and would find primes of the form 2k ± 1 memorable.

The number 213 = 8192 is not as familiar as 216 = 65536, but 8191 is a four-digit prime.

Summary

Having written this post, the following responses are likely what would come to mind if you asked me for primes of various lengths.

  1. 7
  2. 97
  3. 997
  4. 1999
  5. 18181
  6. 314159
  7. 8675309

 

Counting points on an elliptic curve

Suppose you have an elliptic curve

y² = x³ + ax + b

over a finite field Fp for prime p. How many points are on the curve?

Brute force

You can count the number of points on the curve by brute force, as I did here. Loop through each of the p possibilities for x and for y and count how many satisfy the curve’s equation, then add one for the point at infinity. This is the most obvious but slowest approach, taking O(p²) time.

Here’s a slight variation on the code posted before. This time, instead of passing in the function defining the equation, we’ll assume the curve is in the form above (short Weierstrass form) and pass in the parameters a and b. This will work better when we refine the code below.

def order(a, b, p):
    c = 1 # The point at infinity
    for x in range(p):
        for y in range(p):
            if (y**2 - x**3 - a*x - b) % p == 0:
                c += 1
    return c

Better algorithm

A better approach would be to loop over the x values but not the y‘s. For each x, test determine whether

x³ + ax + b

is a square mod p by computing the Legendre symbol. This takes O(log³ p) time [1], and we have to do it for p different values of x, so the run time is O(p log³ p).

from sympy import legendre_symbol

def order2(a, b, p):
    c = 1 # The point at infinity
    for x in range(p):
        r = x**3 + a*x + b
        if r % p == 0:
            c += 1 # y == 0
        elif legendre_symbol(r, p) == 1:
            c += 2
    return c

Schoof’s algorithm

There’s a more efficient algorithm, Schoof’s algorithm. It has run time O(logk p) but I’m not clear on the value of k. I’ve seen k = 8 and k = 5. I’ve also seen k left unspecified. In any case, for very large p Schoof’s algorithm will be faster than the one above. However, Schoof’s algorithm is much more complicated, and the algorithm above is fast enough if p isn’t too large.

Comparing times

Let’s take our log to be log base 2; all logs are proportional, so this doesn’t change the big-O analysis.

If p is on the order of a million, i.e. around 220, then the brute force algorithm will have run time on the order of 240 and the improved algorithm will have run time on the order of 220 × 20³ ≈ 233. If k = 8 in Schoof’s algorithm, its runtime will be on the order of 208 ≈ 234, so roughly the same as the previous algorithm.

But if p is on the order of 2256, as it often is in cryptography, then the three algorithms have runtimes on the order of 2512, 2270, and 264. In this case Schoof’s algorithm is expensive to run, but the others are completely unfeasible.

[1] Note that logk means (log q)k, not log applied k times. It’s similar to the convention for sine and cosine.

Making the two-dimensional one-dimensional

We often want to reduce something that’s inherently two-dimensional into something one-dimensional. We want to turn graph into a list.

And we’d like to do this with some kind of faithfulness. We’d like things that are close together in 2D space to be close together in their 1D representation, and vice versa, to the extent possible.

For example, postal codes are a way of imposing a linear order on geographic regions. You would like (or maybe naively assume) that regions whose zip codes are numerically close together are geographically close together. This is roughly true. See this post to explore that further.

Tours are another way to turn a graph into a list. A Traveling Salesman tour is a path of shortest length through a set of points. For example, here is a Traveling Salesman tour of Texas counties. Counties that are visited consecutively are close together, though it may take a long time to come back to a county close to the one you’re in at a given time.

Sometimes there are purely mathematical reasons to flatten a 2D structure into a linear tour, such as Hilbert curves or Cantor’s diagonal trick.

All this came to mind because I saw a post on Hacker News this morning about a way to enumerate a zigzag spiral.

The remarkable thing about this article is that the author gives a sequence of closed-form expressions for the number at position (mn) in the grid.

Related posts

Factoring RSA100

Earlier today I wrote about factoring four 255-bit numbers that I needed for a post. Just out of curiosity, I wanted to see how long it would take to factor RSA 100, the smallest of the factoring challenges posed by RSA Laboratories in 1991. This is a 100-digit (330-bit) number that is the product of two primes.

I used the CADO-NFS software. The software was developed in France, and CADO is a French acronym for Crible Algébrique: Distribution, Optimisation. NFS stands for number field seive, the fastest algorithm for factoring numbers with over 100 digits.

RSA 100 was first factored in 1991 using a few days of compute time on an MP1 MasPar computer, a machine that cost $500,000 at the time, equivalent to around $1,250,000 today.

My effort took about 23 minutes (1376 seconds) on a System 76 Meerkat mini that I paid $600 for in 2022.

The MP1 was about the size of a refrigerator. The Meerkat is about 3″ × 3″ × 1.5″.

Pairing-unfriendly curves

A couple days ago I wrote about two pair of closely related elliptic curves: Tweedledum and Tweedledee, and Pallas and Vesta.

In each pair, the order of one curve is the order of the base field of the other curve. The curves in each pair are used together in cryptography, but they don’t form a “pairing” in the technical sense of a bilinear pairing, and in fact none of the curves are “pairing-friendly” as described below.

An elliptic curve E/Fq is said to be pairing-friendly if r divides qk − 1 for some small k. Here r is the size of the largest prime-order subgroup of the curve, but since our curves have prime order p, r = p.

As for what constitutes a small value of k, something on the order of 10 would be considered small. The larger k is, the less pairing-friendly the curve is. We will show that our curves are extremely pairing-unfriendly.

Since q is not a multiple of p in our examples, there must be some power of q such at

qk = 1 mod p.

The question is whether k is large, i.e. whether the order of q mod p is large. We could try successive values of k, but that won’t get us very far. To be more clever, we use Lagrange’s theorem that says the order of an element divides the order of the group. So k must be one of the factors of p − 1. (We subtract 1 because we’re looking at the multiplicative group mod p, which removes 0.)

Finding the divisors of n − 1 requires factoring n − 1, which isn’t easy, but isn’t insurmountable either. The previous post reports the time required to do this in Python and in Mathematica for each of the following values of n.

p = 2254 + 4707489544292117082687961190295928833
q = 2254 + 4707489545178046908921067385359695873
r = 2254 + 45560315531419706090280762371685220353
s = 2254 + 45560315531506369815346746415080538113

Tweedledum has order p and its base field has order q.

k = 28948022309329048855892746252171976963322203655954433126947083963168578338816

Tweedledee has order q and its base field has order p.

k = 28948022309329048855892746252171976963322203655955319056773317069363642105856

Vesta has order r and its base field has order s.

k = 14474011154664524427946373126085988481681528240970780357977338382174983815168

Pallas has order s and its base field has order r.

k = 14474011154664524427946373126085988481681528240970823689839871374196681474048

It’s safe to say in each case k is not a small number.

 

Time to factor big integers Python and Mathematica

This post will look at the time required to factor n − 1 each of the following prime numbers in Python (SymPy) and Mathematica. The next post will explain why I wanted to factor these numbers.

p = 2254 + 4707489544292117082687961190295928833
q = 2254 + 4707489545178046908921067385359695873
r = 2254 + 45560315531419706090280762371685220353
s = 2254 + 45560315531506369815346746415080538113

Here are the timing results.

    |   |   Python | Mathematica |
    |---+----------+-------------|
    | p |    0.913 |       0.616 |
    | q |    0.003 |       0.002 |
    | r |  582.107 |      14.915 |
    | s | 1065.925 |      20.763 |

This is hardly a carefully designed benchmark, but it’s enough to suggest Mathematica can be a couple orders of magnitude faster than Python.

Here are the factorizations.

p − 1 = 234 × 3 × 4322432633228119 × 129942003317277863333406104563609448670518081918257
q − 1 = 233 × 3 × 5179 × 216901160674121772178243990852639108850176422522235334586122689
r − 1 = 232 × 32 × 463 × 539204044132271846773 × 8999194758858563409123804352480028797519453
s − 1 = 232 × 32 × 1709 × 24859 × 1690502597179744445941507 × 10427374428728808478656897599072717

Cycles of Elliptic Curves

The previous post gave two examples of pairs of elliptic curves in which

#(EFp) = q

and

#(EFq) = p.

That is, the curve E, when defined over integers mod p has q elements, and when defined over the integers mod q has p elements.

Pairs

Silverman and Stange [1] call this arrangement an amicable pair. They found a small pair of amicable elliptic curves:

y² = x³ + 2

with p and q equal to 13 and 19. They give many other examples, but this one is nice because it’s small enough for hand calculations, unlike the curves mentioned in the previous post that had on the order of 2254 elements.

Cycles

More generally, amicable curve pairs are amicable curve cycles with cycle length 3. Silverman and Stange give this example of a cycle of length 3:

y² = x³ − 25x − 8

with (p, q, r) = (83, 79, 73). That is, the curve over F83 has 79 elements, the curve over F79 has 73 elements, and the curve over F73 has 83 elements.

The authors show that there exist cycles of every length m ≥ 2.

Application

Why does any of this matter? Cycles of elliptic curves are useful in cryptography, specifically in zero-knowledge proofs. I hope to go into this further in some future post.

Confirmation

The curves mentioned above are small enough that we can compute the orders of the curves quickly by brute force. The following Python code confirms the claimed orders above.

def order(eqn, p):
    c = 1 # The point at infinity
    for x in range(p):
        for y in range(p):
            if eqn(x, y) % p == 0:
                c += 1
    return c

eqn = lambda x, y: y**2 - x**3 - 2
assert( order( eqn, 13) == 19 )
assert( order( eqn, 19) == 13 )

eqn = lambda x, y: y**2 - x**3 + 25*x + 8
assert( order( eqn, 83) == 79 )
assert( order( eqn, 79) == 73 )
assert( order( eqn, 73) == 83 )

Related posts

[1] Joseph H. Silverman and Katherine E. Stange. Amicable pairs and aliquot cycles for elliptic curves. Experimental Mathematics, 20(3):329–357, 2011.

Pallas, Vesta, and Zcash

Yesterday’s post mentioned Tweedledum and Tweedledee, a pair of elliptic curves named after characters in Lewis Carroll’s Through the Looking Glass.

Zcash uses a very similar pair of elliptic curves for zero-knowledge proofs: Pallas and Vesta. One named after a Greek goddess associated with war, and one named after a Roman goddess associated with home and hearth. (Zcash used the Jubjub curve, also mentioned yesterday, though not in the most recent release.)

The curves Pallas and Vesta are collectively known as the “pasta” curves, pasta being a portmanteau of pallas and vesta.

pa(llas ve)sta

Tweedledum, Tweedledee, Pallas, and Vesta all have the equation

y² = x³ + 5

over different prime-order fields.

The number of elements in Tweedledum’s field equals the number of elements in Tweedledee’s elliptic curve, and vice versa. The same relationship holds between Pallas and Vesta. And all four curves have roughly 2254 points.

If p is the size of Tweedledum’s field (and Tweedledee’s curve), and q is the size of Tweedledee’s field (and Tweedledum’s curve),

p = 2254 + 4707489545178046908921067385359695873
q = 2254 + 4707489544292117082687961190295928833

If p is the size of Pallas’ field (and Vesta’s curve), and q is the size of Vesta’s field (and Pallas’ curve),

p = 2254 + 45560315531419706090280762371685220353
q = 2254 + 45560315531506369815346746415080538113

Lewis Carroll and Zero Knowledge Proofs

Illustration from Through the Looking Glass

Elliptic curves are often used in cryptography, and in particular they are used in zero-knowledge proofs (ZKP). Cryptocurrencies such as Zcash use ZKP to protect the privacy of users.

Several of the elliptic curves used in ZKP have whimsical names taken from characters by Lewis Carroll. This post will look at these five elliptic curves:

  • Jubjub
  • Baby Jubjub
  • Bandersnatch
  • Tweedledee
  • Tweedledum

Charles Dodgson was a mathematician, and perhaps there’s some connection from his mathematical work to elliptic curves and ZKP, the connection explored here is with his literary works written under the name Lewis Carroll.

Jabberwocky names

The first three curves—Jubjub, Baby Jubjub, and Bandersnatch—all get their name from Lewis Carroll’s poem Jabberwocky.

“Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!”

These curves all have a have a twisted Edwards curve form and a Montgomery curve form, just like the relationship between Ed25519 and Curve25519 that I wrote about a few days ago.

As its name suggests, the Baby Jubjub elliptic curve is related to the Jubjub curve but smaller.

Bandersnatch is similar to Jubjub, but arithmetic over this curve can be implemented more efficiently.

Looking Glass names

The last two curves—Tweedledum and Tweedledee—take their names from Through the Looking Glass.

And as their names suggest, Tweedledum and Tweedledee and very closely related. Both have the equation

y² = x³ + 5

but over different fields. Tweedledum is defined over the integers mod p and has q elements. Tweedledee is defined over the integers mod q and has p elements. (Mind your ps and qs!)

Here

p = 2254 + 4707489545178046908921067385359695873
q = 2254 + 4707489544292117082687961190295928833

More Lewis Carroll posts

More elliptic curve posts