Addition on Curve1174

I’ve written about elliptic curves and alluded to the fact that there’s a special kind of addition for points on the curve. But I haven’t gone into details because it’s more complicated than I wanted to get into.

However, there’s a special case where the details are not complicated, the so-called Edwards curves. I’ll look briefly at Edwards curves in general, then focus on Curve1174, a particular Edwards curve used in cryptography.

The example here could be used in an introductory group theory course with no reference to elliptic curves. Just think of it as a funny way to add pairs of integers.

Addition on Edwards curves

For a particular class of elliptic curve, Edwards curves, the addition formula is simpler than usual. As mentioned a few days ago, an Edwards curve has the form

x² + y² = 1 + d x² y²

where d is not 0 or 1 in the underlying finite field. Then addition on the curve is given by

(x_1,y_1) + (x_2,y_2) = \left( \frac{x_1 y_2 + x_2 y_1}{1 + dx_1 x_2 y_1 y_2}, \frac{y_1 y_2 - x_1 x_2}{1 - dx_1 x_2 y_1 y_2} \right)

When d is a square, there are some exceptions. When d is not a square, as will be the case in our application, the denominators are never zero, and so the formula above is all there is to the addition rule.

Note that the division in the formula above is division in the underlying finite field, i.e. multiplication by the multiplicative inverse.

Curve1174

We’re interested in Curve1174, a particular elliptic curve used in cryptography. The underlying field is GF(p), the integers modulo the prime p = 2²⁵¹ − 9. Also, d = −1174, from whence the curve takes its name.

Plotting Curve1174 over the reals

The plot above shows what Curve1174 looks like over the real numbers, though we’re interested in the curve over the integers mod p. (By the way, if d > 0 you get a curve that looks like a squircle.)

We consider the pairs of integers (x, y) that lie on the curve, i.e. those that satisfy

x² + y² = 1 + d x² y² mod p.

You can show that the sum of two points on the curve is another point on the curve, if you define addition with the formula above. The identity element for addition is the pair (0, 1). The additive inverse of a point (x, y) is the point (−x, y). So we have a group. Addition is commutative, and so in fact we have an Abelian group.

Python code

We can implement addition on Curve1174 in a few lines of Python.

from sympy import mod_inverse

def divide(a, b, p):
    "Compute a/b in GF(p)"
    return (a*mod_inverse(b, p))%p

def group_add(x1, y1, x2, y2, p, d):
    "Add the points (x1, y1) and (x2, y2) on the Edwards curve over GF(p) with parameter d."
    x3 = divide(x1*y2 + x2*y1, 1 + d*x1*x2*y1*y2, p)
    y3 = divide(y1*y2 - x1*x2, 1 - d*x1*x2*y1*y2, p)
    return (x3, y3)

The only thing we needed SymPy for was the mod_inverse function. It wouldn’t take much work to write your own mod_inverse function from scratch using a variation on the Euclidean algorithm, as outlined here.
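Here’s a minimal sketch of what such a function could look like, using the extended Euclidean algorithm. (In Python 3.8 and later, pow(b, -1, p) computes the same thing.)

def mod_inverse(b, p):
    "Compute the multiplicative inverse of b mod p via the extended Euclidean algorithm."
    old_r, r = b, p
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q*r
        old_s, s = s, old_s - q*s
    assert old_r == 1  # b and p must be relatively prime
    return old_s % p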

It’s clear that (1, 0) is a point on the curve, and so we can add it to itself with the code

p = 2**251 - 9
d = -1174
print(group_add(1, 0, 1, 0, p, d))

and find that it equals

(0, 3618502788666131106986593281521497120414687020801267626233049500247285301238),

which may come as a bit of a surprise. Arithmetic here is not intuitive; it scrambles up points well, which hints at why the curve is useful in cryptography.

Let’s find another point on the curve. Suppose we set x = 2019 and solve for y. When we write down the equation y must satisfy, the Jacobi symbol shows it has no solution.
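For example, using the divide function above and the jacobi function from earlier this week, we can check x = 2019:

x = 2019
k = divide(1 - x*x, 1 - d*x*x, p)  # y would have to satisfy y**2 = k mod p
print(jacobi(k, p))  # prints -1: k is not a square mod p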

When x = 2025 there is a solution, and we can compute it using sqrt_mod from sympy.ntheory.

from sympy.ntheory import sqrt_mod

x = 2025
k = divide(1 - x*x, 1 - d*x*x, p)  # y must satisfy y**2 = k mod p
y = sqrt_mod(k, p)

This says the point

(2025, 588747530266665079407582947937120321357732884331117971504880828350684014295)

is on Curve1174. And since x and y only appear as squares in the equation defining the curve, once we find an (x, y) pair on the curve, the points (±x, ±y) are also on the curve.

Just for grins, let’s double the point (x, y) above, i.e. add it to itself. This works out to

(2795920935947049934301363619759082573282734750667150130900931245990107374027,  2577351770662637935098262284237063829290291047539093190165388036658162531660).
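As a sanity check, we can verify that the doubled point still satisfies the equation defining the curve.

x, y = group_add(x, y, x, y, p, d)
assert (x*x + y*y) % p == (1 + d*x*x*y*y) % p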

Number of points on Curve1174

In general it can be hard to compute how many points lie on an elliptic curve, but in the case of Curve1174 the number of points is known. Bernstein et al. computed that the number of points on Curve1174 is p + 1 − t where t is a big number, but much smaller than p, on the order of the square root of p. Specifically,

t = 45330879683285730139092453152713398836.

Why not just absorb the 1 into t? This was done to match the notation in Hasse’s theorem. See the footnote here.

Elliptic curve cryptography (ECC)

What does all this have to do with cryptography? Cryptographers like to find problems that can be computed easily but that are hard to reverse. Most public key cryptography methods depend on the difficulty of undoing one of three things:

  • multiplication,
  • modular exponentiation, or
  • multiplication over an elliptic curve.

RSA encryption, for example, depends on the difficulty of factoring the product of two large primes.

The elliptic curve discrete logarithm problem (ECDLP) is the problem of undoing multiplication over an elliptic curve. If n is an integer and P is a point on the curve, we can compute Q = nP easily. If n is large, we don’t just add P to itself n times. Instead we double it log₂ n times and add the necessary intermediate results, analogous to fast exponentiation.
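Here’s a sketch of that double-and-add strategy, built on the group_add function defined above.

def scalar_mult(n, x, y, p, d):
    "Compute n(x, y) on the curve by repeated doubling, analogous to fast exponentiation."
    X, Y = 0, 1  # the identity element (0, 1)
    while n > 0:
        if n % 2 == 1:
            X, Y = group_add(X, Y, x, y, p, d)  # add in the current power of two
        x, y = group_add(x, y, x, y, p, d)  # double the base point
        n //= 2
    return (X, Y)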

It’s easy to compute Q given n and P, but it’s hard to compute n given P and Q. This is the elliptic curve discrete logarithm problem that ECC protocols rely on for their security.


The hard part in becoming a command line wizard

I’ve long been impressed by shell one-liners. They seem like magical incantations. Pipe a few terse commands together, et voilà! Out pops the solution to a problem that would seem to require pages of code.

Source http://dilbert.com/strip/1995-06-24

Are these one-liners real or mythology? To some extent, they’re both. Below I’ll give a famous real example. Then I’ll argue that even though such examples do occur, they may create unrealistic expectations.

Bentley’s exercise

In 1986, Jon Bentley posted the following exercise:

Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.

Donald Knuth wrote an elegant program in response. Knuth’s program runs for 17 pages in his book Literate Programming.

Doug McIlroy’s solution is short enough to quote below [1].

    tr -cs A-Za-z '
    ' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed ${1}q
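Reading the pipeline one stage at a time: the first tr replaces every run of non-letters with a newline, putting one word per line; the second tr converts uppercase to lowercase; sort brings identical words together; uniq -c collapses each group into a count and the word; sort -rn orders the lines by count, largest first; and sed ${1}q quits after printing the first k lines.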

McIlroy’s response to Knuth was like Abraham Lincoln’s response to Edward Everett at Gettysburg. Lincoln’s famous address was 50x shorter than that of the orator who preceded him [2]. (Update: There’s more to the story. See [3].)

Knuth and McIlroy had very different objectives and placed different constraints on themselves, and so their solutions are not directly comparable. But McIlroy’s solution has become famous. Knuth’s solution is remembered, if at all, as the verbose program that McIlroy responded to.

The stereotype of a Unix wizard is someone who could improvise programs like the one above. Maybe McIlroy carefully thought about his program for days, looking for the most elegant solution. That would seem plausible, but in fact he says the script was “written on the spot and worked on the first try.” He said that the script was similar to one he had written a year before, but it still counts as an improvisation.

Why can’t I write scripts like that?

McIlroy’s script was a real example of the kind of wizardry attributed to Unix adepts. Why can’t more people quickly improvise scripts like that?

The exercise that Bentley posed was the kind of problem that programmers like McIlroy solved routinely at the time. The tools he piped together were developed precisely for such problems. McIlroy didn’t see his solution as extraordinary but said “Old UNIX hands know instinctively how to solve this one in a jiffy.”

The traditional Unix toolbox is full of utilities for text manipulation. Not only are they useful, but they compose well. This composability depends not only on the tools themselves, but also the shell environment they were designed to operate in. (The latter is why some utilities don’t work as well when ported to other operating systems, even if the functionality is duplicated.)

Bentley’s exercise was clearly text-based: given a text file, produce a text file. What about problems that are not text manipulation? The trick to being productive from a command line is to turn problems into text manipulation problems.  The output of a shell command is text. Programs are text. Once you get into the necessary mindset, everything is text. This may not be the most efficient approach to a given problem, but it’s a possible strategy.

The hard part

The hard part on the path to becoming a command line wizard, or any kind of wizard, is thinking about how to apply existing tools to your particular problems. You could memorize McIlroy’s script and be prepared next time you need to report word frequencies, but applying the spirit of his script to your particular problems takes work. Reading one-liners that other people have developed for their work may be inspiring, or intimidating, but they’re no substitute for thinking hard about your particular work.

Repetition

You get faster at anything with repetition. Maybe you don’t solve any particular kind of problem often enough to be fluent at solving it. If someone can solve a problem by quickly typing a one-liner in a shell, maybe they are clever, or maybe their job is repetitive. Or maybe both: maybe they’ve found a way to make semi-repetitive tasks repetitive enough to automate. One way to become more productive is to split semi-repetitive tasks into more creative and more repetitive parts.


[1] The odd-looking line break is a quoted newline.

[2] Everett’s speech contained 13,607 words while Lincoln’s Gettysburg Address contained 272, a ratio of almost exactly 50 to 1.

[3] See Hillel Wayne’s post Donald Knuth was Framed. Here’s an excerpt:

Most of the “eight pages” aren’t because Knuth is doing LP [literate programming], but because he’s Donald Knuth:

  • One page is him setting up the problem (“what do we mean by ‘word’? What if multiple words share the same frequency?”) and one page is just the index.
  • Another page is just about working around specific Pascal issues no modern language has, like “how do we read in an integer” and “how do we identify letters when Pascal’s character set is poorly defined.”
  • Then there’s almost four pages of handrolling a hash trie.

The “eight pages” refers to the length of the original publication. I described the paper as 17 pages because that is the length in the book where I found it.

Naming elliptic curves used in cryptography

There are infinitely many elliptic curves, but only a small number are used in elliptic curve cryptography (ECC), and these special curves have names. Apparently there are no hard and fast rules for how the names are chosen, but there are patterns.

The named elliptic curves are over a prime field, i.e. a finite field with a prime number of elements p, denoted GF(p). The number of points on the elliptic curve is on the order of p [1].

The curve names usually contain a number which is the number of bits in the binary representation of p. Let’s see how that plays out with a few named elliptic curves.

Curve name          Bits in p
ANSSI FRP256v1      256
BN(2, 254)          254
brainpoolP256t1     256
Curve1174           251
Curve25519          255
Curve383187         383
E-222               222
E-382               382
E-521               521
Ed448-Goldilocks    448
M-221               221
M-383               383
M-511               511
NIST P-224          224
NIST P-256          256
NIST P-384          384
secp256k1           256

In Curve25519, p = 2²⁵⁵ − 19 and in Curve383187, p = 2³⁸³ − 187. Here the number of bits in p is part of the name but another number is stuck on.
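These bit counts are easy to check with Python’s int.bit_length method:

    print((2**255 - 19).bit_length())   # 255
    print((2**383 - 187).bit_length())  # 383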

The only mystery on the list is Curve1174 where p has 251 bits. The equation for the curve is

x² + y² = 1 − 1174 x² y²

and so the 1174 in the name comes from a coefficient rather than from the number of bits in p.

Edwards curves

The equation for Curve1174 doesn’t look like an elliptic curve. It doesn’t have the familiar (Weierstrass) form

y² = x³ + ax + b

It is an example of an Edwards curve, named after Harold Edwards. So are all the curves above whose names start with “E”. These curves have the form

x² + y² = 1 + d x² y².

where d is not 0 or 1. So some Edwards curves are named after their d parameter and some are named after the number of bits in p.

It’s not obvious that an Edwards curve can be changed into Weierstrass form, but apparently it’s possible; this paper goes into the details.

The advantage of Edwards curves is that the elliptic curve group addition has a simple, convenient form. Also, when d is not a square in the underlying field, there are no exceptional points to consider for group addition.

Is d = −1174 a square in the field underlying Curve1174? For that curve p = 2²⁵¹ − 9, and we can use the Jacobi symbol code from earlier this week to show that d is not a square.

    p = 2**251 - 9
    d = p-1174
    print(jacobi(d, p))

This prints -1, indicating that d is not a square. Note that we set d to p – 1174 rather than -1174 because our code assumes the first argument is positive, and -1174 and p – 1174 are equivalent mod p.

Update: More on addition on Curve1174.

Prefix conventions

A US government publication (FIPS PUB 186-4) mandates the following prefixes:

  • P for curves over a prime field,
  • B for curves over a binary field (i.e. GF(2ⁿ)), and
  • K for Koblitz curves.

The ‘k’ in secp256k1 also stands for Koblitz.

The M prefix above stands for Montgomery.


[1] It is difficult to compute the exact number of points on an elliptic curve over a prime field. However, the number is roughly p ± 2√p. More precisely, Hasse’s theorem says

|\#(E/\mathbb{F}_p) - p - 1| \leq 2\sqrt{p}

Entropy extractor used in μRNG

Yesterday I mentioned μRNG, a true random number generator (TRNG) that takes physical sources of randomness as input. These sources are independent but non-uniform. This post will present the entropy extractor μRNG uses to take non-uniform bits as input and produce uniform bits as output.

We will present Python code for playing with the entropy extractor. (μRNG is extremely efficient, but the Python code here is not; it’s just for illustration.) The code will show how to use the pyfinite library to do arithmetic over a finite field.

Entropy extractor

The μRNG generator starts with three bit streams—X, Y, and Z—each with at least 1/3 bit min-entropy per bit.

Min-entropy is Rényi entropy with α = ∞. For a Bernoulli random variable, one that takes on two values, one with probability p and the other with probability 1 − p, the min-entropy is

−log₂ max(p, 1 − p).

So requiring min-entropy of at least 1/3 means the two probabilities are less than 2^(−1/3) ≈ 0.7937.
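Here’s a quick numerical check of that threshold:

    from math import log2

    def min_entropy(p):
        "Min-entropy of a Bernoulli random variable with probability p."
        return -log2(max(p, 1 - p))

    print(min_entropy(0.7937))  # just over 1/3
    print(min_entropy(0.7938))  # just under 1/3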

Take eight bits (one byte) at a time from X, Y, and Z, and interpret each byte as an element of the finite field with 2⁸ elements. Then compute

X*Y + Z

in this field. The resulting stream of bits will be independent and uniformly distributed, or very nearly so.

Purified noise

Just a quick aside. Normally you want to remove noise from data to reveal a signal. Said another way, you want to split the data into signal and noise so you can throw out the noise. Here the goal is the opposite: we want to remove any unwanted signal in order to create pure noise!

Python implementation

We will need the bernoulli class for generating our input bit streams, and the pyfinite library for doing finite field arithmetic on the bits.

    from scipy.stats import bernoulli
    from pyfinite import ffield

And we will need a couple bit manipulation functions.

    def bits_to_num(a):
        "Convert an array of bits to an integer."
    
        x = 0
        for i in range(len(a)):
            x += a[i]*2**i
        return x

    def bitCount(n):
        "Count how many bits are set to 1."
        count = 0
        while(n):
            n &= n - 1
            count += 1
        return count 

The following function generates random bytes using the entropy extractor. The input bit streams have p = 0.79, corresponding to min-entropy 0.34.

    def generate_byte():
        "Generate bytes using the entropy extractor."
    
        b = bernoulli(0.79)
    
        x = bits_to_num(b.rvs(8))
        y = bits_to_num(b.rvs(8))
        z = bits_to_num(b.rvs(8)) 

        F = ffield.FField(8)
        return F.Add(F.Multiply(x, y), z)

Note that 79% of the bits produced by the Bernoulli generator will be 1’s. But we can see that the output bytes are about half 1’s and half 0’s.

    s = 0
    N = 1000
    for _ in range(N):
        s += bitCount( generate_byte() )
    print( s/(8*N) )

This returned 0.50375 the first time I ran it and 0.49925 the second time.

For more details see the μRNG paper.

Update: RNG test suite results

I ran an experiment, creating streams of biased data and running them through the entropy extractor. The first post in the series, NIST STS, explains the setup. The last post in the series, using TestU01, summarizes the results. In a nutshell, the extractor passes STS and DIEHARDER, but fails PractRand and TestU01.


Solving for probability given entropy

If a coin comes up heads with probability p and tails with probability 1 − p, the entropy in the coin flip is

S = −p log₂ p − (1 − p) log₂ (1 − p).

It’s common to start with p and compute entropy, but recently I had to go the other way around: given entropy, solve for p. It’s easy to come up with an approximate solution.

entropy and approximation

Entropy in this case is approximately quadratic

S ≈ 4p(1 − p)

and so

p ≈ (1 ± √(1 − S))/2.

This is a good approximation if S is near 0 or 1 but mediocre in the middle. You could solve for p numerically, say with Newton’s method, to get more accuracy if needed.
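Here’s a sketch of the numerical solution using SciPy’s brentq root finder, taking the root with p ≥ 1/2; the other root is its mirror image 1 − p.

    from math import log2
    from scipy.optimize import brentq

    def entropy(p):
        return -p*log2(p) - (1 - p)*log2(1 - p)

    def prob_from_entropy(S):
        "Solve entropy(p) = S for the root with p >= 1/2."
        if S == 0:
            return 1.0
        if S == 1:
            return 0.5
        return brentq(lambda p: entropy(p) - S, 0.5, 1 - 1e-12)

    print(prob_from_entropy(0.8113))  # about 0.75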

Update

As Sjoerd Visscher pointed out in the comments, the quadratic approximation for entropy is much better if you raise it to the power 3/4. When I added this new approximation to the graph above, the new approximation agreed with the correct value to within the thickness of the plotting line.

To make the approximation error visible, here’s the absolute error of the two approximations, plotted on a log scale.

approximation error on log scale

The error in the new approximation is about an order of magnitude smaller, sometimes more.

The improved approximation for entropy is

S ≈ (4p(1 − p))^(3/4)

and so the new approximation for probability is

p ≈ (1 ± √(1 − S^(4/3)))/2.
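Plugging each approximate value of p back into the entropy function shows how much closer the new approximation comes to the target value of S:

    from math import log2, sqrt

    def entropy(p):
        return -p*log2(p) - (1 - p)*log2(1 - p)

    for S in [0.1, 0.5, 0.9]:
        p1 = (1 + sqrt(1 - S))/2         # original approximation
        p2 = (1 + sqrt(1 - S**(4/3)))/2  # improved approximation
        print(S, entropy(p1), entropy(p2))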


Missing information anxiety

A recurring theme in math is that you may not need to do what it looks like you need to do. There may be a shortcut to where you want to go. A special case of this is that you may not need all the information that you think you need.

For example, if you need to know the last digit of a×b, it might seem you need to know a and b so you can multiply them together. But you only have to know the last digits of a and b. In fact, if one of the last digits is 0, that’s all you need to know.
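To make that concrete:

    a, b = 123456789, 2718281828
    print((a*b) % 10)              # last digit of the full product
    print((a % 10)*(b % 10) % 10)  # the same digit, from the last digits alone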

As a math consultant, I often tell clients they don’t need information that they think they need. That news may come as a relief, or it may cause anxiety. I may tell a client, for instance, that missing data cannot change a conclusion, so it’s not worth waiting for. Whether that brings relief or anxiety depends on whether they believe me.

There’s a physics demonstration where you have a heavy ball on a long cable. You pull back the ball like a pendulum and let it touch your chin. Then let the ball go and stand still. If you’re convinced of the physical laws governing the motion of the ball, you can stand there without flinching. You know that just as it left your chin with zero velocity, it will return with zero velocity. In fact, because of friction, it won’t quite return to your chin. But if you’re not fully convinced of this explanation, you’ll be afraid that a wrecking ball is about to smash your face, and understandably so.

When you tell clients that they don’t need something they think they need, it may come across as if you’re asking them to stand still as a wrecking ball swings toward their face. It’s not enough to be correct; you need to be persuasive as well. Examples help. As with the physics demo, you can put your own face on the line before asking them to do the same.

Sum-product theorem for finite fields

A week ago I wrote about using some Python code to play with the sum-product theorem of Erdős and Szemerédi and its conjectured refinement. This morning I learned that the Erdős-Szemerédi theorem has been extended to finite fields.

David Johnston left a comment saying that he and his colleagues used this extension to finite fields as part of the construction of μRNG, a remarkably efficient true random number generator. The finite field version of Erdős-Szemerédi leads to a way to combine three physical but non-uniform sources of randomness into a uniformly distributed stream of bits.

Bourgain, Katz, and Tao proved that the result

max{|A+A|, |A*A|} ≥ c|A|^(1+ε)

extends to subsets A from a finite field F with conditions on the field F and the size of A relative to F.

It suffices for F to have prime order. More generally, and importantly for applications, it also holds for fields of order 2ᵖ for prime p.

Given a constant δ > 0, if the size of the set A satisfies

|F|^δ < |A| < |F|^(1−δ)

then the theorem holds where the constant c depends on δ.
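Here’s a toy illustration of the statement in Python, with a small prime field standing in for the large fields used in applications:

    p = 101
    A = {3, 7, 19, 23, 40}
    sums     = {(a + b) % p for a in A for b in A}
    products = {(a * b) % p for a in A for b in A}
    print(len(A), len(sums), len(products))  # prints 5 14 15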

Computing Legendre and Jacobi symbols

In an earlier post I introduced the Legendre symbol

\left(\frac{a}{p}\right)

where a is a positive integer and p is prime. It is defined to be 0 if a is a multiple of p, 1 if a has a square root mod p, and −1 otherwise.

The Jacobi symbol is a generalization of the Legendre symbol and uses the same notation. It relaxes the requirement that the number on the bottom be prime; it only needs to be odd.

If m has prime factors pᵢ with exponents eᵢ, then the Jacobi symbol is defined by

\left(\frac{n}{m}\right) = \prod \left(\frac{n}{p_i} \right )^{e_i}

Note that the symbol on the left is a Jacobi symbol while the symbols on the right are Legendre symbols.

The Legendre and Jacobi symbols are not fractions, but they act in some ways like fractions, and so the notation is suggestive. They come up in applications of number theory, so it’s useful to be able to compute them.

Algorithm for computing Jacobi symbols

Since the Legendre symbol is a special case of the Jacobi symbol, we only need an algorithm for computing the latter.

In the earlier post mentioned above, I outline an algorithm for computing Legendre symbols. The code below is more explicit, and more general. It’s Python code, but it doesn’t depend on any libraries or special features of Python, so it could easily be translated to another language. The algorithm is taken from Algorithmic Number Theory by Bach and Shallit. Its execution time is O( (log a)(log n) ).

    def jacobi(a, n):
        "Compute the Jacobi symbol (a/n), for n odd and 0 < a < n."
        assert n > a > 0 and n % 2 == 1
        t = 1
        while a != 0:
            while a % 2 == 0:
                a //= 2  # //= keeps a an integer; /= would make it a float
                r = n % 8
                if r == 3 or r == 5:
                    t = -t
            a, n = n, a  # reciprocity step
            if a % 4 == n % 4 == 3:
                t = -t
            a %= n
        if n == 1:
            return t
        else:
            return 0

Testing the Python code

To test the code we randomly generate positive integers a and odd integers n greater than a. We compare our self-contained Jacobi symbol function to the one in SymPy.

    from random import randrange
    from sympy import jacobi_symbol

    N = 1000
    for _ in range(100):
        a = randrange(1, N)
        n = randrange(a+1, 2*N)
        if n % 2 == 0:
            n += 1

        j1 = jacobi_symbol(a, n)
        j2 = jacobi(a, n)
        if j1 != j2:
            print(a, n, j1, j2)

This prints nothing, suggesting that we coded the algorithm correctly.


Dose finding ≠ dose escalation

You’ll often hear Phase I dose-finding trials referred to as dose escalation studies. This is because simple dose-finding methods can only explore in one direction: they can only escalate.

Three-plus-three rule

The most common dose finding method is the 3+3 rule. There are countless variations on this theme, but the basic idea is that you give a dose of an experimental drug to three people. If all three are OK, you go up a dose next time. If two out of three are OK, you give that dose again. If only one out of three is OK, you stop [1].
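To make the procedure concrete, here’s a hypothetical simulation of that simplified rule, where tox_probs[i] is the (in practice unknown) probability of an adverse event at dose i:

    import random

    def three_plus_three(tox_probs):
        "Simulate the simplified 3+3 rule described above."
        dose = 0
        while dose < len(tox_probs):
            events = sum(random.random() < tox_probs[dose] for _ in range(3))
            if events == 0:
                dose += 1        # all three OK: escalate
            elif events == 1:
                pass             # two of three OK: repeat this dose
            else:
                return dose - 1  # stop; -1 means even the lowest dose failed
        return len(tox_probs) - 1

    print(three_plus_three([0.05, 0.1, 0.25, 0.5]))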

Deterministic thinking

The 3+3 algorithm implicitly assumes deterministic thinking, at least in part. The assumption is that if three out of three patients respond well, we know the dose is safe [2].

If you increase the dose level and the next three patients experience adverse events, you stop the trial. Why? Because you know that the new dose is dangerous, and you know the previous dose was safe. You can only escalate because you assume you have complete knowledge based on three samples.

But if we treat three patients at a particular dose level and none have an adverse reaction we do not know for certain that the dose level is safe, though we may have sufficient confidence in its safety to try the next dose level. Similarly, if we treat three patients at a dose and all have an adverse reaction, we do not know for certain that the dose is toxic.

Bayesian dose-finding

A Bayesian dose-finding method estimates toxicity probabilities given the data available. It might decide at one point that a dose appears safe, then reverse its decision later based on more data. Similarly, it may reverse an initial assessment that a dose is unsafe.

A dose-finding method based on posterior probabilities of toxicity is not strictly a dose escalation method because it can explore in two directions. It may decide that the next dose level to explore is higher or lower than the current level.

Starting at the lowest dose

In Phase I studies of chemotherapeutics, you conventionally start at the lowest dose. This makes sense. These are toxic agents, and you naturally want to start at a dose you have reason to believe isn’t too toxic. (NB: I say “too toxic” because chemotherapy is toxic. You hope that it’s toxic to a tumor without being too toxic for the patient host.)

But on closer inspection maybe you shouldn’t start at the lowest dose. Suppose you want to test 100 mg, 200 mg, and 300 mg of some agent. Then 100 mg is the lowest dose, and it’s ethical to start at 100 mg. Now what if we add a dose of 50 mg to the possibilities? Did the 100 mg dose suddenly become unethical as a starting dose?

If you have reason to believe that 100 mg is a tolerable dose, why not start with that dose, even if you add a lower dose in case you’re wrong? This makes sense if you think of dose-finding, but not if you think only in terms of dose escalation. If you can only escalate, then it’s impossible to ever give a dose below the starting dose.


[1] I have heard, but I haven’t been able to confirm, that the 3+3 method has its origin in a method proposed by John Tukey during WWII for testing bombs. When testing a mechanical system, like a bomb, there is much less uncertainty than when testing a drug in a human. In a mechanical setting, you may have a lot more confidence from three samples than you would in a medical setting.

[2] How do you explain the situation where one out of three has an adverse reaction? Is the dose safe or not? Here you naturally switch to probabilistic thinking because deterministic thinking leads to a contradiction.