John D. Cook Applied Mathematics Consulting

Finite rings Mon, 25 Mar 2019 18:56:25 +0000 It occurred to me recently that I rarely hear about finite rings. I did a Google Ngram search to make sure this isn’t just my experience.

Finite group, finite ring, finite field ngram


Why are finite groups and finite fields common while finite rings are not?

Finite groups have relatively weak algebraic structure, and demonstrate a lot of variety. Finite fields have very strong algebraic structure. Their complete classification has been known for a long time and is easy to state.

I imagine that most of the references to finite groups above have to do with classifying finite groups, and that most of the references to finite fields have to do with applications of finite fields, which are many.

You can see that references to finite groups hit their peak around the time of the Feit-Thompson theorem in 1962, and drop sharply after the classification of finite simple groups was essentially done in 1994. There’s a timeline of the progress toward the classification theorem on Wikipedia.

Rings have more structure than groups, but less structure than fields. Finite rings in particular are in a kind of delicate position: they easily become fields. Wedderburn’s little theorem says every finite domain is a field.
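Here is a small sanity check of the commutative case: among the rings of integers mod n, the ones with no zero divisors are exactly the ones in which every nonzero element is invertible. This is only a sketch. The function names are made up for illustration, and it says nothing about the noncommutative rings that Wedderburn's theorem actually addresses.

```python
def is_domain(n):
    """True if the integers mod n have no zero divisors."""
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))

def is_field(n):
    """True if every nonzero element mod n has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))

# For the integers mod n, "domain" and "field" coincide.
for n in range(2, 50):
    assert is_domain(n) == is_field(n)

print([n for n in range(2, 50) if is_field(n)])  # the primes below 50
```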

The classification of finite rings is much simpler than that of finite groups. And in applications you often want a finite field. Even if a finite ring (not necessarily a field) would do, you’d often use a finite field anyway.

In summary, my speculation as to why you don’t hear much about finite rings is that they’re not as interesting to classify as finite groups, and not as useful in application as finite fields.

Monads and generalized elements Sun, 24 Mar 2019 20:52:03 +0000 Paolo Perrone gives a nice, succinct motivation for monads in the introduction to his article on probability and monads.

… a monad is like a consistent way of extending spaces to include generalized elements of a specific kind.

He develops this idea briefly, and links to his dissertation where he gives a longer exposition (pages 8–14).

Related post: Monads are hard because …

Mixing error-correcting codes and cryptography Sat, 23 Mar 2019 16:58:05 +0000 Secret codes and error-correcting codes have nothing to do with each other. Except when they do!

Error-correcting codes

Error-correcting codes make digital communication possible. Without some way to detect and correct errors, the corruption of a single bit could wreak havoc. A simple example of an error-detecting code is a checksum. A more sophisticated example would be erasure codes, a method used by data centers to protect customer data against hard drive failures or even entire data centers going offline.
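To make the checksum idea concrete, here is a toy sketch. The byte-sum checksum and the names `checksum` and `looks_intact` are assumptions chosen for illustration, not what any particular protocol uses. It detects a single flipped bit but cannot correct it.

```python
def checksum(data: bytes) -> int:
    """Sum of the bytes mod 256: a very weak error-detecting code."""
    return sum(data) % 256

def looks_intact(packet: bytes) -> bool:
    """Recompute the checksum of the payload and compare to the last byte."""
    return checksum(packet[:-1]) == packet[-1]

message = b"hello, world"
sent = message + bytes([checksum(message)])

# Flip one bit in transit.
corrupted = bytearray(sent)
corrupted[3] ^= 0b00000100

print(looks_intact(sent))              # True
print(looks_intact(bytes(corrupted)))  # False
```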

People who work in coding theory are quick to point out that they do not work in cryptography. “No, not that kind of code. Error-correcting codes, not secret codes.” The goal isn’t secrecy. The goal is to maximize the probability of correctly transmitting data while minimizing the amount of extra information added.

Codes and ciphers

You don’t hear the word “code” used in connection with cryptography much anymore. People used to refer to “codes and ciphers” in one breath. Historically, the technical distinction was that a code operated on words, while a cipher operated on characters. Codes in this sense have long been obsolete, but people still speak of codes colloquially.

David Kahn’s classic book on pre-modern cryptography is entitled The Codebreakers, not the Cipherbreakers, because the public at the time was more familiar with the term code than the term cipher. Maybe that’s still the case because, for example, Jason Fagone entitled his biography of Elizebeth Friedman The Woman Who Smashed Codes. Perhaps the author suggested The Woman Who Smashed Ciphers and an editor objected.

Code-based cryptography

If you’re accustomed to the older use of “codes,” the term “code-based cryptography” is redundant. But it means something specific in modern usage: cryptographic systems that incorporate error-correction codes. So error-correcting codes and secret “codes” do have something to do with each other after all!

Robert McEliece had this idea back in 1978. His encryption method starts with a particular error-correcting code, a binary Goppa code, and scrambles it with an invertible linear transformation. At a very high level, McEliece’s method boils down to a secret factorization, sorta like RSA but even more like oil and vinegar. The public key is the product of the Goppa code and the linear transformation, but only the owner knows the factorization of this key.

To encrypt a message with McEliece’s method, the sender adds a specific amount of random noise, noise that the Goppa code can remove. An attacker faces a challenging computational problem to recover the message without knowing how to factor the public key.

Post-quantum cryptography

McEliece’s method did not attract much interest at the time because it requires much larger public keys than other methods, such as RSA. However, there is renewed interest in McEliece’s approach because his scheme is apparently quantum-resistant whereas RSA and other popular public key systems are not.

If and when large quantum computers become practical, they could factor the product of two large primes efficiently, and thus break RSA. They could also solve the discrete logarithm and elliptic discrete logarithm problems, breaking Diffie-Hellman and elliptic curve cryptosystems. All public key cryptosystems now in common use would be broken.

Why worry about this now while quantum computers don’t exist? (They exist, but only as prototypes. So far the largest number a quantum computer has been able to factor is 21.) The reason is that it takes a long time to develop, analyze, standardize, and deploy encryption methods. There’s also the matter of forward security: someone could store encrypted messages with the hope of decrypting them in the future. This doesn’t matter for cat photos transmitted over TLS, but it could matter for state secrets; governments may be encrypting documents that they wish to keep secret for decades.

NIST is sponsoring a competition to develop and standardize quantum-resistant encryption methods. Two months ago NIST announced the candidates that advanced to the second round. Seven of these methods use code-based cryptography, including the classic McEliece method and six variations: BIKE, HQC, LEDAcrypt, NTS-KEM, ROLLO, and RQC.

US Army applying new areas of math Thu, 21 Mar 2019 14:27:01 +0000 Many times on this blog I’ve argued that the difference between pure and applied math is motivation. As my graduate advisor used to say, “Applied mathematics is not a subject classification. It’s an attitude.”

Uncle Sam wants homotopy type theory

Traditionally there was general agreement regarding what is pure math and what is applied. Number theory and topology, for example, are pure, while differential equations and numerical analysis are applied.

But then public key cryptography and topological data analysis brought number theory and topology over into the applied column, at least for some people. And there are people working in differential equations and numerical analysis that aren’t actually interested in applications. It would be more accurate to say that some areas of math are more directly and more commonly applied than others. Also, some areas of math are predominantly filled with people interested in applications and some are not.

The US Army is interested in applying some areas of math that you would normally think of as very pure, including homotopy type theory (HoTT).

From an Army announcement:

Modeling frameworks are desired that are able to eschew the usual computational simplification assumptions and realistically capture … complexities of real world environments and phenomena, while still maintaining some degree of computational tractability. Of specific interest are causal and predictive modeling frameworks, hybrid model frameworks that capture both causal and predictive features, statistical modeling frameworks, and abstract categorical models (cf. Homotopy Type Theory).

And later in the same announcement

Homotopy Type Theory and its applications are such an area that is of significant interest in military applications.

HoTT isn’t the only area of math the Army announcement mentions. There are the usual suspects, such as (stochastic) PDEs, but also more ostensibly pure areas of math such as topology; the word “topological” appears 23 times in the document.

This would be fascinating. It can be interesting when a statistical model works well in application, but it’s no surprise: that’s what statistics was developed for. It’s more interesting when something finds an unexpected application, such as when number theory entered cryptography. The applications the Army has in mind are even more interesting because the math involved is more abstract and, one would have thought, less likely to be applied.

Riffing on mistakes Tue, 19 Mar 2019 16:32:34 +0000 I mentioned on Twitter yesterday that one way to relieve the boredom of grading math papers is to explore mistakes. If a statement is wrong, what would it take to make it right? Is it approximately correct? Is there some different context where it is correct? Several people said they’d like to see examples, so this blog post is a sort of response.


One famous example of this is the so-called Freshman’s Dream theorem:

(a + b)^p = a^p + b^p

This is not true over the real numbers, but it is true, for example, when working with integers mod p.

(More generally, the Freshman’s Dream is true in any ring of characteristic p. This is more than an amusing result; it’s useful in applications of finite fields.)
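A brute-force check makes the point. This is just a sketch with a made-up function name: it tests (a + b)^n = a^n + b^n for every pair a, b in the integers mod n.

```python
def freshmans_dream_holds(n):
    """Check (a + b)^n == a^n + b^n for all a, b in the integers mod n."""
    return all(
        pow(a + b, n, n) == (pow(a, n, n) + pow(b, n, n)) % n
        for a in range(n)
        for b in range(n)
    )

# Below 20, the identity holds exactly for the primes.
print([n for n in range(2, 20) if freshmans_dream_holds(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```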


A common misunderstanding in calculus is that a series converges if its terms converge to zero. The canonical counterexample is the harmonic series. Its terms converge to zero, but the sum diverges.

But this can’t happen in the p-adic numbers. There if the terms of a series converge to zero, the series converges (though maybe not absolutely).


Here’s something sorta along these lines. It looks wrong, and someone might arrive at it via a wrong understanding, but it’s actually correct.

sin(x – y) sin(x + y) = (sin(x) – sin(y)) (sin(x) + sin(y))
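If the identity sin(x – y) sin(x + y) = (sin(x) – sin(y)) (sin(x) + sin(y)) looks too good to be true, a numerical spot check is reassuring. The sketch below compares both sides at random points; the helper names `lhs` and `rhs` are made up for illustration.

```python
import math
import random

def lhs(x, y):
    return math.sin(x - y) * math.sin(x + y)

def rhs(x, y):
    return (math.sin(x) - math.sin(y)) * (math.sin(x) + math.sin(y))

random.seed(0)
for _ in range(1000):
    x = random.uniform(-10, 10)
    y = random.uniform(-10, 10)
    # Both sides equal sin(x)^2 - sin(y)^2, up to floating point error.
    assert math.isclose(lhs(x, y), rhs(x, y), abs_tol=1e-12)
```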


Odd integers end in odd digits, but that might not be true if you’re not working in base 10. See Odd numbers in odd bases.


You can misunderstand how percentages work, but still get a useful result. See Sales tax included.


When probabilities are small, you can often get by with adding them together even when strictly speaking they don’t add. See Probability mistake can make a good approximation.
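As a quick illustration, suppose the events are independent (an assumption made here just to have an exact answer to compare against). Then the probability that at least one occurs is one minus the product of the complements, and for small probabilities the plain sum comes close.

```python
from math import prod  # Python 3.8 or later

def exact_union(ps):
    """P(at least one event occurs), assuming the events are independent."""
    return 1 - prod(1 - p for p in ps)

ps = [0.01, 0.02, 0.005]
print(sum(ps))          # naive sum, about 0.035
print(exact_union(ps))  # exact value, slightly smaller
```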

A genius can admit finding things difficult Tue, 19 Mar 2019 13:58:57 +0000 Karen Uhlenbeck

Karen Uhlenbeck has just received the Abel Prize. Many say that the Fields Medal is the analog of the Nobel Prize for mathematics, but others say that the Abel Prize is a better analog. The Abel Prize is a recognition of achievement over a career whereas the Fields Medal is only awarded for work done before age 40.

I had a course from Karen Uhlenbeck in graduate school. She was obviously brilliant, but what I remember most from the class was her candor about things she didn’t understand. She was already famous at the time, having won a MacArthur genius award and other honors, so she didn’t have to prove herself.

When she presented the definition of a manifold, she made an offhand comment that it took her a month to really understand that definition when she was a student. She obviously understands manifolds now, having spent her career working with them.

I found her comment extremely encouraging. It shows it’s possible to become an expert in something you don’t immediately grasp, even if it takes you weeks to grok its most fundamental concept.

Uhlenbeck wasn’t just candid about things she found difficult in the past. She was also candid about things she found difficult at the time. She would grumble in the middle of a lecture things like “I can never remember this.” She was not a polished lecturer—far from it—but she was inspiring.

Related posts

(The connection between Karen Uhlenbeck, Ted Odell, and John Tate is that they were all University of Texas math faculty.)

Photo of Karen Uhlenbeck in 1982 by George Bergman [GFDL], via Wikimedia Commons

Thermocouple polynomials and other sundries Tue, 19 Mar 2019 01:00:32 +0000 I was looking up something on the NIST (National Institute of Standards and Technology) web site the other day and ran across thermocouple polynomials. I wondered what that could be, assuming “thermocouple” was a metaphor for some algebraic property. No, it refers to physical thermocouples. The polynomials are functions for computing voltage as a function of temperature, and temperature as a function of voltage, for a variety of types of thermocouples. See the NIST ITS-90 Thermocouple Database.

I keep running into NIST’s eclectic collection of useful information. Three examples:

I wonder what’s going to take me back to NIST next.

Digital signatures with oil and vinegar Mon, 18 Mar 2019 11:43:20 +0000 “Unbalanced oil and vinegar” is a colorful name for a cryptographic signature method. This post will give a high-level description of the method and explain where the name comes from.

The RSA encryption algorithm depends on the fact that computers can easily multiply enormous numbers, but they cannot efficiently factor the product of two enormous primes. Whenever you have something that’s easy to do but hard to undo, you might be able to make an encryption algorithm out of it.

The unbalanced oil and vinegar (UOV) digital signature algorithm is analogous to RSA in that it also depends on the difficulty of a kind of factoring. But UOV is based on the difficulty of factoring the composition of a linear and a nonlinear operator, not on the difficulty of factoring a product of prime numbers. One advantage of UOV over RSA is that UOV is quantum-resistant. That is, if large quantum computers become practical, UOV signatures will remain hard to forge (or so it is currently believed) whereas RSA signatures would be easy to forge.

Solving large systems of multivariate polynomial equations over finite fields is hard, provably NP-hard, unless there’s some special structure that makes things easier. Several proposed post-quantum digital signature algorithms are based on this, such as the LUOV variant on UOV.

The idea behind UOV is to create systems of equations that have a special structure, with some “oil” variables and some “vinegar” variables, so named because they do not mix, or rather mix in a very simple, convenient way. This special structure is kept secret, and is obscured by composition with an invertible linear operator. This operator acts like a blender, thoroughly mixing the oil and vinegar. The term “unbalanced” refers to the fact that the scheme is more secure if you do not have equal numbers of “oil” and “vinegar” variables.

Polynomials over finite fields. Polynomials over finite fields everywhere!

Someone wanting to sign a file with the UOV algorithm knows the oil-and-vinegar structure and produces a vector that is mapped to a specified value, inverting the composition of the linear operator and the polynomials. They can do this because they know the factorization into this special structure. Someone wanting to verify a UOV signature only knows the (apparently unstructured) composition. They just see a large system of multivariate polynomial equations. They can stick a signature in and verify that the output is what it’s supposed to be, but they couldn’t produce a signature because they can’t invert the system. [1]

How large do these systems of polynomials need to be? On the order of a hundred equations and variables, though with more variables than polynomials. Not that large compared to linear systems, where one can efficiently solve systems with millions of equations and variables. And the polynomials are only quadratic. So in one sense the systems are small. But it takes hundreds of kilobytes [2] to describe such systems, which makes the public keys for UOV large relative to currently popular digital signature algorithms such as ECDSA. The signatures produced by UOV are small, but the public keys are large.


[1] The system is not invertible in the sense of being one-to-one because it’s underdetermined. By inverting the system we mean producing any input that maps to the desired output. This solution is not generally unique.

[2] Representing m quadratic polynomials in n variables over a field of size b bits requires bmn²/2 bits. So 80 quadratic polynomials in 120 variables over GF(2^8) would require 8 × 80 × 120²/2 = 4,608,000 bits = 576 kilobytes. The LUOV variation on UOV mentioned above reduces the key sizes quite a bit, but it still requires larger public keys than ECDSA.
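The arithmetic in footnote [2] is easy to reproduce. The function name below is made up for illustration, and the formula is the rough bmn²/2 estimate from the footnote, not an exact key format.

```python
def uov_public_key_bits(b, m, n):
    """Rough size in bits of m quadratic polynomials in n variables
    over a field whose elements take b bits, using b*m*n^2/2."""
    return b * m * n**2 // 2

bits = uov_public_key_bits(b=8, m=80, n=120)
print(bits)               # 4608000
print(bits // 8 // 1000)  # 576 (kilobytes)
```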

Counting irreducible polynomials over finite fields Thu, 14 Mar 2019 17:40:48 +0000 You can construct a finite field of order p^n for any prime p and positive integer n. The elements are polynomials modulo an irreducible polynomial of degree n, with coefficients in the integers mod p. The choice of irreducible polynomial matters, though the fields you get from any two choices will be isomorphic.

For example, the AES encryption algorithm uses the finite field GF(2^8), i.e. the finite field with 2^8 = 256 elements. Except we need to be a little careful about saying “the” field. Since we’re doing concrete calculations, the choice of irreducible polynomial matters, and AES dictates the polynomial

x^8 + x^4 + x^3 + x + 1.

Another example from cryptography is Galois Counter Mode (GCM) which uses the finite field GF(2^128), specifying the irreducible polynomial

x^128 + x^7 + x^2 + x + 1.

How many other irreducible polynomials are there over GF(2^8) or any other field for that matter? We’ll assume the leading coefficient is 1, i.e. we’ll count monic polynomials, because otherwise we can just divide by the leading coefficient.

The number of monic irreducible polynomials of degree n over a field with q elements is given by

I_q(n) = \frac{1}{n} \sum_{d | n} \mu(d) q^{n/d}

where μ is the Möbius function and the sum is over all positive integers that divide n. We can implement this function succinctly in Python.

    from sympy import mobius, divisors

    def I_q(n, q):
        # Use integer division n//d so every term stays an exact integer.
        terms = [mobius(d)*q**(n//d) for d in divisors(n)]
        return sum(terms)//n

We can compute I_q(8, 2) to find out there are 30 monic irreducible polynomials of degree 8 with coefficients in GF(2), i.e. with one-bit coefficients. There are 256 monic polynomials of degree 8 (the coefficient of x^k can be either 0 or 1 for k = 0 … 7) but only 30 of these are irreducible. Similarly, there are 2^128 monic polynomials of degree 128 with binary coefficients, and about 2^121 of them are irreducible. Clearly it’s convenient in applications like GCM to use a polynomial of low weight, i.e. one with few non-zero coefficients.
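The count of 30 can be confirmed by brute force, without the Möbius formula. In the sketch below a polynomial over GF(2) is stored as an integer bitmask (bit k is the coefficient of x^k), and irreducibility is tested by trial division. The function names are made up for illustration, and the approach is only practical for small degrees.

```python
def poly_mod(a, b):
    """Remainder of a divided by b, for polynomials over GF(2) as bitmasks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        # Subtract (XOR) a shifted copy of b to cancel the leading term of a.
        a ^= b << (a.bit_length() - db)
    return a

def is_irreducible(f):
    """Trial division by every polynomial of degree 1 through deg(f)//2."""
    n = f.bit_length() - 1
    for g in range(2, 1 << (n // 2 + 1)):
        if poly_mod(f, g) == 0:
            return False
    return True

def count_irreducible(n):
    """Number of irreducible polynomials of degree n over GF(2)."""
    return sum(is_irreducible(f) for f in range(1 << n, 1 << (n + 1)))

print(count_irreducible(8))   # 30
print(is_irreducible(0x11B))  # True: x^8 + x^4 + x^3 + x + 1, the AES polynomial
```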

Note that in the paragraph above we count the number of monic irreducible polynomials with coefficients in GF(2) that we could use in constructing GF(2^8). We haven’t considered how many monic irreducible polynomials there are over GF(2^8), i.e. with coefficients not just in GF(2) but in GF(2^8). That would be a much larger number. If we call I_q(8, 256) we get 2,305,843,008,676,823,040.

Scaling up differential privacy: lessons from the US Census Thu, 14 Mar 2019 15:52:41 +0000 The paper Issues Encountered Deploying Differential Privacy describes some of the difficulties the US Census Bureau has run into while deploying differential privacy for the 2020 census. It’s not surprising that they would have difficulties. It’s surprising that they would even consider applying differential privacy on such an enormous scale.

If your data project is smaller than the US Census, you can probably make differential privacy work.

Average distance between planets Wed, 13 Mar 2019 03:17:21 +0000 What is the closest planet to Earth?

The planet whose orbit is closest to the orbit of Earth is clearly Venus. But what planet is closest? That changes over time. If Venus is between the Earth and the sun, Venus is the closest planet to Earth. But if Mercury is between the Earth and the sun, and Venus is on the opposite side of the sun, then Mercury is the closest planet to Earth.

On average, Mercury is the closest planet to the Earth, closer than Venus! In fact, Mercury is the closest planet to every planet, on average. A new article in Physics Today gives a detailed explanation.

The article gives two explanations, one based on probability, and one based on simulated orbits. The former assumes planets are located at random points along their orbits. The latter models the actual movement of planets over the last 10,000 years. The results agree to within 1%.
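The probabilistic argument is easy to imitate in a few lines. The sketch below is not the article's calculation. It assumes circular, coplanar orbits and a uniformly distributed angle between the two planets, and it uses approximate orbital radii in astronomical units. Under those assumptions it averages the distance from Earth to each planet.

```python
import math

def mean_distance(r1, r2, steps=100_000):
    """Average distance between points on concentric circles of radii
    r1 and r2 when the angle between them is uniformly distributed."""
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * (k + 0.5) / steps
        # Law of cosines for the separation at angle theta.
        total += math.sqrt(r1*r1 + r2*r2 - 2*r1*r2*math.cos(theta))
    return total / steps

# Approximate orbital radii in AU; Earth is at 1.0.
radii = {"Mercury": 0.387, "Venus": 0.723, "Mars": 1.524}
for name, r in radii.items():
    print(name, round(mean_distance(r, 1.0), 2))
```

In this simplified model the average distance grows with the radius of the other planet's orbit, which is why the innermost planet comes out closest on average.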

It’s interesting that the two approaches agree. Obviously planet positions are not random. But over time the relative positions of the planets are distributed similarly to if they were random. They’re ergodic.

My first response would be to model this as if the positions were indeed random. But my second thought is that maybe the actual motion of the planets might have resonances that keep the distances from being ergodic. Apparently not, or at least the deviation from being ergodic is small.

All elliptic curves over fields of order 2 and 3 Mon, 11 Mar 2019 15:52:02 +0000 Introductions to elliptic curves often start by saying that elliptic curves have the form

y² = x³ + ax + b.

where 4a³ + 27b² ≠ 0. Then later they say “except over fields of characteristic 2 or 3.”

What does characteristic 2 or 3 mean? The order of a finite field is the number of elements it has. The order is always a prime or a prime power. The characteristic is that prime. So another way to phrase the exception above is to say “except over fields of order 2^n or 3^n.”

If we’re looking at fields not just of characteristic 2 or 3, but order 2 or 3, there can’t be that many of them. Why not just list them? That’s what I plan to do here.

General form of elliptic curves

All elliptic curves over a finite field have the form

y² + a₁xy + a₃y = x³ + a₂x² + a₄x + a₆,

even over fields of characteristic 2 or 3.

When the characteristic of the field is not 2, this can be simplified to

y² = 4x³ + b₂x² + 2b₄x + b₆

where

b₂ = a₁² + 4a₂,
b₄ = 2a₄ + a₁a₃, and
b₆ = a₃² + 4a₆.

When the characteristic is at least 5, the form can be simplified further to the one at the top with just two parameters.

General form of the discriminant

The discriminant of an elliptic curve is something like the discriminant of a quadratic equation. You have an elliptic curve if and only if it is not zero. For curves over fields of characteristic at least five, the condition is 4a³ + 27b² ≠ 0, but it’s more complicated for characteristic 2 and 3. To define the discriminant, we’ll need to use b₂, b₄, and b₆ from above, and also

b₈ = a₁²a₆ + 4a₂a₆ – a₁a₃a₄ + a₂a₃² – a₄².

Now we can define the discriminant Δ in terms of all the b’s.

Δ = –b₂²b₈ – 8b₄³ – 27b₆² + 9b₂b₄b₆.

See Handbook of Finite Fields page 423.

Enumerating coefficients

Now we can enumerate which parameter combinations yield elliptic curves with the following Python code.

from itertools import product

def discriminant(a1, a2, a3, a4, a6):
    # Standard b-invariants of a general Weierstrass equation.
    b2 = a1**2 + 4*a2
    b4 = 2*a4 + a1*a3
    b6 = a3**2 + 4*a6
    b8 = a1**2*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3**2 - a4**2
    delta = -b2**2*b8 - 8*b4**3 - 27*b6**2 + 9*b2*b4*b6
    return delta

p = 2
r = range(p)
for (a1, a2, a3, a4, a6) in product(r,r,r,r,r):
    if discriminant(a1, a2, a3, a4, a6)%p != 0:
        print(a1, a2, a3, a4, a6)

The code above does return the values of the a’s that yield an elliptic curve, but in some sense it returns too many. For example, there are 32 possible combinations of the a’s when working over GF(2), the field with two elements, and 16 of these lead to elliptic curves. But some of these must lead to the same set of points because there are only 4 possible (x, y) affine points on the curve, plus the point at infinity.

Now we get into a subtle question: when are two elliptic curves the same? Can two elliptic curves have the same set of points and yet be algebraically different? Sometimes, but not usually. Lenstra and Pila [1] proved that two elliptic curves can be equal as sets but not equal as groups if and only if the curve has 5 points and the field has characteristic 2. [2]

Lenstra and Pila give the example of the two equations

y² + y = x³ + x²

and

y² + y = x³ + x

over GF(2). Both determine the same set of points, but the two curves are algebraically different because (0,0) + (0,0) equals (1,1) on the first curve and (1,0) on the second.
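The claim about (0,0) + (0,0) can be checked with the standard point-doubling formulas for a general Weierstrass equation. This is a minimal sketch: `double_point` is a name made up for illustration, it handles only doubling, and `pow(den, -1, p)` needs Python 3.8 or later.

```python
def double_point(P, a1, a2, a3, a4, a6, p):
    """Double the affine point P on y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6
    over GF(p). Returns None for the point at infinity."""
    x, y = P
    den = (2*y + a1*x + a3) % p
    if den == 0:
        return None  # vertical tangent: 2P is the point at infinity
    # Slope of the tangent line at P.
    lam = (3*x*x + 2*a2*x + a4 - a1*y) * pow(den, -1, p) % p
    nu = (y - lam*x) % p
    x3 = (lam*lam + a1*lam - a2 - 2*x) % p
    y3 = (-(lam + a1)*x3 - nu - a3) % p
    return (x3, y3)

# y^2 + y = x^3 + x^2 over GF(2)
print(double_point((0, 0), 0, 1, 1, 0, 0, 2))  # (1, 1)
# y^2 + y = x^3 + x over GF(2)
print(double_point((0, 0), 0, 0, 1, 1, 0, 2))  # (1, 0)
```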

Enumerating points on curves

The following Python code will enumerate the set of points on a given curve.

def on_curve(x, y, a1, a2, a3, a4, a6, p):
    left = y**2 + a1*x*y + a3*y
    right = x**3 + a2*x**2 + a4*x + a6
    return (left - right)%p == 0

def affine_points(a1, a2, a3, a4, a6, p):
    pts = set()
    for x in range(p):
        for y in range(p):
            if on_curve(x, y, a1, a2, a3, a4, a6, p):
                # Record each affine point that satisfies the equation.
                pts.add((x, y))
    return pts

We can use this code, along with Lenstra and Pila’s result, to enumerate all elliptic curves of small order.
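Putting the two steps together, here is a self-contained sketch that tallies how many parameter combinations give a curve of each order over GF(p). The discriminant uses the standard b-invariants, with b₂ = a₁² + 4a₂. The helper names `order` and `orders_over` are made up for illustration, and the order counts the point at infinity.

```python
from collections import Counter
from itertools import product

def discriminant(a1, a2, a3, a4, a6):
    b2 = a1**2 + 4*a2
    b4 = 2*a4 + a1*a3
    b6 = a3**2 + 4*a6
    b8 = a1**2*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3**2 - a4**2
    return -b2**2*b8 - 8*b4**3 - 27*b6**2 + 9*b2*b4*b6

def order(a1, a2, a3, a4, a6, p):
    """Number of points over GF(p), including the point at infinity."""
    affine = sum(
        (y*y + a1*x*y + a3*y - x**3 - a2*x*x - a4*x - a6) % p == 0
        for x in range(p)
        for y in range(p)
    )
    return affine + 1

def orders_over(p):
    """Map each order to the number of coefficient tuples achieving it."""
    counts = Counter()
    for a in product(range(p), repeat=5):
        if discriminant(*a) % p != 0:
            counts[order(*a, p)] += 1
    return dict(sorted(counts.items()))

print(orders_over(2))  # {1: 2, 2: 4, 3: 4, 4: 4, 5: 2}
```

The 16 combinations over GF(2) split into 2, 4, 4, 4, and 2 curves of orders 1 through 5.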

All elliptic curves over GF(2)

Now we can list all the elliptic curves over the field with two elements.

Curves of order 5

The two curves in the example of Lenstra and Pila are the only ones over GF(2) with five points. So the two curves of order 5 over GF(2) are

y² + y = x³ + x²
y² + y = x³ + x.

They determine the same set of points but are algebraically different.

Curves of order 4

There are four curves of order 4. They contain different sets of points, i.e. each omits a different one of the four possible affine points.

y² + xy = x³ + 1
y² + xy = x³ + x
y² + xy + y = x³ + x²
y² + xy + y = x³ + x² + x

Curves of order 3

There are two distinct curves of order 3, each determined by two equations.

The first curve is determined by either of

y² + y = x³
y² + y = x³ + x² + x

and the second by either of

y² + y = x³ + 1
y² + y = x³ + x² + x + 1

Curves of order 2

There are 4 curves of order two; each contains a different affine point.

y² + xy + y = x³ + 1
y² + xy + y = x³ + x + 1
y² + xy = x³ + x² + 1
y² + xy = x³ + x² + x

Curves of order 1

These are curves containing only the point at infinity

y² + y = x³ + x + 1
y² + y = x³ + x² + 1

There are no affine points because the left side is always 0 and the right side is always 1 for x and y in {0, 1}.

All elliptic curves over GF(3)

There are too many elliptic curves over GF(3) to explore as thoroughly as we did with GF(2) above, but I can report the following results that are obtainable using the Python code above.

An elliptic curve over GF(3) contains between 1 and 7 points. Here is the number of parameter combinations that lead to each number of points.

    1:  9
    2: 27
    3: 27
    4: 36
    5: 27
    6: 27
    7:  9

Obviously there’s only one curve with one point, the point at infinity, so the nine coefficient combinations that lead to a curve of order 1 determine the same curve.

There are 9 distinct curves of order 2 and 9 distinct curves of order 3. All the curves of orders 4, 5, 6, and 7 are distinct.


[1] H. W. Lenstra, Jr and J. Pila. Does the set of points of an elliptic curve determine the group? Computational Algebra and Number Theory, 111-118.

[2] We are not considering isomorphism classes here. If two curves have a different set of points, or the same set of points but different group properties, we’re considering them different.

US Census Bureau embraces differential privacy Sun, 10 Mar 2019 14:11:38 +0000 The US Census Bureau is convinced that traditional methods of statistical disclosure limitation have not done enough to protect privacy. These methods may have been adequate in the past, but it no longer makes sense to implicitly assume that those who would like to violate privacy have limited resources or limited motivation. The Bureau has turned to differential privacy for quantifiable privacy guarantees that are independent of the attacker’s resources and determination.

John Abowd, chief scientist for the US Census Bureau, gave a talk a few days ago (March 4, 2019) in which he discusses the need for differential privacy and how the bureau is implementing differential privacy for the 2020 census.

Absolutely the hardest lesson in modern data science is the constraint on publication that the fundamental law of information recovery imposes. I usually call it the death knell for traditional method of publication, and not just in statistical agencies.

Efficient modular arithmetic technique for Curve25519 Sat, 09 Mar 2019 20:17:40 +0000 Daniel Bernstein’s Curve25519 is the elliptic curve

y² = x³ + 486662x² + x

over the prime field with order p = 2^255 – 19. The curve is a popular choice in elliptic curve cryptography because its design choices are transparently justified [1] and because cryptography over the curve can be implemented very efficiently. This post will concentrate on one of the tricks that makes ECC over Curve25519 so efficient.

Curve25519 was designed for fast and secure cryptography. One of the things that make it fast is the clever way Bernstein carries out arithmetic mod 2^255 – 19, which he describes in his paper.

Bernstein represents numbers mod 2^255 – 19 by polynomials whose value at 1 gives the number. That alone is not remarkable, but his choice of representation seems odd until you learn why it was chosen. Each number is represented as a polynomial of the form

∑ u_i x^i

where each u_i is an integer multiple k_i of 2^⌈25.5i⌉, and each k_i is an integer between –2^25 and 2^25 inclusive.
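To see what this representation looks like, here is a sketch that splits an integer below 2^255 into ten coefficients k_i at bit positions ⌈25.5i⌉ and reassembles it. This is an integer-only illustration, not Bernstein's floating point implementation. The names `to_limbs` and `from_limbs` and the carry rule used to keep each k_i within ±2^25 are assumptions made for this example.

```python
# Bit positions ceil(25.5 * i) for i = 0..9, computed with integer arithmetic.
EXPONENTS = [(51 * i + 1) // 2 for i in range(10)]

def to_limbs(n):
    """Write 0 <= n < 2^255 as sum of k_i * 2^EXPONENTS[i] with |k_i| <= 2^25."""
    assert 0 <= n < 2**255
    bounds = EXPONENTS + [255]
    ks, carry = [], 0
    for i in range(10):
        width = bounds[i + 1] - bounds[i]  # alternates between 26 and 25
        k = ((n >> bounds[i]) & ((1 << width) - 1)) + carry
        carry = 0
        if k > 2**25:
            # Too large: borrow from the next limb to make this one negative.
            k -= 1 << width
            carry = 1
        ks.append(k)
    return ks

def from_limbs(ks):
    """Evaluate the polynomial at x = 1."""
    return sum(k * 2**e for k, e in zip(ks, EXPONENTS))

print(EXPONENTS)  # [0, 26, 51, 77, 102, 128, 153, 179, 204, 230]
n = 2**255 - 20
assert from_limbs(to_limbs(n)) == n
```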

Why this limitation on the k’s? Pentium cache optimization. In Bernstein’s words:

Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or eight 32-bit pieces? Answer: The coefficients of a polynomial product do not fit into the Pentium M’s fp registers if pieces are too large. The cost of handling larger coefficients outweighs the savings of handling fewer coefficients.

And why unevenly spaced powers of 2: 1, 2^26, 2^51, 2^77, …, 2^230? Some consecutive exponents differ by 25 and some by 26. This looks sorta like a base 2^25 or base 2^26 representation, but is a mixture of both. Bernstein answers this question in his paper as well.

Given that there are 10 pieces, why use radix 2^25.5 rather than, e.g., radix 2^25 or radix 2^26? Answer: My ring R contains 2^255 x^10 − 19, which represents 0 in Z/(2^255 − 19). I will reduce polynomial products modulo 2^255 x^10 – 19 to eliminate the coefficients of x^10, x^11, etc. With radix 2^25, the coefficient of x^10 could not be eliminated. With radix 2^26, coefficients would have to be multiplied by 2^5 · 19 rather than just 19, and the results would not fit into an fp register.

There are a few things to unpack here.

Remember that we’re turning polynomials into numbers by evaluating them at 1. So when x = 1, 2^255 x^10 – 19 = p = 2^255 – 19, which is the zero in the integers mod 2^255 – 19.
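The same observation, that 2^255 is congruent to 19 mod p, gives a cheap reduction for ordinary integers: fold the bits above position 255 back in, multiplied by 19. The sketch below is a plain integer illustration of that idea with a made-up function name, not the polynomial reduction in Bernstein's code.

```python
P = 2**255 - 19

def reduce_mod_p(n):
    """Reduce a non-negative integer mod 2^255 - 19 using 2^255 = 19 (mod p)."""
    while n >> 255:
        low = n & (2**255 - 1)
        high = n >> 255
        n = low + 19 * high  # replace high * 2^255 with high * 19
    # Now n < 2^255, so at most one subtraction of p remains.
    return n - P if n >= P else n

print(reduce_mod_p(2**255))  # 19
```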

If we were using base (radix) 2^25, the largest number we could represent with a 9th degree polynomial with the restrictions above would be 2^250, so we’d need a 10th degree polynomial; we couldn’t eliminate terms containing x^10.

I don’t yet see why working with radix 2^26 would overflow an fp register. If you do see why, please leave an explanation in the comments.


[1] When a cryptographic method has an unjustified parameter, it invites suspicion that the parameter was chosen to create an undocumented back door. This is not the case with Curve25519. For example, why does it use p = 2^255 – 19? It’s efficient to use a prime close to a large power of 2, and this p is the closest prime to 2^255. The coefficient 486662 is not immediately obvious, but Bernstein explains in his paper how it was the smallest integer that met his design criteria.

Why isn’t CPU time more valuable?
Thu, 07 Mar 2019 16:27:06 +0000

Here’s something I find puzzling: why isn’t CPU time more valuable?

I first thought about this when I was working for MD Anderson Cancer Center, maybe around 2002. Our research in adaptive clinical trial methods required bursts of CPU time. We might need hundreds of hours of CPU time for a simulation, then nothing while we figured out what to do next, then another hundred hours to run a modification.

We were always looking for CPU resources, and we installed Condor to take advantage of idle PCs, something like the SETI@home or GIMPS projects. Then we had CPU power to spare, sometimes. What could we do between simulations that was worthwhile but not urgent? We didn’t come up with anything.

Fast forward to 2019. You can rent CPU time from Amazon for about 2.5 cents per hour. To put it another way, it’s about 300 times cheaper per hour to rent a CPU than to hire a minimum wage employee in the US. Surely it should be possible to think of something for a computer to do that produces more than 2.5 cents per CPU hour of value. But is it?

Well, there’s cryptocurrency mining. How profitable is that? The answer depends on many factors: which currency you’re mining and its value at the moment, what equipment you’re using, what you’re paying for electricity, etc. I did a quick search, and one person said he sees a 30 to 50% return on investment. I suspect that’s high, but we’ll suppose for the sake of argument there’s a 50% ROI [1]. That means you can make a profit of 30 cents per CPU day.
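Here is the back-of-the-envelope arithmetic behind those numbers. The rental price and the 50% ROI come from the text; the US federal minimum wage of $7.25 per hour is an assumption added for the comparison.

```python
rent_per_hour = 0.025   # dollars per CPU hour, from the text
minimum_wage = 7.25     # dollars per hour; US federal minimum wage (assumption)
roi = 0.50              # return on investment supposed for the sake of argument

wage_ratio = minimum_wage / rent_per_hour   # how many CPU hours one wage hour buys
cost_per_day = 24 * rent_per_hour           # cost of renting a CPU for a day
profit_per_day = roi * cost_per_day         # profit at the supposed ROI

print(f"wage / rent ratio: {wage_ratio:.0f}")
print(f"cost per CPU day:  ${cost_per_day:.2f}")
print(f"profit per CPU day: ${profit_per_day:.2f}")
```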

Can we not think of anything for a CPU to do for a day that returns more than 30 cents profit?! That’s mind boggling for someone who can remember when access to CPU power was a bottleneck.

Sometimes computer time is very valuable. But the value of surplus computer time is negligible. I suppose it all has to do with bottlenecks. As soon as CPU time isn’t the bottleneck, its value plummets.

Update: According to the latest episode of the Security Now podcast, it has become unprofitable for hackers to steal CPU cycles in your browser for crypto mining, primarily because of a change in Monero. Even free cycles aren’t worth using for mining! Mining is only profitable on custom hardware.


[1] I imagine this person isn’t renting time from Amazon. He probably has his own hardware that he can run less expensively. But that means his profit margins are so thin that it would not be profitable to rent CPUs at 2.5 cents an hour.

Chaos + Chaos = Order
Thu, 07 Mar 2019 14:20:07 +0000

If you take these chaotic-looking values for your x-coordinates

and these chaotic-looking values for your y coordinates

you get this image that looks more ordered.

The image above is today’s exponential sum.

An attack on RSA with exponent 3
Wed, 06 Mar 2019 18:30:30 +0000

As I noted in this post, RSA encryption is often carried out reusing exponents. Sometimes the exponent is 3, which is subject to an attack we’ll describe below [1]. (The most common exponent is 65537.)

Suppose the same message m is sent to three recipients and all three use exponent e = 3. Each recipient has a different modulus Ni, and each will receive a different encrypted message

ci = m³ mod Ni.

Someone with access to c1, c2, and c3 can recover the message m as follows. We can assume each modulus Ni is relatively prime to the others, otherwise we can recover the private keys using the method described here. Since the moduli are relatively prime, we can solve the three equations for m³ using the Chinese Remainder Theorem. There is a unique x < N1 N2 N3 such that

x = c1 mod N1
x = c2 mod N2
x = c3 mod N3

and m is simply the cube root of x. What makes this possible is knowing that m is a positive integer less than each of the moduli Ni, so m³ < N1 N2 N3. Since x is the unique solution less than N1 N2 N3, x equals m³ as an integer. It follows that we can simply take the cube root in the integers and not the cube root in modular arithmetic.

This is an attack on “textbook” RSA because the weakness in this post could be avoided by real-world precautions such as adding random padding to each message so that no two recipients are sent the exact same message.

By the way, a similar trick works even if you only have access to one encrypted message. Suppose you’re using a 2048-bit modulus N and exchanging a 256-bit key. If your message m is simply the key without padding, then m³ is less than N, and so you can simply take the cube root of the encrypted message in the integers.
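A quick sketch of the single-message case, using only the standard library. To keep it fast, the modulus below is just a stand-in 2048-bit number rather than a product of two primes; that is an assumption for illustration, and the point is only that m³ never wraps around. The helper icbrt is a made-up name for a binary-search integer cube root.

```python
from secrets import randbits

def icbrt(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

N = (1 << 2047) + 1   # stand-in for a 2048-bit modulus, NOT a real RSA modulus
m = randbits(256)     # unpadded 256-bit key
c = pow(m, 3, N)      # "textbook" encryption with e = 3

assert c == m**3      # the reduction mod N did nothing
assert icbrt(c) == m  # so an ordinary cube root recovers the key
```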

Python example

Here we’ll work out a specific example using realistic RSA moduli.

    from secrets import randbits, randbelow
    from sympy import nextprime, integer_nthroot
    from sympy.ntheory.modular import crt

    def modulus():
        # product of two random primes of roughly 2048 bits each
        p = nextprime(randbits(2048))
        q = nextprime(randbits(2048))
        return p*q

    N = [modulus() for _ in range(3)]
    m = randbelow(min(N))
    c = [pow(m, 3, N[i]) for i in range(3)]
    x = crt(N, c)[0]  # equals m**3 as an integer

    root, exact = integer_nthroot(x, 3)  # integer cube root
    assert(exact and root == m)

Note that crt is the Chinese Remainder Theorem. It returns a pair of numbers, the first being the solution we’re after, hence the [0] after the call.

The script takes a few seconds to run. Nearly all the time goes to finding the 2048-bit (617-digit) primes that go into the moduli. Encrypting and decrypting m takes less than a second.


[1] I don’t know who first discovered this line of attack, but you can find it written up here. At least in the first edition; the link is to the 2nd edition which I don’t have.

Public key encryption based on squares and non squares
Wed, 06 Mar 2019 13:00:09 +0000

The RSA encryption algorithm depends indirectly on the assumption that factoring the product of large primes is hard. The algorithm presented here, invented by Shafi Goldwasser and Silvio Micali, depends on the same assumption but in a different way. The Goldwasser-Micali algorithm is more direct than RSA, though it is also less efficient.

One thing that makes GM interesting is that it allows a form of computing on encrypted data that we’ll describe below.

GM in a nutshell

To create a public key, find two large primes p and q and publish N = pq. (There’s one more piece we’ll get to shortly.) You keep p and q private, but publish N, much like with RSA.

Someone can send you a message, one bit at a time, by sending you numbers that either do or do not have a square root mod N.

Sending a 0

If someone wants to send you a 0, they send you a number that has a square root mod N. This is easy to do: they select a number between 1 and N at random, square it mod N, and send you the result.

Determining whether a random number is a square mod N is easy if and only if you know how to factor N. [1]

When you receive the number, you can quickly tell that it is a square because you know how to factor N. The sender knows that it’s a square because he got it by squaring something. You can produce a square without knowing how to factor N, but it’s computationally infeasible to start with a given number and tell whether it’s a square mod N, unless you know the factorization of N.

Sending a 1

Sending a 1 bit is a little more involved. How can someone who cannot factor N produce a number that’s not a square? That’s actually not feasible without some extra information. The public key is not just N. It’s also a number z that is not a square mod N. So the full public key is two numbers, N and z.

To generate a non-square, you first generate a square then multiply it by z.


Suppose you choose p = 314159 and q = 2718281. (Yes, p is a prime. See the post on pi primes. And q comes from the first few digits of e.) In practice you’d choose p and q to be very large, hundreds of digits, and you wouldn’t pick them to have a cute pattern like we did here. You publish N = pq = 853972440679 and imagine it’s too large for anyone to factor (which may be true for someone armed with only pencil and paper).

Next you need to find a number z that is not a square mod N. You do that by trying numbers at random until you find one that is not a square mod p and not a square mod q. You can do that by using Legendre symbols. It turns out z = 400005 will work.

So you tell the world your public key is (853972440679, 400005).

Someone wanting to send you a 0 bit chooses a number between 1 and N = 853972440679, say 731976377724. Then they square it and take the remainder by N to get 592552305778, and so they send you 592552305778. You can tell, using Legendre symbols, that this is a square mod p and mod q, so it’s a square mod N.

If they had wanted to send you a 1, they could have sent you 592552305778 * 400005 mod N = 41827250972, which you could tell isn’t a square mod N.
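Here is a minimal Python sketch of the scheme with the toy primes above. It is an illustration under a few assumptions: the function names are made up, Euler's criterion stands in for a Legendre symbol routine, and the code searches for its own z (the smallest one that works) rather than relying on the value quoted above.

```python
from math import gcd
from secrets import randbelow

p, q = 314159, 2718281   # toy primes from the example
N = p * q

def is_square_mod_prime(a, prime):
    """Euler's criterion: a is a nonzero square mod an odd prime
    exactly when a^((prime-1)/2) is 1 mod that prime."""
    return pow(a, (prime - 1) // 2, prime) == 1

def find_z():
    """Smallest z >= 2 that is a non-square mod p and mod q."""
    z = 2
    while is_square_mod_prime(z, p) or is_square_mod_prime(z, q):
        z += 1
    return z

def encrypt(bit, z):
    """Sender's side: needs only the public key (N, z)."""
    while True:
        x = randbelow(N - 1) + 1
        if gcd(x, N) == 1:
            break
    c = x * x % N                    # a random square mod N
    return c * z % N if bit else c   # multiply by z to send a 1

def decrypt(c):
    """Receiver's side: needs the private factors p and q."""
    square = is_square_mod_prime(c, p) and is_square_mod_prime(c, q)
    return 0 if square else 1

z = find_z()
bits = [1, 0, 1, 1, 0]
assert [decrypt(encrypt(b, z)) for b in bits] == bits
```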

Homomorphic encryption

Homomorphic encryption lets you compute things on encrypted data without having to first decrypt it. The GM encryption algorithm is homomorphic in the sense that you can compute an encrypted form of the XOR of two bits from an encrypted form of each bit. Specifically, if c1 and c2 are encrypted forms of bits b1 and b2, then c1 c2 is an encrypted form of b1 ⊕ b2. Let’s see why this is, and where there’s a small wrinkle.

Suppose our two bits are both 0s. Then c1 and c2 are squares mod N, and c1 c2 is a square mod N.

Now suppose one bit is a 0 and the other is a 1. Then either c1 is a square mod N and c2 isn’t or vice versa, but in either case their product is not a square mod N.

Finally suppose both our bits are 1s. Since 1⊕1 = 0, we’d like to say that c1 c2 is a square mod N. Is it?

The product of two non-squares is not necessarily a square. For example, 2 and 3 are not squares mod 35, and neither is their product 6 [2]. But if we followed the recipe above, and calculated c1 and c2 both by multiplying a square by the z in the public key, then we’re OK. That is, if c1 = x²z and c2 = y²z, then c1c2 = x²y²z², which is a square. So if you produce non-squares as expected, by multiplying a square by z, you get the homomorphic property. If you somehow find your own non-squares, they might not work.
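The wrinkle is easy to check by brute force for N = 35. In this sketch z = 3 is a choice made for illustration; it is not a square mod 5 or mod 7. The code confirms that non-squares built from z always multiply to a square, while the arbitrary non-squares 2 and 3 do not.

```python
from math import gcd

N = 35
units = [x for x in range(1, N) if gcd(x, N) == 1]
squares = {x * x % N for x in units}   # squares among the units mod 35

z = 3   # illustrative choice: a non-square mod 5 and mod 7
assert z not in squares

for x in units:
    for y in units:
        c1 = x * x * z % N   # an encrypted 1
        c2 = y * y * z % N   # another encrypted 1
        assert c1 not in squares and c2 not in squares
        assert c1 * c2 % N in squares   # product encrypts 1 XOR 1 = 0

# Non-squares found some other way need not cooperate.
assert 2 not in squares and 3 not in squares and 2 * 3 not in squares
```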


[1] As far as we know. There may be an efficient way to tell whether x is a square mod N without factoring N, but no such method has been published. The problem of actually finding modular square roots is equivalent to factoring, but simply telling whether modular square roots exist, without having to produce the roots, may be easier.

If quantum computing becomes practical, then factoring will be efficient and so telling whether numbers are squares modulo a composite number will be efficient.

[2] You could find all the squares mod 35 by hand, or you could let Python do it for you:

>>> set([x*x % 35 for x in range(35)])
{0, 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, 30}
An infinite product challenge
Tue, 05 Mar 2019 13:46:29 +0000

Gil Kalai wrote a blog post yesterday entitled “Test Your Intuition (or knowledge, or programming skills) 36.” The challenge is to evaluate the infinite product

\prod_{p\,\, \mathrm{prime}} \frac{p^2+1}{p^2 - 1}

I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used primerange from SymPy to generate the list of primes up to 200, and cumprod from NumPy to generate the list of partial products.

    cumprod([(p*p+1)/(p*p-1) for p in primerange(1,200)])

Apparently the product converges to 5/2, and a plot suggests that it converges very quickly.
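For anyone who wants to reproduce the numbers without SymPy or NumPy, here is a standard-library sketch. It swaps in a simple sieve for primerange and a running product for cumprod, so it is a stand-in for the computation described above rather than the original script.

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes less than n (n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n, i)))
    return [i for i in range(n) if sieve[i]]

partial = 1.0
for p in primes_below(2000):
    partial *= (p*p + 1) / (p*p - 1)

print(partial)         # partial product over primes below 2000
print(2.5 - partial)   # gap to the conjectured limit 5/2
```

Every factor is greater than 1, so the partial products increase toward the limit from below.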

Plot of partial products

Here’s another plot to look more closely at the rate of convergence. Here we look at the difference between 5/2 and the partial products, on a log scale, for primes less than 2000.

Plot of 2.5 minus partial products, log scale


Base85 encoding
Tue, 05 Mar 2019 13:00:55 +0000

I wrote a while back about Base32 and Base64 encoding, and yesterday I wrote about Bitcoin’s Base58 encoding. For completeness I wanted to mention Base85 encoding, also known as Ascii85. Adobe uses it in PostScript and PDF files, and git uses it for encoding patches.

Like Base64, the goal of Base85 encoding is to encode binary data as printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters.

Why 85?

There are 95 printable ASCII characters, and

log_95(2^32) = 4.87

and so it would take 5 characters to encode 4 bytes if you use all possible printable ASCII characters. Given that you have to use 5 characters, what’s the smallest base that will still work? It’s 85 because

log_85(2^32) = 4.993

and

log_84(2^32) = 5.006.

(If you’re not comfortable with logarithms, see an alternate explanation in the footnote [1].)
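The same conclusion can be checked with integer arithmetic, avoiding logarithms altogether. This is a small sketch; smallest_base is a made-up helper name.

```python
def smallest_base(bits=32, chars=5):
    """Smallest base b such that `chars` digits in base b
    can represent every `bits`-bit value."""
    b = 2
    while b**chars < 2**bits:
        b += 1
    return b

assert 95**4 < 2**32                # four printable characters are not enough
assert 84**5 < 2**32 <= 85**5       # with five characters, 85 is the cutoff
print(smallest_base())
```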

Now Base85 is different from the other bases I’ve written about because it only works on 4 bytes at a time. That is, if you have a number larger than 4 bytes, you break it into words of 4 bytes and convert each word to Base 85.

Character set

The 95 printable ASCII characters are 32 through 126. Base 85 uses characters 33 (“!”) through 117 (‘u’). ASCII character 32 is a space, so it makes sense you’d want to avoid that one. Since Base85 uses a consecutive range of characters, you can first convert a number to a pure mathematical radix 85 form, then add 33 to each number to find its Base85 character.


Suppose we start with the word 0x89255d9, equal to 143807961 in decimal.

143807961 = 2×85^4 + 64×85^3 + 14×85^2 + 18×85 + 31

and so the radix 85 representation is (2, 64, 14, 18, 31). Adding 33 to each we find that the ASCII values of the characters in the Base85 representation are (35, 97, 47, 51, 64), or (‘#’, ‘a’, ‘/’, ‘3’, ‘@’) and so #a/3@ is the Base85 encoding of 0x89255d9.
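The example can be reproduced in a few lines of Python. The helper base85_word is a made-up name for this sketch; the comparison at the end uses the standard library's base64.a85encode, which implements Ascii85 on bytes.

```python
import base64

def base85_word(word):
    """Radix-85 digits (most significant first) of a 32-bit word,
    and the corresponding Base85 characters."""
    digits = []
    for _ in range(5):
        word, r = divmod(word, 85)
        digits.append(r)
    digits.reverse()
    return digits, "".join(chr(d + 33) for d in digits)

digits, text = base85_word(0x89255d9)
print(digits, text)

# Cross-check against the standard library, which works on 4-byte groups.
assert base64.a85encode((0x89255d9).to_bytes(4, "big")).decode() == text
```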


The Z85 encoding method is also based on a radix 85 representation, but it chose to use a different subset of the 95 printable characters. Compared to Base85, Z85 adds seven characters

    v w x y z { }

and removes seven characters

    ` \ " ' _ , ;

to make the encoding work more easily with programming languages. For example, you can quote Z85 strings with single or double quotes because neither kind of quote is a valid Z85 character. And you don’t have to worry about escape sequences since the backslash character is not part of a Z85 representation.


There are a couple things that could trip someone up with Base85. First of all, Base 85 only works on 32-bit words, as noted above. For larger numbers it’s not a base conversion in the usual mathematical sense.

Second, the letter z can be used to denote a word consisting of all zeros. Since such words come up disproportionately often, this is a handy shortcut, though it means you can’t just divide characters into groups of 5 when converting back to binary.


[1] 95^4 = 81450625 < 2^32 = 4294967296, so four characters from an alphabet of 95 elements is not enough to represent 2^32 possibilities. So we need at least five characters.

85^5 = 4437053125 > 2^32, so five characters is enough, and in fact it’s enough for them to come from an alphabet of size 85. But 84^5 = 4182119424 < 2^32, so an alphabet of 84 characters isn’t enough to represent 32 bits with five characters.

Base 58 encoding and Bitcoin addresses
Mon, 04 Mar 2019 19:07:18 +0000

A few weeks ago I wrote about base32 and base64 encoding. I’ll review these quickly then discuss base58 and its use in Bitcoin.

Base32 and Base64

All three methods have the goal of compactly representing large numbers while maintaining readability. Douglas Crockford’s base32 encoding is the most conservative: it’s case-insensitive and it does not use the letters I, L, O, or U. The first three letters are omitted because of visual similarity to digits, and the last to avoid “accidental obscenities.”

Base 64 is not concerned with avoiding visual similarities, and uses the full upper and lower case alphabet, plus two more symbols, + and /.


Base58 is nearly as efficient as base64, but more concerned about confusing letters and numbers. The number 1, the lower case letter l, and the upper case letter I all look similar, so base58 retains the digit 1 and does not use the lower case letter l or the capital letter I.

The number 0 looks like the lower case letter o and the upper case letter O. Here base58 makes an unusual choice: it keeps the lower case letter o, but does not use the digit 0 or the capital letter O. This is odd because every other encoding that I can think of keeps the 10 digits and differs over what letters to use.

Bases like 32 and 64 have the advantage of being trivial to convert back and forth with binary. To convert a binary number to base 2^n, you start at the least significant end and convert groups of n bits. Since 58 is not a power of 2, converting to base 58 is more involved.

Bitcoin addresses

Bitcoin addresses are written in base58, and in fact base58 was developed for Bitcoin.

A Bitcoin address is a 25 byte (200 bit) number. Now

log_58(2^200) = 34.14

and so it may take up to 35 characters to represent a Bitcoin address in base58. Using base64 would have taken up to 34 characters, so base58 pays a very small price for preventing a class of errors relative to base64. Base32 would require 40 characters.
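Here is a sketch of converting an integer to base58 by repeated division. The alphabet string is an assumption, written out to match the description above (digits without 0, upper case without I and O, lower case without l). Real Bitcoin address encoding also has a special rule for leading zero bytes, which this sketch ignores.

```python
# Assumed alphabet: 1-9, A-Z without I and O, a-z without l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def to_base58(n):
    """Base58 digits of a non-negative integer, most significant first."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, r = divmod(n, 58)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

assert len(ALPHABET) == 58
largest = 2**200 - 1             # largest 25-byte number
print(len(to_base58(largest)))   # number of base58 characters needed
print(-(-200 // 6), 200 // 5)    # characters needed in base64 and base32
```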

As noted above, converting between binary and base58 is more complicated than converting between binary and either base32 or base64. However, converting to base58 is trivial compared to everything else that goes into forming a Bitcoin address. The steps, documented here, involve taking an ECDSA public key, applying a secure hash function three times, and appending a checksum.

Implementing the ChaCha RNG in Python
Sun, 03 Mar 2019 23:11:33 +0000

My previous post talked about the ChaCha random number generator and how Google is using it in a stream cipher for encryption on low-end devices. This post talks about how to implement ChaCha in pure Python.

First of all, the only reason to implement ChaCha in pure Python is to play with it. It would be more natural and more efficient to implement ChaCha in C.

RFC 8439 gives detailed, language-neutral directions for how to implement ChaCha, including test cases for intermediate results. At its core is the function that does a “quarter round” operation on four unsigned integers. This function depends on three operations:

  • addition mod 2^32, denoted +
  • bitwise XOR, denoted ^, and
  • bit rotation, denoted <<<=n.

In C, the += operator on unsigned integers would do what the RFC denotes by +=, but in Python working with (signed) integers we need to explicitly take remainders mod 2^32. The Python bitwise XOR operator ^ can be used directly. We’ll write a function roll that corresponds to <<<=.

So the following line of pseudocode from the RFC

    a += b; d ^= a; d <<<= 16;

becomes

    a = (a+b) % 2**32; d = roll(d^a, 16)

in Python. One way to implement roll would be to use the bitstring library:

    from bitstring import Bits

    def roll(x, n):
        bits = Bits(uint=x, length=32)
        return (bits[n:] + bits[:n]).uint

Another approach, a little harder to understand but not needing an external library, would be

    def roll2(x, n):
        return (x << n) % (2 << 31) + (x >> (32-n))

So here’s an implementation of the ChaCha quarter round:

    def quarter_round(a, b, c, d):
        a = (a+b) % 2**32; d = roll(d^a, 16)
        c = (c+d) % 2**32; b = roll(b^c, 12)
        a = (a+b) % 2**32; d = roll(d^a,  8)
        c = (c+d) % 2**32; b = roll(b^c,  7)
        return a, b, c, d
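
As a sanity check, here is a self-contained version using a bitwise roll, run on the quarter round test vector that, as far as I recall, appears in section 2.1.1 of RFC 8439. The expected values below also agree with working through the four steps by hand.

```python
MASK = 0xffffffff

def roll(x, n):
    """Rotate a 32-bit unsigned integer left by n bits."""
    return ((x << n) & MASK) | (x >> (32 - n))

def quarter_round(a, b, c, d):
    a = (a + b) & MASK; d = roll(d ^ a, 16)
    c = (c + d) & MASK; b = roll(b ^ c, 12)
    a = (a + b) & MASK; d = roll(d ^ a, 8)
    c = (c + d) & MASK; b = roll(b ^ c, 7)
    return a, b, c, d

result = quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
assert result == (0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)
print([hex(r) for r in result])
```

Masking with 0xffffffff does the same job as taking the remainder mod 2**32.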

ChaCha has a state consisting of 16 unsigned integers. A “round” of ChaCha consists of four quarter rounds, operating on four of these integers at a time. All the details are in the RFC.

Incidentally, the inner workings of the BLAKE2 secure hash function are similar to those of ChaCha.

Google Adiantum and the ChaCha RNG
Sat, 02 Mar 2019 22:11:09 +0000

The ChaCha cryptographic random number generator is in the news thanks to Google’s Adiantum project. I’ll discuss what’s going on, but first a little background.

Adiantum maidenhair fern

The name of the project comes from a genus of fern. More on that below as well.

One-time pads

The one-time pad is a provably unbreakable way to encrypt things. You create a sheet of random bits and give your counterpart an exact copy. Then when it comes time for you to send an encrypted message, you convert your message to a stream of bits, XOR your message with the random bits you exchanged previously, and send the result. The recipient then takes the XOR of the received message with the pad of random bits, and recovers the original message.

This is called a one-time pad because it’s a pad of bits that you can only use one time. If you reuse a pad, it’s no longer unbreakable.
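A minimal sketch of a one-time pad in Python, including what goes wrong when a pad is reused. The messages are made up for illustration, and secrets.token_bytes stands in for a source of truly random bits.

```python
from secrets import token_bytes

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b"attack at dawn"
pad = token_bytes(len(message))   # shared in advance with the recipient

ciphertext = xor(message, pad)
assert xor(ciphertext, pad) == message   # recipient recovers the message

# Reusing the pad: XORing two ciphertexts cancels the pad entirely,
# leaving the XOR of the two plaintexts for an eavesdropper to analyze.
other = b"retreat at ten"
ciphertext2 = xor(other, pad)
assert xor(ciphertext, ciphertext2) == xor(message, other)
```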

One-time pads are impractical for a couple reasons. First, it’s hard to generate truly random bits, especially in bulk. Second, exchanging the pads is almost as difficult as exchanging messages.

Stream ciphers

So here’s a bright idea: we’ll get around both of the problems with one-time pads by using pseudorandom bits rather than random bits! Then both parties can generate their own random bits.

Many people have had this idea, and it’s not necessarily a bad one. It’s called a stream cipher. The problem is that most pseudorandom number generators are not up to the task. You need a cryptographically secure RNG, and most RNGs are far from secure. The ChaCha RNG, however, appears to be good enough to use in a stream cipher, given enough rounds of scrambling [1], and Google is using it for full disk encryption in Android devices.

Full disk encryption

If you forget your password to your computer, you may not be able to access your data, but a thief still could by removing the hard drive and accessing it from another computer. That is, unless the disk is encrypted.

Full disk encryption on a laptop, such as BitLocker on Windows or FileVault on OSX, is usually implemented via AES encryption with hardware acceleration. If you don’t have special hardware for encryption, AES can be too slow.

Adiantum: ChaCha encryption on Android

On low-end devices, ChaCha encryption can be around 5x faster than AES. So Google is using ChaCha for Android devices, using what it calls Adiantum.

You can read the technical details in [2], and you can read more about the ChaCha random number generator in [3].

So where does the name Adiantum come from? It’s a Victorian name for a genus of maidenhair ferns, symbolic of sincerity and discretion.


[1] Adiantum uses ChaCha with 12 rounds. TLS 1.3 uses ChaCha with 20 rounds.

[2] Adiantum: length-preserving encryption for entry-level processors by Google employees Paul Crowley and Eric Biggers.

[3] IRTF RFC 8439: ChaCha20 and Poly1305 for IETF Protocols

Congress and the Equifax data breach
Sat, 02 Mar 2019 16:24:32 +0000

Dialog from a congressional hearing February 26, 2019.

Representative Katie Porter: My question for you is whether you would be willing to share today your social security, your birth date, and your address at this public hearing.

Equifax CEO Mark Begor: I would be a bit uncomfortable doing that, Congresswoman. If you’d so oblige me, I’d prefer not to.

KP: Could I ask you why you’re unwilling?

MB: Well that’s sensitive information. I think it’s sensitive information that I like to protect, and I think consumers should protect theirs.

KP: My question is then, if you agree that exposing this kind of information, information like that you have in your credit reports, creates harm, therefore you’re unwilling to share it, why are your lawyers arguing in federal court that there was no injury and no harm created by your data breach?

Sharing secrets with polynomials
Fri, 01 Mar 2019 02:18:56 +0000

This post will present a couple ways to share secrets using polynomials. We have a group of n people who want to share a secret between them so that k of them will have to cooperate in order to unlock the secret. For example, maybe a committee of n = 5 wants to require the cooperation of at least k = 3 members.

Shamir’s method

Adi Shamir came up with the idea of using polynomials to share secrets as follows. First, encode the secret you want to share as an integer a0. Next, generate m = k-1 other random integers a1 through am and use these as coefficients of a polynomial f of degree m:

f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m

A trusted party generates n random integer values of x and gives each person an x and the corresponding value of f(x). Since m+1 points completely determine an mth degree polynomial, if k = m+1 people share their data, they can recover f, and thus recover the secret number a0. This can be done efficiently, for example, by using Lagrange interpolation. But with fewer than k data points, the polynomial remains undetermined.

In practice we’d work over the integers modulo a large prime p. While fewer than k data points will not let someone completely determine the polynomial f, it will narrow down the possible coefficients if we’re working over the integers. Working modulo a large prime instead reveals less information.
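Here is a short sketch of Shamir's method over the integers mod a prime. Several details are assumptions made for illustration: the prime is the Mersenne prime 2^127 − 1, the x values are simply 1 through n rather than random, and the function names are made up. Recovery uses Lagrange interpolation evaluated at 0.

```python
from secrets import randbelow

P = 2**127 - 1   # a Mersenne prime, chosen here for convenience

def make_shares(secret, k, n):
    """Return n points on a random polynomial of degree k-1
    whose constant term is the secret."""
    coeffs = [secret] + [randbelow(P) for _ in range(k - 1)]
    def f(x):
        y = 0
        for a in reversed(coeffs):   # Horner's rule mod P
            y = (y * x + a) % P
        return y
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0, all arithmetic mod P."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

secret = randbelow(P)
shares = make_shares(secret, k=3, n=5)
assert recover(shares[:3]) == secret                         # any three shares work
assert recover([shares[0], shares[2], shares[4]]) == secret
```

The call pow(den, -1, P) computes a modular inverse and requires Python 3.8 or later.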

Verifiable secret sharing

There’s a possible problem with Shamir’s method. Maybe the trusted party made a mistake. Or maybe the trusted party was dishonest and shouldn’t have been trusted. How can the parties verify that they have been given valid data without unlocking the secret? Seems we’re at a logical impasse since you’d have to recover the polynomial to know if your points are on the polynomial.

Paul Feldman came up with a way to assure the participants that the secret can be unlocked without giving them the information to unlock it. The trick is to give everyone data that in principle would let them determine the polynomial, but in practice would not.

We choose a large prime p such that p-1 has a large prime factor q [1]. Then the multiplicative group of non-zero integers mod p has a subgroup of order q. Let g be a generator of that group. The idea is to let everyone verify that

y_i = f(x_i)

for their given (xi, yi) by letting them verify that

g^{y_i} = g^{f(x_i)}

where all calculations are carried out mod p. Our trusted party does this by computing

A_i \equiv g^{a_i}\pmod{p}

for each coefficient ai and letting everyone know g and each of the Ai’s.

In principle, anyone could solve for a0 if they know A0. But in practice, provided q is large enough, this would not be possible because doing so would require solving the discrete logarithm problem, which is computationally difficult. It’s possible to compute discrete logarithms for small q, but the difficulty goes up quickly as q gets larger.

How do the Ai’s let everyone verify that their (xi, yi) data is correct?

Each person can verify that

g^{y_i} = \prod_{j=0}^m A_j^{x_i^j} 

using the public data and their personal data, and so they can verify that

g^{y_i} = \prod_{j=0}^m A_j^{x_i^j} = \prod_{j=0}^m g^{a_j x_i^j} = g^{f(x_i)} 
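
The verification step can be tried out with toy parameters. In this sketch q = 1019, p = 2q + 1 = 2039, and g = 4 are choices made for illustration (far too small for real use), the polynomial coefficients are taken mod q, and the function names are made up.

```python
from secrets import randbelow

q = 1019        # toy prime; p - 1 = 2q
p = 2 * q + 1   # 2039, also prime
g = 4           # a square mod p other than 1, so it generates the order-q subgroup
assert g != 1 and pow(g, q, p) == 1

coeffs = [randbelow(q) for _ in range(3)]   # a_0, a_1, a_2; a_0 is the secret
A = [pow(g, a, p) for a in coeffs]          # published commitments A_j = g^(a_j)

def f(x):
    """The secret polynomial, evaluated mod q."""
    return sum(a * x**j for j, a in enumerate(coeffs)) % q

def verify(x, y):
    """Check g^y against the product of A_j^(x^j), using public data only."""
    rhs = 1
    for j, Aj in enumerate(A):
        rhs = rhs * pow(Aj, x**j, p) % p
    return pow(g, y, p) == rhs

for x in range(1, 6):
    assert verify(x, f(x))                  # honest shares pass
    assert not verify(x, (f(x) + 1) % q)    # a corrupted share fails
```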


[1] Conceptually you pick p’s until you find one so that p-1 has a large prime factor q. In practice, you’d do it the other way around: search for large primes q until you find one such that, say, 2q + 1 is also prime.

Miscellaneous
Wed, 27 Feb 2019 14:51:45 +0000

Image editor

Image editing software is complicated, and I don’t use it often enough to remember how to do much. I like Paint.NET on Windows because it is in a sort of sweet spot for me, more powerful than Paint and much less complicated than Photoshop.

I found out there’s a program Pinta for Linux that was inspired by Paint.NET. (Pinta runs on Windows, Mac, and BSD as well.)

Exponential sum of the day

I have a page that draws a different image every day, based on putting the month, day, and the last two digits of the year into an exponential sum. This year’s images have been more intricate than last year’s because 19 is prime.

I liked today’s image.

exponential sum for 2019-02-27

The page has a link to details explaining the equation behind the image, and an animate link to let you see the sequence in which the points are traversed.

Podcast interview

Rebecca Herold posted a new episode of her podcast yesterday in which she asks me questions about privacy and artificial intelligence.

Entropy update

I updated my blog post on solving for probability from entropy because Sjoerd Visscher pointed out that a crude approximation I used could be made much more accurate with a minor tweak.

As a bonus, the new error plot looks cool.

approximation error on log scale

My monthly newsletter comes out tomorrow. This newsletter highlights the most popular blog posts of the month.

I used to say something each month about what I’m up to. Then I stopped because it got to be repetitive. Tomorrow I’ll include a few words about projects I have coming up.

The letter S

I was helping my daughter with physics homework last night and she said “Why do they use s for arc length?!” I said that I don’t know, but that it is conventional.

By the way, this section heading is a reference to Donald Knuth’s essay The Letter S where he writes in delightful Knuthian detail about the design of the letter S in TeX. You can find the essay in his book Literate Programming.

What sticks in your head
Tue, 26 Feb 2019 14:40:36 +0000

This morning I read an article by Dennis Felsing about his impressive/intimidating Linux desktop setup. He uses a lot of tools that are not the easiest way to get things done immediately but are long-term productivity investments.

Remembrance of syntax past

Felsing apparently is able to remember the syntax of scores of tools and programming languages. I cannot. Part of the reason is practice. I cannot remember the syntax of any software I don’t use regularly. It’s tempting to say that’s the end of the story: use it or lose it. Everybody has their set of things they use regularly and remember.

But I don’t think that’s all. I remember bits of math that I haven’t used in 30 years. Math fits in my head and sticks. Presumably software syntax sticks in the heads of people who use a lot of software tools.

There is some software syntax I can remember, however, and that’s software closely related to math. As I commented here, it was easy to come back to Mathematica and LaTeX after not using them for a few years.


Imprinting has something to do with this too: it’s easier to remember what we learn when we’re young. Felsing says he started using Linux in 2006, and his site says he graduated college in 2012, so presumably he was a high school or college student when he learned Linux.

When I was a student, my software world consisted primarily of Unix, Emacs, LaTeX, and Mathematica. These are all tools that I quit using for a few years, later came back to, and use today. I probably remember LaTeX and Mathematica syntax in part because I used it when I was a student. (I also think Mathematica in particular has an internal consistency that makes its syntax easier to remember.)

Picking your memory battles

I see the value in Felsing’s choice of tools. For example, the xmonad window manager. I’ve tried it, and I could imagine that it would make you more productive if you mastered it. But I don’t see myself mastering it.

I’ve learned a few tools with lots of arbitrary syntax, e.g. Emacs. But since I don’t have a prodigious memory for such things, I have to limit the number of tools I try to keep loaded in memory. Other things I load as needed, such as a language a client wants me to use that I haven’t used in a while.

Revisiting a piece of math doesn’t feel to me like revisiting a programming language. Brushing up on something from differential equations, for example, feels like pulling a book off a mental shelf. Brushing up on C# feels like driving to a storage unit, bringing back an old couch, and struggling to cram it in the door.

Middle ground

There are things you use so often that you remember their syntax without trying. And there are things you may never use again, and it’s not worth memorizing their syntax just in case. Some things are in the middle: things you don’t use often enough to naturally remember, but often enough that you’d like to deliberately remember them. Some of these are what I call bicycle skills, things that you can’t learn just-in-time. For things in this middle ground, you might try something like Anki, a flashcard program with spaced repetition.

However, this middle ground should be very narrow, at least in my experience/opinion. For the most part, if you don’t use something often enough to keep it loaded in memory, I’d say either let it go or practice using it regularly.

]]> 2
Testing for primes less than a quintillion Mon, 25 Feb 2019 13:15:40 +0000 The most common way to test whether a large number is prime is the Miller-Rabin test. If the test says a number is composite, it’s definitely composite. Otherwise the number is very likely, but not certain, to be prime. A pseudoprime is a composite number that slips past the Miller-Rabin test. (Actually, a strong pseudoprime. More on that below.)

Miller-Rabin test

The Miller-Rabin test is actually a sequence of tests, one for each prime number. First you run the test associated with 2, then the test associated with 3, then the one associated with 5, etc. If we knew the smallest number for which the first few of these tests fail, then any smaller number that passes them is certainly prime. In other words, we can turn the Miller-Rabin test for probable primes into a test for provable primes.

Lower bound on failure

A recent result by Yupeng Jiang and Yingpu Deng finds the smallest number for which the Miller-Rabin test fails for the first nine primes. This number is

N = 3,825,123,056,546,413,051

or more than 3.8 quintillion. So if a number passes the first nine Miller-Rabin tests, and it’s less than N, then it’s prime. Not just a probable prime, but definitely prime. For a number n < N, this will be more efficient than running previously known deterministic primality tests on n.

Python implementation

Let’s play with this in Python. The SymPy library implements the Miller-Rabin test in a function mr.
The following shows that N is composite, and that it is a false positive for the first nine Miller-Rabin tests.

    from sympy.ntheory.primetest import mr

    N = 3825123056546413051
    assert(N == 149491*747451*34233211)
    ps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
    print( mr(N, ps) )

This doesn’t prove that N is the smallest number with these properties; we need the proof of Jiang and Deng for that. But assuming their result is right, here’s an efficient deterministic primality test that works for all n less than N.

    def is_prime(n):
        N = 3825123056546413051
        assert(n < N)
        ps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
        return mr(n, ps)

Jiang and Deng assert that N is also the smallest composite number to slip by the first 10 and 11 Miller-Rabin tests. We can show that N is indeed a strong pseudoprime for the 10th and 11th primes, but not for the 12th prime.

    print( mr(N, [29, 31]) )
    print( mr(N, [37]) )

This code prints True for the first test and False for the second. That is, N is a strong pseudoprime for bases 29 and 31, but not for 37.

Pseudoprimes and strong pseudoprimes

Fermat’s little theorem says that if n is prime, then

a^(n-1) = 1 mod n

for all 0 < a < n.  This gives a necessary but not sufficient test for primality. A (Fermat) pseudoprime for base a is a composite number n such that the above holds, an example of where the test is not sufficient.
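Here is a minimal sketch of the Fermat test using Python's built-in three-argument pow. The function name fermat_passes is my own, introduced only for illustration. The example uses 341 = 11*31, a composite number that passes the test for base 2 but not for base 3.

```python
def fermat_passes(n, a):
    """Return True if a^(n-1) = 1 mod n, i.e. n passes the Fermat test for base a."""
    return pow(a, n - 1, n) == 1

# 341 = 11 * 31 is composite, yet it passes the Fermat test for base 2,
# so it is a (Fermat) pseudoprime for base 2.
print(fermat_passes(341, 2))  # True

# Base 3 exposes 341 as composite.
print(fermat_passes(341, 3))  # False
```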

The Miller-Rabin test refines Fermat’s test by looking at additional necessary conditions for a number being prime. Often a composite number will fail one of these conditions, but not always. The composite numbers that slip by are called strong pseudoprimes or sometimes Miller-Rabin pseudoprimes.

Miller and Rabin’s extra testing starts by factoring n-1 into 2^s d where d is odd. If n is prime, then for all 0 < a < n either

a^d = 1 mod n

or

a^(2^k d) = -1 mod n

for some k satisfying 0 ≤ k < s. If one of these two conditions holds for a particular a, then n passes the Miller-Rabin test for the base a.

It wouldn’t be hard to write your own implementation of the Miller-Rabin test. You’d need a way to work with large integers and to compute modular exponents, both of which are included in Python without having to use SymPy.
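As a rough sketch of what such an implementation might look like, here is a version using only built-in Python integers and pow. The names passes_miller_rabin and is_probable_prime are mine, not SymPy's, and the trial division by the bases is a convenience I added so that small inputs are handled sensibly.

```python
def passes_miller_rabin(n, a):
    """Return True if odd n > 2 passes the Miller-Rabin test for base a."""
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    # First condition: a^d = 1 mod n (a^d = -1 is the k = 0 case of the second).
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    # Second condition: a^(2^k d) = -1 mod n for some 0 < k < s.
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23)):
    """Run the Miller-Rabin test for each base, after trial division by the bases."""
    if n < 2:
        return False
    for p in bases:
        if n == p:
            return True
        if n % p == 0:
            return False
    return all(passes_miller_rabin(n, a) for a in bases)

N = 3825123056546413051
print(is_probable_prime(N))        # True, although N is composite
print(passes_miller_rabin(N, 37))  # False
print(passes_miller_rabin(561, 2)) # False
```

With the default bases, and assuming the Jiang and Deng result, is_probable_prime gives a definite answer for every n less than N.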


561 is a pseudoprime for base 2. In fact, 561 is a pseudoprime for every base relatively prime to 561, i.e. it’s a Carmichael number. But it is not a strong pseudoprime for 2 because 560 = 16*35, so d = 35 and

2^35 = 263 mod 561,

which is not congruent to 1 or to -1. Squaring repeatedly gives 2^70 = 166, 2^140 = 67, and 2^280 = 1 mod 561, none of which is congruent to -1 either. In Python,

    >>> pow(2, 560, 561)
    1
    >>> pow(2, 35, 561)
    263

]]> 2
The point at infinity Mon, 25 Feb 2019 00:30:52 +0000 As I explained in an earlier post, a first pass at the definition of an elliptic curve is the set of points satisfying

y² = x³ + ax + b.

There are a few things missing from this definition, as indicated before, one being the mysterious “point at infinity.” I gave a hand-waving explanation that you could get rid of this exceptional point by adding an additional coordinate. Here I’ll describe that in more detail.

Projective coordinates

You could add another coordinate z that’s a sort of silent partner to x and y most of the time. Instead of pairs (x, y), we consider equivalence classes of triples (x, y, z) where two triples are equivalent if each is a non-zero multiple of the other [1]. It’s conventional to use the notation (x : y : z) to denote the equivalence class of (x, y, z).

In this construction, the equation of an elliptic curve is

y²z = x³ + axz² + bz³.

Since triples are in the same equivalence class if each is a multiple of the other, we can usually set z equal to 1 and identify the pair (x, y) with (x : y : 1). The “point at infinity” corresponds to the equivalence class (0 : 1 : 0).
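To make the equivalence classes concrete, here is a small sketch that picks a canonical representative of (x : y : z) over the integers mod a prime p. The function name normalize and the choice of representative (scale so the last non-zero coordinate is 1) are my own conventions for illustration. The inverse is computed with Fermat's little theorem, so p is assumed to be prime.

```python
def normalize(point, p):
    """Canonical representative of (x : y : z) over the integers mod a prime p."""
    x, y, z = (c % p for c in point)
    if (x, y, z) == (0, 0, 0):
        raise ValueError("(0, 0, 0) is not a point of projective space")
    # Scale so that the last non-zero coordinate becomes 1.
    for c in (z, y, x):
        if c != 0:
            inv = pow(c, p - 2, p)  # inverse of c mod p, valid because p is prime
            return (x * inv % p, y * inv % p, z * inv % p)

# (3 : 2 : 2) and (4 : 1 : 1) are the same point mod 5.
print(normalize((3, 2, 2), 5))  # (4, 1, 1)

# Every non-zero multiple of (0, 1, 0) is the point at infinity.
print(normalize((0, 3, 0), 5))  # (0, 1, 0)
```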

Programming hack

From a programming perspective, you could think of z as a finiteness flag, a bit that is set to indicate that the other two coordinates can be taken at face value.

Projective space

This three-coordinate version is called projective coordinates. Textbooks usually start out by defining projective space and then say that an elliptic curve is a set of points in this space. But if you’re focused on the elliptic curve itself, you can often avoid thinking of the projective space it sits in.

One way to think of projective space is that we add a dimension, the extra coordinate, then subtract a dimension by taking equivalence classes. By doing so we almost end up back where we started, but not quite. We have a slightly larger space that includes a line of “points at infinity,” the points with z = 0, exactly one of which is on our curve.

Alternating tools

It’s inconvenient to carry around an extra coordinate that mostly does nothing. But it’s also inconvenient to have a mysterious extra point. So which is better? Much of the time you can ignore both the point at infinity and the extra coordinate. When you can’t, you have a choice which way you’d rather think of things. The point at infinity may be easier to think about conceptually, and projective coordinates may be better for doing proofs.

Concrete example

Let’s get concrete. We’ll look at the curve

y² = x³ + x + 1

over the integers mod 5. There are nine points on this curve: (0, ±1), (2, ±1), (3, ±1), (4, ±2), and ∞. (You could replace -1 with 4 and -2 with 3 if you’d like since we’re working mod 5.)

In the three-coordinate version, the points are (0 : ±1 : 1), (2 : ±1 : 1), (3 : ±1 : 1), (4 : ±2 : 1), and (0 : 1 : 0).
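The point count is easy to check by brute force. The following sketch enumerates the affine solutions mod 5 and then confirms that each corresponding triple, along with (0 : 1 : 0), satisfies the projective equation. The variable and function names are mine, chosen for illustration.

```python
p, a, b = 5, 1, 1  # the curve y^2 = x^3 + x + 1 over the integers mod 5

# Affine points: pairs (x, y) satisfying the two-coordinate equation.
affine = [(x, y) for x in range(p) for y in range(p)
          if (y * y - (x**3 + a * x + b)) % p == 0]

def on_projective_curve(x, y, z):
    """Check y^2 z = x^3 + a x z^2 + b z^3 mod p."""
    return (y * y * z - (x**3 + a * x * z * z + b * z**3)) % p == 0

# Each affine point (x, y) becomes (x : y : 1); add the point at infinity.
projective = [(x, y, 1) for (x, y) in affine] + [(0, 1, 0)]

print(affine)
print(len(projective))  # 9
print(all(on_projective_curve(*pt) for pt in projective))  # True
```

Here 4 plays the role of -1 and 3 the role of -2, so (0, 4) in the output is the point (0, -1), and so on.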

[1] We leave out (0, 0, 0). It doesn’t exist in the world we’re constructing, i.e. projective space.

]]> 2
More of everything Fri, 22 Feb 2019 18:50:57 +0000 If you want your music to have more bass, more mid-range, and more treble, then you just want the music louder. You can increase all three components in absolute terms, but not in relative terms. You can’t increase the proportions of everything.

Would you like more students to major in STEM subjects? OK, what subjects would you like fewer students to major in? English, perhaps? Administrators are applauded when they say they’d like to see more STEM majors, but they know better than to say which majors they’d like to see fewer of.

We have a hard time with constraints.

I’m all for win-win, make-the-pie-bigger solutions when they’re possible. And often they are. But sometimes they’re not.

]]> 2