Recovering a permuted seed phrase

As mentioned in the previous post, crypto wallets often turn a list of words, known as a seed phrase, into private keys. These words come from a list of 211 = 2048 words chosen to be distinct and memorable. You can find the list here. Typically a seed phrase will contain 12 words. For example:

prize, differ, year, cloth, require, wild, clean, kit, cart, knock, biology, above

Each word carries 11 bits of information, so in total this represents 132 bits, which is 128 bits of content and 4 bits of checksum.

Now suppose you memorized your list of 12 words, but then later you don’t remember them in the correct order. Then you have about half a billion permutations to try because

12! = 479,001,600.

However, not all of these permutations are equally likely to be correct. You are far more likely to transpose a couple words, for example, than to completely scramble the list. So you would start with the list of words in the remembered order, then explore further and further afield from initial sequence. In mathematical terms, you’ll try the permutations of your remembered sequence in order of Cayley distance.

There are 66 permutations of a list of 12 things that are a transposition of two elements. So if the sequence you started with was

abcdefghijkl

then you’d try things like bacdefghijkl or ebcdafhghijkl. There are 66 possibilities because there are 66 ways to choose 2 items from a set of 12; you’re choosing the two items to swap.

If the above didn’t work, you could next explore permutations that are at a Cayley distance of 2 from what you remember. Now there are 1,925 possibilities. At a Cayley distance of 3 there are 32,670 possibilities.

In general, the number of arrangements at a Cayley distance k from the original sequence of n items equals

|S1(n, nk)|

where S1 denotes the Stirling number of the first kind.

There are other approaches to measuring how far apart two permutations are. Another possibility is Kendall distance which counts the number of adjacent transpositions needed to turn one sequence into another. Kendall distance is sometimes called bubble sort distance by analogy with the bubble sort algorithm. Kendall distance may be more appropriate than Cayley distance because people are more likely to accidentally transpose adjacent elements of a sequence.

Neither Cayley distance nor Kendall distance exactly correspond to the corruption of human memories, but both would guide you in correcting likely changes before exploring unlikely changes.

Related posts

What’s in your wallet?

What’s in your Bitcoin wallet? Very little. I don’t mean very little value, but very little data. If you’re a Bitcoin billionaire, your wallet still doesn’t contain very many bits.

You might reasonably expect that a crypto wallet is a container, given the analogy with an ordinary wallet, but it’s not much of a container. It’s more like a bank teller than a physical wallet. It’s primarily a piece of software that facilitates transactions.

It’s misleading to speak of a wallet as holding cryptocurrency. It holds private keys that allow you to access cryptocurrency, so in that sense it’s a password manager. Not only that, these days it’s effectively a password manager containing only one password.

When you back up a wallet, say the Exodus wallet, it asks you to store a seed phrase somewhere safe. Fine, but how do you back up your wallet, not just your seed phrase? How do you back up the data in your wallet? You don’t, because there isn’t any data in your wallet other than the seed phrase [1]. Everything is derived from your seed phrase, at least for the current style of wallet, technically known as a hierarchical deterministic (HD) wallet. The derivation algorithm is documented in BIP39.

The first crypto wallets held a set of private keys. But now wallets derive private keys from your seed phrase. So even though you may have used multiple private keys, the wallet doesn’t store multiple keys. It could store them, but it doesn’t need to. That’s why you can back up your wallet by only backing up your seed phrase.

Related posts

[1] A wallet may contain metadata, such as notes on transactions or software configuration preferences. This data isn’t recovered if you restore you wallet from just your seed phrase. But all your private keys can be regenerated. Think of the private keys as traditional metal house keys and the metadata as a sticky note on top of the keys. If you lose your keys and the note, you’d be happy to get the keys back.

Randomly generated dragon

Here’s a simple way to generate a fractal known as the Twin Dragon. Start with random values of x and y and repeatedly update the according to the rule

xnew = (− xold + yold)/2 − b
ynew = (− xoldyold)/2

where b is randomly chosen to be 0 or 1 with equal probability. The plot of these points fills in the Twin Dragon.

Here’s what the plot looks like after 10,000 iterations.

And after 100,000 iterations.

Here’s the Python script I used to make the plots.

import matplotlib.pyplot as plt
from numpy.random import random, choice

x, y = random(), random()
for _ in range(100000):
    x, y = (-x + y)/2, (-x - y)/2
    x -= choice([0, 1])
    plt.plot(x, y, 'bo', markersize=1)
plt.show()

The algorithm used here is a particularly special case of a more general algorithm for generating similar fractals found in [1].

Update: Here’s a different way to produce the same fractal from Donald Knuth.

Here’s a similar post from several years ago:
The chaos game and the Sierpinski triangle

[1] Darst, Palagallo, and Price. Fractal Tilings in the Plane. Mathematics Magazine [71]:1, 1998.

Converting very long strings to integers in Python

In the process of writing the previous post, I wanted to confirm that the number in the post

really is prime. This was useful in debugging my manual conversion of the image to text: errors did not result in a prime number. For example, I didn’t see the 9’s in the image at first, and I didn’t get a prime number until I changed four of the 8’s to 9’s.

But there was a problem along the way. Simply converting the string to an integer didn’t work. It produced the following error:

SyntaxError: Exceeds the limit (4300) for integer string conversion: value has 5382 digits; use sys.set_int_max_str_digits() to increase the limit – Consider hexadecimal for huge integer literals to avoid decimal conversion limits.

Note that the limitation is not on the size of a Python integer. The only limitation on the size of an integer is available memory. But there is a limitation on the size of string that can be converted to an integer.

The fix suggested in the error message didn’t work. But storing the number as several strings, i.e. each row of the image, and doing my own radix conversion did work.

from sympy import isprime

flaglines = [
    "888888888888888888888888888888888888888888888888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118888811188988111888881118888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118898811188888111888881118888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111889881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111888881118898811188888111888881118888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888888888888888888888888888888888888888888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888888888888888888888888888888888888888888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
]

m = int(flaglines[0])
for i in range(1, len(flaglines)):
    line = flaglines[i]
    x = int(line)
    m = m * 10**len(line) + x
print(isprime(n))

American Flag Prime

The following prime number looks like a black-and-white image of an American flag.

The number mostly consists of the digits 1, 3, and 8, but there are a few 9’s. The following image colors the 8’s blue, the 3’s red, and the 1’s white. The background is gray so you can see the 1s.

I found this in [1]. The article includes a link to a text version of the number, but the link is broken, so I had to convert the image to text.

Unfortunately tesseract did a poor job, and so did Grok, so I ended up more or less doing the conversion by hand. I made a text version and posted it here.

See the next post for a calculation showing that the number is indeed prime.

[1] The United States of America Prime. Vadim Ponomarenko. Math Horizons, Vol. 28, No. 4 (April 2021)

Uppercase Eszett

I’ve typed Karl Weierstrass’ name quite a few times lately and thought about how you’ll sometimes see his name written as Weierstraß in English text. That led me to look up the rules for when to use ß and when to use ss. The rules are moderately complicated, and have varied over time and location. Here I just want to look at the capitalization rules.

Typically ß is replaced with SS when converting from lower case to upper case. This means that the length of a string can change when changing case. Surely this has caused numerous software bugs.

>>> w = "Weierstraß"
>>> len(w), len(w.upper())
(10, 11)

There was no uppercase counterpart of ß (U+00DF) until ẞ (U+1E9E) was introduced in 2008. I wondered whether the code above would run differently if I set my locale to de_DE (Germany). Would w.upper() return WEIERSTRASS or WEIERSTRAẞ?

It turns out that Python follows Unicode’s case mapping rules, and these rules say ß becomes SS when changing to uppercase. The code will run the same everywhere, independent of locale. So if you want ß to convert to uppercase as ẞ you’ll have to use customized code.

ASCII was designed so that uppercase and lowercase English letters differed by 32 (i.e. 0x20 in hex). This convention was carried over into Unicode for other alphabets, with a few exceptions, and it almost holds for German as the following code shows.

upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜẞ"
lower = "abcdefghijklmnopqrstuvwxyzäöüß"
for u, l in zip(upper, lower):
    if ord(l) - ord(u) != 32:
        print("Exception:", u, l)

This prints

Exception: ẞ ß

The code points for Ä and ä, Ö and ö, and Ü and ü were spaced 32 points apart in extensions of ASCII that predate Unicode and the spacing carried over into Unicode. But the uppercase ẞ could not have Unicode value U+00BF because that code point was already occupied by the inverted question mark ¿.

Weierstrass, Montgomery, and Edwards elliptic curve forms

All elliptic curves can be written in Weierstrass form

y² = x³ + ax + b

with a few exceptions [1].

Montgomery elliptic curves have the form

B y² = x³ + A x² + x

and twisted Edwards curves have the form

a x² + y² = 1 + d x² y²

Every Montgomery curve has a twisted Edwards form, and vice versa, but not all curves have a Montgomery or twisted Edwards form.

Converting coefficients

Here is Python code for converting coefficients between the various forms. See [2].

def Montgomery_to_Twisted_Edwards(B, A, p):
    # By^2 = x^3 + Ax^2 + x
    a = ((A + 2)*pow(B, -1, p)) % p
    d = ((A - 2)*pow(B, -1, p)) % p          
    return a, d

def Twisted_Edwards_to_Montgomery(a, d, p):
    # ax^2 + y^2 = 1 + d x^2 y^2
    x = pow(a - d, -1, p)
    B = (4*x) % p
    A = (2*(a + d)*x) % p
    return B, A

def Montgomery_to_Weierstrass(B, A, p):
    # By^2 = x^3 + Ax^2 + x
    a = (B**2 * (1 - A**2*pow(3, -1, p))) % p
    b = (B**3 * (2*A**3 - 9*A) * pow(27, -1, p)) % p
    return a, b

Tiny Jubjub curve

The code below confirms that the forms of the Tiny Jubjub curve (TJJ) given in the previous post are consistent.

# Twisted Edwards definition
a, d, p = 3, 8, 13

B, A = Twisted_Edwards_to_Montgomery(a, d, p)
assert(B == 7)
assert(A == 6)

wa, b = Montgomery_to_Weierstrass(B, A)
assert(wa == 8)
assert(b == 8)

Baby Jubjub

I was looking up the Baby Jubjub curve and found incompatible definitions. All agreed that the field is integers modulo the prime

p = 21888242871839275222246405745257275088548364400416034343698204186575808495617

and that the Montgomery form of the curve is

y² = x³ + 168698 x² + x

But the sources differed in the twisted Edwards form of the curve. The code above shows that a correct twisted Edwards form is the one given in [2]:

168700 x² + y² = 1 + 168696 x² y²

Now it is possible to find multiple Edwards forms for a given Montgomery form, so an alternate form is not necessarily wrong. But when I converted both to Weierstrass form and computed the j-invariant, the results were different, and so the curves were not equivalent.

Related posts

[1] All elliptic curves can be written in Weierstrass form as long as the underlying field does not have characteristic 2 or 3. Cryptography is primarily interest in fields whose characteristic is a gargantuan prime, not 2 or 3.

[2] Marta Bellés-Muñoz, Barry Whitehat, Jordi Baylina, Vanesa Daza, and Jose Luis Muñoz-Tapia. Twisted edwards elliptic curves for zero-knowledge circuits. Mathematics, 9(23), 2021.

Tiny Jubjub

A few days ago I wrote about the Jubjub elliptic curve that takes its name from Lewis Carroll’s poem Jabberwocky. I’ve also written about a slightly smaller but still enormous curve named Baby Jubjub.

This post introduces Tiny Jubjub, an elliptic curve with 20 elements, one designed to have some properties in common with its gargantuan counterparts, but small enough to work with by hand. As far as I know, the curve was created for the book The Moon Math Manual to zk-SNARKs and may not be known anywhwere outside that book.

First, a word about the book. The math behind zero-knowledge proofs is complicated and unfamiliar to most, and has been called “moon math.” The “zk” in the title stands for zero-knowledge. SNARK stands for “succinct non-interactive argument of knowledge.” The book is an introduction to some of the mathematics and computer science used in the Zcash privacy coin.

Equation and size

The Tiny Jubjub curve, abbreviated TJJ,  has Weierstrass form

y² = x³ + 8x + 8

and is defined over the field F13, the integers mod 13. As mentioned above, it has 20 points. This is interesting because it is close to Hesse’s upper bound on the number of points on an elliptic curve as a function of the field size. In all the examples in my post from a couple days ago the number of points on each curve was equal to slightly below the middle of the possible range.

Here’s a checkerboard plot of TJJ, like the plots given here.

Note that there are 19 blue squares above. Elliptic curves always have an extra point at infinity not shown.

Montgomery form

One thing Tiny Jubjub has in common with the better known Jubjub curves is that it has a Montgomery form. That is, there is a change of variables that puts the curve in the form

By² = x³ + Ax + b

with B(A² − 4) ≠ 0. Every elliptic curve in Montgomery form can be put in Weierstrass form, but not vice versa; Montgomery curves are special.

TJJ can be put in Montgomery form with A = 6 and B = 7.

Twisted Edwards form

Every elliptic curve that can be put into Montgomery form can also be put into twisted Edwards form

a x² + y² = 1 + d x² y²

with a ≠ 0 ≠ d.

TJJ can be put into Montgomery form with a = 3 and d = 8.

It’s useful when a curve can be put in twisted Edwards form because elliptic curve arithmetic can be implemented without branching logic for special cases.

Note that this plot has 20 blue squares. That’s because for Montgomery curves, the point at infinity is (0, 1), and so it falls inside the grid we’re plotting.

SNARK friendly

TJJ is a SNARK-friendly curve, which is why it appears often in a book on zk-SNARKs. A curve is said to be SNARK friendly if in its twisted Edwards form a is a quadratic residue and d is not. In our case, that means

x² = 3 mod 13

has a solution but

x² = 8 mod 13

does not. We can verify this with the following Python code.

    >>> [x**2 % 13 for x in range(13)]
    [0, 1, 4, 9, 3, 12, 10, 10, 12, 3, 9, 4, 1]

Notice that 3 is in the output and 8 is not.

Related posts

Two ways of generalizing π

The constant π is the ratio of a circle’s circumference to its diameter. We can generalize π by generalizing what a circle is. We will define a p-circle to be the solution to

|x|^p + |y|^p = 1
for 1 ≤ p ≤ ∞. The case of p = ∞ is defined as the limit of the cases of p as p → ∞.

Here’s what p-circles look like for p = 1, 1.5, 2, 4, and ∞.

First approach

We could define πp as the ratio of the perimeter of a p-circle to its diameter. When p = 2 we get π, so π2 is just π. And we can see that π1 = 2√2 and π = 4.

We might ask for what value of p does πp = 3. Evidently for some value between 1 and 2, which we can find numerically to be 1.5815 using the Python code from this post.

Confession: I stretched the truth a little in the plot above. The curve labeled p = 1.5 actually corresponds to p = 1.5815, so the circumference of the orange curve above is exactly 6. (The difference between a 1.5-circle and a 1.5815-circle is hardly noticeable.)

The approach above is one way to generalize π, and it is the approach taken in “squigonometry.”

Second approach

There’s an inconsistency in the approach above. We’ve changed our definition of circle without changing our definition of arc length. We could say a p-circle is the set of points with constant distance to the center, as measured in the p-norm metric.

d_p\left((x_1, y_1), (x_2, y_2)\right) = \left(|x_1 - x_2|^p + |y_1 - y_2|^p\right)^{1/p}

This is consistent with the definition above. However, when we also measure the circumference of the p-circle in the p-norm metric we get a different circumference and hence a different definition of πp.

As before, π2 = π and π = 4. But with this new definition π1 = 4. How is this? The 1-circle is a diamond, and the length of one side of this diamond is √2 in the Euclidean (2-norm) metric, but the length is 2 in the 1-norm metric.

Comparison plots

Under the first definition of πp, computing the perimeter of a p-circle using the 2-norm, πp is an increasing function of p.

Under the second definition πp, computing the perimeter of a p-circle using the p-norm, πp is not a monotone function of p. Instead, πp starts at 4 when p = 1, decreases to a minimum at p = 2, and increases asymptotically to 4 as p → ∞. Or as the title of [1] puts it, “π is the minimum value for pi.”

[1] C. L. Adler, James Tanton. π is the Minimum Value for Pi. The College Mathematics Journal, Vol. 31, No. 2 (Mar., 2000), pp. 102-106

Misleading plots of elliptic curves

The elliptic curves used in cryptography are over finite fields. They’re not “curves” at all in the colloquial sense of the word. But they are defined by analogy with continuous curves, and so most discussions of elliptic curves in cryptography start by showing a plot of a real elliptic curve.

Here’s a plot of y² = x³ + 2 for real x and y.

That’s fine as far as it goes. Resources quickly go on to say that the curves they’ll be looking at are discrete, and so they may then add something like the following. Here’s the same elliptic curve as above over the integers mod 11.

And again mod 997.

These plots are informative in a couple ways. First, they show that the elements of the curve are discrete points. Second, they show that the points are kinda randomly distributed, which hints at why elliptic curves might be useful in cryptography.

But the implied density of points is entirely wrong. It implies that the density of elliptic curve points increases with the field size increases, when in fact the opposite is true. The source of the confusion is using dots of constant size. A checkerboard graph would be better. Here’s a checkerboard plot for the curve mod 11.

If we were to do the same for the curve mod 997 we’d see nothing. The squares would be too small and too few to see. Here’s a plot for the curve mod 211 that gives a hint of the curve elements as dust in the plot.

An elliptic curve over the integers modulo a prime p lives in a p by p grid, and the number of points satisfying the equation of an elliptic curve is roughly p. So the density of points in the grid that belong to the curve is 1/p.

The elliptic curves used in cryptography are over fields of size 2255 or larger, and so the probability that a pair of x and y values chosen at random would be on the curve is on the order of 2−255, virtually impossible.

We can be more precise than saying the number of points on the curve is roughly p. If p isn’t too large, we can count the number of points on an elliptic curve. For example, the curve

y² = x³ + 2 mod p

has 12, 199, and 988 points when p = 11, 211, and 997. (If you count the points in the plots above for p = 11 you’ll get 11 points. Elliptic curves always have an extra point at infinity not shown.)

Hasse’s theorem gives upper and lower bounds for the number of points N on an elliptic curve over a field of p elements with p prime:

|N − (p + 1)| ≤ 2 √p.

The heuristic for this is that the right hand side of the equation defining an elliptic curve

x³ + ax + b

is a square mod p above half the time. When it is a square, it corresponds to two values of y and when it is not it corresponds to zero points. So on average an elliptic curve mod p has around p ordinary points and one point at infinity.

Related posts