Connecting partial sums

Today’s exponential sum, like all the exponential sums on the site, is formed by drawing a line between consecutive partial sums of a series involving complex exponentials. The exponential sum page makes an image each day by putting the day’s month, day, and year into a formula.

Here’s today’s sum.

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{8} + \frac{n^2}{19} + \frac{n^3}{25} \right ) \right )

I use American-style dates—month, day, year—because that increases the day-to-day variety of the images compared to using the day in the first denominator.

A lot of seed phrase words are similar

A couple days ago I wrote about how you might go about trying to recover a seed phrase that you had remembered out of order. I said that the list of seed phrase words had been designed to be distinct. Just out of curiosity I computed how similar the words are using Levenshtein distance, also known as edit distance, the number of single character edits it takes to turn one word into another.

A lot of the words—484 out of 2048—on the BIP39 list differ from one or more other words by only a single character, such as angle & ankle, or loud & cloud. The word wine is one character away from each of wing, wink, wire, and wise.

Other kinds of similarity

Edit distance may not the best metric to use because it measures differences in text representation. It’s more important for words to be conceptually or phonetically distinct than to be distinct in their spelling. For example, the pair donkey & monkey differ by one letter but are phonetically and conceptually distinct, as are the words liveolive.

Some pairs of words are very similar phonetically. For example, I wouldn’t want to have to distinguish or cannon & canyon over a phone call. The list is not good for phonetic distinction, unlike say the NATO alphabet.

Memorization

For ease of memorization, you want words that are vivid and concrete, preferably nouns. That would rule out pairs like either & neither.

The BIP39 list of words is standard. But other approaches, such as Major system encoding, are more optimized for memorability.

Design

It’s hard to make a long list of words distinct by any criteria, and 2048 is a lot of words. And the words on the list are intended to be familiar to everyone. Adding more vivid or distinct words would risk including words that not everyone would know. But still, it seems like it might have been possible to create a better word list.

Recovery

The earlier post discussed how to recover a seed phrase assuming that all the words are correct but in the wrong order. It would make sense to explore sequences in order of permutation distance, assuming that small changes to the order are more likely than large changes.

But if it’s possible that the words are not correct, you might try looking at words in edit distance order. For example, “You said one of the words was race. Could it have been rice?”

Related posts

Factoring Stencils

I recently ran across an article by Katherine Stange [1] on Lehmer’s factoring stencils [2]. These stencils were the basis of an effective method for factoring moderately large numbers before the advent of electronic computers.

This post will describe the stencils and simulate their use with a Python script.

Misconceptions

When I started reading [1] I had three misconceptions that slowed down my understanding of the article. I’ll address these first in case any of you come to this article with the same unhelpful expectations.

First, when I saw something about using stencils for factoring I thought they would be something like a Sieve of Eratosthenes, stencils with multiples of 2, 3, 5, etc. cut out. That is not what’s going on; Lehmer’s method is more sophisticated than that.

Second, related to the first, I thought that the factoring method would involve placing stencils on top of a large grid of numbers that included the number you wanted to factor. That is also not the case. The stencils can be used to factor nine-digit numbers, and a sheet of over a billion numbers would not be practical.

Third, I thought there might be some geometric significance to the position of the holes in the circular stencils. There isn’t. The stencils were circular for convenience.

How the stencils worked

Lehmer made a set of 295 stencils. Katherine Stange has put a set of SVG files for such stencils on her Github repo. The image below is the stencil corresponding to the number 42.

Stencil 42

I want focus on how the stencils work, not why they work.

To factor a number N, you would first find a handful of numbers X1, X2, X3, … Xk, related to N, numbers that are fairly easy to compute by hand. Then you would stack the stencils corresponding to the Xi and hold the stack up to the light.

If y is a quadratic residue mod N, then y is a quadratic residue mod each of the prime factors of N. Each stencil cuts the number of potential prime factors by about a half.

Once you found a probable prime factor, you would test it by long division, and then you know with certainty whether the number is a factor.

What is each stencil?

So what are these Xi and what are the holes in them?

The Xi are quadratic residues mod N, i.e. numbers such that

Xi = x² mod N

has a solution. This would have required some hand calculation, but not nearly as much calculation as trying to factor N unaided. A couple algorithms for finding quadratic residues are described in this video and in this PDF by Katherine Stange.

NB: The hard part isn’t finding quadratic residues mod N; you could do that by squaring random integers mod N. The hard part is finding quadratic residues that correspond to your set of available stencils.

The holes in the stencil correspond to the prime numbers up to 48593. Disk X has a hole for prime p if X is a quadratic residue mod X. If X is a quadratic residue mod p and mod N, there’s a 50-50 chance p divides N.

The number 48593 is the 4999th prime, so there are locations on each disk corresponding to 4999 primes. Lehmer arranged these locations in a spiral, but they could have been laid out in a grid or any other pattern.

Python simulation

See the Github repo referenced above for Python code to make the SVG files for the physical stencils. The Python code below simulates the operation of the stencils, not their geometry. I wrote the code to be illustrative, not to be practical efficient code for factoring integers.

First, here’s code to make our stencils. Here a 1 corresponds to a hole.

from sympy import primerange, legendre_symbol
from numpy import array

primes = array([p for p in primerange(3, 48594)])

def stencil(x):
    # stencil has a 1 in position k if x is a quadratic residue
    # modulo the kth odd prime and a 0 otherwise
    return array([1 if legendre_symbol(x, p) >= 0 else 0 for p in primes])

Next we pick an N and a few small quadratic residues mod N.

N = 19972009 
x = [2, 61, 79, 185, 238, 255, 313, 421, 491]

Multiplying the stencils together produces a 1 where every stencil has a hole.

mask = stencil(x[0])
for i in range(1, len(x)):
    mask *= stencil(x[i])
ps = mask*primes
print(ps[ps != 0])

This prints the following:

[ 97 103 1367 1999 15241 28351 35353 36847 44953]

We then test whether each of these primes is a factor of N, and in fact

19972009 = 97 × 103 × 1999.

Each stencil has 4999 primes, and if each of the 9 stencils eliminates about half the primes, we’d expect around 4999/29 ≈ 10 candidate primes, which is very close to what we got.

Related posts

[1] The Ingenious Physical Factoring Devices of D. N. Lehmer. Math Horizons. Volume 30 Issue 2. November 2022.

[2] D. N. Lehmer and his son D. H. Lehmer were both number theorists. I’ve referred to the later several times on the blog, most recently here. I also cited Katherine Stange a couple weeks ago.

Recovering a permuted seed phrase

As mentioned in the previous post, crypto wallets often turn a list of words, known as a seed phrase, into private keys. These words come from a list of 211 = 2048 words chosen to be distinct and memorable. You can find the list here. Typically a seed phrase will contain 12 words. For example:

prize, differ, year, cloth, require, wild, clean, kit, cart, knock, biology, above

Each word carries 11 bits of information, so in total this represents 132 bits, which is 128 bits of content and 4 bits of checksum.

Now suppose you memorized your list of 12 words, but then later you don’t remember them in the correct order. Then you have about half a billion permutations to try because

12! = 479,001,600.

However, not all of these permutations are equally likely to be correct. You are far more likely to transpose a couple words, for example, than to completely scramble the list. So you would start with the list of words in the remembered order, then explore further and further afield from initial sequence. In mathematical terms, you’ll try the permutations of your remembered sequence in order of Cayley distance.

There are 66 permutations of a list of 12 things that are a transposition of two elements. So if the sequence you started with was

abcdefghijkl

then you’d try things like bacdefghijkl or  ebcdafhghijkl. There are 66 possibilities because there are 66 ways to chose 2 items from a set of 12; you’re choosing the two items to swap.

If the above didn’t work, you could next explore permutations that are at a Cayley distance of 2 from what you remember. Now there are 1,925 possibilities. At a Cayley distance of 3 there are 32,670 possibilities.

In general, the number of arrangements at a Cayley distance k from the original sequence of n items equals

|S1(n, nk)|

where S1 denotes the Stirling number of the first kind.

There are other approaches to measuring how far apart two permutations are. Another possibility is Kendall distance which counts the number of adjacent transpositions needed to turn one sequence into another. Kendall distance is sometimes called bubble sort distance by analogy with the bubble sort algorithm. Kendall distance may be more appropriate than Cayley distance because people are more likely to accidentally transpose adjacent elements of a sequence.

Neither Cayley distance nor Kendall distance exactly correspond to the corruption of human memories, but both would guide you in correcting likely changes before exploring unlikely changes.

Related posts

What’s in your wallet?

What’s in your Bitcoin wallet? Very little. I don’t mean very little value, but very little data. If you’re a Bitcoin billiionaire, your wallet still doesn’t contain very many bits.

You might reasonably expect that a crypto wallet is a container, given the analogy with an ordinary wallet, but it’s not much of a container. It’s more like a bank teller than a physical wallet. It’s primarily a piece of software that facilitates transactions.

It’s misleading to speak of a wallet as holding cryptocurrency. It holds private keys that allow you to access cryptocurrency, so in that sense it’s a password manager. Not only that, these days it’s effectively a password manager containing only one password.

When you back up a wallet, say the Exodus wallet, it asks you to store a seed phrase somewhere safe. Fine, but how do you back up your wallet, not just your seed phrase? How do you back up the data in your wallet? You don’t, because there isn’t any data in your wallet other than the seed phrase [1]. Everything is derived from your seed phrase, at least for the current style of wallet, technically known as a hierarchical deterministic (HD) wallet. The derivation algorithm is documented in BIP39.

The first crypto wallets held a set of private keys. But now wallets derive private keys from your seed phrase. So even though you may have used multiple private keys, the wallet doesn’t store multiple keys. It could store them, but it doesn’t need to. That’s why you can back up your wallet by only backing up your seed phrase.

Related posts

[1] A wallet may contain metadata, such as notes on transactions or software configuration preferences. This data isn’t recovered if you restore you wallet from just your seed phrase. But all your private keys can be regenerated. Think of the private keys as traditional metal house keys and the metadata as a sticky note on top of the keys. If you lose your keys and the note, you’d be happy to get the keys back.

Randomly generated dragon

Here’s a simple way to generate a fractal known as the Twin Dragon. Start with random values of x and y and repeatedly update the according to the rule

xnew = (− xold + yold)/2 − b
ynew = (− xoldyold)/2

where b is randomly chosen to be 0 or 1 with equal probability. The plot of these points fills in the Twin Dragon.

Here’s what the plot looks like after 10,000 iterations.

And after 100,000 iterations.

Here’s the Python script I used to make the plots.

import matplotlib.pyplot as plt
from numpy.random import random, choice

x, y = random(), random()
for _ in range(100000):
    x, y = (-x + y)/2, (-x - y)/2
    x -= choice([0, 1])
    plt.plot(x, y, 'bo', markersize=1)
plt.show()

The algorithm used here is a particularly special case of a more general algorithm for generating similar fractals found in [1].

Here’s a similar post from several years ago:
The chaos game and the Sierpinski triangle

[1] Darst, Palagallo, and Price. Fractal Tilings in the Plane. Mathematics Magazine [71]:1, 1998.

Converting very long strings to integers in Python

In the process of writing the previous post, I wanted to confirm that the number in the post

really is prime. This was useful in debugging my manual conversion of the image to text: errors did not result in a prime number. For example, I didn’t see the 9’s in the image at first, and I didn’t get a prime number until I changed four of the 8’s to 9’s.

But there was a problem along the way. Simply converting the string to an integer didn’t work. It produced the following error:

SyntaxError: Exceeds the limit (4300) for integer string conversion: value has 5382 digits; use sys.set_int_max_str_digits() to increase the limit – Consider hexadecimal for huge integer literals to avoid decimal conversion limits.

Note that the limitation is not on the size of a Python integer. The only limitation on the size of an integer is available memory. But there is a limitation on the size of string that can be converted to an integer.

The fix suggested in the error message didn’t work. But storing the number as several strings, i.e. each row of the image, and doing my own radix conversion did work.

from sympy import isprime

flaglines = [
    "888888888888888888888888888888888888888888888888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118888811188988111888881118888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118898811188888111888881118888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111889881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111888881118898811188888111888881118888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888888888111888881118888811188888111888881118888888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888881111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "888881118888811188888111888881118888811188888111888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888888888888888888888888888888888888888888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "888888888888888888888888888888888888888888888888888883333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
    "333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
]

m = int(flaglines[0])
for i in range(1, len(flaglines)):
    line = flaglines[i]
    x = int(line)
    m = m * 10**len(line) + x
print(isprime(n))

American Flag Prime

The following prime number looks like a black-and-white image of an American flag.

The number mostly consists of the digits 1, 3, and 8, but there are a few 9’s. The following image colors the 8’s blue, the 3’s red, and the 1’s white. The background is gray so you can see the 1s.

I found this in [1]. The article includes a link to a text version of the number, but the link is broken, so I had to convert the image to text.

Unfortunately tesseract did a poor job, and so did Grok, so I ended up more or less doing the conversion by hand. I made a text version and posted it here.

See the next post for a calculation showing that the number is indeed prime.

[1] The United States of America Prime. Vadim Ponomarenko. Math Horizons, Vol. 28, No. 4 (April 2021)

Uppercase Eszett

I’ve typed Karl Weierstrass’ name quite a few times lately and thought about how you’ll sometimes see his name written as Weierstraß in English text. That led me to look up the rules for when to use ß and when to use ss. The rules are moderately complicated, and have varied over time and location. Here I just want to look at the capitalization rules.

Typically ß is replaced with SS when converting from lower case to upper case. This means that the length of a string can change when changing case. Surely this has caused numerous software bugs.

>>> w = "Weierstraß"
>>> len(w), len(w.upper())
(10, 11)

There was no uppercase counterpart of ß (U+00DF) until ẞ (U+1E9E) was introduced in 2008. I wondered whether the code above would run differently if I set my locale to de_DE (Germany). Would w.upper() return WEIERSTRASS or WEIERSTRAẞ?

It turns out that Python follows Unicode’s case mapping rules, and these rules say ß becomes SS when changing to uppercase. The code will run the same everywhere, independent of locale. So if you want ß to convert to uppercase as ẞ you’ll have to use customized code.

ASCII was designed so that uppercase and lowercase English letters differed by 32 (i.e. 0x20 in hex). This convention was carried over into Unicode for other alphabets, with a few exceptions, and it almost holds for German as the following code shows.

upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜẞ"
lower = "abcdefghijklmnopqrstuvwxyzäöüß"
for u, l in zip(upper, lower):
    if ord(l) - ord(u) != 32:
        print("Exception:", u, l)

This prints

Exception: ẞ ß

The code points for Ä and ä, Ö and ö, and Ü and ü were spaced 32 points apart in extensions of ASCII that predate Unicode and the spacing carried over into Unicode. But the uppercase ẞ could not have Unicode value U+00BF because that code point was already occupied by the inverted question mark ¿.

Weierstrass, Montgomery, and Edwards elliptic curve forms

All elliptic curves can be written in Weierstrass form

y² = x³ + ax + b

with a few exceptions [1].

Montgomery elliptic curves have the form

B y² = x³ + A x² + x

and twisted Edwards curves have the form

a x² + y² = 1 + d x² y²

Every Montgomery curve has a twisted Edwards form, and vice versa, but not all curves have a Montgomery or twisted Edwards form.

Converting coefficients

Here is Python code for converting coefficients between the various forms. See [2].

def Montgomery_to_Twisted_Edwards(B, A, p):
    # By^2 = x^3 + Ax^2 + x
    a = ((A + 2)*pow(B, -1, p)) % p
    d = ((A - 2)*pow(B, -1, p)) % p          
    return a, d

def Twisted_Edwards_to_Montgomery(a, d, p):
    # ax^2 + y^2 = 1 + d x^2 y^2
    x = pow(a - d, -1, p)
    B = (4*x) % p
    A = (2*(a + d)*x) % p
    return B, A

def Montgomery_to_Weierstrass(B, A, p):
    # By^2 = x^3 + Ax^2 + x
    a = (B**2 * (1 - A**2*pow(3, -1, p))) % p
    b = (B**3 * (2*A**3 - 9*A) * pow(27, -1, p)) % p
    return a, b

Tiny Jubjub curve

The code below confirms that the forms of the Tiny Jubjub curve (TJJ) given in the previous post are consistent.

# Twisted Edwards definition
a, d, p = 3, 8, 13

B, A = Twisted_Edwards_to_Montgomery(a, d, p)
assert(B == 7)
assert(A == 6)

wa, b = Montgomery_to_Weierstrass(B, A)
assert(wa == 8)
assert(b == 8)

Baby Jubjub

I was looking up the Baby Jubjub curve and found incompatible definitions. All agreed that the field is integers modulo the prime

p = 21888242871839275222246405745257275088548364400416034343698204186575808495617

and that the Montgomery form of the curve is

y² = x³ + 168698 x² + x

But the sources differed in the twisted Edwards form of the curve. The code above shows that a correct twisted Edwards form is the one given in [2]:

168700 x² + y² = 1 + 168696 x² y²

Now it is possible to find multiple Edwards forms for a given Montgomery form, so an alternate form is not necessarily wrong. But when I converted both to Weierstrass form and computed the j-invariant, the results were different, and so the curves were not equivalent.

Related posts

[1] All elliptic curves can be written in Weierstrass form as long as the underlying field does not have characteristic 2 or 3. Cryptography is primarily interest in fields whose characteristic is a gargantuan prime, not 2 or 3.

[2] Marta Bellés-Muñoz, Barry Whitehat, Jordi Baylina, Vanesa Daza, and Jose Luis Muñoz-Tapia. Twisted edwards elliptic curves for zero-knowledge circuits. Mathematics, 9(23), 2021.