Physical Keys and Encryption Keys

A physical key, such as a house key, is a piece of metal with cuts of differing depths. Typically there may be around 6 cuts, with five different possible depths for each cut. This allows 5⁶ = 15,625 possible keys.

Encryption keys, such as AES keys, are a string of bits, often 128 bits, for a total of 2¹²⁸ possible keys.

How long would a physical key have to be to have the same level of security as an encryption key? We’d need to solve

5ⁿ = 2¹²⁸

which means

n = 128 / log₂ 5 = 55.12.

So we’d need a key with around 55 notches.

[Image: metal key with 55 notches]
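As a quick sanity check on the arithmetic, here's a throwaway calculation in Python:

from math import log2

# number of 5-depth cuts needed to match a 128-bit key: solve 5^n = 2^128
n = 128 / log2(5)
print(n)                                # about 55.1
print(5**56 > 2**128, 5**55 > 2**128)   # 56 cuts clear the bar; 55 fall just short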

This only takes into account combinatorial possibilities, not the difficulty of attacking a physical key or a binary key. There are incomparably more possibilities for binary keys, but encryption attacks can be automated and carried out remotely (unless a computer is air gapped). A physical lock can only be attacked in person. It takes a lock picker orders of magnitude more time to try a key than it takes a password cracking program to try a password. On the other hand, locks aren’t picked by trying thousands of keys.

Related post: Measuring cryptographic strength in liters of boiling water

Why and how Bitcoin uses Merkle trees

Yesterday’s post looked at a recently mined Bitcoin block, the 920,994th block in the blockchain, and verified that it contains the hash of the previous block. This post will look at the same block and verify its Merkle tree root.

Before getting down to the bytes, we’ll back up and say what a Merkle tree is and why Bitcoin uses one.

What is a Merkle tree?

A Merkle tree is a binary tree. The leaves of the tree are labeled with a hash of the data associated with each leaf. The label of each interior node is the hash of the concatenation of the two child hashes. The label of the root of the tree is the Merkle tree root.

The smallest change to any data in the tree will change the Merkle tree root [1]. If you weren’t concerned with efficiency, you wouldn’t need a Merkle tree. You could simply concatenate all the data in the leaves and compute a hash. But the Merkle tree makes it possible to verify a node without having to verify the whole tree.

To see this, imagine starting at the bottom of the tree and working your way up to compute the Merkle root. You need the hash value of the node you’re interested in and its sibling. Then you can compute the label of the parent node. If you know the label of the parent’s sibling node (you could call it the aunt or uncle) then you can compute the label of the grandparent node, and so forth until you get to the root. If you get the expected value for the root, the data is correct.
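Here is a minimal sketch of that walk up the tree in Python. It assumes Bitcoin-style double SHA-256 and a proof given as a list of (sibling_hash, sibling_is_right) pairs; the helper and the proof format are illustrative, not taken from any particular library.

import hashlib

def sha256d(b):
    # Bitcoin hashes with SHA-256 applied twice
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify(leaf_hash, proof, expected_root):
    # walk up the tree, combining with the sibling (or aunt) at each level
    h = leaf_hash
    for sibling, sibling_is_right in proof:
        h = sha256d(h + sibling) if sibling_is_right else sha256d(sibling + h)
    return h == expected_root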

How Bitcoin uses Merkle trees

Bitcoin makes a Merkle tree with the data for each transaction being a leaf node. Simple Payment Verification (SPV) lets you verify a transaction knowing only the hash of the transaction, the hash of its sibling, and the hashes of the siblings moving up the tree: the aunts, great aunts, great great aunts etc.

If a block has 2ⁿ transactions, you only need n hash values to verify a single transaction. The tree has n levels of hashing above the leaves: at the bottom you need your transaction’s sibling, and at each level above you need one aunt, up to but not including the root. (Your own transaction’s hash you compute yourself.) So in a block of 1024 transactions, you need 10 hash values. Note that this saves space two ways: you don’t need transaction data per se, only hashes of transaction data, and the number of hashes you need is only log₂ of the number of transactions.

What if the number of transactions is not a power of 2? If there is an odd number of nodes at a given level of the Merkle tree, the last node is paired with itself.

For example, the block that we’ll examine in detail contains 1279 transactions. At the bottom of our Merkle tree there are effectively 1280 nodes, with the 1280th node being a duplicate of the 1279th. Moving up the tree we have 640 nodes, 320 nodes, 160 nodes, … 10 nodes, and 5 nodes. The fifth node is paired with itself, and so the next level up has 3 nodes. The third node is paired with itself, making 2 nodes on the next level, and then the root above that.

Down to the bytes

The command

xxd -l 80 920994.dat

gives a hex dump of the block header:

0000 002e c2df b57e f582 75c2 7a64 33c5
edfd dfc1 af6a b9ae 708c 0000 0000 0000
0000 0000 bb15 accc 8940 3811 0b9b e1cf
b125 c0fa a65e fb1b 2fa4 61ed 989f 8d6f
e41e 04cf 5522 ff68 21eb 0117 f2bc ab1a

This time we’re looking at bytes 37 through 68, which hold the Merkle tree root.

A miner verifying the block would recompute the Merkle tree root and confirm that it matches the value given in the header.

A Bitcoin block does not contain a Merkle tree per se, only its root. The hash of the data for each transaction, what Bitcoin calls a txid, is redundant since it can be computed. And if it were included in the block, a miner would need to recompute it to verify that it is correct.

We can fetch a list of txids for a block by calling

https://blockstream.info/api/block/{block_hash}/txids

with the hash of the block we want, which in our case is

00000000000000000001c2f89cee9f8c2fe88b4a93c2c2b75192918fc438ca32

This returns a JSON string with 1279 hash values:

fc1d784c62600565d4b4fbfe9cd45a7abc5f1c3273e294fd2b44450c5c9b9e9a
d4bce41744156a8f0e6588e42efef3519904a19169212c885930e908af1457ec
7bc6428c7d023108910a458c99359a2664de7c02343d96b133986825297e518d
…
c9132f178830d0c7a781e246bb5c2b3f4a9686c3d2b6f046b892a073d340eaff

Note that the API call did not pull the txids from the block; it computed them. Presumably the server computed the list of txids once and cached the results rather than computing them when we called the API.

We can compute the Merkle tree root by concatenating pairs of txids and hashing the results. Since the number of transactions is odd, the last txid is paired with itself. So moving one level up the tree, the first node is labeled with the hash of

fc1…e9ad4b…7ec

and the last node is labeled with the hash of

c91…affc91…aff

We continue this process until we compute the Merkle tree root. Then we can confirm that it matches the value given in the header. As with the block hash in the previous post, the Merkle tree root is stored in the header in little endian form.
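Here is a minimal sketch of that computation in Python. It assumes the txids returned by the API have been loaded into a list of hex strings called txids; the hashing is Bitcoin’s double SHA-256, and the displayed txids are byte-reversed relative to the internal order used in the tree.

import hashlib

def sha256d(b):
    # Bitcoin hashes with SHA-256 applied twice
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(txids):
    # txids are hex strings in display order; reverse bytes to internal order
    level = [bytes.fromhex(t)[::-1] for t in txids]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # pair the last node with itself
        level = [sha256d(level[i] + level[i+1])
                 for i in range(0, len(level), 2)]
    return level[0]                   # root in internal (little-endian) order

header = open("920994.dat", "rb").read(80)
# txids = [...]  # the 1279 hex strings fetched from the API above
# assert merkle_root(txids) == header[36:68]   # bytes 37 through 68 of the header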


[1] There is a vanishingly small chance that a change in the data would not change the root. With a 256-bit hash, the chances of a change in the data not changing the root are 1 in 2²⁵⁶.

Random samples from a tetrahedron

Exactly one month ago I wrote about sampling points from a triangle. This post will look at the three dimensional analog, sampling from a tetrahedron. The generalization to higher dimensions works as well.

Sampling from a triangle

In the triangle post, I showed that a naive approach doesn’t work but a variation does. If the vertices of the triangle are A, B, and C, then the naive approach is to generate three uniform random numbers, normalize them, and use them to form a linear combination of the vertices. That is, generate

r₁, r₂, r₃

uniformly from [0, 1], then define

r₁′ = r₁ / (r₁ + r₂ + r₃)
r₂′ = r₂ / (r₁ + r₂ + r₃)
r₃′ = r₃ / (r₁ + r₂ + r₃)

and return as a random sample the point

r₁′ A + r₂′ B + r₃′ C.

This oversamples points near the middle of the triangle and undersamples points near the vertices. But everything above works if you replace uniform samples with exponential samples.

Sampling from a tetrahedron

The analogous method works for a tetrahedron with vertices A, B, C, and D. Generate exponential random variables rᵢ for i = 1, 2, 3, 4. Then normalize, defining each rᵢ′ to be rᵢ divided by the sum of all the rᵢ’s. Then the random sample is

r₁′ A + r₂′ B + r₃′ C + r₄′ D.

In the case of the triangle, it was easy to visualize that uniformly generated r’s did not lead to uniform samples of the triangle, but that exponentially generated r’s did. This is harder to do in three dimensions. Even if the points were uniformly sampled from a tetrahedron, they might not look uniformly distributed from a given perspective.

So how might we demonstrate that our method works? One approach would be to take a cube inside a tetrahedron and show that the proportion of samples that land inside the cube is what we’d expect, namely the ratio of the volume of the cube to the volume of the tetrahedron.

This brings up a couple interesting subproblems.

  1. How do we compute the volume of a tetrahedron?
  2. How can we test whether our cube lies entirely within the tetrahedron?

Volume of a tetrahedron

To find the volume of a tetrahedron, form a matrix M whose columns are the vectors from three of the vertices to the remaining vertex. So we could take A − D, B − D, and C − D. The volume of the tetrahedron is the absolute value of the determinant of M divided by 6.
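Here’s a minimal NumPy sketch of that formula; applied to the tetrahedron defined in the illustration below, it returns 35.

import numpy as np

def tetrahedron_volume(A, B, C, D):
    # columns of M are the edge vectors from D to the other three vertices
    M = np.column_stack([A - D, B - D, C - D])
    return abs(np.linalg.det(M)) / 6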

Testing whether a cube is inside a tetrahedron

A tetrahedron is convex, and so the line segment joining any two points inside the tetrahedron lies entirely inside the tetrahedron. It follows that if all the vertices of a cube are inside a tetrahedron, the entire cube is inside.

So this brings us to the problem of testing whether a point is inside a tetrahedron. We can do this by converting the coordinates of the point to barycentric coordinates, then testing whether all the coordinates are non-negative and add to 1.
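Here is one way to do that in Python, a minimal sketch: solve a small linear system for the barycentric coordinates and check their signs. (The coordinates sum to 1 by construction.)

import numpy as np

def barycentric(p, A, B, C, D):
    # solve p = b1*A + b2*B + b3*C + b4*D with b1 + b2 + b3 + b4 = 1
    M = np.column_stack([A - D, B - D, C - D])
    b = np.linalg.solve(M, p - D)
    return np.append(b, 1 - b.sum())

def in_tetrahedron(p, A, B, C, D):
    return np.all(barycentric(p, A, B, C, D) >= 0)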

Random sampling illustration

I made up four points as vertices of a tetrahedron.

import numpy as np

A = [0, 0, 0]
B = [5, 0, 0]
C = [0, 7, 0]
D = [0, 1, 6]
A, B, C, D = map(np.array, [A, B, C, D])

For my test cube I chose the cube whose vertex coordinates are all combinations of 1 and 2. So the vertex nearest the origin is (1, 1, 1) and the vertex furthest from the origin is (2, 2, 2). You can verify that all the vertices are inside the tetrahedron, and so the cube is inside the tetrahedron.

The tetrahedron has volume 35, and the cube has volume 1, so we expect 1/35th of the random points to land inside the cube. Here’s the code to sample the tetrahedron and calculate the proportion of points that land in the cube.

from scipy.stats import expon

def incube(v):
    # test whether a point lies in the cube [1, 2]^3
    return 1 <= v[0] <= 2 and 1 <= v[1] <= 2 and 1 <= v[2] <= 2

def good_sample(A, B, C, D):
    # exponential weights, normalized to sum to 1, give uniform samples
    r = expon.rvs(size=4)
    r /= sum(r)
    return r[0]*A + r[1]*B + r[2]*C + r[3]*D

def bad_sample(A, B, C, D):
    # uniform weights oversample the interior of the tetrahedron
    r = np.random.random(size=4)
    r /= sum(r)
    return r[0]*A + r[1]*B + r[2]*C + r[3]*D

N = 1_000_000

badcount  = 0
goodcount = 0

for _ in range(N):
    v = bad_sample(A, B, C, D)
    if incube(v):
        badcount += 1        
    v = good_sample(A, B, C, D)
    if incube(v):
        goodcount += 1

print(badcount/N, goodcount/N, 1/35)

This produced

0.094388 0.028372 0.028571…

The good (exponential) method matched the expected proportion to three decimal places, but the bad (uniform) method put about 3.3 times as many points in the cube as expected.

Note that three decimal agreement is about what we’d expect via the central limit theorem since 1/√N = 0.001.

If you’re porting the sampler above to an environment where you don’t have a function for generating exponential random samples, you can roll your own by returning −log(u) where u is a uniform sample from (0, 1).
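For example, here’s a minimal sketch using only a uniform generator:

import numpy as np

def exp_sample(size):
    # inverse CDF method: -log(U) is exponential(1) when U is uniform on (0, 1)
    u = 1.0 - np.random.random(size)   # values in (0, 1], avoids log(0)
    return -np.log(u)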

More general 3D regions

After writing the post about sampling from triangles, I wrote a followup post about sampling from polygons. In a nutshell, divide your polygon into triangles, then randomly select a triangle with probability proportional to its area, then sample from that triangle. You can do the analogous process in three dimensions by dividing a general volume into tetrahedra.

Fitting a double exponential function to three points

The previous post needed to fit a double exponential function to data, a function of the form

a \exp(b \exp(cx))

By taking logs, we have a function of the form

f(x) = a + b \exp(cx)

where the a in the latter equation is the log of the a in the former.

In the previous post I used a curve fitting function from SciPy to find the parameters a, b, and c. The solution in the post works, but I didn’t show all the false starts along the way. I ran into problems with overflow and with the solution method not converging before I came up with the right formulation of the problem and a good enough initial guess at the solution.

This post will look at fitting the function more analytically. I’ll still need to solve an equation numerically, but an equation in only one variable, not three.

If you need to do a regression, fitting a function to more than three noisy data points, you could use the code below with three chosen data points to find a good starting point for a curve-fitting method.

Solution

Suppose we have x1 < x2 < x3 and we want to find a, b, and c such that

\begin{align*} y_1 &= a + b \exp(c x_1) \\ y_2 &= a + b \exp(c x_2) \\ y_3 &= a + b \exp(c x_3) \\ \end{align*}

where y1 < y2 < y3.

Subtract the equations pairwise to eliminate a:

\begin{align*} y_2 - y_1 &= b (\exp(c x_2) - \exp(c x_1)) \\ y_3 - y_1 &= b (\exp(c x_3) - \exp(c x_1)) \\ \end{align*}

Divide these to eliminate b:

\frac{y_2 - y_1}{y_3 - y_1} = \frac{\exp(c x_2) - \exp(c x_1)}{\exp(c x_3) - \exp(c x_1)}

Factor out exp(cx1) from the right side:

\frac{y_2 - y_1}{y_3 - y_1} = \frac{1 - \exp(c (x_2 - x_1))}{1 - \exp(c (x_3 - x_1))}

Now the task is to solve

L = \frac{1 - \exp(c d_2)}{1 - \exp(c d_3)}

where

\begin{align*} L &= \frac{y_2 - y_1}{y_3 - y_1} \\ d_2 &= x_2 - x_1 \\ d_3 &= x_3 - x_1 \end{align*}

Note that L, d2, and d3 are all positive.

Once we find c numerically,

b = \frac{y_2 - y_1}{\exp(c x_2) - \exp(c x_1)}

and

a = y_1 - b \exp(c x_1)

Python code

Here’s a Python implementation to illustrate and test the derivation above.

from numpy import exp
from scipy.optimize import newton

def fit(x1, x2, x3, y1, y2, y3):
    assert(x1 < x2 < x3)
    assert(y1 < y2 < y3)

    d2 = x2 - x1
    d3 = x3 - x1
    L = (y2 - y1)/(y3 - y1)
    g = lambda x: L - (1 - exp(x*d2))/(1 - exp(x*d3))
    c = newton(g, 0.315)

    b = (y2 - y1)/(exp(c*x2) - exp(c*x1))
    a = y1 - b*exp(c*x1)
    return a, b, c

a, b, c = fit(9, 25, 28, 42, 55, 92)
print([a + b*exp(c*x) for x in [9, 25, 28]])  # should recover [42, 55, 92] up to rounding

Extrapolating quantum factoring

In 2001, a quantum computer was able to factor 15.

In 2012, a quantum computer was able to factor 21, sorta. The researchers cheated by not implementing gates that they knew a priori would not be used. To this day, no quantum computer has factored 21 without taking some shortcuts. Some reasons why are given here.

Extrapolating from two data points is kinda ridiculous, but we only have two data points, and we’re going to do it anyway.

[Image: xkcd 605, “Extrapolating”]

Some may object that extrapolating the size of the numbers that can be factored isn’t the right approach. Instead we should be extrapolating the number of qubits or something else. Fine, but that’s not what we’re doing in this post.

Linear extrapolation

Using linear extrapolation, a quantum computer wouldn’t be able to factor a cryptographically relevant number before the heat death of the universe.

Exponential extrapolation

Let’s switch to exponential extrapolation. Instead of assuming the size of the numbers grows linearly, we’ll assume the number of bits in the numbers grows linearly, meaning the numbers grow exponentially.

In about 10 years, the size of number that could be factored increased by

21/15 = 1.4 ≈ √2.

This means the size of the numbers increased by about half a bit in a decade. By now we should be able to factor 35.

RSA keys have at least 1024 bits, so at half a bit per decade, quantum computers should be able to factor RSA keys 20,000 years from now.

Double exponential extrapolation

Now let’s assume the number of bits in a number that a quantum computer can factor is itself growing exponentially, meaning that the numbers are growing doubly exponentially, doubling every decade. Then we should be able to factor 9-bit numbers by now, and in 100 years we will be able to factor RSA keys.

Breaking RSA in 10 years

The estimate above is kinda arbitrary because a double exponential function has three degrees of freedom and so we’d need three data points, which we don’t have.

We can fit a double exponential by making up a third data point. Some researchers speculate that a quantum computer will be able to factor 1024-bit RSA keys 10 years from now. If so, that gives us a third data point.

To make things easier, let’s work in terms of bits and set x to be the number of years since 2000. Then we have

f(x) = a + b·2^(cx)

with f(1) = log₂ 15, f(12) = log₂ 21, and f(35) = 1024. We can fit this function using Python [1].

from math import log2
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a + b * 2**(c*x)

x_data = [1.0, 12.0, 35.0]
y_data = [log2(15), log2(21), 1024]
initial_guess = [4, 0.01, 0.3]

popt, pcov = curve_fit(f, x_data, y_data, p0=initial_guess, maxfev=10000)
a, b, c = popt
print(f"Fitted parameters: {a=}, {b=}, {c=}")

The parameter values are a = 3.893887005243357, b = 0.0093347992761344, c = 0.4782191085655488.

Here’s a plot of the fitted function f.

If we’re going to be able to factor 1024-bit integers by 2035, according to a double exponential model, we’re already behind schedule. We should be factoring 40-bit numbers by now, but so far we’ve only factored a 4-bit number.

Or to put it another way, those who believe we will be able to factor 1024-bit integers by 2035 are implicitly assuming a future rate of growth faster than double exponential. And maybe they’re right.

If we’re going to break 1024-bit RSA keys by 2035, at some point we’ll have to do better than the curve above, and so far we’re doing worse.

Bookmark this post and come back when a new quantum factoring record is set and rerun the code with new data.


[1] The curve_fit function needed a lot of help to work. See the next post for a better way to fit the function.

Post-quantum RSA with gargantuan keys

If and when practical scalable quantum computers become available, RSA encryption would be broken, at least for key sizes currently in use. A quantum computer could use Shor’s algorithm to factor n-bit numbers in time on the order of n². The phrase “quantum leap” is misused and overused, but this would legitimately be a quantum leap.

That said, Shor’s method isn’t instantaneous, even on a hypothetical machine that does not yet exist and may never exist. Daniel Bernstein estimates that RSA encryption with terabyte public keys would be secure even in a post-quantum world.

Bernstein said on a recent podcast that he isn’t seriously suggesting using RSA with terabyte keys. Computing the necessary key size is an indirect way of illustrating how impractical post-quantum RSA would be.


An inverse problem for the quadratic equation

In elementary algebra we learn how to solve ax² + bx + c = 0 for roots r = r₁, r₂. But how about the inverse problem: given a real-valued r, find integer coefficients a, b and c such that r is a root of the corresponding quadratic equation.

A simple approximation can be found by setting b = 0, so that r² = −c/a. By appropriately choosing large values of a and c, one can approximate the solution to this inverse problem to arbitrary accuracy.

Are there any other solutions (possibly taking advantage of b for a better approximation)?

Two families of algorithms are available: PSLQ [1-3] and LLL [4-5]. The problem formulation behind these algorithms can be understood geometrically. For a fixed value of r, L(a, b, c) = ar² + br + c is a linear function of (a, b, c), here considering a, b and c as real-valued. Thus, for such r, the equation L(a, b, c) = 0 defines a known plane in 3-D space. However, we seek a, b, c that are integers. The problem then is to find a lattice point (a, b, c), with a, b, c integers, that is “near” this plane (for example, within tolerance τ of the plane, meaning a thin slab of thickness 2τ surrounding the plane contains a lattice point). We would also prefer the magnitudes of a, b and c to be “small” (e.g., bounded by a constant H).

Given this formulation, the PSLQ algorithm attempts to find an exact solution, and gives up if it can’t find one. The LLL algorithm on the other hand finds approximate solutions within a tolerance.

Python’s mpmath package provides mpmath.pslq for the PSLQ algorithm. LLL reduction is available in Python through packages such as SymPy and fpylll.
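As a small illustration of the PSLQ side, here’s a sketch using mpmath.pslq to recover a quadratic satisfied by r = 1 + √2. The choice of r, the precision, and the coefficient bound are just for illustration.

from mpmath import mp, sqrt, pslq

mp.dps = 50                     # work with 50 significant digits
r = 1 + sqrt(2)                 # a number we suspect is a quadratic irrational

# look for integers (a, b, c), not all zero, with a*r^2 + b*r + c ≈ 0
coeffs = pslq([r**2, r, 1], maxcoeff=10**6)
print(coeffs)                   # a relation equivalent to r^2 - 2r - 1 = 0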

A colleague of mine was previously submitting overnight runs to compute solutions using brute force search. More recently, using a Python implementation of the LLL algorithm, he has been able to solve a large problem in seconds, with answers that exactly reproduce the brute force solution.

References

[1] Ferguson, Helaman R. P.; Bailey, David H. (1992). A Polynomial Time, Numerically Stable Integer Relation Algorithm. RNR Technical Report RNR-91-032, NASA Ames Research Center. URL: https://www.davidhbailey.com/dhbpapers/pslq.pdf

[2] Ferguson, Helaman R. P.; Bailey, David H.; Arno, Steve (1999). Analysis of PSLQ, an integer relation finding algorithm. Mathematics of Computation 68(225): 351–369. URL (AMS PDF): https://www.ams.org/mcom/1999-68-225/S0025-5718-99-00995-3/S0025-5718-99-00995-3.pdf

[3] Bailey, David H. (2020). PSLQ: An Algorithm to Discover Integer Relations. (Expository overview.) URL: https://www.davidhbailey.com/dhbpapers/pslq-comp-alg.pdf

[4] Lenstra, Arjen K.; Lenstra, Hendrik W., Jr.; Lovász, László (1982). Factoring Polynomials with Rational Coefficients. Mathematische Annalen 261: 515–534. DOI: https://doi.org/10.1007/BF01457454. Springer page: https://link.springer.com/article/10.1007/BF01457454. PDF: https://www.math.ucdavis.edu/~deloera/MISC/LA-BIBLIO/trunk/Lovasz/LovaszLenstraLenstrafactor.pdf

[5] Nguyen, Phong Q.; Vallée, Brigitte (eds.) (2010). The LLL Algorithm: Survey and Applications. Springer. URL: https://link.springer.com/book/10.1007/978-3-642-02295-1

Learning languages with the help of algorithms

Suppose you’re learning a new language and want to boost your vocabulary in a very time-efficient way.

People have many ways to learn a language, different for each person. Suppose you wanted to improve your vocabulary by reading books in that language. To get the most impact, you’d like to pick books that cover as many common words in the language as possible.

Here is a formalization. Suppose you have a large set of m books of average length n words, and you want to pick the one book from the set with the highest vocabulary impact. The vocabulary impact of a book is measured by a weighted sum over the vocabulary words in the book, each word weighted by how common it is in the language, as measured by its frequency across all the books; this essentially assigns a probability weight to each word in the language.

That’s an easy problem to solve. First, optionally filter out stop words, such as (in the case of English) “the” and “and,” which are considered in some sense to have “not much meaning.” Second, across all books build the list of unique words along with the number of occurrences of each word. Finally, for each book evaluate its coverage of those words, computing a score as described above.

Computing the unique word list and the counts can be done by hashing, which typically runs in time linear in mn, with worst case mn log(mn). Computing the scores also takes linear time on average, so the entire process can be done in linear time.
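Here’s a minimal sketch of this single-book scoring, assuming each book is given as a list of words:

from collections import Counter

def best_book(books, stop_words=frozenset()):
    # books: list of word lists, one list per book
    # word frequencies across all books serve as the weights
    freq = Counter(w for book in books for w in book if w not in stop_words)
    total = sum(freq.values())

    def score(book):
        # sum the weight of each distinct word the book covers
        return sum(freq[w] / total for w in set(book) if w not in stop_words)

    return max(range(len(books)), key=lambda i: score(books[i]))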

What if you want to find the best two books to read, with the best joint vocabulary coverage? To find the optimal solution, the best known time is not linear but is quadratic for the general case.

How about the best k books for arbitrary k > 0?

This is an NP-hard problem: for the general case, the running time of the best known exact algorithms grows exponentially as k, the size of your desired set of reading books, increases.

So you cannot expect to solve this problem exactly for large k. All hope is not lost, however. This is a maximum weighted coverage problem, which falls within a class of NP-hard problems with submodular objectives. Because of this, approximation algorithms are known that are guaranteed to come within a certain known factor of the true best solution (see [1-5]).

These algorithms are described in [6], [7], [8]. The basic idea is a greedy algorithm: add one high-impact book at a time to the running set of chosen books. The result is not guaranteed to be the best possible book list, but for monotone submodular objectives like this one the greedy solution is guaranteed to be within a factor of 1 − 1/e ≈ 0.63 of optimal [6].
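Here’s a minimal sketch of the greedy selection, assuming each book is represented as a set of words and weights maps each word to its frequency weight:

def greedy_books(books, weights, k):
    # books: list of sets of words; weights: dict mapping word -> weight
    chosen, covered = [], set()
    for _ in range(k):
        # marginal gain: total weight of words a book adds beyond those already covered
        def gain(i):
            return sum(weights.get(w, 0) for w in books[i] - covered)
        best = max((i for i in range(len(books)) if i not in chosen), key=gain)
        chosen.append(best)
        covered |= books[best]
    return chosen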

The helpful Python submodlib package runs very fast on this problem.

One can improve the quality of the result by spending more on computation. First, you could use a blocking strategy: compute exactly the best two books to add at each step, or three, and so forth (with quadratic, cubic, etc. computational complexity). Similarly, one could use a look-ahead strategy: add the two books that are jointly best by an exact computation, then drop the second and find the exact best second and third books, and so forth. These do not in general improve on the submodularity bound; however, in practice the results are more accurate.

One can also use various heuristics to improve performance in some cases. For example, if a book has little or no vocabulary that is not present in the other books, it can sometimes be safely discarded. However, in general the exact case remains hard (unless P=NP).

References

[1] Abhimanyu Das and David Kempe. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection. Proceedings of ICML, 2011.

[2] Abhimanyu Das and David Kempe. Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection. Journal of Machine Learning Research (JMLR), 19(3):1–35, 2018.

[3] M. Conforti and G. Cornuéjols. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds. Discrete Applied Mathematics, 7(3):251–274, 1984.

[4] Maxim Sviridenko, Jan Vondrák, and Justin Ward. Optimal approximation for submodular and supermodular optimization with bounded curvature. Proceedings of SODA, 2013.

[5] Rishabh Iyer, Stefanie Jegelka, and Jeff Bilmes. Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions. 2013. https://arxiv.org/abs/1311.2110

[6] Nemhauser, George L., Laurence A. Wolsey, and Marshall L. Fisher. “An analysis of approximations for maximizing submodular set functions—I.” Mathematical programming 14, no. 1 (1978): 265-294.

[7] “Maximum coverage problem,” https://en.wikipedia.org/wiki/Maximum_coverage_problem.

[8] Wei, Kai, Rishabh Iyer, and Jeff Bilmes. “Fast multi-stage submodular maximization.” In International conference on machine learning, pp. 1494-1502. PMLR, 2014.

 

Monero’s seed phrase words

I wrote a couple posts last month about the seed phrase words used by Bitcoin and other cryptocurrencies. There are 2048 words on the BIP39 list. Monero uses a different word list, one with 1626 words [1]. You can find Monero’s list here.

Why 1626 words?

It’s not hard to guess why the BIP 39 list has 2048 words: each one encodes 11 bits of a key because 2¹¹ = 2048. It’s not as obvious where the number 1626 comes from. It is the smallest value of n such that

n²⁴ > 2²⁵⁶

Monero uses a seed phrase of 25 words, but the last word is a checksum, so there are 24 words which are used to create a 256-bit private key.
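A quick brute-force check of that claim, using Python’s exact integer arithmetic:

# find the smallest n with n^24 > 2^256
n = 2
while n**24 <= 2**256:
    n += 1
print(n)   # 1626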

Distinctiveness

I criticized the BIP 39 list for being less than ideal for memorization because some words are similar phonetically, such as angle and ankle. Other words are harder to remember because they are not vivid nouns or verbs, such as either or neither.

The Monero list has 200 words that differ by one character, compared to 484 for BIP 39.

But as with the BIP 39 list, some of Monero’s words are similar and not easy to visualize, such as adapt, adept, and adopt.

Prefix uniqueness

One nice feature of the BIP 39 list is that the first four letters of each word are distinct. So if you’re typing the words, autocomplete can fill in the rest of the word by the time you’ve entered four letters.

Monero goes one step further: all words are uniquely determined by the first three letters.
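Here’s a sketch of how you could verify the prefix properties yourself, assuming you’ve saved the two word lists to local text files (the file names are hypothetical):

def prefixes_unique(words, k):
    # True if the first k letters determine each word on the list uniquely
    return len({w[:k] for w in words}) == len(words)

bip39  = open("bip39_english.txt").read().split()
monero = open("monero_english.txt").read().split()
print(prefixes_unique(bip39, 4))    # expect True: 4 letters suffice for BIP 39
print(prefixes_unique(monero, 3))   # expect True: 3 letters suffice for Monero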

Overlap with other lists

Only about 1/4 of the words on Monero’s list are also on the BIP 39 list. And most of Monero’s words are not in Google’s list of 10,000 most common words.

[Venn diagram of the word lists: M (Monero) = 1626, B (BIP 39) = 2048, G (Google’s 10,000 most common words) = 10,000; M∩B = 425, M∩G = 701, B∩G = 1666, M∩B∩G = 339]


[1] Monero has other ways to convert words to keys. The way described here is now known as the Legacy mnemonics. The new Polyseed algorithm uses the BIP 39 word list.

How quantum computing would affect Bitcoin

Bitcoin relies on two kinds of cryptography: digital signatures and hash functions. Quantum computing would be devastating to the former, but not the latter.

To be more specific, the kind of digital signatures used in Bitcoin could in theory be broken by a quantum computer using Shor’s algorithm. Digital signatures could use quantum-resistant algorithms [1], but these algorithms are not used in Bitcoin (or much of anything else) at this time.

A quantum computer could use Grover’s algorithm to reduce the time required to reverse hash functions, but this is a much smaller threat. For this post, we will simply assume a quantum computer could break digital signatures but not hash functions.

Digital signatures rely on public key cryptography, but Bitcoin uses something you might call not-so-public key cryptography. Public keys are not necessarily made public. You can receive funds sent to a hash of your public key, but to spend funds you have to reveal your unhashed public key.

This asymmetry means that presumably you could receive funds and be safe from quantum computers as long as you never spend the funds. But once you spend the funds, your public key really is public, published on the blockchain, and a quantum computer could derive your private key from your public key. This is because digital signatures are verified when money is spent, not when it is received.

It is widely believed that practical quantum computing is at least several years away. Maybe large-scale quantum computing is decades away, and maybe it will never happen. Nobody knows. But if quantum computing becomes practical before the world shifts to quantum-resistant cryptography, it would be a catastrophe. It’s hard to appreciate how thoroughly public key cryptography is woven into contemporary life.

Could there be a quantum computer sitting in the basement of the NSA right now? I doubt the NSA could be a decade ahead of public quantum computing efforts, but this is just speculation. Classified cryptography research has historically been ahead of open research. For example, differential cryptanalysis was discovered long before it was made public. I expect the gap between classified and open research is smaller now that there is much more public research in cryptography, but presumably there is still some gap.


[1] NIST recommends ML-DSA (Dilithium) and SLH-DSA (SPHINCS+) for post-quantum digital signature algorithms. See FIPS 204 and FIPS 205.