Monero’s seed phrase words

I wrote a couple posts last month about the seed phrase words used by Bitcoin and other cryptocurrencies. There are 2048 words on the BIP39 list. Monero uses a different word list, one with 1626 words [1]. You can find Monero’s list here.

Why 1626 words?

It’s not hard to guess why the BIP 39 list has 2048 words: each one encodes 11 bits of a key because 211 = 2048. It’s not as obvious where the number 1626 comes from. It is the smallest value of n such that

n24 > 2256

Monero uses a seed phrase of 25 words, but the last word is a checksum, so there are 24 words which are used to create a 256-bit private key.

Distinctiveness

I criticized the BIP 39 list for being less than ideal for memorization because some words are similar phonetically, such as angle and ankle. Other words are harder to remember because they are not vivid nouns or verbs, such as  either or neither.

The Monero list has 200 words that differ by one character, compared to 484 for BIP 39.

But as with the BIP 39 list, some of Monero’s words are similar and not easy to visualize, such as adapt, adept, and adopt.

Prefix uniqueness

One nice feature of the BIP 39 list is that the first four letters of each word are distinct. So if you’re typing the words, autocomplete can fill in the rest of the word by the time you’ve entered four letters.

Monero goes one step further: all words are uniquely determined by the first three letters.

Overlap with other lists

Only about 1/4 of the words on Monero’s list. And most of Monero’s words are not in Google’s list of 10,000 most common words.

venn diagram M: 1626, B: 2048, M∩B: 425, G: 10000, M∩G: 701, B∩G: 1666, M∩B∩G: 339

Related posts

[1] Monero has other ways to convert words to keys. The way described here is now known as the Legacy mnemonics. The new Polyseed algorithm uses the BIP 39 word list.

How quantum computing would affect Bitcoin

Bitcoin relies on two kinds of cryptography: digital signatures and hash functions. Quantum computing would be devastating to the former, but not the latter.

To be more specific, the kind of digital signatures used in Bitcoin could in theory be broken by quantum computer using Shor’s algorithm. Digital signatures could use quantum-resistant algorithms [1], but these algorithms are not used in Bitcoin (or much of anything else) at this time.

A quantum computer could use Grover’s algorithm to reduce the time required to reverse hash functions, but this is a much smaller threat. For this post, we will simply assume a quantum computer could break digital signatures but not hash functions.

Digital signatures rely on public key cryptography, but Bitcoin uses something you might call not-so-public key cryptography. Public keys are not necessarily made public. You can receive funds sent to a hash of your public key, but to spend funds you have to reveal your unhashed public key.

This asymmetry means that presumably you could receive funds and be safe from quantum computers as long as you never spend the funds. But once you spend the funds, your public key really is public, published on the blockchain, and a quantum computer could derive your private key from your public key. This is because digital signatures are verified when money is spent, not when it is received.

It is widely believed that practical quantum computing is at least several years away. Maybe large-scale quantum computing is decades away, and maybe it will never happen. Nobody knows. But if quantum computing becomes practical before the world shifts to quantum-resistant cryptography, it would be a catastrophe. It’s hard to appreciate how thoroughly public key cryptography is woven into contemporary life.

Could there be a quantum computer sitting in the basement of the NSA right now? I doubt the NSA could be a decade ahead of public quantum computing efforts, but this is just speculation. Classified cryptography research has historically been ahead of open research. For example, differential cryptanalysis was discovered long before it was made public. I expect the gap between classified and open research is smaller now that there is much more public research in cryptography, but presumably there is still some gap.

Related posts

[1] NIST recommends ML-DSA (Dilithium) and SLH-DSA (SPHINCS+) for post-quantum digital signature algorithms. See FIPS 204 and FIPS 205.

Why are CUDA kernels hard to optimize?

Explosive datacenter demand has caused developers to leave no stone unturned in search of higher efficiencies. The DeepSeek team, not satisfied with Nvidia’s CUDA libraries, used a virtualized form of assembly language (PTX) to write kernel codes to accelerate their AI computations. Others have attempted to generate optimized kernels using AI, though some results have been questioned (for various attempts, see also here, here, here, here and here).

Why is it hard to write peak-speed GPU code? Writing really fast code has always been arduous, but it seems especially so for modern GPUs.

To understand the issues, my colleagues and I performed a detailed study of GPU kernel performance, across eight different GPU models from three GPU vendors [1]. The test case we considered was low precision matrix multiply, a resource-intensive operation for LLM training. We ran many, many experiments to understand what causes performance variability and why kernels sometimes run slower than you’d think they should.

For the cases we studied, we found about half a dozen different factors, but the upshot is this: modern processors like GPUs have become so complex—notably their multi-layered hierarchical memory subsystems—that it is difficult to get consistently high performance across all problem sizes a user might want to run in practice. As a result, the performance for the target problem might be surprisingly and mysteriously less than the advertised peak performance for the operation in question. The reasons might be obvious—like cache line misalignment—or more opaque. For the matrix multiply case, various issues like the need for prefetching, caching, tiling and block size selection, make it difficult for the kernel developer to optimize for every input size a user might specify.

Below is an example graphic from our paper. The color indicates floating point operation rate (FLOPs) for a reduced precision matrix multiply on a representative GPU using a library call. The horizontal and vertical axes refer to the matrix dimensions for the problem (see paper for details). Though some regions show performance near the theoretical peak (red), other immediately adjacent regions show problem sizes that run dramatically less—in fact, only about half of peak performance, or less. Presumably this is because either individual kernel performance or the selection of kernels used by the library is suboptimal. The net outcome is, if your problem lands a “bad” region, you’re in for a big surprise, your performance will be much less than expected, and you may not understand why. All high-performing GPUs we tested showed irregular behaviors such as this [2] [3].

In the past this was not always a problem.  Older architectures like Sun Sparc or Cray vector processor, complex as they were, were simple enough that a reasonably well-tuned computational kernel might run well across most if not all inputs [4]. Today, performance is much harder to predict and can vary substantially based on the requested problem sizes.

This is a tough challenge for library developers. Whenever a new GPU model family comes out, new kernel optimization and tuning are required to give (hopefully) more consistently high performance, and some cases get more developer attention than others due to customer needs and limited developer resources. As a result, infrequently used operations do not get as much attention, but they may be the exact ones you need for your particular case [5].

Tools are available to help optimize for specific cases. The excellent Nvidia CUTLASS library exposes access to many more fine-grained options compared to the standard cuBLAS library. The not faint of heart can try programming Nvidia GPUs at the level of PTX, or (shudder) SASS. Superoptimization might help, but only for very small code fragments and even then there may be too many external factors influencing performance to make it effective.

Autotuning is a promising approach though it doesn’t seem to have reached its full potential in production. AI might really help here [6]; in our own paper we had some success using machine learning methods like decision trees and random forests to model performance as a function of problem size, though our work was exploratory and not production-ready. To make a well-crafted general solution it would seem would require a lot of effort to do right. Code sustainability and maintenance are also critical; a sustainable workflow would be needed to retrain on new GPUs, new CUDA releases and even site-specific and system-specific settings like GPU power and frequency cap policies.

Most recent AI-driven work focuses on optimizing performance for one or a few problem sizes only. A truly production-quality general purpose tool would give both 100% accurate results and also top achievable performance for any input problem size (even for corner cases) or data type. This would require both optimized GPU kernels and optimal kernel dispatcher for kernel selection. And the method would need to be robust to issues like power and frequency variabilities in production runs. This would seem to currently be an unsolved problem. Solving it would be of huge benefit to the hyperscaler community.

Notes

[1] For related work from a slightly different angle, see this excellent work from Matt Sinclair’s lab.

[2] It turned out this study was helpful to us for production runs, to help us to triage an odd performance conundrum we encountered when attempting an exascale run (see here, here).

[3] Incidentally this example shows the hazards of simplistic benchmark suites to measure GPU code performance. Unless the benchmark captures a truly large and varied set of input cases, any new optimization method proposed can artificially “overfit” performance on the tests and still underperform miserably on many user cases of interest.

[4] I once wrote a 1-D wavelet convolution kernel for a Sparc processor, using a circular register buffer and loop unrolling to minimize loads and stores, this achieving near-peak performance. The code was correctly compiled from C to assembly, and performance for a given problem was almost precisely predictable. That was before the days of complex memory hierarchies.

[5] One vendor I know of used to take customer requests for hand tuning expensive library calls and made them run fast at the specific customer problem sizes.

[6] LLM kernel generation seems like a natural fit, particularly since LLM-generated code quality has much improved in recent months. Kernel selection and parameter selection for block size, tiling etc. might be better solved by direct training of machine learning models, or methods like this. Comparative studies on this would be informative.

 

What’s hierarchical about a hierarchical wallet?

A few days ago I wrote about what’s in a crypto wallet. In that post I said that most crypto wallets now are hierarchical deterministic (HD) wallets.  And I said that HD wallets are deterministic in the sense that they derive all their keys from a seed phrase. But in what sense are HD wallets hierarchical? That’s the topic of this post.

A warm-up story

In the game of 20 questions, one person thinks of something and another tries to guess what it is by asking up to 20 yes-no questions. I once heard the physicist John Wheeler tell of a variation of this game in which the first person did not have a definite object in mind, but decided after each question what the answer should be. For example, if someone asks “Is this person a man?” the person would commit to the person being a man or woman, but would not decide on a particular man or woman yet.

Wheeler’s point was that quantum mechanics is like this variation on 20 questions in that the answers to questions don’t exist until the question is asked. What does this have to do with hierarchical deterministic wallets? Your private keys do not exist until you ask for them. But once you have created and used a key, a wallet will behave consistently with that creation.

The hierarchy

The hierarchy referred to in a hierarchical deterministic wallet is a set of five variables, as described in BIP-44:

  1. Purpose
  2. Coin type
  3. Account
  4. Change
  5. Address index

The meaning of the variables is explained in BIP-44. The lowest level, address index, is a sequential counter. So you can have separate sequential counters for each value of the four-tuple (purpose, coin type, account, change).

Your master key and the five variables above are inputs to a key derivation function used to create new keys as needed. Once you use a private key, a hash of its corresponding public key is memorialized on the blockchain. If it’s a Bitcoin transaction, it’s on the Bitcoin blockchain. If it’s an Ethereum transaction, it’s on the Ethereum blockchain, etc. (You can find a list of supported coin types here.)

You wallet does not (or at least logically need not) store all your keys. It can reason as follows. “If the master key and these hierarchical values were used, this would be the private key. And given this private key, this would be the public key, and this would be the corresponding address. Let me consult the blockchain to see whether in fact it was used.”

How would a wallet know how many transactions you’ve made under a particular branch of the hierarchy? It searches the corresponding blockchain. It first looks whether there is a ledger entry corresponding to address index 0. Then address index 1, etc. The algorithm allows for the possibility of gaps. If it cannot find a ledger entry corresponding to index 2, it looks for index 3, etc. up to a gap of 20. After looking ahead 20 index values and finding nothing, it concludes there is nothing else to be found.

Because everything is derived deterministically from the seed phrase and the hierarchical variables, you can back up a wallet by simply backing up the seed phrase.

In theory, you could carry out transactions using one brand of wallet, back it up by writing down the seed phrase, then restore the information to a different brand of wallet. In practice you may run into difficulty doing this.

Related posts

What’s in your wallet?

What’s in your Bitcoin wallet? Very little. I don’t mean very little value, but very little data. If you’re a Bitcoin billionaire, your wallet still doesn’t contain very many bits.

You might reasonably expect that a crypto wallet is a container, given the analogy with an ordinary wallet, but it’s not much of a container. It’s more like a bank teller than a physical wallet. It’s primarily a piece of software that facilitates transactions.

It’s misleading to speak of a wallet as holding cryptocurrency. It holds private keys that allow you to access cryptocurrency, so in that sense it’s a password manager. Not only that, these days it’s effectively a password manager containing only one password.

When you back up a wallet, say the Exodus wallet, it asks you to store a seed phrase somewhere safe. Fine, but how do you back up your wallet, not just your seed phrase? How do you back up the data in your wallet? You don’t, because there isn’t any data in your wallet other than the seed phrase [1]. Everything is derived from your seed phrase, at least for the current style of wallet, technically known as a hierarchical deterministic (HD) wallet. The derivation algorithm is documented in BIP39.

The first crypto wallets held a set of private keys. But now wallets derive private keys from your seed phrase. So even though you may have used multiple private keys, the wallet doesn’t store multiple keys. It could store them, but it doesn’t need to. That’s why you can back up your wallet by only backing up your seed phrase.

Related posts

[1] A wallet may contain metadata, such as notes on transactions or software configuration preferences. This data isn’t recovered if you restore you wallet from just your seed phrase. But all your private keys can be regenerated. Think of the private keys as traditional metal house keys and the metadata as a sticky note on top of the keys. If you lose your keys and the note, you’d be happy to get the keys back.

Genesis Block Easter Egg

The White House put out a position paper Strengthening American Leadership in Digital Financial Technology a few days ago. The last page of the paper contains a hex dump.

Kinda surprising to see something like that coming out of the White House, but it makes sense in the context of cryptocurrency. Presumably Donald Trump has no idea what a hex dump is, but someone around him does.

My first thought was that something was wrong because the hex codes don’t correspond to the text on the side as it would if you were opening a text file in a hex editor. But it’s not a mistake; it’s an Easter Egg.

Extracting text from image

I tried to convert the image to text using tesseract but it fell down. I’ve had good experience with tesseract in the past, but this time was disappointing.

I was skeptical that an LLM would do a better job, because the LLMs use tesseract internally. Or at least at one time OpenAI did. Grok 4 initially did a poor job, but it worked after I gave it more help using the following prompt.

Convert the attached image to text. It is a hex dump: all characters are hexadecimal symbols: digits and the capital letters A, B, C, D, E, or F.

Here’s the result.

01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 3B A3 ED FD 7A 7B 12 B2 7A C7 2C 3E
67 76 8F 61 7F C8 1B C3 88 8A 51 32 3A 9F B8 AA
4B 1E 5E 4A 29 AB 5F 49 FF FF 00 1D 1D AC 2B 7C
01 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 04 45 54 68 65 20 54 69 6D 65 73 20 30 33 2F
4A 61 6E 2F 32 30 30 39 20 43 68 61 6E 63 65 6C
6C 6F 72 20 6F 6E 20 62 72 69 6E 6B 20 6F 66 20
73 65 63 6F 6E 64 20 62 61 69 6C 6F 75 74 20 66
6F 72 20 62 61 6E 6B 73 FF FF FF FF 01 00 F2 05
2A 01 00 00 00 43 41 04 67 8A FD B0 FE 55 48 27
19 67 F1 A6 71 30 B7 10 5C D6 A8 28 E0 39 09 A6
79 62 E0 EA 1F 61 DE B6 49 F6 BC 3F 4C EF 38 C4
F3 55 04 E5 1E C1 12 DE 5C 38 4D F7 BA 0B 8D 57
8A 4C 70 2B 6B F1 1D 5F AC 00 00 00 00

The Genesis Block

The hex content is the header of the Bitcoin “Genesis Block,” the first block in the Bitcoin blockchain. You can find full breakdown of the bytes here.

The defining characteristic of a blockchain is that it is a chain of blocks. The blocks are connected by each block containing the cryptographic hash of the previous block’s header. For Bitcoin, the hash starts in the 5th byte and runs for the next 32 bytes. You see a lot of zeros at the top of the hex dump above because the Genesis Block had no predecessor on the chain.

Easter Egg Within an Easter Egg

Quoting the hex dump of the Genesis Block in the whitepaper was an Easter Egg for Bitcoin enthusiast. The Genesis Block contains a sort of Easter Egg of its own.

The section of the header

    54 69 6D ... 6E 6B 73

is the ASCII text

The Times 03/Jan/2009 Chancellor on brink of second bailout for banks

Satoshi Nakamoto quoted the headline from The Times from January 3, 2009 to prove that the genesis block was created on or after that date. The headline seems to also be a sort of Easter Egg, an implicit commentary on the instability of fractional-reserve banking.

Related posts

Looking back at Martin Gardner’s RSA article

Public key cryptography came to the world’s attention via Martin Gardner’s Scientific American article from August 1977 on RSA encryption.

The article’s opening paragraph illustrates what a different world 1977 was in regard to computation and communication.

… in a few decades … the transfer of information will probably be much faster and much cheaper by “electronic mail” than by conventional postal systems.

Gardner quotes Ron Rivest [1] saying that breaking RSA encryption by factoring the product of two 63-digit primes would take about 40 quadrillion years. The article included a challenge, a message encrypted using a 129-digit key, the product of a 64-digit prime and a 65-digit prime. Rivest offered a $100 prize for decrypting the message.

Note the tension between Rivest’s estimate and his bet. It’s as if he were saying “Based on the factoring algorithms and computational hardware now available, it would take forever to decrypt this message. But I’m only willing to bet $100 that that estimate remains valid for long.”

The message was decrypted 16 years later. Unbeknownst to Gardner’s readers in 1977, the challenge message was

THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE

encoded using 00 for space, 01 for A, 02 for B, etc.  It was decrypted in 1993 by a group of around 600 people using around 1600 computers. Here is a paper describing the effort. In 2015 Nat McHugh factored the key in 47 minutes using 8 CPUs on Google Cloud.

The RSA algorithm presented in Gardner’s article is much simpler than it’s current implementation, though the core idea remains unchanged. Now we use much larger public keys, the product of two 1024 bit (308 digit) primes or larger. Also, RSA isn’t used to encrypt messages per se; RSA is used to exchange symmetric encryption keys, such as AES keys, which are then used to encrypt messages.

RSA is still widely used, though elliptic curve cryptography (ECC) is taking its place, and eventually both RSA and ECC will presumably be replaced with post-quantum methods.

More RSA posts

[1] I met Ron Rivest at the Heidelberg Laureate Forum in 2013. When he introduced himself I said something like “So you’re the ‘R’ in RSA?” He’s probably tired of hearing that, but if so he was too gracious to show it.

Factoring RSA100

Earlier today I wrote about factoring four 255-bit numbers that I needed for a post. Just out of curiosity, I wanted to see how long it would take to factor RSA 100, the smallest of the factoring challenges posed by RSA Laboratories in 1991. This is a 100-digit (330-bit) number that is the product of two primes.

I used the CADO-NFS software. The software was developed in France, and CADO is a French acronym for Crible Algébrique: Distribution, Optimisation. NFS stands for number field sieve, the fastest algorithm for factoring numbers with over 100 digits.

RSA 100 was first factored in 1991 using a few days of compute time on an MP1 MasPar computer, a machine that cost $500,000 at the time, equivalent to around $1,250,000 today.

My effort took about 23 minutes (1376 seconds) on a System 76 Meerkat mini that I paid $600 for in 2022.

The MP1 was about the size of a refrigerator. The Meerkat is about 3″ × 3″ × 1.5″.

Most legible font for WIF

Bitcoin’s Wallet Import Format (WIF) is essentially Base58 encoding with a checksum. (See the next post for details.) It is meant to a human-friendly way to display cryptographic private keys. It’s not that friendly, but it could be worse.

The checksum (the first four bytes of SHA256 applied twice) is appended before the conversion to Base58, so the final result consists of only Base58 characters.

The Base58 alphabet is

    123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

and so some easily-confused characters have been removed. For example, the lower case letter o is included but the upper case O and the numeral 0 are not included. The lower case letter l has been removed so that it won’t be confused with the numeral 1.

But there are still a few letters that could be confused:

    1ij 2Zz Cc Kk 5Ss Uu Vv Ww Xx Yy Zz

I was curious what font might make these letters the most distinct, and the best I found was IBM Plex Mono Italic.

Similar letters in IBM Plex Mono italic

The pairs Cc and Ss are still similar, but the rest of the upper and lower case pairs are distinct. (Note the serif on the lower case u, for example.)

Without the italic, lower case v, x, and z are simply smaller versions of their upper case counterparts.

Here’s the whole Base58 alphabet in IBM Plex Mono italic. Note the “holes” in the alphabet where some letters were removed.

Related posts

Retrofitting error detection

Bitcoin addresses include a checksum. The Base58Check algorithm uses the first four bytes of a hash function as a checksum before applying Base58 encoding.

Ethereum addresses did not include a checksum, but it became apparent later that a checksum was needed. How can you retroactively fit a checksum into an address without breaking backward compatibility? Here’s what Ethereum did in adopting the EIP-55 proposal.

Ethereum addresses are the last 20 bytes (40 hexadecimal characters) of the Keccak-256 hash of the public key. The protocol allowed the letters in hexadecimal addresses to be either upper or lower case. This option provided the wiggle room to retrofit a checksum. You could mix upper and lower case letters deliberately to encode an extra message (i.e. a checksum) on top of the key. This is sorta like steganography, except that it’s out in the open rather than hidden.

Hexadecimal notation uses 10 digits and 6 letters, and so the probability that a hexadecimal character is a letter is 6/16. On average a string of 40 hexadecimal characters will contain 15 letters. This means you could add 15 bits of information to an Ethereum address, on average, by alternating the case of the letters.

It’s conceivable that an address won’t contain any letters, or will consist entirely as letters. The number of letters in a random string of 40 hexadecimal characters is a binomial random variable with parameters n = 40 and p = 3/8, and this is approximately a normal random variable with mean 15 and standard deviation 3. This says the number of letters in an Ethereum address will be fairly tightly clustered around 15, rarely more than 21 or less than 9.

OK, but how do you take advantage of an average of 15 bits of freedom? You can’t encode a 15-bit number because you might not have 15 bits at your disposal.

Here’s what EIP-55 did. Left-align the address and the Keccek-256 hash of the address (which was itself a hash: there are two hashes going on) and capitalize all letters in the address that align with a character in the hash greater than or equal to 8.

As an example, let’s suppose our address is

    7341e5e972fc677286384f802f8ef42a5ec5f03b

This address contains 13 letters, which would not be unusual. Now let’s compute its hash.

    >>> from Crypto.Hash import keccak
    >>> kh = keccak.new(digest_bits=256)
    >>> kh.update(b"7341e5e972fc677286384f802f8ef42a5ec5f03b").hexdigest()
    'd8e8fcb225fb835fdb89a5918e736820ec75b3efaf3cbb229f03cdc41bf9f254'

Now we line up the address and its hash.

    341e5e972fc677286384f802f8ef42a5ec5f03b
    d8e8fcb225fb835fdb89a5918e736820ec75b3e...

The characters in the address that line up with a hash character of 0x8 through 0xf are highlighted red. The digits will be left alone, but the red letters will be turned to upper case.

    341E5E972fc677286384F802F8ef42a5EC5f03B

Related posts