An integral theorem of Gauss

Gauss proved in 1818 that the value of the integral

\int_0^{\pi/2} \left( x^2 \sin^2\theta + y^2 \cos^2 \theta \right)^{-1/2} \, d\theta

is unchanged if x and y are replaced by (x + y)/2 and √(xy), i.e. if you replace x and y with their arithmetic mean and geometric mean [1].

So, for example, if you wanted to compute

\int_0^{\pi/2} \left( 9 \sin^2\theta + 49 \cos^2 \theta \right)^{-1/2} \, d\theta

you could instead compute

\int_0^{\pi/2} \left( 25 \sin^2\theta + 21 \cos^2 \theta \right)^{-1/2} \, d\theta

Notice that the coefficients of sin² θ and cos² θ are more similar in the second integral. It would be nice if the two coefficients were equal because then the integrand would be a constant, independent of θ, and we could evaluate the integral. Maybe if we apply Gauss’ theorem again and again, the coefficients will become more nearly equal.

We started with x = 3 and y = 7. Then we had x = 5 and y = √21 = 4.5826. If we compute the arithmetic and geometric means again, we get x = 4.7913 and y = 4.7867. If we do this one more time we get x = y = 4.789013. The values of x and y still differ, but only after the sixth decimal place.

It would seem that if we keep replacing x and y by their arithmetic and geometric means, we will converge to a constant. And indeed we do. This constant is called the arithmetic-geometric mean of x and y, denoted AGM(x, y). This means that the value of the integral above is π / (2 AGM(x, y)). Because the iteration leading to the AGM converges quickly, this provides an efficient numerical algorithm for computing the integral at the top of the post.
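Here is a minimal Python sketch of the iteration, using only the standard library; the function names and stopping tolerance are my own choices, not anything from Gauss or from the discussion above.

    from math import pi, sqrt

    def agm(x, y, tol=1e-15):
        # Replace x and y by their arithmetic and geometric means until they agree.
        while abs(x - y) > tol * max(x, y):
            x, y = (x + y) / 2, sqrt(x * y)
        return (x + y) / 2

    def gauss_integral(x, y):
        # The integral at the top of the post equals pi / (2 AGM(x, y)).
        return pi / (2 * agm(x, y))

    print(agm(3, 7))             # about 4.789013
    print(gauss_integral(3, 7))  # compare against numerical quadrature if you like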

This is something I’ve written about several times, though in less concrete terms. See, for example, here. Using more advanced terminology, the AGM gives an efficient way to evaluate elliptic integrals.

Related posts

[1] B. C. Carlson, Invariance of an Integral Average of a Logarithm. The American Mathematical Monthly, Vol. 82, No. 4 (April 1975), pp. 379–382.

El Salvador’s Bitcoin and Quantum Computing

The treasury of El Salvador owns over 6,000 Bitcoins. Its total holdings are currently worth roughly $700,000,000. These coins had been associated with one private key. Yesterday El Salvador announced that it would split its funds into 14 wallets in order to protect the funds from quantum computing. You can confirm using a blockchain explorer that this has indeed happened.

The most oversimplified takes say El Salvador has protected its Bitcoin reserves from a quantum computing attack by moving from one wallet to 14 wallets. But moving money from one private key to 14 private keys does not in itself offer any protection against quantum computing. It only means that stealing the funds would take 14 times longer. Unsurprisingly, the original announcement is more nuanced than the online commentary on the announcement.

For reasons explained in the previous post, Bitcoins are safe from quantum attack until they are spent. You don’t need to reveal your public key to receive Bitcoins, only a hash of your public key. But you do need to reveal your public key to spend Bitcoins.

As far as I know, the one key that until recently held all of El Salvador’s Bitcoin reserves had never spent coins, and so the public key was never made public. If so, the reserves were already safe from quantum computing attacks, at least to the extent that the reserves are safe now. So, strictly speaking, yesterday’s move did not increase the security of the reserves from quantum attack.

What yesterday’s move accomplished was making it easier to spend part of the reserves. Now one of the wallets can spend its portion of the reserves, revealing only the public key associated with that portion rather than a single key associated with all the reserves.

Splitting one large wallet into multiple smaller wallets is a good move, for reasons that have nothing to do with quantum computing. But there is a quantum angle here, namely that the split reduces the risk of quantum computing attack when some of the funds are spent.

There are proposals to harden the Bitcoin protocol against quantum attacks, and perhaps El Salvador can avoid spending from some or all of its wallets until one of these proposals is implemented.

Related posts

How quantum computing would affect Bitcoin

Bitcoin relies on two kinds of cryptography: digital signatures and hash functions. Quantum computing would be devastating to the former, but not the latter.

To be more specific, the kind of digital signatures used in Bitcoin could in theory be broken by a quantum computer using Shor’s algorithm. Digital signatures could use quantum-resistant algorithms [1], but these algorithms are not used in Bitcoin (or much of anything else) at this time.

A quantum computer could use Grover’s algorithm to reduce the time required to reverse hash functions, but this is a much smaller threat. For this post, we will simply assume a quantum computer could break digital signatures but not hash functions.

Digital signatures rely on public key cryptography, but Bitcoin uses something you might call not-so-public key cryptography. Public keys are not necessarily made public. You can receive funds sent to a hash of your public key, but to spend funds you have to reveal your unhashed public key.

This asymmetry means that presumably you could receive funds and be safe from quantum computers as long as you never spend the funds. But once you spend the funds, your public key really is public, published on the blockchain, and a quantum computer could derive your private key from your public key. This is because digital signatures are verified when money is spent, not when it is received.
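To make the hashing step concrete: in the common pay-to-public-key-hash style of address, what the address encodes is RIPEMD-160 applied to the SHA-256 of the public key. Here is a minimal sketch; the key bytes below are made up for illustration, and hashlib’s ripemd160 is only available if the underlying OpenSSL build provides it.

    import hashlib

    def hash160(pubkey: bytes) -> bytes:
        # Bitcoin-style HASH160: RIPEMD-160 of the SHA-256 of the public key.
        sha = hashlib.sha256(pubkey).digest()
        return hashlib.new("ripemd160", sha).digest()

    # A made-up 33-byte compressed public key, purely for illustration.
    pubkey = bytes.fromhex("02" + "11" * 32)
    print(hash160(pubkey).hex())  # the address encodes this digest, not the key itself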

It is widely believed that practical quantum computing is at least several years away. Maybe large-scale quantum computing is decades away, and maybe it will never happen. Nobody knows. But if quantum computing becomes practical before the world shifts to quantum-resistant cryptography, it would be a catastrophe. It’s hard to appreciate how thoroughly public key cryptography is woven into contemporary life.

Could there be a quantum computer sitting in the basement of the NSA right now? I doubt the NSA could be a decade ahead of public quantum computing efforts, but this is just speculation. Classified cryptography research has historically been ahead of open research. For example, differential cryptanalysis was discovered long before it was made public. I expect the gap between classified and open research is smaller now that there is much more public research in cryptography, but presumably there is still some gap.

Related posts

[1] NIST recommends ML-DSA (Dilithium) and SLH-DSA (SPHINCS+) for post-quantum digital signature algorithms. See FIPS 204 and FIPS 205.

Storing data in images

This post will connect a couple posts from yesterday and explore storing data in images.

Connections

There’s a connection between two blog posts that I wrote yesterday that I only realized today.

The first post was about the probability of sending money to a wrong Bitcoin address by mistyping. Checksums make it extremely unlikely that a typo would result in a valid address. And if somehow you got past the checksum, you would very likely send money to an unused address rather than an unintended used address.

The second post was about QR codes that embed images. I give a couple examples in the post. The first is a QR code that looks like the symbol for a chess king that links to my Kingwood Data Privacy site. The second is a QR code that looks like the icon for @CompSciFact and links to that account’s page on X.

Someone responded to the first post saying that nobody types in Bitcoin addresses; they scan QR codes. The second post suggests QR codes are very robust. I wrote the post just having fun with the novelty of using QR codes in an unintended way, but there is a pragmatic point as well. The fact that you can play with the pixel sizes enough to create images means that there is a lot of safety margin in QR codes.

If you send Bitcoin by scanning a QR code, it’s very likely that you will scan the code correctly. The individual pixels are robust, and QR codes have checksums built in. If somehow you managed to incorrectly scan a QR code for an address, bypassing QR code checksums, then the protections mentioned in the typo post come into play. The checksums in a Bitcoin address apply whether you mistype or mis-scan an address.

Storing data on paper

QR codes are a way to store a relatively small amount of data in an image. How much data could you store in an image the size of a sheet of paper?

Ondřej Čertík did an experiment to find out. He wanted to see how much data he could store on a piece of paper using a regular printer and reliably retrieve the data using a phone camera. You can find his results here.

QR code internals

I’ve thought about blogging about how the error detection and correction in QR codes works, but it’s too complicated. I like writing fairly short articles, and QR error correction is too complicated to describe in say 300 words. Other people have written long articles on how it works, and you can find these articles if you’re interested.

It might be possible to write a short article about why it takes a long article to describe QR code error correction. That is, it might be possible to describe briefly the reasons why the algorithm is so complicated. I imagine part of the complexity is just historical contingency, but I also imagine the problem has some intrinsic complexity that isn’t obvious at first glance.

Ondřej used Golay codes for error correction, and it worked fairly well. It would be interesting to compare the robustness of QR codes to a design that simply used Golay codes.

From Bitcoin to Jupiter

James Burke hosted a BBC series called Connections in which he would trace unexpected lines of connections. For example, I recall watching one episode in which he traced the connection between a touchstone and nuclear weapons.

In this post we’ve connected Jupiter to Bitcoin. How? Bitcoin addresses led to a discussion of QR codes, which led to Ondřej’s experiment, which led to Golay codes, which were used on a Voyager probe to Jupiter.


An uncrossed knight’s tour

I’ve written several times about knight’s tours of a chessboard. The paths in these tours cross each other many times. What if you wanted to look for tours that do not cross themselves? You can’t reach every square this way. You can reach half of them, but no more than half.

The following tour is part of the article Uncrossed Knight’s Tours in Donald Knuth’s book Selected Papers on Fun & Games.

Related posts

Dithered QR codes

I saw a post by Dave Richeson on Mastodon about making QR codes that look like images. Turns out you can shrink a black square in a QR code by up to a factor of three while keeping the code usable. This gives you the wiggle room to create dithered images.

I tried a few examples using this software. The code works with photos, but it works really well with simpler images like the ones below.

Here’s one for my data privacy company, Kingwood Data Privacy LLC.

kingwooddataprivacy.com

And here’s one for my account @CompSciFact on X.

https://x.com/CompSciFact

Probability of typing a wrong Bitcoin address

I heard someone say that Bitcoin is dangerous because you could easily make a typo when entering an address, send money to the wrong person, and have no recourse. There are dangers associated with Bitcoin, such as losing a private key, but address typos are not a major concern.

Checksums

There are several kinds of Bitcoin addresses. Each is at least 20 bytes (160 bits) long, with at least 4 bytes (32 bits) of checksum. The chances of a typo resulting in a valid checksum are about 1 in 2^32.
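For legacy (Base58Check) addresses, the checksum is the first four bytes of a double SHA-256 of the payload. Here is a minimal sketch of the check, assuming the address has already been decoded into payload and checksum bytes.

    import hashlib

    def checksum_ok(payload: bytes, checksum: bytes) -> bool:
        # Base58Check checksum: first 4 bytes of SHA-256(SHA-256(payload)).
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
        return digest[:4] == checksum

A random four-byte value passes this check with probability 1 in 2^32, which is where the figure above comes from.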

Used addresses

Let’s ignore the checksum for this section.

Because addresses are formed by cryptographic hash functions, we can assume the values are essentially randomly distributed in the space of possible addresses. The addresses are deterministic, but for modeling purposes, random is as random does.

This means a typo of an actual address is no more or less likely to be another actual address than an address typed at random. This is unlike, say, English words: a mistyped English word is more likely to be another English word than random keystrokes would be.

There have been on the order of a billion Bitcoin addresses used, in a space of 2^160 possibilities. (Actually more since some addresses have more than 160 bits.) There’s about a 1 in 10^39 chance that a random 160-bit sequence corresponds to an address somewhere on the Bitcoin blockchain.
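The arithmetic behind that figure is a one-liner; the billion-address count is only an order-of-magnitude guess.

    used = 10**9         # order-of-magnitude count of addresses ever used
    space = 2**160       # size of a 160-bit address space
    print(used / space)  # about 7e-40, i.e. roughly 1 in 10^39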

If you send money to a non-existent address, you still lose your money, but it’s extremely unlikely that you would accidentally send money to a real address that you didn’t intend.

Addresses close in edit distance

Someone with the Caesarean handle Veni Vidi Vici on X asked

What about the odds that out of those 1B addresses, two of them are one character swap away from each other?

That’s an interesting question. Let’s assume the addresses are Base58-encoded strings of length 26. Addresses could be longer, but assuming the minimum length increases the probability of addresses being close.

How many addresses are within one or two character swaps of another? I addressed a similar question here a couple weeks ago. If all the characters were unique, the number of strings within k swaps of each other would be

\left| S_1(26, 26 - k) \right|

where S_1 denotes Stirling numbers of the first kind. For k = 1 this would be 325 and for k = 2 this would be 50,050. This assumes all the characters are unique; I haven’t thought through the case where characters are repeated.
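If you want to double-check those values, the unsigned Stirling numbers of the first kind satisfy the recurrence c(n, k) = c(n−1, k−1) + (n−1) c(n−1, k), which is easy to code up. A quick sketch:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling1(n, k):
        # Unsigned Stirling numbers of the first kind, via the standard recurrence.
        if n == 0:
            return 1 if k == 0 else 0
        if k == 0:
            return 0
        return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

    print(stirling1(26, 25))  # 325
    print(stirling1(26, 24))  # 50050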

For round numbers, let’s say there are a billion addresses, and for each address there are a million other addresses that are close in some sense, plausible typos of the address. That would be about 10^15 addresses and typos, spread out in a space of ≈10^45 (i.e. 58^26) possible addresses.

Now there’s an implicit Birthday Problem here. No particular address is likely to collide with another, even when you allow typos, but what about the likelihood that some address collides?

Say we partition our space of 10^45 addresses into N = 10^39 groups, each consisting of an address together with its million possible typos. Then as a rule of thumb, you’d need around √N random draws before you have a 50-50 chance of seeing a collision. Since 10^9 is a lot less than √N ≈ 10^19.5, it’s unlikely that any two addresses collide, even when you consider each address along with a million associated typos.
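Here is the same estimate as a rough calculation in code, using the round numbers above; the million-typo bin size is the generous figure from earlier.

    space = 58**26         # possible Base58 strings of length 26, about 7e45
    typos = 10**6          # generous count of plausible typos per address
    used = 10**9           # addresses actually in use
    bins = space // typos  # each bin: an address together with its cloud of typos
    # Birthday-style estimate: expected number of collisions ~ used^2 / (2 * bins)
    print(used**2 / (2 * bins))  # about 7e-23, vanishingly small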

The biggest math symbol

The biggest math symbol that I can think of is the Riemann P-symbol

w(z)=P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

The symbol is also known as the Papperitz symbol because Erwin Papperitz invented the symbol for expressing solutions to Bernhard Riemann’s differential equation.

Before writing out Riemann’s differential equation, we note that the equation has regular singular points at a, b, and c. In fact, that is its defining feature: it is the most general linear second order differential equation with three regular singular points. The parameters a, b, and c enter the equation as the roots of the expressions in the denominators; that’s as it has to be if these are the singular points.

The way the Greek letter parameters enter Riemann’s equation is more complicated, but there is a good reason for the complication: the notation makes solutions transform as simply as possible under a bilinear (Möbius) transformation. This is important because Möbius transformations are the conformal automorphisms of the Riemann sphere.

To be specific, let

m(z) = \frac{Az + B}{Cz + D}

be a Möbius transformation. Then

P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\} = P \left\{ \begin{matrix} m(a) & m(b) & m(c) & \; \\ \alpha & \beta & \gamma & m(z) \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

Since the parameters on the top row of the P-symbol are the locations of singularities, when you transform a solution, moving the singularities, the new parameters have to be the new locations of the singularities. And importantly the rest of the parameters do not change.

Now with the motivation aside, we’ll write out Riemann’s differential equation in all its glory.

\frac{d^2w}{dz^2} + p(z) \frac{dw}{dz} + q(z) \frac{w}{(z-a)(z-b)(z-c)} = 0

where

p(z) = \frac{1-\alpha-\alpha^\prime}{z-a} + \frac{1-\beta-\beta^\prime}{z-b} + \frac{1-\gamma-\gamma^\prime}{z-c}

and

q(z) = \frac{\alpha\alpha^\prime (a-b)(a-c)} {z-a} +\frac{\beta\beta^\prime (b-c)(b-a)} {z-b} +\frac{\gamma\gamma^\prime (c-a)(c-b)} {z-c}

Related posts

You can’t have everything you want: beta edition

The beta distribution is a conjugate prior for a binomial likelihood function, so it makes posterior probability calculations trivial: you simply add your data to the distribution parameters. If you start with a beta(α, β) prior distribution on a proportion θ, then observe s successes and f failures, the posterior distribution on θ is

beta(α + s, β + f).

There’s an integration going on in the background, but you don’t have to think about it. If you used any other prior, you’d have to calculate an integral, and it wouldn’t have a simple closed form.
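In code the update really is just addition of parameters. Here’s a sketch using SciPy; the prior parameters and the data below are made up for illustration.

    from scipy import stats

    a, b = 2, 2    # hypothetical beta prior on the proportion theta
    s, f = 12, 3   # hypothetical data: 12 successes, 3 failures

    posterior = stats.beta(a + s, b + f)  # conjugate update: just add the counts
    print(posterior.mean())               # posterior mean of theta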

The sum α + β of the prior distribution parameters is interpreted as the number of observations contained in the prior. For example, a beta(0.5, 0.5) prior is non-informative, having only as much information as one observation.

I was thinking about all this yesterday because I was working on a project where the client believes that a proportion θ is around 0.9. A reasonable choice of prior would be beta(0.9, 0.1). Such a distribution has mean 0.9 and is non-informative, and it would be fine for the use I had in mind.

Non-informative beta priors work well in practice, but they’re a little unsettling if you plot them. You’ll notice that the beta(0.9, 0.1) density has singularities at both ends of the interval [0, 1]. The singularity on the right end isn’t so bad, but it’s curious that a distribution designed to have mean 0.9 has infinite density at 0.

If you went further and used a completely non-informative, improper beta(0, 0) prior, you’d have strong singularities at both ends of the interval. And yet this isn’t a problem in practice. Statistical operations that are not expressed in Bayesian terms are often equivalent to a Bayesian analysis with an improper prior like this.

This raises two questions. Why must a non-informative prior put infinite density somewhere, and why is this not a problem?

Beta densities with small parameters have singularities because that’s just the limitation of the beta family. If you were using a strictly subjective Bayesian framework, you would have to say that such densities don’t match anyone’s subjective priors. But the beta distribution is very convenient to use, as explained at the top of this post, and so everyone makes a little ideological compromise.

As for why singular priors are not a problem, it boils down to the difference between density and mass. A beta(0.9, 0.1) distribution has infinite density at 0, but it doesn’t put much mass at 0. If you integrate the density over a small interval, you get a small probability. The singularity on the left is hard to see in the plot above, almost a vertical line that overlaps the vertical axis. So despite the curve going vertical, there isn’t much area under that part of the curve.
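You can quantify how little mass sits near 0 by integrating the density there. A quick check, assuming SciPy is available:

    from scipy import stats

    # Probability that theta < 0.01 under a beta(0.9, 0.1) distribution.
    # The density blows up at 0, but the area under it near 0 is small.
    print(stats.beta(0.9, 0.1).cdf(0.01))  # roughly 0.002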

You can’t always get what you want. But sometimes you get what you need.

Related posts

More on seed phrase words

Last week I wrote about how the English seed phrase words for crypto wallets, proposed in BIP39, are not ideal for memorization. This post gives a few more brief thoughts based on these words.

Prefix uniqueness

The BIP39 words have a nice property that I didn’t mention: the words are uniquely determined by their first four letters. This means, for example, that you could type in a seed phrase on a phone by typing the first four letters of each word and a wallet interface could fill in the rest of each word.

Incidentally, although the BIP39 words are unique in how they begin, they are not unique in how they end. For example, cross and across are both on the list.
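Checking this prefix property for any candidate word list is straightforward. A small sketch (loading the word list is left out):

    def unique_in_first_four(words):
        # True if no two words on the list share the same first four letters.
        prefixes = [w[:4] for w in words]
        return len(prefixes) == len(set(prefixes))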

Creating a list

I wondered how hard it would be to come up with a list of 2048 common words that are unique in their first four letters. So I started with Google’s list of 10,000 most common words.

I removed one- and two-letter words, and NSFW words, to try to create a list similar to the BIP39 words. That resulted in a list of 4228 words. You could delete over half of these words and have a list of 2048 words uniquely determined by their first four letters.
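One way to do that pruning is a greedy pass over the frequency-ordered list, keeping the first word seen for each four-letter prefix. This is just a sketch of one possible approach, not the procedure used for the BIP39 list:

    def prune_by_prefix(words, target=2048):
        # words: candidate words in descending order of frequency
        seen, kept = set(), []
        for w in words:
            prefix = w[:4]
            if prefix not in seen:
                seen.add(prefix)
                kept.append(w)
            if len(kept) == target:
                break
        return kept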

Comparing lists

I was curious how many of the BIP39 list of 2048 words were contained in Google’s list of 10,000 most common words. The answer is 1666, or about 81%. (Incidentally, I used comm to answer this.)

[Venn diagram of the overlap between the BIP39 word list and Google’s 10,000 most common words]

Vocabulary estimation and overlap

Here’s something else I was curious about. The average adult has an active vocabulary of between 20,000 and 35,000 words. So it seems reasonable that a typical adult would know nearly all the words on Google’s top 10,000 list. (Not entirely all. For example, I noticed one word on Google’s list that I hadn’t seen before.)

Now suppose you had a list of the 20,000 most common words and a person with a vocabulary of 20,000 words. How many words on the list is this person likely to know? Surely not all. We don’t acquire our vocabulary by starting at the top of a list of words, ordered by frequency, and working our way down. We learn words somewhat at random according to our circumstances. We’re more likely to know the most common words, but it’s not certain. So how would you model that?

It’s interesting to think how you would estimate someone’s vocabulary in the first place. You can’t give someone a quiz with every English word and ask them to check off the words they know, and estimation is complicated by the fact that word frequencies are far from uniform. Maybe the literature on vocabulary estimation would answer the questions in the previous paragraph.

Related posts