You can find any integer you want as a substring of the digits in π. (Probably. See footnote for details.) So you could encode a number by reporting where it appears.

If you want to encode a single digit, the best you can do is break even: it takes at least one digit to specify the location of another digit. For example, 9 is the 5th digit of π. But 7 doesn’t appear until the 13th digit. In that case it would take 2 digits to encode 1 digit.

What if we work in another base *b*? Nothing changes as long as we also describe positions in base *b*. But there’s a *possibility* of compression if we look for digits of base *b* but describe their position using base *p* numbers where *p* < *b*. For example, 15 is the 2nd base-100 “digit” in π.

## Blocks of digits

For the rest of this post we’ll describe blocks of *k* digits by their position as a base 10 number. That is, we’ll use *p* = 10 and *b* = 10^{k} in the notation above.

There are ten 2-digit numbers that can be described by a 1-digit position in π: 14, 15, 32, …, 92. There are 57 more 2-digit numbers that can be described by a 2-digit position. The other 33 2-digit numbers require 3 digits to describe their position.

So if we encode a 2-digit number by its position in pairs of digits in π, there’s a 1/10 chance that we’ll only need 1 digit, i.e. a 10% chance of compression. The chance that we’ll break even by needing two digits is 57/100, and the chance that we’ll need three digits, i.e. more digits than what we’re encoding, is 33/100. On average, we expect to use 2.23 digits to encode 2 digits.

If we look at 3-digit numbers, there are 10 that can be described by a 1-digit position, 83 that require 2 digits, 543 that require 3 digits, and 364 that require 4 digits. So on average we’d need 3.352 digits to encode 3 digits. Our “compression” scheme makes things about 8% larger on average.

With 4-digit numbers, we need up to 5 digits to describe the position of each, and on average we need 4.2611 digits.

With 5-digit numbers, we need at least 7 digits to describe each position. As I write this, my program is still running. The positions of 99,994 five-digit numbers can be described by numbers with at most 6 digits. The remaining 6 need at least 7 digits. Assuming they need exactly seven digits, we need an average of 5.2623 digits to encode 5-digit numbers by their position in π.

## Compression scheme efficiency

If we’re assuming the numbers we’re wishing to encode are uniformly distributed, the representation as a location in π will take more digits than the number itself, on average. But **all compression algorithms expand their contents on average** if by “average” you mean picking all possible inputs with the same probability. Uniform randomness is not compressible. It takes *n* bits to describe *n* random bits.

Compression methods are practical because their inputs are *not* completely random. For example, English prose compresses nicely because it’s not random.

If we had some reason to suspect that a number came from a block of digits in π, and one not too far our, then the scheme here could be a form of compression. The possibility of efficient compression comes from an assumption about our input.

## Extensions

You could extend the ideas in this post theoretically or empirically, i.e. you could write a theorem or a program.

Suppose you look at blocks of *k* base-*b* numbers whose position is described as a base *b* number. The case *b* = 2 seems particularly interesting. What is the compression efficiency of the method as *k* varies? What is it for particular *k* and what is it in the limit as *k* does to infinity?

You could look at this empirically for digits of π, or some other number, or you could try to prove it theoretically for digits of a normal number.

## Footnote on normal numbers

It might not be true that you can find any integer in the digits of π, though we know it’s true for numbers of the size we’re considering here. You can find any number in the digits of π if it is a “normal number.” Roughly speaking, a number is normal if you can find any pattern in its digits with the probability you’d expect. That is, 1/10 of the digits in its decimal expansion will be 7’s, for example. Or if you grouped the digits in pairs, thinking of the number as being in base 100, 1/100 of the pairs to be 42’s, etc.

Almost all numbers are normal, so in a sense, the probability that π is normal is 1. Even though almost all numbers are normal, nobody has been able to prove that that any familiar number is normal: π, *e*, √2, etc.

It’s possible that every integer appears in the decimal expansion of π, but not necessarily with the expected probability. This would be a weaker condition than normality. Maybe this has been proven for π, but I don’t know. It would be strange if every number appeared in the digits of π but with some bias.

I remember being completely blown away when I first learned about normal numbers, and that almost all real numbers are normal.

After a while, I came to think of this as a fact about how few nonrepeating patterns exist, because the only way to get a non-normal number is for there to be a pattern to the digits (after possibly some finite prefix) that never stops, inducing the frequency bias. The repeating patterns (the rationals) have measure zero; this result just says that the nonrepeating patterns with digit bias also have measure zero.

David – I don’t think the only way to get a non-normal number is for there to be some pattern after some finite prefix. Consider all of the irrational numbers whose decimal expansion does not contain the number 5. Non of these numbers are normal and they never fall into a pattern. I also think there are an uncountable number of these numbers..

Let me add that after posting I went to wiki to read up in this and my exact example is in there! Promise I posted first 🙂

Jeremy – I think perhaps we just use the word ‘pattern’ differently. Cantor sets are the kind of ‘pattern’ I was thinking of. If we were looking at a string of digits, and I asked you “Do you see any patterns?” and you replied “There aren’t any fives”, I would think you had answered me directly and well.

If a ‘pattern’ means that there is a way to describe an aspect of the sequence that is shorter than the sequence itself, that gets back to information theory and the discussion of compression above. “Almost all numbers are normal” translates as “almost all digit sequences are random”, in the Kolmogorov sense.

Same idea

https://github.com/philipl/pifs