New prime number record: 50th Mersenne prime

A new record for the largest known prime was announced yesterday:

2^{77,232,917} - 1

This number has 23,249,425 digits when written in base 10.

In base 2, 2^p – 1 is a sequence of p ones. For example, 31 = 2^5 – 1, which is 11111 in binary. So in binary, the new record prime is a string of 77,232,917 ones.

77,232,917 ones

You can convert a binary number to hexadecimal (base 16) by starting at the right end and converting blocks of 4 bits into hexadecimal. For example, to convert 101101111 to hex, we break it into three blocks: 1, 0110, and 1111. These blocks convert to 1, 6, and F, and so 101101111 in binary corresponds to 16F in hex.

Now 77,232,917  = 19,308,229 * 4 + 1, so if we break our string of 77,232,917 ones into groups of four, we have a single bit followed by 19,308,229 groups of 4. This means that in hexadecimal, the new prime record is 1FFF…FFF, a 1 followed by 19,308,229 F’s.

19,308,229 Fs
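We can check the pattern on a smaller Mersenne exponent. Here's a quick sketch in Python; the exponent 13 below is chosen only for illustration.

    p = 13   # a small Mersenne prime exponent, for illustration only
    # 13 = 3*4 + 1, so the hex form should be a 1 followed by three F's
    print(hex(2**p - 1))        # 0x1fff
    print(len(str(2**p - 1)))   # 4 decimal digits (8191)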

The new record is the 50th Mersenne prime. A Mersenne prime is a prime number that is one less than a power of 2, i.e. of the form 2^p – 1. It turns out p must be prime in order for 2^p – 1 to be prime. In the case of the new record, 77,232,917 is prime.

It is unknown whether there are infinitely many Mersenne primes. But now we know there are at least 50 of them.

All the recent prime number records have been Mersenne primes because there is an algorithm for testing whether a number of the special form 2^p – 1 is prime, the Lucas-Lehmer test.
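Here is a minimal sketch of the Lucas-Lehmer test in Python, just to show the idea; a serious implementation would use far more efficient arithmetic.

    def lucas_lehmer(p):
        # Test whether 2^p - 1 is prime, for an odd prime p
        M = 2**p - 1
        s = 4
        for _ in range(p - 2):
            s = (s*s - 2) % M
        return s == 0

    print([p for p in [3, 5, 7, 11, 13] if lucas_lehmer(p)])   # [3, 5, 7, 13]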

Mersenne primes are closely related to perfect numbers. A number n is perfect if the sum of the numbers less than n that divide n equals n. For example, 28 is divisible by 1, 2, 4, 7, and 14, and 1 + 2 + 4 + 7 + 14 = 28.

Finding a new Mersenne prime means finding a new perfect number. If m is a Mersenne prime, then m(m + 1)/2 is a perfect number. Conversely, if n is an even perfect number, n has the form m(m + 1)/2 where m is a Mersenne prime. Nobody knows whether odd perfect numbers exist. Since we don’t know whether there are infinitely many Mersenne primes, we don’t know whether there are infinitely many perfect numbers. But there are at least 50 of them.
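As a small check, here's the classic example with SymPy: m = 31 is the Mersenne prime 2^5 – 1, and m(m + 1)/2 = 496 is perfect.

    from sympy import divisors

    m = 2**5 - 1         # 31, a Mersenne prime
    n = m*(m + 1)//2     # 496
    # the divisors of n less than n should sum to n
    print(sum(divisors(n)) - n == n)   # True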

Distribution of Fibonacci numbers mod m

The last digits of Fibonacci numbers repeat with period 60. This is something I’ve written about before.

The 61st Fibonacci number is 2504730781961. The 62nd is 4052739537881. Since these end in 1 and 1, the 63rd Fibonacci number must end in 2, etc. and so the pattern starts over.

It’s not obvious that the cycle should have length 60, but it is fairly easy to see that there must be a cycle: each last digit is determined by the previous two, there are only finitely many pairs of last digits, and so some pair must eventually repeat.

In any base, the last digits must cycle. The length of these cycles varies erratically.
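Here's a small function that computes the cycle length (the Pisano period); the name pisano is my own.

    def pisano(m):
        # Length of the period of the Fibonacci sequence mod m
        prev, curr = 1, 1 % m
        period = 0
        while True:
            prev, curr = curr, (prev + curr) % m
            period += 1
            if (prev, curr) == (1 % m, 1 % m):
                return period

    print(pisano(10))    # 60
    print(pisano(100))   # 300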

In this post I want to ask a different question: how often do Fibonacci numbers take on each of the possible last digits in each base? In other words, how are the Fibonacci numbers distributed mod m for various values of m?

I haven’t tried to prove any theorems. I’ve just run some examples using the following Python code.

    def histogram(base):
        # Count how often each residue appears over one full period
        # of the Fibonacci sequence mod base.
        prev = 1
        F = 2 % base

        counts = [0]*base
        counts[F] += 1

        # Stop when the pair of consecutive values returns to (1, 1),
        # which marks the end of the period.
        while F != 1 or prev != 1:
            temp = F
            F = (F + prev) % base
            counts[F] += 1
            prev = temp

        return counts

In base 10, the last digits of Fibonacci numbers have period 60. Do all digits appear in the cycle? Do they all appear equally often?

Yes and no. Here are the results for base 10:

    >>> histogram(10)
    [4, 8, 4, 8, 4, 8, 4, 8, 4, 8]

Each even digit appears 4 times and each odd digit appears 8 times.

What happens if we look at the last two digits, i.e. if we look at Fibonacci numbers mod 100?

    >>> histogram(100)
    [2, 6, 2, 2, …, 2, 6, 2, 2]

Each two-digit number n appears six times if n = 1 mod 4. Otherwise it appears two times.

What about the last three digits? Now we see some zeros. For example, no Fibonacci number is congruent to 4 or 6 mod 1000. (Or mod 8.)

    >>> histogram(1000)
    [2, 3, 2, 1, 0, 3, 0, 1, …, 2, 3, 2, 1, 0, 3, 0, 1]

Here’s something curious. The Fibonacci numbers are exactly evenly distributed mod 5.

    >>> histogram(5)
    [4, 4, 4, 4, 4]

The same is apparently true for all powers of 5. Not only are all the frequencies the same, they’re all 4’s. I’ve tested this for powers of 5 up to 5^10. And conversely, it seems the Fibonacci numbers are not evenly distributed mod m unless m is a power of 5. I’ve tested this for m up to 100,000.

Conjecture: The Fibonacci numbers are uniformly distributed mod m if and only if m is a power of 5.
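Here's a quick spot check of the conjecture using the histogram function above: the counts mod 25 are all equal, while the counts mod 24 are not.

    >>> len(set(histogram(25))) == 1
    True
    >>> len(set(histogram(24))) == 1
    False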

Update: The conjecture is correct, and was proved in 1972 by Niederreiter.

 

Recent exponential sums

The exponential sum of the day draws a line between consecutive partial sums of

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

where m, d, and y are the current month, day, and two-digit year. The four most recent images show how different these plots can be.

exponential sums

These images are from 10/30/17, 10/31/17, 11/1/17, and 11/2/17.

Consecutive dates often produce very different images for a couple reasons. First, consecutive integers are relatively prime. From a number theoretic perspective, 30 and 31 are very different, for example. (This touches on the motivation for p-adic numbers: put a different metric on integers, one based on their prime factorization.)

The other reason consecutive dates produce qualitatively different images is that you might roll over a month or year, such as going from October (10) to November (11). You’ll see a big change when we roll over from 2017 to 2018.

The partial sums are periodic [1] with period lcm(m, d, y). The image for 10/31/17 above has the most points because 10, 31, and 17 are relatively prime. It’s also true that 11, 2, and 17 are relatively prime, but these are smaller numbers.
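Here's a quick sketch of the period calculation for the dates above; the lcm helper is just for illustration (Python 3.9+ has math.lcm built in).

    from math import gcd
    from functools import reduce

    def lcm(*args):
        return reduce(lambda a, b: a*b // gcd(a, b), args)

    print(lcm(10, 31, 17))   # 5270, the period for 10/31/17
    print(lcm(11, 1, 17))    # 187, the period for 11/1/17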

You could think of the month, day, and year components of the sum as three different gears. The sums repeat when all three gears return to their initial positions. In the image for yesterday, 11/1/17, the middle gear is effectively not there.

[1] The terms of the sum are periodic with period lcm(m, d, y). The partial sums are periodic if the sum over one full period is zero. So if the partial sums are periodic, the lcm is a period.

Poisson distribution and prime numbers

Let ω(n) be the number of distinct prime factors of n. A theorem of Landau says that for large N, if n is a randomly selected positive integer less than N, then ω(n) – 1 has approximately a Poisson(log log N) distribution. The statement holds exactly in the limit as N goes to infinity.

Apparently N has to be extremely large before the result approximately holds. I ran an experiment for N = 10,000,000 and the fit is not great.

Here’s the data:

|----+----------+-----------|
|  ω | Observed | Predicted |
|----+----------+-----------|
|  1 |   665134 |  620420.6 |
|  2 |  2536837 | 1724733.8 |
|  3 |  3642766 | 2397330.6 |
|  4 |  2389433 | 2221480.4 |
|  5 |   691209 | 1543897.1 |
|  6 |    72902 |  858389.0 |
|  7 |     1716 |  397712.0 |
|  8 |        1 |  157945.2 |
|  9 |        0 |   54884.8 |
| 10 |        0 |   16952.9 |
|----+----------+-----------|

And here’s a plot:

bar chart of actual vs predicted

I found the above theorem in these notes. It’s possible I’ve misunderstood something or there’s an error in the notes. I haven’t found a more formal reference on the theorem yet.

Update: According to the results in the comments below, it seems that the notes I cited may be wrong. The notes say “distinct prime factors”, i.e. ω(n), while numerical results suggest they meant to say Ω(n), the number of prime factors counted with multiplicity. I verified the numbers given below. Here’s a plot.
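To make the distinction concrete, here's how to compute both with SymPy. For example, 12 = 2² · 3, so ω(12) = 2 while Ω(12) = 3.

from sympy import factorint, primeomega

n = 12
print(len(factorint(n)), primeomega(n))   # 2 3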

 

Here’s the Python code I used to generate the table. (To match the code for the revised graph, replace omega, which calculates ω(n), with the SymPy function primeomega, which calculates Ω(n).)

from sympy import factorint
from numpy import empty, log, log2
from scipy.stats import poisson

N = 10000000

def omega(n):
    return len(factorint(n))

table = empty(N, int)
table[0] = table[1] = 0
for n in range(2, N):
    table[n] = omega(n)

# upper bound on the largest value of omega(n) for n < N.
u = int(log2(N))

for n in range(1, u+1):
    print(n, len(table[table==n]), poisson(log(log(N))).pmf(n-1)*N)


Finding numbers in pi

You can find any integer you want as a substring of the digits in π. (Probably. See footnote for details.) So you could encode a number by reporting where it appears.

If you want to encode a single digit, the best you can do is break even: it takes at least one digit to specify the location of another digit. For example, 9 is the 5th digit of π (counting digits after the decimal point). But 7 doesn’t appear until the 13th digit. In that case it would take 2 digits to encode 1 digit.
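Here's a sketch of how you might locate digits numerically, using mpmath to generate digits of π; the precision setting is arbitrary, and a search over many blocks would need far more digits.

    from mpmath import mp

    mp.dps = 120                  # working precision, an arbitrary choice
    digits = str(+mp.pi)[2:]      # digits of π after the decimal point
    print(digits.find("9") + 1)   # 5
    print(digits.find("7") + 1)   # 13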

What if we work in another base b? Nothing changes as long as we also describe positions in base b. But there’s a possibility of compression if we look for digits of base b but describe their position using base p numbers where p < b. For example, 15 is the 2nd base-100 “digit” in π.

Blocks of digits

For the rest of this post we’ll describe blocks of k digits by their position as a base 10 number. That is, we’ll use p = 10 and b = 10k in the notation above.

There are ten 2-digit numbers that can be described by a 1-digit position in π: 14, 15, 32, …, 92. There are 57 more 2-digit numbers that can be described by a 2-digit position. The other 33 2-digit numbers require 3 digits to describe their position.

So if we encode a 2-digit number by its position in pairs of digits in π, there’s a 1/10 chance that we’ll only need 1 digit, i.e. a 10% chance of compression. The chance that we’ll break even by needing two digits is 57/100, and the chance that we’ll need three digits, i.e. more digits than what we’re encoding, is 33/100. On average, we expect to use 2.23 digits to encode 2 digits.
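Here's the arithmetic behind that expected value.

    # digits needed -> how many of the 100 two-digit blocks need that many
    counts = {1: 10, 2: 57, 3: 33}
    print(sum(d*c for d, c in counts.items()) / 100)   # 2.23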

If we look at 3-digit numbers, there are 10 that can be described by a 1-digit position, 83 that require 2 digits, 543 that require 3 digits, and 364 that require 4 digits. So on average we’d need 3.261 digits to encode 3 digits. Our “compression” scheme makes things about 9% larger on average.

With 4-digit numbers, we need up to 5 digits to describe the position of each, and on average we need 4.2611 digits.

With 5-digit numbers, we need at least 7 digits to describe each position. As I write this, my program is still running. The positions of 99,994 five-digit numbers can be described by numbers with at most 6 digits. The remaining 6 need at least 7 digits. Assuming they need exactly seven digits, we need an average of 5.2623 digits to encode 5-digit numbers by their position in π.

Compression scheme efficiency

If we assume the numbers we wish to encode are uniformly distributed, the representation as a location in π will take more digits than the number itself, on average. But all compression algorithms expand their contents on average if by “average” you mean picking all possible inputs with the same probability. Uniform randomness is not compressible: it takes n bits to describe n random bits.

Compression methods are practical because their inputs are not completely random. For example, English prose compresses nicely because it’s not random.

If we had some reason to suspect that a number came from a block of digits in π, and one not too far out, then the scheme here could be a form of compression. The possibility of efficient compression comes from an assumption about our input.

Extensions

You could extend the ideas in this post theoretically or empirically, i.e. you could write a theorem or a program.

Suppose you look at blocks of k base-b digits whose position is described as a base-b number. The case b = 2 seems particularly interesting. What is the compression efficiency of the method as k varies? What is it for particular k, and what is it in the limit as k goes to infinity?

You could look at this empirically for digits of π, or some other number, or you could try to prove it theoretically for digits of a normal number.

Footnote on normal numbers

It might not be true that you can find any integer in the digits of π, though we know it’s true for numbers of the size we’re considering here. You can find any number in the digits of π if it is a “normal number.” Roughly speaking, a number is normal if every pattern of digits appears with the frequency you’d expect. That is, 1/10 of the digits in its decimal expansion will be 7’s, for example. Or if you group the digits in pairs, thinking of the number as being written in base 100, 1/100 of the pairs will be 42’s, etc.

Almost all numbers are normal, so in a sense, the probability that π is normal is 1. But even though almost all numbers are normal, nobody has been able to prove that any familiar number is normal: π, e, √2, etc.

It’s possible that every integer appears in the decimal expansion of π, but not necessarily with the expected probability. This would be a weaker condition than normality. Maybe this has been proven for π, but I don’t know. It would be strange if every number appeared in the digits of π but with some bias.

Empirically testing the Chowla conjecture

Terry Tao’s most recent blog post looks at the Chowla conjecture theoretically. This post looks at the same conjecture empirically using Python. (Which is much easier!)

The Liouville function λ(n) is (-1)^Ω(n) where Ω(n) is the number of prime factors of n counted with multiplicity. So, for example, Ω(9) = 2 because even though 9 has only one distinct prime factor, that factor counts twice.

Given some set of k strides h1, h2, …, hk, define

f(n) = λ(n + h1) λ(n + h2) … λ(n + hk).

The Chowla conjecture says that the average of the first N values of f(n) tends to zero as N goes to infinity. This has been proven for k = 1 but not for larger k.

If f(n) acts like a Bernoulli random variable, our experiments might increase our confidence in the conjecture, but they wouldn’t prove anything. Unexpected results wouldn’t prove anything either, but a departure from random behavior might help find a counterexample if the conjecture is false.

We’re going to be evaluating the Liouville function repeatedly at the same arguments, so it will save some compute time if we tabulate its values. This will also let us use some compact Python notation to average f. We’ll need to tabulate the function up to N + hk.

In the code below, maxstride is an upper bound on the strides hk we may look at. SymPy has a function primeomega that calculates Ω(n) so we might as well use it. If you wanted to use a very large value of N, you might want to fill the array of Liouville function values using a more direct approach that avoids all the factoring implicit in calls to primeomega.

    from sympy import primeomega
    from numpy import empty

    N = 10000
    maxstride = 100

    liouville = empty(N + maxstride)
    liouville[0] = 1
    for n in range(1, len(liouville)):
        liouville[n] = (-1)**primeomega(n)

The following code looks at the Chowla conjecture for h1 = 0 and h2 ranging over 1, 2, …, 99.

    average = empty(maxstride-1)
    for s in range(1, maxstride):
        average[s-1] = (liouville[0:N] * liouville[s:N+s]).sum()/N

If the Liouville function is distributed like a random variable taking on -1 and 1 with equal probability, we’d expect the averages to vary like samples from a normal distribution with mean 0 and variance 1/(2N).

    print(average.mean())
    print(average.std())
    print( (2*N)**-0.5 )

This returns

    0.00141
    0.00851
    0.00707

and so the means are indeed near zero, and the standard deviation of the samples is about what we’d expect.

What if we replace Ω(n), the number of prime factors with multiplicity, with ω(n), the number of distinct prime factors? The averages above are still small, but the sample variance is about twice as big as before. I’ve seen this pattern with several different large values of N, so I suspect there’s something going on here.

(I didn’t see a function in SymPy corresponding to ω(n), but you can create your own with len(factorint(n)).)
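Here's a sketch of that ω variant, reusing the same N and maxstride as above.

    from sympy import factorint
    from numpy import empty

    N = 10000
    maxstride = 100

    lam = empty(N + maxstride)
    lam[0] = 1
    for n in range(1, len(lam)):
        lam[n] = (-1)**len(factorint(n))   # (-1)^omega(n)

    average = empty(maxstride-1)
    for s in range(1, maxstride):
        average[s-1] = (lam[0:N] * lam[s:N+s]).sum()/N

    print(average.mean())
    print(average.std())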

Exponential sums make pretty pictures

Exponential sums are a specialized area of math that studies series with terms that are complex exponentials. Estimating such sums is delicate work. General estimation techniques are ham-fisted compared to what is possible with techniques specialized for these particular sums. Exponential sums are closely related to Fourier analysis and number theory.

Exponential sums also make pretty pictures. If you make a scatter plot of the sequence of partial sums you can get surprising shapes. This is related to the trickiness of estimating such sums: the partial sums don’t simply monotonically converge to a limit.

The exponential sum page at UNSW suggests playing around with polynomials with dates in the denominator. If we take that suggestion with today’s date, we get the curve below:

f(n) = n/10 + n**2/7 + n**3/17

These are the partial sums of exp(2πi f(n)) where f(n) = n/10 + n²/7 + n³/17.

[Update: You can get an image each day for the current day’s date here.]

Here’s the code that produced the image.

    import matplotlib.pyplot as plt
    from numpy import array, pi, exp, log

    N = 12000
    def f(n):
        return n/10 + n**2/7 + n**3/17 

    z = array( [exp( 2*pi*1j*f(n) ) for n in range(3, N+3)] )
    z = z.cumsum()

    plt.plot(z.real, z.imag, color='#333399')
    plt.axes().set_aspect(1)
    plt.show()

If we use logarithms, we get interesting spirals. Here f(n) = log(n)^4.1.

f(n) = log(n)**4.1

And we can mix polynomials with logarithms. Here f(n) = log(n) + n²/100.

f(n) = log(n) + n**2/100

In this last image, I reduced the number of points from 12,000 to 1200. With a large number of points the spiral nature dominates and you don’t see the swirls along the spiral as clearly.

 

Pascal’s triangle and Fermat’s little theorem

I was listening to My Favorite Theorem when Jordan Ellenberg said something in passing about proving Fermat’s little theorem from Pascal’s triangle. I wasn’t familiar with that, and fortunately Evelyn Lamb wasn’t either and so she asked him to explain.

Fermat’s little theorem says that for any prime p and any integer a,

a^p = a (mod p).

That is, a^p and a have the same remainder when you divide by p. Jordan Ellenberg picked the special case of a = 2 as his favorite theorem for the purpose of the podcast. And it’s this special case that can be proved from Pascal’s triangle.

The pth row of Pascal’s triangle consists of the coefficients of (x + y)^p when expanded using the binomial theorem. By setting x = y = 1, you can see that the numbers in the row must add up to 2^p. Also, all the numbers in the row are divisible by p except for the 1’s on each end. So the remainder when 2^p is divided by p is 2.

It’s easy to observe that the numbers in the pth row, except for the ends, are divisible by p. For example, when p = 5, the numbers are 1, 5, 10, 10, 5, 1. When p = 7, the numbers are 1, 7, 28, 35, 35, 28, 7, 1.

To prove this you have to show that the binomial coefficient C(p, k) is divisible by p when 0 < k < p. When you expand the binomial coefficient into factorials, you see that there’s a factor of p on top, and nothing can cancel it out because p is prime and the numbers in the denominator are less than p.
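Here's a quick numerical check of both observations, using math.comb from Python 3.8+.

    from math import comb

    p = 7
    print([comb(p, k) % p for k in range(p+1)])   # [1, 0, 0, 0, 0, 0, 0, 1]
    print(2**p % p)                               # 2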

By the way, you can form an analogous proof for the general case of Fermat’s little theorem by expanding

(1 + 1 + 1 + … + 1)^p

using the multinomial generalization of the binomial theorem.

Predicting when an RNG will output a given value

A few days ago I wrote about how to pick the seed of a simple random number generator so that a desired output came n values later. The number n was fixed and we varied the seed. In this post, the seed will be fixed and we’ll solve for n. In other words, we ask when a pseudorandom sequence will produce a given value.

In principle you could just run the RNG until you get the output you’re looking for, but we’ll assume such a brute force approach is not feasible or at least not fast enough.

If an LCG (linear congruential generator) has seed z, multiplier a, and modulus m, then the nth output is a^n z reduced mod m. So our task is to solve

x = a^n z mod m

for n. If we forget for a moment that we’re working with integers mod m, we see that the solution is

n = log_a (x / z)

We can actually make this work if we interpret division by z to mean multiplication by the inverse of z mod m and if we interpret the logarithm to be a discrete logarithm. For more on discrete logarithms and one algorithm for computing them, see this post.

In an earlier post I used a = 742938285 and m = 2^31 – 1 = 2147483647. We set n = 100 and solved for z to make the 100th output equal to 20170816, the date of that post. It turned out that z = 1898888478.

Now let’s set the seed z = 1898888478 and ask when the LCG sequence will take on the value x = 20170816. Of course we know that n will turn out to be 100, but let’s pretend we don’t know that. We’ll write a little Python script to find n.

I expect there’s a simple way to compute modular inverses using SymPy, but I haven’t found it, so I used some code from StackOverflow.

The following code produces n = 100, as expected.

from sympy.ntheory import discrete_log

def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m

a = 742938285
z = 1898888478
m = 2**31 - 1
x = 20170816
zinv = modinv(z, m)

n = discrete_log(m, x*zinv, a)
print(n)
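As an aside, in Python 3.8 and later the built-in pow function can compute modular inverses directly, so the egcd and modinv helpers above could be replaced by a one-liner.

zinv = pow(z, -1, m)   # modular inverse, Python 3.8+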

Reverse engineering the seed of a linear congruential generator

The previous post gave an example of manipulating the seed of a random number generator to produce a desired result. This post will do something similar for a different generator.

A couple times I’ve used the following LCG (linear congruential random number generator) in examples. An LCG starts with an initial value of z and updates z at each step by multiplying by a constant a and taking the remainder by m. The particular LCG I’ve used has a = 742938285 and m = 2^31 – 1 = 2147483647.

(I used these parameters because they have maximum range, i.e. every positive integer less than m appears eventually, and because it was one of the LCGs recommended in an article years ago. That is, it’s very good as far as 32-bit LCGs go, which isn’t very far. An earlier post shows how it quickly fails the PractRand test suite.)

Let’s pick the seed so that the 100th output of the generator is today’s date in ISO form: 20170816.

We need to solve

a^100 z = 20170816 mod 2147483647.

By reducing a^100 mod 2147483647 we find we need to solve

160159497 z = 20170816 mod 2147483647

and the solution is z = 1898888478. (See How to solve linear congruences.)
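Here's a sketch of that calculation in Python: reduce a^100 mod m, then solve the linear congruence with a modular inverse. (pow with exponent -1 requires Python 3.8+.)

    a = 742938285
    m = 2**31 - 1
    x = 20170816

    c = pow(a, 100, m)           # 160159497
    z = x * pow(c, -1, m) % m
    print(z)                     # 1898888478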

The following Python code verifies that the solution works.

    a = 742938285
    z = 1898888478
    m = 2**31 - 1

    for _ in range(100):
        z = a*z % m
    print(z)

Update: In this post, I kept n = 100 fixed and solved for the seed to give a specified output n steps later. In a follow up post I do the opposite, fixing the seed and solving for n.

Least common multiple of the first n positive integers

Here’s a surprising result: The least common multiple of the first n positive integers is approximately exp(n).

More precisely, let φ(n) equal the log of the least common multiple of the numbers 1, 2, …, n. There are theorems that give upper and lower bounds on how far φ(n) can be from n. We won’t prove or even state these bounds here. See [1] for that. Instead, we’ll show empirically that φ(n) is approximately n.

Here’s some Python code to plot φ(n) over n. The ratio jumps up sharply after the first few values of n. In the plot below, we chop off the first 20 values of n.

from numpy import arange, empty
from sympy.core.numbers import ilcm
from sympy import log
import matplotlib.pyplot as plt

N = 5000
x = arange(N)
phi = empty(N)

M = 1
for n in range(1, N):
    M = ilcm(n, M)
    phi[n] = log(M)

a = 20
plt.plot(x[a:], phi[a:]/x[a:])
plt.xlabel("$n$")
plt.ylabel(r"$\phi(n) / n$")
plt.show()

Here’s the graph this produces.

[1] J. Barkley Rosser and Lowell Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois Journal of Mathematics, Volume 6, Issue 1 (1962), 64-94. (On Project Euclid)

Quicksort and prime numbers

The average number of operations needed for quicksort to sort a list of n items is approximately 10 times the nth prime number.

Here’s some data to illustrate this.

|------+-----------------+---------|
|    n | avg. operations | 10*p(n) |
|------+-----------------+---------|
|  100 |          5200.2 |    5410 |
|  200 |         12018.3 |   12230 |
|  300 |         19446.9 |   19870 |
|  400 |         27272.2 |   27410 |
|  500 |         35392.2 |   35710 |
|  600 |         43747.3 |   44090 |
|  700 |         52297.8 |   52790 |
|  800 |         61015.5 |   61330 |
|  900 |         69879.6 |   69970 |
| 1000 |         78873.5 |   79190 |
| 1100 |         87984.4 |   88310 |
| 1200 |         97201.4 |   97330 |
| 1300 |        106515.9 |  106570 |
| 1400 |        115920.2 |  116570 |
| 1500 |        125407.9 |  125530 |
| 1600 |        134973.5 |  134990 |
| 1700 |        144612.1 |  145190 |
| 1800 |        154319.4 |  154010 |
| 1900 |        164091.5 |  163810 |
| 2000 |        173925.1 |  173890 |
|------+-----------------+---------|

The maximum difference between the quicksort and prime columns is about 4%. In the latter half of the table, the maximum error is about 0.4%.

What’s going on here? Why should quicksort be related to prime numbers?!

The real mystery is the prime number theorem, not quicksort. The prime number theorem tells us that the nth prime number is approximately n log n. And the number of operations in an efficient sort is proportional to n log n. The latter is easier to see than the former.

A lot of algorithms have run time proportional to n log n: mergesort, heapsort, FFT (Fast Fourier Transform), etc. All these have run time approximately proportional to the nth prime.

Now for the fine print. What exactly is the average run time for quicksort? It’s easy to say it’s O(n log n), but getting more specific requires making assumptions. I used 11.67 n log n – 1.74 n as the average number of operations, based on Knuth’s TAOCP, Volume 3. And why 10 times the nth prime and not 11.67? I chose 10 to make the example work better. For very large values of n, a larger coefficient would work better.
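Here's a sketch reproducing one row of the table, using the operation count quoted above; log here is the natural logarithm.

    from math import log
    from sympy import prime

    n = 1000
    ops = 11.67*n*log(n) - 1.74*n
    print(round(ops, 1), 10*prime(n))   # 78873.5 79190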

 

Leading digits of powers of 2

The first digit of a power of 2 is a 1 more often than any other digit. Powers of 2 begin with 1 about 30% of the time. This is because powers of 2 follow Benford’s law. We’ll prove this below.

When is the first digit of 2^n equal to k? When 2^n is between k × 10^p and (k+1) × 10^p for some nonnegative integer p. By taking logarithms base 10 we find that this is equivalent to the fractional part of n log10 2 being between log10 k and log10 (k + 1).

The map

x ↦ ( x + log10 2 ) mod 1

is ergodic. I wrote about irrational rotations a few weeks ago, and this is essentially the same thing. You could scale x by 2π and think of it as rotations on a circle instead of arithmetic mod 1 on an interval. The important thing is that log10 2 is irrational.

Repeatedly multiplying by 2 corresponds to adding log10 2 on the log scale. So powers of two correspond to iterates of the map above, starting with x = 0. Birkhoff’s Ergodic Theorem tells us that the proportion of iterates of this map falling in the interval [a, b] equals b – a. So for k = 1, 2, 3, … 9, the proportion of powers of 2 that start with k is equal to log10 (k + 1) – log10 (k) = log10 ((k + 1) / k).

This is Benford’s law. In particular, the proportion of powers of 2 that begin with 1 is equal to  log10 (2) = 0.301.

Note that the only thing special about 2 is that log10 2 is irrational. Powers of 3 follow Benford’s law as well because log10 3 is also irrational. For what values of b do powers of b not follow Benford’s law? Those with log10 b rational, i.e. powers of 10. Obviously powers of 10 don’t follow Benford’s law because their first digit is always 1!

[Interpret the “!” above as factorial or exclamation as you wish.]

Let’s look at powers of 2 empirically to see Benford’s law in practice. Here’s a simple program to look at first digits of powers of 2.

count = [0]*10
N = 10000

def first_digit(n):
    return int(str(n)[0])

for i in range(N):
    n = first_digit( 2**i )
    count[n] += 1

print(count)

Unfortunately this only works for moderate values of N. It ran in under a second with N set to 10,000 but for larger values of N it rapidly becomes impractical.

Here’s a much more efficient version that ran in about 2 seconds with N = 1,000,000.

from math import log10
N = 1000000
count = [0]*10

def first_digit_2_exp_e(e):
    r = (log10(2.0)*e) % 1
    for i in range(2, 11):
        if r < log10(i):
            return i-1

for i in range(N):
    n = first_digit_2_exp_e( i )
    count[n] += 1

print(count)

You could make it more efficient by caching the values of log10 rather than recomputing them. This brought the run time down to about 1.4 seconds. That’s a nice improvement, but nothing like the orders of magnitude improvement from changing algorithms.
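Here's what that caching might look like; this is just a sketch of the optimization mentioned above.

from math import log10

log2 = log10(2.0)
cutoffs = [log10(i) for i in range(2, 11)]   # log10(2), ..., log10(10)

def first_digit_2_exp_e(e):
    r = (log2*e) % 1
    for d, c in enumerate(cutoffs, start=1):
        if r < c:
            return d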

Here are the results comparing the actual counts to the predictions of Benford’s law (rounded to the nearest integer).

|---------------+--------+-----------|
| Leading digit | Actual | Predicted |
|---------------+--------+-----------|
|             1 | 301030 |    301030 |
|             2 | 176093 |    176091 |
|             3 | 124937 |    124939 |
|             4 |  96911 |     96910 |
|             5 |  79182 |     79181 |
|             6 |  66947 |     66948 |
|             7 |  57990 |     57992 |
|             8 |  51154 |     51153 |
|             9 |  45756 |     45757 |
|---------------+--------+-----------|

The agreement is almost too good to believe, never off by more than 2.

Are the results correct? The inefficient version relied on integer arithmetic and so would be exact. The efficient version relies on floating point and so it’s conceivable that limits of precision caused a leading digit to be calculated incorrectly, but I doubt that happened. Floating point is precise to about 15 significant figures. We start with log10(2), multiply it by numbers up to 1,000,000 and take the fractional part. The result is good to around 9 significant figures, enough to correctly calculate which logarithms the result falls between.

Update: See Andrew Dalke’s Python script in the comments. He shows a way to efficiently use integer arithmetic.

Uniform distribution of powers mod 1

A few days ago I wrote about how powers of the golden ratio are nearly integers but powers of π are not. This post is similar but takes a little different perspective. Instead of looking at how close powers are to the nearest integers, we’ll look at how close they are to their floor, the largest integer below. Put another way, we’ll throw away the integer parts and look at the decimal parts.

First a theorem:

For almost all x > 1, the sequence (x^n) for n = 1, 2, 3, … is u.d. mod 1. [1]

Here “almost all” is a technical term meaning that the set of x‘s for which the statement above does not hold has Lebesgue measure zero. The abbreviation “u.d.” stands for “uniformly distributed.” A sequence is uniformly distributed mod 1 if the fractional parts of the sequence are distributed like uniform random variables.

Even though the statement holds for almost all x, it’s hard to prove for particular values of x. And it’s easy to find particular values of x for which the theorem does not hold.

From [1]:

… it is interesting to note that one does not know whether sequences such as (e^n), (π^n), or even ((3/2)^n) are u.d. mod 1 or not.

Obviously powers of integers are not u.d. mod 1 because their fractional parts are all 0. And we’ve shown before that powers of the golden ratio and powers of the plastic constant are near integers, i.e. their fractional parts cluster near 0 and 1.

The curious part about the quote above is that it’s not clear whether powers of 3/2 are uniformly distributed mod 1. I wouldn’t expect powers of any rational number to be u.d. mod 1. Either my intuition was wrong, or it’s right but hasn’t been proved, at least not when [1] was written.
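For what it's worth, here's a quick way to look at the first few fractional parts of (3/2)^n, using exact rational arithmetic to avoid floating point error for moderate n.

    from fractions import Fraction

    x = Fraction(3, 2)
    print([float(x**n % 1) for n in range(1, 11)])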

The next post will look at powers of 3/2 mod 1 and whether they appear to be uniformly distributed.

* * *

[1] Kuipers and Niederreiter, Uniform Distribution of Sequences

Plastic powers

Last week I wrote a blog post showing that powers of the golden ratio are nearly integers. Specifically, the distance from φ^n to the nearest integer decreases exponentially as n increases. Several people pointed out that the golden ratio is a Pisot number, part of a general class of numbers whose powers are exponentially close to integers.

The so-called plastic constant P is another Pisot number, in fact the smallest Pisot number. P is the real root of x^3 – x – 1 = 0.

P = \frac{ (9 - \sqrt{69})^{1/3} + (9 + \sqrt{69})^{1/3} }{ 2^{1/3} \,\,\, 3^{2/3} } = 1.3247\ldots

Because P is a Pisot number, we know that its powers will be close to integers, just like powers of the golden ratio, but the way they approach integers is more interesting. The convergence is slower and less regular.

We will look at the first few powers of P, first looking at the distance to the nearest integer on a linear scale, then looking at the absolute value of the distance on a logarithmic scale.
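Here's a sketch of the computation behind the plots, using mpmath for extra precision; 50 decimal digits is an arbitrary choice, plenty for the first few dozen powers.

    from mpmath import mp, findroot, nint

    mp.dps = 50
    P = findroot(lambda x: x**3 - x - 1, 1.3)   # the plastic constant
    for n in range(1, 11):
        print(n, float(P**n - nint(P**n)))      # signed distance to nearest integer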

distance from powers of plastic constant to nearest integer

distance from powers of plastic constant to nearest integer, log scale

As a reminder, here’s what the corresponding plots looked like for the golden ratio.

distance from powers of golden ratio to nearest integer

distance from powers of golden ratio to nearest integer, log scale