Polynomials evaluated at integers

Let p(x) = a₀ + a₁x + a₂x² + … + aₙxⁿ and suppose at least one of the coefficients aᵢ is irrational for some i ≥ 1. Then a theorem of Weyl says that the fractional parts of p(n) are equidistributed modulo 1 as n varies over the integers. That is, the proportion of fractional parts that land in any subinterval of [0, 1] equals the length of that subinterval.

Clearly it’s necessary that one of the coefficients be irrational. What may be surprising is that it is sufficient.

If the coefficients were all rational with common denominator N, then the sequence would only contain multiples of 1/N. The interval [1/(3N), 2/(3N)], for example, would never get a sample. If a₀ were irrational but the rest of the coefficients were rational, we’d have the same situation, simply shifted by a₀.

This is a theorem about what happens in the limit, but we can look at what happens for a large but finite number of terms. And we can use a χ² test to see how evenly our sequence is distributed compared to what one would expect from a random uniform sequence.

In the code below, we use the polynomial p(x) = √2 x² + πx + 1 and evaluate p at 0, 1, 2, …, 99,999. We then count how many fall in the bins [0, 0.01), [0.01, 0.02), …, [0.99, 1) and compute a chi-square statistic on the counts.

from math import pi, sqrt, floor

# The polynomial whose fractional parts we test for equidistribution
def p(x):
    return 1 + pi*x + sqrt(2)*x*x

# Chi-square statistic for observed counts O and expected counts E
def chisq_stat(O, E):
    return sum( [(o - e)**2/e for (o, e) in zip(O, E)] )

# Fractional part of x
def frac(x):
    return x - floor(x)

N = 100000
data = [frac(p(n)) for n in range(N)]

# Count how many fractional parts land in each of 100 equal-width bins
count = [0]*100
for d in data:
    count[ int(floor(100*d)) ] += 1

expected = [N/100]*100

print(chisq_stat(count, expected))

We get a chi-square statistic of 95.4. Since we have 100 bins, there are 99 degrees of freedom, and we should compare our statistic to a chi-square distribution with 99 degrees of freedom. Such a distribution has mean 99 and standard deviation √(99*2) = 14.07, so a value of 95.4 is completely unremarkable.
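
If you’d like a p-value to go with the statistic, SciPy’s chi-square distribution can provide it. A statistic this close to the middle of the distribution gives a p-value far from both 0 and 1.

from scipy.stats import chi2

# Upper-tail p-value for the statistic above, with 99 degrees of freedom
print(chi2.sf(95.4, 99))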

If we had gotten a very large chi-square statistic, say 200, we’d have reason to suspect something was wrong. Maybe a misunderstanding on our part or a bug in our software. Or maybe the sequence was not as uniformly distributed as we expected.

If we had gotten a very small chi-square statistic, say 10, then again maybe we misunderstood something, or maybe our sequence is remarkably evenly distributed, more evenly than one would expect from a random sequence.

Leading digits of powers of 2

The first digit of a power of 2 is a 1 more often than any other digit. Powers of 2 begin with 1 about 30% of the time. This is because powers of 2 follow Benford’s law. We’ll prove this below.

When is the first digit of 2ⁿ equal to k? When 2ⁿ is between k × 10ᵖ and (k+1) × 10ᵖ for some non-negative integer p. Taking logarithms base 10, this is equivalent to the fractional part of n log10 2 lying between log10 k and log10 (k + 1).

The map

x ↦ ( x + log10 2 ) mod 1

is ergodic. I wrote about irrational rotations a few weeks ago, and this is essentially the same thing. You could scale x by 2π and think of it as rotations on a circle instead of arithmetic mod 1 on an interval. The important thing is that log10 2 is irrational.

Repeatedly multiplying by 2 corresponds to repeatedly adding log10 2 on the log scale. So powers of two correspond to iterates of the map above, starting with x = 0. Birkhoff’s Ergodic Theorem tells us that the long-run proportion of iterates of this map that fall in the interval [a, b] equals b − a. So for k = 1, 2, 3, …, 9, the proportion of powers of 2 that start with k equals log10 (k + 1) − log10 (k) = log10 ((k + 1) / k).

This is Benford’s law. In particular, the proportion of powers of 2 that begin with 1 is equal to  log10 (2) = 0.301.

Note that the only thing special about 2 is that log10 2 is irrational. Powers of 3 follow Benford’s law as well because log10 3 is also irrational. For what values of b do powers of b not follow Benford’s law? Those with log10 b rational, i.e. powers of 10. Obviously powers of 10 don’t follow Benford’s law because their first digit is always 1!

[Interpret the “!” above as factorial or exclamation as you wish.]

Let’s look at powers of 2 empirically to see Benford’s law in practice. Here’s a simple program to look at first digits of powers of 2.

count = [0]*10
N = 10000

def first_digit(n):
    return int(str(n)[0])

for i in range(N):
    n = first_digit( 2**i )
    count[n] += 1

print(count)

Unfortunately this only works for moderate values of N. It ran in under a second with N set to 10,000 but for larger values of N it rapidly becomes impractical.

Here’s a much more efficient version that ran in about 2 seconds with N = 1,000,000.

from math import log10
N = 1000000
count = [0]*10

def first_digit_2_exp_e(e):
    # Fractional part of e*log10(2); the first digit of 2**e is k
    # exactly when this lies between log10(k) and log10(k+1)
    r = (log10(2.0)*e) % 1
    for i in range(2, 11):
        if r < log10(i):
            return i-1

for i in range(N):
    n = first_digit_2_exp_e( i )
    count[n] += 1

print(count)

You could make it more efficient by caching the values of log10 rather than recomputing them; doing so brought the run time down to about 1.4 seconds. That’s a nice improvement, but nothing like the orders of magnitude improvement from changing algorithms.
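
Here’s roughly what the cached version might look like. This is a sketch, not necessarily the exact code that was timed; the names LOG10_2 and THRESHOLDS are mine.

from math import log10

LOG10_2 = log10(2.0)
THRESHOLDS = [log10(i) for i in range(2, 11)]  # log10(2), ..., log10(10)

def first_digit_2_exp_e(e):
    r = (LOG10_2*e) % 1
    for digit, threshold in enumerate(THRESHOLDS, start=1):
        if r < threshold:
            return digit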

Here are the results comparing the actual counts to the predictions of Benford’s law (rounded to the nearest integer).

|---------------+--------+-----------|
| Leading digit | Actual | Predicted |
|---------------+--------+-----------|
|             1 | 301030 |    301030 |
|             2 | 176093 |    176091 |
|             3 | 124937 |    124939 |
|             4 |  96911 |     96910 |
|             5 |  79182 |     79181 |
|             6 |  66947 |     66948 |
|             7 |  57990 |     57992 |
|             8 |  51154 |     51153 |
|             9 |  45756 |     45757 |
|---------------+--------+-----------|

The agreement is almost too good to believe, never off by more than 2.

Are the results correct? The inefficient version relied on integer arithmetic and so would be exact. The efficient version relies on floating point, so it’s conceivable that limits of precision caused a leading digit to be calculated incorrectly, but I doubt that happened. Floating point is precise to about 15 significant figures. We start with log10(2), multiply it by numbers up to 1,000,000, and take the fractional part. The result is good to around 9 significant figures, enough to correctly determine which of the boundaries log10 k and log10 (k + 1) the fractional part falls between.

Update: See Andrew Dalke’s Python script in the comments. He shows a way to efficiently use integer arithmetic.
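
That script isn’t reproduced here, but here’s an independent spot check: for exponents small enough that computing 2**e directly is cheap, compare the floating point method above with exact integer arithmetic. (The helper names are mine.)

from math import log10

def first_digit_exact(e):
    # Exact: compute 2**e as a Python integer and read off its first digit
    return int(str(2**e)[0])

def first_digit_float(e):
    # Floating point method from above
    r = (log10(2.0)*e) % 1
    for i in range(2, 11):
        if r < log10(i):
            return i-1

# Compare the two methods for a few thousand exponents
mismatches = [e for e in range(2000) if first_digit_exact(e) != first_digit_float(e)]
print(mismatches)  # expect an empty list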

Fractional parts, invariant measures, and simulation

A function f: X → X is measure-preserving if each iteration of f sends the same amount of stuff into a given set. To be more precise, given a measure μ and any μ-measurable set E with μ(E) > 0, we have

μ( E ) = μ( f⁻¹(E) ).

You can read the right side of the equation above as “the measure of the set of points that f maps into E.” You can apply this condition repeatedly to see that the measure of the set of points mapped into E after n iterations is still just the measure of E.

If X is a probability space, i.e. μ( X ) = 1, then you could interpret the definition of measure-preserving to mean that the probability that a point ends up in E after n iterations is independent of n. We’ll illustrate this below with a simulation.

Let X be the half-open unit interval (0, 1] and let f be the Gauss map, i.e.

f(x) = 1/x – [1/x]

where [z] is the integer part of z. The function f is measure-preserving, though not for the usual Lebesgue measure. Instead it preserves the following measure:

\mu(E) = \frac{1}{\log 2} \int_E \frac{1}{1+x} \, dx

Let’s take as our set E an interval [a, b] and test via simulation whether the probability of landing in E after n iterations really is just the measure of E, independent of n.

We can’t just generate points uniformly in the interval (0, 1]. We have to generate the points so that the probability of a point coming from a set E is μ(E). That means the PDF of the distribution must be p(x) = 1 / (log(2) (1 + x)). We use the inverse-CDF method to generate points with this density: the CDF is F(x) = log(1 + x)/log 2, so the inverse CDF is F⁻¹(u) = 2ᵘ − 1, which is what the sample function below computes.

from math import log, floor
from random import random

def gauss_map(x):
    y = 1.0/x
    return y - floor(y)

# iterate gauss map n times
def iterated_gauss_map(x, n):
    while n > 0:
        x = gauss_map(x)
        n = n - 1
    return x      

# measure mu( [a,b] )
def mu(a, b):
    return (log(1.0+b) - log(1.0+a))/log(2.0)

# Generate samples with Prob(x in E) = mu( E )
def sample():
    u = random()
    return 2.0**u - 1.0

def simulate(num_points, num_iterations, a, b):
    count = 0
    for _ in range(num_points):
        x = sample()
        y = iterated_gauss_map(x, num_iterations)
        if a < y < b:
            count += 1
    return count / num_points

# Change these parameters however you like
a, b = 0.1, 0.25
N = 1000000

exact = mu(a, b)
print("Exact probability:", exact)
print("Simulated probability after n iterations")
for n in range(1, 10):
    simulated = simulate(N, n, a, b)
    print("n =", n, ":", simulated)

Here’s the output I got:

Exact probability: 0.18442457113742736
Simulated probability after n iterations
n = 1 : 0.184329
n = 2 : 0.183969
n = 3 : 0.184233
n = 4 : 0.184322
n = 5 : 0.184439
n = 6 : 0.184059
n = 7 : 0.184602
n = 8 : 0.183877
n = 9 : 0.184834

With 1,000,000 samples, the standard error of each estimate is about √(p(1 − p)/N) ≈ 0.0004, so we expect the simulated and exact probabilities to agree to about 3 decimal places, which is what we see above.

Related post: Irrational rotations are ergodic.

(A transformation f is ergodic if it is measure-preserving and the only sets E with f⁻¹(E) = E are those with measure 0 or full measure. Rational rotations are measure-preserving but not ergodic. The Gauss map above is ergodic.)

Gaussian correlation inequality

The Gaussian correlation inequality was proven in 2014, but the proof only became widely known this year. You can find Thomas Royen’s remarkably short proof here.

Let X be a multivariate Gaussian random variable with mean zero and let E and F be two symmetric convex sets, both centered at the origin. The Gaussian correlation inequality says that

Prob(X in E and F) ≥ Prob(X in E) Prob(X in F).

Here’s a bit of Python code for illustrating the inequality. For symmetric convex sets we take balls of radius r in the p-norm, where p ≥ 1 and r > 0. We could, for example, set one of the values of p to 1 to get a diamond-shaped ball (an octahedron in three dimensions) and set the other p to 2 to get a Euclidean ball.

from scipy.stats import norm as gaussian

def pnorm(v, p):
    return sum( [abs(x)**p for x in v] )**(1./p)

def simulate(dim, r1, p1, r2, p2, numReps):
    count_1, count_2, count_both = (0, 0, 0)
    for _ in range(numReps):
        x = gaussian.rvs(0, 1, dim)
        in_1 = (pnorm(x, p1) < r1)
        in_2 = (pnorm(x, p2) < r2)
        if in_1:
            count_1 += 1
        if in_2:
            count_2 += 1
        if in_1 and in_2:
            count_both += 1
    print("Prob in both:", count_both / numReps)
    print("Lower bound: ", count_1*count_2 * numReps**-2)


simulate(3, 1, 2, 1, 1, 1000)

When numReps is large, we expect the simulated probability of the intersection to be greater than the simulated lower bound. In the example above, the former was 0.075 and the latter 0.015075, ordered as we’d expect.

If we didn’t know that the theorem has been proven, we could use code like this to try to find counterexamples. Of course a simulation cannot prove or disprove a theorem, but if we found what appeared to be a counterexample, we could see whether it persists with different random number generation seeds and with a large value of numReps. If so, then we could try to establish the inequality analytically. Now that the theorem has been proven, we know that we’re not going to find real counterexamples, so the code above is only useful as an illustration.

Related posts:

Irrational rotations are ergodic

In a blog post yesterday, I mentioned that the golden angle is an irrational portion of a circle, and so a sequence of rotations by the golden angle will not repeat itself. We can say more: rotations by an irrational portion of a circle are ergodic. Roughly speaking, this means that not only does the sequence not repeat itself, the sequence “mixes well” in a technical sense.

Ergodic functions have the property that “the time average equals the space average.” We’ll unpack what that means and illustrate it by simulation.

Suppose we pick a starting point x on the circle then repeatedly rotate it by a golden angle. Take an integrable function f on the circle and form the average of its values at the sequence of rotations. This is the time average. The space average is the integral of f over the circle, divided by the circumference of the circle. The ergodic theorem says that the time average equals the space average, except possibly for a set of starting values of measure zero.

More generally, let X be a measure space (like the unit circle) with measure μ, and let T be an ergodic transformation (like rotation by a golden angle). Then for almost all starting values x we have the following:

\lim_{n\to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \frac{1}{\mu(X)} \int_X f\, d\mu

Let’s do a simulation to see this in practice by running the following Python script.

        from numpy import pi, cos
        from scipy.constants import golden
        from scipy.integrate import quad
        
        golden_angle = 2*pi*golden**-2
        
        def T(x):
            return (x + golden_angle) % (2*pi)
        
        def time_average(x, f, T, n):
            s = 0
            for k in range(n):
                s += f(x)
                x = T(x)
            return s/n
        
        def space_average(f):
            integral = quad(f, 0, 2*pi)[0]
            return integral / (2*pi)
        
        f = lambda x: cos(x)**2
        N = 1000000
        
        print( time_average(0, f, T, N) )
        print( space_average(f) )

In this case we get 0.49999996 for the time average, and 0.5 for the space average. They’re not the same, but we only used a finite value of n; we didn’t take a limit. We should expect the two values to be close because n is large, but we shouldn’t expect them to be equal.

Update: The code and results were updated to fix a bug pointed out in the comments below. I had written ... % 2*pi when I should have written ... % (2*pi). I assumed the modulo operator had lower precedence than multiplication, but in Python the two have the same precedence and group left to right. It was a coincidence that the buggy code was fairly accurate.

A friend of mine, a programmer with decades of experience, recently made a similar error. He’s a Clojure fan but was writing in C or some similar language. He rightfully pointed out that this kind of error simply cannot happen in Clojure. Lisps, including Clojure, don’t have operator precedence because they don’t have operators. They only have functions, and the order in which functions are called is made explicit with parentheses. The Python code x % 2*pi corresponds to (* (mod x 2) pi) in Clojure, and the Python code x % (2*pi) corresponds to (mod x (* 2 pi)).

Related: Origin of the word “ergodic”

Volume of a rose-shaped torus

Start with a rose, as described in the previous post:

Now spin that rose around a vertical line a distance R from the center of the rose. This makes a torus (doughnut) shape whose cross sections look like the rose above. You could think of having a cutout shaped like the rose above and extruding Play-Doh through it, then joining the ends in a loop.

Volume of rotation from rose

In case you’re curious, the image above was created with the following Mathematica command:

      RevolutionPlot3D[{4 + Cos[5 t] Cos[t], Cos[5 t] Sin[t]}, {t, 0, 2 Pi}]

What would the volume of resulting solid be?

This sounds like a horrendous calculus homework problem, but it’s actually quite easy. A theorem by Pappus of Alexandria (c. 300 AD) says that the volume is equal to the area times the circumference of the circle traversed by the centroid.

The area of a rose of the form r = cos(kθ) is simply π/2 if k is even and π/4 if k is odd. (Recall from the previous post that the number of petals is 2k if k is even and k if k is odd.)

The location of the centroid is easy: it’s at the origin by symmetry. If we rotate the rose around a vertical line x = R then the centroid travels a distance 2πR.

So putting all the pieces together, the volume is π²R if k is even and half that if k is odd. (We assume R > 1 so that the figure doesn’t intersect itself, which would cause some volume to be counted twice.)

We can also find the surface area easily by another theorem of Pappus. The surface area is just the arc length of the rose times the circumference of the circle traced out by the centroid. The previous post showed how to find the arc length, so just multiply that by 2πR.

Length of a rose

The polar graph of r = cos(kθ) is called a rose. If k is even, the curve will trace out 2k petals as θ runs between 0 and 2π. If k is odd, it will trace out k petals, tracing each one twice. For example, here’s a rose with k = 5.

(I rotated the graph 36° so it would be symmetric about the vertical axis rather than the horizontal axis.)

The arc length of a curve in polar coordinates is given by

\int_a^b \sqrt{r^2 + \left(\frac{dr}{d\theta}\right)^2}\, d\theta

and so we can use this to find the length. The integral doesn’t have a closed form in terms of elementary functions. Instead, the result involves a special function E, the “complete elliptic integral of the second kind,” defined by

E(m) = \int_0^{\pi/2} \sqrt{1 - m \sin^2 x} \, dx

Here’s the calculation for the length of a rose:

\begin{eqnarray*} \int_0^{2\pi} \sqrt{r^2 + (r')^2}\, d\theta &=& \int_0^{2\pi} \sqrt{\cos^2 k\theta + k^2 \sin^2 k\theta} \, d\theta \\ &=& \int_0^{2\pi} \sqrt{\cos^2 u + k^2 \sin^2 u} \, du \\ &=& 4 \int_0^{\pi/2} \sqrt{\cos^2 u + k^2 \sin^2 u} \, du \\ &=& 4 \int_0^{\pi/2} \sqrt{1 + (k^2-1) \sin^2 u} \, du \\ &=& 4 E(-k^2 + 1) \end{eqnarray*}

So the arc length of the rose r = cos(kθ) with θ running from 0 to 2π is 4 E(-k² + 1). You can calculate E in SciPy with scipy.special.ellipe.
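
For example, here’s the computation for the rose above; scipy.special.ellipe(m) uses the same parameter convention as the definition of E(m) above.

from scipy.special import ellipe

def rose_length(k):
    # Arc length of r = cos(k*theta) for theta from 0 to 2*pi
    return 4*ellipe(1 - k**2)

print(rose_length(5))  # about 21.01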

If we compute the length of the rose at the top of the post, we get 4 E(-24) = 21.01. Does that pass the sniff test? Each petal goes from r = 0 out to r = 1 and back. If the petal were a straight line, this would have length 2. Since the petals are curved, the length of each is a little more than 2. There are five petals, so the result should be a little more than 10. But we got a little more than 20. How can that be? Since 5 is odd, the rose with k = 5 traces each petal twice, so we should expect a value of a little more than 20, which is what we got.

As k gets larger, the petals come closer to being straight lines. So we should expect that 4E(-k² + 1) approaches 4k as k gets large. The following plot of E(-k² + 1) – k provides empirical support for this conjecture by showing that the difference approaches 0, and gives an idea of the rate of convergence. It should be possible to prove, say, that E(-k² + 1) asymptotically approaches k, but I haven’t done this.

When length equals area

The graph of hyperbolic cosine is called a catenary. A catenary has the following curious property: the length of a catenary between two points equals the area under the catenary between those two points.

The proof is surprisingly simple. Start with the following:

\sqrt{1 +\left(\frac{d}{dx} \cosh(x) \right)^2} = \sqrt{1 + \sinh^2(x)} = \cosh(x)

Now integrate the first and last expressions between two points a and b. Note that the former integral gives the arc length of cosh between a and b, and the latter integral gives the area under the graph of cosh between a and b.
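
Here’s a quick numerical check of the identity, with endpoints chosen arbitrarily:

from math import cosh, sinh, sqrt
from scipy.integrate import quad

a, b = -1.0, 2.0

# Arc length of cosh between a and b
arc_length = quad(lambda x: sqrt(1 + sinh(x)**2), a, b)[0]

# Area under cosh between a and b
area = quad(cosh, a, b)[0]

print(arc_length, area)  # the two should agree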

By the way, the most famous catenary may be the Gateway Arch in St. Louis, Missouri.

Gateway Arch at night

Solving systems of polynomial equations

In a high school algebra class, you learn how to solve polynomial equations in one variable, and systems of linear equations. You might reasonably ask “So when do we combine these and learn to solve systems of polynomial equations?” The answer would be “Maybe years from now, but most likely never.” There are systematic ways to solve systems of polynomial equations, but you’re unlikely to ever see them unless you study algebraic geometry.

Here’s an example from [1]. Suppose you want to find the extreme values of x³ + 2xyz − z² on the unit sphere using Lagrange multipliers. This leads to the following system of polynomial equations, where λ is the Lagrange multiplier.

\begin{eqnarray*} 3x^2 + 2yz - 2x\lambda &=& 0 \\ 2xz - 2y\lambda &=& 0 \\ 2xy - 2z - 2z\lambda &=& 0 \\ x^2 + y^2 + z^2 - 1 &=& 0 \end{eqnarray*}

There’s no obvious way to go about solving this system of equations. However, there is a systematic way to approach this problem using a “lexicographic Gröbner basis.” This transforms the problem into something that looks much worse but that is actually easier to work with. And most importantly, the transformation is algorithmic. It requires some computation (there are numerous software packages for doing this) but doesn’t require a flash of insight.

The transformed system looks intimidating compared to the original:

\begin{eqnarray*} \lambda - \frac{3}{2}x - \frac{3}{2}yz - \frac{167616}{3835}z^6 - \frac{36717}{590}z^4 - \frac{134419}{7670}z^2 &=& 0 \\ x^2 + y^2 + z^2 - 1 &=& 0 \\ xy - \frac{19584}{3835}z^5 + \frac{1999}{295}z^3 - \frac{6403}{3835}z &=& 0 \\ xz + yz^2 - \frac{1152}{3835}z^5 - \frac{108}{295}z^3 + \frac{2556}{3835}z &=& 0 \\ y^3 + yz^2 - y -\frac{9216}{3835}z^5 + \frac{906}{295}z^3 - \frac{2562}{3835}z &=& 0 \\ y^2z - \frac{6912}{3835}z^5 -\frac{827}{295}z^3 -\frac{3839}{3835}z &=& 0 \\ yz^3 - yz - \frac{576}{59}z^6 + \frac{1605}{118}z^4 - \frac{453}{118}z^2 &=& 0 \\ z^7 - \frac{1763}{1152}z^5 + \frac{655}{1152}z^3 - \frac{11}{288}z &=& 0 \end{eqnarray*}

We’ve gone from four equations to eight, from small integer coefficients to large fraction coefficients, from squares to seventh powers. And yet we’ve made progress because the four variables are less entangled in the new system.

The last equation involves only z and factors nicely:

z\left(z^2 - \frac{4}{9}\right) \left(z^2 - \frac{11}{128}\right) = 0

This cracks the problem wide open. We can easily find all the possible values of z, and once we substitute values for z, the rest of the equations simplify greatly and can be solved easily.

The key is that Gröbner bases transform our problem into a form that, although it appears more difficult, is easier to work with because the variables are somewhat separated. Solving one variable, z, is like pulling out a thread that then makes the rest of the threads easier to separate.
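
If you’d like to experiment with this yourself, SymPy has a Gröbner basis routine. Here’s a sketch; the variable names and the lexicographic ordering λ > x > y > z are my choices, and I haven’t checked that the output matches the basis quoted above term for term.

from sympy import symbols, groebner

lam, x, y, z = symbols('lam x y z')

# Lagrange multiplier system for x^3 + 2xyz - z^2 on the unit sphere
F = [3*x**2 + 2*y*z - 2*x*lam,
     2*x*z - 2*y*lam,
     2*x*y - 2*z - 2*z*lam,
     x**2 + y**2 + z**2 - 1]

# Lexicographic Groebner basis with lam > x > y > z
G = groebner(F, lam, x, y, z, order='lex')
for g in G:
    print(g)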

Related: The great reformulation of algebraic geometry

* * *

[1] David Cox et al. Applications of Computational Algebraic Geometry: American Mathematical Society Short Course January 6-7, 1997 San Diego, California (Proceedings of Symposia in Applied Mathematics)

Generating pink noise

Different colors of noise are named by analogy with colors of light. Pink noise is between white noise and red noise.

White noise has equal power at all frequencies, just as white light is a combination of all the frequencies of the visible spectrum. The components of red noise are weighted toward low frequencies, just as red light is at the low end of the visible spectrum. Pink noise is weighted toward low frequencies too, but not as strongly as red noise. Specifically, the power in red noise drops off like 1/f², where f is frequency. The power in pink noise drops off like 1/f.

Generating pink noise is more complicated than you might think. The book Creating Noise, by Stefan Hollos and J. Richard Hollos, has a good explanation and C source code for generating pink noise and variations such as 1/f α noise for 0 < α < 1. If you want even more background, check out Recursive Digital Filters by the same authors.

If you’d like to hear what pink noise sounds like, here’s a sample that was created using the software in the book with a 6th order filter.


The 3n+1 problem and Benford’s law

This is the third, and last, of a series of posts on Benford’s law, this time looking at a famous open problem in computer science, the 3n + 1 problem, also known as the Collatz conjecture.

Start with a positive integer n. Compute 3n + 1 and divide by 2 repeatedly until you get an odd number. Then repeat the process. For example, suppose we start with 13. We get 3*13+1 = 40, and 40/8 = 5, so 5 is the next term in the sequence. 5*3 + 1 is 16, which is a power of 2, so we get down to 1.

Does this process always reach 1? So far nobody has found a proof or a counterexample.

If you pick a large starting number n at random, it appears that not only will the sequence terminate, the values produced by the sequence approximately follow Benford’s law (source). If you’re unfamiliar with Benford’s law, please see the first post in this series.

Here’s some Python code to play with this.

from math import log10, floor

def leading_digit(x):
    y = log10(x) % 1
    return int(floor(10**y))

# 3n+1 iteration: collect the odd numbers visited on the way down to 1
def iterates(seed):
    s = set()
    n = seed
    while n > 1:
        n = 3*n + 1
        while n % 2 == 0:
            n = n // 2  # integer division keeps the arithmetic exact for big integers
        s.add(n)
    return s

Let’s save the iterates starting with a large starting value:

it = iterates(378357768968665902923668054558637)

Here’s what we get and what we would expect from Benford’s law:

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
|             1 |       46 |        53 |
|             2 |       26 |        31 |
|             3 |       29 |        22 |
|             4 |       16 |        17 |
|             5 |       24 |        14 |
|             6 |        8 |        12 |
|             7 |       12 |        10 |
|             8 |        9 |         9 |
|             9 |        7 |         8 |
|---------------+----------+-----------|

We get a chi-square statistic of 12.88 (p = 0.116), so the fit is reasonable.

Here’s another run with a different starting point.

it = iterates(243963882982396137355964322146256)

which produces

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
|             1 |       44 |        41 |
|             2 |       22 |        24 |
|             3 |       15 |        17 |
|             4 |       12 |        13 |
|             5 |       11 |        11 |
|             6 |        9 |         9 |
|             7 |       11 |         8 |
|             8 |        6 |         7 |
|             9 |        7 |         6 |
|---------------+----------+-----------|

This has a chi-square value of 2.166 (p = 0.975) which is an even better fit.
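
For reference, here’s how p-values like those quoted above can be computed. With 9 leading-digit categories there are 8 degrees of freedom.

from scipy.stats import chi2

# Chi-square statistics from the two runs above, 8 degrees of freedom
for stat in [12.88, 2.166]:
    print(chi2.sf(stat, 8))  # should be close to 0.116 and 0.975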

Weibull distribution and Benford’s law

Introduction to Benford’s law

In 1881, Simon Newcomb noticed that the edges of the first pages in a book of logarithms were dirty while the edges of the later pages were clean. From this he concluded that people were far more likely to look up the logarithms of numbers with leading digit 1 than of those with leading digit 9. Frank Benford studied the same phenomenon later, and now the phenomenon is known as Benford’s law, or sometimes the Newcomb-Benford law.

A data set follows Benford’s law if the proportion of elements with leading digit d is approximately

log10((d + 1)/d).

You could replace “10” with b if you look at the leading digits in base b.

Sets of physical constants often satisfy Benford’s law, as I showed here for the constants defined in SciPy.

Incidentally, factorials satisfy Benford’s law exactly in the limit.

Weibull distributions

The Weibull distribution is a generalization of the exponential distribution. It’s a convenient distribution for survival analysis because it can have decreasing, constant, or increasing hazard, depending on whether the value of a shape parameter γ is less than, equal to, or greater than 1 respectively. The special case of constant hazard, shape γ = 1, corresponds to the exponential distribution.
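
For reference, in the usual parameterization with scale parameter β the Weibull hazard function is

h(t) = \frac{\gamma}{\beta} \left( \frac{t}{\beta} \right)^{\gamma - 1}

which is decreasing in t when γ < 1, constant when γ = 1, and increasing when γ > 1.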

Weibull and Benford

If the shape parameter of a Weibull distribution is “not too large” then samples from that distribution approximately follow Benford’s law (source). We’ll explore this statement with a little Python code.

SciPy doesn’t contain a Weibull distribution per se, but it does have support for a generalization of the Weibull known as the exponential Weibull. The latter has two shape parameters. We set the first of these to 1 to get the ordinary Weibull distribution.

      from math import log10, floor
      from scipy.stats import exponweib

      def leading_digit(x):
          y = log10(x) % 1
          return int(floor(10**y))

      def weibull_stats(gamma):
          # Weibull with shape gamma: exponentiated Weibull with first shape parameter 1
          distribution = exponweib(1, gamma)
          N = 10000
          samples = distribution.rvs(N)
          counts = [0]*10
          for s in samples:
              counts[ leading_digit(s) ] += 1
          return counts

      print( weibull_stats(1) )

Here’s how the leading digit distribution of a simulation of 10,000 samples from an exponential (Weibull with γ = 1) compares to the distribution predicted by Benford’s law.

      |---------------+----------+-----------|
      | Leading digit | Observed | Predicted |
      |---------------+----------+-----------|
      |             1 |     3286 |      3010 |
      |             2 |     1792 |      1761 |
      |             3 |     1158 |      1249 |
      |             4 |      851 |       969 |
      |             5 |      754 |       792 |
      |             6 |      624 |       669 |
      |             7 |      534 |       580 |
      |             8 |      508 |       511 |
      |             9 |      493 |       458 |
      |---------------+----------+-----------|

Looks like a fairly good fit. How could we quantify the fit so we can compare how the fit varies with the shape parameter? The most common approach is to use the chi-square goodness of fit test.

      def chisq_stat(O, E):
          return sum( [(o - e)**2/e for (o, e) in zip(O, E)] )

Here “O” stands for “observed” and “E” stands for “expected.” The observed counts are the counts we actually saw. The expected values are the values Benford’s law would predict:

      expected = [N*log10((i+1)/i) for i in range(1, 10)] 

Note that we don’t want to pass counts to chisq_stat but counts[1:] instead. This is because counts is indexed starting at 0, but the leading digit of a positive number is never 0.
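
Putting the pieces together, here’s a sketch that prints the chi-square statistic for several shape parameters. It assumes the version of weibull_stats above that returns its counts list; since the samples are random, the exact numbers will vary from run to run.

      from math import log10

      N = 10000
      expected = [N*log10((i+1)/i) for i in range(1, 10)]

      for gamma in [0.1, 0.5, 1.0, 1.5, 2.0]:
          counts = weibull_stats(gamma)
          print(gamma, chisq_stat(counts[1:], expected))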

Here are the chi square goodness of fit statistics for a few values of γ. (Smaller is better.)

      |-------+------------|
      | Shape | Chi-square |
      |-------+------------|
      |   0.1 |      1.415 |
      |   0.5 |      9.078 |
      |   1.0 |     69.776 |
      |   1.5 |    769.216 |
      |   2.0 |   1873.242 |
      |-------+------------|

This suggests that samples from a Weibull follow Benford’s law fairly well for shape γ < 1, i.e. for the case of decreasing hazard.

Golden angle

The golden angle is related to the golden ratio, but it is not as well known. And the relationship is not quite what you might think at first.

The golden ratio φ is (1 + √5)/2. A golden rectangle is one in which the ratio of the longer side to the shorter side is φ. Credit cards, for example, are typically golden rectangles.

You might guess that a golden angle is 1/φ of a circle, but it’s actually 1/φ² of a circle. Let a be the length of an arc cut out of a circle by a golden angle and b be the length of its complement. Then by definition the ratio of b to a is φ. In other words, the golden angle is defined in terms of the ratio of its complementary arc, not of the entire circle. [1]

The video below has many references to the golden angle. It says that the golden angle is 137.5 degrees, which is fine given the context of a popular video. But this doesn’t explain where the angle comes from or give its exact value of 360/φ² degrees.

[1] Why does this work out to 1/φ²? The ratio b/a equals φ, by definition. So the ratio of a to the whole circle is

a/(a + b) = a/(a + φa) = 1/(1 + φ) = 1/φ²

since φ satisfies the quadratic equation 1 + φ = φ².
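
Here’s a quick computation of the angle’s numerical value:

from math import sqrt

phi = (1 + sqrt(5))/2
print(360/phi**2)  # approximately 137.5 degrees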