Let *p*(*x*) = *a*_{0} + *a*_{1}*x* + *a*_{2}*x*^{2} + … + *a*_{n}*x*^{n} and suppose at least one of the coefficients *a*_{i} is irrational for some *i* ≥ 1. Then a theorem by Weyl says that the fractional parts of *p*(*n*) are equidistributed as *n* varies over the integers. That is, the proportion of values that land in some interval is equal to the length of that interval.

Clearly it’s *necessary* that one of the coefficients be irrational. What may be surprising is that it is *sufficient*.

If the coefficients are all rational with common denominator *N*, then the sequence would only contain multiples of 1/*N*. The interval [1/3*N*, 2/3*N*], for example, would never get a sample. If *a*_{0} were irrational but the rest of the coefficients were rational, we’d have the same situation, simply shifted by *a*_{0}.

This is a theorem about what happens in the limit, but we can look at what happens for some large but finite set of terms. And we can use a χ^{2} test to see how evenly our sequence is compared to what one would expect from a random sequence.

In the Python code below, we use the polynomial *p*(*x*) = √2 *x*² + π*x* + 1 and evaluate *p* at 0, 1, 2, …, 99,999. We then count how many fall in the bins [0, 0.01), [0.01, 0.02), … [0.99, 1] and compute a chi-square statistic on the counts.

from math import pi, sqrt, floor def p(x): return 1 + pi*x + sqrt(2)*x*x def chisq_stat(O, E): return sum( (o - e)**2/e for (o, e) in zip(O, E) ) def frac(x): return x - floor(x) N = 100000 data = [frac(p(n)) for n in range(N)] count = [0]*100 for d in data: count[ int(floor(100*d)) ] += 1 expected = [N/100]*100 print(chisq_stat(count, expected))

We get a chi-square statistic of 95.4. Since we have 100 bins, there are 99 degrees of freedom, and we should compare our statistic to a chi-square distribution with 99 degrees of freedom. Such a distribution has mean 99 and standard deviation √(99*2) = 14.07, so a value of 95.4 is completely unremarkable.

If we had gotten a very large chi-square statistic, say 200, we’d have reason to suspect something was wrong. Maybe a misunderstanding on our part or a bug in our software. Or maybe the sequence was not as uniformly distributed as we expected.

If we had gotten a very small chi-square statistic, say 10, then again maybe we misunderstood something, or maybe our sequence is remarkably evenly distributed, more evenly than one would expect from a random sequence.

Thanks for the nice post, John!

Any significance of the chi-square test showing up middle of nowhere?

Why can’t we simply compare the 99 fractions directly as per law of large numbers?

The chi square test gives us a way to assess the deviations from perfect uniformity: Are they larger or smaller than we’d expect from uniform random samples?

Thanks for the post. The use of n for both a variable and the polynomial’s degree is a bit confusing though.