Central limit theorem and Runge phenomena

I was playing around this afternoon and stumbled on something like the Gibbs phenomenon or the Runge phenomenon for the Central Limit Theorem.

The first place most people encounter the Gibbs phenomenon is in the Fourier series of a step function. The Fourier series develops “bat ears” near the discontinuity. Here’s an example I blogged about before, not with a Fourier series but with an analogous Chebyshev series.

Gibbs phenomena for Chebyshev interpolation

The series converges rapidly in the middle of the flat parts, but under-shoots and over-shoots near the jumps in the step function.

The Runge phenomenon is similar: interpolating functions under- and over-shoot the function they’re approximating.

Runge example

Both plots above come from this post.

Here’s the example I ran across with the central limit theorem. The distribution of the average of a set of independent exponential random variables converges to the distribution of a normal random variable. The nice thing about the exponential distribution is that the averages have a familiar distribution: a gamma distribution. If each exponential has mean 1, the average of N of them has a gamma distribution with shape N and scale 1/N. The central limit theorem says this is converging in distribution to a normal distribution with the same mean and variance.

The plot below shows the difference between the density function of the average of N exponential random variables and the density function for its normal approximation, for N = 10 and for N = 400.

Notice that the orange line, corresponding to N = 400, is very flat most of the time. That is, the normal approximation fits very well. But there’s a spike in the middle, something reminiscent of the Gibbs and Runge phenomena. Going from 10 to 400 samples, the average error decreases quite a bit, but the maximum error doesn’t go down much at all.
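Here is a rough sketch of the kind of code that could produce such a comparison, using scipy’s gamma and norm distributions. The grid and the use of mean and maximum absolute error as summaries are my own choices, not necessarily what produced the plot.

    import numpy as np
    from scipy.stats import gamma, norm

    # Density of the average of N unit exponentials (gamma with shape N and
    # scale 1/N) minus the density of its normal approximation, which has
    # mean 1 and standard deviation 1/sqrt(N).
    x = np.linspace(0, 2, 10001)
    for N in (10, 400):
        err = gamma(N, scale=1/N).pdf(x) - norm(1, 1/np.sqrt(N)).pdf(x)
        print(N, np.mean(np.abs(err)), np.max(np.abs(err)))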

If you go back and look at the Central Limit Theorem, or at its more quantitative counterpart the Berry-Esseen theorem, you’ll notice that it applies to the distribution function, not the density: in other words, to the CDF, not the PDF. I think the density functions do converge in this case because the exponential distribution has a smooth density, but the rate of convergence depends on the norm. It looks like the convergence is fast in the square (L²) norm, but slow in the sup norm. A little experiment shows that this is indeed the case.
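That experiment might look something like this; the grid width and the particular values of N below are arbitrary choices on my part.

    import numpy as np
    from scipy.stats import gamma, norm

    # L2 and sup norms of the density error on a grid a few standard
    # deviations wide around the mean.
    for N in (10, 100, 1000, 10000):
        s = 1/np.sqrt(N)
        x = np.linspace(1 - 8*s, 1 + 8*s, 20001)
        dx = x[1] - x[0]
        err = gamma(N, scale=1/N).pdf(x) - norm(1, s).pdf(x)
        print(N, np.sqrt(np.sum(err**2)*dx), np.max(np.abs(err)))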

Maybe the max norm doesn’t converge at all, i.e. the densities don’t converge uniformly. It looks like the max norm may be headed toward a horizontal asymptote, just like the Gibbs phenomenon.

Update: It seems we do not have uniform convergence. If we let N = 1,000,000, the sup norm of the error is 0.1836. It appears the sup norm of the error is approaching a lower bound of approximately this value.


Computing extreme normal tail probabilities

Let me say up front that relying on the normal distribution as an accurate model of extreme events is foolish under most circumstances. The main reason to calculate the probability of, say, a 40 sigma event is to show how absurd it is to talk about 40 sigma events. See my previous post on six-sigma events for an explanation.

For this post I’ll be looking at two-tailed events, the probability that a normal random variable is either less than −kσ or greater than kσ. If you’re only interested in one of these two probabilities, divide by 2. Also, since the results are independent of σ, let’s assume σ = 1 for convenience.

The following Python code will print a table of k-sigma event probabilities.

    from scipy.stats import norm

    # Two-tailed probability of a k-sigma event: P(|X| > kσ) = 2Φ(−k)
    for k in range(1, 40):
        print(k, 2*norm.cdf(-k))

This shows, for example, that a “25 sigma event,” something I’ve heard people talk about with a straight face, has a probability of 6 × 10⁻¹³⁸.

The code above reports that a 38 or 39 sigma event has probability 0, exactly 0. That’s because the actual value, while not zero, is so close to zero that floating point precision can’t tell the difference: double precision numbers underflow around 10⁻³⁰⁸.

What if, despite all the signs warning hic sunt dracones, you want to compute even smaller probabilities? Then for one thing you’ll need to switch over to log scale in order for the results to be representable as floating point numbers.
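One way to work on the log scale is scipy’s log-scale survival function, norm.logsf. The snippet below is just a sketch; how accurate logsf remains this far out in the tail is worth checking against the bounds discussed next.

    import numpy as np
    from scipy.stats import norm

    print(norm.cdf(-38))   # underflows to 0, as noted above

    # log base 10 of the two-tailed probability P(|Z| > k)
    k = 38
    print((np.log(2) + norm.logsf(k)) / np.log(10))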

Exactly computing these extreme probabilities is challenging, but there are convenient upper and lower bounds on the probabilities that can be derived from Abramowitz and Stegun, equation 7.1.13 with a change of variables. See notes here.
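Concretely, equation 7.1.13 says that exp(x²) times the integral of exp(−t²) from x to ∞ lies between 1/(x + √(x² + 2)) and 1/(x + √(x² + 4/π)). Substituting x = k/√2 turns this into bounds on the two-tailed probability: P(|Z| > k) lies between 2√(2/π) exp(−k²/2)/(k + √(k² + 4)) and 2√(2/π) exp(−k²/2)/(k + √(k² + 8/π)). That is where the constants 4 and 8/π in the code below come from.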

We can use these bounds to compute upper and lower bounds on the base 10 logs of the tail probabilities.

    from numpy import log, sqrt, pi

    # log10 of the bound 2*sqrt(2/pi)*exp(-t**2/2)/(t + sqrt(t**2 + c))
    # on the two-tailed probability P(|Z| > t)
    def core(t, c):
        x = 2*sqrt(2/pi)/(t + sqrt(t**2 + c))
        ln_p = -0.5*t**2 + log(x)
        return ln_p/log(10)

    # c = 8/pi gives an upper bound on the probability
    def log10_upper(t):
        return core(t, 8/pi)

    # c = 4 gives a lower bound
    def log10_lower(t):
        return core(t, 4)

This tells us that the log base 10 of the probability of a normal random variable being more than 38 standard deviations away from its mean is between −315.23876 and −315.23865. The upper and lower bounds agree to about seven significant figures, and the accuracy only improves as k gets larger. So for large arguments, we can use either the upper or lower bound as an accurate approximation.

The code above was used to compute this table of tail probabilities for k = 1 to 100 standard deviations.
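Generating such a table is presumably just a loop over the two bounding functions defined above, something like this:

    # Tabulate log10 bounds on the two-tailed probability for k = 1 to 100,
    # using log10_lower and log10_upper defined above.
    for k in range(1, 101):
        print(k, log10_lower(k), log10_upper(k))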