Quantifying the error in the central limit theorem

by John on September 30, 2008

When I was preparing for a statistics class I’m teaching now, I wrote up some notes on the error in the central limit theorem (CLT) for a few common distributions. Under mild assumptions, the CLT says that if you average enough independent samples from any distribution, the average is nearly normally (Gaussian) distributed. The more samples you take, the closer the distribution of the average is to normal. That means you can use a normal distribution to approximate the distribution of an average of samples from other distributions. That raises a couple of questions.

  1. What can you say about the approximation error in general?
  2. What can you say about the approximation error in important special cases?

In other words, if I take some large but finite number of samples, can I get a numerical bound on the difference between the distribution of my average and the normal distribution? And if I’m averaging samples not from just any old distribution but from a well-known one (binomial, Poisson, gamma, etc.), can I do better than the general bound?

The Berry-Esséen theorem answers the first question. If the distribution you’re averaging has a finite third absolute central moment ρ = E|X − μ|³, where μ is the mean, then the maximum error in approximating the CDF of the standardized average of n samples by the normal CDF is bounded by C ρ / (σ³ √n), where C is a constant less than 0.8 and σ is the standard deviation of the distribution.
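To make the bound concrete, here is a minimal sketch that evaluates C ρ / (σ³ √n) for an average of n Bernoulli(p) samples and compares it with the actual maximum CDF error of the normal approximation to the binomial. The function names are my own for illustration, the Bernoulli case is just a convenient example, and C = 0.8 is used as a safe stand-in for "a constant less than 0.8".

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def berry_esseen_bound(rho, sigma, n, C=0.8):
    """Uniform bound C * rho / (sigma^3 * sqrt(n)) on the CDF error."""
    return C * rho / (sigma**3 * math.sqrt(n))

def bernoulli_moments(p):
    """Return (sigma, rho) for a Bernoulli(p) variable.

    rho = E|X - p|^3 = p*q^3 + q*p^3 = p*q*(p^2 + q^2), with q = 1 - p.
    """
    q = 1.0 - p
    return math.sqrt(p * q), p * q * (p * p + q * q)

def max_cdf_error_binomial(n, p):
    """Largest gap between the standardized binomial CDF and the normal CDF.

    The binomial CDF is a step function, so the supremum is attained at a
    jump point; check both the value at each k and the left limit there.
    Direct use of math.comb is fine for n up to a few hundred.
    """
    sigma, _ = bernoulli_moments(p)
    cdf_prev = 0.0
    worst = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        cdf = cdf_prev + pmf
        z = (k - n * p) / (sigma * math.sqrt(n))
        phi = normal_cdf(z)
        worst = max(worst, abs(cdf - phi), abs(cdf_prev - phi))
        cdf_prev = cdf
    return worst

if __name__ == "__main__":
    n = 100
    for p in (0.5, 0.1):
        sigma, rho = bernoulli_moments(p)
        print(p, berry_esseen_bound(rho, sigma, n), max_cdf_error_binomial(n, p))
```

For p = 0.5 and n = 100 the bound works out to 0.08, while the actual maximum error is roughly 0.04, about half the size of the largest jump in the binomial CDF. The general bound holds but is not tight, which is part of the motivation for looking at specific distribution families.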

There is a variation on the Berry-Esséen theorem that gives the error for a particular x rather than the maximum error. The error for a particular x is bounded by D ρ / ((1 + |x|³) σ³ √n). The constant D is known to be less than 31. This gives an improvement over the maximum error estimates when x is large. However, this may not be so useful. The absolute error in the CLT approximation is small for large x, but only because we’re approximating one small probability by another. The relative error in the approximation may be enormous for large x.
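The sketch below illustrates both points under stated assumptions. It takes C = 0.8 and D = 31 as the constants (the upper limits quoted above, not the best known values) and finds where the pointwise bound starts to beat the uniform one. It then compares absolute and relative error in the upper tail for an average of exponential samples, an example I picked because the exact tail has a closed form: the sum of n Exponential(1) samples is Gamma(n, 1), whose survival function is a finite Poisson sum. All function names are hypothetical.

```python
import math

C = 0.8   # uniform constant, using the upper limit quoted in the text
D = 31.0  # pointwise constant, using the upper limit quoted in the text

def uniform_bound(rho, sigma, n):
    """C * rho / (sigma^3 * sqrt(n))."""
    return C * rho / (sigma**3 * math.sqrt(n))

def pointwise_bound(x, rho, sigma, n):
    """D * rho / ((1 + |x|^3) * sigma^3 * sqrt(n))."""
    return D * rho / ((1.0 + abs(x) ** 3) * sigma**3 * math.sqrt(n))

def crossover_x():
    """|x| beyond which the pointwise bound is smaller than the uniform one.

    D / (1 + |x|^3) < C  is equivalent to  |x|^3 > D / C - 1.
    """
    return (D / C - 1.0) ** (1.0 / 3.0)

def normal_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def exact_tail_exponential(n, x):
    """P(standardized sum of n Exponential(1) samples > x), integer n >= 1.

    The sum is Gamma(n, 1), so P(sum > t) = sum_{k<n} exp(-t) t^k / k!.
    Exponential(1) has mean 1 and standard deviation 1.
    """
    t = n + x * math.sqrt(n)
    term = math.exp(-t)
    total = term
    for k in range(1, n):
        term *= t / k
        total += term
    return total

if __name__ == "__main__":
    print("pointwise bound wins for |x| >", crossover_x())
    n = 30
    for x in (1.0, 2.0, 4.0):
        exact = exact_tail_exponential(n, x)
        approx = normal_tail(x)
        abs_err = abs(exact - approx)
        print(x, abs_err, abs_err / exact)
```

With these constants the pointwise bound only improves on the uniform bound once |x| exceeds about 3.35. At x = 4 and n = 30 the absolute error of the normal tail is well under 0.001, yet the normal approximation misses the exact tail probability by more than half its value, which is the relative-error problem described above.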

I was primarily interested in the second question above, sharper error estimates for well-known distributions. I was surprised that I couldn’t find much written on the subject. There are some results along these lines, but apparently not many. According to one recent and rather large book on this subject, “no systematic studies along this direction seem to have been done.”

Here are a few pages I wrote about the errors in normal approximations, with more emphasis on numerical examples than on theoretical error bounds. Here are the notes by distribution family.

beta
binomial
gamma
Poisson
Student-t

Also, here are notes on applying the Berry-Esséen theorem to the normal approximation to the Poisson.

{ 1 comment… read it below or add one }

1

Peter 10.01.08 at 02:49

The links to the distribution families don’t work. These seem to be the correct ones:
beta
binomial
gamma
Poisson
t
