When I was preparing for a statistics class I’m teaching now, I wrote up some notes on the error in the central limit theorem (CLT) for a few common distributions. Under mild assumptions, the CLT says that if you take any distribution and average enough samples from it, the distribution of the average is nearly normal (Gaussian). The more samples you take, the closer the average is to being normal. That means you can use a normal distribution to approximate the distribution of an average of samples from other distributions. That raises a couple of questions.
- What can you say about the approximation error in general?
- What can you say about the approximation error in important special cases?
In other words, if I take some large but finite number of samples, can I get a numerical bound on the difference between the distribution of my average and the normal distribution? And if I’m not averaging just any old distributions but well-known distributions (binomial, Poisson, gamma, etc.), can I do better than in general?
The Berry-Esseen theorem answers the first question. If the distributions you’re averaging have a finite third absolute central moment ρ, then the maximum error when averaging n samples, meaning the largest difference between the CDF of the standardized average and the standard normal CDF, is bounded by C ρ / (σ³ √n), where C is a constant less than 0.8 and σ is the standard deviation of the distributions.
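To make the bound concrete, here is a minimal sketch that compares it with the actual maximum error for an average of Bernoulli samples, where the exact distribution is binomial and can be computed directly. The function names are my own, and C = 0.8 is used only because the true constant is known to be smaller than that; this is an illustration under those assumptions, not a sharp calculation.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def berry_esseen_bound(rho, sigma, n, C=0.8):
    """Uniform bound C*rho / (sigma^3 * sqrt(n)); C=0.8 is a safe upper value."""
    return C * rho / (sigma**3 * math.sqrt(n))

def bernoulli_max_error(p, n):
    """Exact sup over x of |F_n(x) - Phi(x)| for a sum of n Bernoulli(p) samples.

    F_n is a step function, so the supremum is attained at a jump point,
    either just before the jump (left limit) or at it.
    """
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        phi = normal_cdf((k - mu) / sd)
        worst = max(worst, abs(cdf - phi))   # left limit at the jump
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        worst = max(worst, abs(cdf - phi))   # value at the jump
    return worst

p, n = 0.5, 100
sigma = math.sqrt(p * (1 - p))
# Third absolute central moment of a Bernoulli(p) variable
rho = p * (1 - p)**3 + (1 - p) * p**3

bound = berry_esseen_bound(rho, sigma, n)
actual = bernoulli_max_error(p, n)
print(f"Berry-Esseen bound: {bound:.4f}, actual max error: {actual:.4f}")
```

For a fair coin the ratio ρ/σ³ is exactly 1, so the bound reduces to C/√n. The actual error comes out noticeably smaller than the bound, which is the gap the second question is about.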
There is a variation on the Berry-Esseen theorem that gives the error for a particular x rather than the maximum error. The error for a particular x is bounded by D ρ / ((1 + |x|³) σ³ √n). The constant D is known to be less than 31. This gives an improvement over the maximum error estimates when x is large. However, this may not be so useful. The absolute error in the CLT approximation is small for large x, but only because we’re approximating one small probability by another. The relative error in the approximation may be enormous for large x.
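Here is a small sketch of that last point, using an average of exponential samples because the exact tail of their sum has a closed form. The function names, the choice of Exp(1) with n = 10 and x = 5, and the use of D = 31 are all my own assumptions for illustration.

```python
import math

def normal_sf(z):
    """Standard normal upper tail, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def erlang_sf(t, n):
    """P(sum of n iid Exp(1) samples > t), which equals P(Poisson(t) <= n - 1)."""
    return math.exp(-t) * sum(t**k / math.factorial(k) for k in range(n))

def nonuniform_bound(x, rho, sigma, n, D=31.0):
    """Pointwise bound D*rho / ((1 + |x|^3) * sigma^3 * sqrt(n))."""
    return D * rho / ((1 + abs(x)**3) * sigma**3 * math.sqrt(n))

def tail_errors(x, n):
    """Exact and normal-approximate tail probabilities x standard deviations out."""
    t = n + x * math.sqrt(n)        # the sum has mean n and standard deviation sqrt(n)
    exact = erlang_sf(t, n)
    approx = normal_sf(x)
    abs_err = abs(exact - approx)
    return exact, approx, abs_err, abs_err / exact

# Third absolute central moment of Exp(1): E|X - 1|^3 = 12/e - 2
RHO_EXP = 12 / math.e - 2

x, n = 5.0, 10
exact, approx, abs_err, rel_err = tail_errors(x, n)
bound = nonuniform_bound(x, RHO_EXP, 1.0, n)
print(f"exact tail {exact:.3e}, normal tail {approx:.3e}")
print(f"absolute error {abs_err:.3e} (bound {bound:.3f}), relative error {rel_err:.3f}")
```

The absolute error is tiny and sits well inside the bound, yet the normal approximation misses the true tail probability by orders of magnitude, so the relative error is close to 1.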
I was primarily interested in the second question above, sharper error estimates for well-known distributions. I was surprised that I couldn’t find much written on the subject. There are some results along these lines, but apparently not many. According to one recent and rather large book on this subject, “no systematic studies along this direction seem to have been done.”
Here are a few pages I wrote about the errors in normal approximations, with more emphasis on numerical examples than on theoretical error bounds. Here are the notes by distribution family.
Also, here are notes on applying the Berry-Esseen theorem to the normal approximation to the Poisson.