When I was preparing for a statistics class I’m teaching now, I wrote up some notes on the error in the central limit theorem (CLT) for a few common distributions. Under mild assumptions, the CLT says that if you take any distribution and average enough samples from it, the average is approximately normally (Gaussian) distributed. The more samples you take, the closer the distribution of the average is to a normal distribution. That means you can use a normal distribution to approximate the distribution of an average of samples from other distributions. That raises a couple questions.

- What can you say about the approximation error in general?
- What can you say about the approximation error in important special cases?

In other words, if I take some large but finite number of samples, can I get a numerical bound on the difference between the distribution of my average and the normal distribution? And if I’m not averaging just any old distributions but well-known distributions (binomial, Poisson, gamma, etc.) can I do better than in general?

The Berry-Esséen theorem answers the first question. If the distributions you’re averaging have a finite third absolute central moment ρ, then the maximum error when averaging *n* samples, meaning the largest difference between the CDF of the standardized average and the standard normal CDF, is bounded by *C* ρ / (σ^{3} *n*^{1/2}), where *C* is a constant less than 0.8 and σ is the standard deviation of the distributions.
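To make the bound concrete, here is a minimal sketch that evaluates it for the sum of *n* rolls of a fair six-sided die and compares it with the exact maximum CDF error. The die is only an illustrative choice, the function names are my own, and *C* = 0.8 is the conservative value quoted above rather than the best known constant.

```python
import math

def berry_esseen_bound(rho, sigma, n, C=0.8):
    """Upper bound C*rho / (sigma^3 * sqrt(n)) on the maximum CDF error."""
    return C * rho / (sigma**3 * math.sqrt(n))

def die_moments(faces=6):
    """Mean, standard deviation, and third absolute central moment of a fair die."""
    values = range(1, faces + 1)
    mu = sum(values) / faces
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / faces)
    rho = sum(abs(v - mu) ** 3 for v in values) / faces
    return mu, sigma, rho

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_error(n, faces=6):
    """Exact maximum distance between the CDF of the standardized sum
    of n rolls and the standard normal CDF."""
    mu, sigma, _ = die_moments(faces)
    # Build the pmf of the sum by repeated convolution.
    # After n rolls, pmf[k] is the probability that the sum equals n + k.
    pmf = [1.0]
    for _ in range(n):
        new = [0.0] * (len(pmf) + faces - 1)
        for i, p in enumerate(pmf):
            for j in range(faces):
                new[i + j] += p / faces
        pmf = new
    worst, cdf = 0.0, 0.0
    for k, p in enumerate(pmf):
        z = (n + k - n * mu) / (sigma * math.sqrt(n))
        phi = normal_cdf(z)
        # The CDF of the sum is a step function, so check both sides of each jump.
        worst = max(worst, abs(cdf - phi), abs(cdf + p - phi))
        cdf += p
    return worst

if __name__ == "__main__":
    mu, sigma, rho = die_moments()
    for n in (5, 10, 30):
        bound = berry_esseen_bound(rho, sigma, n)
        print(f"n={n:3d}  bound={bound:.4f}  actual={max_cdf_error(n):.4f}")
```

The actual error comes out well below the bound, which is what you would expect from a theorem that has to cover every distribution with a finite third moment.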

There is a variation on the Berry-Esséen theorem that gives the error for a particular *x* rather than the maximum error. The error for a particular *x* is bounded by *D* ρ / ((1 + |*x*|^{3}) σ^{3} *n*^{1/2}). The constant *D* is known to be less than 31. This gives an improvement over the maximum error estimates when *x* is large. However, this may not be so useful. The *absolute* error in the CLT approximation is small for large *x*, but only because we’re approximating one small probability by another. The relative error in the approximation may be enormous for large *x*.
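Here is a rough sketch comparing the two bounds and showing why the pointwise bound says little about relative error. It assumes *C* = 0.8 and *D* = 31 as quoted above, uses the ratio ρ/σ^{3} for a fair six-sided die purely as an example, and the function names are hypothetical.

```python
import math

def uniform_bound(ratio, n, C=0.8):
    """Maximum-error bound C * (rho/sigma^3) / sqrt(n)."""
    return C * ratio / math.sqrt(n)

def pointwise_bound(x, ratio, n, D=31.0):
    """Error bound at a particular x: D * (rho/sigma^3) / ((1 + |x|^3) * sqrt(n))."""
    return D * ratio / ((1.0 + abs(x) ** 3) * math.sqrt(n))

def normal_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# rho / sigma^3 for a fair six-sided die (illustrative choice):
# rho = 6.375 and sigma^2 = 35/12.
ratio = 6.375 / (35.0 / 12.0) ** 1.5
n = 100

if __name__ == "__main__":
    print(f"uniform bound: {uniform_bound(ratio, n):.4f}")
    for x in (1.0, 2.0, 3.0, 4.0, 6.0):
        pb = pointwise_bound(x, ratio, n)
        tail = normal_tail(x)
        # pb / tail compares the error bound with the probability being approximated.
        print(f"x={x:.0f}  pointwise bound={pb:.4g}  "
              f"normal tail={tail:.4g}  bound/tail={pb / tail:.3g}")
```

With these constants the pointwise bound only beats the uniform bound once 31 / (1 + |*x*|^{3}) drops below 0.8, roughly |*x*| > 3.35. Out there the bound is still orders of magnitude larger than the tail probability being approximated, so it cannot rule out a large relative error.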

I was primarily interested in the second question above, sharper error estimates for well-known distributions. I was surprised that I couldn’t find much written on the subject. There are some results along these lines, but apparently not many. According to one recent and rather large book on this subject, “no systematic studies along this direction seem to have been done.”

Here are a few pages I wrote about the errors in normal approximations, with more emphasis on numerical examples than on theoretical error bounds. Here are the notes by distribution family.

Also, here are notes on applying the Berry-Esséen theorem to the normal approximation to the Poisson.

The links to the distribution families don’t work. These seem to be the correct ones:

beta

binomial

gamma

Poisson

t

I am looking for a refinement of the Berry-Esseen constant when the initial distribution has some “good” properties. In my specific case, I’d like to use the Berry-Esseen theorem with a histogram distribution. Using the standard approximation (C=0.75) yields a relative error of 0.05, while a simulation shows a relative error of less than 0.001. Are you aware of such a result? Thanks in advance.

Patrick: I’m not aware of a result like the one you’re looking for. I suspect you may not find one. The Berry-Esséen theorem is very general, and so it’s often pessimistic in specific applications, and apparently not too much has been published for more particular cases. You might try the book I linked to above. Maybe it would point you to a useful reference. Sorry I couldn’t be more help.

Hi John, You have a very useful site! Thanks for your effort.

I was wondering if you have a simple worked-out example of the Berry-Esseen method? I’m not a mathematician, but I need a method for estimating the convergence of distributions of random variables with the CLT.

Any example is good, as long as it allows me to easily understand what’s happening. For example, rolling a die n times with the CLT, and then finding the convergence rate.

Thanks!

Hi John,

Thanks for such an interesting blog.

I have a question about the characteristic functions of the distributions. Since we know that the convergence of the averages to the normal distribution also implies the convergence of their characteristic functions to the characteristic function of the normal distribution, is there a known bound for the error of characteristic functions? I guess that bound should also include the variable of the characteristic function.

Thanks.