Central Limit Theorems

These notes summarize several extensions of the Central Limit Theorem (CLT) and related results.



Classical Central Limit Theorem

Let Xn be a sequence of independent, identically distributed (i.i.d.) random variables. Assume each X has finite mean, E(X) = μ, and finite variance, Var(X) = σ². Let Zn be the normalized average of the first n random variables, i.e.

Z_n = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma \sqrt{n}}.

The classical Central Limit Theorem says that Zn converges in distribution to a standard normal distribution. This means that the CDF of Zn converges pointwise to Φ, the CDF of a standard normal (Gaussian) random variable. (See notes on modes of convergence.)
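A small simulation can make the convergence concrete. The sketch below is an illustration only, not part of the theorem: it assumes Exponential(1) summands (so μ = σ = 1), and the function names, the grid of x values, and the number of replications are arbitrary choices. It compares the empirical CDF of Zn with Φ at a few points.

```python
import bisect
import math
import random
from statistics import NormalDist


def normalized_sum(n, rng):
    """Draw one Z_n built from n i.i.d. Exponential(1) variables (mu = sigma = 1)."""
    mu, sigma = 1.0, 1.0
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))


def empirical_cdf_gap(n, reps=10000, seed=0):
    """Largest gap between the empirical CDF of Z_n and Phi on a small grid."""
    rng = random.Random(seed)
    draws = sorted(normalized_sum(n, rng) for _ in range(reps))
    phi = NormalDist().cdf
    gap = 0.0
    for x in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
        frac = bisect.bisect_right(draws, x) / reps  # share of draws <= x
        gap = max(gap, abs(frac - phi(x)))
    return gap


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(empirical_cdf_gap(n), 4))
```

The gap should shrink as n grows, although for any fixed number of replications it also contains sampling noise.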

A special case of the CLT in which the Xn are assumed to be binomial goes back to Abraham de Moivre in 1733.


Rate of convergence

It is natural to ask about the rate of convergence in the CLT. If Fn is the CDF of Zn, once we know that Fn(x) converges to Φ(x) as n → ∞, we might want to know how quickly this convergence takes place. Said another way, for a given n, we might want to know how well Φ approximates Fn. This question is settled by the Berry-Esseen theorem, which bounds the error by a constant multiple of 1/√n when the summands have a finite third absolute moment. See Quantifying the error in the central limit theorem. For examples of normal approximations for specific distributions, see the following links: binomial, beta, gamma, Poisson, Student-t.
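As a rough illustration, the Berry-Esseen bound has the form sup |Fn(x) − Φ(x)| ≤ C ρ / (σ³ √n), where ρ = E|X − μ|³ is assumed finite and C is an absolute constant. The sketch below checks this for Bernoulli(p) summands, where Fn can be computed exactly from the binomial distribution. The value of C used here is an assumption (an upper bound reported in the literature for the i.i.d. case), and the function names are made up for this example.

```python
import math
from statistics import NormalDist

C_UPPER = 0.4748  # assumed value: a published upper bound on the constant C


def kolmogorov_distance_binomial(n, p):
    """Exact sup_x |F_n(x) - Phi(x)| for the normalized sum of n Bernoulli(p) variables."""
    phi = NormalDist().cdf
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf_left = 0.0  # P(S_n < k), the left limit of the CDF at the jump point k
    worst = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        z = (k - mean) / sd
        cdf_right = cdf_left + pmf
        # F_n is a step function, so the supremum occurs at a jump,
        # either just before it or at it.
        worst = max(worst, abs(cdf_left - phi(z)), abs(cdf_right - phi(z)))
        cdf_left = cdf_right
    return worst


def berry_esseen_bound(n, p):
    """C * rho / (sigma^3 * sqrt(n)) for Bernoulli(p), with rho = E|X - p|^3."""
    sigma = math.sqrt(p * (1 - p))
    rho = p * (1 - p) * ((1 - p) ** 2 + p**2)
    return C_UPPER * rho / (sigma**3 * math.sqrt(n))


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, kolmogorov_distance_binomial(n, 0.5), berry_esseen_bound(n, 0.5))
```

Both the exact distance and the bound shrink like 1/√n in this example, which is the rate the theorem guarantees in general.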


Directions for generalization

The classical CLT has three requirements:

  1. independence,
  2. identical distribution, and
  3. finite variance.

Each of these conditions can be weakened to create variations on the central limit theorem. We will keep the assumption of independence in these notes. For CLT results for dependent random variables, see Chow and Teicher. Below we consider non-identically distributed random variables and random variables with infinite variance.


Non-identically distributed random variables

In this section we allow the possibility that the Xn variables are not identically distributed. The main results in this area are the Lindeberg-Feller theorem and its corollary, Liapounov’s theorem.

First we introduce notation and assumptions common to both theorems. Let Xn be a sequence of independent random variables, at least one of which has a non-degenerate distribution. Assume each Xn has mean 0 and variance σn². Define the partial sum

S_n = X_1 + X_2 + \cdots + X_n

and its variance

s_n^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2.

Both theorems give conditions under which the normalized partial sums Sn / sn converge in distribution to a standard normal random variable. We start with Liapounov’s theorem because it is simpler.


Liapounov’s theorem

Liapounov’s theorem weakens the requirement of identical distribution but strengthens the requirement of finite variance. Where the classical CLT requires finite moments of order 2, Liapounov’s CLT requires finite moments of order 2 + δ for some δ > 0.

Assume E(|Xn|^(2+δ)) is finite for some δ > 0 and for all n. If

\frac{1}{s_n^{2+\delta}} \sum_{k=1}^n E\left(|X_k|^{2+\delta}\right) \to 0

as n → ∞ then Sn / sn converges in distribution to a standard normal random variable.
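To see the condition at work, the sketch below evaluates the Liapounov ratio, read here as the sum of E(|Xk|^(2+δ)) for k = 1, …, n divided by sn^(2+δ). The example is hypothetical: each Xk is assumed to be a scaled coin flip taking the values ±ck with probability 1/2, so that its variance is ck² and E(|Xk|^(2+δ)) = ck^(2+δ). The function name and the two scale sequences are made up for illustration.

```python
def lyapunov_ratio(scales, delta=1.0):
    """Liapounov ratio for independent X_k = +/- c_k with probability 1/2 each.

    `scales` holds c_1, ..., c_n. Then s_n^2 = sum of c_k^2 and
    E|X_k|^(2+delta) = c_k^(2+delta), so the ratio is exact.
    """
    total_variance = sum(c**2 for c in scales)
    total_moment = sum(c ** (2 + delta) for c in scales)
    # s_n^(2+delta) equals (s_n^2)^(1 + delta/2)
    return total_moment / total_variance ** (1 + delta / 2)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        bounded = [1 + (k % 2) for k in range(1, n + 1)]  # scales alternate 2, 1, 2, 1, ...
        print(n, lyapunov_ratio(bounded))
    geometric = [2.0**k for k in range(1, 51)]  # one term dominates the sum
    print("geometric", lyapunov_ratio(geometric))
```

With the alternating scales the ratio decays like 1/√n, so the hypothesis holds. With the geometric scales the ratio stays away from zero, so the hypothesis is not met and the theorem gives no conclusion.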


Lindeberg-Feller theorem

The Lindeberg-Feller theorem is more general than Liapounov’s theorem. It gives necessary and sufficient conditions for Sn / sn to converge to a standard normal.

Lindeberg: Under the assumptions above (each X has zero mean and finite variance, and at least one X has a non-degenerate distribution), if the Lindeberg condition holds, then Sn / sn converges in distribution to a standard normal random variable.

Feller: Conversely, if Sn / sn converges in distribution to a standard normal and σn/sn → 0 and sn → ∞ then the Lindeberg condition holds.

So what is this Lindeberg condition? Let Fn be the CDF of Xn, i.e. Fn(x) = P(Xn ≤ x). The Lindeberg condition requires

\lim_{n\to\infty} \frac{1}{s_n^2} \sum_{j=1}^n \int_{[|x| > \varepsilon s_j]} x^2 \, dF_j(x) = 0

for all ε > 0.
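The Lindeberg sum can be evaluated exactly for simple discrete examples. The sketch below is a hypothetical illustration that reads the region of integration as |x| > ε sj and assumes each Xj takes the values ±cj with probability 1/2. The j-th integral is then cj² when cj > ε sj and 0 otherwise. The function name and test sequences are made up for this example.

```python
import math


def lindeberg_sum(scales, eps):
    """Lindeberg sum for independent X_j = +/- c_j with probability 1/2 each.

    Returns (1 / s_n^2) * sum over j of c_j^2 * [c_j > eps * s_j],
    where s_j^2 = c_1^2 + ... + c_j^2 and every c_j is positive.
    """
    running_variance = 0.0  # s_j^2, updated as j increases
    truncated = 0.0         # accumulated contribution of the large values
    for c in scales:
        running_variance += c * c
        if c > eps * math.sqrt(running_variance):
            truncated += c * c
    return truncated / running_variance


if __name__ == "__main__":
    bounded = [1 + (k % 2) for k in range(1, 10001)]  # scales alternate 2, 1, 2, 1, ...
    geometric = [2.0**k for k in range(1, 51)]        # last term dominates s_n
    print(lindeberg_sum(bounded, eps=0.1))
    print(lindeberg_sum(geometric, eps=0.5))
```

For the bounded scales only finitely many terms contribute, so the sum divided by sn² tends to 0. For the geometric scales every term contributes and the quantity does not tend to 0, which is consistent with the fact that σn/sn does not tend to 0 in that case.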


Generalized CLT for random variables with infinite variance

For this section, we require the random variables Xn to be independent and identically distributed. However, we do not require that they have finite variance.

First we look at some restrictions for what a generalized CLT would look like for random variables Xn without finite variance. We would need sequences of constants an and bn such that (X1 + X2 + … + Xn − bn)/an converges in distribution to something. It turns out that the something that the sequence converges to must have a stable distribution.
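The standard Cauchy distribution gives a simple example of such constants: with an = n and bn = 0, the normalized sum is again standard Cauchy for every n. The sketch below checks this by simulation, using the fact that the median of |X| is 1 for a standard Cauchy X. The function names, replication count, and seed are arbitrary choices made for this illustration.

```python
import math
import random
import statistics


def standard_cauchy(rng):
    """Sample a standard Cauchy variable by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def median_abs_normalized_sum(n, reps=4000, seed=0):
    """Median of |(X_1 + ... + X_n - b_n) / a_n| with a_n = n and b_n = 0."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        total = sum(standard_cauchy(rng) for _ in range(n))
        values.append(abs(total / n))
    return statistics.median(values)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, median_abs_normalized_sum(n))
```

The medians should all be close to 1 regardless of n. Dividing by √n instead, as in the classical CLT, would make the spread grow with n.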

Let X0, X1, and X2 be independent, identically distributed (iid) random variables. The distribution of these random variables is called stable if for every pair of positive real numbers a and b, there exists a positive c and a real d such that cX0 + d has the same distribution as aX1 + bX2.
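Two familiar cases show how the constants c and d arise. If the Xi are standard normal, the sum of independent normals is normal and the variances add. If the Xi are standard Cauchy, with characteristic function exp(−|t|), the characteristic functions multiply. In both cases d = 0.

```latex
X_i \sim N(0,1): \quad aX_1 + bX_2 \sim N(0,\, a^2 + b^2)
  \quad\Longrightarrow\quad c = \sqrt{a^2 + b^2}, \; d = 0

X_i \text{ standard Cauchy}: \quad
  \varphi_{aX_1 + bX_2}(t) = e^{-a|t|} \, e^{-b|t|} = e^{-(a+b)|t|}
  \quad\Longrightarrow\quad c = a + b, \; d = 0
```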

Stable distributions can be specified by four parameters. One of the four parameters is the exponent parameter 0 < α ≤ 2. This parameter controls the thickness of the distribution tails. The distributions with α = 2 are the normal (Gaussian) distributions. For α < 2, the PDF is asymptotically proportional to |x|^(−α−1) and the tail probabilities are asymptotically proportional to |x|^(−α) as x → ±∞. And so, except for the normal distributions, all stable distributions have thick tails and infinite variance.

The characteristic functions for stable distributions can be written in closed form in terms of the four parameters mentioned above. In general, however, the density functions for stable distributions cannot be written down in closed form. There are three exceptions: the normal distributions, the Cauchy distributions, and the Lévy distributions.

Let F(x) be the CDF for the random variables Xi. The following conditions on F are necessary and sufficient for suitably normalized sums of the X’s to converge to a stable distribution with exponent α < 2.

  1. F(x) = (c1 + o(1)) |x|^(−α) h(|x|) as x → −∞, and
  2. 1 − F(x) = (c2 + o(1)) x^(−α) h(x) as x → ∞

where h(x) is a slowly varying function. Here o(1) denotes a function tending to 0. (See notes on asymptotic notation.) A slowly varying function h(x) is one such that the ratio h(cx) / h(x) → 1 as x → ∞ for all c > 0. Roughly speaking, this means F(x) has to look something like |x|^(−α) in both the left and right tails, and so the X’s must be distributed something like the limiting distribution. For more information, see Petrov’s book below.
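The definition of slow variation is easy to probe numerically. The sketch below is illustrative only, and the function name is made up. It compares h(x) = log x, which is slowly varying, with h(x) = x^0.1, which is not: for the power function the ratio h(cx) / h(x) equals c^0.1 for every x.

```python
import math


def variation_ratio(h, c, x):
    """Return h(c * x) / h(x); for slowly varying h this tends to 1 as x grows."""
    return h(c * x) / h(x)


def small_power(x):
    """h(x) = x**0.1, a slowly growing function that is NOT slowly varying."""
    return x**0.1


if __name__ == "__main__":
    for x in (1e3, 1e6, 1e12, 1e100):
        print(x, variation_ratio(math.log, 2.0, x), variation_ratio(small_power, 2.0, x))
```

The logarithm's ratio creeps toward 1, while the power function's ratio stays fixed at 2^0.1, so even a very small positive power changes the tail exponent rather than acting as a slowly varying correction.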