These notes summarize several extensions of the Central Limit Theorem (CLT) and related results.


## Classical Central Limit Theorem

Let *X*_{n} be a sequence of independent, identically distributed (i.i.d.) random variables. Assume each *X* has finite mean, E(*X*) = μ, and finite variance, Var(*X*) = σ^{2}. Let *Z*_{n} be the normalized average of the first n random variables, i.e.

*Z*_{n} = (*X*_{1} + *X*_{2} + … + *X*_{n} – *n*μ) / (σ√*n*).

The **classical Central Limit Theorem** says that *Z*_{n} converges in distribution to a standard normal distribution. This means that the CDF of *Z*_{n} converges pointwise to Φ, the CDF of a standard normal (Gaussian) random variable. (See notes on modes of convergence.)
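As a quick numerical illustration (a sketch, using exponential(1) variables as an arbitrary choice, with μ = σ = 1), we can sample *Z*_{n} many times and measure the Kolmogorov-Smirnov distance between its empirical CDF and Φ:

```python
# Empirical check of the classical CLT using exponential(1) variables
# (mean mu = 1, sigma = 1); the choice of distribution is arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 1000, 20000
mu, sigma = 1.0, 1.0

# Z_n = (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n)), computed reps times
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

# Kolmogorov-Smirnov distance between the empirical CDF of Z_n and Phi
ks = stats.kstest(z, "norm").statistic
print(ks)  # small for large n
```

The statistic shrinks as *n* grows, consistent with pointwise convergence of the CDFs.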

A special case of the CLT in which the *X*_{n} are assumed to be binomial goes back to Abraham de Moivre in 1733.

### Rate of convergence

It is natural to ask about the rate of convergence in the CLT. If *F*_{n} is the CDF of *Z*_{n}, once we know that *F*_{n}(x) converges to Φ(*x*) as *n* → ∞, we might want to know how quickly this convergence takes place. Said another way, for a given *n*, we might want to know how well Φ approximates *F*_{n}. This question is settled by the **Berry-Esséen theorem**. See Quantifying the error in the central limit theorem. For examples of normal approximations for specific distributions, see the following links: binomial, beta, gamma, Poisson, Student-t.
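For Bernoulli sums the CDF of *S*_{n} is known exactly (binomial), so we can compare the observed error against the Berry-Esséen bound *C*ρ/(σ³√*n*) directly. A sketch follows; the constant 0.4748 is an assumption here (a published value due to Shevtsova), while the theorem itself only guarantees some universal constant *C*:

```python
# Checking the Berry-Esseen bound for Bernoulli(p) sums, where the CDF of
# S_n is exactly binomial.  The constant 0.4748 is an assumed (published)
# value; the theorem only guarantees some universal constant C.
import numpy as np
from scipy import stats

n, p = 200, 0.3
sigma = np.sqrt(p * (1 - p))
rho = p * (1 - p) * (p**2 + (1 - p)**2)   # E|X - p|^3 for Bernoulli(p)

k = np.arange(n + 1)
z = (k - n * p) / (sigma * np.sqrt(n))
Fn = stats.binom.cdf(k, n, p)              # exact CDF of S_n at the jumps
phi = stats.norm.cdf(z)

# F_n is a step function, so check both sides of each jump
err = max(np.abs(Fn - phi).max(),
          np.abs(Fn - stats.binom.pmf(k, n, p) - phi).max())
bound = 0.4748 * rho / (sigma**3 * np.sqrt(n))
print(err, bound)   # the observed error sits below the bound
```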

### Directions for generalization

The classical CLT has three requirements:

- independence,
- identical distribution, and
- finite variance.

Each of these conditions can be weakened to create variations on the central limit theorem. We will keep the assumption of independence in these notes. For CLT results for dependent random variables, see Chow and Teicher. Below we consider non-identically distributed random variables and random variables with infinite variance.

## Non-identically distributed random variables

In this section we allow the possibility that the *X*_{n} variables are not identically distributed. The main results in this area are the Lindeberg-Feller theorem and its corollary, Liapounov’s theorem.

First we introduce notation and assumptions common to both theorems. Let *X*_{n} be a sequence of independent random variables, at least one of which has a non-degenerate distribution. Assume each *X*_{n} has mean 0 and variance σ_{n}^{2}. Define the partial sum

*S*_{n} = *X*_{1} + *X*_{2} + … + *X*_{n}

and its variance

*s*_{n}^{2} = σ_{1}^{2} + σ_{2}^{2} + … + σ_{n}^{2}.

Both theorems concern under what circumstances the normalized partial sums *S*_{n} / *s*_{n} converge in distribution to a standard normal random variable. We start with Liapounov’s theorem because it is simpler.

### Liapounov’s theorem

**Liapounov’s theorem** weakens the requirement of identical distribution but strengthens the requirement of finite variance. Where the classical CLT requires finite moments of order 2, Liapounov’s CLT requires finite moments of order 2 + δ for some δ > 0.

Assume E(|*X*_{n}|^{2+δ}) is bounded for some δ > 0 and for all *n*. If

*s*_{n}^{-(2 + δ)} ∑_{1 ≤ k ≤ n} E(|*X*_{k}|^{2 + δ}) → 0

as *n* → ∞ then *S*_{n} / *s*_{n} converges in distribution to a standard normal random variable.
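We can evaluate the Liapounov ratio numerically for a concrete (hypothetical) sequence, say *X*_{k} uniform on [-*k*, *k*] with δ = 1, where σ_{k}² = k²/3 and E|*X*_{k}|³ = k³/4; the ratio decays like *n*^{-1/2}, so the theorem applies:

```python
# Liapounov ratio for the hypothetical sequence X_k ~ Uniform(-k, k),
# with delta = 1.  Here sigma_k^2 = k^2/3 and E|X_k|^3 = k^3/4, so the
# ratio behaves like n^(-1/2) and the CLT applies.
import numpy as np

def liapounov_ratio(n):
    k = np.arange(1, n + 1, dtype=float)
    s_n = np.sqrt(np.sum(k**2 / 3))        # s_n^2 = sum of sigma_k^2
    third_moments = np.sum(k**3 / 4)       # sum of E|X_k|^(2+delta)
    return third_moments / s_n**3          # s_n^-(2+delta) * sum

for n in (10, 100, 1000, 10000):
    print(n, liapounov_ratio(n))           # decreases toward 0
```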

### Lindeberg-Feller theorem

The Lindeberg-Feller theorem is more general than Liapounov’s theorem. It gives necessary and sufficient conditions for *S*_{n} / *s*_{n} to converge to a standard normal.

**Lindeberg**: Under the assumptions above (each *X*_{n} has zero mean and finite variance, and at least one *X*_{n} has a non-degenerate distribution), if the Lindeberg condition holds, then *S*_{n} / *s*_{n} converges in distribution to a standard normal random variable.

**Feller**: Conversely, if *S*_{n} / *s*_{n} converges in distribution to a standard normal and σ_{n}/s_{n} → 0 and s_{n} → ∞ then the Lindeberg condition holds.

So what is this **Lindeberg condition**? Let *F*_{n} be the CDF of *X*_{n}, i.e. *F*_{n}(*x*) = P(*X*_{n} ≤ *x*). The Lindeberg condition requires

(1/*s*_{n}^{2}) ∑_{1 ≤ k ≤ n} ∫_{|*x*| ≥ ε *s*_{n}} *x*^{2} d*F*_{k}(*x*) → 0 as *n* → ∞

for all ε > 0.

## Generalized CLT for random variables with infinite variance

For this section, we require the random variables *X*_{n} to be independent and identically distributed. However, we do not require that they have finite variance.

First we look at some restrictions for what a generalized CLT would look like for random variables *X*_{n} without finite variance. We would need sequences of constants *a*_{n} and *b*_{n} such that (*X*_{1} + *X*_{2} + … + *X*_{n} – *b*_{n})/*a*_{n} converges in distribution to something. It turns out that the something that the sequence converges to must have a **stable distribution**.
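The standard Cauchy distribution makes this concrete: the average of *n* i.i.d. standard Cauchy variables is again standard Cauchy, so no choice of normalization produces a normal limit. A simulation sketch:

```python
# With standard Cauchy variables (infinite variance), the sum normalized
# by a_n = n is again standard Cauchy -- the stable limit with alpha = 1
# -- not normal, illustrating why the limit must be a stable law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 500, 20000
means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

print(stats.kstest(means, "cauchy").statistic)   # small: still Cauchy
print(stats.kstest(means, "norm").statistic)     # large: not normal
```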

Let *X*_{0}, *X*_{1}, and *X*_{2} be independent, identically distributed (i.i.d.) random variables. The distribution of these random variables is called **stable** if for every pair of positive real numbers *a* and *b*, there exists a positive *c* and a real *d* such that *cX*_{0} + *d* has the same distribution as *aX*_{1} + *bX*_{2}.

Stable distributions can be specified by four parameters. One of the four parameters is the **exponent parameter** 0 < α ≤ 2. This parameter controls the thickness of the distribution tails. The distributions with α = 2 are the normal (Gaussian) distributions. For α < 2, the PDF is asymptotically proportional to |*x*|^{-α-1} and the CDF is asymptotically proportional to |*x*|^{-α} as *x* → ±∞. And so except for the normal distribution, all stable distributions have thick tails; the variance does not exist.

The characteristic functions for stable distributions can be written in closed form in terms of the four parameters mentioned above. In general, however, the density functions for stable distributions cannot be written down in closed form. There are three exceptions: the normal distributions, the Cauchy distributions, and the Lévy distributions.
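The Cauchy case lets us check the stability property directly: for standard Cauchy variables, *aX*_{1} + *bX*_{2} has the same distribution as (*a* + *b*)*X*_{0}, i.e. *c* = *a* + *b* and *d* = 0 in the definition above. A simulation sketch:

```python
# Checking the stability property for the Cauchy distribution (alpha = 1):
# a*X_1 + b*X_2 has the same distribution as (a + b)*X_0, i.e. c = a + b
# and d = 0 in the definition of stability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b, reps = 2.0, 3.0, 50000
x1 = rng.standard_cauchy(reps)
x2 = rng.standard_cauchy(reps)

combo = a * x1 + b * x2
ks = stats.kstest(combo, "cauchy", args=(0, a + b)).statistic
print(ks)   # small: the combination is Cauchy with scale a + b
```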

Let *F*(*x*) be the CDF for the random variables *X*_{i}. The following conditions on *F* are necessary and sufficient for the aggregation of the *X*’s to converge to a stable distribution with exponent α < 2:

- *F*(*x*) = (*c*_{1} + o(1)) |*x*|^{-α} *h*(|*x*|) as *x* → -∞, and
- 1 – *F*(*x*) = (*c*_{2} + o(1)) *x*^{-α} *h*(*x*) as *x* → ∞

where *h*(*x*) is a slowly varying function. Here o(1) denotes a function tending to 0. (See notes on asymptotic notation.) A **slowly varying function** *h*(*x*) is one such that the ratio *h*(*cx*) / *h*(*x*) → 1 as *x* → ∞ for all *c* > 0. Roughly speaking, this means *F*(*x*) has to look something like |*x*|^{-α} in both the left and right tails, and so the *X*’s must be distributed something like the limiting distribution. For more information, see Petrov’s book below.
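The canonical example of a slowly varying function is *h*(*x*) = log *x*: here *h*(*cx*)/*h*(*x*) = 1 + log(*c*)/log(*x*), which tends to 1 as *x* → ∞ for every fixed *c* > 0. A quick numeric check:

```python
# h(x) = log x is a standard example of a slowly varying function:
# h(c*x)/h(x) = 1 + log(c)/log(x) -> 1 as x -> infinity for any fixed c > 0.
import math

def ratio(c, x):
    return math.log(c * x) / math.log(x)

for x in (1e2, 1e4, 1e8):
    print(x, ratio(100.0, x))   # 2.0, 1.5, 1.25 -- slowly approaching 1
```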

## References

*Limit Theorems of Probability Theory: Sequences of Independent Random Variables* by Valentin Petrov

*Probability Theory: Independence, Interchangeability, Martingales* by Yuan Shih Chow and Henry Teicher

“Power laws and the generalized CLT” (blog post)

“An introduction to stable distributions” by John P. Nolan

*The Life and Times of the Central Limit Theorem* by William Adams