These notes summarize several extensions of the Central Limit Theorem (CLT) and related results.


## Classical Central Limit Theorem

Let *X*_{n} be a sequence of independent, identically distributed (i.i.d.) random variables. Assume each *X* has finite mean, E(*X*) = μ, and finite variance, Var(*X*) = σ^{2}. Let *Z*_{n} be the normalized average of the first n random variables, i.e.

*Z*_{n} = (*X*_{1} + *X*_{2} + … + *X*_{n} – *n*μ) / (σ√*n*).

The **classical Central Limit Theorem** says that *Z*_{n} converges in distribution to a standard normal distribution. This means that the CDF of *Z*_{n} converges pointwise to Φ, the CDF of a standard normal (Gaussian) random variable. (See notes on modes of convergence.)
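As a quick numerical illustration (a sketch, using exponential(1) variables as an arbitrary choice, with μ = σ = 1), we can sample *Z*_{n} many times and measure the Kolmogorov-Smirnov distance between its empirical CDF and Φ:

```python
# Empirical check of the classical CLT using exponential(1) variables
# (mean mu = 1, sigma = 1); the choice of distribution is arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 1000, 20000
mu, sigma = 1.0, 1.0

# Z_n = (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n)), computed reps times
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

# Kolmogorov-Smirnov distance between the empirical CDF of Z_n and Phi
ks = stats.kstest(z, "norm").statistic
print(ks)  # small for large n
```

The statistic shrinks as *n* grows, consistent with pointwise convergence of the CDFs.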

A special case of the CLT in which the *X*_{n} are assumed to be binomial goes back to Abraham de Moivre in 1733.

### Rate of convergence

It is natural to ask about the rate of convergence in the CLT. If *F*_{n} is the CDF of *Z*_{n}, once we know that *F*_{n}(x) converges to Φ(*x*) as *n* → ∞, we might want to know how quickly this convergence takes place. Said another way, for a given *n*, we might want to know how well Φ approximates *F*_{n}. This question is settled by the **Berry-Esséen theorem**. See Quantifying the error in the central limit theorem. For examples of normal approximations for specific distributions, see the following links: binomial, beta, gamma, Poisson, Student-t.
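For Bernoulli sums the CDF of *S*_{n} is known exactly (binomial), so we can compare the observed error against the Berry-Esséen bound *C*ρ/(σ³√*n*) directly. A sketch follows; the constant 0.4748 is an assumption here (a published value due to Shevtsova), while the theorem itself only guarantees some universal constant *C*:

```python
# Checking the Berry-Esseen bound for Bernoulli(p) sums, where the CDF of
# S_n is exactly binomial.  The constant 0.4748 is an assumed (published)
# value; the theorem only guarantees some universal constant C.
import numpy as np
from scipy import stats

n, p = 200, 0.3
sigma = np.sqrt(p * (1 - p))
rho = p * (1 - p) * (p**2 + (1 - p)**2)   # E|X - p|^3 for Bernoulli(p)

k = np.arange(n + 1)
z = (k - n * p) / (sigma * np.sqrt(n))
Fn = stats.binom.cdf(k, n, p)              # exact CDF of S_n at the jumps
phi = stats.norm.cdf(z)

# F_n is a step function, so check both sides of each jump
err = max(np.abs(Fn - phi).max(),
          np.abs(Fn - stats.binom.pmf(k, n, p) - phi).max())
bound = 0.4748 * rho / (sigma**3 * np.sqrt(n))
print(err, bound)   # the observed error sits below the bound
```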

### Directions for generalization

The classical CLT has three requirements:

- independence,
- identical distribution, and
- finite variance.

Each of these conditions can be weakened to create variations on the central limit theorem. We will keep the assumption of independence in these notes. For CLT results for dependent random variables, see Chow and Teicher. Below we consider non-identically distributed random variables and random variables with infinite variance.

## Non-identically distributed random variables

In this section we allow the possibility that the *X*_{n} variables are not identically distributed. The main results in this area are the Lindeberg-Feller theorem and its corollary, Liapounov’s theorem.

First we introduce notation and assumptions common to both theorems. Let *X*_{n} be a sequence of independent random variables, at least one of which has a non-degenerate distribution. Assume each *X*_{n} has mean 0 and variance σ_{n}^{2}. Define the partial sum

*S*_{n} = *X*_{1} + *X*_{2} + … + *X*_{n}

and its variance

*s*_{n}^{2} = σ_{1}^{2} + σ_{2}^{2} + … + σ_{n}^{2}.

Both theorems concern under what circumstances the normalized partial sums *S*_{n} / *s*_{n} converge in distribution to a standard normal random variable. We start with Liapounov’s theorem because it is simpler.

### Liapounov’s theorem

**Liapounov’s theorem** weakens the requirement of identical distribution but strengthens the requirement of finite variance. Where the classical CLT requires finite moments of order 2, Liapounov’s CLT requires finite moments of order 2 + δ for some δ > 0.

Assume E(|*X*_{n}|^{2+δ}) is bounded for some δ > 0 and for all *n*. If

*s*_{n}^{-(2 + δ)} ∑_{1 ≤ k ≤ n} E(|*X*_{k}|^{2 + δ}) → 0

as *n* → ∞ then *S*_{n} / *s*_{n} converges in distribution to a standard normal random variable.
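We can evaluate the Liapounov ratio numerically for a concrete (hypothetical) sequence, say *X*_{k} uniform on [-*k*, *k*] with δ = 1, where σ_{k}² = k²/3 and E|*X*_{k}|³ = k³/4; the ratio decays like *n*^{-1/2}, so the theorem applies:

```python
# Liapounov ratio for the hypothetical sequence X_k ~ Uniform(-k, k),
# with delta = 1.  Here sigma_k^2 = k^2/3 and E|X_k|^3 = k^3/4, so the
# ratio behaves like n^(-1/2) and the CLT applies.
import numpy as np

def liapounov_ratio(n):
    k = np.arange(1, n + 1, dtype=float)
    s_n = np.sqrt(np.sum(k**2 / 3))        # s_n^2 = sum of sigma_k^2
    third_moments = np.sum(k**3 / 4)       # sum of E|X_k|^(2+delta)
    return third_moments / s_n**3          # s_n^-(2+delta) * sum

for n in (10, 100, 1000, 10000):
    print(n, liapounov_ratio(n))           # decreases toward 0
```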

### Lindeberg-Feller theorem

The Lindeberg-Feller theorem is more general than Liapounov’s theorem. It gives necessary and sufficient conditions for *S*_{n} / *s*_{n} to converge to a standard normal.

**Lindeberg**: Under the assumptions above (each *X*_{n} has zero mean and finite variance, and at least one *X*_{n} has a non-degenerate distribution), if the Lindeberg condition holds, then *S*_{n} / *s*_{n} converges in distribution to a standard normal random variable.

**Feller**: Conversely, if *S*_{n} / *s*_{n} converges in distribution to a standard normal and σ_{n}/s_{n} → 0 and s_{n} → ∞ then the Lindeberg condition holds.

So what is this **Lindeberg condition**? Let *F*_{n} be the CDF of *X*_{n}, i.e. *F*_{n}(*x*) = P(*X*_{n} ≤ *x*). The Lindeberg condition requires

(1/*s*_{n}^{2}) ∑_{1 ≤ k ≤ n} ∫_{|*x*| ≥ ε *s*_{n}} *x*^{2} d*F*_{k}(*x*) → 0 as *n* → ∞

for all ε > 0.

## Generalized CLT for random variables with infinite variance

For this section, we require the random variables *X*_{n} to be independent and identically distributed. However, we do not require that they have finite variance.

First we look at some restrictions for what a generalized CLT would look like for random variables *X*_{n} without finite variance. We would need sequences of constants *a*_{n} and *b*_{n} such that (*X*_{1} + *X*_{2} + … + *X*_{n} – *b*_{n})/*a*_{n} converges in distribution to something. It turns out that the something that the sequence converges to must have a **stable distribution**.
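The standard Cauchy distribution makes this concrete: the average of *n* i.i.d. standard Cauchy variables is again standard Cauchy, so no choice of normalization produces a normal limit. A simulation sketch:

```python
# With standard Cauchy variables (infinite variance), the sum normalized
# by a_n = n is again standard Cauchy -- the stable limit with alpha = 1
# -- not normal, illustrating why the limit must be a stable law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 500, 20000
means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

print(stats.kstest(means, "cauchy").statistic)   # small: still Cauchy
print(stats.kstest(means, "norm").statistic)     # large: not normal
```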

Let *X*_{0}, *X*_{1}, and *X*_{2} be independent, identically distributed (i.i.d.) random variables. The distribution of these random variables is called **stable** if for every pair of positive real numbers *a* and *b*, there exists a positive *c* and a real *d* such that *cX*_{0} + *d* has the same distribution as *aX*_{1} + *bX*_{2}.

Stable distributions can be specified by four parameters. One of the four parameters is the **exponent parameter** 0 < α ≤ 2. This parameter controls the thickness of the distribution tails. The distributions with α = 2 are the normal (Gaussian) distributions. For α < 2, the PDF is asymptotically proportional to |*x*|^{-α-1} and the CDF is asymptotically proportional to |*x*|^{-α} as *x* → ±∞. And so except for the normal distribution, all stable distributions have thick tails; the variance does not exist.

The characteristic functions for stable distributions can be written in closed form in terms of the four parameters mentioned above. In general, however, the density functions for stable distributions cannot be written down in closed form. There are three exceptions: the normal distributions, the Cauchy distributions, and the Lévy distributions.
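The Cauchy case lets us check the stability property directly: for standard Cauchy variables, *aX*_{1} + *bX*_{2} has the same distribution as (*a* + *b*)*X*_{0}, i.e. *c* = *a* + *b* and *d* = 0 in the definition above. A simulation sketch:

```python
# Checking the stability property for the Cauchy distribution (alpha = 1):
# a*X_1 + b*X_2 has the same distribution as (a + b)*X_0, i.e. c = a + b
# and d = 0 in the definition of stability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b, reps = 2.0, 3.0, 50000
x1 = rng.standard_cauchy(reps)
x2 = rng.standard_cauchy(reps)

combo = a * x1 + b * x2
ks = stats.kstest(combo, "cauchy", args=(0, a + b)).statistic
print(ks)   # small: the combination is Cauchy with scale a + b
```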

Let *F*(*x*) be the CDF for the random variables *X*_{i}. The following conditions on *F* are necessary and sufficient for the aggregation of the *X*’s to converge to a stable distribution with exponent α < 2:

- *F*(*x*) = (*c*_{1} + o(1)) |*x*|^{-α} *h*(|*x*|) as *x* → -∞, and
- 1 – *F*(*x*) = (*c*_{2} + o(1)) *x*^{-α} *h*(*x*) as *x* → ∞

where *h*(*x*) is a slowly varying function. Here o(1) denotes a function tending to 0. (See notes on asymptotic notation.) A **slowly varying function** *h*(*x*) is one such that the ratio *h*(*cx*) / *h*(*x*) → 1 as *x* → ∞ for all *c* > 0. Roughly speaking, this means *F*(*x*) has to look something like |*x*|^{-α} in both the left and right tails, and so the *X*’s must be distributed something like the limiting distribution. For more information, see Petrov’s book below.
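The canonical example of a slowly varying function is *h*(*x*) = log *x*: here *h*(*cx*)/*h*(*x*) = 1 + log(*c*)/log(*x*), which tends to 1 as *x* → ∞ for every fixed *c* > 0. A quick numeric check:

```python
# h(x) = log x is a standard example of a slowly varying function:
# h(c*x)/h(x) = 1 + log(c)/log(x) -> 1 as x -> infinity for any fixed c > 0.
import math

def ratio(c, x):
    return math.log(c * x) / math.log(x)

for x in (1e2, 1e4, 1e8):
    print(x, ratio(100.0, x))   # 2.0, 1.5, 1.25 -- slowly approaching 1
```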

## References

*Limit Theorems of Probability Theory: Sequences of Independent Random Variables* by Valentin Petrov

*Probability Theory: Independence, Interchangeability, Martingales* by Yuan Shih Chow and Henry Teicher

“Power laws and the generalized CLT” (blog post)

“An introduction to stable distributions” by John P. Nolan

*The Life and Times of the Central Limit Theorem* by William Adams