Here’s an excerpt from a recent ACM Ubiquity interview with David Alderson that raises a few questions.
Actually, they [power laws] aren’t special at all. They can arise as natural consequences of aggregation of high variance data. You know from statistics that the Central Limit Theorem says distributions of data with limited variability tend to follow the Normal (bell-shaped, or Gaussian) curve. There is a less well-known version of the theorem that shows aggregation of high (or infinite) variance data leads to power laws. Thus, the bell curve is normal for low-variance data and the power law curve is normal for high-variance data. In many cases, I don’t think anything deeper than that is going on.
In this post I will explain the theory I believe Alderson is alluding to in his informal remarks. I’ll also explain some restrictions necessary for this theory to hold.
I don’t understand what Alderson has in mind when he refers to data with high but finite variance. If the variance is large but finite, the classical Central Limit Theorem (CLT) holds. If the variance is infinite, the classical CLT does not apply but a Generalized Central Limit Theorem might (or might not) apply.
The Generalized CLT says that if the “aggregation” converges to a non-degenerate distribution, that distribution must be a stable distribution. Also, stable distributions (except for normal distributions) have tails that are asymptotically proportional to the tails of a power law distribution. Note that this does not say under what conditions the aggregation has a non-degenerate limit. It only says something about what that limit must be like if it exists. Also, this does not say that the limit is a power law, only that it is a distribution whose tails are eventually proportional to those of a power law distribution.
In order to better understand what’s going on, there are several gaps to fill in.
- What are stable distributions?
- What do we mean by aggregation?
- What conditions ensure that a non-degenerate limiting distribution exists?
Let $X_0$, $X_1$, and $X_2$ be independent, identically distributed (iid) random variables. The distribution of these random variables is called stable if for every pair of positive real numbers a and b, there exists a positive c and a real d such that $cX_0 + d$ has the same distribution as $aX_1 + bX_2$.
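The definition can be checked numerically for one concrete case. The sketch below assumes the standard Cauchy distribution, for which c = a + b and d = 0 satisfy the definition, and compares interquartile ranges because a Cauchy variable has no mean or variance to compare. The function names, sample size, and seed are illustrative choices, not anything from the interview or the theory.

```python
import math
import random
import statistics

def standard_cauchy(rng):
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(samples):
    # Interquartile range: distance between the first and third quartiles.
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return q3 - q1

def stability_check(a, b, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    # Samples of a*X1 + b*X2 with X1, X2 independent standard Cauchy.
    combo = [a * standard_cauchy(rng) + b * standard_cauchy(rng)
             for _ in range(n_samples)]
    # Samples of c*X0 with c = a + b (assumed Cauchy case, d = 0).
    c = a + b
    scaled = [c * standard_cauchy(rng) for _ in range(n_samples)]
    # A Cauchy variable with scale c has quartiles at -c and +c, so IQR = 2c.
    return iqr(combo), iqr(scaled), 2 * c

combo_iqr, scaled_iqr, exact_iqr = stability_check(2.0, 3.0)
print(combo_iqr, scaled_iqr, exact_iqr)
```

Both sample interquartile ranges should land near the exact value 2c. Agreement of one summary statistic is only a sanity check, not a proof that the two distributions coincide.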
Stable distributions can be specified by four parameters. One of the four parameters is the exponent parameter 0 < α ≤ 2. This parameter controls the thickness of the distribution tails. The distributions with α = 2 are the normal (Gaussian) distributions. For α < 2, the PDF is asymptotically proportional to $|x|^{-\alpha-1}$ and the CDF is asymptotically proportional to $|x|^{-\alpha}$ as x → ±∞. And so except for the normal distribution, all stable distributions have thick tails.
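The tail behavior is easy to see in the one heavy-tailed case where everything is elementary. The sketch below assumes the standard Cauchy distribution (α = 1), whose PDF is 1/(π(1 + x²)), and checks that the PDF times x^(α+1) and the upper tail probability times x^α both settle to a constant, here 1/π. The helper names are illustrative.

```python
import math

def cauchy_pdf(x):
    # Standard Cauchy density.
    return 1.0 / (math.pi * (1.0 + x * x))

def cauchy_upper_tail(x):
    # 1 - F(x) = 1/2 - atan(x)/pi, rewritten as atan(1/x)/pi for x > 0
    # to avoid subtracting two nearly equal numbers.
    return math.atan(1.0 / x) / math.pi

alpha = 1.0
for x in (1e1, 1e2, 1e3, 1e4):
    pdf_ratio = cauchy_pdf(x) * x ** (alpha + 1)
    tail_ratio = cauchy_upper_tail(x) * x ** alpha
    print(f"x={x:>8.0f}  pdf*x^(alpha+1)={pdf_ratio:.6f}  tail*x^alpha={tail_ratio:.6f}")
```

The ratios approach 1/π rather than equaling it at every x, which is the sense in which the tails are only asymptotically those of a power law.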
A stable distribution can be described in terms of its characteristic function, the Fourier transform of its PDF. The description of the characteristic function is a little complicated, but it can be written down in closed form. (See John P. Nolan’s notes on stable distributions for much more information.) However, the PDFs can only be written down in closed form in three special cases: the normal, Cauchy, and Lévy distributions. These three distributions correspond to α = 2, 1, and 1/2 respectively.
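For reference, here is the characteristic function in one common parameterization. Other parameterizations are in use, and this post does not say which one is intended. The symbols β (skewness, between -1 and 1), γ > 0 (scale), and δ (location) are names assumed here for the three parameters besides α.

```latex
\[
\varphi(t) = E\, e^{itX} =
\begin{cases}
\exp\left( -\gamma^{\alpha} |t|^{\alpha}
  \left[ 1 - i\beta \tan\frac{\pi\alpha}{2}\, \operatorname{sign}(t) \right]
  + i\delta t \right), & \alpha \ne 1, \\[1ex]
\exp\left( -\gamma |t|
  \left[ 1 + i\beta \frac{2}{\pi}\, \operatorname{sign}(t) \log|t| \right]
  + i\delta t \right), & \alpha = 1.
\end{cases}
\]
```

As a check on the formula, β = 0, γ = 1, δ = 0 with α = 1 gives exp(-|t|), the characteristic function of the standard Cauchy distribution. With α = 2 the tangent term vanishes and the result is exp(-t²), a normal distribution with mean 0 and variance 2.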
The Generalized CLT holds if there is a sequence of constants $a_n$ and $b_n$ such that $(X_1 + X_2 + \cdots + X_n - b_n) / a_n$ converges to a stable distribution. This is what is meant by the “aggregation” of the X’s. The factors $a_n$ are necessarily asymptotically equal to $n^{1/\alpha}$, where α is the exponent parameter for the limiting distribution.
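A small simulation makes the normalization concrete. The sketch below assumes the X’s are standard Cauchy, so α = 1, the scaling factor is n, and no centering is needed. In that case the normalized sum is again standard Cauchy for every n, so the median of its absolute value should stay near tan(π/4) = 1. The function names, trial count, and seed are illustrative choices.

```python
import math
import random
import statistics

def standard_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variable.
    return math.tan(math.pi * (rng.random() - 0.5))

def normalized_sum(n, rng, alpha=1.0):
    # (X1 + ... + Xn - b_n) / a_n with b_n = 0 and a_n = n**(1/alpha).
    total = sum(standard_cauchy(rng) for _ in range(n))
    return total / n ** (1.0 / alpha)

def median_abs_normalized_sum(n, trials=2000, seed=0):
    # Median of |normalized sum| over independent repetitions.
    rng = random.Random(seed)
    return statistics.median(abs(normalized_sum(n, rng)) for _ in range(trials))

for n in (1, 10, 100):
    print(n, round(median_abs_normalized_sum(n), 3))
```

The printed medians should hover around 1 regardless of n. Dividing by the square root of n, as the classical CLT would, leaves a spread that keeps growing with n instead.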
We now get to the most critical question: what kinds of random variables lead to stable distributions when aggregated? They must have tails something like the tails of the limiting distribution. In this sense the Generalized CLT is not as magical as the classical CLT. The classical CLT says you can aggregate random variables quite unlike a normal distribution and get a normal distribution out in the limit. But the Generalized CLT requires that the distribution of the X’s be somewhat similar to the limiting distribution. The specific requirements are given below.
Let F(x) be the CDF for the random variables $X_i$. The following conditions on F are necessary and sufficient for the aggregation of the X’s to converge to a stable distribution with exponent α < 2.
- $F(x) = (c_1 + o(1))\, |x|^{-\alpha}\, h(|x|)$ as x → -∞, and
- $1 - F(x) = (c_2 + o(1))\, x^{-\alpha}\, h(x)$ as x → ∞
where h(x) is a slowly varying function. The notation o(1) is explained in these notes on asymptotic notation. A slowly varying function h(x) is one such that h(cx) / h(x) → 1 as x → ∞ for all c > 0. Roughly speaking, this means F(x) has to look something like $|x|^{-\alpha}$ in both the left and right tails, and so the X’s must be distributed something like the limiting distribution.
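To get a feel for the slowly varying condition, the sketch below compares two illustrative choices of h that are not taken from the theory above: h(x) = log x, which is slowly varying, and h(x) = x^0.1, which is not. It evaluates h(cx) / h(x) with c = 10 for increasingly large x.

```python
import math

def ratio(h, c, x):
    # The quantity h(cx) / h(x) from the definition of slow variation.
    return h(c * x) / h(x)

def log_h(x):
    # Slowly varying: log(cx)/log(x) = 1 + log(c)/log(x) -> 1.
    return math.log(x)

def power_h(x):
    # Not slowly varying: (cx)**0.1 / x**0.1 = c**0.1 for every x.
    return x ** 0.1

c = 10.0
for x in (1e2, 1e10, 1e100, 1e300):
    print(f"x={x:.0e}  log: {ratio(log_h, c, x):.4f}  x^0.1: {ratio(power_h, c, x):.4f}")
```

The logarithm’s ratio drifts toward 1, though slowly, while the power function’s ratio stays fixed at 10^0.1, about 1.26. Even a small positive power changes the tail exponent, whereas a logarithmic factor does not.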
Power laws do not fall out of the Generalized CLT as easily as the normal distribution falls out of the classical CLT. The aggregation of random variables with infinite variance might not converge to any distribution, or it might converge to a degenerate distribution. And if the aggregation converges to a non-degenerate distribution, this distribution is not strictly a power law but rather has tails like a power law.