Here’s an expert from a recent ACM Ubiquity interview with David Alderson that raises a few questions.
Actually, they [power laws] aren’t special at all. They can arise as natural consequences of aggregation of high variance data. You know from statistics that the Central Limit Theorem says distributions of data with limited variability tend to follow the Normal (bell-shaped, or Gaussian) curve. There is a less well-known version of the theorem that shows aggregation of high (or infinite) variance data leads to power laws. Thus, the bell curve is normal for low-variance data and the power law curve is normal for high-variance data. In many cases, I don’t think anything deeper than that is going on.
In this post I will explain the theory I believe Alderson is alluding to in his informal remarks. I’ll also explain some restrictions necessary for this theory to hold.
I don’t understand what Alderson has in mind when he refers to data with high but finite variance. If the variance is large but finite, the classical Central Limit Theorem (CLT) holds. If the variance is infinite, the classical CLT does not apply but a Generalized Central Limit Theorem might (or might not) apply.
The Generalized CLT says that if the “aggregation” converges to a non-degenerate distribution, that distribution must be a stable distribution. Also, stable distributions (except for normal distributions) have tails that are asymptotically proportional to the tails of a power law distribution. Note that this does not say under what conditions the aggregation has a non-degenerate limit. It only says something about what that limit must be like if it exists. Also, this does not say that the limit is a power law, only that it is a distribution whose tails are eventually proportional to those of a power law distribution.
In order to better understand what’s going on, there are several gaps to fill in.
- What are stable distributions?
- What do we mean by aggregation?
- What conditions insure that a non-degenerate limiting distribution exists?
Let X0, X1, and X2 be independent, identically distributed (iid) random variables. The distribution of these random variables is called stable if for every pair of positive real numbers a and b, there exists a positive c and a real d such that cX0 + d has the same distribution as aX1 + bX2.
Stable distributions can be specified by four parameters. One of the four parameters is the exponent parameter 0 < α ≤ 2. This parameter is controls the thickness of the distribution tails. The distributions with α = 2 are the normal (Gaussian) distributions. For α < 2, the PDF is asymptotically proportional to |x|−α−1 and the CDF is asymptotically proportional to |x|−α as x → ±∞. And so except for the normal distribution, all stable distributions have thick tails.
A stable distribution can be described in terms of its characteristic function, the Fourier transform of its PDF. The description of the characteristic function is a little complicated, but it can be written down in closed form. (See John P. Nolan’s notes on stable distributions for much more information.) However, the PDFs can only be written down in closed form in three special cases: the normal, Cauchy, and Lévy distributions. These three distributions correspond to α = 2, 1, and 1/2 respectively.
The Generalized CLT holds if there is a sequence of constants an and bn such that (X1 + X2 + … + Xn − bn) / an converges to a stable distribution. This is what is meant by the “aggregation” of the X‘s. The factors are an necessarily asymptotically equal to n1/α where α is the exponential parameter for the limiting distribution.
We now get to the most critical question: what kinds of random variables lead to stable distributions when aggregated? They must have tails something like the tails of the limiting distribution. In this sense the Generalized CLT is not as magical as the classical CLT. The classical CLT says you can aggregate random variables quite unlike a normal distribution and get a normal distribution out in the limit. But the Generalized CLT requires that the distribution of the X‘s must be somewhat similar to limiting distribution. The specific requirements are given below.
Let F(x) be the CDF for the random variables Xi. The following conditions on F are necessary and sufficient for the aggregation of the X‘s to converge to a stable distribution with exponent α < 2.
- F(x) = (c1 + o(1)) |x|−α h(|x|) as x → -∞, and
- 1 − F(x) = (c2 + o(1)) x−α h(x) as x → ∞
where h(x) is a slowly varying function. The notation o(1) is explained in these notes on asymptotic notation. A slowly varying function h(x) is one such that h(cx) / h(x) → 1 as x → ∞ for all c > 0. Roughly speaking, this means F(x) has to look something like |x|−α in both the left and right tails, and so the X‘s must be distributed something like the limiting distribution.
Power laws do not fall out of the Generalized CLT as easily as the normal distribution falls out of the classical CLT. The aggregation of random variables with infinite variance might not converge to any distribution, or it might converge to a degenerate distribution. And if the aggregation converges to a non-degenerate distribution, this distribution is not strictly a power law but rather has tails like a power law.
Very exciting blog post. Insightful.
(Your blog post would be more useful if I could click on Generalized CLT and get immediately to its statement.)
1) I don’t think that most distributions people call “power laws” are bona fide “power laws”. I think we effectively just look at the tail, asymptotically, and do some hand-waving to conclude it is a power law. And that’s alright. So, I don’t expect anyone to demonstrate that we can actual power laws in practice… I always understood that we get something “close” to a power low.
2) Well, you can probably rule out convergence to a degenerate distribution on a case-by-case basis. Or you can just dismiss them as being exceptions. I think most people would think it is sensible?
3) Also, about not converging at all, can you give an example, maybe?
Not quite true that finite variance => classical CLT applies. The distribution has to have some moment greater than 2 be finite (whether it be 2.1, 2.01, or whatever). I think t3 distribution is a counterexample. R code to show this follows:
samp <- replicate(4000,mean(rt(1000,df=3)))
qqnorm(samp)
abline(a=0,b=1)
Maybe make that a qqline(samp) in the last line above. Anyway, 10000 replications shows it a little better.
@John: Finite variance is enough for the CLT if the random variables are independent and identically distributed. If the random variables are not identically distributed, then the CLT requires the “Lyapunov condition” which includes the existence of moments of order 2 + δ for some δ > 0. So I believe you and I were referring to different variations on the classical CLT.
Yes, you’re right. Just read up on the myriad of CLTs again. I guess I got some weird samples.
I think that, in addition to the fumble on power law, Alderson erects a straw man. Rather than graph vs network distinctions, which are important but I don’t think helpful, a far greater challenge is whether Internet measurements are wide sense stationary or not. There’s a notable lack of attention to time constants and notions of stability.
I thought Alderman’s claim about high- or infinite-variance data was odd and oddly stated, but I lack the sophistication to see John Cook’s analysis. Thanks John!
BTW, Alderson has a more substantial version of his claims in NOTICES OF THE AMS (volume 56, number 5, May 2009), but the quoted discussion on power laws is pushed off into reference 33 (via page 595), or:
L. Li, D. Alderson, W. Willinger, and J. C. Doyle, A first principles approach to understanding the Internet’s router-level topology, Proc. ACM SIGCOMM’04, 34(4) (2004), 3–14.
I have put a copy of the above at the link, but I think their citation was a mistake. In fact, there’s very little discussion of the matter there. Instead, consider:
W.Willinger, D.Alderson, L.Li, “A Pragmatic Approach to Dealing with HighVariability in Network Measurements”, October 2004, IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement.
which addresses CLT in its Section 2.2.1. I’ve linked that online as well. They summarize their finding where they aggregate infinite variance stable with Together, these results show that the Gaussian and scaling distributions are both invariant under aggregation. More precisely, the classical and non-classical CLTs state that the stable distributions are the only fixed points of the renormalization group transformation
(i.e., aggregation) and that Gaussian distributions are, in fact, a very special case (i.e., α = 2).
Somehow the link for “First Principles” did not survive the post. Here it is, again:
http://bilge.pyrate.mailbolt.com/blogBeginningNewtonmas2008/FirstPrinciplesApproachToUnderstandingTheInternet'sRouterLevelTopology–LiAldersonWillingerDoyle.pdf