Power laws and the generalized CLT

Here’s an excerpt from a recent ACM Ubiquity interview with David Alderson that raises a few questions.

Actually, they [power laws] aren’t special at all. They can arise as natural consequences of aggregation of high variance data. You know from statistics that the Central Limit Theorem says distributions of data with limited variability tend to follow the Normal (bell-shaped, or Gaussian) curve. There is a less well-known version of the theorem that shows aggregation of high (or infinite) variance data leads to power laws. Thus, the bell curve is normal for low-variance data and the power law curve is normal for high-variance data. In many cases, I don’t think anything deeper than that is going on.

In this post I will explain the theory I believe Alderson is alluding to in his informal remarks. I’ll also explain some restrictions necessary for this theory to hold.

I don’t understand what Alderson has in mind when he refers to data with high but finite variance. If the variance is large but finite, the classical Central Limit Theorem (CLT) holds. If the variance is infinite, the classical CLT does not apply but a Generalized Central Limit Theorem might (or might not) apply.

The Generalized CLT says that if the “aggregation” converges to a non-degenerate distribution, that distribution must be a stable distribution. Also, stable distributions (except for normal distributions) have tails that are asymptotically proportional to the tails of a power law distribution. Note that this does not say under what conditions the aggregation has a non-degenerate limit. It only says something about what that limit must be like if it exists. Also, this does not say that the limit is a power law, only that it is a distribution whose tails are eventually proportional to those of a power law distribution.

In order to better understand what’s going on, there are several gaps to fill in.

  1. What are stable distributions?
  2. What do we mean by aggregation?
  3. What conditions ensure that a non-degenerate limiting distribution exists?

Let X0, X1, and X2 be independent, identically distributed (iid) random variables. The distribution of these random variables is called stable if for every pair of positive real numbers a and b, there exists a positive c and a real d such that cX0 + d has the same distribution as aX1 + bX2.
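
As a sanity check on the definition, here is a small simulation of my own (not from the interview) using the normal distribution, which is stable: for zero-mean normals, c = √(a² + b²) and d = 0 work.

    import numpy as np

    rng = np.random.default_rng(0)
    a, b = 2.0, 3.0
    c, d = np.hypot(a, b), 0.0          # sqrt(a^2 + b^2); no shift needed for zero-mean normals

    x0, x1, x2 = rng.standard_normal((3, 1_000_000))
    lhs = c * x0 + d                    # c X0 + d
    rhs = a * x1 + b * x2               # a X1 + b X2

    # The two samples should have (nearly) identical quantiles.
    print(np.round(np.quantile(lhs, [0.1, 0.5, 0.9]), 3))
    print(np.round(np.quantile(rhs, [0.1, 0.5, 0.9]), 3))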

Stable distributions can be specified by four parameters. One of the four parameters is the exponent parameter 0 < α ≤ 2, which controls the thickness of the distribution’s tails. The distributions with α = 2 are the normal (Gaussian) distributions. For α < 2, the PDF is asymptotically proportional to |x|^(−α−1) and the tails of the CDF (that is, F(x) as x → −∞ and 1 − F(x) as x → ∞) are asymptotically proportional to |x|^(−α). And so except for the normal distribution, all stable distributions have thick tails.
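
Here is a quick numerical check of that tail claim, a sketch of my own rather than anything from the interview. It assumes SciPy’s levy_stable distribution evaluates the survival function accurately this far out in the tail; the ratio of tail probabilities at 2x and x should approach 2^(−α).

    from scipy.stats import levy_stable

    alpha, beta = 1.5, 0.0   # a symmetric stable distribution with infinite variance
    for x in (10, 20, 40):
        # P(X > 2x) / P(X > x) should approach 2**(-alpha) if the tail is ~ x**(-alpha)
        ratio = levy_stable.sf(2 * x, alpha, beta) / levy_stable.sf(x, alpha, beta)
        print(x, round(ratio, 4), round(2.0 ** -alpha, 4))

The same check with α = 2 (the normal case) would fail, since Gaussian tails fall off much faster than any power.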

A stable distribution can be described in terms of its characteristic function, the Fourier transform of its PDF.  The description of the characteristic function is a little complicated, but it can be written down in closed form. (See John P. Nolan’s notes on stable distributions for much more information.) However, the PDFs can only be written down in closed form in three special cases: the normal, Cauchy, and Lévy distributions. These three distributions correspond to α = 2, 1, and 1/2 respectively.
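
For reference, the classical densities for those three special cases, in their usual zero-location, unit-scale forms (the stable scale parameter differs from the familiar normal σ by a constant factor), are

$$ f_{\text{normal}}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad f_{\text{Cauchy}}(x) = \frac{1}{\pi\,(1 + x^2)}, \qquad f_{\text{Lévy}}(x) = \frac{1}{\sqrt{2\pi}}\, \frac{e^{-1/(2x)}}{x^{3/2}} \quad (x > 0). $$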

The Generalized CLT holds if there is a sequence of constants an and bn such that (X1 + X2 + … + Xn − bn) / an converges to a stable distribution. This is what is meant by the “aggregation” of the X’s. The factors an are necessarily asymptotically proportional to n^(1/α) where α is the exponent parameter of the limiting distribution.
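
Here is a simulation sketch of that aggregation, my own illustration with arbitrary choices (a Pareto-type distribution with tail exponent α = 1.5 and a couple of sample sizes). For 1 < α < 2 the mean exists even though the variance does not, so the centering constants bn can be taken as n times the mean; the quantiles of the normalized sums should look roughly the same for both values of n, suggesting convergence to a common (stable) limit.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 1.5                         # tail exponent: infinite variance, finite mean
    mean = 1 / (alpha - 1)              # mean of numpy's Pareto (Lomax) with shape alpha

    def normalized_sums(n, reps=10_000):
        """(X1 + ... + Xn - n*mean) / n**(1/alpha) for many independent replications."""
        x = rng.pareto(alpha, size=(reps, n))
        return (x.sum(axis=1) - n * mean) / n ** (1 / alpha)

    for n in (100, 1_000):
        s = normalized_sums(n)
        print(n, np.round(np.quantile(s, [0.5, 0.9, 0.99]), 2))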

We now get to the most critical question: what kinds of random variables lead to stable distributions when aggregated? They must have tails something like the tails of the limiting distribution. In this sense the Generalized CLT is not as magical as the classical CLT. The classical CLT says you can aggregate random variables quite unlike a normal distribution and get a normal distribution out in the limit. But the Generalized CLT requires that the distribution of the X’s be somewhat similar to the limiting distribution. The specific requirements are given below.

Let F(x) be the CDF for the random variables Xi. The following conditions on F are necessary and sufficient for the aggregation of the X’s to converge to a stable distribution with exponent α < 2.

  1. F(x) = (c1 + o(1)) |x|^(−α) h(|x|) as x → −∞, and
  2. 1 − F(x) = (c2 + o(1)) x^(−α) h(x) as x → ∞

where h(x) is a slowly varying function. The notation o(1) is explained in these notes on asymptotic notation. A slowly varying function h(x) is one such that h(cx) / h(x) → 1 as x → ∞ for all c > 0. Roughly speaking, this means F(x) has to look something like |x|^(−α) in both the left and right tails, and so the X’s must be distributed something like the limiting distribution.
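
To make the conditions concrete, here is a check for a one-sided Pareto distribution, the textbook example of a power law; the minimum value x_m is my own notation, not something from the post above. Its upper tail is

$$ 1 - F(x) = \left(\frac{x_m}{x}\right)^{\alpha} = x_m^{\alpha}\, x^{-\alpha} \quad (x \ge x_m), $$

so condition 2 holds with c2 = x_m^α and h(x) ≡ 1 (constants are slowly varying, as are functions like log x). Condition 1 holds trivially with c1 = 0 since there is no left tail; as I understand the theorem, the constants only need to satisfy c1, c2 ≥ 0 with c1 + c2 > 0, but see Nolan’s notes for the precise statement.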

Power laws do not fall out of the Generalized CLT as easily as the normal distribution falls out of the classical CLT. The aggregation of random variables with infinite variance might not converge to any distribution, or it might converge to a degenerate distribution. And if the aggregation converges to a non-degenerate distribution, this distribution is not strictly a power law but rather has tails like a power law.

Has C++ jumped the shark?

Bjarne Stroustrup, creator of C++, wrote an article for Dr. Dobb’s recently lamenting the decision to cut “concepts” from the upcoming revision of the C++ standard. His article left me with the feeling that C++ had jumped the shark.

The upcoming standard has been called “C++0x” based on the assumption (or at least the hope) that the standard would come out in the first decade of this century. But there will be no C++0x; now it will have to be C++1x. Part of the reason for removing concepts was to avoid pushing the release of the standard out even further. Stroustrup says he expects concepts will be added to the standard in five years. How many people will care by the time the standard is finished?

I’ve written C++ for a long time and I still use C++ for some problems. I like the language, but it has gone about as far as it can go. It’s safe to say the language has essentially stopped evolving if new standards are only going to come out every decade or two.

I have great respect for the people working on the C++ standard committee. The design of C++ is an engineering marvel given its constraints. But if it takes such Herculean efforts to make changes, maybe it’s time to call the language finished.

I’m content with the current version of C++. If I used C++ for everything I do, maybe I’d be anxious for new features. But if something is hard to do in C++, I just don’t use C++. I don’t see a new version of C++ changing my decisions about what language to use for various tasks. If something is easier to do in Python than in C++ now, for example, that will probably still be the case when the new standard is implemented.

Update: The ISO committee approved the final draft of the C++ 2011 standard on 25 March 2011.

Why Shakespeare is hard to read

Near the end of her course on classical mythology, Elizabeth Vandiver speculates on why people find Shakespeare hard to read. She says that, contrary to popular opinion, the difficulty is not the language per se. Elizabethan English is not that foreign to modern readers. The difficulty in reading Shakespeare comes from the literary allusions, particularly the allusions to classical mythology.

Her explanation matches my experience. I can easily read the King James version of the Bible, produced during Shakespeare’s lifetime, but I find it hard to slog through Shakespeare. (To be fair, I must say I grew up with far more exposure to the King James Bible than to Shakespeare.)

Vandiver went on to say that the primary source of classical mythology for Shakespeare and his audience was Ovid’s Metamorphoses. Studying this one book would make the Bard much more approachable.

Email isn't the problem

I find it odd that so many folks complain about email. Email’s ruining their lives. They wish email had never been invented. Etc.

Say you get 100 email messages in a day. Would you rather have 100 phone calls? The problem isn’t the 100 messages but the 100 new responsibilities represented by the email messages. Email spam is annoying, but phone spam (i.e. telemarketing) is worse.

There are alternatives to email that work well in their niche. RSS subscriptions beat email newsletters. Instant messaging beats email for quick interaction. Twitter beats email for keeping up with loquacious friends. But I don’t understand those who say “email is dead.”

Publishers and e-books

I like the way Manning sells e-books. They sell PDF versions of their books for significantly less than paper versions and they give a discount when you buy both. Also, their early access program lets you read books as they’re being written. Readers get the content sooner and writers get valuable feedback as they go. O’Reilly and Pragmatic Bookshelf have similar programs.

I don’t care for the way Wiley sells e-books. For example, they sell The R Book for $110 in paper and $105 in PDF. That’s absurd. The book has 960 pages and weighs 3.7 pounds. The difference in what it costs to produce the paper and electronic versions must be far more than $5. And I don’t believe Wiley gives any discount for buying both paper and PDF versions.

I also find Wiley’s license terms unreasonably restrictive. For example, “you … May not move the file to a different computer.” You may download the file directly to up to four computers within 14 days of buying the book. But if you buy a new computer, say six months after you buy the book, you’re not supposed to move the book to your new machine. That means you’re effectively renting the book for the lifetime of the computer you download it to.

PowerShell news

I found out some PowerShell news by listening to the PowerScripting podcast this morning. Here are a couple things I found interesting. See the podcast show notes for more news.

First, a new version of the PowerShell Community Extensions came out last week, version 1.2. See Keith Hill’s summary of the changes.

Second, Bruce Payette is working on a second edition of his book PowerShell in Action. You can get an “early access” version of the book through the publisher, Manning.

Approximating a solution that doesn’t exist

The following example made an impression on me when I first saw it years ago. I still think it’s an important example, though I’d draw a different conclusion from it today.

Problem: Let y(t) be the solution to the differential equation y′ = t² + y² with y(0) = 1. Calculate y(1).

If we use Euler’s numerical method with a step size h = 0.1, we get y(1) = 7.19. If we reduce the step size to 0.05 we get y(1) = 12.32. If we reduce the step size further to 0.01, we get y(1) = 90.69. That’s strange. Let’s switch over to a more accurate method, 4th order Runge-Kutta. With a step size of 0.1 the Runge-Kutta method gives 735.00, and if we use a step size of 0.01 we get a result larger than 10^15. What’s going on?
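
Here is a minimal sketch of the experiment, my own code rather than the original author’s; the exact values depend on floating-point and implementation details, and the finest Runge-Kutta run may simply overflow.

    def f(t, y):
        return t**2 + y*y     # y*y rather than y**2 so overflow gives inf instead of an exception

    def euler(f, t0, y0, t1, h):
        n = round((t1 - t0) / h)
        t, y = t0, y0
        for _ in range(n):
            y += h * f(t, y)
            t += h
        return y

    def rk4(f, t0, y0, t1, h):
        n = round((t1 - t0) / h)
        t, y = t0, y0
        for _ in range(n):
            k1 = f(t, y)
            k2 = f(t + h/2, y + h*k1/2)
            k3 = f(t + h/2, y + h*k2/2)
            k4 = f(t + h, y + h*k3)
            y += h * (k1 + 2*k2 + 2*k3 + k4) / 6
            t += h
        return y

    for h in (0.1, 0.05, 0.01):
        print(f"Euler, h={h}: y(1) ≈ {euler(f, 0.0, 1.0, 1.0, h):.2f}")
    for h in (0.1, 0.01):
        print(f"RK4,   h={h}: y(1) ≈ {rk4(f, 0.0, 1.0, 1.0, h):.2f}")

The point is not the particular numbers but that the computed “value” of y(1) keeps growing as the step size shrinks, which is the warning sign discussed below.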

The problem presupposes that a solution exists at t = 1 when in fact no solution exists. General theory (Picard’s theorem) tells us that a unique solution exists on some interval containing 0, but it does not tell us how far that interval extends. With a little work we can show that a solution exists for t at least as large as π/4. However, the solution becomes unbounded somewhere between π/4 and 1.
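
One way to do that little work, a sketch of a standard comparison argument rather than necessarily the one in the textbook mentioned below, is to note that for 0 ≤ t ≤ 1

$$ y^2 \;\le\; t^2 + y^2 \;\le\; 1 + y^2 . $$

The equations u′ = u² and v′ = 1 + v² with u(0) = v(0) = 1 have solutions u(t) = 1/(1 − t) and v(t) = tan(t + π/4), so a comparison theorem gives

$$ \frac{1}{1 - t} \;\le\; y(t) \;\le\; \tan\!\left(t + \frac{\pi}{4}\right) $$

wherever all three functions are defined. The upper bound is finite on [0, π/4), so the solution exists at least that far; the lower bound blows up at t = 1, so the solution cannot be continued to t = 1.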

When I first saw this example, my conclusion was that it showed how important theory is. If you just go about numerically computing solutions without knowing that a solution exists, you can think you have succeeded when you’re actually computing something that doesn’t exist. Prove existence and uniqueness before computing. Theory comes first.

Now I think the example shows the importance of the interplay between theory and numerical computation. It would be nice to know how big the solution interval is before computing anything, but that’s not always possible. Also, it’s not obvious from looking at the equation that there should be a problem at t = 1. The difficulties we had with numerical computation suggested there might be a theoretical problem.

I first saw this problem in an earlier edition of Boyce and DiPrima. The book goes on to approximate the interval over which the solution does exist using a combination of analytical and numerical methods. It looks like the solution becomes unbounded somewhere near t = 0.97.

I wouldn’t say that theory or computation necessarily come first. I’d say you iterate between them, starting with the approach that is more tractable. Theoretical results are  more satisfying when they’re available, but theory often doesn’t tell us as much as we’d like to know. Also, people make mistakes in theoretical computation just as they do in numerical computation. It’s best when theory and numerical work validate each other.

The problem does show the importance of being concerned with existence and uniqueness, but theoretical methods are not the only methods for exploring existence. Good numerical practice, i.e. trying more than one step size or more than one numerical method, is also valuable. In any case, the problem shows that without some diligence — either theoretical or numerical — you could naively compute an approximate “solution” where no solution exists.

Related: Consulting in differential equations