Motivating example: planet spacing
My previous post showed that planets are roughly evenly distributed on a log scale, not just in our solar system but also in extrasolar planetary systems. I hadn’t seen this before I stumbled on it by making some plots.
I didn’t think it was an original discovery—I assume someone did this exercise immediately when systems with several planets were discovered—but I didn’t know what this observation was called. I now know it’s known as the Titius-Bode law, a generalization of an observation about our solar system by Messrs. Titius and Bode a couple centuries ago. See, for example, .
Several people were skeptical of the claim that planets are distributed according to a power law and pointed out that uniformly distributed points can look fairly evenly distributed on a logarithmic scale. Which is true, and gets to the topic I want to discuss in this post. Planets are not spaced like uniform random samples (see ) and yet it reasonable, at first glance, to ask whether they are.
If you’re expecting a power law, and you’re given uniformly distributed data, it doesn’t look too surprising. On the other hand, if you’re expecting uniformly distributed data and you see data distributed according to a power law, you are surprised. I’ll formalize this below.
If you’ve ever tried to make a scaled model of our solar system, you were probably surprised that the planets are far from uniformly spaced. A scaled model of our solar system, say at a museum, is likely to position a few of the inner planets to scale, and then use text to explain where the outer planets should be. For example, there may be a footnote saying “And if everything were to scale, Pluto would be behind the Exxon station at the end of the street.” This is an example of implicitly expected a uniform distribution and receiving data distributed according to a power law.
Some people suspected that I was doing the opposite. By plotting distances on a log scale, I’m implicitly expected a power law distribution. Maybe the data were roughly uniform, but I fooled myself into seeing a power law.
The Kullback-Leibler divergence from Y to X, written KL(X || Y), is the average surprise of seeing Y when you expected X. That’s one of the interpretations. See this post for more interpretations.
In general, Kullback-Leibler divergence is not symmetric. The divergence from X to Y typically does not equal the divergence from Y to X. The discussion above claims that the surprise from seeing power law data when expecting a uniform distribution is greater than the surprise from seeing uniform data when expected a power law distribution. We show below that this is true.
Let X be random variable uniformly distributed on [0, 1] and let Y be a random variable with distribution proportional to xα on the same interval. (The proportionality constant necessary to make the probability integrate to 1 is α + 1.) We will show that KL(X || Y) is greater than KL(Y || X).
First we calculate the two divergences.
And here is a plot comparing the two results as a function of the exponent α.
 Timothy Bovaird, Charles H. Lineweaver; Exoplanet predictions based on the generalized Titius–Bode relation, Monthly Notices of the Royal Astronomical Society, Volume 435, Issue 2, 21 October 2013, Pages 1126–1138, https://doi.org/10.1093/mnras/stt1357