Linear regression and planet spacing

A while back I wrote about how planets are evenly spaced on a log scale. I made a bunch of plots, based on our solar system and the extrasolar systems with the most planets, and noted that they’re all roughly straight lines. Here’s the plot for our solar system, including dwarf planets, with distance on a logarithmic scale.

Distances in our solar system including planets and dwarf planets

This post is a quick follow up to that one. You can quantify how straight the lines are by using linear regression and comparing the actual spacing with the spacing given by the best straight line. Here I’m regressing the log of the distance of each planet from its star on the planet’s ordinal position.

NB: I am only using regression output as a measure of goodness of fit. I am not interpreting anything as a probability.

|-----------+-----------+----------+-----------|
| System    | Adjusted  | Slope    | Intercept |
|           | R-squared | p-value  | p-value   |
|-----------+-----------+----------+-----------|
| home      |    0.9943 | 1.29e-11 |  2.84e-08 |
| kepler90  |    0.9571 | 1.58e-05 |  1.29e-06 |
| hd10180   |    0.9655 | 1.41e-06 |  2.03e-07 |
| hr8832    |    0.9444 | 1.60e-04 |  5.57e-05 |
| trappist1 |    0.9932 | 8.30e-07 |  1.09e-09 |
| kepler11  |    0.9530 | 5.38e-04 |  2.00e-05 |
| hd40307   |    0.9691 | 2.30e-04 |  1.77e-05 |
| kepler20  |    0.9939 | 8.83e-06 |  3.36e-07 |
| hd34445   |    0.9679 | 2.50e-04 |  4.64e-04 |
|-----------+-----------+----------+-----------|
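For concreteness, here is a minimal sketch of the kind of fit behind a row of this table, assuming the NumPy and statsmodels packages and my own rounded distances (semimajor axes in AU) for the solar system, including the dwarf planets Ceres, Pluto, and Eris. The output should be close to the “home” row above, though not necessarily identical.

```python
# Regress log distance on ordinal position and report goodness-of-fit numbers.
import numpy as np
import statsmodels.api as sm

au = [0.39, 0.72, 1.0, 1.52, 2.77, 5.20, 9.58, 19.2, 30.1, 39.5, 67.8]
position = np.arange(1, len(au) + 1)

X = sm.add_constant(position)        # intercept + ordinal position
fit = sm.OLS(np.log(au), X).fit()    # regress log distance on position

print(fit.rsquared_adj)              # adjusted R-squared
print(fit.pvalues)                   # intercept and slope p-values
```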

R² is typically interpreted as how much of the variation in the data is explained by the model. In the table above, the smallest value of R² is 94%.

p-values are commonly, and wrongly, understood to be the probability of a model assumption being incorrect. As I said above, I’m completely avoiding any interpretation of p-values as the probability of anything, only noting that small values are consistent with a good fit.

Journals commonly, and wrongly, are willing to assume that anything with a p-value less than 0.05 is probably true. Some are saying the cutoff should be 0.005. There are problems with using any p-value cutoff, but I don’t want to get into that here. I’m only saying that small p-values are typically seen as evidence that a model fits, and the values above are orders of magnitude smaller than what journals consider acceptable evidence.

When I posted my article about planet spacing I got some heated feedback saying that this isn’t exact, that it’s unscientific, etc. I thought that was strange. I never said it was exact, only that it was a rough pattern. And although it’s not exact, it would be hard to find empirical studies of anything with such a good fit. If you held economics or psychology, for example, to the same standards of evidence, there wouldn’t be much left.

This pattern is known as the Titius-Bode law. I stumbled on it by making some plots. I assumed from the beginning that someone else must have done the same exercise and that the pattern had a name, but I didn’t know that name until later.

Someone sent me a paper that analyzes the data on extrasolar planets and Bode’s law, something much more sophisticated than the crude sketch above, but unfortunately I can’t find it this morning. I don’t recall what they did. Maybe they fit a hierarchical model where each system has its own slope and intercept.

One criticism has been that by regressing against planet order, you automatically get a monotone function. That’s true, but you do get a much better fit on a log scale than on a linear scale in any case. You might look at just the relative planet spacings without reference to order.

Proving life exists on Earth

NASA’s Galileo mission was primarily designed to explore Jupiter and its moons. In 1989, the Galileo probe started out traveling away from Jupiter in order to do a gravity assist swing around Venus. About a year later it also did a gravity assist maneuver around Earth. Carl Sagan suggested that when passing Earth, the Galileo probe should turn its sensors on Earth to look for signs of life. [1]

Artist conception of Galileo probe surveying Jupiter and its moons

Now obviously we know there’s life on Earth. But if we’re going to look for life on other planets, it’s reasonable to ask that our methods return positive results when examining the one planet we know for sure does host life. So scientists looked at the data from Galileo as if it were coming from another planet to see what patterns in the data might indicate life.

I’ve started using looking for life on Earth as a metaphor. I’m working on a project right now where I’m looking for a needle in a haystack, or rather another needle in a haystack: I knew that one needle existed before I got started. So I want to make sure that my search procedure at least finds the one positive result I already know exists. I explained to a colleague that we need to make sure we can at least find life on Earth.

This reminds me of simulation work. You make up a scenario and treat it as the state of nature. Then pretend you don’t know that state, and see how successful your method is at discovering that state. It’s sort of a schizophrenic way of thinking, pretending that half of your brain doesn’t know what the other half is doing.

It also reminds me of software testing. The most trivial tests can be surprisingly effective at finding bugs. So you write a few tests to confirm that there’s life on Earth.


[1] I found out about Galileo’s Earth reconnaissance listening to the latest episode of the .NET Rocks! podcast.

Planets evenly spaced on log scale

The previous post was about Kepler’s observation that the planets were spaced out around the sun the same way that nested regular solids would be. Kepler only knew of six planets, which was very convenient because there are only five regular solids. In fact, Kepler thought there could only be six planets because there are only five regular solids.

The distances to the planets form roughly a geometric series. Ratios of consecutive distances are between 1.3 and 3.4. That means the distances should be fairly evenly spaced on a logarithmic scale. The plot below shows that this is the case.

Distance to planets in our solar system on log scale

The plot goes beyond the six planets known in Kepler’s day and adds four more: Uranus, Neptune, Pluto, and Eris. Here I’m counting the two largest Kuiper belt objects as planets. Distances are measured in astronomical units, i.e. Earth = 1.
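As a quick check of the ratio claim above, here is a short sketch using approximate semimajor axes in AU (my rounded values for the ten objects plotted, Mercury through Eris).

```python
# Consecutive distance ratios for the ten objects in the plot above.
au = [0.39, 0.72, 1.0, 1.52, 5.20, 9.58, 19.2, 30.1, 39.5, 67.8]
ratios = [b / a for a, b in zip(au, au[1:])]
print([round(r, 2) for r in ratios])   # all between roughly 1.3 and 3.4
```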

Update: The fit is even better if we include Ceres, the largest object in the asteroid belt. It’s now called a dwarf planet, but it was considered a planet for the first fifty years after it was discovered.

Distances in our solar system including planets and dwarf planets

Extrasolar planets

Is this just true of our solar system, or is it true of other planetary systems as well? At the time of writing, we know of one planetary system outside our own that has 8 planets. Three other systems have around 7 planets; I say “around” because it depends on whether you include unconfirmed planets. The spacing between planets in these other systems is also fairly even on a logarithmic scale. Data on planet distances is taken from each system’s Wikipedia page. Distances are semimajor axes, not average distances.

Kepler-90

Kepler-90 is the only planetary system outside our own with eight confirmed planets that we know of.

Distances to planets in Kepler-90 system

HD 10180

HD 10180 has seven confirmed planets and two unconfirmed planets. The unconfirmed planets are included below, the 3rd and 6th objects.

Distances to planets in HD 10180 system

HR 8832

The next largest system is HR 8832 with five confirmed planets and two unconfirmed, numbers 5 and 6 below. It would work out well for this post if the 6th object were found to be a little closer to its star.

Distances to planets in HR 8832 system

TRAPPIST-1

TRAPPIST-1 is interesting because the planets are very close to their star, ranging from 0.01 to 0.06 AU. Also, this system follows an even logarithmic spacing more closely than the others.

Distances to planets in TRAPPIST-1 system

Systems with six planets

There are currently four known systems with six planets: Kepler-11, Kepler-20, HD 40307, and HD 34445. They also appear to have planets evenly spaced on a log scale.

Distances between planets in known systems with six planets

Like TRAPPIST-1, Kepler-20 has planets closer in and more evenly spaced (on a log scale).


Mercury and the bandwagon effect

Mercury

The study of the planet Mercury provides two examples of the bandwagon effect. In her new book Worlds Fantastic, Worlds Familiar, planetary astronomer Bonnie Buratti writes

The study of Mercury … illustrates one of the most confounding bugaboos of the scientific method: the bandwagon effect. Scientists are only human, and they impose their own prejudices and foregone conclusions on their experiments.

Around 1800, Johann Schroeter determined that Mercury had a rotational period of 24 hours. This view held for eight decades.

In the 1880s, Giovanni Schiaparelli determined that Mercury was tidally locked, making one rotation on its axis for every orbit around the sun. This view also held for eight decades.

In 1965, radar measurements of Mercury showed that Mercury completes 3 rotations in every 2 orbits around the sun.

Studying Mercury is difficult since it is only visible near the horizon and around sunrise and sunset, i.e. when the sun’s light interferes. And it is understandable that someone would confuse a 3:2 resonance with tidal locking. Still, for two periods of eight decades each, astronomers looked at Mercury and concluded what they expected.

The difficulty of seeing Mercury objectively was compounded by two incorrect but satisfying metaphors. First that Mercury was like Earth, rotating every 24 hours, then that Mercury was like the moon, orbiting the sun the same way the moon orbits Earth.

Buratti mentions the famous Millikan oil drop experiment as another example of the bandwagon effect.

… Millikan’s value for the electron’s charge was slightly in error—he had used a wrong value for the viscosity of air. But future experimenters all seemed to get Millikan’s number. Having done the experiment myself I can see that they just picked those values that agreed with previous results.

Buratti explains that Millikan’s experiment is hard to do and “it is impossible to successfully do it without abandoning most data.” This is what I like to call acceptance-rejection modeling.

The name comes from the acceptance-rejection method of random number generation. For example, the obvious way to generate truncated normal random values is to generate (unrestricted) normal random values and simply throw out the ones that lie outside the interval we’d like to truncate to. This is inefficient if we’re truncating to a small interval, but it always works. We’re conforming our samples to a pre-determined distribution, which is OK when we do it intentionally. The problem comes when we do it unintentionally.
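Here is a minimal sketch of that acceptance-rejection approach, assuming NumPy; the function name and interface are mine. If the interval is narrow, most proposals get rejected, which is the inefficiency mentioned above.

```python
# Draw unrestricted standard normal samples and keep only those
# that land in the interval [a, b].
import numpy as np

def truncated_normal(a, b, n, rng=None):
    rng = rng or np.random.default_rng()
    kept = []
    while len(kept) < n:
        x = rng.standard_normal(n)            # propose unrestricted normals
        kept.extend(x[(x >= a) & (x <= b)])   # accept those inside [a, b]
    return np.array(kept[:n])

print(truncated_normal(-1, 1, 5))
```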

Photo of Mercury above via NASA

Freudian hypothesis testing


In his paper Mindless statistics, Gerd Gigerenzer uses a Freudian analogy to describe the mental conflict researchers experience over statistical hypothesis testing. He says that the “statistical ritual” of NHST (null hypothesis significance testing) “is a form of conflict resolution, like compulsive hand washing.”

In Gigerenzer’s analogy, the id represents Bayesian analysis. Deep down, a researcher wants to know the probabilities of hypotheses being true. This is something that Bayesian statistics makes possible, but more conventional frequentist statistics does not.

The ego represents R. A. Fisher’s significance testing: specify a null hypothesis only, not an alternative, and report a p-value. Significance is calculated after collecting the data. This makes it easy to publish papers. The researcher never clearly states his hypothesis, and yet takes credit for having established it after rejecting the null. This leads to feelings of guilt and shame.

The superego represents the Neyman-Pearson version of hypothesis testing: pre-specified alternative hypotheses, power and sample size calculations, etc. Neyman and Pearson insist that hypothesis testing is about what to do, not what to believe. [1]

 


 

I assume Gigerenzer doesn’t take this analogy too seriously. In context, it’s a humorous interlude in his polemic against rote statistical ritual.

But there really is a conflict in hypothesis testing. Researchers naturally think in Bayesian terms, and interpret frequentist results as if they were Bayesian. They really do want probabilities associated with hypotheses, and will imagine they have them even though frequentist theory explicitly forbids this. The rest of the analogy, comparing the ego and superego to Fisher and Neyman-Pearson respectively, seems weaker to me. But I suppose you could imagine Neyman and Pearson playing the role of your conscience, making you feel guilty about the pragmatic but unprincipled use of p-values.

* * *

[1] “No test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis. But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern behaviour in regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong.”

Neyman J, Pearson E. On the problem of the most efficient tests of statistical hypotheses. Philos Trans Roy Soc A, 1933;231:289–337.

The intersection of genomes is empty

From this story in Quanta Magazine:

In fact, there’s no single set of genes that all living things need in order to exist. When scientists first began searching for such a thing 20 years ago, they hoped that simply comparing the genome sequences from a bunch of different species would reveal an essential core shared by all species. But as the number of genome sequences blossomed, that essential core disappeared. In 2010, David Ussery, a biologist at Oak Ridge National Laboratory in Tennessee, and his collaborators compared 1,000 genomes. They found that not a single gene is shared across all of life.

Continuum between anecdote and data

The difference between anecdotal evidence and data is overstated. People often have in mind this dividing line where observations on one side are worthless and observations on the other side are trustworthy. But there’s no such dividing line. Observations are data, but some observations are more valuable than others, and there’s a continuum of value.


I believe rib eye steaks are better for you than rat poison. My basis for that belief is anecdotal evidence. People who have eaten rib eye steaks have fared better than people who have eaten rat poison. I don’t have exact numbers on that, but I’m pretty sure it’s true. I have more confidence in that than in any clinical trial conclusion.

Hearsay evidence about food isn’t very valuable, per observation, but since millions of people have eaten steak for thousands of years, the cumulative weight of evidence is pretty good that steak is harmless if not good for you. The number of people who have eaten rat poison is much smaller, but given the large effect size, there’s ample reason to suspect that eating rat poison is a bad idea.

Now suppose you want to get more specific and determine whether rib eye steaks are good for you in particular. (I wouldn’t suggest trying rat poison.) Suppose you’ve noticed that you feel better after eating a steak. Is that an anecdote or data? What if you look back through your diary and notice that every mention of eating steak lately has been followed by some remark about feeling better than usual? Is that data? What if you decide to flip a coin each day for the next month and eat steak if the coin comes up heads and tofu otherwise? Each of these steps is an improvement, but there’s no magical line you cross between anecdote and data.

Suppose you’re destructively testing the strength of concrete samples. There are better and worse ways to conduct such experiments, but each sample gives you valuable data. If you test 10 samples and they all withstand two tons of force per square inch, you have good reason to believe the concrete the samples were taken from can withstand such force. But if you test a drug on 10 patients, you can’t have the same confidence that the drug is effective. Human subjects are more complicated than concrete samples. Concrete samples aren’t subject to placebo effects. Also, cause and effect are more clear for concrete. If you apply a load and the sample breaks, you can assume the load caused the failure. If you treat a human for a disease and they recover, you can’t be as sure that the treatment caused the recovery. That doesn’t mean medical observations aren’t data.

Carefully collected observations in one area may be less statistically valuable than anecdotal observations in another. Observations are never ideal. There’s always some degree of bias, effects that can’t be controlled, etc. There’s no quantum leap between useless anecdotes and perfectly informative data. Some data are easy to draw inference from, but data that’s harder to understand doesn’t fail to be data.


Skin in the game for observational studies

The article Deming, data and observational studies by S. Stanley Young and Alan Karr opens with

Any claim coming from an observational study is most likely to be wrong.

They back up this assertion with data about observational studies later contradicted by prospective studies.

Much has been said lately about the assertion that most published results are false, particularly observational studies in medicine, and I won’t rehash that discussion here. Instead I want to cut to the process Young and Karr propose for improving the quality of observational studies. They summarize their proposal as follows.

The main technical idea is to split the data into two data sets, a modelling data set and a holdout data set. The main operational idea is to require the journal to accept or reject the paper based on an analysis of the modelling data set without knowing the results of applying the methods used for the modelling set on the holdout set and to publish an addendum to the paper giving the results of the analysis of the holdout set.

They then describe an eight-step process in detail. One step is that the data cleaning and the split into modelling and holdout sets would be done by people other than those doing the modelling and analysis. They go on to explain why this would lead to more truthful publications.

The holdout set is the key. Both the author and the journal know there is a sword of Damocles over their heads. Both stand to be embarrassed if the holdout set does not support the original claims of the author.
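The mechanics of the split itself are simple. Here is a minimal sketch, assuming the study data lives in a pandas DataFrame; the DataFrame, the function name, and the 50/50 split are my illustration, not part of Young and Karr’s proposal.

```python
# Randomly partition rows into a modelling set and a holdout set.
import pandas as pd

def split_modelling_holdout(df: pd.DataFrame, frac: float = 0.5, seed: int = 0):
    modelling = df.sample(frac=frac, random_state=seed)  # modelling set
    holdout = df.drop(modelling.index)                   # held back until after acceptance
    return modelling, holdout
```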

* * *

The full title of the article is Deming, data and observational studies: A process out of control and needing fixing. It appeared in the September 2011 issue of Significance.

Update: The article can be found here.

How did our ancestors sleep?

Electric lighting has changed the way we sleep, encouraging us to lose sleep by staying awake much longer after dark than we otherwise would.

Or maybe not. A new study of three contemporary hunter-gatherer tribes found that they stay awake long after dark and sleep an average of 6.5 hours a night. They also don’t nap much [1]. This suggests the way we sleep may not be that different from our ancient forebears.

Historian A. Roger Ekirch suggested that before electric lighting it was common to sleep in two four-hour segments with an hour or so of wakefulness in between. His theory was based primarily on medieval English texts that refer to “first sleep” and “second sleep” and has other literary support as well. A small study found that subjects settled into the sleep pattern Ekirch predicted when they were in a dark room for 14 hours each night for a month. But the hunter-gatherers don’t sleep this way.

Maybe latitude is an important factor. The hunter-gatherers mentioned above live between 2 and 20 degrees south of the equator whereas England is 52 degrees north of the equator. Maybe two-phase sleep was more common at high latitudes with long winter nights. Of course there are many differences between modern/ancient [2] hunter-gatherers and medieval Western Europeans besides latitude.

Two studies have found two patterns of how people sleep without electric lights. Maybe electric lights don’t have as much impact on how people sleep as other factors.

Related post: Paleolithic nonsense

* * *

[1] The study participants were given something like a Fitbit to wear. The article said that naps less than 15 minutes would be below the resolution of the monitors, so we don’t know how often the participants took cat naps. We only know that they rarely took longer naps.

[2] There is an implicit assumption that the contemporary hunter-gatherers live and, in particular, sleep like their ancient ancestors. This seems reasonable, though we can’t be certain. There is also the bigger assumption that the tribesmen represent not only their ancestors but all paleolithic humans. Maybe they do, and we don’t have much else to go on, but we don’t know. I suspect there was more diversity in the paleolithic era than we assume.

The name we give to bright ideas

From The Book of Strange New Things:

… I said that if science could come up with something like the Jump it could surely solve a problem like that. Severin seized hold of that word, “science.” Science, he said, is not some mysterious larger-than-life force, it’s just the name we give to bright ideas that individual guys have when they’re lying in bed at night, and that if the fuel thing bothered me so much, there was nothing stopping me from having a bright idea to solve it …

Subway map of the solar system

This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out.

It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the thumbnail above.

New development in cancer research scandal

My interest in the Anil Potti scandal started when my former colleagues could not reproduce the analysis in one of Potti’s papers. (Actually, they did reproduce the analysis, at great effort, in the sense of forensically determining the erroneous steps that were carried out.) Two years ago, the story was on 60 Minutes. The straw that broke the camel’s back was not bad science but résumé padding.

It looks like the story is a matter of fraud rather than sloppiness. This is unfortunate because sloppiness is much more pervasive than fraud, and this could have made a great case study of bad analysis. However, one could look at it as a case study in how good analysis (by the folks at MD Anderson) can uncover fraud.

Now there’s a new development in the Potti saga. The latest issue of The Cancer Letter contains letters by whistle-blower Bradford Perez who warned officials at Duke about problems with Potti’s research.

Go anywhere in the universe in two years

Here’s a totally impractical but fun back-of-the-envelope calculation from Bob Martin.

Suppose you have a space ship that could accelerate at 1 g for as long as you like. Inside the ship you would feel the same gravity as on earth. You could travel wherever you like by accelerating at 1 g for the first half of the flight then reversing acceleration for the second half of the flight. This approach could take you to Mars in three days.
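Here is a quick check of that Mars figure, assuming Mars is roughly 1 AU away at the time of the trip (the actual distance varies between about 0.5 and 2.5 AU).

```python
# Classical flip-and-burn travel time: accelerate at 1 g for half the
# distance, decelerate for the other half, so t = 2 * sqrt(d / a).
import math

g = 9.80665      # m/s^2
au = 1.496e11    # meters
d = 1.0 * au     # assumed Earth-Mars distance for this trip

t = 2 * math.sqrt(d / g)
print(t / 86400) # about 2.9 days
```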

If you could accelerate at 1 g for a year you could reach the speed of light, and travel half a light year. So you could reverse your acceleration and reach a destination a light year away in two years. But this ignores relativity. Once you’re traveling at near the speed of light, time practically stops for you, so you could keep going as far as you like without taking any more time from your perspective. So you could travel anywhere in the universe in two years!

Of course there are a few problems. We have no way to sustain such acceleration. Or to build a ship that could sustain an impact with a speck of dust when traveling at relativistic speed. And the calculation ignores relativity until it throws it in at the end. Still, it’s fun to think about.

Update: Dan Piponi gives a calculation on G+ that addresses the last of the problems I mentioned above, sticking relativity on to the end of a classical calculation. He does a proper relativistic calculation from the beginning.

If you take the radius of the observable universe to be 45 billion light years, then I think you need about 12.5 g to get anywhere in it in 2 years. (Both those quantities as measured in the frame of reference of the traveler.)

If you travel at constant acceleration a for time t then the distance covered is (c^2/a)(cosh(a t/c) – 1). (Note that this gives the usual a t^2/2 for small t.)
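Here is a quick numerical check of that formula, assuming SI units; the constants and names below are mine.

```python
# d(tau) = (c^2/a) * (cosh(a*tau/c) - 1): coordinate distance covered after
# proper time tau at constant proper acceleration a.
import math

c = 299_792_458.0           # speed of light, m/s
g = 9.80665                 # 1 g, m/s^2
year = 365.25 * 24 * 3600   # seconds
ly = c * year               # one light year in meters

def distance(a, tau):
    return (c**2 / a) * (math.cosh(a * tau / c) - 1)

# For small tau this reduces to the classical a*tau^2 / 2:
print(distance(g, 3600), g * 3600**2 / 2)   # nearly equal after one hour

# Coordinate distance after one year of proper time at 1 g:
print(distance(g, year) / ly)               # roughly 0.56 light years
```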

Timid medical research

Cancer research is sometimes criticized for being timid. Drug companies run enormous trials looking for small improvements. Critics say they should run smaller trials and more of them.

Which side is correct depends on what’s out there waiting to be discovered, which of course we don’t know. We can only guess. Timid research is rational if you believe that only marginal improvements are likely to be discovered.

Sample size increases quickly as the size of the effect you’re trying to find decreases. To establish small differences in effect, you need very large trials.
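For a rough sense of the scaling, here is a sketch using the usual two-sample normal approximation (SciPy assumed; the 80% power and 5% significance level are my choices, not from the text): halving the detectable effect roughly quadruples the required sample size.

```python
# Approximate per-arm sample size to detect a mean difference delta between
# two groups with common standard deviation sigma:
#     n ~= 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2
from scipy.stats import norm

def per_arm_sample_size(delta, sigma=1.0, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sigma / delta) ** 2

for delta in [0.5, 0.2, 0.1, 0.05]:
    print(delta, round(per_arm_sample_size(delta)))   # n grows like 1/delta^2
```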

If you think there are only small improvements on the status quo available to explore, you’ll explore each of the possibilities very carefully. On the other hand, if you think there’s a miracle drug in the pipeline waiting to be discovered, you’ll be willing to risk falsely rejecting small improvements along the way in order to get to the big improvement.

Suppose there are 500 drugs waiting to be tested. All of these are only 10% effective except for one that is 100% effective. You could quickly find the winner by giving each candidate to one patient. For every drug whose patient responded, repeat the process until only one drug is left. One strike and you’re out. You’re likely to find the winner in three rounds, treating fewer than 600 patients. But if all the drugs are 10% effective except one that’s 11% effective,  you’d need hundreds of trials with thousands of patients each.
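Here is a small simulation of that screening scheme, assuming one drug with a 100% response rate among 499 drugs with a 10% response rate, as in the setup above; the code and names are mine.

```python
# "One strike and you're out": in each round, give every surviving drug to
# one patient and drop any drug whose patient does not respond.
import random

def screen(n_drugs=500, p_weak=0.10, seed=None):
    rng = random.Random(seed)
    rates = [p_weak] * (n_drugs - 1) + [1.0]    # last drug is 100% effective
    alive = list(range(n_drugs))
    rounds = patients = 0
    while len(alive) > 1:
        rounds += 1
        patients += len(alive)                  # one patient per surviving drug
        alive = [d for d in alive if rng.random() < rates[d]]
    return rounds, patients

print(screen(seed=42))   # typically about 3 rounds and fewer than 600 patients
```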

The best research strategy depends on what you believe is out there to be found. People who know nothing about cancer often believe we could find a cure soon if we just spend a little more money on research. Experts are more measured, except when they’re asking for money.