Suppose a new study comes out saying a drug or a food or a habit lowers your risk of some disease. What is the probability that the study’s result is correct? Obviously this is a very important question, but one that is not raised often enough.
I’ve referred to a paper by John Ioannidis (*) several times before, but I haven’t gone over the model he uses to support his claim that most study results are false. This post will look at some equations he derives for estimating the probability that a claimed positive result is correct.
First of all, let R be the ratio of true hypotheses to false hypotheses being investigated in a particular area. Of course we never know exactly what R is, but let’s pretend that somehow we knew that out of 1000 hypotheses being investigated in some area, 200 are correct. Then R would be 200/800 = 0.25. The value of R varies quite a bit, being relatively large in some fields of study and quite small in others. Imagine researchers pulling hypotheses to investigate from a hat. The probability of selecting a hypothesis that really is true would be R/(R + 1) and the probability of selecting a false hypothesis would be 1/(R + 1). In the example above, R/(R + 1) = 0.25/1.25 = 0.2, matching the 200 true hypotheses out of 1000.
Let α be the probability of incorrectly declaring a false hypothesis to be true. Studies are often designed with the goal that α be 0.05. Let β be the probability that a study would incorrectly conclude that a true hypothesis is false. In practice, β is far more variable than α. You might find study designs with β anywhere from 0.5 down to 0.01. The design choice β = 0.20 is common in some contexts.
There are two ways to publish a study claiming a new result: you could have selected a true hypothesis and correctly concluded that it was true, or you could have selected a false hypothesis but incorrectly concluded it was true. The former has probability
(1 − β)R/(R + 1)
and the latter has probability
α/(R + 1).
The total probability of concluding a hypothesis is true, correctly or incorrectly, is the sum of these probabilities, i.e.
((1 − β)R + α)/(R + 1).
The probability that a hypothesis is true given that you concluded it was true, the positive predictive value or PPV, is the ratio of
(1 − β)R/(R + 1)
to
((1 − β)R + α)/(R + 1).
In summary, under the assumptions above, the probability of a claimed result being true is
(1 − β)R/((1 − β)R + α).
If (1 − β)R < α then the model says that a claim is more likely to be false than true. This can happen if R is small, i.e. only a small proportion of the hypotheses under investigation are true, or if β is large, i.e. if studies are small. If R is smaller than α, most claims will be false no matter how small you make β, i.e. no matter how large the study. This says that in a challenging area, where few of the ideas being investigated lead to progress, there will be a large proportion of false results published, even if the individual researchers are honest and careful.
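To make the formula concrete, here is a minimal Python sketch. The function name ppv and the example values R = 0.25, α = 0.05, β = 0.20 are my own illustration, not numbers taken from the paper.

```python
def ppv(R, alpha, beta):
    """Probability a claimed positive result is true: (1 - beta) R / ((1 - beta) R + alpha)."""
    return (1 - beta) * R / ((1 - beta) * R + alpha)

# Illustrative values, not from the paper.
print(ppv(0.25, 0.05, 0.20))  # 0.80: most claims true when R = 0.25
print(ppv(0.02, 0.05, 0.20))  # about 0.24: most claims false when R is small
```

With β = 0.20 and α = 0.05, the break-even condition (1 − β)R = α is reached at R = 0.05/0.8 = 0.0625, so under this model a field with fewer than about one true hypothesis per sixteen false ones produces mostly false claims.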
Ioannidis develops two other models refining the model above. Suppose that because of bias, some proportion of results that would otherwise have been reported as negative are reported as positive. Call this proportion u. The derivation of the positive predictive value is similar to that in the previous model, but messier. The final result is
R(1 − β + uβ)/(R(1 − β + uβ) + α + u − αu).
If 1 − β > α, which is nearly always the case, then the probability of a reported result being correct decreases as bias increases.
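Continuing the sketch above, here is a hypothetical ppv_with_bias function implementing the bias formula; the particular values of u are again only for illustration.

```python
def ppv_with_bias(R, alpha, beta, u):
    """PPV when a proportion u of would-be negative results are reported as positive."""
    num = R * (1 - beta + u * beta)
    return num / (num + alpha + u - alpha * u)

# Illustrative values: R = 0.25, alpha = 0.05, beta = 0.20.
for u in (0.0, 0.1, 0.3):
    print(u, round(ppv_with_bias(0.25, 0.05, 0.20, u), 2))
# Prints 0.8, 0.59, 0.39: the PPV falls as bias grows.
```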
The final model considers the impact of multiple investigators testing the same hypothesis. If more people try to prove the same thing, it’s more likely that someone will get lucky and “prove” it, whether or not the thing to be proven is true. Leaving aside bias, if n investigators are testing each hypothesis, the probability that a positive claim is true is given by
R(1 − βⁿ)/(R + 1 − (1 − α)ⁿ − Rβⁿ).
As n increases, the probability of a positive claim being true decreases.
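A similar sketch for the multiple-investigator formula; the function name ppv_with_n_teams and the choice of n values are mine.

```python
def ppv_with_n_teams(R, alpha, beta, n):
    """PPV when n independent teams test the same hypothesis and any positive result gets claimed."""
    return R * (1 - beta**n) / (R + 1 - (1 - alpha)**n - R * beta**n)

# Illustrative values: R = 0.25, alpha = 0.05, beta = 0.20.
for n in (1, 5, 10):
    print(n, round(ppv_with_n_teams(0.25, 0.05, 0.20, n), 2))
# Prints 0.8, 0.52, 0.38: even with no bias, more teams means a lower PPV.
```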
The probability of a result being true is often much lower than is commonly believed. One reason is that hypothesis testing focuses on the probability of the data given a hypothesis rather than the probability of a hypothesis given the data. Calculating the probability of a hypothesis given data relies on prior probabilities, such as the factors R/(R + 1) and 1/(R + 1) above. These prior probabilities are elusive and controversial, but they are critical in evaluating how likely it is that claimed results are true.
Related: Adaptive clinical trial design
(*) John P. A. Ioannidis, Why most published research findings are false. CHANCE volume 18, number 4, 2005.
Hello John,
Could you explain how you concluded that “The probability of selecting a hypothesis that really is true would be R/(R+1)”? I was wondering if this result came from a binomial distribution, or even a hypergeometric one. However, in both cases I couldn’t figure out how to conclude this.
Thanks very much in advance.
Manoel: Let P be the number of positive hypotheses being investigated and let N be the number of negative hypotheses. R is defined as P/N, so P = NR. Then the probability of picking a positive hypothesis to investigate is P/(N + P) = NR/(NR + N) = R/(R + 1).
I’ve wondered why Ioannidis formulated his paper in terms of odds rather than probabilities. The latter might have been clearer. Calculating probabilities from odds isn’t hard, but it interrupts the flow of reading the paper.
I’m no high-powered mathematician, but this seems counter-intuitive to me. I wonder if it’s because of the difference between a de novo hypothesis and one that’s been subjected to experimental testing. I can see a minority of dreamed-up hypotheses being correct — I’ve had a lot of those myself — but once data are accumulated and the “disproven” hypotheses weeded out, that should improve the odds of the remaining hypotheses being correct. Otherwise, what’s the point of scientific investigation? A lot of the old synapses (emphasis on OLD) are degraded by time, but I must be missing something here.
It’s curious that these results aren’t more popularly known. It seems to me that statisticians, when they have something to gain in capitalism, don’t mind ignoring these results. And that these results are de-emphasized and kept on the back burner of society. Statisticians and scientists alike don’t mind maintaining the illusion and delusion for the rest of society, arrogantly believing their math and their methods are not wrong. It’s very typical. Whenever rational thought emerges that forces us to question our precepts, it disappears into the ether and its author is usually scorned and mocked into obscurity. (aliens built the pyramids, and other such things). Here is a good argument saying that, with all the certainty we try to build into our methods, it is still likely to be wrong… sort of makes moot all of our efforts thus far.