Most published research results are false

John Ioannidis wrote an article in Chance magazine a couple of years ago with the provocative title Why Most Published Research Findings Are False.  [Update: Here's a link to the PLoS article reprinted by Chance. And here are some notes on the details of the paper.] Are published results really that bad? If so, what’s going wrong?

Whether “most” published results are false depends on context, but a large percentage of published results are indeed false. Ioannidis published a report in JAMA examining some of the most highly cited studies from the most prestigious journals. Of the studies he considered, 32% were found to have either incorrect or exaggerated results. Of the studies whose results were just barely significant, with p-values right around 0.05, 74% were incorrect.

The underlying causes of the high false-positive rate are subtle, but one problem is the pervasive use of p-values as measures of evidence.

Folklore has it that a “p-value” is the probability that a study’s conclusion is wrong, and so a 0.05 p-value would mean the researcher should be 95 percent sure that the results are correct. In this case, folklore is absolutely wrong. And yet most journals accept a p-value of 0.05 or smaller as sufficient evidence.

Here’s an example that shows how p-values can be misleading. Suppose you have 1,000 totally ineffective drugs to test. About 1 out of every 20 trials will produce a p-value of 0.05 or smaller by chance, so about 50 of the 1,000 trials will have a “significant” result, and only those studies will publish their results. The error rate in the lab is indeed 5%, but the error rate in the literature coming out of the lab is 100%!
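
Here’s a quick simulation of that scenario (a rough sketch, not part of the original argument; the normal outcome data, the two-sample t-test, and the 50 patients per arm are just illustrative choices):

    # 1,000 drugs with no effect at all, each tested in a small two-arm trial.
    # By chance roughly 5% of the trials come out "significant" at the 0.05
    # level, and if only those get written up, every published result is false.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_drugs, n_patients = 1000, 50

    published = 0
    for _ in range(n_drugs):
        treatment = rng.normal(0, 1, n_patients)  # the drug has zero true effect
        control = rng.normal(0, 1, n_patients)
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:
            published += 1  # only "significant" trials make it into print

    print(f"'Significant' trials: {published} of {n_drugs}")  # about 50
    print("Fraction of those published findings that are false: 100%")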

The example above is exaggerated, but look at the JAMA study results again. In a sample of real medical experiments, 32% of those with “significant” results were wrong. And among those that just barely showed significance, 74% were wrong.
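
For a sense of how numbers like these can arise even when some of the drugs do work, here is a back-of-the-envelope calculation (the base rate and power below are made-up illustrative numbers, not figures from the JAMA paper):

    # Fraction of "significant" findings that are false, given the share of
    # tested hypotheses that are actually true, the studies' power, and the
    # significance threshold.
    def false_finding_rate(prior_true, power, alpha=0.05):
        true_positives = prior_true * power
        false_positives = (1 - prior_true) * alpha
        return false_positives / (true_positives + false_positives)

    # If only 10% of tested drugs work and the studies have 50% power,
    # nearly half of the "significant" findings are wrong.
    print(false_finding_rate(prior_true=0.10, power=0.5))  # about 0.47

Lower base rates or lower power push that fraction higher still, which is one way a figure like 74% can come about.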

See Jim Berger’s criticisms of p-values for more technical depth.

Posted in Clinical trials, Science, Statistics
10 comments on “Most published research results are false”
  1. Mauro says:

    Hi,

    Many times I see research saying that “something” was tested on thousands of people, which, compared to the world population, is less than 0.001%… so how can I believe that the investigation is correct?

    Best regards,
    Mauro

  2. John says:

    Basing a conclusion on a very small subset of the world population may be legitimate. It all depends on whether the sample is representative. One of the surprising results from statistics is that the quality of an inference depends only on the size of the sample, not on the size of the population the sample was drawn from. (Assuming the population is so large that you can safely ignore the difference between sampling with and without replacement, which is true of the world population.)
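
    Here’s a minimal sketch of that point (the 50% proportion, the sample of 1,000, and the population sizes below are made-up numbers): the standard error of an estimated proportion hardly changes as the population grows, once the population is much larger than the sample.

        # Standard error of a sample proportion, with the finite-population
        # correction, for the same sample drawn from ever larger populations.
        import math

        def standard_error(p, n, N):
            fpc = math.sqrt((N - n) / (N - 1))
            return math.sqrt(p * (1 - p) / n) * fpc

        n = 1000  # people actually sampled
        for N in (100_000, 10_000_000, 7_000_000_000):
            print(N, round(standard_error(0.5, n, N), 5))
        # Prints roughly 0.01573, 0.01581, and 0.01581: essentially the same
        # precision whether the population is a hundred thousand or seven billion.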

  3. Laurent Bossavit says:

    See also Cosma Shalizi’s post, “The Neutral Model of Inquiry”: http://cscs.umich.edu/~crshalizi/weblog/698.html

  4. Adam Keck says:

    Filtering publishable results by desirable p-value (last paragraph) sounds like a classic case of Survivor Bias. Correct?

  5. John says:

    You could say there’s a survival bias: only those experiments that meet some statistical requirement are published. But it’s more subtle than that. You’ve got to have some procedure for deciding which results are correct. I would argue that p-values are not the right filter, or that at a minimum p-values are incorrectly interpreted.

  6. Jim Moult says:

    It seems relevant to note that, though a .05 p-value indicates a 1 in 20 chance that the null hypothesis should not have been rejected, in cases where one is testing whether treatment A performs better than treatment B, the cases reported as “treatment A was better, with p-value < .05” include a high proportion of cases in which A was in fact at least as good as B. Put another way, I am unlikely to be making a truly bad decision by accepting and acting on A in preference to B, even if I am mistaken that it is truly better. To put this in context, some of us are primarily concerned with making a choice, and not primarily with the certainty that our choice is superior. Our certainty is not unimportant; it is just of less importance. I find the discussion, and the observation itself, fascinating.

  7. Woody says:

    But proving it is wrong can be wrong as well.

  8. I. J. Kennedy says:

    Here’s a fresh report that says the situation is bad, but not as bad as all that.
    http://www.technologyreview.com/view/510126/the-statistical-puzzle-over-how-much-biomedical-research-is-wrong/

  9. John says:

    The more optimistic report seems to be based only on a theoretical model, while the more pessimistic result has empirical support, which the article discusses at the bottom.

10 Pings/Trackbacks for "Most published research results are false"
  1. [...] John Ioannidis: small p-values do not imply a small probability that the null hypothesis is incorrect. A review of medical studies found that 74% of the studies with p-values smaller than 0.05 reached erroneous conclusions. [...]

  2. [...] in the PLoS article as well as in this post on the blog “The Endeavour”, an attempt is made to find the culprit behind this phenomenon. All the evidence points to the indolent [...]

  3. [...] posts: Most published research results are false Canonical examples from robust statistics ? [...]

  4. [...] This post was mentioned on Twitter by Magnus Lie Hetland, topkara, HN Firehose, Renan Birck Pinheiro, Fumihiro CHIBA and others. Fumihiro CHIBA said: memo: Most published research results are false — The Endeavour http://bit.ly/dSkd3h [...]

  5. [...] Reporting p-values Posted on February 14, 2011 by somikraha In a previous article, I have already tackled the fundamental notions of frequentist statistics as violating the desiderata of science. Be that as it may, the stupendous extent of this approach’s contradiction with its own norms merits some attention. This is Ioannidis’ main argument, as also John Cook (see Cook’s blog post). [...]

  6. [...] and Lehrer provide an explanation for this failure, which is summarized in John Cook’s blog post: Suppose you have 1,000 totally ineffective drugs to test. About 1 out of every 20 trials will [...]

  7. [...] into thinking all of this sounds rational and sophisticated. An example by applied mathematician John D. Cook illustrates the fundamental problem. By the 95% logic, out of 1000 pre-publication studies of a [...]

  8. [...] The cult of the p-value (or why a large share of research papers are wrong) [...]

  9. [...] Most published research results are false Classical statistics in a nutshell [...]

  10. [...] such as biology and epidemiology — soft compared to physics, but hard compared to sociology — often get things wrong. In softer sciences, research results might be not even [...]