John Ioannidis wrote an article in Chance magazine a couple of years ago with the provocative title Why Most Published Research Findings Are False. Are published results really that bad? If so, what’s going wrong?
Whether “most” published results are false depends on context, but a large percentage of published results are indeed false. Ioannidis published a report in JAMA looking at some of the most highly-cited studies from the most prestigious journals. Of the studies he considered, 32% were found to have either incorrect or exaggerated results. Of those studies with a 0.05 p-value, 74% were incorrect.
The underlying causes of the high false-positive rate are subtle, but one problem is the pervasive use of p-values as measures of evidence.
Folklore has it that a “p-value” is the probability that a study’s conclusion is wrong, and so a 0.05 p-value would mean the researcher should be 95 percent sure that the results are correct. In this case, folklore is absolutely wrong. And yet most journals accept a p-value of 0.05 or smaller as sufficient evidence.
Here’s an example that shows how p-values can be misleading. Suppose you have 1,000 totally ineffective drugs to test. About 1 out of every 20 trials will produce a p-value of 0.05 or smaller by chance, so about 50 of the 1,000 trials will have a “significant” result, and only those trials will be published. The error rate in the lab was indeed 5%, but the error rate in the literature coming out of the lab is 100%!
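The thought experiment with 1,000 ineffective drugs can be sketched as a small simulation. This is only an illustration under assumptions that are not in the original example: each trial compares 50 treated and 50 control patients, outcomes are standard normal with known variance, and significance is judged with a two-sided z-test. The function name and parameters are hypothetical.

```python
import random
from statistics import NormalDist, mean


def simulate_null_trials(n_drugs=1000, n_patients=50, alpha=0.05, seed=0):
    """Count 'significant' trials when every drug is truly ineffective."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    significant = 0
    for _ in range(n_drugs):
        # Both arms come from the same distribution: the drug does nothing.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        # Two-sided z-test for a difference in means (unit variance assumed,
        # so the difference of the two means has variance 2 / n_patients).
        z = (mean(treated) - mean(control)) / (2.0 / n_patients) ** 0.5
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            significant += 1
    return significant


n_drugs = 1000
published = simulate_null_trials(n_drugs=n_drugs)
# Only the "significant" trials get published, and every one of them is a
# false positive because no drug in this simulation has any effect.
print(f"{published} of {n_drugs} trials reached p <= 0.05")
print(f"false findings among published trials: {published} of {published}")
```

The exact count varies with the seed, but it should land near 50, and every one of those published "findings" is false by construction.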
The example above is exaggerated, but look at the JAMA study results again. In a sample of real medical experiments, 32% of those with “significant” results were incorrect or exaggerated. And among those that just barely showed significance, 74% were wrong.
See Jim Berger’s criticisms of p-values for more technical depth.

Mauro 10.30.08 at 15:13
Hi,
Many times I see research saying that “something” was tested on thousands of people, which, compared to the world population, is less than 0.001%. So how can I believe that the investigation is correct?
Best regards,
Mauro
John 10.30.08 at 16:02
Basing a conclusion on a very small subset of the world population may be legitimate. It all depends on whether the sample is representative. One of the surprising results from statistics is that the quality of an inference depends only on the size of the sample, not on the size of the population the sample was drawn from. (Assuming the population is so large that you can safely ignore the difference between sampling with and without replacement, which is true of the world population.)
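A rough way to see this is to compute the standard error of a sample proportion with the usual finite population correction. The sketch below assumes simple random sampling without replacement; the function name, the sample size of 1,000, and the population figures (including a round 8 billion standing in for the world population) are illustrative choices, not values from the post.

```python
import math


def proportion_standard_error(p, n, population=None):
    """Standard error of a sample proportion.

    If population is given, apply the finite population correction for
    simple random sampling without replacement.
    """
    se = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se


n = 1000  # illustrative sample size
for population in (100_000, 10_000_000, 8_000_000_000):
    se = proportion_standard_error(0.5, n, population)
    print(f"population {population:>13,}: standard error {se:.6f}")
print(f"infinite population:      standard error {proportion_standard_error(0.5, n):.6f}")
```

For a fixed sample size the standard error barely moves as the population grows, which is why the correction can be ignored for something as large as the world population.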