John Ioannidis stirred up a healthy debate when he published “Why Most Published Research Findings Are False.” Unfortunately, most of the discussion has been over whether the word “most” is correct, i.e. whether the proportion of false results is more or less than 50 percent. At least there is more awareness that some published results are false and that it would be good to have some estimate of the proportion.
However, a more fundamental point has been lost. At the core of Ioannidis’ paper is the assertion that the proportion of true hypotheses under investigation matters. In terms of Bayes’ theorem, the posterior probability of a result being correct depends on the prior probability of the result being correct. This prior probability is vitally important, and it varies from field to field.
In a field where it is hard to come up with good hypotheses to investigate, most researchers will be testing false hypotheses, and most of their positive results will be coincidences. In another field where people have a good idea what ought to be true before doing an experiment, most researchers will be testing true hypotheses and most positive results will be correct.
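To make the Bayes’ theorem point concrete, here is a minimal sketch of the arithmetic. The prior probabilities, power, and significance level below are illustrative assumptions, not figures from the post; they simply show how the chance that a positive result is true rises and falls with the proportion of true hypotheses in a field.

```python
# Sketch: how the probability that a positive finding is true depends on
# the prior proportion of true hypotheses in a field.
# All numbers here are illustrative assumptions.

def prob_positive_is_true(prior, power=0.8, alpha=0.05):
    """P(hypothesis true | significant result), by Bayes' theorem."""
    true_positives = prior * power          # true hypotheses that test positive
    false_positives = (1 - prior) * alpha   # false hypotheses that test positive
    return true_positives / (true_positives + false_positives)

# A theory-rich field where half of the tested hypotheses are true:
print(prob_positive_is_true(prior=0.5))    # about 0.94

# A field where good hypotheses are hard to find, so only 1 in 20 is true:
print(prob_positive_is_true(prior=0.05))   # about 0.46
```

Under these assumptions, the same experimental standards yield positive results that are usually correct in the first kind of field and wrong about half the time in the second.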
For example, it’s very difficult to come up with a better cancer treatment. Drugs that kill cancer in a petri dish or in animal models usually don’t work in humans. One reason is that these drugs may cause too much collateral damage to healthy tissue. Another reason is that treating human tumors is more complex than treating artificially induced tumors in lab animals. Of all cancer treatments that appear to be an improvement in early trials, very few end up receiving regulatory approval and changing clinical practice.
A greater proportion of physics hypotheses are correct because physics has powerful theories to guide the selection of experiments. Experimental physics often succeeds because it has good support from theoretical physics. Cancer research is more empirical because there is little reliable predictive theory. This means that a published result in physics is more likely to be true than a published result in oncology.
Whether “most” published results are false depends on context: the proportion of false results is high in some fields and low in others.
John:
I agree with your general point, but I think it would be even better to frame it not in terms of “true” and “false” but rather in terms of the size, direction, and variability of effects (or, more generally, differences). Is an effect of 0.01 (or an odds ratio of 1.01) equivalent to zero (i.e., a “false hypothesis”)? Maybe, maybe not; it depends on the context, and on whether the +0.01 in one setting will end up being -0.01 in another. Again, I agree with the general thrust of what you are writing, but I think that in most fields it will make less sense to talk about the “proportion of hypotheses that are true” than to talk about the distribution of the size and variability of effects.
Andrew: It’s easier for me to write about, and easier for most readers to understand, binary outcomes. A more nuanced article would consider the distribution of effect sizes as you suggest.
While your point is, of course, correct, I am not sure that there are many fields where it is easy to come up with true and novel statements.
The problem is that researchers will quickly exhaust all the obvious results, and after a few years you will be back to the unknowns, where it is very hard to know what else might be true.
You take physics… well, who can come up with a novel testable theory that is also true? Depends what you mean by novel… if you are just deriving it easily from known facts, then it is not very hard, but it may also not be publishable. But if you mean a genuinely novel testable theory… well, I don’t think this is easy at all! I think that if you work on the frontier of our knowledge in physics, it must be very easy to be wrong.
In my work, I try to design better ways to index data, or faster algorithms. If it were easy to come up with a faster algorithm, previous work would have covered it. To make progress, I have to go after non-obvious results. This means that I am almost always wrong…
Of course, methodologically, I am less likely to get good results “out of sheer luck”. That is, if I say “my software x is faster than software y on data z”… well, that’s probably true and reproducible. The problem is with the extended version of this statement (e.g., any implementation comparable to x is faster than any implementation comparable to y on any data).
I don’t know how often I am wrong in the extended sense, but I can tell you that I am *very* excited when anyone confirms my findings. I never take it for granted that others will be able to confirm my results.
One way to be original is to be trivial. Find a minor unimportant variation on something well known and run with it. That accounts for a vast amount of the literature.
Maybe journals should publish more articles like this: “I attacked this important problem head-on, and here are 10 things I tried. Unfortunately, none of them worked.” That seems like more of a contribution than a positive result on an artificial and unimportant problem.
I think the issue is not negative results so much as negative results misrepresented as positive results.
John –
I completely agree with what you say here. I think estimating the rate of false discoveries is an interesting question statistically, but not necessarily scientifically informative. Certainly not as informative as determining best practices for reducing false discoveries.
The reason I’ve engaged in the discussion is that I’m seriously concerned the research community is being severely damaged by the press and hype surrounding these issues. I have read transcripts from congressional hearings and heard from administrators who have to appear before Congress that this issue is coming up.
If the common wisdom is that most research is false, that could have major political consequences for science funding at a time when cutting science funding is very expedient. I think that, as responsible scientists, we should be very careful about making sweeping statements about scientific practice based on relatively little real data that addresses the question.
Jeff
Research results may be more questionable in areas such as sociology, education, and humanistic psychology, which are driven by questionable theories.