Five criticisms of significance testing

by John on November 18, 2008

The following list summarizes five criticisms of significance testing as it is commonly practiced.

  1. Andrew Gelman: In reality, null hypotheses are nearly always false. Is drug A identically effective as drug B? Certainly not. You know before doing an experiment that there must be some difference that would show up given enough data.
  2. Jim Berger: A small p-value means the data were unlikely under the null hypothesis. Maybe the data were just as unlikely under the alternative hypothesis. Comparisons of hypotheses should be conditional on the data.
  3. Stephen Ziliak and Deirdre McCloskey: Statistical significance is not the same as scientific significance. The most important question for science is the size of an effect, not whether the effect exists.
  4. William Gosset: Statistical error is only one component of real error, maybe a small component. When you actually conduct multiple experiments rather than speculate about hypothetical experiments, the variability of your data goes up.
  5. John Ioannidis: Small p-values do not mean small probability of being wrong. In one review, 74% of studies with a p-value of 0.05 were found to be wrong.
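
Criticisms 1 and 5 can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: a two-sided z-test with known standard deviation for the first point, and made-up values for power and for the share of tested effects that are real for the second. The function names and all the numbers are hypothetical, chosen for the example, and are not taken from any of the authors above.

```python
import math


def two_sided_p_value(effect, sigma, n):
    """p-value of a two-sided z-test when the observed mean equals `effect`.

    Assumes n independent observations with known standard deviation
    `sigma` and a null hypothesis of zero mean.
    """
    z = abs(effect) * math.sqrt(n) / sigma
    return math.erfc(z / math.sqrt(2))  # same as 2 * (1 - Phi(z))


def false_positive_share(alpha, power, prior_true):
    """Share of significant results for which the null is actually true."""
    false_pos = alpha * (1 - prior_true)  # true nulls that get rejected
    true_pos = power * prior_true         # real effects that get detected
    return false_pos / (false_pos + true_pos)


# Criticism 1: a negligible effect (0.01 standard deviations) becomes
# "significant" once the sample is large enough.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(effect=0.01, sigma=1.0, n=n))

# Criticism 5: with hypothetical inputs (10% of tested effects real,
# 80% power, alpha = 0.05), a large share of significant results are wrong.
print(false_positive_share(alpha=0.05, power=0.8, prior_true=0.1))
```

With these illustrative inputs, a bit more than a third of the significant results are false positives even though every test used a 0.05 threshold. The figure depends entirely on the assumed power and prior, so it should not be read as a reconstruction of the 74% quoted above.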


3 comments

1

Pedro 11.18.08 at 21:50

I’ve never understood the alternative though. Confidence intervals? What if the best prediction a theory can make is: A and B should be different. In my field (psycholinguistics), we’re not at the point yet where we can say, under condition A, I expect a 250 ms reaction time, and under B a 300 ms reaction time.

I’m happy to accept that significance testing is a bad tool. What is a good replacement?

2

John 11.18.08 at 22:02

Using Bayes factors (the ratio of the probabilities of the data under the two hypotheses, which equals the ratio of their posterior probabilities when both hypotheses start with equal prior probability) gets around some of the weaknesses of significance testing, especially the criticisms of Jim Berger and John Ioannidis. In fact, part of the problem with the naive use of p-values is that people often interpret them as if they were Bayes factors, without using that term.

Bayes factors have an intuitive interpretation in terms of decibels by analogy to sound intensity.
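
As a rough sketch of the decibel idea: take the Bayes factor as the likelihood ratio of two simple hypotheses, which coincides with the ratio of posterior probabilities when both hypotheses get equal prior weight, and report 10 log10 of it. The coin example, the hypothesized values 0.5 and 0.7, and the function names below are assumptions made up for illustration.

```python
import math


def binomial_likelihood(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_factor(k, n, p1, p0):
    """Likelihood ratio for H1: p = p1 against H0: p = p0."""
    return binomial_likelihood(k, n, p1) / binomial_likelihood(k, n, p0)


def decibels(bf):
    """Evidence in decibels: 10 * log10 of the Bayes factor."""
    return 10 * math.log10(bf)


k, n = 60, 100  # hypothetical data: 60 heads in 100 tosses

# One-sided tail probability under the fair-coin null, for comparison.
tail = sum(binomial_likelihood(j, n, 0.5) for j in range(k, n + 1))

bf = bayes_factor(k, n, p1=0.7, p0=0.5)
print("P(at least 60 heads | fair coin):", tail)
print("Bayes factor, p = 0.7 vs p = 0.5:", bf)
print("Evidence in decibels:", decibels(bf))
```

In this made-up example the tail probability under the fair coin is small, yet the Bayes factor comes out slightly below 1 (about minus one decibel), so the data do not favor the biased-coin alternative. That is the situation Jim Berger's criticism describes: data that are unlikely under the null can be about as unlikely under the alternative.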

3

Dan 11.20.08 at 23:35

Significance testing tends to be overused. Everyone wants that p-value for their publication (and the journals demand it), so a lot of hypothesis tests are forced when a simple estimate might have been more appropriate. It is important to remember that a significance test is not the end of the investigation, just a step in the process.
