The following list summarizes five criticisms of significance testing as it is commonly practiced.
- Andrew Gelman: In reality, null hypotheses are nearly always false. Is drug A exactly as effective as drug B? Certainly not. You know before doing an experiment that there must be some difference that would show up given enough data.
- Jim Berger: A small p-value means the data were unlikely under the null hypothesis. Maybe the data were just as unlikely under the alternative hypothesis. Comparisons of hypotheses should be conditional on the data.
- Stephen Ziliak and Deirdre McCloskey: Statistical significance is not the same as scientific significance. The most important question for science is the size of an effect, not whether the effect exists.
- William Gosset: Statistical error is only one component of real error, maybe a small component. When you actually conduct multiple experiments rather than speculate about hypothetical experiments, the variability of your data goes up.
- John Ioannidis: A small p-value does not mean a small probability of being wrong. In one review, 74% of studies with a p-value of 0.05 were found to be wrong.
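
Two of these points can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the effect size of 0.01 standard deviations is an arbitrary choice, and the inputs to the second function (10% of tested hypotheses true, 50% power, 5% significance level) are assumed numbers, not figures from the review mentioned above. The first function shows how a practically negligible difference becomes "significant" given enough data; the second uses Bayes' rule to show how large a share of significant results can be false positives.

```python
import math


def two_sided_p_value(effect_in_sd, n_per_group):
    """Two-sided p-value of a two-sample z-test, assuming known equal
    variances and an observed mean difference of effect_in_sd
    standard deviations."""
    # Standard error of a difference of two means, in sd units.
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_in_sd / standard_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_discovery_share(prior_true, power, alpha):
    """Share of significant results that are false positives.

    prior_true: assumed fraction of tested hypotheses with a real effect
    power:      assumed probability of detecting a real effect
    alpha:      significance threshold
    """
    true_positives = power * prior_true
    false_positives = alpha * (1.0 - prior_true)
    return false_positives / (true_positives + false_positives)


# A tiny effect (0.01 sd, an arbitrary illustrative value) at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(0.01, n))

# Assumed inputs, chosen only for illustration.
print(false_discovery_share(prior_true=0.1, power=0.5, alpha=0.05))
```

With the same tiny effect, the p-value is above 0.9 at 100 observations per group and far below 0.05 at a million per group, even though the effect is no more important scientifically. Under the assumed inputs, roughly 47% of the significant results would be false positives, far more than the 5% that the threshold might suggest.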