I recently found out about a book that was published earlier this year, The Cult of Statistical Significance by Stephen Ziliak and Deirdre McCloskey. The subtitle is sure to stir up controversy: How the Standard Error Costs Us Jobs, Justice, and Lives.
From the parts I’ve read, the central criticism of the book seems to be that statistical significance is not necessarily scientific significance. A significance test asks only whether an effect exists; it says nothing about the size or importance of the effect.
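Here is a minimal sketch of that distinction. It assumes a simple z-test with a known standard deviation, and the function name and all the numbers are made up for illustration; nothing here comes from the book.

```python
from math import erfc, sqrt

def two_sided_p_value(observed_effect, sigma, n):
    """Two-sided p-value for a z-test of 'no effect', assuming sigma is known."""
    z = observed_effect * sqrt(n) / sigma
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)); erfc stays accurate far in the tail.
    return erfc(abs(z) / sqrt(2))

# A negligible effect measured on a million subjects (hypothetical numbers).
p_tiny_effect = two_sided_p_value(observed_effect=0.01, sigma=1.0, n=1_000_000)

# A large effect measured on only ten subjects (hypothetical numbers).
p_large_effect = two_sided_p_value(observed_effect=0.5, sigma=1.0, n=10)

print(f"tiny effect, huge study:  p = {p_tiny_effect:.3g}")
print(f"large effect, tiny study: p = {p_large_effect:.3g}")
```

Under these assumptions the negligible effect clears the 0.05 bar easily and the large effect does not, so the p-value by itself says little about which effect matters.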
Significance testing errs in two directions. First, in practice many people believe that any hypothesis with a p-value less than 0.05 is very likely true and important, though often such hypotheses are untrue and unimportant. Second, many act as if a hypothesis with a p-value greater than 0.05 is “insignificant” regardless of context. The 0.05 cutoff is arbitrary, and yet it is quite common to say there is evidence if p = 0.049 and no evidence if p = 0.051. Common sense tells you that if 0.049 provides evidence then 0.051 provides slightly less evidence rather than no evidence.
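To see how close those two results are, here is a small sketch that converts each two-sided p-value back into the z-score that would produce it. It assumes a normal test statistic, and the helper name is my own.

```python
from statistics import NormalDist

def z_for_two_sided_p(p):
    """z-score whose two-sided normal p-value equals p."""
    return NormalDist().inv_cdf(1 - p / 2)

z_significant = z_for_two_sided_p(0.049)      # counted as "evidence"
z_not_significant = z_for_two_sided_p(0.051)  # counted as "no evidence"

print(f"z for p = 0.049: {z_significant:.4f}")
print(f"z for p = 0.051: {z_not_significant:.4f}")
print(f"difference:      {z_significant - z_not_significant:.4f}")
```

The two test statistics differ by less than 0.02 standard errors, yet the conventional verdicts are opposite.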
The book gives the example of Merck saying there is “no evidence” that Vioxx has a higher probability of causing heart attacks than naproxen because their study did not achieve the magical 0.05 significance level. The book argues that “significance” should depend on context. When the stakes are higher, such as people suffering heart attacks, it should take less evidence before we declare an effect significant. Also, if you don’t want to find significance, you can always run a smaller study, which lowers your chances of finding it even when the effect is real. [I have not followed the Vioxx case and have no opinion on its specifics.] In addition to the Vioxx case, Ziliak and McCloskey provide case studies in economics, psychology, and medicine.
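The remark about study size can be made concrete with a rough power calculation. This is only a sketch: it assumes a two-sided z-test with known standard deviation, and the effect size and sample sizes are hypothetical, not taken from the Vioxx study or from the book.

```python
from math import sqrt
from statistics import NormalDist

def power(true_effect, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test rejects 'no effect' when the effect is real."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true effect sits away from zero.
    shift = true_effect * sqrt(n) / sigma
    # Chance of landing beyond either critical value.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Same hypothetical effect (0.2 standard deviations), three study sizes.
for n in (25, 100, 400):
    print(f"n = {n:3d}: power = {power(0.2, 1.0, n):.2f}")
```

With the same real effect, the smallest study in this sketch finds significance less than half the time, while the largest finds it more than nine times out of ten.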
Whenever someone raises objections to significance testing the reaction is always “Yes, everyone knows that.” Everyone agrees that the 0.05 cutoff is arbitrary, everyone agrees that effect sizes matter, etc. And yet nearly everyone continues to play the p < 0.05 game.