Suppose you’re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision *in theory* while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of “in theory” changes because you have two competing theories.

When you compare the power of two methods, you’re evaluating each method’s probability of success *under its own assumptions*. In other words, **you’re picking the method that has the better opinion of itself**. Thus the more powerful method is not necessarily the method that has the better chance of leading you to a correct conclusion.

Comparing power alone is not enough. You also need to evaluate whether a method makes realistic assumptions and whether it is robust to deviations from its assumptions.

Can you illustrate this with a concrete example? I think I understand the idea, but…

In almost all scenarios in science the strict null hypothesis is false, so the more powerful statistical method should be used. Of more importance is the effect size, often best illustrated using confidence intervals.

Eugene: Here’s an example. Consider parametric versus non-parametric tests. The former make stronger distributional assumptions, such as assuming data come from a normal distribution. Often a parametric test has more power than its non-parametric analog. That means the parametric test is more likely to find statistical significance if its distributional assumptions are correct. But if the distributional assumptions are not satisfied, the non-parametric test may be more powerful.

People think they’re placing one bet when they’re actually placing two bets. They’re betting on their modeling assumptions being correct, and betting on their method working given the modeling assumptions. Too often only the latter comes to mind.
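The parametric versus non-parametric comparison can be checked with a small simulation. What follows is a minimal sketch, not a definitive benchmark. Everything in it is my own assumption for illustration: the function names, the sample size of 30 per group, the shifts, and the use of a Cauchy distribution as the "assumptions violated" case. To stay within the standard library, both tests use a normal approximation for their critical values.

```python
import math
import random
from statistics import NormalDist

# Two-sided 5% critical value. Using the normal quantile for the t statistic
# is an approximation, tolerable here at n = 30 per group.
Z_CRIT = NormalDist().inv_cdf(0.975)


def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_rejects(x, y):
    """Parametric test: Welch-style t statistic for a difference in means."""
    mx, vx = mean_and_variance(x)
    my, vy = mean_and_variance(y)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return abs(my - mx) / se > Z_CRIT


def rank_rejects(x, y):
    """Non-parametric test: Mann-Whitney U with its normal approximation."""
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_y = sum(i + 1 for i, (_, group) in enumerate(pooled) if group == 1)
    u = rank_sum_y - m * (m + 1) / 2
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    return abs(u - mu) / sigma > Z_CRIT


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_cauchy(rng):
    # Standard Cauchy via the inverse CDF: very heavy tails, no finite variance.
    return math.tan(math.pi * (rng.random() - 0.5))


def estimate_power(draw, shift, n=30, sims=2000, seed=0):
    """Fraction of simulated data sets in which each test rejects the null."""
    rng = random.Random(seed)
    t_hits = rank_hits = 0
    for _ in range(sims):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) + shift for _ in range(n)]
        t_hits += t_rejects(x, y)
        rank_hits += rank_rejects(x, y)
    return {"t": t_hits / sims, "rank": rank_hits / sims}


power_normal = estimate_power(draw_normal, shift=0.5)
power_cauchy = estimate_power(draw_cauchy, shift=1.5)
print("normal data:", power_normal)
print("Cauchy data:", power_cauchy)
```

Theory suggests the two tests should be nearly tied on normal data, with a small edge to the t-test, while the rank test should be far ahead on the Cauchy data. The second bet, the one about the modeling assumptions, is what the Cauchy run is probing.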

Q: I agree that strict null hypotheses are almost certainly false, but I’ll leave that aside. Let’s say you want to estimate effect size with a confidence interval as you suggest. Then the smaller interval is better. Assume your data come from a normal distribution. If you also assume you know the variance, you can get a smaller confidence interval than if you acknowledge that you are only estimating the variance. So by making a stronger (unjustified) assumption, you can get “better” results. Your understanding of reality hasn’t improved, but your numbers have. That’s an elementary example, and not too many people would fall for that. But people do analogous things in more subtle circumstances.
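Here is a rough sketch of the confidence interval example, under assumptions I made up for illustration: samples of size 10, a true standard deviation of 2, and an analyst who claims to "know" the standard deviation is 1. The constant 2.262 is the tabulated 97.5% quantile of Student's t with 9 degrees of freedom, hardcoded because the standard library has no t quantile function.

```python
import math
import random
from statistics import NormalDist

N = 10
TRUE_MEAN = 0.0
TRUE_SIGMA = 2.0      # how the data actually behave (hypothetical)
ASSUMED_SIGMA = 1.0   # what the analyst claims to "know" (hypothetical)
Z_975 = NormalDist().inv_cdf(0.975)
T_975_DF9 = 2.262     # tabulated 97.5% quantile of Student t, 9 df


def simulate(sims=20000, seed=0):
    """Compare width and coverage of 'known variance' and t intervals."""
    rng = random.Random(seed)
    z_half = Z_975 * ASSUMED_SIGMA / math.sqrt(N)  # fixed half-width
    z_cover = t_cover = 0
    t_half_total = 0.0
    for _ in range(sims):
        data = [rng.gauss(TRUE_MEAN, TRUE_SIGMA) for _ in range(N)]
        xbar = sum(data) / N
        s = math.sqrt(sum((d - xbar) ** 2 for d in data) / (N - 1))
        t_half = T_975_DF9 * s / math.sqrt(N)      # half-width from estimated sd
        z_cover += abs(xbar - TRUE_MEAN) <= z_half
        t_cover += abs(xbar - TRUE_MEAN) <= t_half
        t_half_total += t_half
    return {
        "z_half_width": z_half,
        "t_mean_half_width": t_half_total / sims,
        "z_coverage": z_cover / sims,
        "t_coverage": t_cover / sims,
    }


result = simulate()
print(result)
```

The "known variance" interval is narrower by construction, but in this setup it should contain the true mean far less often than the nominal 95%. The numbers look better while the inference gets worse.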

“People think they’re placing one bet when they’re actually placing two bets….”

Thanks–your example of parametric tests was just what I needed.

Why would you have the method be robust against deviations from its assumptions? The method is (usually) designed for use when the assumptions are valid. One should have another method that selects between the methods when the assumptions are different, right?

MathDr: Because your assumptions are never entirely valid.

This is a statistical analog to physicists wondering how sensitive a system is to initial conditions. Do small changes in initial conditions lead to small changes in final conditions as in stable systems, or can small changes in initial conditions have enormous consequences as in chaotic systems? You need to know which realm you’re in.

“One should have another method that selects between the methods when the assumptions are different, right?”

Then you need a method to choose the method to choose the method. And assumptions for your method^2 and method^3… Methods all the way down.

Restating in terms of Bayes’ formula, for a hypothesis test we have

$$\frac{p(H_1 \mid E)}{p(H_0 \mid E)} = \frac{p(E \mid H_1)}{p(E \mid H_0)} \cdot \frac{p(H_1)}{p(H_0)}$$

So picking the theory/model with the best likelihood/fit is only a part of the story. We also must consider the relative probability of the competing theories being true, and/or the relative complexity of the models.
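As a small numeric illustration of the odds form of Bayes’ formula, suppose the evidence favors H1 by a likelihood ratio of 10, but H1 starts with a prior probability of only 1 in 100. Both numbers are invented for the example.

```python
from fractions import Fraction

likelihood_ratio = Fraction(10)           # p(E|H1) / p(E|H0), hypothetical
prior_h1 = Fraction(1, 100)               # p(H1), hypothetical
prior_odds = prior_h1 / (1 - prior_h1)    # p(H1) / p(H0) = 1/99

posterior_odds = likelihood_ratio * prior_odds
posterior_h1 = posterior_odds / (1 + posterior_odds)

print(posterior_odds, posterior_h1)  # prints: 10/99 10/109
```

Even with a tenfold better fit, H1 ends up with a posterior probability of about 9%, because the prior odds were so heavily against it.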