I was looking at my daughter’s statistics homework recently, and there were a pair of questions about testing the level of lead in drinking water. One question concerned testing whether the water was safe, and the other concerned testing whether the water was unsafe.

There’s something bizarre, even embarrassing, about this. You want to do two things: estimate the amount of lead, and decide what to do in response. But instead of simply doing just that, you do this arcane dance of choosing two hypotheses, one natural and one arbitrary, and treating the two asymmetrically, depending on which one you call the null and which you call the alternative. This asymmetry is the reason you make a distinction between testing whether the water is safe and testing whether it is unsafe.

It’s a weird tangle of estimation and decision making. The decision-making rules implicit in the procedure are not at all transparent. And even though you are testing the level of lead, you’re doing so indirectly.

The Bayesian approach to the problem is much easier to understand. You estimate the probability distribution for the concentration of lead based on all available information. You can plot this distribution and show it to civil engineers, politicians, or anybody else who needs to make a decision. Non-statisticians are much more likely to understand such a plot than the nuances of null and alternative hypotheses, significance, power, and whether you’re testing for safety versus testing for non-safety. (Statisticians are more likely to understand estimation as well.)

In the homework problems, the allowable level of lead was 15 ppm. After obtaining the posterior distribution on the concentration of lead, you could simply estimate the probability that the concentration is above 15 ppm. But you could also calculate the probability that the concentration lies in any other range you’re interested in.

Classical statistics does not allow such probability calculations. Even a confidence interval, something that looks like a probability statement about the concentration of lead, is actually a probability statement about the statistical process being used and not a probability statement about lead concentration per se.

Part of the problem with classical hypothesis testing is its very name. We *aren’t* testing the hypothesis of whether or not the water is safe. Instead, we’re testing the study design: do we *have enough data* to be confident that the lead level is in some range? And that’s a valid & useful thing to test, just not what the name implies.

The result of a classical test isn’t meant to *be* the decision of whether or not to drink the water. Rather, the result is meant to inform us about whether or not we have enough data to *make* further decisions.

The historical decision to call it “hypothesis testing” has been terribly misleading. I agree that the Bayesian approach has benefits, but I also wish the classical approach had a better name, so we could value what it actually does — not worry that it doesn’t do something it isn’t meant to do.