From Matt Briggs:
I challenge you to find me in any published statistical analysis, outside of an introductory textbook, a confidence interval given the correct interpretation. If you can find even one instance where the [frequentist] confidence interval is not interpreted as a [Bayesian] credible interval, then I will eat your hat.
Most statistical analysis is carried out by people who do not interpret their results correctly. They carry out frequentist procedures and then give the results a Bayesian interpretation. This is not simply a violation of an academic taboo. It means that people generally underestimate the uncertainty in their conclusions.
16 thoughts on “Interpreting statistics”
I’m pretty sure that professional statisticians use confidence intervals in the right way. And many econometricians as well.
Manoel: I don’t know about econometricians, but I assure you many professional statisticians don’t understand what they’re doing.
If you think the confidence interval gets misinterpreted, you should ask a random scientist what a p-value means.
I thought statisticians were random scientists …
For the example perhaps most familiar to most people, consider the +/- 3% given to most polling estimates. Most people don’t have the foggiest idea where it comes from or what it means, and even if they are sure they do, they probably do not.
The root cause is not some fundamental mystery, it is that the correct description is ugly and awkward.
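The ±3% itself is not hard to derive, even if its interpretation is. A rough sketch (assuming a simple random sample, the normal approximation, and a proportion near 0.5; the sample size of 1,100 is an illustrative value, not from the thread):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion
    (normal approximation; simple random sampling assumed)."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical national poll of ~1,100 respondents:
print(round(100 * margin_of_error(1100), 1))  # → 3.0 percentage points
```

The computation is one line; the hard part, as the comment says, is stating correctly what the resulting interval means.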
I assure you many professional statisticians don’t understand what they’re doing.
I don’t suppose you know any choice excerpts? I could use some for my classes.
This accusation that frequentists invariably misinterpret confidence intervals is absurd, and can only seem plausible to those who themselves hear every confidence interval report through Bayesian ears (and thereby beg the question)!
There are a billion balls in a bag, 95% of which are of one color, 5% another. You pull a single ball out at random. What is the probability that the ball is of the majority color? I say it’s 95%. It seems this gives Bayesians the vapors! “No!” they say, “the probability is either one or zero, because the ball, now chosen, either is or is not of the majority color. We can say nothing more!”
Bah humbug, says I.
Steve: A Bayesian would say that our knowledge of the ball’s color is uncertain, and that it is appropriate to model that uncertainty by a probability distribution. If you look at the ball but won’t let me see it, your probability distribution has collapsed to a point (much like an observation in quantum mechanics) but mine has not. We have different subjective probabilities because we have different information.
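Before the draw, everyone agrees on the long-run frequency. A quick simulation (a sketch only; the seed and trial count are arbitrary) shows the 95% figure emerging:

```python
import random

def draw_is_majority(n_majority=95, n_minority=5):
    """Draw one ball uniformly at random; True if it has the majority color."""
    balls = ["majority"] * n_majority + ["minority"] * n_minority
    return random.choice(balls) == "majority"

random.seed(0)
trials = 100_000
hits = sum(draw_is_majority() for _ in range(trials))
print(hits / trials)  # long-run frequency, close to 0.95
```

The dispute in the comments is not over this number, but over whether, after the draw, "probability" still attaches to the particular ball in hand.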
Oh, come on Mayo, you can’t possibly believe that if you’ve read any of the research on the subject.
Surveys show that professors teaching stats err on the side of Bayesian interpretations of frequentist statements a full third of the time, and other professors (who presumably use or at least encounter frequentist statistics every day in their work) are even worse. (section 4 in this PDF: http://people.umass.edu/~bioep740/yr2009/topics/Gigerenzer-jSoc-Econ-1994.pdf )
This style of argument makes you look like an ideologue. You can say whatever you like about how science should be done, but don’t make these flippant accusations when the facts are against you.
I agree, I think. Does Mr Briggs? We know a priori that 95% of the balls are colored the same. Neither of us can look in the bag, but we both see a single drawn ball, say it’s red. I say there is a 95% chance that the majority color is red. Mr Briggs seems to say otherwise. He seems to say that pulling a red ball out does not give us any useful information regarding the probability that the majority color is red.
Ninety-five percent of confidence intervals contain the true mean. I take a random sample and produce a confidence interval. I say there is a 95% chance that the mean is in there. Briggs says otherwise, no? Isn’t my interpretation “wrong” in his eyes?
“But what about the interval I calculated with the only sample I do have? What does that interval mean? Nothing. Absolutely, positively nothing. The only thing you are allowed to say is to speak the tautology, ‘Either the true values of µ and σ lie in the calculated intervals or they do not.’”
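Both readings of the quoted passage can be seen in a simulation. The sketch below (normal data with known σ; the parameters and seed are arbitrary illustration values) shows the 95% claim holding for the procedure across repeated samples, while any single interval simply does or does not cover µ:

```python
import random

MU, SIGMA, N = 10.0, 2.0, 30  # arbitrary illustration values

def ci_95(sample):
    """95% CI for the mean under a normal model with known sigma."""
    m = sum(sample) / len(sample)
    half = 1.96 * SIGMA / len(sample) ** 0.5
    return (m - half, m + half)

random.seed(1)
trials = 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    lo, hi = ci_95(sample)
    covered += lo <= MU <= hi  # this one interval covers MU, or it doesn't
print(covered / trials)  # coverage of the procedure, close to 0.95
```

The 95% describes the coverage of the procedure; whether that licenses a probability statement about the one interval you actually computed is exactly what the thread is arguing about.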
John, as an economics student, maybe I’ve been spared this mistake. I’ve not seen the interpretation as stated in your comment, but this could be a symptom of (1) my being ill-read in the econometric literature or (2) my being ill-read in Bayesian inference.
Do you mean one such misinterpretation is “A 95% confidence interval about a point estimate means the true population parameter would lie outside the interval 5% of the time.”?
Paraphrasing Kennedy (2008, 6th edition) in what follows.
A CI is a region or area that, when constructed in repeated samples, covers the true value of the parameter in, say, 95% of the samples. (p. 54; similar to above)
The CI can be interpreted as being such that the probability that the estimate falls in that interval is 95%. When this is the shortest possible such interval, Bayesians refer to it as the highest posterior density interval. This is the way in which many clients of classical/frequentist statisticians want to and do interpret classical 95% CIs, in spite of the fact that it is illegitimate to do so.
Sorry, that’s the last paragraph of page 222.
Wrote a longish response earlier that disappeared, and perhaps this one will as well. So I won’t go back through it, but I just want to assure David that I’ve more than read the literature—I’ve actually even written some of it! Gigerenzer has written some interesting things, but his claims about the radical differences between Fisher and N-P statistics are quite unfounded and very misleading.
I think confidence intervals are useful things to have, but their “objective interpretation” doesn’t mean much. I’d rather have likelihoods plotted over the domain in question, comparing the plausibility of one hypothesis over another. I understand that likelihoods don’t always exist. I also understand that the fraction of the space that one posterior is greater than another is not necessarily a useful input for decision-making.
I think it is useful, when, say, concluding whether an event of significance has happened in some interval, to have estimates of location and estimates of dispersion. Practically speaking, I don’t much care what these are. I like being independent of models, of assumptions on distributions. With my work, inadequate amounts of data are typically not an issue, but they are an issue some of the time, when I can’t have enough data about a particular subject of inquiry.
I’d like to think my purpose is to expose to people the uncertainty in their inferences, whether my techniques are resampling or importance sampling or other methods.
I do agree, even if I might disagree with the methodological critique, that such appreciation of uncertainty and range is at the heart of our art, and of our key responsibilities.
Here was the gist of my erased point: with a properly computed CI, or a well-tested hypothesis, there is a well-warranted statement corresponding to the stated interval. It is only by viewing every such statement through a Bayesian lens that claims such as “hypothesis H is well corroborated or warranted by the data” are invariably interpreted Bayesianly. And so the person is interpreting the CI correctly, but the Bayesian who views every claim to having evidence as a posterior probability will say, aha, you’re misinterpreting your CI Bayesianly! But that’s certainly not what people mean. The English word “probability” can be equivocal, but that’s no excuse to insist only a Bayesian construal is correct.
Many make the same mistake as regards the word “likely” and so mistakenly think people are ignoring base rates that should be considered—but that’s another story.
For some relevant blogposts (errorstatistics.blogspot.com), see Oct 30, 31; Nov. 8.
Lots of papers on my homepage discussing how to interpret relevant error probabilities as claims of how well-corroborated a claim is (how consistent the data are with a hypothesis, and which discrepancies were sufficiently well-probed to infer they are absent). This is the severity interpretation of tests. Sorry to be writing this quickly, ….
I presented you with some data that contradicted a claim you made. There are several responses you could have provided that would have made for an interesting discussion.
For instance, you could have argued that the six-question surveys described in the section I linked to were ambiguous or invalid. You could have argued that the people being surveyed were, in fact, correct, and the authors were wrong. You could have argued that my surveys were about p-values and not about confidence intervals, so the point didn’t apply (although that might have been a stretch). You could have presented other surveys or reviews of the literature that showed that people correctly interpreted frequentist confidence intervals.
But neither of your posts actually does any of that. Instead, you lashed out at Gigerenzer for a supposed error that wouldn’t even have been germane to the discussion if he had conducted the surveys himself (which he didn’t), and reiterated your original accusations against unnamed Bayesians in a remarkably garbled post.
The original post by Briggs only asked for a single example of a frequentist correctly interpreting their results. I had assumed that was an absurdly low bar, but you haven’t even provided that much, let alone a response to the data I’ve presented corroborating him. If the survey questions and methodology are valid, then the argument you’ve reiterated is simply false, no matter how much your ideology wants it to be true.
What exactly are you trying to accomplish with your online presence?