Jim Berger gives the following example illustrating the difference between frequentist and Bayesian approaches to inference in his book The Likelihood Principle.
A fine musician, specializing in classical works, tells us that he is able to distinguish whether Haydn or Mozart composed some classical song. Small excerpts of the compositions of both authors are selected at random and the experiment consists of playing them for identification by the musician. The musician makes 10 correct guesses in exactly 10 trials.
A drunken man says he can correctly guess in a coin toss what face of the coin will fall down. Again, after 10 trials the man correctly guesses the outcomes of the 10 throws.
A frequentist statistician would have as much confidence in the musician’s ability to identify composers as in the drunk’s ability to predict coin tosses. In both cases the data are 10 successes out of 10 trials. But a Bayesian statistician would combine the data with a prior distribution. Presumably most people would be inclined a priori to have more confidence in the musician’s claim than the drunk’s claim. After applying Bayes’ theorem to analyze the data, the credibility of both claims will have increased, though the musician will continue to have more credibility than the drunk. On the other hand, if you start out believing that it is completely impossible for drunks to predict coin flips, then your posterior probability for the drunk’s claim will continue to be zero, no matter how much evidence you collect.
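To make the comparison concrete, here is a minimal sketch of the Bayesian update with two competing hypotheses: the claimant has real ability, or the claimant is merely guessing. The numbers are illustrative assumptions, not figures from Berger's book: a prior of 0.5 for the musician, a prior of 0.001 for the drunk, and a 90% success rate for someone with real ability.

```python
def posterior_ability(prior, successes, p_skill=0.9, p_guess=0.5):
    """Posterior probability of real ability after `successes` correct
    guesses in a row, using Bayes' theorem with two hypotheses.

    p_skill and p_guess are assumed per-trial success rates under the
    "real ability" and "just guessing" hypotheses respectively.
    """
    like_skill = p_skill ** successes   # likelihood of the data given ability
    like_guess = p_guess ** successes   # likelihood of the data given guessing
    numerator = prior * like_skill
    return numerator / (numerator + (1 - prior) * like_guess)


# Same data (10 out of 10), different priors (both priors are assumptions).
musician = posterior_ability(prior=0.5, successes=10)
drunk = posterior_ability(prior=0.001, successes=10)
closed_mind = posterior_ability(prior=0.0, successes=10)

print(f"musician: {musician:.3f}")
print(f"drunk: {drunk:.3f}")
print(f"zero prior: {closed_mind:.3f}")
```

Under these assumptions both posteriors are higher than their priors, the musician stays ahead of the drunk, and a prior of exactly zero yields a posterior of exactly zero.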
Dennis Lindley coined the term “Cromwell’s rule” for the advice that nothing should have zero prior probability unless it is logically impossible. The name comes from a statement by Oliver Cromwell addressed to the Church of Scotland:
I beseech you, in the bowels of Christ, think it possible that you may be mistaken.
In probabilistic terms, “think it possible that you may be mistaken” corresponds to “don’t give anything zero prior probability.” If an event has zero prior probability, it will have zero posterior probability, no matter how much evidence is collected. If an event has tiny but non-zero prior probability, enough evidence can eventually increase the posterior probability to a large value.
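A small sketch of why this holds: each observation multiplies the prior by a likelihood ratio, and zero times anything is still zero. The likelihood ratio of 2 per correct guess is an assumption for illustration (a true predictor is always right, a guesser is right half the time), and the starting prior of one in a billion is likewise hypothetical.

```python
def update(prior, likelihood_ratio):
    """One Bayesian update: the probability of the claim after seeing a
    single observation that is `likelihood_ratio` times more likely if
    the claim is true than if it is false."""
    weighted = prior * likelihood_ratio
    return weighted / (weighted + (1 - prior))


def after_n(prior, n, likelihood_ratio=2.0):
    """Apply `n` identical updates in sequence."""
    p = prior
    for _ in range(n):
        p = update(p, likelihood_ratio)
    return p


skeptic = 1e-9   # tiny but non-zero prior (assumed value)
closed = 0.0     # zero prior

for n in (10, 30, 40):
    print(n, after_n(skeptic, n), after_n(closed, n))
```

With these assumed values the skeptic's probability passes one half after about 30 consecutive successes and is above 0.99 by 40, while the closed mind stays at exactly zero for any number of successes.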
The difference between a small positive prior probability and a zero prior probability is the difference between a skeptical mind and a closed mind.
10 thoughts on “Musicians, drunks, and Oliver Cromwell”
Did Haydn produce a lot more music than Mozart? So that taking a random sample from the combined complete works of both would give a definite bias to Haydn.
What exactly are the odds of guessing the right coin-toss 10 times in a row? My non-statistical mind would guess 50% * 10 = 5%. That’s assuming the drunk didn’t by chance have a weighted coin, too.
Aaron: the probability of guessing each of ten coin tosses for a fair coin is 1/2^10 = 1/1024 or about 0.001.
yeah, thanks. 50% ** 10! not 50% * 1/10
I figured that out in the wee hours of the morning after I realized that what I wrote wasn’t even what I originally meant, which I’d for some reason calculated as 2% in my mind.
I’d thought at first that it must be “around 1:1000” but then decided there must be a trick to it. In the clarity of the night I thought, “if I flip the coin two times, what are the possibilities of it coming up heads twice in a row? Four combinations, 1/4, or 1/2 * 1/2.” It wasn’t hard to extrapolate that out, since counting by exponents is a great way to waste time in school. And besides, anyone who doesn’t know 2^10 shouldn’t be using a computer.
Sorry John. You and Jim are dead wrong. Frequentists would NOT treat the two cases identically if there was prior information. If you have prior information that the probability is in the interval [0, epsilon] then the maximum likelihood estimate from 10 successes out of 10 is epsilon.
That seems more honest to me. Much more honest than the Bayesian fudge of saying p is uniform, which leads to adding one success and one failure, i.e. the estimate 11/12.
And frequentists do NOT assert that probabilities are 0. They sometimes POINT estimate a probability to be zero, but always aim to accompany this with a confidence interval. Again, 10/10 successes does not lead to an assertion that p=1, it leads to a 95% interval (0.741, 1).
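The two figures in this comment can be checked with a few lines. This sketch assumes the 11/12 refers to the posterior mean under a uniform Beta(1, 1) prior, and that the interval is the one-sided exact (Clopper-Pearson) interval, whose lower bound for n successes in n trials solves p^n = 0.05.

```python
from fractions import Fraction

n = 10          # number of trials
successes = 10  # all of them correct

# Posterior mean under a uniform prior: (successes + 1) / (n + 2),
# i.e. "adding one success and one failure".
posterior_mean = Fraction(successes + 1, n + 2)

# One-sided 95% exact lower bound when every trial succeeds:
# the smallest p for which P(n successes | p) = p**n is at least alpha.
alpha = 0.05
lower_bound = alpha ** (1 / n)

print(posterior_mean)          # 11/12
print(round(lower_bound, 3))   # 0.741
```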
Chris, if you know your probability is in the interval [0, epsilon], you’re violating Cromwell’s rule. You’re assigning zero probability to all values larger than epsilon and no amount of data can change your mind.
To be fair, in practice data could eventually change your mind. If you saw enough successes you’d conclude that the initial restriction was not appropriate. But the Bayesian formalism could accommodate unexpected data without having to change the model.
The point is that it is possible for someone to use trickery or a non-fair coin to predict 100 coin flips in a row.
On the other hand, if an event has probability 0 it doesn’t mean that it’s impossible. Consider the event of picking a random number between 0 and 1.
Reminds me of the essay
The Median Isn’t the Message by Stephen Jay Gould
Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions.
“The difference between a small positive prior probability and a zero prior probability is the difference between a skeptical mind and a closed mind.”