The other day I heard someone say something like the following:

I can’t believe how people don’t understand probability. They don’t realize that if a coin comes up heads 20 times, on the next flip there’s still a 50-50 chance of it coming up tails.

But if I saw a coin come up heads 20 times, I’d suspect it would come up heads the next time.

There are two levels of uncertainty here. **If** the probability of a coin coming up heads is θ = 1/2 and the tosses are independent, then yes, the probability of a head is 1/2 each time, regardless of how many heads have shown before. The parameter θ models our uncertainty regarding which side will show after a toss of the coin. That’s the first level of uncertainty.

But what about our uncertainty in the value of θ? Twenty flips showing the same side up should cause us to question whether θ really is 1/2. Maybe it’s a biased coin and θ is greater than 1/2. Or maybe it really is a fair coin and we’ve just seen a one-in-a-million event. (Such events do happen, but only one in a million times.) Our uncertainty regarding the value of θ is a second level of uncertainty.
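To put a rough number on this second level, here is a minimal sketch. It assumes a uniform prior on θ, i.e. Beta(1, 1), which is an illustrative choice and not something the argument above depends on. The function names are hypothetical.

```python
from fractions import Fraction

def update_beta(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior plus coin-toss counts."""
    return alpha + heads, beta + tails

def predictive_heads(alpha, beta):
    """Probability the next toss is heads, averaging over theta."""
    return Fraction(alpha, alpha + beta)

# Assumed prior: uniform on [0, 1], i.e. Beta(1, 1).
alpha, beta = update_beta(1, 1, heads=20, tails=0)

p_next = predictive_heads(alpha, beta)

# With beta == 1 the posterior CDF is x**alpha, so
# P(theta > 1/2 | data) = 1 - (1/2)**alpha.
p_biased = 1 - Fraction(1, 2) ** alpha

print(p_next)           # 21/22
print(float(p_biased))
```

Under that assumed prior, almost all of the posterior mass sits above 1/2, which matches the intuition that the next flip will probably be heads. A prior concentrated near 1/2 would move far less.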

Frequentist statistics approaches these two kinds of uncertainty differently. It treats θ as a constant but unknown quantity: probability describes the uncertainty regarding the coin toss given some θ, but not the uncertainty regarding θ. The Bayesian approach models all uncertainty using probability. So the outcome of the coin toss given θ is random, but θ itself is also random. It’s turtles all the way down.

It’s possible to have different degrees of uncertainty at each level. You could, for example, calculate the probability of some quantum event very accurately. If that probability is near 1/2, there’s a lot of uncertainty regarding the event itself, but little uncertainty about the parameter. High uncertainty at the first level, low uncertainty at the next level. If you warp a coin, it may not be apparent what effect that will have on the probability of the outcome. Now there’s significant uncertainty at the first and second level.

We’ve implicitly assumed that a single parameter θ describes the uncertainty in a coin toss outcome. Maybe that’s not true. Maybe the person tossing the coin has the ability to influence the outcome. (Some very skilled people can. I’ve heard rumors that Persi Diaconis is good at this.) Now we have a third level of uncertainty, uncertainty regarding our model and not just its parameter.

If you’re sure that a parameter θ describes the coin toss, but you don’t know θ, then the coin toss outcome is a known unknown and θ is an unknown unknown, a second-order uncertainty. More often, though, people use the term “unknown unknown” to describe a third-order uncertainty: unforeseen factors that are not included in a model, not even as uncertain parameters.

***

When one adds several random variables with certain nice properties, one can often describe a sort of convergence of the random sum. (For example, sums of random variables with finite moments give rise to the normal distribution as a fixed point.) I wonder if something similar can be said about these random probabilities at different levels, or if one is always doomed to complete uncertainty (at the top level) if one assumes unknowns (even with small uncertainties) at *all* levels. For example, what happens if one assumes that the random variables are Gaussian at all levels?
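One way to make the Gaussian case concrete, as a sketch under assumptions that are not part of the comment: suppose each level is normal around the level above it, with a known mean $m$ at the top. The symbols $\mu_k$, $\sigma_k$, $m$, and $r$ are introduced here only for illustration.

$$
X \mid \mu_1 \sim \mathcal{N}(\mu_1, \sigma_0^2), \qquad
\mu_k \mid \mu_{k+1} \sim \mathcal{N}(\mu_{k+1}, \sigma_k^2) \quad (k = 1, \dots, n-1), \qquad
\mu_n \sim \mathcal{N}(m, \sigma_n^2)
$$

Integrating out one level at a time adds the variances, so marginally

$$
X \sim \mathcal{N}\!\left(m, \; \sum_{k=0}^{n} \sigma_k^2\right).
$$

In this simple model the overall uncertainty stays finite with infinitely many levels exactly when the variances are summable. For instance, $\sigma_k^2 = \sigma_0^2 r^k$ with $0 < r < 1$ gives a total variance of $\sigma_0^2 / (1 - r)$, whereas equal variances at every level diverge.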

When you take into consideration uncertainty at a lower level, uncertainty at the higher levels increases.

Frequentist methods tend to underestimate overall uncertainty because they don’t properly account for second-order uncertainty.

I’m not sure… Bayesians often seem to me to be too sure of themselves. Let p denote the probability that a coin comes up heads. If you ask a non-Bayesian what the probability of heads is, he might say “It’s some p in [0,1]. That’s all I can say.” But I know Bayesians who might say “Well, with no more info, we assume a maximum entropy, non-informative prior, i.e. a uniform distribution for p on [0,1]. Thus the probability of heads works out to be 1/2.”

In your example, the Bayesian answer is more in keeping with the spirit of statistics. Logic is about what is known to be true; statistics are about what is probably true.

If you only say “the probability is a probability,” you’re only stating a tautology, a logical statement that doesn’t add anything. But saying “I’d use 1/2 if I had to make a guess, but there’s large uncertainty in my answer” adds a little value.

I agree that Bayesians can be too sure of themselves too. In general statisticians of whatever stripe tend to underestimate uncertainty.

Bayesians cannot be more sure than the a priori probability that they have to assume. In other words, even the maximum entropy state has probability associated with it in any given instance.

I once watched Persi Diaconis flip 9 heads out of 10 on demand, at an ORSA/TIMS (now “INFORMS”) conference.

All of the frequentists I’ve known were perfectly happy to rely on symmetry arguments for dice, coins, cards, etc. It’s when you start asking them what the probability is that Will Shakespeare wrote “Hamlet” (or that O.J. Simpson killed his wife) that they get flustered.

Nassim Taleb has some thoughts exactly on this:

https://en.wikipedia.org/wiki/Ludic_fallacy#Example_1:_Suspicious_coin

I don’t always hear Bayesians say there is uncertainty in Prob(heads). If p is uniform on [0,1], then I often hear Prob(heads)=E(p)=1/2 exactly. And I don’t recall hearing much about a distribution on the set of distributions, as alfC brings up. Do people really consider more than one level of turtle? It seems like you would have an explosion of dimensions to deal with. And in the end it would probably all integrate out to give 1/2 anyway, by symmetry.
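On the question of whether extra levels just integrate out to 1/2, a small sketch may help. It assumes a symmetric Beta(a, a) prior on p, with the hypothetical parameter `a` standing in for a second level of uncertainty; this parameterization is an illustrative choice, not something from the discussion above.

```python
from fractions import Fraction

def predictive_heads(a, heads, tails):
    """P(next toss is heads) under a symmetric Beta(a, a) prior on p."""
    return Fraction(a + heads, 2 * a + heads + tails)

# Small a: vague prior. Large a: prior tightly concentrated near 1/2.
for a in (1, 10, 1000):
    before = predictive_heads(a, 0, 0)   # no data yet
    after = predictive_heads(a, 20, 0)   # after 20 heads in a row
    print(a, before, float(after))
```

Before any data, every symmetric prior gives exactly 1/2, so the second level really does integrate out by symmetry. After 20 heads the answers differ considerably with `a`, so in this model the extra level matters once data arrive.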

A statistician is taking a math exam. The first question begins: “Prove that if x is in [0,1], then …” The statistician answers: “Without any further information, we take x=1/2 …”

Hi John,

Very interesting thoughts. Your comments bring the opening scenes of Rosencrantz and Guildenstern to my mind. I don’t know, though, which level/kind of uncertainty implies that the coin has come heads up 157 times…

https://www.youtube.com/watch?v=KchhSIVwMdY

Best,

Zoltán

Three statisticians go deer hunting. The first two line up a shot on a deer they see. The first statistician misses by a foot to the left. The second one misses by a foot to the right. The third statistician excitedly shouts, “We got it! We got it!”

“I can’t believe how people don’t understand probability. They don’t realize that if a FAIR coin comes up heads 20 times, on the next flip there’s still a 50-50 chance of it coming up tails”

Most coin flip sequence discussions implicitly assume a fair coin, and I doubt it is prudent to question the fairness of a coin with only 20 flips.

I wonder whether randomly selecting a non-fair coin from the world’s coin supply is more or less likely than seeing 20 heads in a row.
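That comparison can be sketched with Bayes' rule. The example below assumes the only kind of unfair coin is a two-headed one that always shows heads, and the one-in-a-billion prior is a made-up number for illustration, not an estimate of the real coin supply.

```python
from fractions import Fraction

def posterior_unfair(prior_unfair, n_heads):
    """P(coin is two-headed | n_heads heads in a row), by Bayes' rule."""
    prior_unfair = Fraction(prior_unfair)
    like_unfair = Fraction(1)               # two-headed coin always shows heads
    like_fair = Fraction(1, 2) ** n_heads   # fair coin: (1/2)**n_heads
    num = prior_unfair * like_unfair
    return num / (num + (1 - prior_unfair) * like_fair)

# Hypothetical prior: one coin in a billion is two-headed.
print(float(posterior_unfair(Fraction(1, 10**9), 20)))

# Break-even prior for 20 heads: the posterior is exactly 1/2.
print(posterior_unfair(Fraction(1, 2**20 + 1), 20))  # 1/2
```

Under these assumptions, 20 heads makes an unfair coin the better explanation only if such coins are more common than about one in a million to begin with.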

In response to SteveBrooklineMA’s observation that “Bayesians often seem to me to be too sure of themselves,” I suspect that much of the apparent overconfidence comes from the observer and the Bayesians having different understandings of the word probability.

Using the given example, “Let p denote the probability that a coin comes up heads,” the observer probably expects that p describes a physical property of the coin, its propensity to come up heads. Thus when the Bayesians say that their best estimate is that p = 1/2, it’s natural for the observer to take their apparent claim that the coin is fair, especially on so little evidence, as a sign of overconfidence. After all, how could they be so sure that the coin wasn’t biased?

But the Bayesians have made no such claim. It’s all been a misunderstanding. To them p means something entirely different. It describes not the coin but the Bayesians’ *beliefs* about the coin. It represents the strength of their belief that the coin will come up heads.

The Bayesians do indeed recognize that the coin may be biased, or that the person flipping it may be skilled in the biasing of flips. But since they lack knowledge of which way any such bias would run, any argument they could make for bias toward heads would apply equally to bias toward tails, and by symmetry the thrust of the arguments would cancel out. Thus when asked for their best estimate of p, they are forced to answer that they have no reason to believe that the coin is more likely to come up heads than tails, or vice versa, and they express this state of belief with the statement p = 1/2.

@Tom: Agreed, and I’d add that point estimates are not Bayesian conclusions but *summaries* of Bayesian conclusions. The posterior distribution is primary, and any single number is a way to describe that distribution. In signal processing terms, this is a lossy compression.

Amazingly astute. You know everything has elements of uncertainty in it. And uncertainty is true of anything epistemic (not just aleatory). There will always be unknowns.

In finance (my area), everyone thinks of probability, from the econometricians to the risk-neutral physicists turned quants. But while we think about the bond price diffusing from day to day, the politicians are thinking of defaulting (and what prob will you assign to that?) and the lawyers try to devise schemes to avoid it. None of the experts can agree on a fixed probability for that outcome–hence uncertainty. Second order probability probably works well enough but it is clear that at times….the market doesn’t clear…the market is not complete (i.e., not all outcomes have fixed prices) and it is not at all efficient.

Uncertainty looms large during the crisis. But for some…it is a matter of life. Will Argentina avoid default, or be forced to default by the court decision and then find a nice work-around? This….is a daily type of question for some traders out there. What, me worry?

I’ve seen some odd attempts to deal with uncertainty and ignorance differently within a single framework. When I was in grad school, I took a seminar led by a visiting professor from New Zealand who had some odd notions about using interval-valued probabilities to do this. Here’s the book he wrote about it:

http://www.amazon.com/Statistical-Probabilities-Monographs-Statistics-Probability/dp/0412286602

I’d add that it’s worth questioning the example. The person you are quoting may be accurately describing his/her views on probability, but it seems reasonable to me that he/she might be using hyperbole to express the fact that people frequently misunderstand probability to the point of not being able to discern when a set of events should evoke questioning of our assumptions about fairness. Sure, 20 heads is an event with a likelihood of 1 in 2^20 on a fair coin, but so is every other specifically called outcome.

This is, in my experience anyway, what I see more frequently. For example, I knew a bunch of religious people from Country A who went to Country B to talk to people there about their religion. While talking to someone there, they mentioned they were from A, and the person from B handed them a business card from a company in A where a relative of one of the two people worked.

The people from A assumed this was a sign that their presence there was a directive of the supreme being.

While I might well agree that that particular event was pretty improbable, the possibility of *some* kind of improbable event happening to them on a trip was, I expect, pretty likely.
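A back-of-the-envelope sketch of that point, assuming independent chances and entirely made-up numbers (a million possible coincidences, each with a one-in-a-million chance):

```python
def prob_at_least_one(p_each, n_chances):
    """P(at least one of n independent events occurs), each with prob p_each."""
    return 1 - (1 - p_each) ** n_chances

# Hypothetical figures, chosen only to illustrate the effect.
p_some_coincidence = prob_at_least_one(1e-6, 10**6)
print(p_some_coincidence)
```

With these assumed figures, the chance that at least one such coincidence turns up is well over one half, even though each one individually is a one-in-a-million event.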

@John: Whether a single value is the full conclusion or a summary depends on the underlying question, doesn’t it? If the proposition in question is X = “the coin will come up heads on the next toss,” and you know nothing about the coin (other than that the two possible outcomes of tossing it are heads and tails), then the full answer to the question is that P(X) = 1/2. But if the proposition in question is actually a family of propositions X(f) = “The propensity of the coin to come up heads when tossed is f,” then the full answer is a mapping from all possible values of f to P(X(f)), in other words a distribution.

The reason I belabor this point is that, at least for me, it’s helpful to keep in mind how distributions resolve into individual propositions and vice versa. I know I’m in trouble if I can’t figure out what the base propositions are.

@Jonathan, “20 heads is an event with a likelihood of 1 in 2^20 on a fair coin but so is every other specifically called outcome.” This is where the real world (namely physics and a model of the world with context) kicks in. The 20 outcomes may be *probabilistically* independent events, but they are not *physically* independent, because… they have been thrown with the *same physical coin*. And we *know* (in many physical models of a coin) that its geometric properties may consistently bias the outcome. However, we don’t have simple “models of coin” that give “other specifically called outcome”. So, given an extraordinary outcome, we start to look for “real” context. Are the throws done with the same coin? If not, were the coins minted the same year? In the same place? Anyway, can I *see* that coin please?