Small probabilities add, big ones don’t

A video has been making the rounds in which a well-known professor [1] says that if something has a 20% probability of happening in one attempt, then it has a 40% chance of happening in two attempts, a 60% chance of happening in three attempts, etc.

This is wrong, but it’s a common mistake. And one reason it’s common is that a variation on the mistake is approximately correct, which we will explain shortly.

It’s obvious the reasoning in the opening paragraph is wrong when you extend it to five, or especially six, attempts. Are you certain to succeed after five attempts? What does it even mean that you have a 120% chance of success after six attempts?!

But let’s reduce the probabilities in the opening paragraph. If there’s a 2% chance of success on your first attempt, is there a 4% chance of success in two attempts and a 6% chance of success in three attempts? Yes, approximately.

Two attempts

Here is the correct formula for the probability of an event happening in two tries.

P(A \cup B) = P(A) + P(B) - P(A\cap B)

In words, the probability of A or B happening equals the probability of A happening, plus the probability of B happening, minus the probability of A and B both happening. The last term is a correction term. Without it, you’re counting some possibilities twice.

So if the probability of success on each attempt is 0.02, the probability of success on two attempts is

0.02 + 0.02 − 0.0004 = 0.0396 ≈ 0.04.

When the probabilities of A and B are each small, the probability of A and B both happening is far smaller still, because under independence [2] it is the product of the two. The smaller the probabilities of A and B, the less the correction term matters.

If the probability of success on each attempt is 0.2, the probability of success after two attempts is 0.2 + 0.2 − 0.04 = 0.36. Simply adding the probabilities gives 0.4. Neglecting the correction term is incorrect, but not terribly far from correct in this case.
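The two-attempt arithmetic can be checked in a few lines of Python. This is a minimal sketch that assumes the two attempts are independent, so that P(A ∩ B) = P(A) P(B). The function name `two_attempts` is only illustrative.

```python
def two_attempts(p):
    """Probability of at least one success in two attempts.

    Assumes the attempts are independent, so P(A and B) = p * p.
    """
    return p + p - p * p


for p in (0.02, 0.2):
    naive = 2 * p              # simply adding the probabilities
    exact = two_attempts(p)    # with the correction term
    print(f"p = {p}: naive = {naive:.4f}, exact = {exact:.4f}")
# p = 0.02: naive = 0.0400, exact = 0.0396
# p = 0.2: naive = 0.4000, exact = 0.3600
```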

Three attempts

When you consider more attempts, things get more complicated. The probability of success after three attempts is given by

\begin{align*} P(A \cup B \cup C) &= P(A) + P(B) + P(C) \\ &- P(A\cap B) - P(B \cap C) - P(A \cap C) \\ &+ P(A \cap B \cap C) \end{align*}

as I discuss here. Adding the probabilities of success separately over-estimates the correct probability. So you correct by subtracting the probabilities of pairs of successes. But then this over-corrects, so you need to add back in the probability of three successes.

If A, B, and C all have a 20% probability, the probability of A or B or C happening is 48.8%, not 60%, again assuming independence.
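As a rough check of the 48.8% figure, here is a sketch of the inclusion-exclusion sum in Python. It assumes the events are independent, so the probability of any intersection is the product of the individual probabilities. The helper name `union_probability` is hypothetical.

```python
from itertools import combinations
from math import prod


def union_probability(probs):
    """P(at least one event occurs) via inclusion-exclusion.

    Assumes the events are independent, so the probability of an
    intersection is the product of the individual probabilities.
    """
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = (-1) ** (k + 1)  # add singles, subtract pairs, add triples, ...
        for subset in combinations(probs, k):
            total += sign * prod(subset)
    return total


p = [0.2, 0.2, 0.2]
print(round(union_probability(p), 4))           # 0.488
print(round(1 - prod(1 - q for q in p), 4))     # 0.488, via the complement
```

The second line computes the same quantity as one minus the probability that all three attempts fail, which is a useful cross-check under the same independence assumption.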

The error from naively adding probabilities increases as the number of probabilities increases.

n attempts

Now let’s look at the general case. Suppose your probability of success on each attempt is p. Then your probability of failure on each independent attempt is 1 − p. The probability of at least one success out of n attempts is the complement of the probability of all failures, i.e.

1 - (1 - p)^n

When p is small, and when n is small, we can approximate this by np. That’s why naively adding probabilities works when the probabilities are small and there aren’t many of them. Here’s a way to say this precisely using the binomial theorem.

\begin{align*} 1 - (1 - p)^n &= 1 - \left(1 - np + {n \choose 2}p^2 - {n \choose 3}p^3 + \cdots\right ) \\ &= np + {\cal O}(p^2) \end{align*}

The exact probability is np plus (n − 1) terms that involve higher powers of p. When p and n are sufficiently small, these terms can be ignored.
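To see how quickly the approximation degrades, the following sketch compares np with 1 − (1 − p)^n for the two per-attempt probabilities used in this post. It assumes independent attempts with the same success probability, and the function name `at_least_one` is just for illustration.

```python
def at_least_one(p, n):
    """Probability of at least one success in n independent attempts."""
    return 1 - (1 - p) ** n


for p in (0.02, 0.2):
    for n in (1, 2, 3, 5, 6):
        naive = n * p                 # naive sum of probabilities
        exact = at_least_one(p, n)    # complement of all failures
        print(f"p={p:<5} n={n}  naive={naive:.4f}  "
              f"exact={exact:.4f}  error={naive - exact:.4f}")
```

With p = 0.02 the naive sum stays close to the exact value for every n shown. With p = 0.2 and n = 6, the naive sum is 1.2 while the exact probability is about 0.74.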

 

[1] I’m deliberately not saying who. My point here is not to rub his nose in his mistake. This post will be online long after the particular video has been forgotten.

[2] Assuming A and B are independent. This is not always the case, and wrongly assuming independence can have disastrous consequences as I discuss here, but that’s a topic for another day.

One thought on “Small probabilities add, big ones don’t”

  1. Great explanation. I guess a way to measure for two events how “negligible” the correction term $P(A \cap B)$ is, would be $P(A \cap B \mid A \cup B)$. (Then it would perhaps also be interesting to see a 3D plot which maps $P(A)$ and $P(B)$ to $P(A \cap B \mid A \cup B)$, assuming A and B are independent.)

    Another possible measure of negligibility could be $P(A\mid B)P(B\mid A)$.
