Suppose you’ve been monitoring a rare event for a long time, and then you see your first occurrence on the Nth observation. Now what would you say about the event’s probability?
For example, suppose you’re wondering whether dogs ever have two tails. You observe thousands of dogs and never see one with two tails. But then you see a dog with two tails. Now what can you say about the probability of dogs having two tails? It’s certainly not zero.
We’ll first look at the case of 0 successes out of N trials, then look at the case of 1 success out of N trials.
If you’re observing a binary event and you’ve seen no successes out of N trials, your point estimate of the probability of your event is 0. You can’t have any confidence in the relative accuracy of your estimate: if the true probability is positive, no matter how small, then the relative error in your estimate is infinite.
But you can have a great deal of confidence in its absolute accuracy. When you’re looking for a binary event and you have not seen any instances in N trials for large N, then a 95% confidence interval for the event’s probability is approximately [0, 3/N]. This is the statistical rule of three. This is a robust estimate, one you could derive from either a frequentist or Bayesian perspective.
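One way to see where the 3 comes from, in the frequentist framing: with no successes in N trials, the largest p still consistent with the data at the 95% level solves (1 − p)^N = 0.05, so p = 1 − 0.05^(1/N) ≈ −ln(0.05)/N ≈ 2.996/N. The short Python sketch below computes that exact bound and compares it with 3/N; the function name `zero_success_upper_bound` is my own, not from any library.

```python
import math

def zero_success_upper_bound(n, alpha=0.05):
    """Largest p for which seeing 0 successes in n trials has probability >= alpha.

    Solves (1 - p)**n = alpha, i.e. p = 1 - alpha**(1/n); expm1 keeps the
    result accurate when n is large and p is tiny.
    """
    return -math.expm1(math.log(alpha) / n)

for n in (100, 1_000, 100_000):
    exact = zero_success_upper_bound(n)
    print(f"N={n:>7,}: exact bound {exact:.4e}, rule of three {3 / n:.4e}")
```

The exact bound sits slightly below 3/N because 1 − e^(−x) < x, and the gap shrinks as N grows.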
Note that the confidence interval [0, 3/N] is exceptionally narrow. When observing a moderate mix of successes and failures the width of the confidence interval is on the order of 1/√N, not 1/N.
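To make the contrast concrete, here is a rough comparison that assumes the simple normal-approximation (Wald) interval p̂ ± 1.96·√(p̂(1 − p̂)/N) as a stand-in for the "moderate mix" case, with p̂ = 1/2. The Wald interval is my choice for illustration; other standard intervals have the same 1/√N scaling when p is not near 0 or 1. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def wald_width(p_hat, n, z=Z95):
    """Width of the normal-approximation interval p_hat +/- z*sqrt(p_hat*(1-p_hat)/n)."""
    return 2.0 * z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def rule_of_three_width(n):
    """Width of the zero-success interval [0, 3/n]."""
    return 3.0 / n

for n in (100, 10_000, 1_000_000):
    mixed = wald_width(0.5, n)  # half successes, half failures
    zero = rule_of_three_width(n)
    print(f"N={n:>9,}: mixed {mixed:.2e}  zero successes {zero:.2e}  ratio {mixed / zero:,.0f}")
```

The ratio of the two widths grows like √N (about 0.65·√N here), which is why the zero-success interval is so unusually tight.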
After seeing your first success, your point estimate jumps from 0 to 1/N, an infinite relative increase. What happens to your confidence interval?
If we use Jeffreys’ beta(1/2, 1/2) prior, then the posterior distribution after seeing 1 success and N − 1 failures is a beta(3/2, N − 1/2). For large N, an approximate 95% confidence interval is
[0.1/N, 4.7/N].
So compared to the case of seeing zero successes, seeing one success makes our confidence interval about 50% wider and shifts it to the right by 0.1/N.
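The approximate endpoints 0.1/N and 4.7/N can be checked with a large-N approximation (my own route, not necessarily how the figures here were computed): when N is large, N times a beta(3/2, b) variable with b ≈ N is approximately Gamma(3/2, 1), whose CDF has the closed form erf(√t) − 2√(t/π)·e^(−t). The sketch below inverts that CDF by bisection using only Python’s standard library; `gamma_3_2_cdf` and `bisect_quantile` are hypothetical helper names.

```python
import math

def gamma_3_2_cdf(t):
    """CDF of a Gamma(shape=3/2, scale=1) random variable.

    Uses the closed form P(3/2, t) = erf(sqrt(t)) - 2*sqrt(t/pi)*exp(-t).
    """
    if t <= 0.0:
        return 0.0
    return math.erf(math.sqrt(t)) - 2.0 * math.sqrt(t / math.pi) * math.exp(-t)

def bisect_quantile(cdf, q, lo=0.0, hi=50.0, iters=100):
    """Find t with cdf(t) = q for an increasing cdf, by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For large N, N * p is approximately Gamma(3/2, 1), so dividing these
# quantiles by N gives the approximate 95% interval for p.
t_lo = bisect_quantile(gamma_3_2_cdf, 0.025)
t_hi = bisect_quantile(gamma_3_2_cdf, 0.975)
print(f"approx. 95% interval: [{t_lo:.3f}/N, {t_hi:.3f}/N]")
```

The quantiles come out near 0.108 and 4.674, so the interval is roughly [0.1/N, 4.7/N]. Its width of about 4.57/N, against 3/N for the zero-success case, is where the "about 50% wider" figure comes from.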
So if you’ve seen 100,000 dogs and only 1 had two tails, you could estimate that a 95% confidence interval for the probability of a dog having two tails is
[10⁻⁶, 4.7 × 10⁻⁵].
If we run the exact numbers, we get
[1.07 × 10⁻⁶, 4.67 × 10⁻⁵].
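For the exact figures with N = 100,000, one option is to integrate the posterior density numerically instead of relying on a large-N approximation. This sketch assumes Jeffreys’ prior and one success in N trials, so the posterior is beta(1/2 + 1, 1/2 + N − 1) = beta(3/2, N − 1/2). It substitutes p = u² to remove the square-root behavior at p = 0, applies Simpson’s rule, and bisects for the 2.5% and 97.5% points. The function names are hypothetical; if SciPy is available, `scipy.stats.beta.ppf(q, a, b)` computes these quantiles directly.

```python
import math

def one_success_cdf(x, n, steps=2000):
    """P(p <= x) for p ~ beta(3/2, n - 1/2), the Jeffreys posterior after
    one success in n trials. Intended for large n."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    a, b = 1.5, n - 0.5
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(u):
        # With p = u**2, the density times dp/du = 2u becomes
        # 2 * u**2 * (1 - u**2)**(b - 1) / B(a, b); work in logs for large b.
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp(math.log(2.0) + 2.0 * math.log(u)
                        + (b - 1.0) * math.log1p(-u * u) - log_beta)

    # Composite Simpson's rule on [0, sqrt(x)]; steps must be even.
    u_max = math.sqrt(x)
    h = u_max / steps
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def one_success_interval(n, level=0.95, iters=60):
    """Equal-tailed interval for p, found by bisection on the posterior CDF."""
    tail = (1.0 - level) / 2.0

    def quantile(target):
        # Bracket assumes large n: almost all posterior mass lies below 20/n.
        lo, hi = 0.0, min(1.0, 20.0 / n)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if one_success_cdf(mid, n) < target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return quantile(tail), quantile(1.0 - tail)

lower, upper = one_success_interval(100_000)
print(f"[{lower:.3e}, {upper:.3e}]")
```

The bisection bracket of 20/N is an assumption tuned for large N; for small N a library beta quantile function is the safer choice.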