I saw on Twitter this afternoon a paraphrase of a quote from Nassim Taleb to the effect that if you see a six-sigma event, that’s evidence that it wasn’t really a six-sigma event.

What does that mean? Six sigma means six standard deviations away from the mean of a probability distribution, sigma (σ) being the common notation for a standard deviation. Moreover, the underlying distribution is implicitly a normal (Gaussian) distribution; people don’t commonly talk about “six sigma” in the context of other distributions [1]. Here’s a table to indicate the odds against a *k*-sigma event for various *k*.

|-------+-----------------| | Sigma | Odds | |-------+-----------------| | 1 | 2 : 1 | | 2 | 21 : 1 | | 3 | 370 : 1 | | 4 | 16,000 : 1 | | 5 | 1,700,000 : 1 | | 6 | 500,000,000 : 1 | |-------+-----------------|

If you see something that according to your assumptions should happen twice in a billion tries, maybe you’ve seen something extraordinarily rare, or maybe your assumptions were wrong. Taleb’s comment suggests the latter is more likely.

## Bayes rule and Bayes factors

You could formalize this with Bayes rule. For example, suppose you’re 99% sure the thing you’re looking at has a normal distribution with variance 1, but you’re willing to concede there’s a 1% chance that what you’re looking at has a heavier-tailed distribution, say a Student *t* distribution with 10 degrees of freedom, rescaled to also have variance 1.

It’s hard to tell the two distributions apart, especially in the tails. But although both are small in the tails, the normal is *relatively* much smaller.

Now suppose you’ve seen an observation greater than 6. The Bayes factor in favor of the *t* distribution hypothesis is 272. This means that even though before seeing any data you thought the odds were 99 to 1 in favor of the data coming from a normal distribution, after seeing such a large observation you would put the odds at 272 to 1 in favor of the *t* distribution.

If you allow a small possibility that your assumption of a normal distribution is wrong (see Cromwell’s rule) then seeing an extreme event will radically change your mind. You don’t have to think the heavier-tailed distribution is equally likely, just a possibility. If you *did* think *a priori* that both possibilities were equally likely, the posterior odds for the *t* distribution would be 27,000 to 1.

In this example we’re comparing the normal distribution to a very specific and somewhat arbitrary alternative. Our alternative was just an example. You could have picked a wide variety of alternatives that would have given a qualitatively similar result, reversing your *a priori* confidence in a normal model.

By the way, a *t* distribution with 10 degrees of freedom is not a very heavy-tailed distribution. It has heavier tails than a normal for sure, but not nearly as heavy as a Cauchy, which corresponds to a *t* with only one degree of freedom. If we had used a distribution with a heavier tail, the posterior odds in favor of that distribution would have been higher.

***

[1] A six-sigma event isn’t that rare unless your probability distribution is normal. By Markov’s inequality, the probability is less than 1/36 for any distribution. The rarity of six-sigma events comes from the assumption of a normal distribution more than from the number of sigmas per se.