I saw on Twitter this afternoon a paraphrase of a quote from Nassim Taleb to the effect that if you see a six-sigma event, that’s evidence that it wasn’t really a six-sigma event.

What does that mean? Six sigma means six standard deviations away from the mean of a probability distribution, sigma (σ) being the common notation for a standard deviation. Moreover, the underlying distribution is implicitly a normal (Gaussian) distribution; people don’t commonly talk about “six sigma” in the context of other distributions [1]. Here’s a table to indicate the odds against a *k*-sigma event for various *k*.

|-------+-----------------|
| Sigma | Odds            |
|-------+-----------------|
|     1 | 2 : 1           |
|     2 | 21 : 1          |
|     3 | 370 : 1         |
|     4 | 16,000 : 1      |
|     5 | 1,700,000 : 1   |
|     6 | 500,000,000 : 1 |
|-------+-----------------|
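These odds follow directly from the normal tail probability. Here is a short sketch, using only the standard library (the function name `odds_against` is my own), that reproduces the table:

```python
from math import erfc, sqrt

def odds_against(k):
    """Odds against an observation falling more than k standard
    deviations from the mean of a normal distribution (two-sided)."""
    p = erfc(k / sqrt(2))  # P(|Z| > k) for standard normal Z
    return (1 - p) / p

for k in range(1, 7):
    print(f"{k} sigma: {odds_against(k):,.0f} : 1")
```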

If you see something that according to your assumptions should happen twice in a billion tries, maybe you’ve seen something extraordinarily rare, or maybe your assumptions were wrong. Taleb’s comment suggests the latter is more likely.

## Bayes rule and Bayes factors

You could formalize this with Bayes rule. For example, suppose you’re 99% sure the thing you’re looking at has a normal distribution with variance 1, but you’re willing to concede there’s a 1% chance that what you’re looking at has a fat-tailed distribution, say a Student *t* distribution with 10 degrees of freedom, rescaled to also have variance 1.

It’s hard to tell the two distributions apart, especially in the tails. But although both are small in the tails, the normal is *relatively* much smaller.

Now suppose you’ve seen an observation greater than 6. The Bayes factor in favor of the *t* distribution hypothesis is about 27,000. This means that even though before seeing any data you thought the odds were 99 to 1 in favor of the data coming from a normal distribution, after seeing such a large observation you would put the odds at 272 to 1 in favor of the *t* distribution.

If you allow a small possibility that your assumption of a normal distribution is wrong (see Cromwell’s rule) then seeing an extreme event will radically change your mind. You don’t have to think the fat-tailed distribution is equally likely, just a possibility. If you *did* think *a priori* that both possibilities were equally likely, the posterior odds for the *t* distribution would be 27,000 to 1.
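One way to carry out this calculation, sketched with SciPy under the stated assumptions (comparing the tail probabilities beyond 6 under each model):

```python
from math import sqrt
from scipy.stats import norm, t

df = 10
scale = sqrt((df - 2) / df)   # Var of t_df is df/(df-2); rescale to unit variance
x = 6

p_t = t.sf(x / scale, df)     # P(X > 6) under the rescaled Student t
p_norm = norm.sf(x)           # P(X > 6) under the standard normal

bayes_factor = p_t / p_norm
posterior_odds = bayes_factor / 99   # posterior odds given 99:1 prior odds against
print(bayes_factor, posterior_odds)
```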

In this example we’re comparing the normal distribution to a very specific and somewhat arbitrary alternative. Our alternative was just an example. You could have picked a wide variety of alternatives that would have given a qualitatively similar result, reversing your *a priori* confidence in a normal model.

By the way, a *t* distribution with 10 degrees of freedom is not a very fat-tailed distribution. It has fatter tails than a normal for sure, but not nearly as fat as a Cauchy, which corresponds to a *t* with only one degree of freedom. If we had used a distribution with a heavier tail, the posterior odds in favor of that distribution would have been higher.
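To see this effect, one could repeat the calculation above for several degrees of freedom (a sketch; the rescaling requires more than 2 degrees of freedom, so the Cauchy case is excluded):

```python
from math import sqrt
from scipy.stats import norm, t

def bayes_factor(df, x=6):
    """Bayes factor for 'X > x' under a unit-variance Student t
    with df degrees of freedom versus a standard normal."""
    scale = sqrt((df - 2) / df)   # unit variance requires df > 2
    return t.sf(x / scale, df) / norm.sf(x)

for df in (5, 10, 30):
    print(df, bayes_factor(df))
```

The heavier the tail (smaller df), the larger the Bayes factor.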

***

[1] A six-sigma event isn’t that rare unless your probability distribution is normal. By Chebyshev’s inequality, the probability of an observation at least six standard deviations from the mean is at most 1/36 for any distribution with finite variance. The rarity of six-sigma events comes from the assumption of a normal distribution more than from the number of sigmas per se.
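The 1/36 bound is tight, as a contrived three-point distribution shows: put mass 1/72 at each of ±6 and the rest at 0. A quick sketch with exact rational arithmetic:

```python
from fractions import Fraction

k = 6
p = Fraction(1, 2 * k * k)           # mass at each of +k and -k
dist = {-k: p, 0: 1 - 2 * p, k: p}

mean = sum(v * pr for v, pr in dist.items())
var = sum((v - mean) ** 2 * pr for v, pr in dist.items())
tail = sum(pr for v, pr in dist.items() if abs(v - mean) >= k)
print(mean, var, tail)               # mean 0, variance 1, tail exactly 1/36
```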

The Vysochanskij–Petunin inequality (https://en.m.wikipedia.org/wiki/Vysochanskij–Petunin_inequality) might cover a much wider range of cases, with little loss.

To be precise, when you talk about seeing a six-sigma event you should take into account the number of observations made. If such an event took place after a single observation, it would weigh more strongly in favor of a heavy-tailed distribution than it would if it occurred once in 1000 observations, although the latter would still be highly unlikely for a normal distribution. (Much of what we observe in daily life consists implicitly of multiple observations; for example, if we read about an exceptionally large storm on the news, we should consider it in the context of all the other locations that could have had such a storm but did not, and all of the other days we saw the news but did not read about such a storm.)
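The effect of the number of observations is easy to quantify under the normal assumption: the chance that at least one of *n* independent draws exceeds six sigma is 1 − (1 − p)ⁿ. A sketch:

```python
from math import erfc, sqrt

p = erfc(6 / sqrt(2))                 # P(|Z| > 6) for a single normal draw
for n in (1, 1_000, 1_000_000):
    at_least_one = 1 - (1 - p) ** n   # chance of any six-sigma event in n draws
    print(f"{n:>9} observations: {at_least_one:.3g}")
```

Even after a million observations the event is still improbable under a normal model, which is why seeing one remains strong evidence against that model.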

Still, it’s probably true that what seems to be a normal distribution often has heavier tails. The central limit theorem works when you are observing the sum of many small random variables, but if there are terms in the sum that are usually 0 but sometimes very large – corresponding to unexpected events that your model might not account for – you would expect to see heavier tails without much of an increase in variance.
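A back-of-the-envelope sketch of that last point (the jump size and frequency here are made-up numbers): add to a standard normal an independent term that is ±10 once per thousand draws and 0 otherwise. The variance grows only 10%, but the fourth moment, and hence the kurtosis and tail weight, grows enormously:

```python
q, c = 1e-3, 10.0   # hypothetical rare jump: +/-c with total probability q

# Y = Z + J, with Z ~ N(0,1) and J = +/-c each with probability q/2.
var_y = 1 + q * c**2                          # Var(Y) = 1 + q c^2
fourth_moment = 3 + 6 * q * c**2 + q * c**4   # E[Y^4] for the independent sum
kurtosis = fourth_moment / var_y**2           # equals 3.0 for a pure normal
print(var_y, kurtosis)
```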

Let’s not forget this classic:

“We were seeing things that were 25-standard deviation moves, several days in a row,” said David Viniar, Goldman’s chief financial officer.

Honestly, I think your point that six sigma only means what we think it means for the Gaussian was the most surprising part of this for me, and it points out how much establishing whether you’ve got fat tails matters. Ironically, I suspect Taleb’s point doesn’t work very well in the case of fat tails, which is what I generally associate him with.

David,

When people count sigmas, they implicitly refer to normal distributions. Taleb’s point is exactly that in practice tails are fatter than a normal distribution, and a six-sigma event is actually evidence that you have fat tails. Taleb tweeted a link to this post, implying it doesn’t contradict his views.

Nathan,

I’m considering one observation in this post. But the result is qualitatively the same for multiple observations.

You don’t need heavy tails for the CLT to fail in application. Even with thin-tailed distributions, central limit theorem approximations can have large, even unbounded, *relative* error away from the middle. The CLT, and its more quantitative counterpart Berry–Esseen, bounds the *absolute* error, not the relative error. And for moderate-sized probabilities, small absolute error implies small relative error. But out in the tails, the relative error could be unbounded.

I’ve seen 6 sigma events. I wasn’t surprised. The distribution probably wasn’t exactly normal, but it was extremely close out to well past 6 sigma. This was part of a big simulation with billions of trials. These days 6 sigma on the normal distribution really can happen, though you probably have to be doing something fairly strange to see it.

A real six-sigma event happens on average twice in every billion trials, so you should see a few of them when you do billions of trials.

Not directly relevant, but a lot of software uses a polynomial approximation to the Normal CDF function. If you are doing calculations at 6+ standard deviations that approximation may fail badly.

In the extreme left-hand tail the normal CDF is well approximated by PDF(x)/|x|, so you can compute tail probabilities from the PDF. In old versions of MATLAB the built-in approximation failed around −6 to −7 sigma. Some testing might be needed to determine just if/when the approximation breaks down in other software.
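As a rough check with SciPy, here is a sketch comparing the exact left-tail probability with the classical approximation φ(x)/|x| (the leading term of the Mills-ratio expansion):

```python
from scipy.stats import norm

for x in (-6, -7, -8):
    exact = norm.cdf(x)            # exact left-tail probability
    approx = norm.pdf(x) / abs(x)  # Mills-ratio approximation from the PDF
    print(x, exact, approx, approx / exact)
```

The approximation overshoots slightly (by roughly 1/x² in relative terms), which is a small price at these magnitudes.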

Hi Dr. Cook,

I really appreciate your valuable postings, you are doing a great job.

I have a question. Could you provide some info about the following:

“What are the misuses and abuses of the Central Limit Theorem in statistical practice?”

I am grateful for your help. Thanks.

One error in applications of the central limit theorem is a failure to distinguish absolute error and relative error in the tails. The CLT says that the *absolute* error in the normal approximation to an average goes to zero, but the *relative* error might diverge. Said another way, the CLT gives useful approximations in the fat middle (provided the hypotheses of the theorem hold) but maybe not in the tails. Events far out in the tails are rare, but maybe not nearly as rare as the normal approximation would suggest.
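A concrete sketch of this with SciPy: for the sum of 100 fair coin flips, the normal approximation’s absolute error far out in the tail is tiny, while its relative error is an order of magnitude:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # mean 50, standard deviation 5

k = 85
exact = binom.sf(k - 1, n, p)              # P(S >= 85), exact binomial tail
approx = norm.sf((k - 0.5 - mu) / sigma)   # normal approx., continuity corrected

print(abs(approx - exact))  # absolute error: tiny
print(approx / exact)       # relative error: off by roughly an order of magnitude
```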