In my previous post, I speculated on why heights are normally distributed, that is, why their statistical distribution is very nearly Gaussian. In this post I want to point out where it breaks down. I’ll look closely at an example from Elementary Statistics by Mario Triola.

At the beginning of the chapter, we noted that the United States Army requires that women’s heights be between 58 and 80 inches. Find the percentage of women satisfying that requirement. Again assume that women have heights that are normally distributed with a mean of 63.6 inches and a standard deviation of 2.5 inches.

The book gives a solution of 98.7%. That’s probably a fairly realistic result, though maybe not to three significant figures. My quibble is with one of the details along the way to the solution, not the final solution itself.
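The textbook calculation is easy to reproduce with Python's standard library (a sketch; `NormalDist` lives in the `statistics` module, Python 3.8+):

```python
from statistics import NormalDist

# Heights of women, per the textbook's model: mean 63.6 in, SD 2.5 in
women = NormalDist(mu=63.6, sigma=2.5)

# Fraction of women between 58 and 80 inches tall
p = women.cdf(80) - women.cdf(58)
print(f"{p:.1%}")  # 98.7%
```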

A height of 80 inches is 6.56 standard deviations above the mean. The probability of a normal random variable taking on a value that far from its mean is between 2 and 3 out of 100 billion. Since there are about 7 billion people on our planet, and fewer than half of these are adult women, this says it would be unlikely to ever find a woman 80 inches (6′ 8″) tall. But there are many women that tall or taller. The world record is 91 inches (7′ 7″), or about 11 standard deviations from the mean. If heights really were normally distributed, the probability of such a height would be 1.9 x 10^{-28}, or about 2 chances in 10,000,000,000,000,000,000,000,000,000. The fit is even worse in the lower tail of the distribution. The world’s shortest woman is 25.5 inches tall, 15 standard deviations below the mean.
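These tail probabilities can be checked directly. A sketch using `math.erfc`, which stays accurate in the far tail where `1 - cdf(z)` would underflow to zero from cancellation:

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal; erfc avoids the catastrophic
    cancellation that computing 1 - cdf(z) suffers in the far tail."""
    return 0.5 * erfc(z / sqrt(2))

mu, sigma = 63.6, 2.5
for height in (80, 91):
    z = (height - mu) / sigma
    print(f"{height} in: z = {z:.2f}, P = {upper_tail(z):.2e}")
```

For 80 inches this gives z = 6.56 and a probability between 2 and 3 out of 100 billion, matching the figure above.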

The normal distribution describes heights remarkably well near the mean, even a couple standard deviations on either side of the mean. But in the extremes, such as six standard deviations out, the model doesn’t fit well. The *absolute* error is small: the normal model predicts that women 80 inches tall or taller are uncommon, and indeed they are. But they are not nearly as uncommon as the model suggests. The *relative* error in the model when predicting extreme values is enormous.

The normal model often doesn’t fit well in the extremes; in particular, it often underestimates the probability of rare events. *The Black Swan* gives numerous examples of rare events that were not *as* rare as a normal distribution would predict. What might account for this poor fit?

Well, why should we expect a normal distribution to fit well in the first place? Because of the central limit theorem. This theorem says roughly that if you average a large number of independent random variables, the result has an approximately normal distribution. But there are many ways the assumptions of this theorem could fail to hold: the random variables might not be independent, they might not be identically distributed, they might have thick tails, etc. And even when the assumptions of the central limit theorem do apply, the theorem only guarantees that the *absolute* error in the normal approximation goes to zero. It says nothing about the relative error. That may be why the normal model accurately predicts what percentage of women are eligible to serve in the US Army but does not accurately predict how many women are over 6′ 8″ tall.
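One way to see this concretely (a sketch, not an example from the post): the sum of 12 Uniform(0,1) variables has mean 6 and standard deviation 1, and by the central limit theorem it looks very normal near the center. But its support ends at 6 standard deviations, so the normal tail estimate goes from mildly off to infinitely wrong as you move outward. The exact distribution of such a sum is the Irwin–Hall distribution, whose CDF has a closed form:

```python
from math import comb, erfc, factorial, sqrt

def irwin_hall_cdf(x, n):
    """Exact CDF of the sum of n independent Uniform(0,1) variables."""
    return sum((-1) ** k * comb(n, k) * (x - k) ** n
               for k in range(int(x) + 1)) / factorial(n)

def normal_upper_tail(z):
    """P(Z > z) for a standard normal."""
    return 0.5 * erfc(z / sqrt(2))

# Sum of 12 uniforms: mean 6, standard deviation 1, support [0, 12].
# Compare the exact upper tail P(S > 6 + z) with the normal approximation.
n = 12
for z in (1, 3, 5, 6):
    exact = irwin_hall_cdf(6 - z, n)  # = P(S > 6 + z) by symmetry
    approx = normal_upper_tail(z)
    print(f"z = {z}: exact {exact:.3e}, normal approx {approx:.3e}")
```

At one standard deviation the two agree to about 1%; at five the normal approximation is off by a factor of more than a hundred; at six the exact probability is zero while the normal model still reports about one in a billion.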

* * *

For daily posts on probability, follow @ProbFact on Twitter.

Hi!

You should write books! Ah… It was refreshing that you can actually explain statistical data (and some of its weaknesses) in such a clear and concise manner…

Thanks!

Sean.

P.S. People actually read this! And like it!

Did you e-mail the author of the book about including these two posts (why heights are normal, and why they are not)? 🙂

Thank you for this brief and informative post.

I’d agree that the normal distribution is a fine thing for most of the data in the middle. Yet you can still effectively capture frequency distributions with probability models when you have mostly normal data but some extreme outliers. That’s what kurtosis and skew are for!

With physical systems, things DO tend to break down at extreme points. Even though that isn’t your point here, it elicited thoughts of fault tolerances and boundary values, when processes and machines reach their tolerance points and cease functioning.

I realize that that is a different matter than the traditional normal distribution (mean = 0, stdev = 1, skew = 0, and kurtosis = 3, I think!) breaking down at extreme upper and lower intervals of the data distribution. Is there an analogy there? Not even that? More? Please explain if you have time. I know some things, but you know more. Thank you.

PS I am enjoying your blog immensely!

I think it is not statistics that fails to explain rare events, but our current understanding of nature. Extremely tall or short people do not exist because of randomness, but due to a qualitative difference from others – the diseases gigantism and dwarfism. Such cases are called outliers, so in each study we have to decide what the object of study is – the homogeneous population or the rare events. The statistics still works: taking all people with gigantism, we can observe a normal distribution of height; the sample with dwarfism also has a normal distribution.

The Central Limit Theorem does imply that the relative error of a normal approximation goes to 0, not just the absolute error. At some point the absolute error is less than 10^-4, at which point the relative error of a probability estimated to be 10^-2 isn’t terrible. However, the Central Limit Theorem is not effective, and height is not an average of IID random variables.

The Berry-Esseen Theorem and similar results give uniform bounds on the absolute error, and consequently for a particular n they give much better bounds on the relative error near the center than at the tails.
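To put a number on that (a sketch; the constant 0.4748 is Shevtsova’s bound for the i.i.d. case, an assumption added here, not something stated in the comment): for a sum of 12 Uniform(0,1) variables, the Berry–Esseen bound on the uniform absolute error works out to roughly 0.18, which is vacuous for tail probabilities on the order of 10^-9.

```python
from math import sqrt

# Berry-Esseen: sup over x of |F_n(x) - Phi(x)| <= C * rho / (sigma**3 * sqrt(n))
C = 0.4748            # Shevtsova's constant for the i.i.d. case (assumed here)
n = 12                # sum of 12 Uniform(0,1) variables
sigma = sqrt(1 / 12)  # standard deviation of Uniform(0,1)
rho = 1 / 32          # E|U - 1/2|**3, third absolute central moment of Uniform(0,1)

bound = C * rho / (sigma**3 * sqrt(n))
print(bound)  # about 0.18: uniform in x, so it says nothing about tiny tails
```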

Thanks for posting and congratulations on your ability to put this into simple, concise layman’s terms. Cheers.

I think the probabilities would be slightly bigger if you calculated them for the population of all people who have ever lived instead of the population of people alive at the moment.

So, the central limit theorem applies to the mean of a sample as compared to the mean of a population. It says that as you take larger samples, the means of the samples get closer to a normal distribution about the population mean. This is why the CLT is so useful: it doesn’t actually matter what the underlying distribution is, because if you take samples with large enough n, you can characterize how the means of those samples will be distributed. There is actually no reason at all why you would assume that heights are normally distributed.

Every child is an average of a sample of two from its parents’ genomes. Repeat this averaging over a few generations, and thanks to the central limit theorem the whole population will be normally distributed. And it helps that partner selection works in a way where opposites attract – averages on opposite sides of the mean are more likely. Of course the averaging is not perfect – it is not a true average, only a combination of two random halves of the genome.

It’s critical that height is determined by many different genetic factors. Sex is determined by one genetic factor — the presence or absence of a Y chromosome — and so at each generation we get male or female, not a smeared-out maleness spectrum that runs from negative to positive. Or like Mendel’s peas. They were smooth or wrinkled, but not a normal distribution of wrinkledness.