In my previous post, I speculated on why heights are normally distributed, that is, why their statistical distribution is very nearly Gaussian. In this post I want to point out where the normal model breaks down. I’ll look closely at an example from Elementary Statistics by Mario Triola.
At the beginning of the chapter, we noted that the United States Army requires that women’s heights be between 58 and 80 inches. Find the percentage of women satisfying that requirement. Again assume that women have heights that are normally distributed with a mean of 63.6 inches and a standard deviation of 2.5 inches.
The book gives a solution of 98.7%. That’s probably a fairly realistic result, though maybe not to three significant figures. My quibble is with one of the details along the way to the solution, not the final solution itself.
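As a quick check on the book's figure, here is a minimal sketch that computes the same percentage using only Python's standard library. The function names are my own choices for illustration; the mean, standard deviation, and height limits come from the problem statement.

```python
import math

MEAN = 63.6  # inches, from the textbook problem
SD = 2.5     # inches, from the textbook problem

def normal_cdf(x, mean=MEAN, sd=SD):
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))

def fraction_between(low, high):
    """Fraction of the normal population with height between low and high."""
    return normal_cdf(high) - normal_cdf(low)

eligible = fraction_between(58.0, 80.0)
print(f"Eligible under the normal model: {100 * eligible:.2f}%")
```

The result should round to the book's 98.7%. Almost all of the excluded women fall below the 58 inch limit; the 80 inch limit contributes next to nothing under the normal model.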
A height of 80 inches is 6.56 standard deviations away from the mean. The probability of a normal random variable taking on a value that far above its mean is between 2 and 3 out of 100 billion. Since there are about 7 billion people on our planet, and fewer than half of these are adult women, this says it would be unlikely to ever find a woman 80 inches (6′ 8″) tall. But there are many women that tall or taller. The world record is 91 inches (7′ 7″), or about 11 standard deviations from the mean. If heights really were normally distributed, the probability of a height that extreme would be 1.9 × 10^-28, or about 2 chances in 10,000,000,000,000,000,000,000,000,000. The fit is even worse in the lower tail of the distribution. The world’s shortest woman is 25.5 inches tall, about 15 standard deviations below the mean.
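The tail probabilities above can be reproduced with a short sketch like the following. It uses the complementary error function from Python's standard library, which stays accurate far out in the tail; the helper names are hypothetical, chosen just for this example.

```python
import math

MEAN = 63.6  # inches, from the textbook problem
SD = 2.5     # inches, from the textbook problem

def z_score(height):
    """Number of standard deviations a height lies from the mean."""
    return (height - MEAN) / SD

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for label, height in [("Army upper limit", 80.0), ("tallest woman", 91.0)]:
    z = z_score(height)
    print(f"{label}: {height} in, z = {z:.2f}, P(at least this tall) = {upper_tail(z):.2e}")

# Lower tail: by symmetry, P(Z < z) for negative z equals P(Z > -z).
z_short = z_score(25.5)
print(f"shortest woman: 25.5 in, z = {z_short:.2f}, P(at most this tall) = {upper_tail(-z_short):.2e}")

# The figure quoted in the text rounds the record height to 11 standard deviations.
print(f"P(Z > 11) = {upper_tail(11.0):.2e}")
```

Using the unrounded z-score for 91 inches gives a somewhat different number than rounding to 11 standard deviations, but the two agree in order of magnitude, which is all that matters for the argument.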
The normal distribution describes heights remarkably well near the mean, even a couple standard deviations on either side of the mean. But in the extremes, such as six standard deviations out, the model doesn’t fit well. The absolute error is small: the normal model predicts that women 80 inches tall or taller are uncommon, and indeed they are. But they are not nearly as uncommon as the model suggests. The relative error in the model when predicting extreme values is enormous.
The normal model often doesn’t fit well in the extremes. It often underestimates the probability of rare events. The Black Swan gives numerous examples of rare events that were not as rare as a normal distribution would predict. What might account for this poor fit?
Well, why should we expect a normal distribution to fit well in the first place? Because of the central limit theorem. This theorem says roughly that if you average a large number of independent random variables, the result has an approximately normal distribution. But there are many ways the assumptions of this theorem could fail to hold: the random variables might not be independent, they might not be identically distributed, they might have thick tails, etc. And even when the assumptions of the central limit theorem do apply, the theorem only guarantees that the absolute error in the normal approximation goes to zero. It says nothing about the relative error. That may be why the normal model accurately predicts what percentage of women are eligible to serve in the US Army but does not accurately predict how many women are over 6′ 8″ tall.
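To make the distinction between absolute and relative error concrete, here is a small sketch using an example of my own choosing, not one from the book: the sum of 100 independent exponential random variables. The central limit theorem applies to this sum (summing and averaging differ only by a scale factor), and its exact tail probability has a closed form, so the normal approximation can be compared against the truth. The choice of 100 terms and the function names are assumptions made for illustration.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def erlang_upper_tail(n, x):
    """Exact P(S > x), where S is a sum of n independent Exponential(1) variables.

    Uses the identity P(S > x) = exp(-x) * sum_{k=0}^{n-1} x^k / k!.
    """
    term = math.exp(-x)  # the k = 0 term
    total = term
    for k in range(1, n):
        term *= x / k    # turns x^(k-1)/(k-1)! into x^k/k!
        total += term
    return total

N = 100  # number of variables summed (arbitrary, for illustration)

def compare(z, n=N):
    """Exact tail, normal approximation, absolute error, and relative error
    at z standard deviations above the mean of the sum."""
    x = n + z * math.sqrt(n)  # the sum has mean n and standard deviation sqrt(n)
    exact = erlang_upper_tail(n, x)
    approx = normal_upper_tail(z)
    abs_err = abs(exact - approx)
    return exact, approx, abs_err, abs_err / exact

for z in (1, 2, 4, 6):
    exact, approx, abs_err, rel_err = compare(z)
    print(f"z = {z}: exact = {exact:.3e}, normal = {approx:.3e}, "
          f"abs error = {abs_err:.1e}, rel error = {rel_err:.1%}")
```

Near the mean the two probabilities should agree closely. Six standard deviations out, the absolute error is tiny because both numbers are tiny, yet the normal approximation underestimates the exact tail probability by a large factor.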