Why isn’t everything normally distributed?

Adult heights follow a Gaussian, a.k.a. normal, distribution [1]. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem.

If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs.

The central limit theorem says that the sum of many independent, additive effects is approximately normally distributed [2]. Genes are more digital than analog, and do not produce independent, additive effects. For example, the effects of dominant and recessive genes act more like max and min than addition. Genes do not appear independently—if you have some genes, you’re more likely to have certain other genes—nor do they act independently—some genes determine how other genes are expressed.
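To make the max-versus-sum point concrete, here is a minimal simulation sketch. It is an illustration, not a genetic model: the 50 effects per person, the uniform(0, 1) effect sizes, and the helper names `skewness` and `simulate` are all assumptions chosen for the example. It combines the same kind of independent effects once by adding them and once by taking their maximum, then compares the skewness of the two results.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: the third standardized moment (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def simulate(combine, n_effects=50, n_people=20_000, seed=0):
    """Draw n_effects independent uniform(0, 1) effects per person and combine them."""
    rng = random.Random(seed)
    return [combine(rng.random() for _ in range(n_effects)) for _ in range(n_people)]


sums = simulate(sum)    # additive effects: the setting the CLT describes
maxes = simulate(max)   # "dominant"-style effects: only the largest one counts

print("skewness of sums: ", round(skewness(sums), 3))
print("skewness of maxes:", round(skewness(maxes), 3))
```

The sums should come out nearly symmetric, as the central limit theorem suggests, while the maxima pile up just below 1 with a long tail to the left, so their skewness is strongly negative.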

Height is influenced by environmental effects, such as nutrition, as well as genetic effects, and these environmental effects may be more additive or independent than genetic effects.

Incidentally, if effects are independent but multiplicative rather than additive, the result may be approximately log-normal rather than normal.
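The reason is that the log of a product is the sum of the logs, so the central limit theorem applies to the logarithm of the result. A rough sketch of this, assuming 50 independent factors per outcome, each uniform on (0.8, 1.2) (both numbers are arbitrary choices for illustration, as is the helper name `skewness`):

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: the third standardized moment (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(0)

# Each outcome is the product of 50 independent positive factors.
products = [
    math.prod(rng.uniform(0.8, 1.2) for _ in range(50))
    for _ in range(20_000)
]

# Taking logs turns the product into a sum of independent terms.
logs = [math.log(p) for p in products]

print("skewness of products:     ", round(skewness(products), 3))
print("skewness of log(products):", round(skewness(logs), 3))
```

The products themselves should be noticeably right-skewed, while their logarithms should be close to symmetric, which is what an approximately log-normal result looks like.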

* * *

Fine print:

[1] Men’s heights follow a normal distribution, and so do women’s. Adults not sorted by sex follow a mixture distribution as described here and so the distribution is flatter on top than a normal. It gets even more complicated when you consider that there are slightly more women than men in the world. And as with many phenomena, the normal distribution is a better description near the middle than at the extremes.

[2] There are many variations on the central limit theorem. The classical CLT requires that the random variables in the sum be identically distributed as well, though that isn’t so important here.

11 thoughts on “Why isn’t everything normally distributed?”

  1. Am I wrong to be bothered by the description of heights as being normally distributed when they cannot take on negative values? Or is 0 so many standard deviations from the mean that we would be unlikely to see one (given all the people born throughout history), even if it were possible?

  2. Dave, that’s an example of where the normal approximation breaks down in the extremes. Though as you said, 0 is many standard deviations away from the mean.

    As you travel from the mean out into the tails, the first problem you encounter is not that the probability of negative heights is over-estimated, but that the probability of extremely short and extremely tall people is under-estimated. These are rare events, so the absolute error is very small, but the relative error far out in the tail is huge. The normal approximation is perfectly adequate for, say, airlines wanting to estimate how many passengers will have to stoop when entering a plane.

  3. For those that are interested in the biological as well as the statistical details, Wood et al. (2014) is the largest genetic study of height thus far (N = 253,288). The main take home message is that “[t]he results are consistent with a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants, located throughout the genome but clustered in both a biological and genomic manner.”

    (Disclosure: I am a middle-author on this paper.)

  4. It is a common idea that the height of human beings is fitted by a Normal distribution. However, if you think a little more about it:
    1) Height does not accept negative values, as Dave pointed out. For me, this is the primary alarm sign that the distribution is probably not Normal.

    2) I wonder why so few people have tried to fit a Log-Normal on such data as well: it works equally well. A Log-Normal distribution does not always look completely asymmetrical, especially when the mean >> sd. See, for instance, the legend of Fig. 1 in Limpert et al 2001 (http://bioscience.oxfordjournals.org/content/51/5/341.extract).

    3) Growth is essentially a multiplicative phenomenon. This has been well known since the work of Von Bertalanffy and others. Most, if not all, serious growth curves are exponential in nature for this reason.

    So, given those three arguments, why do people still believe the height of humans is a good example of a Normally-distributed variable? I don't; I tend to favour the Log-Normal distribution in this case.

  5. Just want to note that independence is a sufficient condition for the normal approximation to hold, but it is not a necessary condition. Sums (or averages) of dependent variables can be approximately normally distributed.

  6. Negative values are over 21 standard deviations from the mean, and so are astronomically unlikely, less than 10^-100. By comparison, there are about 10^80 particles in the observable universe. If that were the only inaccuracy of the normal approximation, no other probability model would fit reality so well.

  7. Personally, I don’t find it surprising that not everything is normally distributed. Why should any real phenomenon follow a theoretical limiting distribution anyway, never mind a symmetric, infinite-tailed distribution that is exact only in an unachievable limit? The surprise is that so many things _are_ sufficiently near normality for it to be useful!

  8. S Ellison: I agree. The most astonishing thing is that so many things are normal. But once you hear a justification for that via the Central Limit Theorem, the next question is “Then why isn’t everything normal?”
