Extremely small probabilities

One objection to modeling adult heights with a normal distribution is that the former is obviously positive but the latter can be negative. However, by this model negative heights are astronomically unlikely. I’ll explain below how one can take “astronomically” literally in this context.

A common model says that men’s and women’s heights are normally distributed with means of 70 and 64 inches respectively, both with a standard deviation of 3 inches. A woman with negative height would be 21.33 standard deviations below the mean, and a man with negative height would be 23.33 standard deviations below the mean. These events have probability 3 × 10^-101 and 10^-120 respectively. Written out in full, the first is a decimal point followed by 100 zeros and then a 3; the second is a decimal point followed by 119 zeros and then a 1.
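The figures above can be checked with a minimal Python sketch. It uses only the standard library, writing the normal CDF in terms of `math.erfc`, and the helper names (`normal_cdf`, `negative_height`, `written_out`) are made up for illustration. Note that it uses the unrounded z-scores 64/3 and 70/3, so the leading digits can differ slightly from values computed at 21.33 and 23.33.

```python
import math

SD = 3.0                                    # standard deviation in inches, from the text
MEAN_HEIGHT = {"women": 64.0, "men": 70.0}  # means in inches, from the text

def normal_cdf(z):
    """Standard normal CDF, written with erfc so tiny tails do not round to 0."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def negative_height(mean, sd=SD):
    """Return (z, probability) for a height below zero under the normal model."""
    z = (0.0 - mean) / sd
    return z, normal_cdf(z)

def written_out(p):
    """Round p to one significant digit and spell it out as a plain decimal.

    Assumes 0 < p < 0.1, which is all this illustration needs.
    """
    mantissa, exponent = f"{p:.0e}".split("e")
    return "0." + "0" * (-int(exponent) - 1) + mantissa

for group, mean in MEAN_HEIGHT.items():
    z, p = negative_height(mean)
    print(f"{group}: z = {z:.2f}, p = {p:.1e}")
    print(written_out(p))
```

The one-significant-digit rounding in `written_out` matches the resolution used in the text; anything finer would be false precision for a model like this.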




As I mentioned on Twitter yesterday, if you’re worried about probabilities that require scientific notation to write down, you’ve probably exceeded the resolution of your model. I imagine most probability models are good to two or three decimal places at most. When model probabilities are extremely small, factors outside the model become more important than ones inside.

According to Wolfram Alpha, there are around 10^80 atoms in the universe. So picking one particular atom at random from all atoms in the universe would be on the order of a billion trillion times more likely than running into a woman with negative height. Of course negative heights are not just unlikely, they’re impossible. As you travel from the mean out into the tails, the first problem you encounter with the normal approximation is not that the probability of negative heights is over-estimated, but that the probability of extremely short and extremely tall people is under-estimated. There exist people whose heights would be impossibly unlikely according to this normal approximation. See examples here.
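The atom comparison is a one-line division, sketched here in Python. Both inputs are the rounded figures quoted in the text (10^80 atoms, and 3 × 10^-101 for a woman with negative height); the variable names are illustrative.

```python
import math

ATOMS_IN_UNIVERSE = 1e80          # rough count quoted in the text
P_NEGATIVE_HEIGHT_WOMAN = 3e-101  # model probability quoted in the text

# Chance of picking one particular atom uniformly at random.
p_specific_atom = 1 / ATOMS_IN_UNIVERSE

# How many times more likely the atom pick is than a negative height.
ratio = p_specific_atom / P_NEGATIVE_HEIGHT_WOMAN

print(f"ratio = {ratio:.1e}")
print(f"orders of magnitude = {math.log10(ratio):.1f}")
```

The ratio comes out between 10^20 and 10^21, which is the sense in which it is "on the order of" a billion trillion (10^21).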

Probabilities such as those above have no practical value, but it’s interesting to see how you’d compute them anyway. You could find the probability of a man having negative height by typing pnorm(-23.33) into R or scipy.stats.norm.cdf(-23.33) into Python. Without relying on such software, you could use the bounds

\frac{x}{\sqrt{2\pi}(x^2 + 1)} \exp(-x^2/2) < \Phi^c(x) < \frac{1}{\sqrt{2\pi}\,x} \exp(-x^2/2)

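with x equal to 21.33 and 23.33. The bounds apply to the upper tail Φ^c(x) for x > 0, and by symmetry Φ(-x) = Φ^c(x), so they give the lower-tail probabilities at -21.33 and -23.33. For a proof of these bounds and tighter bounds see these notes.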
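As a rough check, the bounds can be evaluated directly in Python and compared with a library value. This sketch takes x positive (21.33 and 23.33), since the bounds apply to the upper tail and Φ(-x) = Φ^c(x) by symmetry. It uses `math.erfc` from the standard library in place of `pnorm` or `scipy.stats.norm.cdf`, and the function names are illustrative.

```python
import math

def tail_bounds(x):
    """Return (lower, upper) bounds on the upper tail P(Z > x) for x > 0."""
    if x <= 0:
        raise ValueError("the bounds hold only for x > 0")
    # Standard normal density at x.
    density = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    lower = density * x / (x * x + 1)
    upper = density / x
    return lower, upper

def upper_tail(x):
    """Upper tail P(Z > x) of the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for x in (21.33, 23.33):
    lower, upper = tail_bounds(x)
    print(f"x = {x}: {lower:.4e} < {upper_tail(x):.4e} < {upper:.4e}")
```

The ratio of the upper bound to the lower bound is 1 + 1/x^2, so this far out in the tail the two bounds agree to within about a quarter of a percent, more than enough to pin down the order of magnitude.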

6 thoughts on “Extremely small probabilities”

  1. John,

    People seem to have problems with understanding really big or small numbers. For example, I once came across a claim that the probability of a tied presidential election is 10^-92. This is ridiculous, as can be seen, for example, by a very simple calculation such as: most elections are within 10 million votes, so the probability of an exact tie (unconditional on details of the election) has to be of order 1 in 10 million, or 10^-7. This calculation can be made more historically accurate and can account for the electoral college and the possibility of recounts (indeed, my colleagues and I have published a few papers on the topic). The point is that people just aren’t used to thinking about this sort of number. It would be like saying that Chicago is far from New York, so let’s call it 10^80 miles away!

    As Bill James once wrote, the problem is that people tend to treat numbers as words rather than as numbers.

    Sometimes this is ok (for example, we have a sense of what it feels like when it’s 20 degrees outside, or 30 degrees, or 40 degrees, and for those purposes the numbers do not need to be treated as numbers). Indeed, in the weather example, people are probably better off thinking of these degrees as words, otherwise they might say that 40 degrees is twice as hot as 20 degrees, or that 20 degrees plus 40 degrees equals 60 degrees (as in an example from one of Feynman’s books).

    In other cases, though, a number is a number and can be treated as such. Which reminds me: One of the problems of p-values is that they take something numerical (a z-score: how many standard errors the estimate is from zero) and nonlinearly transform it into what are a set of words (“significant,” “highly significant,” “marginally significant,” etc.). And you can’t really do much with a p-value. A p-value of .01 does not represent 5 times the evidence or 5 times the effect of a p-value of .05. (The Bayes factor doesn’t work that way either.)

  2. Richard: Indeed something to be proud of. That kind of accuracy is possible in physics (at great effort) but not in anything biological.

  3. John, you pointed out one of the biggest issues I have to face daily: the (almost) total unawareness of the meaning of numbers. Orders of magnitude are not understood, especially when we deal with such outliers… Thanks for sharing!

  4. Agustin F. CORREA

    Small probabilities become important when you perform size transformations.
    When you count particles, you have a distribution of number vs. particle diameter. If you want to obtain the distribution in volume, it is almost mandatory to work with a bounded distribution or with intrinsically bounded distributions, either beta or Kumaraswamy distributions.
    Scientists who work in crystallization used to apply the tanh-bounded distribution (Lakatos, Bickle); in copper lixiviation, Herbst applied the versatile gamma distribution without bounds.
    In this kind of transformation we have to avoid the Greenland-Africa effect.
    When we transform number to volume, upper-bounded distributions are necessary, and when we transform volume to number, lower-bounded distributions should be used.
