Bad normal approximation

Sometimes you can approximate a binomial distribution with a normal distribution. Under the right conditions, a Binomial(n, p) random variable is approximately normally distributed with the same mean and variance, i.e. mean np and variance np(1-p). The approximation works best when n is large and p is near 1/2.
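To illustrate the favorable case, here is a quick sketch (pure Python, standard library only; the n = 100, p = 1/2 values are just illustrative) comparing an exact binomial CDF against the matching normal CDF:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    # Exact binomial CDF: P(X <= k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    # Normal CDF computed via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 100, 0.5                      # large n, p near 1/2: the favorable case
mu, sigma = n * p, sqrt(n * p * (1 - p))

k = 55
exact = binom_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mu, sigma)   # continuity correction
print(exact, approx)                      # the two values agree closely
```

Near the center of a distribution like this, the exact and approximate probabilities agree to several decimal places.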

This afternoon I was reading a paper that used a normal approximation to a binomial when n was around 10 and p around 0.001.  The relative error was enormous. The paper used the approximation to find an analytical expression for something else and the error propagated.

A common rule of thumb is that the normal approximation works well when np > 5 and n(1-p) > 5.  This says that the closer p is to 0 or 1, the larger n needs to be. In this case p was very small, but n was not large enough to compensate since np was on the order of 0.01, far less than 5.
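To see how badly things go wrong in the regime described above, here is a sketch with n = 10 and p = 0.001 (illustrative values, roughly matching the paper's setting), computing P(X ≥ 1) exactly and via the normal approximation:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # Normal CDF computed via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 10, 0.001                # roughly the paper's regime
mu, sigma = n * p, sqrt(n * p * (1 - p))

print(n * p, n * (1 - p))       # np = 0.01, far below the np > 5 rule of thumb

exact = 1 - (1 - p)**n                       # P(X >= 1), computed exactly
approx = 1 - normal_cdf(0.5, mu, sigma)      # normal approx, continuity-corrected
print(exact, approx)
```

The exact probability is about 0.01, while the normal approximation returns something on the order of 10⁻⁷: off by more than four orders of magnitude.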

Another rule of thumb is that normal approximations in general hold well near the center of the distribution but not in the tails. In particular the relative error in the tails can be unbounded. This paper was looking out toward the tails, and relative error mattered.
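The growth of relative error in the tails shows up even in the favorable n = 100, p = 1/2 case from the sketch above. Comparing exact and approximate upper-tail probabilities at a few points (the cutoffs 55, 65, 75 are arbitrary illustrative choices):

```python
from math import comb, erf, sqrt

def binom_sf(k, n, p):
    # Exact upper-tail probability P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_sf(x, mu, sigma):
    # Normal upper-tail probability via the error function
    return 0.5 * (1 - erf((x - mu) / (sigma * sqrt(2))))

n, p = 100, 0.5                     # rule-of-thumb conditions comfortably met
mu, sigma = n * p, sqrt(n * p * (1 - p))

rel_errors = []
for k in (55, 65, 75):
    exact = binom_sf(k, n, p)
    approx = normal_sf(k - 0.5, mu, sigma)   # continuity correction
    rel_errors.append(abs(approx - exact) / exact)
    print(k, exact, approx)
```

The relative error climbs steadily as the cutoff moves out into the tail, from a fraction of a percent near the center to tens of percent far out, even though np and n(1-p) are both 50.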

For more details, see these notes on the normal approximation to the binomial.

2 comments on “Bad normal approximation”
  1. I think this is why many A/B testing tools will tell you they have significant results very early on, then the significance will disappear, and often return later. Obviously part of this is because these significance tests aren’t intended for repeated measurements. But I think it’s also because a z-test is so convenient that you don’t want to have to switch to a t-test some of the time.

  2. katastrofa says:

    Your post reminded me of the “proxy integration” approximation used for CDO pricing, which did exactly that: used the normal distribution to approximate a quasi-binomial [1] one in the tails.

    [1] Quasi-binomial, because it counts the number of successes in n trials with a different (but small) success probability for each trial.