Sometimes you can derive a probability distribution from a list of properties it must have. For example, there are several properties that lead inevitably to the normal distribution or the Poisson distribution.

Although such derivations are attractive, they don’t apply that often, and they’re suspect when they do apply. There’s often some effect that keeps the prerequisite conditions from being satisfied in practice, so the derivation doesn’t lead to the right result.

The Poisson may be the best example of this. It’s easy to argue that certain count data have a Poisson distribution, and yet empirically the Poisson doesn’t fit so well because, for example, you have a mixture of two populations with different rates rather than one homogeneous population. (Sums of independent Poisson random variables have a Poisson distribution. Mixtures of Poisson distributions with different rates don’t.)
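Here is a rough numerical sketch of that parenthetical remark. The rates `lam1`, `lam2` and the mixing weight `w` are made-up values for illustration, and the helper names are hypothetical. A Poisson distribution has variance equal to its mean, so comparing the two is a quick check on whether a Poisson model is plausible.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with rate lam (computed in log space)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def mean_and_variance(pmf, kmax=100):
    """Mean and variance of a count distribution, truncating the sums at kmax."""
    mean = sum(k * pmf(k) for k in range(kmax))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(kmax))
    return mean, var

# Hypothetical rates and mixing weight, chosen only for illustration.
lam1, lam2, w = 2.0, 10.0, 0.5

# Sum of two independent Poisson counts: Poisson with rate lam1 + lam2.
sum_mean, sum_var = mean_and_variance(lambda k: poisson_pmf(k, lam1 + lam2))

# Mixture: each count comes from population 1 with probability w, else population 2.
mix_mean, mix_var = mean_and_variance(
    lambda k: w * poisson_pmf(k, lam1) + (1 - w) * poisson_pmf(k, lam2)
)

print("sum:     mean", round(sum_mean, 4), "variance", round(sum_var, 4))
print("mixture: mean", round(mix_mean, 4), "variance", round(mix_var, 4))
```

For the sum, mean and variance agree. For the mixture, the variance exceeds the mean by `w * (1 - w) * (lam1 - lam2) ** 2`, which is the overdispersion that often shows up in real count data.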

The best scenario is when a theoretical derivation agrees with empirical analysis. Theory suggests the distribution should be X, and our analysis confirms that. Hurray! The theoretical and empirical strengthen each other’s claims.

Theoretical derivations can be useful even when they disagree with empirical analysis. The theoretical distribution forms a sort of baseline, and you can focus on how the data deviate from that baseline.


To quote myself, “If spherical horses arrived according to an exponential distribution, physicists and queueing theorists would rule the world.”

“There’s often some effect that keeps the prerequisite conditions from being satisfied in practice, so the derivation doesn’t lead to the right result.” John, I appreciate your post, but this seems to conflate deriving a statistical distribution from some desiderata with fitting a distribution to real-world data. Is it correct to say that most of the named statistical distributions were “derived” by listing desiderata and arriving at a PDF through pondering and trial and error, and that only in hindsight was the distribution named and defined as such? If so, a “derivation” would be pointless, because you don’t really derive something that is true by definition.

I wouldn’t say “most” distributions were derived from a list of desiderata, or at least not directly.

@James, I would say that most distributions are derived as functions of (or generalizations of) simpler distributions. From the simplest of all, the Bernoulli yes/no distribution, the binomial, geometric, and negative binomial arise in obvious ways as you iterate. From the exponential, you get the Erlang, which generalizes to the Gamma. Etc.

The exponential, though, is what you get if you ask “what kind of lifetime distribution would have the same expected remaining lifetime regardless of current age?”. That’s the kind of theoretical derivation I took John to be talking about. (And it’s a good example of something that doesn’t happen nearly as often in real life as would be convenient…) There aren’t many distributions that arise from first principles that way, so far as I know.
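A small numerical sketch of that property, under assumed parameters: the expected remaining lifetime at a given age is the integral of the survival function from that age onward, divided by the survival function at that age. The rate, the Weibull comparison, and the function names are illustrative choices, not anything from the post.

```python
import math

def mean_residual_life(survival, age, horizon=60.0, steps=60000):
    """Approximate E[X - age | X > age] = (integral of S(t), t >= age) / S(age).

    The integral is truncated at age + horizon and computed by the trapezoid rule.
    """
    h = horizon / steps
    total = 0.5 * (survival(age) + survival(age + horizon))
    total += sum(survival(age + i * h) for i in range(1, steps))
    return total * h / survival(age)

rate = 0.5  # hypothetical failure rate, so the mean lifetime is 1 / rate = 2

def exponential_survival(t):
    return math.exp(-rate * t)

def weibull_survival(t):
    # Weibull with shape 2 and scale 2: the hazard grows with age (wear-out).
    return math.exp(-((t / 2.0) ** 2))

ages = (0.0, 1.0, 5.0)
exp_mrl = [mean_residual_life(exponential_survival, a) for a in ages]
wei_mrl = [mean_residual_life(weibull_survival, a) for a in ages]

for a, e, w in zip(ages, exp_mrl, wei_mrl):
    print(f"age {a}: exponential {e:.4f}, Weibull {w:.4f}")
```

The exponential column stays at `1 / rate` for every age, while the Weibull column shrinks as the item gets older, which is closer to how most real lifetimes behave.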

The normal distribution has many characterisations, some well known and some less so. For example, let X1, X2, …, Xn be iid with zero mean and finite variance. Then X1 must be normal if any one of these holds:

- X1 + X2 and X1 - X2 are independent
- (X1 + X2 + … + Xn)/sqrt(n) has the same distribution as X1
- the sample mean and the sample standard deviation are independent
- the distribution of X1 has maximal entropy among distributions on (-inf, inf) with the same mean and variance

etc.

But of course, the Gaussian distribution was not originally derived to satisfy any of these. Instead, it was derived as the limiting distribution of (number of heads - number of tails)/sqrt(number of flips) as the number of flips goes to infinity. That was a surprising result of de Moivre in 1733.

I would say it was definitely the result of a derivation, but of course not a derivation based on basic principles or desired features.
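The coin-flip limit can be checked with exact arithmetic rather than simulation. The sketch below, with hypothetical function names, computes the exact distribution function of (heads - tails)/sqrt(flips) for a fair coin and measures its largest gap from the standard normal distribution function over a grid of points.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coin_cdf(n, x):
    """Exact P((heads - tails) / sqrt(n) <= x) for n fair coin flips."""
    # heads - tails = 2*h - n, so the event is h <= (n + x*sqrt(n)) / 2.
    h_max = math.floor((n + x * math.sqrt(n)) / 2)
    h_max = max(-1, min(n, h_max))
    return sum(math.comb(n, h) for h in range(h_max + 1)) / 2**n

def max_gap(n):
    """Largest |exact CDF - normal CDF| over the grid -3.0, -2.9, ..., 3.0."""
    grid = [i / 10 for i in range(-30, 31)]
    return max(abs(coin_cdf(n, x) - normal_cdf(x)) for x in grid)

gaps = {n: max_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, "flips: max gap", round(gap, 4))
```

The gap shrinks as the number of flips grows. No continuity correction is applied, so part of the remaining gap comes from comparing a step function with a smooth curve.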

That’s interesting. On several occasions, my colleagues in the biology department have come to me asking for reassurance that their data are normally distributed before applying some parametric test. My sense, after reading a few papers on normality tests, was that testing for normality is a touchy subject indeed. My colleagues, as domain experts in biology, were probably better equipped to say whether their uncertainty about the r.v. of interest was best characterized by a normal distribution, on the grounds that it is the sum of many independent events. The prior probability of the data being normally distributed in biological science is in many cases so high that a normality test would probably not be instructive: 5% of the time you’d have scientists panicking about applying a t-test for little reason.