Naming probability functions

Given a random variable X, you often want to compute the probability that X will take on a value less than x or greater than x. Define the functions

F_X(x) = Prob(X ≤ x)

and

G_X(x) = Prob(X > x)

What do you call F and G? I tend to call them the CDF (cumulative distribution function) and CCDF (complementary cumulative distribution function) but conventions vary.

The names of software functions to compute these two functions can be confusing. For example, Python (SciPy) uses the names cdf and sf (the latter for “survival function”) while the R functions to compute the CDF take an optional argument to return the CCDF instead [1].

In the Emacs calculator, the function ltpn computes the CDF. At first glace I thought this was horribly cryptic. It’s actually a very nice naming convention; it just wasn’t what I was expecting.

The “ltp” stands for lower tail probability and “n” stands for normal. The complementary probability function is utpn where “utp” stands for upper tail probability. Unlike other software libraries, Emacs gives symmetric names to these two symmetrically related functions.

“Lower tail” probability is clearer than “cumulative” probability because it leaves no doubt whether you’re accumulating from the left or the right.

You can replace the “n” at the end of ltpn and utpn with the first letters of binomial, chi-square, t, F, and Poisson to get the corresponding functions for these distributions. For example, utpt gives the upper tail probability for the Student t distribution [2].

The Emacs calculator can be quirky, but props to the developers for choosing good names for the probability functions.

Probability functions in R and in SciPy
Three surprises with bc
Embedding code in org-mode

[1] Incidentally, the CCDF cannot always be computed by simply computing the CDF first and subtracting the value from 1. In theory this is possible, but not in floating point practice. See the discussion of erf and erfc in this post for an explanation.

[2] These names are very short and only a handful of distribution families are supported. But that’s fine in context. The reason to use the Emacs calculator is to do a quick calculation without having to leave Emacs, not to develop production quality statistical software.

Related posts