Probability distribution parameterizations in SciPy

Parameterizations are the bane of statistical software. One of the most common errors is to assume that one software package uses the same parameterization as another package. For example, some packages specify the exponential distribution in terms of the mean but others use the rate.

Python’s SciPy library has a somewhat unusual approach to parameterization with some advantages. SciPy makes every continuous distribution a location-scale family, even those distributions that typically do not have a location or scale parameter. This eliminates, for example, the question of whether an exponential distribution is parameterized by its mean or its rate. There is no mean or rate parameter per se. But there is a scale parameter, which happens to also be the mean.
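Here is a minimal sketch of what that looks like in practice, assuming a reasonably recent SciPy. An exponential distribution with mean 2 (rate 1/2) is specified through scale alone. If you are translating from a package that uses the rate, pass the reciprocal of the rate as the scale. The variable names and the value 2 are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Exponential distribution with mean 2, equivalently rate 1/2.
mean = 2.0
rate = 1.0 / mean

# SciPy has no mean or rate argument; the scale parameter is the mean.
dist = stats.expon(scale=mean)

# Coming from a rate parameterization, pass scale = 1/rate instead.
from_rate = stats.expon(scale=1.0 / rate)

# For the exponential, the mean and the standard deviation both equal the scale.
print(dist.mean(), dist.std())

# The density agrees with exp(-x/mean) / mean on x >= 0.
x = np.linspace(0.0, 10.0, 6)
assert np.allclose(dist.pdf(x), np.exp(-x / mean) / mean)
assert np.allclose(from_rate.pdf(x), dist.pdf(x))
```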

Some methods on distribution classes have unusual names. For example, the inverse CDF, often called the quantile function, is ppf for “percent point function.” The complementary CDF, or CCDF, is called sf for “survival function.” (Survival function is not an unusual name, though my preference would have been ccdf since that would make the API more symmetric.)
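A quick sketch of these methods, using the standard normal as an arbitrary example: ppf undoes cdf, and sf equals 1 - cdf. SciPy also provides isf, the inverse of sf. One practical reason to call sf rather than computing 1 - cdf yourself is precision far in the upper tail.

```python
import numpy as np
from scipy import stats

dist = stats.norm()  # standard normal: loc=0, scale=1

# ppf is the inverse CDF (quantile function).
p = np.array([0.025, 0.5, 0.975])
q = dist.ppf(p)  # roughly -1.96, 0.0, 1.96
assert np.allclose(dist.cdf(q), p)

# sf is the complementary CDF, 1 - cdf, and isf inverts it.
x = np.array([-1.0, 0.0, 2.0])
assert np.allclose(dist.sf(x), 1.0 - dist.cdf(x))
assert np.allclose(dist.isf(dist.sf(x)), x)

# Far in the upper tail, 1 - cdf loses all precision while sf stays accurate.
print(1.0 - dist.cdf(10.0), dist.sf(10.0))
```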

Discrete distributions in SciPy do not have a scale parameter, though they do have a location parameter. Also, instead of a pdf method, the discrete distributions have a pmf method: continuous distributions have a probability density function, but discrete distributions have a probability mass function.
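A minimal sketch with the Poisson distribution, chosen arbitrarily: the probabilities come from pmf, the mean is the shape parameter mu, and loc shifts the support by a whole number. There is no scale to pass.

```python
import numpy as np
from scipy import stats
from scipy.special import factorial

# Poisson with mean 3; discrete distributions take shape parameters and loc only.
dist = stats.poisson(mu=3)

# pmf, not pdf, gives P(X = k).
k = np.arange(0, 6)
probs = dist.pmf(k)
assert np.allclose(probs, np.exp(-3.0) * 3.0**k / factorial(k))

# loc shifts the support: the shifted variable takes values 2, 3, 4, ...
shifted = stats.poisson(mu=3, loc=2)
assert np.allclose(shifted.pmf(k + 2), probs)
assert shifted.pmf(1) == 0.0
```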

One surprise with SciPy distributions is that the SciPy implementation of the lognormal distribution does not correspond to the definition I’m more familiar with unless the location is 0. In order to be consistent with other continuous distributions, SciPy shifts the PDF argument x whereas I believe it is more common to shift log(x). This isn’t just a difference in parameterization. It actually amounts to different distributions.
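To make the difference concrete, here is a sketch assuming the common definition in which log(X) is normal with mean mu and standard deviation sigma. With loc left at 0, SciPy's lognorm reproduces that density when the shape s is sigma and the scale is exp(mu), so shifting log(x) by some c corresponds to multiplying SciPy's scale by exp(c), not to changing loc. A nonzero loc instead slides the whole distribution along the x axis, so its support starts at loc rather than 0. The values of mu, sigma, c and x are arbitrary.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.8  # parameters of the underlying normal: log(X) ~ N(mu, sigma^2)
x = np.array([0.5, 1.0, 2.0, 4.0])

# The usual lognormal density, written directly in terms of log(x).
usual = stats.norm.pdf(np.log(x), loc=mu, scale=sigma) / x

# SciPy equivalent: shape s = sigma, scale = exp(mu), loc = 0.
scipy_ln = stats.lognorm(s=sigma, scale=np.exp(mu))
assert np.allclose(scipy_ln.pdf(x), usual)

# Shifting log(x) by c is a change of SciPy's scale, not its loc.
c = 0.3
log_shifted = stats.lognorm(s=sigma, scale=np.exp(mu + c))
assert np.allclose(log_shifted.pdf(x),
                   stats.norm.pdf(np.log(x), loc=mu + c, scale=sigma) / x)

# A nonzero SciPy loc shifts x itself: the support becomes (loc, infinity).
shifted = stats.lognorm(s=sigma, loc=1.0, scale=np.exp(mu))
assert shifted.pdf(0.9) == 0.0
assert np.allclose(shifted.pdf(x + 1.0), usual)
```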

For more details, see these notes on distributions in SciPy. See also these notes on distributions in R and in Mathematica for comparison.

4 thoughts on “Probability distribution parameterizations in SciPy”

Brent Woodruff

3 February 2010 at 15:12

Gotta agree with you here that it’s an unconventional approach. It wouldn’t be so bad if someone would do a page like the NumPy for Matlab Users page, but make it “SciPy for R and Matlab users”.

I would actually love to see someone do this. It’d be pinned up at my desk in no time and would no doubt save me from reading help() all day.

Holger

4 February 2010 at 02:31

I think there is a small typo in your first paragraph: to my understanding the mean is the location, not the scale of every exponential family. At least it is so for the Gaussian. The scale corresponds to the variance.

John

4 February 2010 at 06:00

Holger: If you write the exponential PDF as exp(-x/mu) / mu, mu is the mean and the scale. I suppose what’s confusing is that for the exponential, the mean is also the standard deviation.

Holger

4 February 2010 at 06:50

John: that clears it up. Thanks for the clarification.
