Parameterizations are the bane of statistical software. One of the most common errors is to assume that one software package uses the same parameterization as another package. For example, some packages specify the exponential distribution in terms of the mean but others use the rate.
Python’s SciPy library takes a somewhat unusual approach to parameterization, one with some advantages. SciPy makes every continuous distribution a location-scale family, even those distributions that typically do not have a location or scale parameter. This eliminates, for example, the question of whether an exponential distribution is parameterized by its mean or its rate. There is no mean or rate parameter per se. But there is a scale parameter, which also happens to be the mean and hence the reciprocal of the rate.
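Here is a minimal sketch of what that looks like for the exponential distribution. The rate value is made up for illustration; the point is only that whichever convention you start from, you convert it to a scale before handing it to SciPy.

```python
from scipy import stats

rate = 4.0          # hypothetical rate, chosen only for illustration
mean = 1.0 / rate   # the corresponding mean

# expon accepts only loc and scale; the scale plays the role of the mean.
dist = stats.expon(scale=mean)

print(dist.mean())    # the mean equals the scale
print(dist.pdf(0.0))  # the density at 0 equals the rate, 1/scale
```

If you are handed a rate from another package, passing it directly as the scale would silently give the wrong distribution, so the conversion step is worth making explicit.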
Some methods on distribution classes have unusual names. For example, the inverse CDF function, often called the quantile function, is ppf for “percent point function.” The complementary CDF function, or CCDF, is called sf for “survival function.” (Survival function is not an unusual name, though my preference would have been ccdf since that would make the API more symmetric.)
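As a quick illustration of these method names, here is a sketch using a standard normal distribution. The probability 0.975 is an arbitrary choice; isf, the inverse survival function, is included because it pairs with sf the way ppf pairs with cdf.

```python
from scipy import stats

dist = stats.norm(loc=0.0, scale=1.0)  # standard normal
p = 0.975                              # arbitrary probability for illustration

x = dist.ppf(p)            # quantile function (inverse CDF)
roundtrip = dist.cdf(x)    # applying the CDF recovers p
tail = dist.sf(x)          # survival function, i.e. 1 - cdf(x)
upper = dist.isf(1.0 - p)  # inverse survival function, same point as ppf(p)

print(x, roundtrip, tail, upper)
```

Calling sf directly is generally preferable to computing 1 - cdf(x) by hand when x is far out in the tail, since the subtraction can lose precision.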
Discrete distributions in SciPy do not have a scale parameter. Also, instead of a pdf method, they have a pmf method: continuous distributions have a probability density function, but discrete distributions have a probability mass function.
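A small sketch of the discrete case, using a Poisson distribution with a made-up mean and shift. Discrete distributions still accept a loc argument, which simply moves the support.

```python
from scipy import stats

mu = 3.0   # hypothetical Poisson mean
shift = 2  # hypothetical location shift

unshifted = stats.poisson(mu)
shifted = stats.poisson(mu, loc=shift)

# With loc=2 the support starts at 2 instead of 0.
print(unshifted.pmf(0))  # probability of the smallest value, exp(-mu)
print(shifted.pmf(2))    # same probability, moved to k = 2
print(shifted.pmf(1))    # below the shifted support, so zero
```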
One surprise with SciPy distributions is that the SciPy implementation of the lognormal distribution does not correspond to the definition I’m more familiar with unless the location is 0. In order to be consistent with other continuous distributions, SciPy shifts the PDF argument x whereas I believe it is more common to shift log(x). This isn’t just a difference in parameterization. It actually amounts to different distributions.
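To make the difference concrete, here is a sketch comparing a hand-written lognormal density, in which mu shifts log(x), against SciPy's lognorm. The values of mu, sigma, and x are arbitrary, and lognormal_pdf is a helper written for this example, not part of SciPy. With loc left at 0, setting scale to exp(mu) reproduces the usual definition; passing mu as loc does not.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.75  # hypothetical parameters of log(X) ~ Normal(mu, sigma^2)
x = 2.0                # arbitrary evaluation point

def lognormal_pdf(x, mu, sigma):
    """Lognormal density in which mu shifts log(x) (illustrative helper)."""
    z = (np.log(x) - mu) / sigma
    return np.exp(-0.5 * z * z) / (x * sigma * np.sqrt(2.0 * np.pi))

textbook = lognormal_pdf(x, mu, sigma)

# Same distribution in SciPy: sigma is the shape, exp(mu) is the scale.
scipy_match = stats.lognorm.pdf(x, sigma, loc=0.0, scale=np.exp(mu))

# Passing mu as loc shifts x itself, which gives a different distribution.
scipy_shifted = stats.lognorm.pdf(x, sigma, loc=mu)

print(textbook, scipy_match, scipy_shifted)
```

The first two numbers agree and the third does not, which is the sense in which a nonzero location produces a different distribution rather than a reparameterization.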