Here are some notes on how to work with probability distributions using the SciPy numerical library for Python.
Functions related to probability distributions are located in scipy.stats. The general pattern is scipy.stats.<distribution family>.<function>.
As of this writing, there are 81 supported continuous distribution families and 12 discrete distribution families. Some distributions have obvious names: gamma, cauchy, t, f, etc. The only possible surprise is that all distribution names begin with a lower-case letter, even those corresponding to a proper name (e.g. Cauchy). Other distribution names are less obvious: expon for the exponential, chi2 for the chi-squared distribution, etc.
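The counts above depend on the SciPy version. As a rough sketch, the families in an installed copy can be listed by scanning the scipy.stats namespace for instances of rv_continuous and rv_discrete. This assumes every family is exposed as a module-level instance of one of those two classes, and deprecated aliases may inflate the totals slightly.

```python
import warnings
from scipy import stats

continuous, discrete = [], []
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some attributes may warn on access
    for name in dir(stats):
        obj = getattr(stats, name, None)
        if isinstance(obj, stats.rv_continuous):
            continuous.append(name)
        elif isinstance(obj, stats.rv_discrete):
            discrete.append(name)

# Approximate counts for the installed SciPy version.
print(len(continuous), len(discrete))

# The names mentioned in the text are all lower case.
print("cauchy" in continuous, "expon" in continuous, "chi2" in continuous)
```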
Each distribution supports several functions. The density and cumulative distribution functions are pdf and cdf respectively. (Discrete distributions use pmf rather than pdf.) The inverse of the CDF is ppf, for “percentage point function.” I’d never heard that terminology and would have expected something like “quantile.” For example, scipy.stats.beta.cdf(0.1, 2, 3) evaluates the CDF of a beta(2, 3) random variable at 0.1.
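Here is a small sketch of these functions side by side. The beta(2, 3) call is the one from the text; the Poisson distribution with mean 4 is my own choice of discrete example, included only to show pmf.

```python
from scipy import stats

# CDF of a beta(2, 3) random variable at 0.1, as in the text.
p = stats.beta.cdf(0.1, 2, 3)

# ppf is the inverse of cdf, so this recovers 0.1 up to rounding.
x = stats.beta.ppf(p, 2, 3)

# Continuous families expose pdf; discrete families expose pmf instead.
density = stats.beta.pdf(0.1, 2, 3)
mass = stats.poisson.pmf(2, 4.0)  # P(X = 2) for a Poisson with mean 4

print(p, x, density, mass)
```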
Random values are generated using rvs, which takes an optional size argument. The size is set to 1 by default. For example, scipy.stats.norm.rvs(2, 3) generates a random sample from a normal (Gaussian) random variable with mean 2 and standard deviation 3. The function call scipy.stats.norm.rvs(2, 3, size=10) returns an array of 10 samples from the same distribution.
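The sketch below repeats those two calls and adds the random_state argument, which is not discussed above, to make the draws repeatable. The seed value 0 is arbitrary.

```python
from scipy import stats

# One draw from a normal distribution with mean 2 and standard deviation 3.
one = stats.norm.rvs(2, 3)

# Ten draws; random_state fixes the seed so the run is repeatable.
a = stats.norm.rvs(2, 3, size=10, random_state=0)

# Same call with loc and scale spelled out as keywords.
b = stats.norm.rvs(loc=2, scale=3, size=10, random_state=0)

print(one)
print(a.shape)
print((a == b).all())
```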
The command-line help() facility does not document the distribution parameterizations, but the external documentation does. Most distributions are parameterized in terms of location and scale. This means, for example, that the exponential distribution is parameterized in terms of its mean, not its rate. Somewhat surprisingly, the exponential distribution also has a location parameter, so scipy.stats.expon.pdf(x, 7) evaluates at x the PDF of an exponential distribution with location 7. This is not what I expected. I assumed there would be no location parameter and that the second argument, 7, would be the mean (scale). Instead, the location was set to 7 and the scale was left at its default value 1. Writing scipy.stats.expon.pdf(x, scale=7) would have given the expected result because the default location value is 0.
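To make the difference concrete, this sketch compares both calls against closed-form densities at an arbitrary point, x = 8. The rate value lam = 0.5 at the end is hypothetical and only illustrates converting a rate to a scale.

```python
import math
from scipy import stats

x = 8.0

# Second positional argument is loc: an exponential shifted to start at 7, scale 1.
shifted = stats.expon.pdf(x, 7)

# Mean 7 via the scale keyword; loc keeps its default of 0.
mean7 = stats.expon.pdf(x, scale=7)

# Closed forms for comparison: exp(-(x - loc)) and exp(-x / scale) / scale.
print(shifted, math.exp(-(x - 7)))
print(mean7, math.exp(-x / 7) / 7)

# A rate parameter lam corresponds to scale = 1 / lam.
lam = 0.5
print(stats.expon.mean(scale=1 / lam))
```

Passing scale by keyword is the safer habit, since a positional second argument is always read as the location.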
SciPy also provides constructors for objects representing random variables. For example, x = scipy.stats.norm(3, 1); x.cdf(2.7) returns the same value as scipy.stats.norm.cdf(2.7, 3, 1).
Constructing objects representing random variables encapsulates the differences between distributions in the constructors. For example, some distributions take more parameters than others and so their object constructors require more arguments. But once a distribution object is created, its PDF, for example, can be called with a single argument. This makes it easier to write code that takes a general distribution object as an argument.
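As an illustration of that last point, here is a sketch of a function that accepts any distribution object. The helper name central_interval and the gamma parameters are my own, chosen only for the example.

```python
from scipy import stats

def central_interval(dist, mass=0.95):
    """Return the interval holding the central `mass` of a distribution object."""
    tail = (1 - mass) / 2
    return dist.ppf(tail), dist.ppf(1 - tail)

# Constructors differ in their arguments; the resulting objects share an interface.
normal = stats.norm(3, 1)
gamma = stats.gamma(2, scale=3)

for d in (normal, gamma):
    lo, hi = central_interval(d)
    print(lo, hi, d.cdf(hi) - d.cdf(lo))

# The distribution object agrees with the function-style call from the text.
print(abs(normal.cdf(2.7) - stats.norm.cdf(2.7, 3, 1)) < 1e-12)
```

The function never needs to know how many parameters each family takes, because they were fixed when the object was constructed.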