This page summarizes how to work with univariate probability distributions using Python’s SciPy library. See also notes on working with distributions in Mathematica, Excel, and R/S-PLUS.

Probability distribution classes are located in `scipy.stats`.

The methods on continuous distribution classes are as follows.

Method | Meaning |
---|---|
`pdf` | Probability density function |
`cdf` | Cumulative distribution function |
`sf` | Survival function = complementary CDF |
`ppf` | Percent point function (i.e. CDF inverse) |
`isf` | Inverse survival function (complementary CDF inverse) |
`stats` | Mean, variance, skew, or kurtosis |
`moment` | Non-central moments |
`rvs` | Random samples |
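As a minimal sketch of the methods in the table, the snippet below calls each one on a standard normal distribution. The variable names and the evaluation points (0, 1.96, 0.975) are chosen only for illustration.

```python
from scipy import stats

# Frozen standard normal distribution (loc=0, scale=1 by default)
dist = stats.norm()

density = dist.pdf(0.0)               # density at 0
lower_tail = dist.cdf(1.96)           # P(X <= 1.96)
upper_tail = dist.sf(1.96)            # P(X > 1.96), i.e. 1 - cdf
quantile = dist.ppf(0.975)            # inverse of cdf
upper_quantile = dist.isf(0.025)      # inverse of sf
mean, var = dist.stats(moments="mv")  # mean and variance
second_moment = dist.moment(2)        # non-central moment E[X^2]
samples = dist.rvs(size=5, random_state=0)  # five random draws

print(lower_tail + upper_tail)        # the two tails sum to 1
print(quantile, upper_quantile)       # same point, reached two ways
```

Because `sf` is the complement of `cdf`, `ppf(0.975)` and `isf(0.025)` pick out the same point.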

Functions such as `pdf` and `cdf` are defined over the entire real line. For example, the beta distribution is commonly defined on the interval [0, 1]. If you ask for the `pdf` outside this interval, you simply get 0. If you ask for the `cdf` to the left of the interval you get 0, and to the right of the interval you get 1.
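The behavior outside the support can be checked directly. This is a small sketch using a beta distribution with shape parameters 2 and 3, which are arbitrary values picked for the example.

```python
from scipy import stats

# Beta distribution with illustrative shape parameters 2 and 3
dist = stats.beta(2, 3)

pdf_left = dist.pdf(-0.5)   # left of [0, 1]: density is 0
pdf_right = dist.pdf(1.5)   # right of [0, 1]: density is 0
cdf_left = dist.cdf(-0.5)   # left of [0, 1]: CDF is 0
cdf_right = dist.cdf(1.5)   # right of [0, 1]: CDF is 1

print(pdf_left, pdf_right, cdf_left, cdf_right)
```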

Distributions have a general form and a “frozen” form. The general form is stateless: you supply the distribution parameters as arguments to every call. The frozen form creates an object with the distribution parameters set. For example, you could evaluate the PDF of a normal(3, 4) distribution (mean 3, standard deviation 4) at the value 5 by

    stats.norm.pdf(5, 3, 4)

or by

    mydist = stats.norm(3, 4)
    mydist.pdf(5)

Note that the argument of the PDF, in this example 5, comes before the distribution parameters. Note also that for discrete distributions, one would call `pmf` (probability *mass* function) rather than the `pdf` (probability *density* function).
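The two forms can be compared side by side. The sketch below passes location and scale by keyword (`loc` and `scale`) to make the argument order explicit, and adds a `pmf` call on a Poisson distribution as a discrete example. The Poisson mean of 3 and the evaluation point 2 are illustrative choices.

```python
import math
from scipy import stats

# General (stateless) form: argument first, then the parameters
general = stats.norm.pdf(5, loc=3, scale=4)

# Frozen form: parameters fixed once, then reused
mydist = stats.norm(loc=3, scale=4)
frozen = mydist.pdf(5)

# Discrete distributions expose pmf instead of pdf
pmf_value = stats.poisson.pmf(2, 3.0)
expected_pmf = math.exp(-3.0) * 3.0**2 / math.factorial(2)

print(general, frozen)          # identical values
print(pmf_value, expected_pmf)  # agree up to rounding
```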

## Distributions and parameterizations

SciPy makes every continuous distribution into a location-scale family, including some distributions that typically do not have location and scale parameters. This unusual approach has its advantages. For example, the question of whether an exponential distribution is parameterized in terms of its mean or its rate goes away: there is no mean or rate parameter *per se*, only a scale parameter like every other continuous distribution.
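As a brief sketch of what this means in practice, an exponential distribution with a given rate is obtained by setting the scale to the reciprocal of the rate. The rate of 2 here is an arbitrary example value.

```python
from scipy import stats

rate = 2.0  # illustrative rate parameter

# No rate or mean argument: the scale plays the role of the mean, 1/rate
dist = stats.expon(scale=1.0 / rate)

mean = dist.mean()          # equals the scale, 1/rate
density_at_zero = dist.pdf(0.0)  # equals the rate

print(mean, density_at_zero)
```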

The table below only lists parameters in addition to location and scale.

Distribution | SciPy name | Parameters |
---|---|---|
beta | `beta` | `shape1`, `shape2` |
binomial | `binom` | `size`, `prob` |
Cauchy | `cauchy` | |
chi-squared | `chi2` | `df` |
exponential | `expon` | |
F | `f` | `df1`, `df2` |
gamma | `gamma` | `shape` |
geometric | `geom` | `p` |
hypergeometric | `hypergeom` | `M`, `n`, `N` |
inverse gamma | `invgamma` | `shape` |
log-normal | `lognorm` | `sdlog` |
logistic | `logistic` | |
negative binomial | `nbinom` | `size`, `prob` |
normal | `norm` | |
Poisson | `poisson` | `lambda` |
Student t | `t` | `df` |
uniform | `uniform` | |
Weibull | `exponweib` | `exponent`, `shape` |

The table lists the exponentiated Weibull, `exponweib`, which is a generalization of the Weibull distribution. Set the exponentiation parameter to 1 and you get the ordinary Weibull distribution. SciPy also provides the ordinary Weibull directly as `weibull_min`.
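The reduction to the ordinary Weibull can be verified numerically. The sketch below compares `exponweib` with its first parameter set to 1 against the closed-form Weibull density c·x^(c−1)·exp(−x^c). The shape value 1.5 and the grid of points are illustrative.

```python
import numpy as np
from scipy import stats

shape = 1.5                       # illustrative Weibull shape parameter
x = np.linspace(0.1, 3.0, 5)      # a few positive evaluation points

# Exponentiated Weibull with the exponentiation parameter set to 1
via_exponweib = stats.exponweib.pdf(x, 1, shape)

# Ordinary Weibull density written out by hand
closed_form = shape * x**(shape - 1) * np.exp(-x**shape)

print(np.allclose(via_exponweib, closed_form))
```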

The hypergeometric distribution gives the probability of various numbers of red balls when *N* balls are taken from an urn containing *n* red balls and *M*–*n* blue balls. Note that another popular convention uses the number of red and blue balls rather than the number of red balls and the total number of balls.
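To make the argument order concrete, here is a small sketch with an urn of 20 balls, 7 of them red, from which 12 are drawn. These counts are made up for the example. The result is checked against the usual binomial-coefficient formula.

```python
import math
from scipy import stats

M, n, N = 20, 7, 12   # total balls, red balls, balls drawn (illustrative)

dist = stats.hypergeom(M, n, N)
p_three_red = dist.pmf(3)   # probability of exactly 3 red balls

# Same probability from binomial coefficients:
# choose 3 of the n red and N - 3 of the M - n blue, out of all draws
by_hand = math.comb(n, 3) * math.comb(M - n, N - 3) / math.comb(M, N)

print(p_three_red, by_hand)
```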

Note that the shape parameter for the log-normal is the standard deviation of the log of the distribution, not the standard deviation of the distribution itself. The mean μ of the log enters through the scale parameter, as scale = exp(μ).
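A short sketch of the log-normal parameterization: with μ and σ as the mean and standard deviation of the log, SciPy takes σ as the shape and exp(μ) as the scale. The values μ = 1 and σ = 0.5 are arbitrary.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.5   # mean and sd of log(X), illustrative values

# sigma is the shape parameter; mu enters through scale = exp(mu)
dist = stats.lognorm(sigma, scale=np.exp(mu))

median = dist.median()   # exp(mu)
mean = dist.mean()       # exp(mu + sigma**2 / 2), not mu

print(median, np.exp(mu))
print(mean, np.exp(mu + sigma**2 / 2))
```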

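The PDF or PMF of a distribution is contained in the `extradoc` string. For example:

    >>> stats.poisson.extradoc
    Poisson distribution
    poisson.pmf(k, mu) = exp(-mu) * mu**k / k!
    for k >= 0

Newer SciPy releases may no longer expose `extradoc`; there, the same formula appears in the distribution's docstring, for example via `help(stats.poisson)`.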

The lognormal distribution as implemented in SciPy may not be the same as the lognormal distribution implemented elsewhere. When the location parameter is 0, `stats.lognorm` with parameter `s` corresponds to a lognormal(0, s) distribution in the usual (μ, σ) parameterization. But if the location parameter is not 0, `stats.lognorm` does not correspond to a log-normal distribution under the other definition. The difference is whether the PDF contains log(*x* - μ) or log(*x*) - μ.
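The difference between the two conventions can be seen numerically. In this sketch, passing μ as SciPy's `loc` shifts the variable, so the density involves log(*x* - μ), whereas the other convention corresponds to `scale = exp(μ)`. The values of `s`, `mu`, and `x` are illustrative.

```python
import numpy as np
from scipy import stats

s, mu = 0.5, 1.0
x = np.array([1.5, 2.0, 4.0])   # points to the right of mu

# SciPy's loc shifts the variable: the density involves log(x - mu)
scipy_loc = stats.lognorm.pdf(x, s, loc=mu)
shifted = stats.lognorm.pdf(x - mu, s)

# The other convention, log(x) - mu, is scale = exp(mu) in SciPy
other = stats.lognorm.pdf(x, s, scale=np.exp(mu))

print(np.allclose(scipy_loc, shifted))  # same density
print(np.allclose(scipy_loc, other))    # different density
```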

For more information, see the scipy.stats online documentation.
