These notes explain how to compute probabilities for common statistical distributions using Mathematica. See also notes on working with distributions in R and S-PLUS, Excel, and in Python with SciPy.

## Distribution objects

Statistical distributions are standard in Mathematica version 6. Prior to that version, you had to load either the `DiscreteDistributions` or `ContinuousDistributions` package. For example, to load the latter you would enter the following.

    <<Statistics`ContinuousDistributions`

As with everything else in Mathematica, names use Pascal case (concatenated capitalized words). The name of every distribution object ends with `Distribution`. For example, the Mathematica object representing the normal (Gaussian) distribution is `NormalDistribution`. The arguments to a distribution object constructor are the distribution parameters. (See notes below about possible problems with parameterization conventions.)
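As a small sketch of how a distribution object is used, the object can be stored in a variable and passed to other functions. The variable name `dist` is only illustrative.

```mathematica
(* A distribution object is an ordinary symbolic expression,
   so it can be assigned to a variable and reused. *)
dist = NormalDistribution[0, 1];

(* The same object can be handed to several functions. *)
{Mean[dist], Variance[dist]}   (* {0, 1} *)
```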

## Probability density function (PDF)

To calculate the PDF (probability density function) of a distribution, pass the distribution as the first argument to `PDF[]` and the PDF argument as the second argument. For example,

    PDF[ GammaDistribution[2, 3], 17.2 ]

gives the value of f_{X}(17.2) where f_{X} is the PDF of a random variable X with a gamma distribution with shape parameter 2 and scale parameter 3. For another example,

    f[x_] := PDF[ NormalDistribution[0, 1], x ]

defines a function f as the PDF of a standard normal random variable.

Note that Mathematica uses the term “PDF” for both continuous and discrete random variables. Technically, discrete distributions have probability *mass* functions rather than density functions, but Mathematica ignores this pedantic detail.
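The following sketch illustrates both points: `PDF[]` accepts a symbolic argument, and the same function returns a probability mass for a discrete distribution. The binomial parameters and the name `pmf3` are chosen only for illustration.

```mathematica
(* Symbolic argument: returns the standard normal density as a formula in x. *)
PDF[NormalDistribution[0, 1], x]

(* Discrete case: P(X = 3) for X ~ Binomial(10, 1/2).
   Exact parameters give an exact rational result. *)
pmf3 = PDF[BinomialDistribution[10, 1/2], 3]   (* 15/128 *)
```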

## Cumulative distribution function (CDF)

Mathematica computes the CDF (cumulative distribution function) of a distribution analogously to the way it computes the PDF. For example,

    g[x_] := CDF[ NormalDistribution[0, 1], x ]

defines g to be the CDF of a standard normal random variable.
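A common use of the CDF is computing the probability that a random variable falls in an interval, since P(a < X ≤ b) = F(b) − F(a). A minimal sketch, with the endpoints ±1.96 chosen for illustration:

```mathematica
normal = NormalDistribution[0, 1];

(* P(-1.96 < X <= 1.96) as a difference of two CDF values *)
prob = CDF[normal, 1.96] - CDF[normal, -1.96]   (* approximately 0.95 *)
```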

## Quantiles (inverse CDF)

To compute the quantile function, i.e. the inverse of the CDF, use the Mathematica function `Quantile[]`, analogous to the functions `PDF[]` and `CDF[]` described above.
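For instance, the following sketch finds the 97.5th percentile of a standard normal distribution and checks that the CDF maps it back to the original probability. The variable names are illustrative.

```mathematica
normal = NormalDistribution[0, 1];

(* The point q with P(X <= q) = 0.975 *)
q = Quantile[normal, 0.975]   (* approximately 1.96 *)

(* Round trip: the CDF at q recovers the probability. *)
CDF[normal, q]                (* approximately 0.975 *)
```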

## Other associated functions

You can find the mean or variance of a distribution by passing a distribution object to `Mean[]` or `Variance[]` respectively. To get a random sample, pass a distribution object to `Random[]`. To get an array of random samples, call `RandomArray[]`. In more recent versions of Mathematica (8 and later), `RandomVariate[]` serves both purposes.
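Here is a brief sketch of these functions applied to a gamma distribution. The sampling lines assume Mathematica 8 or later, where `RandomVariate[]` is available. The seed and variable names are arbitrary.

```mathematica
gamma = GammaDistribution[2, 3];   (* shape 2, scale 3 *)

Mean[gamma]       (* 6  = shape * scale *)
Variance[gamma]   (* 18 = shape * scale^2 *)

(* Assumption: Mathematica 8 or later, which provides RandomVariate. *)
SeedRandom[42];                  (* arbitrary seed, for reproducibility *)
x  = RandomVariate[gamma];       (* a single sample *)
xs = RandomVariate[gamma, 5];    (* a list of five samples *)
```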

## Distribution names

The following gives Mathematica names and parameterizations for common distributions.

Distribution | Mathematica name | Parameters |
---|---|---|
beta | `BetaDistribution` | a, b |
binomial | `BinomialDistribution` | n, p |
Cauchy | `CauchyDistribution` | location, scale |
chi-squared | `ChiSquareDistribution` | df |
exponential | `ExponentialDistribution` | rate |
F | `FRatioDistribution` | df1, df2 |
gamma | `GammaDistribution` | shape, scale |
geometric | `GeometricDistribution` | p |
hypergeometric | `HypergeometricDistribution` | n, s, total |
Laplace | `LaplaceDistribution` | mean, scale |
log-normal | `LogNormalDistribution` | meanlog, sdlog |
logistic | `LogisticDistribution` | location, scale |
negative binomial | `NegativeBinomialDistribution` | n, p |
normal | `NormalDistribution` | mean, sd |
Poisson | `PoissonDistribution` | lambda |
Student t | `StudentTDistribution` | df |
uniform | `UniformDistribution` | min, max |
Weibull | `WeibullDistribution` | shape, scale |

Note that `ChiSquareDistribution` contains the word “Square” but not “Squared.” Also, Student’s t distribution is `StudentTDistribution` and not `TDistribution`.

The Laplace distribution is also known as the double exponential distribution.

## Notes on parameterizations

You always need to verify parameterizations in statistical software to avoid unexpected results. One way to do this is to pass a distribution object to the `Mean[]` and `Variance[]` functions to see whether you get what you expect.

The exponential distribution is sometimes parameterized in terms of its mean (its scale), but Mathematica uses the rate, which is the reciprocal of the mean.
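A quick way to confirm the rate convention is to ask for the mean with a symbolic parameter. In this sketch, `lambda`, `meanWanted`, and `expDist` are illustrative names.

```mathematica
(* With a symbolic parameter, the mean shows the convention directly. *)
Mean[ExponentialDistribution[lambda]]   (* 1/lambda *)

(* To get an exponential distribution with mean 5, pass the rate 1/5. *)
meanWanted = 5;
expDist = ExponentialDistribution[1/meanWanted];
Mean[expDist]   (* 5 *)
```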

Mathematica parameterizes the gamma distribution in terms of its shape and scale. Some other packages use the shape and the rate (reciprocal of the scale).
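For the gamma distribution, the same check applies: a mean of shape times scale confirms that the second parameter is a scale. The sketch below also converts from a shape and rate convention. The numeric values and variable names are illustrative.

```mathematica
(* Symbolic check of the (shape, scale) convention *)
Mean[GammaDistribution[a, b]]       (* a b *)
Variance[GammaDistribution[a, b]]   (* a b^2 *)

(* Converting from a (shape, rate) convention: scale = 1/rate. *)
shape = 2; rate = 4;
g = GammaDistribution[shape, 1/rate];
Mean[g]   (* 1/2 = shape / rate *)
```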

There are two common parameterizations for a hypergeometric distribution. Suppose an urn has *M* red balls and *N* blue balls. You draw *n* balls at once and want to know the probability of various numbers of red balls in your sample. Some software packages parameterize the hypergeometric distribution in terms of *n*, *M*, and *N*, but Mathematica uses *n*, *M*, and the total number of balls, *M*+*N*.
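The sketch below checks this convention against the textbook formula for an urn with 5 red and 15 blue balls and a draw of 4. The variable names are illustrative. The letters *M* and *N* are avoided because `N` is a built-in Mathematica function.

```mathematica
red = 5; blue = 15; draws = 4;

(* Mathematica's arguments: sample size, number of red balls, total balls *)
hyper = HypergeometricDistribution[draws, red, red + blue];

(* Probability of exactly 2 red balls in the sample *)
p2 = PDF[hyper, 2]   (* 70/323 *)

(* The same probability from the counting formula *)
direct = Binomial[red, 2] Binomial[blue, draws - 2] / Binomial[red + blue, draws];
p2 == direct   (* True *)

Mean[hyper]    (* 1 = draws * red / (red + blue) *)
```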

If *X* has a log-normal distribution, then log(*X*) has a normal distribution. Note that the mean and standard deviation parameters are the mean and standard deviation of log(*X*), not of *X* itself. Said another way, *X* has the same distribution as exp(*Y*) where *Y* is a normal random variable with mean and standard deviation given by the parameters.
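The mean check makes the distinction concrete: with parameters 0 and 1, the mean of *X* is not 0 but exp(1/2). In this sketch, `mu`, `sigma`, and `lnMean` are illustrative names, and the exact form in which Mathematica prints the symbolic result may differ slightly.

```mathematica
(* Symbolic mean of X in terms of the parameters of log(X) *)
Mean[LogNormalDistribution[mu, sigma]]   (* E^(mu + sigma^2/2) *)

(* Parameters 0 and 1 describe log(X), so the mean of X itself is Sqrt[E]. *)
lnMean = Mean[LogNormalDistribution[0, 1]];
N[lnMean]   (* approximately 1.64872 *)
```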