# Distributions in Mathematica

These notes explain how to compute probabilities for common statistical distributions using Mathematica. See also notes on working with distributions in R and S-PLUS, Excel, and in Python with SciPy.

## Distribution objects

Statistical distributions have been built into Mathematica since version 6. Prior to that version, you had to load either the `DiscreteDistributions` or the `ContinuousDistributions` package. For example, to load the latter you would enter the following.

    <<Statistics`ContinuousDistributions`

As with everything else in Mathematica, names use Pascal case
(concatenated capitalized words). The name of every distribution object ends
with Distribution. For example, the Mathematica object representing the
normal (Gaussian) distribution is `NormalDistribution`. The arguments to a
distribution object constructor are the distribution parameters. (See notes
below about possible problems with parameterization conventions.)
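As a small sketch of the naming and constructor convention, a distribution object can be stored in a variable and passed around like any other expression. The variable names `dist`, `mu`, `sigma`, and `x` are illustrative only.

```mathematica
(* A standard normal distribution object: mean 0, standard deviation 1 *)
dist = NormalDistribution[0, 1];

(* Parameters may also be left symbolic; the PDF then comes back as a formula *)
PDF[NormalDistribution[mu, sigma], x]
```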

## Probability density function (PDF)

To calculate the PDF (probability density function) of a distribution,
pass the distribution as the first argument to `PDF[]` and the PDF argument as
the second argument. For example,

    PDF[ GammaDistribution[2, 3], 17.2 ]

gives the value of f_{X}(17.2) where f_{X} is the PDF of a
random variable X with a gamma distribution
with shape parameter 2 and scale parameter 3. For another example,

    f[x_] := PDF[ NormalDistribution[0, 1], x ]

defines a function f as the PDF of a standard normal random variable.

Note that Mathematica uses the term "PDF" for both continuous and
discrete random variables. Technically, discrete distributions have
probability *mass* functions (PMFs), but Mathematica ignores this pedantic
detail.
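To illustrate the discrete case, here is a minimal example using a binomial distribution. The parameters 10 and 1/2 are chosen only for illustration; with exact rational input, Mathematica returns an exact probability.

```mathematica
(* Probability of exactly 3 successes in 10 trials with success probability 1/2 *)
PDF[BinomialDistribution[10, 1/2], 3]   (* 15/128 *)
```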

## Cumulative distribution function (CDF)

Mathematica computes the CDF (cumulative distribution function) of a distribution analogously to the way it computes the PDF. For example,

    g[x_] := CDF[ NormalDistribution[0, 1], x ]

defines g to be the CDF of a standard normal random variable.
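A common use of the CDF is computing the probability of an interval as a difference of two CDF values. The sketch below does this for a standard normal variable; the endpoints -1 and 1 are arbitrary choices.

```mathematica
dist = NormalDistribution[0, 1];

(* P(-1 < X <= 1) = CDF at 1 minus CDF at -1, about 0.6827 *)
CDF[dist, 1.] - CDF[dist, -1.]
```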

## Quantiles (inverse CDF)

To compute the quantile function, i.e. the inverse of the CDF,
use the Mathematica function `Quantile[]`, analogous to the functions
`PDF[]` and `CDF[]` described above.
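For instance, the following sketch looks up the 97.5th percentile of a standard normal distribution and then feeds it back through the CDF as a round-trip check. The variable names are illustrative.

```mathematica
dist = NormalDistribution[0, 1];

q = Quantile[dist, 0.975]   (* about 1.96 *)
CDF[dist, q]                (* recovers 0.975 up to rounding *)
```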

## Other associated functions

You can find the mean or variance of a distribution by passing a
distribution object to `Mean[]` or `Variance[]` respectively. To get a random
sample, pass a distribution object to `Random[]`. To get an array of random
samples, call `RandomArray[]`. In Mathematica 8 and later, `RandomVariate[]`
covers both cases.
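A short sketch of these functions, assuming Mathematica 8 or later for `RandomVariate[]` (older versions use different sampling functions). The gamma parameters match the earlier PDF example.

```mathematica
dist = GammaDistribution[2, 3];   (* shape 2, scale 3 *)

Mean[dist]       (* 6, i.e. shape times scale *)
Variance[dist]   (* 18, i.e. shape times scale squared *)

(* One random draw, then a list of five draws; values differ on every run *)
RandomVariate[dist]
RandomVariate[dist, 5]
```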

## Distribution names

The following gives Mathematica names and parameterizations for common distributions.

| Distribution | Mathematica name | Parameters |
|---|---|---|
| beta | `BetaDistribution` | a, b |
| binomial | `BinomialDistribution` | n, p |
| Cauchy | `CauchyDistribution` | location, scale |
| chi-squared | `ChiSquareDistribution` | df |
| exponential | `ExponentialDistribution` | rate |
| F | `FRatioDistribution` | df1, df2 |
| gamma | `GammaDistribution` | shape, scale |
| geometric | `GeometricDistribution` | p |
| hypergeometric | `HypergeometricDistribution` | n, s, total |
| Laplace | `LaplaceDistribution` | mean, scale |
| log-normal | `LogNormalDistribution` | meanlog, sdlog |
| logistic | `LogisticDistribution` | location, scale |
| negative binomial | `NegativeBinomialDistribution` | n, p |
| normal | `NormalDistribution` | mean, sd |
| Poisson | `PoissonDistribution` | lambda |
| Student t | `StudentTDistribution` | df |
| uniform | `UniformDistribution` | min, max |
| Weibull | `WeibullDistribution` | shape, scale |

Note that `ChiSquareDistribution` contains the word "Square" but not
"Squared." Also, Student's t distribution is `StudentTDistribution` and not
`TDistribution`.

The Laplace distribution is also known as the double exponential distribution.

## Notes on parameterizations

You always need to verify parameterizations in statistical software to
avoid unexpected results. One way to do this is to pass a distribution
object to the `Mean[]` and `Variance[]` functions to see whether you get what
you expect.

The exponential distribution is sometimes parameterized in terms of its mean, but Mathematica uses the rate, which is the reciprocal of the mean (the scale).
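The mean-and-variance check makes the exponential convention easy to confirm. In this sketch the parameter 2 is an arbitrary choice; a mean of 1/2 rather than 2 shows that the argument is a rate.

```mathematica
(* If the parameter were the mean, this would return 2 *)
Mean[ExponentialDistribution[2]]       (* 1/2 *)
Variance[ExponentialDistribution[2]]   (* 1/4 *)
```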

Mathematica parameterizes the gamma distribution in terms of its shape and scale. Some other packages use the shape and the rate (reciprocal of the scale).
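For the gamma distribution, the same check can be run with symbolic parameters. The symbols `k` and `theta` below are placeholders for shape and scale; under a shape and rate convention the mean would instead be `k/theta`.

```mathematica
(* With shape k and scale theta, the mean is the product k theta *)
Mean[GammaDistribution[k, theta]]

(* The variance is k theta^2 *)
Variance[GammaDistribution[k, theta]]
```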

There are two common parameterizations for a hypergeometric distribution. Suppose an urn has M red balls and N blue balls. You draw n balls at once and want to know the probability of various numbers of red balls in your sample. Some software packages parameterize the hypergeometric distribution in terms of n, M, and N, but Mathematica uses n, M, and the total number of balls, M+N.
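As a concrete sketch of the urn description above, suppose M = 5 red balls, N = 15 blue balls, and a sample of n = 4. These numbers are made up for illustration. The third argument is the total, 20, not the number of blue balls.

```mathematica
(* Draw 4 balls from an urn with 5 red balls out of 20 in total *)
dist = HypergeometricDistribution[4, 5, 20];

Mean[dist]     (* 1, i.e. n M/(M + N) = 4*5/20 *)
PDF[dist, 2]   (* probability of exactly 2 red balls: 70/323 *)
```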

If X has a log-normal distribution, then log(X) has a normal distribution. Note that the mean and standard deviation parameters are the mean and standard deviation of log(X), not of X itself. Said another way, X has the same distribution as exp(Y) where Y is a normal random variable with mean and standard deviation given by the parameters.
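The following sketch checks this relationship numerically with parameters 0 and 1, chosen only for illustration. The mean of X is not 0, and the log-normal CDF at x agrees with the normal CDF at log(x).

```mathematica
dist = LogNormalDistribution[0, 1];

Mean[dist]     (* Sqrt[E], about 1.6487, not 0 *)
Median[dist]   (* 1, which is Exp[0] *)

(* P(X <= 2) equals P(Y <= Log[2]) for Y standard normal; the difference is zero up to rounding *)
CDF[dist, 2.] - CDF[NormalDistribution[0, 1], Log[2.]]
```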