These notes explain how to compute probabilities for common statistical distributions using Mathematica. See also notes on working with distributions in R and S-PLUS, Excel, and in Python with SciPy.
Distribution objects
Statistical distributions are standard starting with Mathematica version 6. Prior to that version, you had to load either the DiscreteDistributions or ContinuousDistributions package. For example, to load the latter you would enter the following.
<<Statistics`ContinuousDistributions`
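If a notebook has to run on both old and new versions, one option is to load the package only when it is needed. This is a minimal sketch that assumes the package name shown above; $VersionNumber and Needs[] are standard Mathematica functions.

```mathematica
(* Load the legacy package only on versions older than 6;
   from version 6 on, the distributions are built in and nothing is loaded. *)
If[$VersionNumber < 6,
  Needs["Statistics`ContinuousDistributions`"]
];

(* Either way, distribution objects can now be used. *)
PDF[NormalDistribution[0, 1], 0]
```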
As with everything else in Mathematica, names use Pascal case (concatenated capitalized words). The name of every distribution object ends with Distribution. For example, the Mathematica object representing the normal (Gaussian) distribution is NormalDistribution. The arguments to a distribution object constructor are the distribution parameters. (See the notes below about possible problems with parameterization conventions.)
Probability density function (PDF)
To calculate the PDF (probability density function) of a distribution, pass the distribution object as the first argument to PDF[] and the point at which to evaluate the density as the second argument. For example,
PDF[ GammaDistribution[2, 3], 17.2 ]
gives the value of f_X(17.2), where f_X is the PDF of a random variable X with a gamma distribution with shape parameter 2 and scale parameter 3. As another example,
f[x_] := PDF[ NormalDistribution[0, 1], x ]
defines a function f as the PDF of a standard normal random variable.
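To confirm that the second gamma argument is a scale rather than a rate, you can compare PDF[] with the textbook density x^(k-1) e^(-x/θ) / (Γ(k) θ^k). With shape k = 2 and scale θ = 3 this reduces to x e^(-x/3) / 9. The names builtIn and byHand are only illustrative.

```mathematica
(* Gamma density with shape 2 and scale 3, computed two ways *)
builtIn = PDF[GammaDistribution[2, 3], 17.2];
byHand  = 17.2 Exp[-17.2/3]/9;   (* Gamma[2] == 1 and 3^2 == 9 *)

builtIn - byHand   (* 0 up to floating point rounding *)
```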
Note that Mathematica uses the term “PDF” for both continuous and discrete random variables. Technically, discrete distributions have probability mass functions rather than probability density functions, but Mathematica ignores this pedantic detail.
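For a discrete distribution, then, PDF[] returns a probability. The sketch below uses a binomial with 10 trials and success probability 1/2; exact arguments give exact rational results. The name p3 is only illustrative.

```mathematica
(* P(X = 3) for X ~ Binomial(10, 1/2) *)
p3 = PDF[BinomialDistribution[10, 1/2], 3]   (* 15/128 *)
p3 == Binomial[10, 3]/2^10                   (* True *)

(* The probabilities over the support 0, ..., 10 sum to 1 *)
Sum[PDF[BinomialDistribution[10, 1/2], k], {k, 0, 10}]   (* 1 *)
```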
Cumulative distribution function (CDF)
Mathematica computes the CDF (cumulative distribution function) of a distribution the same way it computes the PDF. For example,
g[x_] := CDF[ NormalDistribution[0, 1], x ]
defines g to be the CDF of a standard normal random variable.
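A common use of the CDF is finding the probability of an interval, P(a < X ≤ b) = F(b) − F(a). The sketch below also checks numerically that the CDF is the integral of the PDF up to x.

```mathematica
(* Standard normal CDF, as defined above *)
g[x_] := CDF[NormalDistribution[0, 1], x]

(* Probability of landing within one standard deviation of the mean *)
g[1.] - g[-1.]   (* approximately 0.6827 *)

(* The CDF at 1.96 equals the integral of the PDF from -Infinity to 1.96 *)
NIntegrate[PDF[NormalDistribution[0, 1], t], {t, -Infinity, 1.96}] - g[1.96]
```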
Quantiles (inverse CDF)
To compute the quantile function, i.e. the inverse of the CDF, use the Mathematica function Quantile[], analogous to the functions PDF[] and CDF[] described above.
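For example, the familiar 1.96 cutoff for a two-sided 95% interval is the 0.975 quantile of the standard normal, and applying CDF[] to it recovers the probability. The names dist and z are only illustrative.

```mathematica
dist = NormalDistribution[0, 1];

(* 97.5th percentile of the standard normal *)
z = Quantile[dist, 0.975]   (* approximately 1.95996 *)

(* Quantile inverts CDF *)
CDF[dist, z]                (* approximately 0.975 *)
```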
Other associated functions
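You can find the mean or variance of a distribution by passing a distribution object to Mean[] or Variance[] respectively. To get a random sample in version 8 or later, pass a distribution object to RandomVariate[]; to get an array of random samples, add the number of samples (or a list of dimensions) as a second argument. Versions before 6 used Random[] and RandomArray[] from the statistics packages for these tasks.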
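Here is a short sketch with a beta distribution with parameters 2 and 3. It assumes version 8 or later, since it uses RandomVariate[]. The name dist is only illustrative.

```mathematica
dist = BetaDistribution[2, 3];

Mean[dist]       (* 2/5 *)
Variance[dist]   (* 1/25 *)

(* Five random draws, then a 2 x 3 array of draws *)
RandomVariate[dist, 5]
RandomVariate[dist, {2, 3}]
```

Exact arguments give exact moments, which makes this a convenient way to check a parameterization against a formula.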
Distribution names
The following gives Mathematica names and parameterizations for common distributions.
Distribution | Mathematica name | Parameters |
---|---|---|
beta | BetaDistribution | a, b |
binomial | BinomialDistribution | n, p |
Cauchy | CauchyDistribution | location, scale |
chi-squared | ChiSquareDistribution | df |
exponential | ExponentialDistribution | rate |
F | FRatioDistribution | df1, df2 |
gamma | GammaDistribution | shape, scale |
geometric | GeometricDistribution | p |
hypergeometric | HypergeometricDistribution | n, s, total |
Laplace | LaplaceDistribution | mean, scale |
log-normal | LogNormalDistribution | meanlog, sdlog |
logistic | LogisticDistribution | location, scale |
negative binomial | NegativeBinomialDistribution | n, p |
normal | NormalDistribution | mean, sd |
Poisson | PoissonDistribution | lambda |
Student t | StudentTDistribution | df |
uniform | UniformDistribution | {min, max} |
Weibull | WeibullDistribution | shape, scale |
Note that ChiSquareDistribution contains the word “Square” but not “Squared.” Also, Student’s t distribution is StudentTDistribution and not TDistribution.
The Laplace distribution is also known as the double exponential distribution.
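The name “double exponential” comes from the density e^(-|x − μ|/b) / (2b), which decays exponentially on both sides of μ. The sketch below checks PDF[] against that formula with μ = 0 and b = 2; the names laplace and byHand are only illustrative.

```mathematica
(* Laplace density with location 0 and scale 2, computed two ways *)
laplace = PDF[LaplaceDistribution[0, 2], 1.5];
byHand  = Exp[-Abs[1.5 - 0]/2]/(2*2);
laplace - byHand   (* 0 up to floating point rounding *)

(* The density is symmetric about the location parameter *)
PDF[LaplaceDistribution[0, 2], -1.5] == laplace   (* True *)
```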
Notes on parameterizations
You always need to verify parameterizations in statistical software to avoid unexpected results. One way to do this is to pass a distribution object to the Mean[] and Variance[] functions and see whether you get what you expect.
The exponential distribution is sometimes parameterized in terms of its mean, but Mathematica uses the rate, the reciprocal of the mean or scale.
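Checking the mean makes the rate convention visible: an exponential with rate λ has mean 1/λ and variance 1/λ². To get an exponential with a given mean, pass the reciprocal of that mean.

```mathematica
(* ExponentialDistribution takes a rate, so the mean is 1/rate *)
Mean[ExponentialDistribution[2]]       (* 1/2 *)
Variance[ExponentialDistribution[2]]   (* 1/4 *)

(* An exponential with mean 5 has rate 1/5 *)
Mean[ExponentialDistribution[1/5]]     (* 5 *)
```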
Mathematica parameterizes the gamma distribution in terms of its shape and scale. Some other packages use the shape and the rate (the reciprocal of the scale).
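With shape k and scale θ, a gamma distribution has mean kθ and variance kθ², and Mathematica returns these symbolically. To translate from a shape and rate parameterization, pass the reciprocal of the rate as the scale. The symbols k, theta, and rate are only illustrative.

```mathematica
(* GammaDistribution[k, theta]: shape k, scale theta *)
Mean[GammaDistribution[k, theta]]       (* k theta *)
Variance[GammaDistribution[k, theta]]   (* k theta^2 *)

(* Shape 2 and rate 4 correspond to shape 2 and scale 1/4 *)
rate = 4;
Mean[GammaDistribution[2, 1/rate]]      (* 1/2 *)
```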
There are two common parameterizations for a hypergeometric distribution. Suppose an urn has M red balls and N blue balls. You draw n balls at once and want to know the probability of various numbers of red balls in your sample. Some software packages parameterize the hypergeometric distribution in terms of n, M, and N, but Mathematica uses n, M, and the total number of balls, M+N.
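As a concrete check, take an urn with M = 5 red and N = 7 blue balls and draw n = 4. The chance of exactly 2 red balls is C(5,2) C(7,2) / C(12,4) = 14/33, and the mean number of red balls is nM/(M+N) = 5/3. The variable names below are only illustrative; the blue count is called nBlue because N is a built-in Mathematica function.

```mathematica
m = 5; nBlue = 7; n = 4;

(* Arguments: number drawn, number of red balls, total number of balls *)
dist = HypergeometricDistribution[n, m, m + nBlue];

PDF[dist, 2]   (* 14/33 *)
Mean[dist]     (* 5/3 *)
```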
If X has a log-normal distribution, then log(X) has a normal distribution. Note that the mean and standard deviation parameters are the mean and standard deviation of log(X), not of X itself. Said another way, X has the same distribution as exp(Y) where Y is a normal random variable with mean and standard deviation given by the parameters.
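One way to see the difference: if log(X) has mean μ and standard deviation σ, then X has mean exp(μ + σ²/2), so LogNormalDistribution[0, 1] has mean √e rather than 0. The sketch also checks that P(X ≤ 2) = P(Y ≤ log 2) for a standard normal Y. The name dist is only illustrative.

```mathematica
dist = LogNormalDistribution[0, 1];

(* Mean of X itself, not of Log[X] *)
Mean[dist]   (* Sqrt[E] *)

(* X <= 2 exactly when Log[X] <= Log[2] *)
CDF[dist, 2.] - CDF[NormalDistribution[0, 1], Log[2.]]   (* 0 up to rounding *)
```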