Probability distributions in R

This page summarizes how to work with univariate probability distributions in R and S-PLUS. See also notes on working with distributions in Mathematica, Excel, and in Python with SciPy.

R and S-PLUS name each distribution-related function by joining a prefix to a base. The prefixes are d, p, q, and r. The base is an abbreviated name of the distribution family, such as norm for the normal distribution, so dnorm is the normal density.

The prefix d is for density, i.e. PDF.

The prefix p is for the CDF (cumulative distribution function), unless the argument lower.tail = FALSE is supplied, in which case it becomes the CCDF (complementary CDF).

The prefix q is for the inverse CDF (the quantile function), unless the argument lower.tail = FALSE is supplied, in which case it becomes the inverse CCDF.

The prefix r is for random sample.
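The four prefixes can be tried side by side on one family. The following is a minimal sketch using the standard normal; the variable names and the evaluation point 1.5 are chosen only for illustration.

```r
# Standard normal (mean 0 and sd 1 are R's defaults for the norm family).
x <- 1.5

density_at_x <- dnorm(x)                      # PDF at x
cdf_at_x     <- pnorm(x)                      # CDF: P(X <= x)
ccdf_at_x    <- pnorm(x, lower.tail = FALSE)  # CCDF: P(X > x)

# q inverts p: feeding the CDF (or CCDF) value back recovers x.
x_from_cdf  <- qnorm(cdf_at_x)
x_from_ccdf <- qnorm(ccdf_at_x, lower.tail = FALSE)

# r draws random samples; set.seed makes the draw reproducible.
set.seed(1)
draws <- rnorm(5)
```

Using lower.tail = FALSE rather than computing 1 - pnorm(x) by hand tends to be more accurate far out in the upper tail, where the subtraction loses precision.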

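The first argument to a distribution-related function is its main argument: the point at which to evaluate the function for d and p, the probability for q, and the number of samples for r. Next come the distribution parameters, followed by other options.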


pnorm(0.77, 0, 2.1) computes F_X(0.77) where X is a normal random variable with mean 0 and standard deviation 2.1 and F_X is its CDF.

dbeta(0.7, 2.1, 3.4) computes f_X(0.7) where X is a beta random variable with parameters 2.1 and 3.4 and f_X is its PDF.

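qgamma(0.1, 3.1, 1.0, lower.tail = FALSE) finds a value y so that P(Y > y) = 0.1 where Y has a gamma distribution with shape 3.1 and rate 1. The third positional argument is the rate, not the scale; here the two coincide because the scale is the reciprocal of the rate.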
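The three calls above can be run as written, or with the parameters passed by name, which makes the intent easier to read. This sketch uses the argument names from R's documentation; the variable names are arbitrary.

```r
# CDF of a normal with mean 0 and sd 2.1 at 0.77: positional and named forms.
p1 <- pnorm(0.77, 0, 2.1)
p2 <- pnorm(0.77, mean = 0, sd = 2.1)

# PDF of a beta with parameters 2.1 and 3.4 at 0.7.
d1 <- dbeta(0.7, shape1 = 2.1, shape2 = 3.4)

# Upper-tail quantile: y such that P(Y > y) = 0.1.
y <- qgamma(0.1, shape = 3.1, rate = 1.0, lower.tail = FALSE)

# Check: the upper-tail probability at y should come back as 0.1.
tail_prob <- pgamma(y, shape = 3.1, rate = 1.0, lower.tail = FALSE)
```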

Distributions and parameterizations

Distribution        Base name   Parameters
beta                beta        shape1, shape2
binomial            binom       size, prob
Cauchy              cauchy      location, scale
chi-squared         chisq       df
exponential         exp         rate
F                   f           df1, df2
gamma               gamma       shape, rate
geometric           geom        prob
hypergeometric      hyper       m, n, k
log-normal          lnorm       meanlog, sdlog
logistic            logis       location, scale
negative binomial   nbinom      size, prob
normal              norm        mean, sd
Poisson             pois        lambda
Student t           t           df
uniform             unif        min, max
Weibull             weibull     shape, scale

Note that the exponential is parameterized in terms of the rate, the reciprocal of the mean.
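If you know the mean rather than the rate, convert before calling the function. A small sketch, where mean_wait is a hypothetical value chosen for illustration:

```r
mean_wait <- 4          # hypothetical mean, chosen for illustration
rate <- 1 / mean_wait   # R's exp family expects the rate

# P(X <= mean) for an exponential is 1 - exp(-1), whatever the mean is.
p_below_mean <- pexp(mean_wait, rate = rate)

# The median of an exponential is mean * log(2).
median_wait <- qexp(0.5, rate = rate)
```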

The gamma can be parameterized by its shape and either the rate or the scale. The rate is the default argument by position, but you can specify the scale by name.
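The difference matters whenever the scale is not 1. In this sketch the shape and scale values are hypothetical; the point is which calls agree and which does not.

```r
shape <- 3.1
scale <- 2.0   # hypothetical value for illustration

# The third positional argument is the rate, so pass 1 / scale there.
by_rate <- pgamma(1.5, shape, 1 / scale)

# The scale itself must be passed by name.
by_scale <- pgamma(1.5, shape, scale = scale)

# Passing the scale positionally silently treats it as a rate.
mistaken <- pgamma(1.5, shape, scale)
```

Here by_rate and by_scale agree, while mistaken is the CDF of a different gamma distribution and R gives no warning about it.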

The hypergeometric distribution gives the probability of various numbers of red balls when k balls are taken from an urn containing m red balls and n blue balls. Note that another popular convention uses the number of red balls and the total number of balls m+n.
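To check which convention a function follows, compare it against the counting formula. The sketch below uses hypothetical urn counts and also shows the conversion from the (red, total) convention.

```r
m <- 7   # red balls in the urn (hypothetical counts)
n <- 5   # blue balls in the urn
k <- 4   # balls drawn without replacement
x <- 2   # number of red balls among those drawn

# R's convention: dhyper(x, m, n, k).
p_r <- dhyper(x, m, n, k)

# Direct counting: ways to pick x red and k - x blue, over all ways to pick k.
p_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

# Converting from the (red, total) convention: n is total minus red.
total <- 12
p_from_total <- dhyper(x, m, total - m, k)
```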

Note that the parameters for the log-normal are the mean and standard deviation of the logarithm of the random variable, not the mean and standard deviation of the random variable itself.
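The gap between the two sets of numbers can be seen directly. In this sketch meanlog and sdlog are hypothetical values, and the formula for the mean, exp(meanlog + sdlog^2 / 2), is the standard log-normal result rather than something stated on this page.

```r
meanlog <- 0.5   # hypothetical values for illustration
sdlog   <- 0.8

# Mean and median of the log-normal variable itself.
true_mean   <- exp(meanlog + sdlog^2 / 2)
true_median <- qlnorm(0.5, meanlog, sdlog)   # equals exp(meanlog)

# log(X) is normal with mean meanlog and sd sdlog, so the CDFs agree.
p_lnorm <- plnorm(2, meanlog, sdlog)
p_norm  <- pnorm(log(2), meanlog, sdlog)
```

Neither the mean nor the median of the variable equals meanlog, which is why passing the sample mean and standard deviation of untransformed data to these functions gives wrong answers.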