This page summarizes how to work with univariate probability distributions in R and S-PLUS. See also notes on working with distributions in Mathematica, Excel, and in Python with SciPy.
R and S-PLUS use prefixes and bases to denote functions related to a distribution. The prefixes are `d`, `p`, `q`, and `r`. The base is the name of the distribution family, such as `norm` for the normal distribution.
The prefix `d` is for the density, i.e. the PDF.
The prefix `p` is for the CDF (cumulative distribution function), unless the argument `lower.tail = FALSE` is supplied, in which case it becomes the CCDF (complementary CDF).
The prefix `q` is for the inverse of the CDF, unless the argument `lower.tail = FALSE` is supplied, in which case it becomes the inverse of the CCDF.
The prefix `r` is for generating random samples.
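As a minimal sketch of how the four prefixes fit together, the following uses the standard normal distribution (the default parameters for `norm`). The point `x` and the seed are arbitrary values chosen for illustration.

```r
# Standard normal: the defaults for norm are mean = 0, sd = 1.
x <- 1.5

dens <- dnorm(x)                      # PDF at x
cdf  <- pnorm(x)                      # CDF: P(X <= x)
ccdf <- pnorm(x, lower.tail = FALSE)  # CCDF: P(X > x)

# The CDF and CCDF sum to one.
print(all.equal(cdf + ccdf, 1))

# q inverts p, with or without lower.tail = FALSE.
print(all.equal(qnorm(cdf), x))
print(all.equal(qnorm(ccdf, lower.tail = FALSE), x))

# r draws random samples; set a seed so the draw is reproducible.
set.seed(42)
samples <- rnorm(5)
print(length(samples))
```

Using `lower.tail = FALSE` rather than computing `1 - pnorm(x)` by hand tends to be more accurate far out in the tail, where the subtraction loses precision.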
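The first argument to a distribution-related function is the function's main argument: the point at which to evaluate the PDF or CDF for `d` and `p`, the probability for `q`, and the number of samples for `r`. Next come the distribution parameters, followed by other options.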
Examples
`pnorm(0.77, 0, 2.1)` computes F_X(0.77), where X is a normal random variable with mean 0 and standard deviation 2.1 and F_X is its CDF.
`dbeta(0.7, 2.1, 3.4)` computes f_X(0.7), where X is a beta random variable with parameters 2.1 and 3.4 and f_X is its PDF.
`qgamma(0.1, 3.1, 1.0, lower.tail = FALSE)` finds a value y so that P(Y > y) = 0.1, where Y has a gamma distribution with shape 3.1 and rate 1. The third positional argument is the rate; here the rate and the scale coincide because both equal 1.
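The same three calls can be written with named parameters, which makes the parameterization explicit. The sketch below compares the positional and named forms and checks the gamma quantile by feeding it back into `pgamma`. The variable names are only for illustration.

```r
# Positional and named forms of the three examples.
a1 <- pnorm(0.77, 0, 2.1)
a2 <- pnorm(0.77, mean = 0, sd = 2.1)

b1 <- dbeta(0.7, 2.1, 3.4)
b2 <- dbeta(0.7, shape1 = 2.1, shape2 = 3.4)

c1 <- qgamma(0.1, 3.1, 1.0, lower.tail = FALSE)
c2 <- qgamma(0.1, shape = 3.1, rate = 1.0, lower.tail = FALSE)

print(c(a1 == a2, b1 == b2, c1 == c2))

# Check the gamma quantile: the upper tail probability at c1 should be 0.1.
tail_prob <- pgamma(c1, shape = 3.1, rate = 1.0, lower.tail = FALSE)
print(tail_prob)
```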
Distributions and parameterizations
Distribution | Base name | Parameters |
---|---|---|
beta | `beta` | `shape1`, `shape2` |
binomial | `binom` | `size`, `prob` |
Cauchy | `cauchy` | `location`, `scale` |
chi-squared | `chisq` | `df` |
exponential | `exp` | `rate` |
F | `f` | `df1`, `df2` |
gamma | `gamma` | `shape`, `rate` |
geometric | `geom` | `prob` |
hypergeometric | `hyper` | `m`, `n`, `k` |
log-normal | `lnorm` | `meanlog`, `sdlog` |
logistic | `logis` | `location`, `scale` |
negative binomial | `nbinom` | `size`, `prob` |
normal | `norm` | `mean`, `sd` |
Poisson | `pois` | `lambda` |
Student t | `t` | `df` |
uniform | `unif` | `min`, `max` |
Weibull | `weibull` | `shape`, `scale` |
Note that the exponential is parameterized in terms of the rate, the reciprocal of the mean.
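A quick way to confirm the rate parameterization is to compare `pexp` with the closed-form CDF and to look at the mean of a large sample. This is a sketch; the rate, evaluation point, seed, and sample size are arbitrary illustrative choices.

```r
rate <- 4  # illustrative rate; the mean is then 1 / rate = 0.25
x <- 0.3

# CDF of the exponential distribution: 1 - exp(-rate * x).
cdf_builtin <- pexp(x, rate)
cdf_formula <- 1 - exp(-rate * x)
print(all.equal(cdf_builtin, cdf_formula))

# The sample mean should be close to 1 / rate, not to rate.
set.seed(1)
sample_mean <- mean(rexp(100000, rate))
print(sample_mean)
```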
The gamma can be parameterized by its shape and either the rate or the scale. The rate is the default argument by position, but you can specify the scale by name.
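The following sketch shows the two gamma parameterizations giving the same density when the scale is the reciprocal of the rate. The values of `x`, `shape`, and the rate are arbitrary.

```r
x <- 2.5
shape <- 3.1

by_position <- dgamma(x, shape, 2)            # third positional argument is the rate
by_rate     <- dgamma(x, shape, rate = 2)
by_scale    <- dgamma(x, shape, scale = 0.5)  # scale = 1 / rate

print(all.equal(by_position, by_rate))
print(all.equal(by_rate, by_scale))
```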
The hypergeometric distribution gives the probability of various numbers of red balls when k balls are taken from an urn containing m red balls and n blue balls. Note that another popular convention uses the number of red balls and the total number of balls m+n.
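To make the urn convention concrete, the sketch below compares `dhyper` with the counting formula and shows how one might convert from the other convention, where the total number of balls is given instead of the number of blue balls. The urn sizes are made up for illustration, and `N` is a hypothetical name for the total.

```r
m <- 7  # red balls in the urn
n <- 5  # blue balls in the urn
k <- 4  # balls drawn
x <- 2  # red balls among those drawn

p_builtin <- dhyper(x, m, n, k)
p_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
print(all.equal(p_builtin, p_formula))

# Under the other convention (red count m and total count N = m + n),
# convert before calling R by passing N - m as the number of blue balls.
N <- 12
p_converted <- dhyper(x, m, N - m, k)
print(all.equal(p_builtin, p_converted))
```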
Note that the parameters for the log-normal are the mean and standard deviation of the logarithm of the random variable, not the mean and standard deviation of the random variable itself.
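The sketch below illustrates the distinction for the log-normal: the CDF matches a normal CDF evaluated at `log(x)`, while the mean of the variable itself is `exp(meanlog + sdlog^2 / 2)`. The parameter values, seed, and sample size are arbitrary.

```r
meanlog <- 0.5
sdlog <- 0.8
x <- 2

# If X is log-normal, log(X) is normal with mean meanlog and sd sdlog.
p_lnorm <- plnorm(x, meanlog, sdlog)
p_norm  <- pnorm(log(x), meanlog, sdlog)
print(all.equal(p_lnorm, p_norm))

# The mean of X itself is exp(meanlog + sdlog^2 / 2), not meanlog.
mean_formula <- exp(meanlog + sdlog^2 / 2)
set.seed(1)
mean_sample <- mean(rlnorm(200000, meanlog, sdlog))
print(c(mean_formula, mean_sample))
```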