Probit regression

The previous post looked at how probability predictions from a logistic regression model vary as a function of the fitted parameters. This post goes through the same exercise for probit regression and compares the two kinds of nonlinear regression.

Generalized linear models and link functions

Logistic and probit regression are minor variations on a theme. They are both forms of generalized linear models (GLM), but with different link functions.

In logistic regression, the link function is

\text{logit}(p) = \log\left( \frac{p}{1-p} \right)

whereas in probit regression the link function is

\Phi^{-1}(p)

where Φ is the standard normal distribution CDF, i.e.

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x \exp(-t^2/2)\, dt

The probability prediction from a generalized linear model is the inverse of the link function applied to a linear model. In logistic regression we had

p(x) = \text{logit}^{-1}(a + bx) = \frac{1}{1 + \exp(-a -bx)}

and in probit regression we have

p(x) = \text{probit}^{-1}(a + bx) = \Phi(a + bx).
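To make the two link functions and their inverses concrete, here is a minimal sketch using only Python's standard library. The function names (`logistic_p`, `probit_p`, `logit`, `probit`) are illustrative choices, not anything from the original post, and the values of a and b are just examples.

```python
from math import exp, log
from statistics import NormalDist

_std_normal = NormalDist()  # mean 0, standard deviation 1

def logit(p):
    """Logistic link: log odds of p."""
    return log(p / (1.0 - p))

def probit(p):
    """Probit link: inverse of the standard normal CDF."""
    return _std_normal.inv_cdf(p)

def logistic_p(x, a, b):
    """Inverse logit applied to the linear predictor a + b*x."""
    return 1.0 / (1.0 + exp(-(a + b * x)))

def probit_p(x, a, b):
    """Standard normal CDF applied to the linear predictor a + b*x."""
    return _std_normal.cdf(a + b * x)

# Applying a link to its own prediction recovers the linear predictor.
a, b, x = 3.0, 4.0, 0.2
print(logit(logistic_p(x, a, b)), a + b * x)
print(probit(probit_p(x, a / 1.6, b / 1.6)), (a + b * x) / 1.6)
```

Both models predict p = 1/2 exactly where the linear predictor is zero, i.e. at x = -a/b.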

Comparing logistic and probit curves

As a rule of thumb, probit regression coefficients are roughly equal to logistic regression coefficients divided by 1.6. [1] To see this, here’s a plot of the logistic curve from the previous post, which used a = 3 and b = 4, and a plot of the probit curve where a = 3/1.6 and b = 4/1.6.

logistic and rescaled probit curves

And because the curves are so close together, we’ll also plot their difference to see a little better how they differ.

Difference between rescaled probit and logistic curves
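The comparison can also be checked numerically. The sketch below evaluates both curves on a grid, using a = 3 and b = 4 for the logistic curve and the same coefficients divided by 1.6 for the probit curve. The grid range and step size are arbitrary assumptions, as are the function names.

```python
from math import exp
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF

a, b = 3.0, 4.0          # logistic coefficients used in the text
scale = 1.6              # rule-of-thumb divisor

def logistic_curve(x):
    return 1.0 / (1.0 + exp(-(a + b * x)))

def rescaled_probit_curve(x):
    # Same linear predictor, with both coefficients divided by 1.6.
    return Phi((a + b * x) / scale)

# Grid on [-3, 3]; range and step are arbitrary choices for illustration.
xs = [-3.0 + 0.001 * i for i in range(6001)]
diffs = [rescaled_probit_curve(x) - logistic_curve(x) for x in xs]

max_abs_diff = max(abs(d) for d in diffs)
i_max = max(range(len(xs)), key=lambda i: abs(diffs[i]))
print(f"max |probit - logistic| = {max_abs_diff:.4f} near x = {xs[i_max]:.3f}")
```

With these values the largest gap should come out a little under 0.02, so the two curves never disagree by as much as two percentage points.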

If the curves are so similar, why do we have both? Because one may be easier to work with in some contexts and the other in others.

Sensitivity to parameters

In this post we’ll look at how

p(x, a, b) = \Phi(a + bx)

varies with each of its parameters.

As with the logistic curve before, the probit curve has zero curvature when x = –a/b, which corresponds to p = 1/2.

For the logistic curve, the slope at p = 1/2 was b/4, whereas for the probit curve the slope at that point, where the curve is steepest, is b/√(2π).
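A quick finite-difference check of the probit slope at x = -a/b, as a sketch. The coefficients are the rescaled example values from earlier in the post, and the step size h and the helper name `slope` are assumptions made for illustration.

```python
from math import pi, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def slope(x, a, b, h=1e-5):
    """Central finite-difference estimate of d/dx Phi(a + b*x)."""
    return (Phi(a + b * (x + h)) - Phi(a + b * (x - h))) / (2 * h)

a, b = 3 / 1.6, 4 / 1.6   # illustrative probit coefficients
x_mid = -a / b            # where p = 1/2

slope_mid = slope(x_mid, a, b)
slope_formula = b / sqrt(2 * pi)
print(slope_mid, slope_formula)

# Slopes a short distance to either side of x_mid, for comparison.
slopes_nearby = [slope(x_mid + d, a, b) for d in (-0.2, -0.1, 0.1, 0.2)]
print(slopes_nearby)
```

The numerical slope at x = -a/b should agree with b/√(2π) to several decimal places, and the nearby slopes should all be smaller.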

The rate of change of p with respect to a is

\frac{\partial p}{\partial a} = \frac{\exp\left( -\frac{1}{2} (a + bx)^2 \right)}{\sqrt{2\pi}}

To find where this is maximized, we find where

\frac{\partial^2 p}{\partial x\, \partial a} = \frac{-b \exp\left( -\frac{1}{2} (a + bx)^2 \right)(a + bx)}{\sqrt{2\pi}}

is zero, which is again at x = –a/b. By plugging this back in we find the maximum rate of change of p with respect to a is 1/√(2π).
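Here is a small sketch that checks the sensitivity to a two ways: the closed form is compared against a finite difference in a, and a grid search confirms where the peak sits. The function names, grid, and step size are illustrative assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def dp_da(x, a, b):
    """Closed form from the text: the standard normal density at a + b*x."""
    z = a + b * x
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def dp_da_numeric(x, a, b, h=1e-5):
    """Central finite difference of Phi(a + b*x) in the a direction."""
    return (Phi(a + h + b * x) - Phi(a - h + b * x)) / (2 * h)

a, b = 3 / 1.6, 4 / 1.6   # illustrative values
x_mid = -a / b

# Scan a grid centered on x_mid for the largest sensitivity to a.
xs = [x_mid + 0.01 * k for k in range(-300, 301)]
x_best = max(xs, key=lambda x: dp_da(x, a, b))

print(x_best, x_mid)                               # location of the peak
print(dp_da(x_best, a, b), 1 / sqrt(2 * pi))       # height of the peak
print(dp_da_numeric(0.1, a, b), dp_da(0.1, a, b))  # formula vs finite difference
```

Note that the peak height does not depend on a or b at all; the coefficients only move and stretch the region where p is sensitive to a.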

As before, the dependence on b is a little more complicated than the dependence on a. At the middle of the curve, i.e. when p = 1/2, and so x = –a/b, we have

\left.\frac{\partial p}{\partial b}\right|_{x = -a/b} = -\frac{a}{b\sqrt{2\pi}}
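The general expression behind this value follows from the chain rule. As a short worked step, using only the definition of Φ given earlier:

```latex
\frac{\partial p}{\partial b}
  = \frac{\partial}{\partial b}\,\Phi(a + bx)
  = x\,\Phi'(a + bx)
  = \frac{x \exp\left( -\frac{1}{2} (a + bx)^2 \right)}{\sqrt{2\pi}}
```

Setting x = -a/b makes the exponent zero, which leaves -a/(b√(2π)).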

The places where the sensitivity to b is maximized are a little easier to find this time. We have

\frac{\partial^2 p}{\partial x \, \partial b} = -\frac{(b^2x^2 + abx -1) \exp\left(-\frac{1}{2}(a + bx)^2 \right)}{\sqrt{2\pi}}

and so the places most sensitive to a change in b can be found by solving a quadratic equation, and it works out that

x = \frac{-a \pm \sqrt{a^2 + 4}}{2b}
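As a sanity check on the quadratic's roots, the sketch below compares them with a brute-force grid search over ∂p/∂b = x φ(a + bx), where φ is the standard normal density. The names `dp_db` and `critical_points`, the example coefficients, and the grid are all assumptions made for illustration, and the ordering of the two roots assumes b > 0.

```python
from math import exp, pi, sqrt

def dp_db(x, a, b):
    """Partial derivative of Phi(a + b*x) with respect to b."""
    z = a + b * x
    return x * exp(-0.5 * z * z) / sqrt(2 * pi)

def critical_points(a, b):
    """Roots of b^2 x^2 + a b x - 1 = 0, smaller root first when b > 0."""
    root = sqrt(a * a + 4)
    return (-a - root) / (2 * b), (-a + root) / (2 * b)

a, b = 3 / 1.6, 4 / 1.6   # illustrative values
x_lo, x_hi = critical_points(a, b)

# Brute-force search on [-3, 3] for the extremes of dp/db.
xs = [-3.0 + 0.001 * i for i in range(6001)]
x_grid_min = min(xs, key=lambda x: dp_db(x, a, b))
x_grid_max = max(xs, key=lambda x: dp_db(x, a, b))

print(x_lo, x_grid_min)   # most negative sensitivity to b
print(x_hi, x_grid_max)   # most positive sensitivity to b
```

The two roots are not equally important in general: for these example values the magnitude of ∂p/∂b is far larger at the negative root than at the positive one.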

[1] Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.