Probit regression

The previous post looked at how probability predictions from a logistic regression model vary as a function of the fitted parameters. This post goes through the same exercise for probit regression and compares the two kinds of nonlinear regression.

Generalized linear models and link functions

Logistic and probit regression are minor variations on a theme. They are both forms of generalized linear models (GLM), but with different link functions.

In logistic regression, the link function is

\mbox{logit}(p) = \log\left( \frac{p}{1-p} \right)

whereas in probit regression the link function is

\Phi^{-1}(p)

where Φ is the standard normal distribution CDF, i.e.

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x \exp(-t^2/2)\, dt

The probability prediction from a generalized linear model is the inverse of the link function applied to a linear model. In logistic regression we had

p(x) = \text{logit}^{-1}(a + bx) = \frac{1}{1 + \exp(-a -bx)}

and in probit regression we have

p(x) = \text{probit}^{-1}(a + bx) = \Phi(a + bx).
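
To make the two inverse links concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available; the function names and the particular values of a, b, and x are mine, purely for illustration) that evaluates both kinds of prediction.

    # Minimal sketch: evaluating logistic and probit predictions
    # by applying each inverse link to the linear term a + bx.
    # Assumes NumPy and SciPy; the values of a, b, and x are illustrative.
    import numpy as np
    from scipy.stats import norm

    def logistic_prediction(x, a, b):
        # inverse logit of a + bx
        return 1 / (1 + np.exp(-(a + b * x)))

    def probit_prediction(x, a, b):
        # standard normal CDF of a + bx
        return norm.cdf(a + b * x)

    print(logistic_prediction(0.5, 3, 4))
    print(probit_prediction(0.5, 3/1.6, 4/1.6))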

Comparing logistic and probit curves

As a rule of thumb, probit regression coefficients are roughly equal to logistic regression coefficients divided by 1.6. [1] To see this, here’s a plot of the logistic curve from the previous post, which used a = 3 and b = 4, and a plot of the probit curve where a = 3/1.6 and b = 4/1.6.

[Figure: logistic and rescaled probit curves]

And because the curves are so close together, we’ll also plot their difference to see a little better how they differ.

[Figure: difference between the rescaled probit and logistic curves]
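
Here is a short sketch of how plots like the two above could be reproduced, assuming NumPy, SciPy, and Matplotlib are installed; the plotting range is an arbitrary choice on my part.

    # Sketch: the logistic curve with a = 3, b = 4 versus the probit curve
    # with the coefficients divided by 1.6, plus their difference.
    import numpy as np
    from scipy.stats import norm
    import matplotlib.pyplot as plt

    x = np.linspace(-3, 2, 500)            # plotting range chosen arbitrarily
    logistic = 1 / (1 + np.exp(-(3 + 4 * x)))
    probit = norm.cdf(3/1.6 + (4/1.6) * x)

    print("largest gap between the curves:", np.max(np.abs(probit - logistic)))

    plt.plot(x, logistic, label="logistic, a = 3, b = 4")
    plt.plot(x, probit, label="probit, a = 3/1.6, b = 4/1.6")
    plt.plot(x, probit - logistic, label="difference")
    plt.legend()
    plt.show()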

If the curves are so similar, why do we have both? Because one may be easier to work with in some contexts and the other in others.

Sensitivity to parameters

In this post we’ll look at how

p(x, a, b) = \Phi(a + bx)

varies with each of its parameters.

As with the logistic curve before, the probit curve has zero curvature when x = –a/b, which corresponds to p = 1/2.

For the logistic curve, the slope at p = 1/2 was simply b/4, whereas for the probit curve the slope at the flattest part is b/√(2π), which is b times the standard normal density evaluated at zero.

The rate of change of p with respect to a is

\frac{\partial p}{\partial a} = \frac{\exp\left( -\frac{1}{2} (a + bx)^2 \right)}{\sqrt{2\pi}}

To find where this is maximized we find where

\frac{\partial^2 p}{\partial x\, \partial a} = \frac{-b \exp\left( -\frac{1}{2} (a + bx)^2 \right)(a + bx)}{\sqrt{2\pi}}

is zero, which is again at x = –a/b. By plugging this back in we find the maximum rate of change of p with respect to a is 1/√(2π).
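
As a quick numerical check, here is a sketch assuming NumPy and SciPy, using the rescaled coefficients from the comparison above as illustrative values: the sensitivity to a does peak at x = −a/b with value 1/√(2π).

    # Sketch: dp/da for the probit model is the normal density at a + bx,
    # which peaks at x = -a/b with value 1/sqrt(2*pi) ≈ 0.3989.
    import numpy as np
    from scipy.stats import norm

    a, b = 3/1.6, 4/1.6                  # illustrative values
    x = np.linspace(-5, 5, 10001)
    dp_da = norm.pdf(a + b * x)          # partial derivative of Phi(a + bx) w.r.t. a

    i = np.argmax(dp_da)
    print(x[i], -a/b)                    # maximizer is approximately -a/b
    print(dp_da[i], 1/np.sqrt(2*np.pi))  # maximum value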

As before, the dependence on b is a little more complicated than the dependence on a. At the middle of the curve, i.e. when p = 1/2, and so x = −a/b, we have

\left.\frac{\partial p}{\partial b}\right|_{x = -a/b} = -\frac{a}{b\sqrt{2\pi}}

The places where the sensitivity to b is maximized are a little easier to find this time.

\frac{\partial^2 p}{\partial x \, \partial b} = -\frac{(b^2x^2 + abx -1) \exp\left(-\frac{1}{2}(a + bx)^2 \right)}{\sqrt{2\pi}}

and so the places most sensitive to a change in b can be found by solving a quadratic equation, and it works out that

x = \frac{-a \pm \sqrt{a^2 + 4}}{2b}
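
As a sanity check, here is a sketch assuming NumPy, again with illustrative values of a and b: plugging the two roots back into the mixed partial above gives zero.

    # Sketch: the mixed partial d^2 p / (dx db) vanishes at the two roots
    # of the quadratic b^2 x^2 + a b x - 1 = 0.
    import numpy as np

    a, b = 3/1.6, 4/1.6                  # illustrative values

    def mixed_partial(x):
        return -(b**2 * x**2 + a*b*x - 1) * np.exp(-0.5*(a + b*x)**2) / np.sqrt(2*np.pi)

    roots = [(-a + np.sqrt(a**2 + 4)) / (2*b), (-a - np.sqrt(a**2 + 4)) / (2*b)]
    print([mixed_partial(r) for r in roots])   # both are zero up to rounding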

[1] Andrew Gelman and Jennifer Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.

Sensitivity of logistic regression prediction on coefficients

The output of a logistic regression model is a function that predicts the probability of an event as a function of a predictor variable. This post will only look at a simple logistic regression model with one predictor, but a similar analysis applies to multiple regression with several predictors.

p(x) = \frac{1}{1 + \exp(-a - bx)}

Here’s a plot of such a curve when a = 3 and b = 4.
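
For readers who want to reproduce the plot, here is a sketch assuming NumPy and Matplotlib; the plotting range is an arbitrary choice.

    # Sketch: the logistic curve p(x) = 1/(1 + exp(-a - bx)) with a = 3, b = 4.
    import numpy as np
    import matplotlib.pyplot as plt

    a, b = 3, 4
    x = np.linspace(-3, 2, 500)
    p = 1 / (1 + np.exp(-(a + b * x)))

    plt.plot(x, p)
    plt.axvline(-a/b, linestyle="--")   # x = -0.75, where the curve crosses 1/2
    plt.xlabel("x")
    plt.ylabel("p(x)")
    plt.show()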

Flattest part

The curvature of the logistic curve is small at both extremes. As x comes in from negative infinity, the curvature increases, then decreases to zero, then increases again, then decreases as x goes to positive infinity. We quantified this statement in another post where we calculated the curvature. The curvature is zero at the point where the second derivative of p

p''(x) = \frac{b^2 \exp(a + bx)\left(1 - \exp(a + bx)\right)}{(1 + \exp(a + bx))^3}

is zero, which occurs when x = −a/b. At that point p = 1/2, so the curve is flattest where the probability crosses 1/2. In the graph above, this happens at x = −0.75.

A little calculation shows that the slope at the flattest part of the logistic curve is simply b/4: the slope is b p(1 − p), which equals b/4 when p = 1/2.
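
Here is a finite-difference check of both claims, a sketch assuming NumPy and the same a = 3, b = 4 as above.

    # Sketch: at x = -a/b the second derivative of the logistic curve is zero
    # and the slope is b/4.
    import numpy as np

    a, b = 3, 4
    p = lambda x: 1 / (1 + np.exp(-(a + b * x)))

    x0, h = -a/b, 1e-4
    slope = (p(x0 + h) - p(x0 - h)) / (2*h)
    second = (p(x0 + h) - 2*p(x0) + p(x0 - h)) / h**2

    print(slope, b/4)   # slope at the flattest point, b/4 = 1
    print(second)       # approximately zero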

Sensitivity to parameters

Now how much does the probability prediction p(x) change as the parameter a changes? We now need to consider p as a function of three variables, treating a and b as variables along with x. The marginal change in p in response to a change in a is the partial derivative of p with respect to a,

\frac{\partial p}{\partial a} = \frac{\exp(a + bx)}{(1 + \exp(a + bx))^2}

To find where this is maximized with respect to x, we take the partial derivative of the above expression with respect to x

\frac{\partial^2 p}{\partial x\, \partial a} = \frac{b\left(1 - \exp(a + bx)\right) \exp(a + bx)}{(1 + \exp(a + bx))^3}

which is zero when  x = −a/b, the same place where the logistic curve is flattest. And the partial of p with respect to a at that point is simply 1/4, independent of b. So a small change Δa results in a change of approximately Δa/4 at the flattest part of the logistic curve and results in less change elsewhere.
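
Here is a small numerical confirmation, a sketch assuming NumPy with the same a = 3 and b = 4 as in the plot above.

    # Sketch: dp/da = exp(a + bx)/(1 + exp(a + bx))^2 peaks at x = -a/b
    # with value 1/4, independent of b.
    import numpy as np

    a, b = 3, 4
    x = np.linspace(-5, 5, 10001)
    u = a + b * x
    dp_da = np.exp(u) / (1 + np.exp(u))**2

    i = np.argmax(dp_da)
    print(x[i], -a/b)   # maximizer, approximately -0.75
    print(dp_da[i])     # maximum value, approximately 0.25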

What about the dependence on b? That’s more complicated. The rate of change of p with respect to b is

\frac{\partial p}{\partial b} = \frac{\exp(a + bx) x }{(1 + \exp(a + bx))^2}

and this is maximized where

\frac{\partial^2 p}{\partial x \partial b} = 0

which in turn requires solving a nonlinear equation. This is easy to do numerically in a specific case, but not easy to work with analytically in general.
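
For example, here is a sketch of the numerical approach, assuming SciPy and the specific values a = 3, b = 4; the search bracket is an assumption on my part.

    # Sketch: numerically locate where dp/db = x exp(a + bx)/(1 + exp(a + bx))^2
    # is maximized, since setting the mixed partial to zero has no closed form.
    import numpy as np
    from scipy.optimize import minimize_scalar

    a, b = 3, 4

    def dp_db(x):
        u = a + b * x
        return x * np.exp(u) / (1 + np.exp(u))**2

    # maximize dp/db by minimizing its negative on a bracket to the right of -a/b
    res = minimize_scalar(lambda x: -dp_db(x), bounds=(-a/b, 5), method="bounded")
    print(res.x, dp_db(res.x))   # maximizer and maximum sensitivity to b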

However, we can easily say how p changes with b near the point x = −a/b. This is not where the partial of p with respect to b is maximized, but it's a place of interest because it has come up twice above. At that point the derivative of p with respect to b is −a/(4b). So if a and b have the same sign, then a small increase in b will result in a small decrease in p and vice versa.