The previous post looked at how probability predictions from a logistic regression model vary as a function of the fitted parameters. This post goes through the same exercise for probit regression and compares the two kinds of nonlinear regression.
Generalized linear models and link functions
Logistic and probit regression are minor variations on a theme. Both are generalized linear models (GLMs); they differ only in their link functions.
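In logistic regression, the link function is
$$g(p) = \log\left(\frac{p}{1-p}\right)$$
whereas in probit regression the link function is
$$g(p) = \Phi^{-1}(p)$$
where Φ is the standard normal distribution CDF, i.e.
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{t^2}{2}\right)\, dt$$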
The probability prediction from a generalized linear model is the inverse of the link function applied to a linear model. In logistic regression we had
$$p = \frac{1}{1 + \exp(-(a + bx))}$$
and in probit regression we have
$$p = \Phi(a + bx)$$
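As a minimal sketch, the two predictions can be written with nothing but the Python standard library. The function names here are illustrative choices rather than anything from the post, and Φ is computed from the error function via the identity Φ(z) = (1 + erf(z/√2))/2.

```python
import math

def logistic_prediction(x, a, b):
    """Inverse logit link applied to the linear model a + b*x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def std_normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prediction(x, a, b):
    """Inverse probit link applied to the linear model a + b*x."""
    return std_normal_cdf(a + b * x)

# Both curves cross p = 1/2 where a + b*x = 0, i.e. at x = -a/b.
a, b = 3.0, 4.0
x_mid = -a / b
print(logistic_prediction(x_mid, a, b), probit_prediction(x_mid, a, b))
```

Both functions take the same linear predictor a + bx; only the inverse link applied to it differs.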
Comparing logistic and probit curves
As a rule of thumb, probit regression coefficients are roughly equal to logistic regression coefficients divided by 1.6. To see this, here’s a plot of the logistic curve from the previous post, which used a = 3 and b = 4, and a plot of the probit curve where a = 3/1.6 and b = 4/1.6.
And because the curves are so close together, we’ll also plot their difference to see a little better how they differ.
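The gap between the two curves can also be checked numerically. The sketch below evaluates both on a grid, using the post's a = 3 and b = 4 for the logistic curve and the same coefficients divided by 1.6 for the probit curve. The grid range and spacing are arbitrary choices made for illustration.

```python
import math

def logistic(z):
    """Logistic function of the linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

a, b = 3.0, 4.0   # logistic coefficients from the post
scale = 1.6       # rule-of-thumb divisor for the probit coefficients

# x from -3 to 1.5 in steps of 0.001 (an arbitrary illustrative grid).
xs = [-3.0 + 0.001 * i for i in range(4501)]
diffs = [logistic(a + b * x) - std_normal_cdf((a + b * x) / scale) for x in xs]

max_abs_diff = max(abs(d) for d in diffs)
print(f"largest |logistic - probit| on the grid: {max_abs_diff:.4f}")
```

On this grid the two predictions never differ by more than about two percentage points, which is why the difference has to be plotted separately to be visible.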
If the curves are so similar, why do we have both? Because one may be easier to work with in some contexts and the other in others.
Sensitivity to parameters
In this post we’ll look at how
$$p = \Phi(a + bx)$$
varies with each of its parameters.
As with the logistic curve before, the probit curve has zero curvature when x = –a/b, which corresponds to p = 1/2.
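For the logistic curve, the slope at p = 1/2 is b/4, whereas for the probit curve the slope at the same point, where each curve is steepest, is b/√(2π). Matching the two slopes requires a logistic coefficient 4/√(2π) ≈ 1.6 times the probit coefficient, consistent with the rule of thumb above.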
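The probit slope at the midpoint can be checked with a central finite difference. This is only a sketch: the step size h is an arbitrary choice, and the coefficients are the scaled values a = 3/1.6 and b = 4/1.6 used for the probit curve earlier.

```python
import math

def probit_prediction(x, a, b):
    """p = Phi(a + b*x), with Phi written via the error function."""
    return 0.5 * (1.0 + math.erf((a + b * x) / math.sqrt(2.0)))

a, b = 3 / 1.6, 4 / 1.6
x_mid = -a / b   # point of zero curvature, where p = 1/2
h = 1e-5         # finite-difference step (arbitrary small value)

# Central difference approximation to dp/dx at the midpoint.
numeric_slope = (
    probit_prediction(x_mid + h, a, b) - probit_prediction(x_mid - h, a, b)
) / (2 * h)

analytic_slope = b / math.sqrt(2 * math.pi)
print(numeric_slope, analytic_slope)
```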
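The rate of change of p with respect to a is
$$\frac{\partial p}{\partial a} = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(a+bx)^2}{2}\right)$$
To find where this is maximized, we find where
$$\frac{\partial^2 p}{\partial x\,\partial a} = -\frac{b\,(a+bx)}{\sqrt{2\pi}} \exp\left(-\frac{(a+bx)^2}{2}\right)$$
is zero, which is again at x = –a/b. By plugging this back in we find the maximum rate of change of p with respect to a is 1/√(2π).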
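A quick numerical check of this claim perturbs a directly and scans over x. The helper name `dp_da`, the step size, and the grid are assumptions made for illustration only.

```python
import math

def probit_prediction(x, a, b):
    """p = Phi(a + b*x), with Phi written via the error function."""
    return 0.5 * (1.0 + math.erf((a + b * x) / math.sqrt(2.0)))

def dp_da(x, a, b, h=1e-5):
    """Central-difference estimate of the sensitivity of p to a."""
    return (probit_prediction(x, a + h, b) - probit_prediction(x, a - h, b)) / (2 * h)

a, b = 3 / 1.6, 4 / 1.6

# Scan x from -3 to 1.5 in steps of 0.001 and keep the most sensitive point.
xs = [-3.0 + 0.001 * i for i in range(4501)]
x_best = max(xs, key=lambda x: dp_da(x, a, b))

print("most sensitive x:", x_best, "expected:", -a / b)
print("sensitivity there:", dp_da(x_best, a, b), "expected:", 1 / math.sqrt(2 * math.pi))
```

The maximum sensitivity to a does not depend on a or b; the coefficients only determine where along the x axis it occurs.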
As before, the dependence on b is a little more complicated than the dependence on a. At the middle of the curve, i.e. when p = 1/2, and so x = –a/b, we have
$$\frac{\partial p}{\partial b} = -\frac{a}{b\sqrt{2\pi}}$$
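The places where the sensitivity to b is maximized are a little easier to find this time. Since ∂p/∂b = x exp(–(a + bx)²/2)/√(2π), we have
$$\frac{\partial^2 p}{\partial x\,\partial b} = \frac{1 - abx - b^2x^2}{\sqrt{2\pi}} \exp\left(-\frac{(a+bx)^2}{2}\right)$$
and so the places most sensitive to a change in b can be found by solving a quadratic equation, and it works out that
$$x = \frac{-a \pm \sqrt{a^2 + 4}}{2b}$$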
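The two roots can be compared against a brute-force search. The sketch below uses the chain-rule expression ∂p/∂b = x φ(a + bx), where φ is the standard normal density, and again takes a = 3/1.6 and b = 4/1.6. The function names and the grid are illustrative assumptions.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def dp_db(x, a, b):
    """Sensitivity of p = Phi(a + b*x) to b, by the chain rule."""
    return x * normal_pdf(a + b * x)

a, b = 3 / 1.6, 4 / 1.6

# Closed-form critical points from the quadratic b^2 x^2 + a b x - 1 = 0.
root_plus = (-a + math.sqrt(a * a + 4)) / (2 * b)
root_minus = (-a - math.sqrt(a * a + 4)) / (2 * b)

# Brute-force search on a grid from -3 to 1.5 in steps of 0.001.
xs = [-3.0 + 0.001 * i for i in range(4501)]
x_max = max(xs, key=lambda x: dp_db(x, a, b))   # most positive sensitivity
x_min = min(xs, key=lambda x: dp_db(x, a, b))   # most negative sensitivity

print("roots:", root_minus, root_plus)
print("grid extrema:", x_min, x_max)
```

One root marks where increasing b raises p the most and the other where it lowers p the most. For these coefficients the negative root has the larger effect in absolute value, since the positive root sits far out in the tail of the density.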
 Andrew Gelman and Jennifer Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press 2007.