Yesterday I posted a working paper version of an article I’ve been working on with Jairo Fúquene and Luis Pericchi: A Case for Robust Bayesian priors with Applications to Binary Clinical Trials.

Bayesian analysis begins with a prior distribution, a function summarizing what is believed about an experiment before any data are collected. The prior is updated as data become available and becomes the posterior distribution, a function summarizing what is currently believed in light of the data collected so far. As more data are collected, the relative influence of the prior decreases and the influence of the data increases. Whether a prior is robust depends on the rate at which the influence of the prior decreases.
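To make the updating concrete, here is a minimal sketch of the standard beta-binomial conjugate update in Python. The numbers (a Beta(3, 7) prior, a 60% observed success rate) are illustrative choices of mine, not from the paper.

```python
# Beta-binomial updating: with a Beta(a, b) prior on a success
# probability and k successes in n Bernoulli trials, the posterior
# is Beta(a + k, b + n - k). Its mean is a weighted average of the
# prior mean a/(a+b) and the sample proportion k/n, so the prior's
# relative influence fades as n grows.
def posterior_mean(a, b, k, n):
    return (a + k) / (a + b + n)

a, b = 3.0, 7.0              # prior mean 0.3
for n in (10, 100, 10000):
    k = 0.6 * n              # observed success rate 0.6
    print(n, posterior_mean(a, b, k, n))
# The posterior mean moves from near the prior mean (0.3)
# toward the data proportion (0.6) as n grows.
```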

There are essentially three approaches to how the influence of the prior on the posterior should vary as a function of the data.

1. **Robustness with respect to the prior**. When the data and the prior disagree, give more weight to the *data*.
2. **Conjugate priors**. The influence of the prior is independent of the extent to which it agrees with the data.
3. **Robustness with respect to the data**. When the data and the prior disagree, give more weight to the *prior*.

When I say “give more weight to the data” or “give more weight to the prior,” I’m not talking about making *ad hoc* exceptions to Bayes’ theorem. The weight given to one or the other falls out of the usual application of Bayes’ theorem. Roughly speaking, robustness has to do with the relative thickness of the tails of the prior and the likelihood. A model with thicker tails on the prior will be robust with respect to the prior, and a model with thicker tails on the likelihood will be robust with respect to the data.
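The tail-thickness point can be checked numerically. The sketch below is my own illustration, not a model from the paper: a normal likelihood with a single observation x, comparing a thin-tailed normal prior to a thick-tailed Cauchy prior, both centered at zero. As x moves away from the prior center, the normal prior keeps pulling the posterior mean halfway back, while the Cauchy prior’s pull fades away.

```python
import math

def normal_density(t, mu, sd=1.0):
    # unnormalized N(mu, sd^2) density; the constant cancels below
    return math.exp(-0.5 * ((t - mu) / sd) ** 2)

def cauchy_density(t, loc=0.0, scale=1.0):
    # unnormalized Cauchy(loc, scale) density
    return 1.0 / (1.0 + ((t - loc) / scale) ** 2)

def posterior_mean(x, prior):
    # E[t | x] by trapezoid integration over a wide grid;
    # posterior(t) is proportional to prior(t) * N(x; t, 1)
    lo, hi, n = min(x, 0.0) - 20.0, max(x, 0.0) + 20.0, 4000
    h = (hi - lo) / n
    num = den = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = prior(t) * normal_density(x, t)
        if i in (0, n):
            w *= 0.5
        num += w * t
        den += w
    return num / den

for x in (2.0, 5.0, 10.0):
    m_normal = posterior_mean(x, lambda t: normal_density(t, 0.0))
    m_cauchy = posterior_mean(x, cauchy_density)
    print(x, round(m_normal, 2), round(m_cauchy, 2))
# With the normal prior the posterior mean is always x/2; with the
# Cauchy prior it approaches x itself, i.e. the prior gets ignored.
```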

Each of the three approaches above is appropriate in different circumstances. When priors come from well-understood physical principles, it may make sense to use a model that is robust with respect to the data, i.e. to suppress outliers. When priors are based on vague beliefs, it may make more sense to be robust with respect to the prior. Between these extremes, particularly when a large amount of data is available, conjugate priors may be appropriate.

When the data and the prior are in rough agreement, the contribution of a robust prior to the posterior is comparable to the contribution that a conjugate prior would have had. (And so using robust proper priors leads to greater variance reduction than using improper priors.) But as the level of agreement decreases, the contribution of a robust prior to the posterior also decreases.

In the paper, we show that with a binomial likelihood, the influence of a conjugate prior grows without bound as the prior mean goes to infinity. However, with a Student-*t* prior, the influence of the prior is bounded as the prior mean increases. For a Cauchy prior, the influence of the prior is bounded as the location parameter goes to infinity.
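In the same spirit, here is a numerical sketch of the binomial case. It is my own illustration of the qualitative claim, not the paper’s derivation: I put the prior on the log-odds and use a normal prior as a thin-tailed stand-in for the conjugate prior (the paper works with the actual conjugate family and proves the result algebraically). As the prior location mu moves away from the data, the thin-tailed prior drags the posterior mean along without bound, while the Cauchy prior’s influence stays bounded.

```python
import math

def softplus(x):
    # numerically stable log(1 + exp(x))
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def binom_loglik(theta, k=2, n=10):
    # log-likelihood of k successes in n trials, log-odds theta:
    # log p = -softplus(-theta), log(1 - p) = -softplus(theta)
    return -k * softplus(-theta) - (n - k) * softplus(theta)

def post_mean(mu, log_prior, m=6000):
    # posterior mean of theta by trapezoid integration
    lo, hi = -30.0, mu + 30.0
    h = (hi - lo) / m
    num = den = 0.0
    for i in range(m + 1):
        t = lo + i * h
        w = math.exp(log_prior(t, mu) + binom_loglik(t))
        if i in (0, m):
            w *= 0.5
        num += w * t
        den += w
    return num / den

normal_means, cauchy_means = {}, {}
for mu in (0.0, 10.0, 20.0):
    normal_means[mu] = post_mean(mu, lambda t, c: -0.5 * (t - c) ** 2)
    cauchy_means[mu] = post_mean(mu, lambda t, c: -math.log1p((t - c) ** 2))
    print(mu, round(normal_means[mu], 2), round(cauchy_means[mu], 2))
# The normal-prior posterior mean keeps climbing with mu, while the
# Cauchy-prior posterior mean stays near the sample log-odds
# log(2/8), about -1.39: the Cauchy prior's influence is bounded.
```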

It’s easy to confuse a robust prior and a vague conjugate prior. Our paper shows how in a certain sense, even an “informative” Cauchy distribution is less informative than a “non-informative” conjugate prior.

This looks similar to our weakly informative prior distribution for logistic regression; see here: http://www.stat.columbia.edu/~gelman/research/unpublished/priors10.pdf

We evaluate our model using examples and cross-validation, and you use algebra, but the general approach seems similar.

Shouldn’t these be the other way around?

“Roughly speaking, robustness has to do with the relative thickness of the tails of the prior and the likelihood. A model with thicker tails on the prior will be robust with respect to the prior, and a model with thicker tails on the likelihood will be robust with respect to the data.”

No, I believe that’s right.

With a binomial likelihood and a Cauchy prior, for example, the effect of the prior is bounded as the prior mode goes to infinity. In that case the model is robust with respect to the prior.