Andrew Gelman has some interesting comments on non-informative priors this morning. Rather than thinking of the prior as a static thing, think of it as a way to prime the pump.

… a non-informative prior is a placeholder: you can use the non-informative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. …

At first this may sound like tweaking your analysis until you get the conclusion you want. It’s like the old joke about consultants: the client asks what 2+2 equals and the consultant counters by asking the client what he wants it to equal. But that’s not what Andrew is recommending.

A prior distribution cannot strictly be non-informative, but there are common intuitive notions of what it means to be non-informative. It may be helpful to substitute “convenient” or “innocuous” for “non-informative.” My take on Andrew’s advice is something like this.

Start with a prior distribution that’s easy to use and that nobody is going to give you grief for using. Maybe the prior doesn’t make much difference. But if your convenient/innocuous prior leads to too vague a conclusion, go back and use a more realistic prior, one that requires more effort or risks more criticism.

It’s odd that realistic priors can be more controversial than unrealistic priors, but that’s been my experience. It’s OK to be unrealistic as long as you’re conventional.

* * *

Do I understand correctly that this is explaining the pragmatic statistical process of using “conventional wisdom” first & foremost, even if it’s less robust or realistic than your own analysis/experience? Additionally, reversion to your own analysis/experience is worthwhile only if the initial outcomes aren’t satisfactory?

Is this a way of explaining that the pain and effort required to justify the unconventional method/prior (i.e., “to go against the grain”) is only worthwhile if the conventional prior is too badly flawed, even if less useful initially?

Insightful and very useful concept. Don’t waste a lot of energy fighting a lot of battles since the cost in credibility is too great. Save it for the right battles and only when necessary, even if you’re “right”. Kind of a “Discretion is the better part of Valor” type of thing…

I’m going to ask this question, with only practical concerns in mind:

If you’re doing predictive modeling, would it make sense to tune the prior parameters to optimize some cross-validation metric? Is this done in practice? It seems to me that this would be better than blindly using non-informative priors, and possibly better than trying to quantify prior knowledge (which is never straightforward). Obviously you would want to sanity check your optimized priors. To me, the practical benefit of Bayesian methods is to avoid magnitude errors of coefficient estimates when dealing with limited data or rare events. Why not just use optimization to achieve this?
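As a rough illustration of the question above, here is a minimal sketch of choosing a prior parameter by cross-validation. Everything in it is an assumption made for the example: a toy model y = b * x + noise with unit noise variance, a Normal(0, 1 / prior_precision) prior on the slope, made-up data, and the hypothetical function names `map_slope`, `loo_cv_error`, and `tune_prior_precision`.

```python
# Toy model (assumed for illustration): y = b * x + noise, unit noise variance,
# with prior b ~ Normal(0, 1 / prior_precision).

def map_slope(xs, ys, prior_precision):
    """Posterior mode of the slope; a larger prior precision shrinks it toward 0."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + prior_precision)

def loo_cv_error(xs, ys, prior_precision):
    """Mean squared leave-one-out prediction error for a given prior precision."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b = map_slope(train_x, train_y, prior_precision)
        total += (ys[i] - b * xs[i]) ** 2
    return total / len(xs)

def tune_prior_precision(xs, ys, grid):
    """Pick the candidate precision with the smallest leave-one-out error."""
    return min(grid, key=lambda p: loo_cv_error(xs, ys, p))

# Made-up data, purely for demonstration.
xs = [0.5, -1.0, 1.5, -0.3, 0.8, -1.2]
ys = [0.9, -1.4, 2.1, 0.2, 0.7, -2.5]
grid = [0.01, 0.1, 1.0, 10.0, 100.0]

best = tune_prior_precision(xs, ys, grid)
print("chosen prior precision:", best)
print("resulting slope estimate:", map_slope(xs, ys, best))
```

The sanity check mentioned above still applies: if the search lands on the most extreme value in the grid, that is worth a second look before trusting the result.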

Dave: Yes, I’ve done what you suggest. I agree with the subjective Bayes philosophy as an ideal. But in all but the simplest models, nobody actually has prior beliefs on the parameters. These parameters are technical artifacts. People have prior beliefs regarding the consequences of the parameters, and that’s where to specify priors. So I’ve written software that elicits prior probabilities and then uses optimization to find the hyperparameters that lead to the elicited values.

I’d constrain the optimization a bit, adding a penalty term for small prior precisions. That way you can still get close to the elicited values without letting the optimization choose highly informative priors.

I don’t think this approach is too common. When I put this in a journal article, some people seemed to think it was original. I doubt I was the first to do this, but their reaction suggested that the idea isn’t too widespread in practice.

Thanks. I’m glad I’m not the only one who has thought of this, and glad to hear you’ve had some success with it. Seems like an obvious thing to do (although harder to implement, and more time consuming to run).

The essential step is to test your model’s sensitivity to the specified priors. If it’s not sensitive, then it’s not an issue.
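A sensitivity check can be as simple as refitting under a few different priors and seeing how much the answer moves. The sketch below assumes a binomial success rate with conjugate Beta priors; the three priors, the data counts, and the names `posterior_mean` and `prior_sensitivity` are all invented for illustration.

```python
def posterior_mean(successes, trials, a, b):
    """Posterior mean of a success probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

def prior_sensitivity(successes, trials, priors):
    """Spread (max minus min) of the posterior mean across candidate priors."""
    means = [posterior_mean(successes, trials, a, b) for a, b in priors]
    return max(means) - min(means)

# Candidate priors (assumed): uniform, Jeffreys, and a deliberately optimistic one.
priors = [(1.0, 1.0), (0.5, 0.5), (8.0, 2.0)]

# Same observed rate (15%), very different amounts of data.
small = prior_sensitivity(3, 20, priors)
large = prior_sensitivity(300, 2000, priors)

print("spread with 20 trials:  ", round(small, 4))
print("spread with 2000 trials:", round(large, 4))
```

With little data the choice of prior moves the estimate noticeably; with a lot of data the three priors give nearly the same answer, which is the "not an issue" case.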

Dave and John:

I have also been thinking a lot about how to put priors on the “data level” instead of on the “parameter level”. This seems like such a useful concept that:

1. There should be a name for this approach.

2. There should be some literature.

3. There should be an R/Python package.

Or at least there should exist one paper explaining why it is a bad idea…

Here’s a simple example of eliciting priors in a way that experts find meaningful: ask for quantiles rather than distribution parameters. Doctors, for example, don’t have prior beliefs for Greek letters.

Here’s a brief report on how to solve for parameters from quantiles for several common distributions.
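For the normal distribution the calculation is short enough to sketch here. This is only an illustration, not taken from the report: it assumes the expert supplies two quantiles of a normal prior, and the function name `normal_from_quantiles` and the example numbers are made up.

```python
from statistics import NormalDist

def normal_from_quantiles(x1, p1, x2, p2):
    """Return (mu, sigma) of the normal distribution with
    P(X <= x1) = p1 and P(X <= x2) = p2, where x1 < x2 and p1 < p2."""
    z1 = NormalDist().inv_cdf(p1)  # standard normal quantile for p1
    z2 = NormalDist().inv_cdf(p2)  # standard normal quantile for p2
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return mu, sigma

# Hypothetical elicitation: "5% chance the value is below 10, 95% chance it is below 30."
mu, sigma = normal_from_quantiles(10.0, 0.05, 30.0, 0.95)
prior = NormalDist(mu, sigma)

# Check that the fitted prior reproduces the stated quantiles.
print(mu, sigma)
print(prior.cdf(10.0), prior.cdf(30.0))
```

Two quantiles pin down the two parameters exactly. Other two-parameter families generally need a numerical root-finder rather than a closed form.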