The following is a direct quote from Anthony O’Hagan’s book Bayesian Inference. I’ve edited the quote only to enumerate the points.

Why should one use Bayesian inference, as opposed to classical inference? There are various answers. Broadly speaking, some of the arguments in favour of the Bayesian approach are that it is

- fundamentally sound,
- very flexible,
- produces clear and direct inferences,
- makes use of all available information.

I’ll elaborate briefly on each of O’Hagan’s points.

Bayesian inference has a solid philosophical foundation. It is consistent with certain axioms of rational inference. Non-Bayesian systems of inference, such as fuzzy logic, must violate one or more of these axioms; their conclusions are rationally satisfying to the extent that they approximate Bayesian inference.

Bayesian inference is at the same time rigid and flexible. It is rigid in the sense that all inference follows the same form: set up a likelihood and a prior, then calculate the posterior by conditioning on observed data via Bayes' theorem. But this rigidity channels creativity into useful directions. It provides a template for setting up complex models when necessary.
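The template can be sketched on a toy problem. The example below is only an illustration, not anything from O'Hagan's book: it assumes a coin with unknown bias `theta`, a uniform prior, and made-up data of 7 heads in 10 flips, and it approximates the posterior on a grid. The function name `grid_posterior` and its parameters are hypothetical.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Approximate the posterior of a coin's bias on an evenly spaced grid.

    Assumptions (for illustration only): uniform prior on [0, 1] and a
    binomial likelihood for the observed flips.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]

    # Step 1: the prior. Uniform here; any non-negative weights would do.
    prior = [1.0] * grid_size

    # Step 2: the likelihood of the data at each candidate theta.
    # The binomial coefficient is omitted because it cancels on normalizing.
    likelihood = [t**successes * (1 - t) ** (trials - successes) for t in thetas]

    # Step 3: Bayes' theorem. Posterior is proportional to prior times
    # likelihood, normalized so the weights sum to one.
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return thetas, posterior


# Made-up data: 7 heads in 10 flips.
thetas, posterior = grid_posterior(successes=7, trials=10)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

Swapping in a different prior or likelihood changes steps 1 and 2 but leaves step 3 untouched, which is the sense in which the form is fixed while the model is free.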

Frequentist inferences are awkward to explain. For example, confidence intervals and p-values are tedious to define rigorously. Most consumers of confidence intervals and p-values do not know what they mean and implicitly assume Bayesian interpretations. The difference is not simply pedantic. Particularly with regard to p-values, the common understanding can be grossly inaccurate. By contrast, Bayesian counterparts are simple to define and interpret. Bayesian credible intervals are exactly what most people think confidence intervals are. And a Bayesian hypothesis test simply compares the posterior probabilities of the hypotheses, with Bayes factors measuring how much the data shift the odds between them.
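A small sketch may make the two Bayesian quantities concrete. It assumes the same kind of toy setup as a coin-flip experiment (made-up data of 7 heads in 10 flips, uniform prior), computes an equal-tailed 95% credible interval on a grid, and computes a Bayes factor between two point hypotheses about the bias. The function names are hypothetical, and for point hypotheses the Bayes factor reduces to a plain likelihood ratio.

```python
from math import comb


def binomial_likelihood(theta, successes, trials):
    """Probability of the observed number of successes given bias theta."""
    return comb(trials, successes) * theta**successes * (1 - theta) ** (trials - successes)


def credible_interval(successes, trials, mass=0.95, grid_size=10001):
    """Equal-tailed credible interval under a uniform prior (grid approximation)."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [binomial_likelihood(t, successes, trials) for t in thetas]
    total = sum(weights)

    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for t, w in zip(thetas, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = t
        if upper is None and cumulative >= 1 - tail:
            upper = t
            break
    return lower, upper


def bayes_factor(theta_a, theta_b, successes, trials):
    """Bayes factor for the point hypothesis theta_a against theta_b."""
    return binomial_likelihood(theta_a, successes, trials) / binomial_likelihood(
        theta_b, successes, trials
    )


# Made-up data: 7 heads in 10 flips.
low, high = credible_interval(7, 10)
print(f"95% credible interval for theta: ({low:.3f}, {high:.3f})")

# Evidence for "theta = 0.7" relative to "theta = 0.5" (a fair coin).
bf = bayes_factor(0.7, 0.5, 7, 10)
print(f"Bayes factor, theta=0.7 vs theta=0.5: {bf:.2f}")
```

The interval reads the way people expect: given the model and prior, the bias lies between `low` and `high` with probability 0.95. Multiplying the Bayes factor by the prior odds of the two hypotheses gives their posterior odds.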

Sometimes the *necessity* of specifying prior distributions is seen as a drawback to Bayesian inference. On the other hand, the *ability* to specify prior distributions means that more information can be incorporated in an inference. See Musicians, drunks, and Oliver Cromwell for a colorful illustration from Jim Berger on the need to incorporate prior information.


Nice post! Do you have a reference to fuzzy logic violating an axiom used to derive probability theory? I’ll admit that I’ve only lately gotten interested in probabilistic reasoning and inference via Sewall Wright’s “path analysis” and going through the upcoming book by Koller and Friedman on graphical models. Before that I’d been a Kosko fuzzy logic fanboy. It all seemed very intuitive, except for the centers-of-mass de-fuzzification.

Neal, see Probability Theory: The Logic of Science by E. T. Jaynes.

Thanks – Mark Reid suggested the same book for different reasons.