E. T. Jaynes gave a speech entitled A Backward Look to the Future in which he looked back on his long career as a physicist and statistician. The speech contains several quotes related to my recent post on what a probability means.

Jaynes advocated the view of probability theory as logic extended to include reasoning on incomplete information. **Probability need not have anything to do with randomness**. Jaynes believed that frequency interpretations of probability are unnecessary and misleading.

… think of probability theory as extended logic, because then probability distributions are justified in terms of their demonstrable information content, rather than their imagined—and as it now turns out, irrelevant—frequency connections.

He concludes with this summary of his approach to probability.

As soon as we recognize that probabilities do not describe reality—only our information about reality—the gates are wide open to the optimal solution of problems of reasoning from that information.

Sure, but “optimal” seems a bit strong!

Agreed. “Optimal” is a loaded term: optimal by what criteria? Does the labor required to compute the solution fit in? A lot of sub-optimal things begin to look closer to optimal when you take more factors — especially human limitations — into account.

Although I found Berger’s book on Bayesian statistics tough going, he does do a lot with loss functions and utility. As an undergraduate physics major and later an engineer, I was taught a lot of probability in the classical guise. However, the idea that errors have costs, and that you can’t really make the right choice until you decide where your risks are, was revealing and interesting. Not everyone understands the decision they are about to make that well, so I guess they go with the loss-function equivalent of an uninformative prior. Yet often, as in ROC curves and elsewhere, there’s a natural place to introduce the costs of being wrong, and I wonder why courses don’t use them more.

Loss functions are contentious and easily lead to bikeshed arguments. And when people can’t agree on a loss function, the mathematicians win by default: everyone uses classical methods that implicitly assume squared-error loss, the choice that makes the mathematics most convenient.
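
To make the point about loss functions concrete, here is a minimal Python sketch, not anything taken from Jaynes or Berger. It assumes a skewed (exponential) set of samples standing in for our uncertainty about some quantity, and a brute-force search over candidate estimates. The function names, the 9-to-1 cost ratio, and the candidate grid are all illustrative assumptions. The idea is that the "best" single-number estimate depends on the loss: squared error favors the mean, absolute error favors the median, and a loss that penalizes underestimates more heavily pushes the estimate toward an upper quantile.

```python
import random
import statistics


def squared(estimate, x):
    """Squared-error loss, the usual implicit default."""
    return (estimate - x) ** 2


def absolute(estimate, x):
    """Absolute-error loss."""
    return abs(estimate - x)


def asymmetric(estimate, x, under=9.0, over=1.0):
    """Hypothetical costs: underestimating is 9 times as costly as overestimating."""
    if x > estimate:
        return under * (x - estimate)
    return over * (estimate - x)


def expected_loss(estimate, samples, loss):
    """Average loss of one candidate estimate over all samples."""
    return sum(loss(estimate, x) for x in samples) / len(samples)


def best_estimate(samples, loss, candidates):
    """Brute-force search for the candidate with the smallest average loss."""
    return min(candidates, key=lambda a: expected_loss(a, samples, loss))


random.seed(0)
# Skewed samples (assumption: exponential with rate 1), so mean and median differ.
samples = [random.expovariate(1.0) for _ in range(1000)]
candidates = [i / 100 for i in range(501)]  # grid from 0.00 to 5.00

best_sq = best_estimate(samples, squared, candidates)
best_abs = best_estimate(samples, absolute, candidates)
best_asym = best_estimate(samples, asymmetric, candidates)

print("squared error  ->", best_sq, "(sample mean:", statistics.mean(samples), ")")
print("absolute error ->", best_abs, "(sample median:", statistics.median(samples), ")")
print("asymmetric     ->", best_asym)
```

With the same information, the three losses recommend three different answers, which is one way to see why defaulting to squared error is itself a choice about costs.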