Occam’s razor says that if two models fit equally well, the simpler model is likely to be a better description of reality. Why should that be?
A paper by Jim Berger suggests a Bayesian justification of Occam’s razor: simpler hypotheses have higher posterior probabilities when they fit well.
A simple model makes sharper predictions than a more complex model. For example, consider fitting a linear model and a cubic model. The cubic model is more general and can fit a wider range of data sets. The linear model is more restrictive and hence easier to falsify. But when the linear and cubic models both fit, Bayes’ theorem “rewards” the linear model for making a bolder prediction: the cubic model has to spread its probability over many more possible data sets, so it assigns less of it to the data actually observed. See Berger’s paper for details and examples.
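To make the linear-versus-cubic comparison concrete, here is a minimal sketch in Python. It is not taken from Berger’s paper. It assumes a setup chosen purely for illustration: Gaussian noise with known standard deviation `sigma`, and independent zero-mean Gaussian priors with standard deviation `tau` on the polynomial coefficients. Under those assumptions the probability of the data under each model (the marginal likelihood, or evidence) has a closed form, so the two models can be compared directly. The function name `log_evidence`, the parameter values, and the simulated data are all hypothetical.

```python
import numpy as np

def log_evidence(x, y, degree, sigma=0.1, tau=10.0):
    """Log marginal likelihood of a polynomial model of the given degree.

    Illustrative assumptions: y = Phi @ w + noise, with prior w ~ N(0, tau^2 I)
    and noise ~ N(0, sigma^2 I). Integrating out w gives
    y ~ N(0, sigma^2 I + tau^2 Phi Phi^T).
    """
    Phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * (Phi @ Phi.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
noise = rng.normal(0, 0.1, size=x.size)

# Case 1: the data really are linear, so both models fit well.
y_line = 1 + 2 * x + noise
log_ev_linear = log_evidence(x, y_line, degree=1)
log_ev_cubic = log_evidence(x, y_line, degree=3)

# Case 2: the data have a genuine cubic term, so the linear model fits badly.
y_cubic = 1 + 2 * x + 3 * x**3 + noise
log_ev_linear_on_cubic = log_evidence(x, y_cubic, degree=1)
log_ev_cubic_on_cubic = log_evidence(x, y_cubic, degree=3)

print("linear data: log evidence, linear vs cubic:", log_ev_linear, log_ev_cubic)
print("cubic data:  log evidence, linear vs cubic:", log_ev_linear_on_cubic, log_ev_cubic_on_cubic)
```

With equal prior probabilities on the two models, the difference in log evidence is the log of the posterior odds. In the first case the linear model comes out ahead even though the cubic model fits at least as closely, and in the second case the cubic model wins because the linear model no longer fits. The size of the gap depends on the assumed prior width `tau`: a more diffuse prior on the extra coefficients penalizes the cubic model more heavily.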
From the conclusion of the paper:
Ockham’s razor, far from being merely an ad hoc principle, can under many practical situations in science be justified as a consequence of Bayesian inference. Bayesian analysis can shed new light on what the notion of “simplest” hypothesis consistent with the data actually means.