Posts Tagged ‘Information theory’

Learning is not the same as gaining information

Friday, April 4th, 2008

Learning is not the same as just gaining information. Sometimes learning means letting go of previously held beliefs. While this is true in life in general, my point here is to show how this holds true when using the mathematical definition of information.

The information content of a probability density function p(x) is given by

integral of p(x) log p(x)

Suppose we have a Beta(2, 6) prior on the probability of success for a binary outcome.

plot of beta(2,6) density

The prior density has information content 0.597. Then suppose we observe a success. The posterior density is distributed as Beta(3, 6). The posterior density has information 0.516, less information than the prior density.

plot of beta(3,6) density

Observing a success pulled the posterior density toward the right. The posterior density is a little more diffuse than the prior and so has lower information content. In that sense, we know less than before we observed the data! Actually, we’re less certain than we were before observing the data. But if the true probability of response is larger than our prior would indicate, we’re closer to the truth by becoming less confident of our prior belief, and we’ve learned something.

Probability and information

Monday, March 3rd, 2008

E. T. Jaynes gave a speech entitled A Backward Look to the Future in which he looked back on his long career as a physicist and statistician. The speech contains several quotes related to my recent post on what a probability means.

Jaynes advocated the view of probability theory as logic extended to include reasoning on incomplete information. Probability need not have anything to do with randomness. Jaynes believed that frequency interpretations of probability are unnecessary and misleading.

… think of probability theory as extended logic, because then probability
distributions are justified in terms of their demonstrable information content, rather than their imagined — and as it now turns out, irrelevant — frequency connections.

He concludes with this summary of his approach to probability.

As soon as we recognize that probabilities do not describe reality — only our information about reality — the gates are wide open to the optimal solution of problems of reasoning from that information.