The beta-binomial model is the “hello world” example of Bayesian statistics. I would call it a toy model, except it is actually useful. It’s not nearly as complicated as most models used in application, but it illustrates the basics of Bayesian inference. Because it’s a conjugate model, the calculations work out trivially.
I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.
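To make this concrete: if q is the density of the prior and p the density of the posterior, both on [0, 1], then the information gained, measured in bits, is the Kullback-Leibler divergence

$$ D(p \,\|\, q) = \int_0^1 p(\theta) \, \log_2 \frac{p(\theta)}{q(\theta)} \, d\theta. $$

This is exactly the integral the code below computes.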
Here’s a little Python code for computing this. Enter the a and b parameters of the prior and the posterior to compute how much information was gained.
from scipy.integrate import quad
from scipy.stats import beta
from numpy import log2

def infogain(post_a, post_b, prior_a, prior_b):
    p = beta(post_a, post_b).pdf
    q = beta(prior_a, prior_b).pdf
    (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info
This code works well for medium-sized inputs, but it has problems with large inputs: the generic integration routine quad needs some help when the beta distributions become highly concentrated.
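When quad struggles, one way around the numerical integration is the closed-form expression for the Kullback-Leibler divergence between two beta distributions, which involves only log-beta and digamma functions and so stays stable even for concentrated distributions. Here is a sketch (the function name infogain_exact is mine, not from the code above):

```python
from numpy import log
from scipy.special import betaln, digamma

def infogain_exact(post_a, post_b, prior_a, prior_b):
    # Closed-form KL divergence from the prior beta(prior_a, prior_b)
    # to the posterior beta(post_a, post_b), computed in nats.
    kl_nats = (betaln(prior_a, prior_b) - betaln(post_a, post_b)
               + (post_a - prior_a) * digamma(post_a)
               + (post_b - prior_b) * digamma(post_b)
               + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b))
    # Convert nats to bits to match the log2 in the integral version.
    return kl_nats / log(2)
```

For moderate parameters this agrees with the integration-based function, and it keeps working when the parameters are large.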
You can see that surprising input carries more information. For example, suppose your prior is beta(3, 7). This distribution has a mean of 0.3, so you're expecting more failures than successes. With such a prior, a success changes your mind more than a failure does. You can quantify this by running these two calculations.
print( infogain(4, 7, 3, 7) )
print( infogain(3, 8, 3, 7) )
The first line shows that a success would change your information by 0.1563 bits, while the second shows that a failure would change it by 0.0297 bits.