Babies and the beta-binomial distribution

About half of children are boys and half are girls, but that doesn’t mean that every couple is equally likely to have a boy or a girl each time they conceive a child. And evidence suggests that indeed the probability of conceiving a girl varies per couple.

I will simplify things for this post and look at a hypothetical situation abstracting away the complications of biology. This post fills in the technical details of a thread I posted on Twitter this morning.

Suppose the probability p that a couple will have a baby girl has some distribution centered at 0.5 and symmetric about that point. Then half of all births on the planet will be girls, but that doesn’t mean that a particular couple is equally likely to have a boy or a girl.

How could you tell the difference empirically? You couldn’t if every family had one child. But suppose you studied all families with four children, for example. If every couple had p = 1/2, you’d expect 1 in 16 such families to have all boys, and 1 in 16 families to have all girls. If the proportions are higher than that, and they are, then that suggests that p, the probability of a couple having a girl, is not the same constant for every couple.
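To see why variation in p raises the share of all-girl families, here is a minimal sketch using a hypothetical population in which half of couples have p = 0.4 and half have p = 0.6. Those two values and the function name `prob_all_girls` are assumptions chosen for illustration, not estimates from data.

```python
def prob_all_girls(p_values, weights, n=4):
    """Probability that a randomly chosen family of n children is all girls,
    when couples' p takes the given values with the given weights."""
    return sum(w * p**n for p, w in zip(p_values, weights))

# Every couple has p = 0.5: the familiar 1/16.
constant = prob_all_girls([0.5], [1.0])

# Hypothetical mixture: p is still 0.5 on average, but varies by couple.
mixed = prob_all_girls([0.4, 0.6], [0.5, 0.5])

print(constant, mixed)
```

The mixture has the same average p, but the all-girl probability goes up because p^4 is a convex function of p.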

Suppose the probability of a couple having girls has a beta(a, b) distribution. We would expect a and b to be approximately equal, since about half of babies are girls, and we’d expect a and b to be large, i.e. for the distribution to be fairly concentrated around 1/2. For example, here’s a plot with a = b = 100.

Then the probability distribution for the number of girls in a family of n children is given by a beta-binomial distribution with parameters n, a, and b. That is, the probability of x girls in a family of size n is given by

\text{Prob}(X = x) = \binom{n}{x} \frac{B(x + a, n - x + b)}{B(a, b)}
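As a rough sketch of how the formula above could be evaluated, the following uses only the Python standard library and computes the beta function through log-gamma to avoid overflow. The function names are my own, and a = b = 100 is the example value from the text, not a fitted parameter.

```python
from math import comb, exp, lgamma

def log_beta(x, y):
    """Log of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binomial_pmf(x, n, a, b):
    """Prob(X = x) for a beta-binomial with parameters n, a, b."""
    return comb(n, x) * exp(log_beta(x + a, n - x + b) - log_beta(a, b))

n, a, b = 4, 100.0, 100.0
pmf = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]

print("total probability:", sum(pmf))
print("P(all girls):", pmf[n], "vs. 1/16 =", 1 / 16)
```

With these parameters the probability of four girls comes out slightly above 1/16, which is the effect described earlier for families of four.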

The mean of this distribution is na/(a+b). And so if a = b then the mean is n/2, half girls and half boys.

But the variance is more interesting. The variance is

\frac{nab(a + b + n)}{(a + b)^2(a + b +1)} = n \,\,\frac{a}{a + b} \,\,\frac{b}{a + b} \,\,\frac{a + b + n}{a + b + 1}

The variance of a binomial, corresponding to a constant p, is np(1-p). In the equation above, p corresponds to a/(a+b), and (1-p) corresponds to b/(a+b). And there’s an extra term,

\frac{a + b + n}{a + b + 1}

which is larger than 1 when n > 1. This says that for n > 1 a beta-binomial random variable always has more variance than the binomial distribution with the same mean.
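The variance formula can be checked numerically. The sketch below, with function names of my own choosing and the example values n = 4, a = b = 100, computes the variance directly from the probability mass function and compares it with the closed form and with the binomial variance np(1-p).

```python
from math import comb, exp, lgamma

def beta_binomial_pmf(x, n, a, b):
    """Prob(X = x) for a beta-binomial with parameters n, a, b."""
    log_beta = lambda u, v: lgamma(u) + lgamma(v) - lgamma(u + v)
    return comb(n, x) * exp(log_beta(x + a, n - x + b) - log_beta(a, b))

def variance_from_pmf(n, a, b):
    """Variance computed by summing over the distribution."""
    probs = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    return sum((x - mean) ** 2 * p for x, p in enumerate(probs))

def variance_formula(n, a, b):
    """Closed form: n p (1 - p) (a + b + n) / (a + b + 1), with p = a/(a + b)."""
    p = a / (a + b)
    return n * p * (1 - p) * (a + b + n) / (a + b + 1)

n, a, b = 4, 100.0, 100.0
p = a / (a + b)
print("from pmf:     ", variance_from_pmf(n, a, b))
print("closed form:  ", variance_formula(n, a, b))
print("binomial var: ", n * p * (1 - p))
```

For n = 1 the extra factor equals 1 and the two variances coincide, which matches the earlier remark that one-child families cannot reveal the difference.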

Now suppose a family has had n children, with g girls and n − g boys. Then the posterior predictive probability of a girl on the next birth is

\frac{a + g}{a + b + n}

If a = b and g = n/2 then this probability is 1/2. But if g > n/2 then the probability is greater than 1/2. And the smaller a and b are, the more the probability exceeds 1/2.

The binomial model is the limit of the beta-binomial model as a and b go to infinity (proportionately). In the limit, the probability above equals a/(a+b), independent of g and n.
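Here is a small sketch of the posterior predictive probability and its limiting behavior. The function name and the example family (three children, all girls) are assumptions for illustration; a = b is taken equal throughout, as in the discussion above.

```python
def prob_next_girl(a, b, n, g):
    """Posterior predictive probability of a girl after g girls in n births,
    under a beta(a, b) prior on the couple's p."""
    return (a + g) / (a + b + n)

# Hypothetical family: three children, all girls.
n, g = 3, 3
results = [(a, prob_next_girl(a, a, n, g)) for a in (1, 10, 100, 1_000_000)]

for a, prob in results:
    print(f"a = b = {a}: {prob:.6f}")
```

As a and b grow, the family's history matters less and the probability moves toward a/(a+b) = 1/2, which is the binomial limit.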

One thought on “Babies and the beta-binomial distribution”

  1. From what perspective? The generic ∼51.3% male bias [1], WHO’s male to female ratio (105 to 100) [2], lower male-survivability, birth location[3], or the season[4] (which should climb assuming climate change)?

    [1] https colon //
    [2] https colon //
    [3] https colon //
    [4] https colon //
