Suppose X is a beta(a, b) random variable and Y is a beta(c, d) random variable. Define the function
g(a, b, c, d) = Prob(X > Y).
At one point I spent a lot of time developing accurate and efficient ways to compute g under various circumstances. I did this because when I worked at MD Anderson Cancer Center we spent thousands of CPU-hours computing this function inside clinical trial simulations.
For this post I want to outline how to compute g in a minimal environment, i.e. a programming language with no math libraries, when the parameters a, b, c, and d are all integers. The emphasis is on portability, not efficiency. (If we had access to math libraries, we could solve the problem for general parameters using numerical integration.)
In this technical report I develop several symmetries and recurrence relationships for the function g. The symmetries can be summarized as
g(a, b, c, d) = g(d, c, b, a) = g(d, b, c, a) = 1 − g(c, d, a, b).
Without loss of generality we assume a is the smallest argument. If not, apply the symmetries above to make the first argument the smallest.
We will develop a recursive algorithm for computing the function g by recurring on a. The algorithm will terminate when a = 1, and so we want a to be the smallest parameter.
In the technical report cited above, I develop the recurrence relation
g(a + 1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a
h(a, b, c, d) = B(a + c, b + d) / B(a, b) B(c, d)
and B(x, y) is the beta function.
The equation above explains how to compute g(a + 1, b, c, d) after you’ve already computed g(a, b, c, d), which is what I often needed to do in applications. But we want to do the opposite to create a recursive algorithm:
g(a, b, c, d) = g(a – 1, b, c, d) + h(a – 1, b, c, d)/(a – 1)
for a > 1.
For the base case, when a = 1, we have
g(1, b, c, d) = B(c, b + d) / B(c, d).
All that’s left to develop our code is to evaluate the Beta function.
B(x, y) = Γ(x + y) / Γ(x) Γ(y)
Γ(x) = (x – 1)!.
So all we need is a way to compute the factorial, and that’s such a common task that it has become a cliché.
The symmetries and recurrence relation above hold for non-integer parameters, but you need a math library to compute the gamma function if the parameters are not integers. If you do have a math library, the algorithm here won’t be the most efficient one unless you need to evaluate g for a sequence of parameters that differ by small integers.
For more on random inequalities, here’s the first of a nine-part sequence of blog posts.