# Random inequalities V: beta distributions

I’ve put a lot of effort into writing software for evaluating random inequality probabilities with beta distributions because such inequalities come up quite often in applications. For example, beta inequalities are at the heart of the Thall-Simon method for monitoring single-arm trials and adaptively randomized trials with binary endpoints.

It’s not easy to evaluate P(X > Y) accurately and efficiently when X and Y are independent beta random variables. I’ve seen several attempts that were either inaccurate or slow, including a few attempts on my part. Efficiency is important because this calculation is often in the inner loop of a simulation study. Part of the difficulty is that the calculation depends on four parameters and no single algorithm will work well for all parameter combinations.

Let g(a, b, c, d) equal P(X > Y) where X ~ beta(a, b) and Y ~ beta(c, d). Then the function g has several symmetries.

• g(a, b, c, d) = 1 – g(c, d, a, b)
• g(a, b, c, d) = g(d, c, b, a)
• g(a, b, c, d) = g(d, b, c, a)

The first two relations were published by W. R. Thompson in 1933, but as far as I know the third relation first appeared in this technical report in 2003.

For special values of the parameters, the function g(a, b, c, d) can be computed in closed form. Some of these special cases are when

• one of the four parameters is an integer
• a + b + c + d = 1
• a + b = c + d = 1.

The function g(a, b, c, d) also satisfies several recurrence relations that make it possible to bootstrap the latter two special cases into more results. Define the beta function B(a, b) as Γ(a) Γ(b)/Γ(a + b) and define h(a, b, c, d) as B(a+c, b+d)/( B(a, b) B(c, d) ). Then the following recurrence relations hold.

• g(a+1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a
• g(a, b+1, c, d) = g(a, b, c, d) – h(a, b, c, d)/b
• g(a, b, c+1, d) = g(a, b, c, d) – h(a, b, c, d)/c
• g(a, b, c, d+1) = g(a, b, c, d) + h(a, b, c, d)/d

# Random inequalities IV: Cauchy distributions

Two weeks ago I wrote a series of posts on random inequalities: part I, part II, part III. In the process of writing these, I found an error in a tech report I wrote five years ago. I’ve posted a corrected version and describe the changes here.

Suppose $X_1$ is a Cauchy random variable with median $m_1$ and scale $s_1$, and similarly for $X_2$, with $X_1$ and $X_2$ independent. Then $X_1 - X_2$ is a Cauchy random variable with median $m_1 - m_2$ and scale $s_1 + s_2$. Then $P(X_1 > X_2)$ equals

$$P(X_1 - X_2 > 0) = P(m_1 - m_2 + (s_1 + s_2) C > 0)$$

where $C$ is a Cauchy random variable with median 0 and scale 1. This reduces to

$$P\left(C < \frac{m_1 - m_2}{s_1 + s_2}\right) = \frac{1}{2} + \frac{1}{\pi} \arctan\left(\frac{m_1 - m_2}{s_1 + s_2}\right).$$

The original version was missing the additive term 1/2. This is obviously wrong because it would say that $P(X_1 > X_2)$ is negative when $m_1 < m_2$.

By the way, I was told in college that the Cauchy distribution is an impractical curiosity, something more useful for developing counterexamples than modeling real phenomena. That was an overstatement. Thick-tailed distributions like the Cauchy often arise in applications, sometimes directly (see Noise, The Black Swan) or indirectly (for example, robust or default prior distributions).

Update: See part V on beta distributions.

# Random inequalities III: numerical results

The first post in this series introduced random inequalities. The second post discussed random inequalities that can be computed in closed form. This post considers random inequalities that must be evaluated numerically.

The simplest and most obvious approach to computing P(X > Y) is to generate random samples from X and Y and count how often the sample from X is bigger than the sample from Y. However, this approach is only accurate on average. In any particular sample, the error may be large. Even if you are willing to tolerate, for example, a 5% chance of being outside your error tolerance, the required number of random draws may be large. The more accuracy you need, the worse it gets, since the error only decreases as $n^{-1/2}$ where n is the number of samples. For numerical integration methods, the error typically decreases as $n^{-k}$ for some larger power k that depends on the method. In the timing study near the end of this technical report, numerical integration was 2,875 times faster than simulation for determining beta inequality probabilities to just two decimal places. For greater desired accuracy, the advantage of numerical integration would increase.

Why is efficiency important? For some applications it may not be, but often these random inequalities are evaluated in the inner loop of a simulation. A simulation study that takes hours to run may be spending most of its time evaluating random inequalities.

As derived in the previous post, the integral to evaluate is given by

$$P(X > Y) = \int_{-\infty}^{\infty} f_X(x)\, F_Y(x)\, dx$$

where $f_X$ is the PDF of X and $F_Y$ is the CDF of Y.

The technical report Numerical computation of stochastic inequality probabilities gives some general considerations for numerically evaluating random inequalities, but each specific distribution family must be considered individually since each may present unique numerical challenges. The report gives some specific techniques for beta and Weibull distributions. The report Numerical evaluation of gamma inequalities gives some techniques for gamma distributions.

Numerical integration can be complicated by singularities in the integrand for extreme values of distribution parameters. For example, if X ~ beta(a, b) the integral for computing P(X > Y) has a singularity if either a or b is less than 1. For general advice on numerically integrating singular integrands, see “The care and treatment of singularities” in Numerical Methods that Work by Forman S. Acton.

Once you come up with a clever numerical algorithm for evaluating a random inequality, you need to have a way to test the results.

One way to test numerical algorithms for random inequalities is to compare simulation results to integration results. This will find blatant errors, but it may not be as effective in uncovering accuracy problems. It helps to use random parameters for testing.

Another way to test numerical algorithms in this context is to compute both P(X > Y) and P(Y > X). For continuous random variables, these two values should add to 1. Of course such a condition is not sufficient to guarantee accuracy, but it is a simple and effective test. The Inequality Calculator software reports both P(X > Y) and P(Y > X) partly for the convenience of the user but also as a form of quality control: if the two probabilities do not add up to approximately 1, you know something has gone wrong. Along these lines, it’s useful to verify that P(X > Y) = 1/2 if X and Y are identically distributed.

Finally, the integrals may have closed-form solutions for special parameter values. For example, Exact calculation of beta inequalities gives closed-form results for many special cases of the parameters. As long as the algorithm being tested does not depend on these special values, these special values provide valuable test cases.

# Random inequalities II: analytical results

My previous post introduced random inequalities and their application to Bayesian clinical trials. This post will discuss how to evaluate random inequalities analytically. The next post in the series will discuss numerical evaluation when analytical evaluation is not possible.

For independent random variables X and Y, how would you compute P(X > Y), the probability that a sample from X will be larger than a sample from Y? Let $f_X$ be the probability density function (PDF) of X and let $F_X$ be the cumulative distribution function (CDF) of X. Define $f_Y$ and $F_Y$ similarly. Then the probability P(X > Y) is the integral of $f_X(x) f_Y(y)$ over the part of the xy plane below the diagonal line x = y.

$$P(X > Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{x} f_X(x)\, f_Y(y)\, dy\, dx = \int_{-\infty}^{\infty} f_X(x)\, F_Y(x)\, dx$$

This result makes intuitive sense: $f_X(x)$ is the density for x and $F_Y(x)$ is the probability that Y is less than x.

Sometimes this integral can be evaluated analytically, though in general it must be evaluated numerically. The technical report Numerical computation of stochastic inequality probabilities explains how P(X > Y) can be computed in closed form for several common distribution families as well as how to evaluate inequalities involving other distributions numerically.

Exponential: If X and Y are exponential random variables with means $\mu_X$ and $\mu_Y$ respectively, then $P(X > Y) = \mu_X/(\mu_X + \mu_Y)$.

Normal: If X and Y are normal random variables with mean and standard deviation $(\mu_X, \sigma_X)$ and $(\mu_Y, \sigma_Y)$ respectively, then $P(X > Y) = \Phi\left((\mu_X - \mu_Y)/\sqrt{\sigma_X^2 + \sigma_Y^2}\right)$ where Φ is the CDF of a standard normal distribution.

Gamma: If X and Y are gamma random variables with shape and scale $(\alpha_X, \beta_X)$ and $(\alpha_Y, \beta_Y)$ respectively, then $P(X > Y) = I_x(\alpha_Y, \alpha_X)$ with $x = \beta_X/(\beta_X + \beta_Y)$, where $I_x$ is the regularized incomplete beta function with parameters $\alpha_Y$ and $\alpha_X$, i.e. the CDF of a beta distribution with parameters $\alpha_Y$ and $\alpha_X$ evaluated at x.

The inequality P(X > Y) where X and Y are beta random variables comes up very often in applications. This inequality cannot be computed in closed form in general, though there are closed-form solutions for special values of the beta parameters. If X ~ beta(a, b) and Y ~ beta(c, d), the probability P(X > Y) can be evaluated in closed form if

1. one of the parameters a, b, c, or d is an integer,
2. a + b + c + d = 1, or
3. a + b = c + d = 1.

These last two cases can be combined with a recurrence relation to compute other probabilities. See Exact calculation of beta inequalities for more details.

Sometimes you need to calculate P(X > max(Y, Z)) for three independent random variables. This comes up, for example, when computing adaptive randomization probabilities for a three-arm clinical trial. For a time-to-event trial as implemented here, the random variables have a gamma distribution. See Numerical evaluation of gamma inequalities for analytical as well as numerical results for computing P(X > max(Y, Z)) in that case.

The next post in this series will discuss how to evaluate random inequalities numerically when closed-form integration is not possible.

Update: See Part IV of this series for results with the Cauchy distribution.

# Random inequalities I: introduction

Many Bayesian clinical trial methods have at their core a random inequality. Some examples from M. D. Anderson: adaptive randomization, binary safety monitoring, time-to-event safety monitoring. These methods depend critically on evaluating P(X > Y) where X and Y are independent random variables. Roughly speaking, P(X > Y) is the probability that the treatment represented by X is better than the treatment represented by Y. In a trial with binary outcomes, X and Y may be the posterior probabilities of response on each treatment. In a trial with time-to-event outcomes, X and Y may be posterior probabilities of median survival time on two treatments.

People often have a little difficulty understanding what P(X > Y) means. If we take a random sample from X and a random sample from Y, P(X > Y) is the probability that the former is larger than the latter. Most confusion around random inequalities comes from thinking of random variables as constants, not random quantities. Here are a couple of examples.

First, suppose X and Y have normal distributions with standard deviation 1. If X has mean 4 and Y has mean 3, what is P(X > Y)? Some would say 1, because X is bigger than Y. But that’s not true. X has a larger mean than Y, but fairly often a sample from Y will be larger than a sample from X. P(X > Y) = 0.76 in this case.
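
To see where the 0.76 comes from, note that for independent X and Y the difference X − Y is normal with mean 4 − 3 = 1 and variance 1 + 1 = 2. Writing Φ for the standard normal CDF,

$$P(X > Y) = P(X - Y > 0) = \Phi\left(\frac{1}{\sqrt{2}}\right) \approx 0.76.$$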

Next, suppose X and Y are identically distributed. Now what is P(X > Y)? I’ve heard people say zero because the two random variables are equal. But they’re not equal. Their distribution functions are equal but the two random variables are independent. P(X > Y) = 1/2 by symmetry.

I believe there’s a psychological tendency to underestimate large inequality probabilities. (I’ve had several discussions with people who would not believe a reported inequality probability until they computed it themselves. These discussions are important when the decision whether to continue a clinical trial hinges on the result.) For example, suppose X and Y represent the probability of success in a trial in which there were 17 successes out of 30 on X and 12 successes out of 30 on Y. Using a beta distribution model, the density functions of X and Y are given below. The density function for X is essentially the same as Y but shifted to the right. Clearly P(X > Y) is greater than 1/2. But how much greater than a half? You might think not too much since there’s a lot of mass in the overlap of the two densities. But P(X > Y) is a little more than 0.9.

The image above and the numerical results mentioned in this post were produced by the Inequality Calculator software.

Part II will discuss analytically evaluating random inequalities. Part III will discuss numerically evaluating random inequalities.

# How to calculate binomial probabilities

Suppose you’re doing something that has probability of success p and probability of failure q = 1 − p. If you repeat what you’re doing m+n times, the probability of m successes and n failures is given by

$$\frac{(m+n)!}{m!\, n!}\, p^m q^n.$$

Now suppose m and n are moderately large. The terms (m+n)! and m! n! will be huge, but the terms $p^m$ and $q^n$ will be tiny. The huge terms might overflow, and the tiny terms might underflow, even though the final result may be a moderate-sized number. The numbers m and n don’t have to be that large for this to happen since factorials grow very quickly. In IEEE double precision, (m+n)! overflows if m+n > 170. How do you get to the end result without having to deal with impossibly large or small values along the way?

The trick is to work with logs. You want to first calculate log( (m+n)! ) – log( m! ) – log( n! ) + m log( p ) + n log( q ) , then exponentiate the result. This pattern comes up constantly in statistical computing.

Libraries don’t always include functions for evaluating factorial or log factorial. They’re more likely to include functions for Γ(x) and its log. For positive integers n, Γ(n+1) = n!. Now suppose you have a function lgamma that computes log Γ(x). You might write something like this.

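    #include <math.h>

    /* Probability of m successes and n failures, computed in log space
       to avoid overflow and underflow in the intermediate results. */
    double probability(double p, double q, int m, int n)
    {
        double temp = lgamma(m + n + 1.0);          /* log (m+n)! */
        temp -= lgamma(n + 1.0) + lgamma(m + 1.0);  /* minus log n! and log m! */
        temp += m*log(p) + n*log(q);                /* plus log of p^m q^n */
        return exp(temp);
    }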

The function lgamma is not part of the original ANSI standard library for C (C89) or C++ (C++98), but it is part of the POSIX standard, and it was later added in C99 and C++11. On Unix-like systems, lgamma is included in the standard library. However, older versions of Visual C++ do not include lgamma in their standard library. With those compilers you have to either implement your own lgamma function or grab an implementation from somewhere like Boost.

Here’s something to watch out for with POSIX math libraries. I believe the POSIX standard does not require a function called gamma for evaluating the gamma function Γ(x). Instead, the standard requires functions lgamma for the log of the gamma function and tgamma for the gamma function itself. (The mnemonic is “t” for “true,” meaning that you really want the gamma function.) I wrote a little code that called gamma and tested it on OS X and Red Hat Linux this evening. In both cases gcc compiled the code without warning, even with the -Wall and -pedantic warning flags. But on the Mac, gamma was interpreted as tgamma and on the Linux box it was interpreted as lgamma. This means that gamma(10.0) would equal 362880 on the former and 12.8018 on the latter.

If you don’t have access to an implementation of log gamma, see How to compute log factorial. 