Beta inequalities and cross ratios

Posted on 28 June 2023 by John

When I worked at MD Anderson Cancer Center, we spent a lot of compute cycles evaluating the function g(a, b, c, d), defined as the probability that a sample from a beta(a, b) random variable is larger than a sample from a beta(c, d) random variable. This function was often in the inner loop of simulations that ran for hours or even days.

I developed ways to evaluate this function more efficiently because it was a bottleneck. Along the way I found a new symmetry. W. R. Thompson had studied what I call the function g back in 1933 and reported two symmetries:

g(a, b, c, d) = 1 − g(c, d, a, b)

and

g(a, b, c, d) = g(d, c, b, a).

I found that

g(a, b, c, d) = g(d, b, c, a)

as well. See a proof here.

You can conclude from these rules that

g(a, b, c, d) = g(d, c, b, a) = g(d, b, c, a) = g(a, c, b, d)
g(a, b, c, d) = 1 − g(c, d, a, b)

I was just looking at a book that mentioned the symmetries of the cross ratio which I will denote

r(a, b, c, d) = (a − c)(b − d) / (b − c)(a −d).

Here is Theorem 4.2 from [1] written in my notation.

Let a, b, c, d be four points on a projective line with cross ratio r(a, b, c, d) = λ. Then we have

1. r(a, b, c, d) = r(b, a, d, c) = r(c, d, a, b) = r(d, c, b, a).
2. r(a, b, d, c) = 1/λ
3. r(a, c, b, d) = 1 − λ
4. the values for the remaining permutations are consequences of these three basic rules.

This looks awfully familiar. Rules 1 and 3 for cross ratios correspond to rules 1 and 2 for beta inequalities, though not in the same order. Both g and r are invariant under reversing their arguments, but are otherwise invariant under different permutations of the arguments.

Both g and r take on 6 distinct values, taking on each 4 times. I feel like there is some deeper connection here but I can’t see it. Maybe I’ll come back to this later when I have the time to explore it. If you see something, please leave a comment.

There is no rule for beta inequalities analogous to rule 2 for cross ratios, at least not that I know of. I don’t know of any connection between g(a, b, c, d) and g(a, b, d, c).

Update: There cannot be a function h such that g(a, b, d, c) is a function of g(a, b, c, d) alone because I found parameters that lead to the same value of the latter but different values of the former. If there is a relation between g(a, b, c, d) and g(a, b, d, c) and it must involve the parameters and not just the value of g.

[1] Jürgen Richter-Gebert. Perspectives on Projective Geometry: A Guided Tour Through Real and Complex Geometry. Springer 2011.

Beta inequalities with integer parameters

Posted on 16 September 2021 by John

Suppose X is a beta(a, b) random variable and Y is a beta(c, d) random variable. Define the function

g(a, b, c, d) = Prob(X > Y).

At one point I spent a lot of time developing accurate and efficient ways to compute g under various circumstances. I did this because when I worked at MD Anderson Cancer Center we spent thousands of CPU-hours computing this function inside clinical trial simulations.

For this post I want to outline how to compute g in a minimal environment, i.e. a programming language with no math libraries, when the parameters a, b, c, and d are all integers. The emphasis is on portability, not efficiency. (If we had access to math libraries, we could solve the problem for general parameters using numerical integration.)

In this technical report I develop several symmetries and recurrence relationships for the function g. The symmetries can be summarized as

g(a, b, c, d) = g(d, c, b, a) = g(d, b, c, a) = 1 − g(c, d, a, b).

Without loss of generality we assume a is the smallest argument. If not, apply the symmetries above to make the first argument the smallest.

We will develop a recursive algorithm for computing the function g by recurring on a. The algorithm will terminate when a = 1, and so we want a to be the smallest parameter.

In the technical report cited above, I develop the recurrence relation

g(a + 1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a

where

h(a, b, c, d) = B(a + c, b + d) / B(a, b) B(c, d)

and B(x, y) is the beta function.

The equation above explains how to compute g(a + 1, b, c, d) after you’ve already computed g(a, b, c, d), which is what I often needed to do in applications. But we want to do the opposite to create a recursive algorithm:

g(a, b, c, d) = g(a – 1, b, c, d) + h(a − 1, b, c, d)/(a − 1)

for a > 1.

For the base case, when a = 1, we have

g(1, b, c, d) = B(c, b + d) / B(c, d).

All that’s left to develop our code is to evaluate the Beta function.

Now

B(x, y) = Γ(x + y) / Γ(x) Γ(y)

and

Γ(x) = (x − 1)!.

So all we need is a way to compute the factorial, and that’s such a common task that it has become a cliché.

The symmetries and recurrence relation above hold for non-integer parameters, but you need a math library to compute the gamma function if the parameters are not integers. If you do have a math library, the algorithm here won’t be the most efficient one unless you need to evaluate g for a sequence of parameters that differ by small integers.

***

For more on random inequalities, here’s the first of a nine-part sequence of blog posts.

Beta inequalities in R

Posted on 11 October 2012 by John

Someone asked me yesterday for R code to compute the probability P(X > Y + δ) where X and Y are independent beta random variables. I’m posting the solution here in case it benefits anyone else.

For an example of why you might want to compute this probability, see A Bayesian view of Amazon resellers.

Let X be a Beta(a, b) random variable and Y be a Beta(c, d) random variable. Denote PDFs by f and CDFs by F. Then the probability we need is

$P(X > Y + \delta) &=& \int_\delta^1 \int_0^{x-\delta} f_X(x) \, f_Y(y)\, dy,dx \ &=& \int_\delta^1 f_X(x)\, F_Y(x-\delta) \, dx$

If you just need to compute this probability a few times, here is a desktop application to compute random inequalities.

But if you need to do this computation repeated inside R code, you could use the following.

beta.ineq <- function(a, b, c, d, delta)
{
    integrand <- function(x) { dbeta(x, a, b)*pbeta(x-delta, c, d) }
    integrate(integrand, delta, 1, rel.tol=1e-4)$value
}

The code is as good or as bad as R’s integrate function. It’s probably accurate enough as long as none of the parameters a, b, c, or d are near zero. When one or more of these parameters is small, the integral is harder to compute numerically.

There is no error checking in the code above. A more robust version would verify that all parameters are positive and that delta is less than 1.

Here’s the solution to the corresponding problem for gamma random variables, provided delta is zero: A support one-liner.

And here is a series of blog posts on random inequalities.

Random inequalities IX: new tech report

Posted on 20 November 2009 by John

Just posted: Exact calculation of inequality probabilities. This report summarizes previous results for calculating P(X > Y) where X and Y are random variables.

Previous posts on random inequalities:

Random inequalities VIII: folded normals

Posted on 13 July 2009 by John

Someone who ran into my previous posts on random inequalities asked me how to compute random inequalities for folded normals. (A folded normal random variable is the absolute value of a normal random variable.) So the question is how to compute

Pr(|X| > |Y|)

where X and Y are normally distributed. Here’s my reply as a short tech report: Inequality probabilities for folded normal random variables.

Previous posts in this series:

Introduction
Analytical results
Numerical results
Cauchy distributions
Beta distributions
Gamma distributions
Three or more random variables

Random inequalities VII: three or more variables

Posted on 6 September 2008 by John

The previous posts in this series have looked at P(X > Y), the probability that a sample from a random variable X is greater than a sample from an independent random variable Y. In applications, X and Y have different distributions but come from the same distribution family.

Sometimes applications require computing P(X > max(Y, Z)). For example, an adaptively randomized trial of three treatments may be designed to assign a treatment with probability equal to the probability that that treatment has the best response. In a trial with a binary outcome, the variables X, Y, and Z may be beta random variables representing the probability of response. In a trial with a time-to-event outcome, the variables might be gamma random variables representing survival time.

Sometimes we’re interested in the opposite inequality, P(X < min(Y,Z)). This would be the case if we thought in terms of failures rather than responses, or wanted to minimize the time to a desirable event rather than maximizing the time to an undesirable event.

The maximum and minimum inequalities are related by the following equation:

P(X < min(Y, Z)) = P(X > max(Y, Z)) + 1 − P(X > Y) − P(X > Z).

These inequalities are used for safety monitoring rules as well as to determine randomization probabilities. In a trial seeking to maximize responses, a treatment arm X might be dropped if P(X > max(Y,Z)) becomes too small.

In principle one could design an adaptively randomized trial with n treatment arms for any n ≥ 2 based on

P(X₁ > max(X₂, …, X_n)).

In practice, the most common value of n by far is 2. Sometimes n is 3. I’m not familiar with an adaptively randomized trial with more than three arms. I’ve heard of an adaptively randomized trial that was designed with five arms, but I don’t believe the trial ran.

Computing P(X₁ > max(X₂, …, X_n)) by numerical integration becomes more difficult as n increases. For large n, simulation may be more efficient than integration. Computing P(X₁ > max(X₂, …, X_n)) for gamma random variables with n=3 was unacceptably slow in a previous version of our adaptive randomization software. The search for a faster algorithm lead to this paper: Numerical Evaluation of Gamma Inequalities.

Previous posts on random inequalities:

Random inequalities VI: gamma distributions

Posted on 30 August 2008 by John

This post looks at computing P(X > Y) where X and Y are gamma random variables. These inequalities are central to the Thall-Wooten method of monitoring single-arm clinical trials with time-to-event outcomes. They also are central to adaptively randomized clinical trials with time-to-event outcomes.

When X and Y are gamma random variables P(X > Y) can be computed in terms of the incomplete beta function. Suppose X has shape α_X and scale β_X and Y has shape α_Y and scale β_Y. Then β_XY/(β_X Y+ β_YX) has a beta(α_Y, α_X) distribution. (This result is well-known in the special case of the scale parameters both equal to 1. I wrote up the more general result here but I don’t imagine I was the first to stumble on the generalization.) It follows that

P(X < Y) = P(B < β_X/(β_X+ β_Y))

where B is a beta(α_Y, α_X) random variable.

For more details, see Numerical evaluation of gamma inequalities.

Previous posts on random inequalities:

Random inequalities V: beta distributions

Posted on 21 August 2008 by John

I’ve put a lot of effort into writing software for evaluating random inequality probabilities with beta distributions because such inequalities come up quite often in application. For example, beta inequalities are at the heart of the Thall-Simon method for monitoring single-arm trials and adaptively randomized trials with binary endpoints.

It’s not easy to evaluate P(X > Y) accurately and efficiently when X and Y are independent random variables. I’ve seen several attempts that were either inaccurate or slow, including a few attempts on my part. Efficiency is important because this calculation is often in the inner loop of a simulation study. Part of the difficulty is that the calculation depends on four parameters and no single algorithm will work well for all parameter combinations.

Let g(a, b, c, d) equal P(X > Y) where X ~ beta(a, b) and Y ~ beta(c, d). Then the function g has several symmetries.

g(a, b, c, d) = 1 − g(c, d, a, b)
g(a, b, c, d) = g(d, c, b, a)
g(a, b, c, d) = g(d, b, c, a)

The first two relations were published by W. R. Thompson in 1933, but as far as I know the third relation first appeared in this technical report in 2003.

For special values of the parameters, the function g(a, b, c, d) can be computed in closed form. Some of these special cases are when

one of the four parameters is an integer
a + b + c + d = 1
a + b = c + d = 1.

The function g(a, b, c, d) also satisfies several recurrence relations that make it possible to bootstrap the latter two special cases into more results. Define the beta function

B(a, b) = Γ(a, b)/(Γ(a) Γ(b))

as usual and define

h(a, b, c, d) as B(a+c, b+d)/( B(a, b) B(c, d) ).

Then the following recurrence relations hold.

g(a+1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a
g(a, b+1, c, d) = g(a, b, c, d) – h(a, b, c, d)/b
g(a, b, c+1, d) = g(a, b, c, d) – h(a, b, c, d)/c
g(a, b, c, d+1) = g(a, b, c, d) + h(a, b, c, d)/d

For more information about beta inequalities, see these papers:

Previous posts on random inequalities:

Random inequalities IV: Cauchy distributions

Posted on 9 August 2008 by John

Two weeks ago I wrote a series of posts on random inequalities: part I, part II, part III. In the process of writing these, I found an error in a tech report I wrote five years ago. I’ve posted a corrected version and describe the changes here.

Suppose X₁ is a Cauchy random variable with median m₁ and scale s₁ and similarly for X₂. Then X₁ − X₂ is a Cauchy random variable with median m₁ – m₂ and scale s₁ + s₂. Then P(X₁ > X₂) equals

P(X₁ − X₂ > 0) = P(m₁ − m₂ + (s₁ + s₂) C > 0)

where C is a Cauchy random variable with median 0 and scale 1. This reduces to

P(C < (m₁ − m₂)/(s₁ + s₂)) = 1/2 + atan( (m₁ − m₂)/(s₁ + s₂) )/π.

The original version was missing the factor of 1/2. This is obviously wrong because it would say that P(X₁ > X₂) is negative when m₁ < m₂.

By the way, I was told in college that the Cauchy distribution is an impractical curiosity, something more useful for developing counterexamples than modeling real phenomena. That was an overstatement. Thick-tailed distributions like the Cauchy often arise in applications, sometimes directly (see Noise, The Black Swan) or indirectly (for example, robust or default prior distributions).

Update: See part V on beta distributions.

Random inequalities III: numerical results

Posted on 26 July 2008 by John

The first post in this series introduced random inequalities. The second post discussed random inequalities can could be computed in closed form. This post considers random inequalities that must be evaluated numerically.

The simplest and most obvious approach to computing P(X > Y) is to generate random samples from X and Y and count how often the sample from X is bigger than the sample from Y. However, this approach is only accurate on average. In any particular sample, the error may be large. Even if you are willing to tolerate, for example, a 5% chance of being outside your error tolerance, the required number of random draws may be large. The more accuracy you need, the worse it gets since your accuracy only improves as n^−1/2 where n is the number of samples. For numerical integration methods, the error decreases as a much larger negative power of n depending on the method. In the timing study near the end of this technical report, numerical integration was 2,875 times faster than simulation for determining beta inequality probabilities to just two decimal places. For greater desired accuracy, the advantage of numerical integration would increase.

Why is efficiency important? For some applications it may not be, but often these random inequalities are evaluated in the inner loop of a simulation. A simulation study that takes hours to run may be spending most of its time evaluating random inequalities.

As derived in the previous post, the integral to evaluate is given by

$\begin{eqnarray*} P(X \geq Y) &=& \int \!\int _{[x > y]} f_X(x) f_Y(y) \, dy \, dx \\ &=& \int_{-\infty}^\infty \! \int_{-\infty}^x f_X(x) f_Y(y) \, dy \, dx \\ &=& \int_{-\infty}^\infty f_X(x) F_Y(x) \, dx end{eqnarray*}$

The technical report Numerical computation of stochastic inequality probabilities gives some general considerations for numerically evaluating random inequalities, but each specific distribution family must be considered individually since each may present unique numerical challenges. The report gives some specific techniques for beta and Weibull distributions. The report Numerical evaluation of gamma inequalities gives some techniques for gamma distributions.

Numerical integration can be complicated by singularities in integrand for extreme values of distribution parameters. For example, if X ~ beta(a, b) the integral for computing P(X > Y) has a singularity if either a or b are less than 1. For general advice on numerically integrating singular integrands, see “The care and treatment of singularities” in Numerical Methods that Work by Forman S. Acton.

Once you come up with a clever numerical algorithm for evaluating a random inequality, you need to have a way to test the results.

One way to test numerical algorithms for random inequalities is to compare simulation results to integration results. This will find blatant errors, but it may not be as effective in uncovering accuracy problems. It helps to use random parameters for testing.

Another way to test numerical algorithms in this context is to compute both P(X > Y) and P(Y > X). For continuous random variables, these two values should add to 1. Of course such a condition is not sufficient to guarantee accuracy, but it is a simple and effective test. The Inequality Calculator software reports both P(X > Y) and P(Y > X) partly for the convenience of the user but also as a form of quality control: if the two probabilities do not add up to approximately 1, you know something has gone wrong. Along these lines, it’s useful to verify that P(X > Y) = 1/2 if X and Y are identically distributed.

Finally, integrands may have closed-form solutions for special parameters. For example, Exact calculation of beta inequalities gives closed-form results for many special cases of the parameters. As long as the algorithm being tested does not depend on these special values, these special values provide valuable test cases.