Integration trick

Here’s a clever example from Paul Nahin’s new book Inside Interesting Integrals. Suppose you want to evaluate

\int_{-1}^1 \frac{\cos(x)}{\exp(1/x) + 1}\,dx

Since the range of integration is symmetric around zero, you might think to see whether the integrand is an odd function, in which case the integral would be zero. (More on such symmetry tricks here.) Unfortunately, the integrand is not odd, so that trick doesn’t work directly. However, it does help indirectly.

You can split any function f(x) into its even and odd parts.

f_e(x) = \frac{f(x) + f(-x)}{2} \\ f_o(x) = \frac{f(x) - f(-x)}{2}

The integral of a function over a symmetric interval is the integral of its even part because its odd part integrates to zero. The even part of the integrand above works out to be simply cos(x)/2 and so the integral evaluates to sin(1).

How Lp spaces nest

A function f(x) is said to be in Lp if the integral of |f(x)|p is finite. So if you know f is in Lp for some value of p, can conclude that f is also in Lq for some q? In general, no. But for two important special cases, yes. The special cases have to do with the domain of f.

If f is a function on a finite measure space and p > q, then every function in Lp is in Lq. In particular probability spaces are finite measure spaces, and so if a random variable’s pth moment exists, the qth moment exists.

If the domain of f is the integers with counting measure, f is a sequence and the “integral” of |f(x)|p is simply a sum. Now the relationship between Lp spaces is reversed: p > q, every sequence in Lq is in Lp.

There are two reasons a function could fail to be in Lp: singularities and fat tails. In a finite measure space, there are no tails; you only have to worry about singularities. On the integers, there are no singularities; you only have to worry about tails.

Now consider functions on the whole real line. Membership in Lp says nothing about membership in Lq because now you can have singularities and fat tails.

Avoiding underflow in Bayesian computations

Here’s a common problem that arises in Bayesian computation. Everything works just fine until you have more data than you’ve seen before. Then suddenly you start getting infinite, NaN, or otherwise strange results. This post explains what might be wrong and how to fix it.

A posterior density is (proportional to) a likelihood function times a prior distribution. The likelihood function is a product. The number of data points is the number of terms in the product. If these numbers are less than 1, and you multiply enough of them together, the result will be too small to represent in a floating point number and your calculation will underflow to zero. Then subsequent operations with this number, such as dividing it by another number that has also underflowed to zero, may produce an infinite or NaN result.

The instinctive reaction regarding underflow or overflow is to use logs. And often that works. If you wanted to know where the maximum of the posterior density occurs, you could find the maximum of the logarithm of the posterior density. But in Bayesian computations you often need to integrate the posterior density times some functions. Now you cannot just work with logs because logs and integrals don’t commute.

One way around the problem is to multiply the integrand by a constant so large that there is no danger of underflow. Multiplying by constants does commute with integration.

So suppose your integrand is on the order of 10^-400, far below the smallest representable number. Do you need extended precision arithmetic? No, you just need to understand your problem.

If you multiply your integrand by 10^400 before integrating, then your integrand is roughly 1 in magnitude. Then do you integration, and remember that the result is 10^400 times the actual value.

You could take the following steps.

  1. Find m, the maximum of the log of the integrand.
  2. Let I be the integral of exp( log of the integrand – m ).
  3. Keep track that your actual integral is exp(m) I, or that its log is m + log I.

Note that you can be extremely sloppy in this process. You don’t need an accurate estimate of the maximum of the integrand per se. If you’re within a few dozen orders of magnitude, for example, that could be sufficient to carry out your integration without underflow.

One way to estimate the maximum is to use a frequentist estimator of your parameters as an approximate MLE, and assume that this is approximately where your posterior density takes on its maximum value. This might actually be very accurate, but it doesn’t need to be.

Note also that it’s OK if some of the evaluations of your integrand underflow to zero. You just don’t want the entire integral to underflow to zero. Places where the integrand is many orders of magnitude less than its maximum don’t contribute to the integration anyway. (It’s important that your integration pays attention to the region where the integrand is largest. A naive integration could entirely miss the most important region and completely underestimate the integral. But that’s a matter for another blog post.)


Consulting in Bayesian analysis

Fractional integration

Define the integration operator I by

(I,f)(x) = int_a^x f(t), dt

so I f is an antiderivative of f.

Define the second antiderivative I2 by applying I to f twice:

(I_2, f)(x) = int_a^x int_a^t f(t_1), dt_1, dt

It turns out that

(I_2, f)(x) = int_a^x (x - t) , f(t), dt

To see this, notice that both expressions for I2 are equal when x = a, and they have the same derivative, so they must be equal everywhere.

Now define I3 by applying I to I2 and so forth defining In for all positive integers n. Then one can show that the nth antiderivative is given by

(I_n, f)(x) = frac{1}{Gamma(n)} int_a^x (x - t)^{n-1} , f(t), dt

The right side of this equation makes sense for non-integer values of n, and so we can take it as a definition of fractional integrals when n is not an integer.

Related post: Fractional derivatives

Numerical integration trick

Suppose you want to evaluate the following integral:

int_{30}^infty \exp\left( -frac{100}{x+6} \right) , frac{1}{x^3} , \dx

We’d like to do a change of variables to make the range of integration finite, and we’d like the transformed integral to be easy to evaluate numerically.

The change of variables t = 1/x2 transforms the range of integration from (30, ∞) to (0, 1/900). I’ll explain where this choice came from and why it results in a nice function to integrate.

The integrand is essentially 1/x3 for large x because the exponential term approaches 1. If the integrand were exactly 1/x3 then the change of variables t = 1/x2 would make the integrand constant. So we’d expect this change of variables to give us an integrand that behaves well. In particular, the integral should be bounded at 0.

(If you’re not careful, a change of variables may just swap one kind of improper integral for another. That is, you can make the domain of integration finite, but at the cost of introducing a singularity in the transformed integrand. However, we don’t have that problem here because our change of variables was just right.)

In general, if you need to integrate a function that behaves like 1/xn as x goes to infinity, try the change of variables t = 1/xn-1.

Related post: The care and treatment of singularities

SciPy integration misunderstanding

Today I needed to compute an integral similar to this:

int_{1000}^infty frac{dx}{100, x^3}

I used the following SciPy code to compute the integral:

from scipy.integrate import quad

def f(x):
    return 0.01*x**-3

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6)
print integral, error

My intention was to compute the integral to 6 significant figures. (epsrel is a shortened form of epsilon relative, i.e. relative error.) To my surprise, the estimated error was larger than the value of the integral. Specifically, the integral was computed as 5.15 × 10-9 and the error estimate was 9.07 × 10-9.

What went wrong? The integration routine quad lets you specify either a desired bound on your absolute error (epsabs) or a desired bound on your relative error (epsrel). I assumed that since I specified the relative error, the integration would stop when the relative error requirement was met. But that’s not how it works.

The quad function has default values for both epsabs and epsrel.

def quad(... epsabs=1.49e-8, epsrel=1.49e-8, ...):

I thought that since I did not specify an absolute error bound, the bound was not effective, or equivalently, that the absolute error target was 0. But no! It was as if I’d set the absolute error bound to 1.49 × 10-8. Because my integral is small (the exact value is 5 × 10-9) the absolute error requirement is satisfied before the relative error requirement and so the integration stops too soon.

The solution is to specify an absolute error target of zero. This condition cannot be satisfied, and so the relative error target will determine when the integration stops.

integral, error = quad(f, 1000, sp.inf, epsrel = 1e-6, epsabs = 0)

This correctly computes the integral as 5 × 10-9 and estimates the integration error as 4 ×10-18.

It makes some sense for quad to specify non-zero default values for both absolute and relative error, though I imagine most users expect small relative error rather than small absolute error, so perhaps the latter could be set to 0 by default.

Care and treatment of singularities

My favorite numerical analysis book is Numerical Methods that Work. In the hardcover version of the book, the title was impressed in the cover and colored in silver letters. Before the last word of the title, there was another word impressed but not colored in. You had to look at the cover carefully to see that it actually says “Numerical methods that usually work.”

First published in 1970, the book’s specific references to computing power are amusingly dated. But book’s advice is timeless.

The chapter entitled “The care and treatment of singularities” gives several approaches for integrating functions that have a singularity. This post will elaborate on the simplest example from that chapter.

Suppose you want to compute the following integral.

int_0^1 frac{dx}{sqrt{sin(x)}}

The integrand is infinite at 0. There are a couple common hacks to get around this problem, and neither works well. One is simply to re-define the integrand so that it has some finite value at 0. That keeps the integration program from complaining about a division by zero, but it doesn’t help evaluate the integral accurately.

Another hack is to change the lower limit of integration from 0 to something a little larger. OK, but how much larger? And even if you replace 0 with something small, you still have the problem that your integrand is nearly singular near 0. Simple integration routines assume integrands are polynomial-like, and this integrand is not polynomial-like near zero.

The way to compute this integral is to subtract off the singularity. Since sin(x) is asymptotically equal to x as x goes to 0, √sin(x) is asymptotically √x. So if we subtract 1/√x, we’re left with a bounded integrand, and one that is considerably more polynomial-like than the one we started with. We then integrate by hand what we subtracted off. That is, we replace our original integral with the following pair of integrals.

int_0^1 left( frac{1}{sqrt{sin(x)}} - frac{1}{sqrt{x}} right) ,dx + int_0^1 frac{dx}{sqrt{x}}

Compute the first numerically and the second analytically. Simpson’s rule with intervals of size 1/64 gives 0.03480535 for the first integral, which is correct to seven decimal places. The second integral is simply 2, so the total is 2.03480535.

Now lets look back at how our two hacks would have worked. Suppose we define our integrand f(x) to be 0 at x = 0. Then applying Simpson’s rule with step size 1/64 would give an integral of 3.8775, which is completely wrong. Bringing the step size down to 2^-20 makes things worse: the integral would then be 4.036. What if instead of defining the integrand to be 0 at 0, we define it to be some large value, say 10^6? In that case Simpson’s rule with step size 1/64 returns 5212.2 as the integral. Lowering the step size to 2^-20 improves the integral to 4.351, but still every figure is wrong since the integral is approximately 2.

What if we’d replaced 0 with 0.000001 and used Simpson’s rule on the original integral? That way we’d avoid the problem of having to define the integrand at 0. Then we’d have two sources of error. First the error from not integrating from 0 to 0.000001. That error is 0.002, which is perhaps larger than you’d expect: the change to the integral is 2000x larger than the change to the lower limit of integration. But this problem is insignificant compared to the next.

The more serious problem is applying Simpson’s rule to the integral even if we avoid 0 as the lower limit of integration. We still have a vertical asymptote, even if our limits of integration avoid the very singularity, and no polynomial ever had a vertical asymptote. With such decidedly non-polynomial behavior, we’d expect Simpsons rule and its ilk to perform poorly.

Indeed that’s what happens. With step size 1/64, Simpson’s rule gives a value of 9.086 for the integral, which is completely wrong since we know the integral is roughly 2.0348. Even with step size 2^-20, i.e. over a million function evaluations, Simpson’s rule gives 4.0328 for the integral.

Incidentally, if we had simply approximated our integrand 1/√sin(x) with 1/√x, we would have estimated the integral as 2.0 and been correct to two significant figures, while Simpson’s rule applied naively gets no correct figures. On the other hand, Simpson’s rule applied cleverly (i.e. by subtracting off the singularity) gives eight correct figures with little effort. As I concluded here, a crude method cleverly applied beats a clever technique crudely applied.

Related posts:

Saved by symmetry

When I solve a problem by appealing to symmetry, students’ jaws drop. They look at me as if I’d pulled a rabbit out of a hat.

I used think of these tricks as common knowledge, but now I think they’re common knowledge in some circles (e.g. physics) and not as common in others. These tricks are simple, but not as many people as I’d thought have been trained to spot opportunities to apply them.

Here’s an example.

Pop quiz 1: Evaluate the following intimidating integral.

int_{-infty}^infty x log(1 + x^2) e^{-x^2}, dx

Solution: Zero, by symmetry, because the integrand is odd.

The integrand is an odd function (i.e. f(-x) = -f(x)), and the integrand of an odd function over a symmetric interval is zero. This is because the region below the x-axis is symmetric to the region above the x-axis as the following graph shows.

plot of x log(1 + x^2) exp(-x^2)

The example above is not artificial: similar calculations come up constantly in applications. For example, the Fourier series of an odd function contains only sine terms, no cosine terms. Why? The integrals to compute the Fourier coefficients for the cosine terms all involve odd functions integrated over an interval symmetric about zero.

Another common application of symmetry is evaluating derivatives. The derivative of an odd function is an even function and vice versa.

Pop quiz 2: What is the coefficient of x5 in the Taylor series of cos(1 + x2)?

Solution: Zero, by symmetry, because cos(1 + x2) is an even function.

Odd functions of x have only odd powers of x in their Taylor series and even functions have only even powers of x in their Taylor series. Why? Because the coefficients come from derivatives evaluated at zero.

If f(x) is an odd function, all of its even derivatives are odd functions. These derivatives are zero at x = 0, and so all the coefficients of even powers of x are zero. A similar argument shows that even functions have only even powers of x in their Taylor series.

Symmetry tricks are obvious in hindsight. The hard part is learning to recognize when they apply. Symmetries are harder to recognize, but also more valuable, in complex situations. The key is to think about the problem you’re trying to solve before you dive into heads-down calculation.

Related posts:

Three surprises with the trapezoid rule

The trapezoid rule is a very simple method for estimating integrals. The idea is to approximate the area under a curve by a bunch of thin trapezoids and add up the areas of the trapezoids as suggested in the image below.

This is an old idea, probably older than the formal definition of an integral. In general it gives a crude estimation of the integral. If the width of the trapezoids is h, the error in using the trapezoid rule is roughly proportional to h2. It’s easier to do better. For example, Simpson’s rule is a minor variation on the trapezoid rule that has error proportional to h5.

So if the trapezoid rule is old and inaccurate, why does anyone care about it? Here are the surprises.

  1. You can still get a publication out of the trapezoid rule! In 1994, a doctor published a paper reinventing the trapezoid rule. Not only did the editors not recognize this ancient algorithm, the paper has been cited many times since it was published. (Update: more about the trapezoid paper here.)
  2. Although the trapezoid rule is inefficient in general, it can be shockingly efficient for periodic functions.
  3. The trapezoid rule can also be shockingly efficient for analytic functions that go to zero quickly, so called double exponential functions.

The last two observations are more widely applicable than you might think at first. What if you want to integrate something that isn’t periodic and isn’t a double exponential function? You may be able to do a change of variables that makes your integrand have one of these special forms. The article Fast Numerical Integration explains an integration method based on double exponential functions and includes C++ source code.

The potential efficiency of the trapezoid rule illustrates a general principle: a crude method cleverly applied can beat a clever technique crudely applied. The simplest numerical integration technique, one commonly taught in freshman calculus, can be extraordinarily efficient when applied with skill to the right problem. Conversely, a more sophisticated integration technique such as Gauss quadrature can fail miserably when naively applied.

Related posts:

Optical illusion, mathematical illusion

Someone sent me a link to an optical illusion while I was working on a math problem. The two things turned out to be related.

In the image below, what look like blues spiral and green spirals are actually exactly the same color. The spiral that looks blue is actually green inside, but the magenta stripes across it make the green look blue. I know you don’t believe me; I didn’t believe it either. See this blog post for an explanation, including a magnified close-up of the image. Or open it in an image editor and use a color selector to see for yourself.

My math problem was also a case of two things that look different even though they are not. Maybe you can think back to a time as a student when you knew your answer was correct even though it didn’t match the answer in the back of the book. The two answers were equivalent but written differently. In an algebra class you might answer 5 / √ 3 when the book has 5 √ 3 / 3. In trig class you might answer 1 – cos2x when the book has sin2x. In a differential equations class, equivalent answers may look very different since arbitrary constants can obfuscate differences.

In my particular problem, I was looking at weights for Gauss-Hermite integration. I was trying to reconcile two different expressions for the weights, one in some software I’d written years ago and one given in A&S. I thought I’d found a bug, at least in my comments if not in my software. My confusion was analogous to not recognizing a trig identity.  I wish I could say that the optical illusion link made me think that the two expressions may be the same and they just look different because of a mathematical illusion. That would make a good story. Instead, I discovered the equivalence of the two expressions by brute force, having Mathematica print out the values so I could compare them. Only later did I see the analogy between my problem and the optical illusion.

In case you’re interested in the details, my problem boiled down to the equivalence between Hn+1(xi)2 and 4n2Hn-1(xi)2 where Hn(x) is the nth Hermite polynomial and xi is the ith root of Hn. Here’s why these are the same. The Hermite polynomials satisfy a recurrence relation Hn+1(x) = 2x Hn(x) – 2n Hn-1(x) for all x. Since Hn(xi) = 0, Hn+1(xi) = -2nHn-1(xi). Now square both sides.

Related post: Orthogonal polynomials

Quasi-random sequences in art and integration

Sometimes when people say they want random points, that’s not what they really want. Random points clump more than most people expect. Quasi-random sequences are not random in any mathematical sense, but they might match popular expectations of randomness better than the real thing, especially for aesthetic applications. If by “random” someone means “scattered around without a clear pattern and not clumped together” then quasi-random sequences might do the trick.

Here are the first 50 points in a quasi-random sequence of points in the unit square.

quasi-random points

By contrast, here are 50 points in a unit square whose coordinates are uniform random samples.

random points

The truly random points clump together. Notice the cluster of three points in the top right corner. There are few other instances of pairs of points being very close together. Also, there are fairly large areas that don’t contain any random points. The quasi-random points by contrast are better spread out. They have a self-avoiding property that keeps them from clustering, and they fill the space more efficiently.

Quasi-random sequences could be confused with pseudo-random sequences. They’re not at all the same thing. Pseudo-random sequences are computer-generated sequences that in many ways behave as if they were truly random, even though they were produced by deterministic algorithms. For many practical purposes, including sensitive statistical tests, pseudo-random sequences are simply random sequences. (The “truly” random points above were technically “pseudo-random” points.)

The quasi-random points above were part of a Sobol sequence, a common quasi-random sequence. Other quasi-random sequences include the Halton sequence and the Hammersley sequence. Mathematically, these sequences are defined has having low-discrepancy. Roughly speaking, this means the “discrepancy” between the number of points that actually fall in a volume and the number of points you’d expect to fall in the same volume is small. See the Wikipedia article on quasi-random sequences for more mathematical details.

Besides being aesthetically useful, quasi-random sequences are useful in applied mathematics. Because these sequences explore a space more efficiently than random sequences, quasi-random sequences sometimes lead to more efficient high-dimensional integration algorithms than Monte Carlo integration. Quasi-Monte Carlo integration, i.e. integration based on quasi-random sequences rather than random sequences, is popular in financial applications. Art Owen has written insightful papers on Quasi-Monte Carlo integration (QMC). He has classified which integration problems can be efficiently computed via QMC methods and which cannot. In a nutshell, QMC works well when the effective dimension of a problem is significantly lower than the actual dimension. For example, a financial model might ostensibly depend on 1000 variables, but 50 of those variables contribute far more to the integrand than all the other variables. The integrand might essentially be a function of only 50 variables. In that case, QMC will work well. Note that it is not necessary to identify these 50 variables or do any change of variables. QMC just magically takes advantage of the situation.

One disadvantage of QMC integration is that it doesn’t naturally lead to an estimate of its own accuracy, unlike Monte Carlo integration. Several hybrid approaches have been proposed to combine QMC integration and Monte Carlo integration to get the efficiency of the former and the error estimates of the latter. For example, one could randomly jitter the quasi-random points or randomly permute their components. Some of these results are in Art Owen’s papers.

To read more about quasi-random sequences, see the book Random Number Generation and Quasi-Monte Carlo Methods.

Randomization and random number generation consulting

Numerical integration article posted

This weekend CodeProject posted an article I wrote entitled Fast numerical integration.

The algorithm in the article, introduced in the 1970’s by Masatake Mori and Hidetosi Takahasi, is indeed fast. It integrates analytical functions over bounded intervals with the most accuracy for a fixed number of integration points.

The CodeProject article includes source code and is mostly about the code. See this page for mathematical details.

Random inequalities III: numerical results

The first post in this series introduced random inequalities. The second post discussed random inequalities can could be computed in closed form. This post considers random inequalities that must be evaluated numerically.

The simplest and most obvious approach to computing P(X > Y) is to generate random samples from X and Y and count how often the sample from X is bigger than the sample from Y. However, this approach is only accurate on average. In any particular sample, the error may be large. Even if you are willing to tolerate, for example, a 5% chance of being outside your error tolerance, the required number of random draws may be large. The more accuracy you need, the worse it gets since your accuracy only improves as n-1/2 where n is the number of samples. For numerical integration methods, the error decreases as a much larger negative power of n depending on the method. In the timing study near the end of this technical report, numerical integration was 2,875 times faster than simulation for determining beta inequality probabilities to just two decimal places. For greater desired accuracy, the advantage of numerical integration would increase.

Why is efficiency important? For some applications it may not be, but often these random inequalities are evaluated in the inner loop of a simulation. A simulation study that takes hours to run may be spending most of its time evaluating random inequalities.

As derived in the previous post, the integral to evaluate is given by

begin{eqnarray*} P(X geq Y) &=& int !int _{[x > y]} f_X(x) f_Y(y) , dy , dx \ &=& int_{-infty}^infty ! int_{-infty}^x f_X(x) f_Y(y) , dy , dx \ &=& int_{-infty}^infty f_X(x) F_Y(x) , dx end{eqnarray*}

The technical report Numerical computation of stochastic inequality probabilities gives some general considerations for numerically evaluating random inequalities, but each specific distribution family must be considered individually since each may present unique numerical challenges. The report gives some specific techniques for beta and Weibull distributions.  The report Numerical evaluation of gamma inequalities gives some techniques for gamma distributions.

Numerical integration can be complicated by singularities in integrand for extreme values of distribution parameters. For example, if X ~ beta(a, b) the integral for computing P(X > Y) has a singularity if either a or b are less than 1. For general advice on numerically integrating singular integrands, see “The care and treatment of singularities” in Numerical Methods that Work by Forman S. Acton.

Once you come up with a clever numerical algorithm for evaluating a random inequality, you need to have a way to test the results.

One way to test numerical algorithms for random inequalities is to compare simulation results to integration results. This will find blatant errors, but it may not be as effective in uncovering accuracy problems. It helps to use random parameters for testing.

Another way to test numerical algorithms in this context is to compute both P(X > Y) and P(Y > X). For continuous random variables, these two values should add to 1. Of course such a condition is not sufficient to guarantee accuracy, but it is a simple and effective test. The Inequality Calculator software reports both P(X > Y) and P(Y > X) partly for the convenience of the user but also as a form of quality control: if the two probabilities do not add up to approximately 1, you know something has gone wrong. Along these lines, it’s useful to verify that P(X > Y) = 1/2 if X and Y are identically distributed.

Finally, integrands may have closed-form solutions for special parameters. For example, Exact calculation of beta inequalities gives closed-form results for many special cases of the parameters. As long as the algorithm being tested does not depend on these special values, these special values provide valuable test cases.

Random inequalities II: analytical results

My previous post introduced random inequalities and their application to Bayesian clinical trials. This post will discuss how to evaluate random inequalities analytically. The next post in the series will discuss numerical evaluation when analytical evaluation is not possible.

For independent random variables X and Y, how would you compute P(X>Y), the probability that a sample from X will be larger than a sample from Y? Let fX be the probability density function (PDF) of X and let FX be the cumulative distribution function (CDF) of X. Define fY and FY similarly. Then the probability P(X > Y) is the integral of fX(x) fY(y) over the part of the x-y plane below the diagonal line x = y.

begin{eqnarray*} P(X geq Y) &=& int !int _{[x > y]} f_X(x) f_Y(y) , dy , dx \ &=& int_{-infty}^infty ! int_{-infty}^x f_X(x) f_Y(y) , dy , dx \ &=& int_{-infty}^infty f_X(x) F_Y(x) , dx end{eqnarray*}

This result makes intuitive sense: fX(x) is the density for x and FY(x)  is the probability that Y is less than x. Sometimes this integral can be evaluated analytically, though in general it must be evaluated numerically. The technical report Numerical computation of stochastic inequality probabilities explains how P(X > Y) can be computed in closed form for several common distribution families as well as how to evaluate inequalities involving other distributions numerically.

Exponential: If X and Y are exponential random variables with mean μX and μY respectively, then P(X > Y) = μX/(μX + μY).

Normal: If X and Y are normal random variables with mean and standard deviation (μX, σX) and (μY, σY) respectively, then P(X > Y) = Φ((μX – μY)/√(σX2 + σY2)) where Φ is the CDF of a standard normal distribution.

Gamma:  If X and Y are gamma random variables with shape and scale (αX, βX) and (αY, βY) respectively, then P(X > Y) = IxX/(βX + βY)) where Ix is the incomplete beta function with parameters αY and αX, i.e. the CDF of a beta distribution with parameters αY and αX.

The inequality P(X > Y) where X and Y are beta random variables comes up very often in applications. This inequality cannot be computed in closed form in general, though there are closed-form solutions for special values of the beta parameters. If X ~ beta(a, b) and Y ~ beta(c, d), the probability P(X > Y) can be evaluated in closed form if

  1. one of the parameters a, b, c, or d is an integer,
  2. a + b + c + d = 1, or
  3. a + b = c + d = 1.

These last two cases can be combined with a recurrence relation to compute other probabilities. See Exact calculation of beta inequalities for more details.

Sometimes you need to calculate P(X > max(Y, Z)) for three independent random variables. This comes up, for example, when computing adaptive randomization probabilities for a three-arm clinical trial. For a time-to-event trial as implemented here, the random variables have a gamma distribution. See Numerical evaluation of gamma inequalities for analytical as well as numerical results for computing P(X > max(Y, Z)) in that case.

The next post in this series will discuss how to evaluate random inequalities numerically when closed-form integration is not possible.

Update: See Part IV of this series for results with the Cauchy distribution.

Diagramming modes of convergence

Sometimes a good diagram is a godsend, reducing the entropy in your head at a glance.

When I was studying integration theory, I ran across a diagram something like the following in the out-of-print book Elements of Integration by Robert G. Bartle.

convergence for a general measure space

Each arrow summarizes a theorem. The four corners stand for modes of convergence: almost everywhere, almost uniform, Lp, and convergence in measure. A solid line from one mode to another means that if a sequence of function converges in the first mode then it also converges in the other. A dashed line means that a subsequence converges in the second mode. For example, if a sequence of functions fn converge to a function f, then the sequence also converges in measure to f, and a subsequence converges to f almost everywhere.

Note that this isn’t the typical mathematical diagram where the corners represent spaces and the arrows represent functions. I don’t recall seeing anything like this outside of Bartle’s book.

For a finite measure space, additional relationships hold as summarized in the diagram below.

convergence for a finite measure space

If the sequence fn is uniformly dominated by an Lp function g, then even more relationships hold.

dominated convergence

For more details, see modes of convergence. These notes also contain counterexamples that show no other relationships exist. Three simple counterexamples suffice to show no arrows are missing.