# Gamma distribution tail probability approximation

This post approximates the tail probability of a gamma random variable using the 1/e heuristic from the previous post.

## The gamma distribution

Start with the integral defining Γ(a). Divide the integrand by Γ(a) so that it integrates to 1. This makes the integrand into a probability density, and the resulting probability distribution is called the gamma distribution with shape a. To evaluate the probability of this distribution’s tail, we need to integrate

P(X > x) = ∫_x^∞ t^(a-1) exp(-t)/Γ(a) dt.

This distribution has mean and variance a, and mode a-1.

We’re interested in approximating the integral above for moderately large x relative to a.

## Approximation

To apply the 1/e heuristic we need to find a value of t such that

t^(a-1) exp(-t) = x^(a-1) exp(-x)/e = x^(a-1) exp(-x - 1).

This equation would have to be solved numerically, but let’s try something very simple first and see how it works. Let’s assume the t^(a-1) term doesn’t change as much as the exp(-t) term between x and x + 1. Under that assumption, an approximate solution to our equation for t is simply

t = x + 1

and so the base of our rectangle runs from x to x + 1, i.e. has width 1. So the area of our rectangle is simply its height. This gives us

P(X > x) ≈ x^(a-1) exp(-x)/Γ(a).

In other words, the tail probability (the complementary CDF) is approximately the PDF. That’s all nice and simple, but how good is it? Here’s a plot for a = 5.

So is this good or not? It’s not a question of whether the approximation is good but of where it’s good. It’s a poor approximation for some parameters but a very good approximation for others. The approximation improves when a is closer to 1 and when x is larger.

We were a bit sloppy in solving for t above. Would it help if we had a more accurate value of t? No, that doesn’t make much difference, and it actually makes things a little worse. The important issue is to identify over what range of a and x the approximation works well.
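As a quick numerical check (my own code, not from the original post), we can compare the rectangle approximation x^(a-1) exp(-x)/Γ(a) against the exact tail probability from SciPy for a = 5 and a few values of x:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

a = 5
xs = np.array([10.0, 20.0, 40.0])

# height of the rectangle: the PDF at x
approx = xs**(a - 1) * np.exp(-xs) / gamma_fn(a)

# exact tail probability (survival function)
exact = gamma.sf(xs, a)

# relative error of the approximation
rel_err = exact / approx - 1
```

The relative error shrinks as x grows relative to a, consistent with the plot discussed above.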

## Asymptotic series

It turns out that our crude integration estimate happens to correspond to the first term in an asymptotic series for our integral. If X is a gamma random variable with shape a then we have the following asymptotic series for the tail probability.

P(X > x) ~ x^(a-1) exp(-x)/Γ(a) (1 + (a-1)/x + (a-1)(a-2)/x² + ...)

So if we truncate the portion inside the parentheses to just 1, we have the approximation above. The approximation error from truncating an asymptotic series is approximately the first term that was left out.

So if x is much larger than a, the approximation error is small. Since the expected value of X is a, the tail probabilities correspond to values of x larger than a anyway. But if x is only a little larger than a, the approximation above won’t be very good, and adding more terms in the series won’t help either.
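Here’s a sketch of that error estimate, using SciPy’s regularized upper incomplete gamma function gammaincc for the exact tail probability; the particular values a = 5 and x = 50 are my choice:

```python
import numpy as np
from scipy.special import gammaincc, gamma as gamma_fn

a, x = 5, 50

# first term of the asymptotic series: the rectangle approximation
approx = x**(a - 1) * np.exp(-x) / gamma_fn(a)

# exact tail probability: regularized upper incomplete gamma function
exact = gammaincc(a, x)

# the first omitted term of the series
first_omitted = approx * (a - 1) / x
```

The truncation error (exact minus approximation) is close to the first omitted term, as claimed.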

Here’s a contour plot of the absolute error:

The deep blue area below the diagonal of the graph shows the error is smallest when x > a. In 3D, the plot has a high plateau in the top left and a valley in the lower right, with a rapid transition between the two, suggested by the density of contour lines near the diagonal.

## Epilogue

In the previous post I said that the next post would give an example where the 1/e heuristic doesn’t work well. This post has done that. It’s also given an example where the heuristic works very well. Both are the same example but with different parameters. When x >> a the method works well, and the bigger x is relative to a, the better it works.

# Quasi-Monte Carlo integration: periodic and nonperiodic

Monte Carlo integration, or “integration by darts,” is a general method for evaluating high-dimensional integrals. Unfortunately it’s slow. This motivated the search for methods that are like Monte Carlo integration but that converge faster.

Quasi-Monte Carlo (QMC) methods use low discrepancy sequences that explore a space more efficiently than random samples. These sequences are deterministic but in some ways behave like random sequences, hence the name.

On the right problem, QMC integration can be very efficient. On other problems QMC can be mediocre. This post will give an example of each.

## Computing ζ(3)

The first example comes from computing the sum

1 + 1/2³ + 1/3³ + 1/4³ + ...

which equals ζ(3), the Riemann zeta function evaluated at 3.

A few days ago I wrote about ways to compute ζ(3). The most direct way of computing the sum is slow. There are sophisticated ways of computing the sum that are much faster.

There is an integral representation of ζ(3): the integral of 1/(1 + xyz) over the unit cube works out to (3/4)ζ(3). So one way to numerically compute ζ(3) would be to numerically compute this integral.

Let’s see how well quasi-Monte Carlo integration works on this integral. We will use a lattice rule, evaluating our integrand at the points of the quasi-Monte Carlo sequence

x_n = ({n√2}, {n√3}, {n√5})

and averaging the results. (Lattice sequences are a special case of quasi-Monte Carlo sequences.)

The notation {x} denotes the fractional part of x. For example, {π} = 0.14159….

Here’s some Python code to compute the integral.

    import numpy as np

    def f(x): return 1/(1 + x[0]*x[1]*x[2])

    v = np.sqrt(np.array([2, 3, 5]))

    def g(N): return sum( f(n*v % 1) for n in range(1, N+1) ) / N


And here’s a plot that shows how well this does as a function of N.

The results are mediocre. The averages are converging to the right value as N increases, but the accuracy isn’t great. When N = 100 the error is 0.001, larger than the error from summing 100 terms of the sum defining ζ(3).

First, this doesn’t mean that evaluating ζ(3) by computing an integral is necessarily a bad idea; a more efficient integration method might compute ζ(3) fairly efficiently.

Second, this doesn’t mean that QMC is a poor integration method, only that there are better ways to compute this particular integral. QMC is used a lot in practice, especially in financial applications.
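As a check on the numbers above, here’s a sketch (my code) that reruns the lattice rule and converts the average back to an estimate of ζ(3), assuming the identity that the integral of 1/(1 + xyz) over the unit cube equals (3/4)ζ(3):

```python
import numpy as np

def f(x):
    return 1/(1 + x[0]*x[1]*x[2])

v = np.sqrt(np.array([2, 3, 5]))

def g(N):
    # lattice rule: average f over the fractional parts of n*v
    return sum(f(n*v % 1) for n in range(1, N + 1)) / N

zeta3 = 1.2020569031595942
estimate = 4 * g(1000) / 3   # rescale by 4/3, assuming the (3/4) ζ(3) identity
```

The estimate lands near ζ(3), but only to a few decimal places, matching the mediocre accuracy described above.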

## Periodic integrands

The lattice rule above works better on periodic integrands. Of course in practice a lot of integrands are not periodic. But if you have an efficient way to evaluate periodic integrands, you can do a change of variables to turn your integration problem into one involving a periodic integrand.

We’ll use the same QMC sequence as above with the integrand

f(x, y, z) = x(1 - x) sin(πy)

which has integral 1/(3π) over the unit cube. Here’s a plot of the accuracy as the number of integration points N increases.

This is much better than before. The error is 0.0007 when N = 100. As N increases, the error goes down more rapidly than with the example above. When computing ζ(3), the error is about 0.001, whether N = 100 or 1,000. In this example, when N = 1,000 the error drops to 0.00003.

This example is periodic in x, y, and z, but the periodic extension is only continuous. The extension has a cusp when x or y take on integer values, and so the derivative doesn’t exist at those points.

If we square our integrand, setting

f(x, y, z) = x²(1 - x)² sin²(πy)

then the periodic extension is differentiable, and the integration method converges more quickly. The exact integral over the unit cube is 1/60.

Now our approximation error is about 7.3 × 10^-5 when N = 100, 7.8 × 10^-6 when N = 1,000, and 9.5 × 10^-8 when N = 10,000.
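Here’s a sketch (my code, reusing the lattice sequence above) that checks both periodic integrands at N = 1,000:

```python
import numpy as np

v = np.sqrt(np.array([2, 3, 5]))

def lattice_avg(f, N):
    # average f over the fractional parts of n*v
    return sum(f(*(n*v % 1)) for n in range(1, N + 1)) / N

f1 = lambda x, y, z: x*(1 - x)*np.sin(np.pi*y)               # integral 1/(3π)
f2 = lambda x, y, z: x**2 * (1 - x)**2 * np.sin(np.pi*y)**2  # integral 1/60

err1 = abs(lattice_avg(f1, 1000) - 1/(3*np.pi))
err2 = abs(lattice_avg(f2, 1000) - 1/60)
```

Both errors are far smaller than the error in the ζ(3) example at the same N.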

Lattice rules work best on smooth, periodic integrands. If the integrand is smooth but not periodic and you want to use a lattice rule for integration, you’ll get better results if you first do a change of variables to create a smooth, periodic integrand.

# Anything that starts with an integral sign

Ran across a fun quote this afternoon:

> There are in this world optimists who feel that any symbol that starts off with an integral sign must necessarily denote something that will have every property that they should like an integral to possess. This of course is quite annoying to us rigorous mathematicians; what is even more annoying is that by doing so they often come up with the right answer.

Bulletin of the American Mathematical Society, v. 69, p. 611, 1963.

As much as mathematicians like to joke about the gaps between pure and applied math, as in the quote above, the gap is not as wide as supposed. Things like delta “functions,” or differentiating non-differentiable functions, or summing divergent series have a rigorous theory behind them.

It is true that these things were used in practice before theory caught up, but the theory caught up a long time ago. There are still instances where hand-waving calculation is ahead of theory (or just wrong), and there always will be, but distribution theory and asymptotic analysis provided a rigorous justification for a lot of seemingly non-rigorous math decades ago. This is well-known to some mathematicians and unknown to others.

That said, you still have to be careful. Some formal calculations lead to false results. They don’t just lack rigorous theory; they’re simply wrong.

# Not quite going in circles

Sometimes you feel like you’re running around in circles, not making any progress, when you’re on the verge of a breakthrough. An illustration of this comes from integration by parts.

A common example in calculus classes is to integrate e^x sin(x) using integration by parts (IBP). After using IBP once, you get an integral similar to the one you started with. And if you apply IBP again, you get exactly the integral you started with.

At this point you believe all is lost. Apparently IBP isn’t going to work and you’ll need to try another approach.

But then the professor pulls a rabbit out of a hat. By moving the integral on the right side to the left, you can solve for the unknown integral in terms of the debris IBP left along the way.

So you weren’t going in circles after all. You were making progress when it didn’t feel like it.
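We can verify where the calculation ends up with SymPy; this is my own check, not part of the original post:

```python
import sympy as sp

x = sp.symbols('x')

# SymPy carries out the same calculation and solves for the integral,
# giving the antiderivative that the IBP argument produces
F = sp.integrate(sp.exp(x)*sp.sin(x), x)

# F should equal exp(x)*(sin(x) - cos(x))/2
```

Differentiating F recovers the original integrand, confirming the result.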

It’s not that gathering unknowns to one side is a new idea; you would have seen that countless times before you get to calculus. But that’s not how integration usually works. You typically start with the assigned integral and steadily chip away at it, progressing in a straight line toward the result. The trick is seeing that a simple idea from another context can be applied in this context.

In the calculation above we first let u = e^x and v′ = sin(x) on the left. Then when we come to the first integral on the right, we again set u = e^x and this time v′ = cos(x).

But suppose you come to the second integral and think “This is starting to look futile. Maybe I should try something different. This time I’ll let e^x be the v′ term rather than the u term.” In that case you really will run in circles. You’ll get exactly the same expression on both sides.

It’s hard to say in calculus, or in life in general, whether persistence or creativity is called for. But in this instance, persistence pays off. If you set u to the exponential term both times, you win; if you switch horses midstream, you lose.

Another way to look at the problem is that the winning strategy is to be persistent in your approach to IBP, but use your creativity at the end to realize that you’re nearly done, just when it may seem you need to start over.

If you’d like to read more about coming back where you started, see Coming full circle and then this example from signal processing.

# Approximating rapidly divergent integrals

A while back I ran across a paper [1] giving a trick for evaluating integrals of the form

I(M) = ∫_0^M exp(f(x)) dx

where M is large and f is an increasing function. For large M, the integral is asymptotically

A(M) = exp(f(M))/f′(M).

That is, the ratio of A(M) to I(M) goes to 1 as M goes to infinity.

This looks like a strange variation on Laplace’s approximation. And although Laplace’s method is often useful in practice, no applications of the approximation above come to mind. Any ideas? I have a vague feeling I could have used something like this before.

There is one more requirement on the function f. In addition to being an increasing function, it must also satisfy

In [1] the author gives several examples, including using f(x) = x². If we wanted to approximate

∫_0^100 exp(x²) dx,

the method above gives

exp(10000)/200 = 4.4034 × 10^4340

whereas the correct value to five significant figures is 4.4036 × 10^4340.

Even getting an estimate of the order of magnitude for such a large integral could be useful, and the approximation does better than that.
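Here’s a sketch of the approximation in code, using f(x) = x² but with M = 5 rather than M = 100, so that ordinary floating point arithmetic suffices:

```python
import numpy as np
from scipy.integrate import quad

M = 5

# A(M) = exp(f(M)) / f'(M) with f(x) = x², so f(M) = 25 and f'(M) = 10
approx = np.exp(M**2) / (2*M)

# "exact" value by adaptive quadrature
exact, _ = quad(lambda x: np.exp(x**2), 0, M)

ratio = approx / exact
```

Even at this modest M, the approximation is within a few percent of the true value, and the ratio approaches 1 as M grows.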

[1] Ira Rosenholtz. Estimating Large Integrals: The Bigger They Are, The Harder They Fall. The College Mathematics Journal, Vol. 32, No. 5 (Nov., 2001), pp. 322-329

# Curse of dimensionality and integration

The curse of dimensionality refers to problems whose difficulty increases exponentially with dimension. For example, suppose you want to estimate the integral of a function of one variable by evaluating it at 10 points. If you take the analogous approach to integrating a function of two variables, you need a grid of 100 points. For a function of three variables, you need a cube of 1000 points, and so forth.

You cannot estimate high-dimensional integrals this way. For a function of 100 variables, using a lattice with just two points in each direction would require 2^100 points.

There are much more efficient ways to approximate integrals than simply adding up values at grid points, assuming your integrand is smooth. But when applying any of these methods to multi-dimensional integrals, the computational effort increases exponentially with dimension.

The methods that first come to mind don’t scale well with dimension, but that doesn’t necessarily mean there aren’t any methods that do scale well.

Are there numerical integration methods whose computational requirements scale slower than exponentially with dimension? In general, no. You can beat the curse of dimensionality for special integrals. And if you’re content with probabilistic bounds rather than deterministic bounds, you can get around the curse by using Monte Carlo integration, sort of [1].
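Monte Carlo integration sidesteps the grid entirely. Here’s a sketch (the integrand, the average of the coordinates, is my choice) estimating a 100-dimensional integral from 10,000 random points:

```python
import numpy as np

rng = np.random.default_rng(20230601)
d, n = 100, 10_000

# n random points in the unit cube [0,1]^100
samples = rng.random((n, d))

# integrand: the average of the 100 coordinates;
# its exact integral over the unit cube is 1/2
estimate = samples.mean(axis=1).mean()
```

With 10,000 samples the estimate is already accurate to three decimal places or so, something a grid-based method could never achieve in 100 dimensions.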

If you want worst-case error bounds, you cannot escape the curse of dimensionality [2]. You can require that the functions you’re integrating have so many derivatives and that these derivatives are bounded by some constant, but the amount of computation necessary to guarantee that the error is below a specified threshold increases exponentially with dimension.


[1] In one sense Monte Carlo methods are independent of dimension. But in an important sense they scale poorly with dimension. See a resolution of this tension here.

[2] The curse of dimensionality for numerical integration of smooth functions. A. Hinrichs, E. Novak, M. Ullrich and H. Woźniakowski. Mathematics of Computation, Vol. 83, No. 290 (November 2014), pp. 2853-2863

# Trapezoid rule and Romberg integration

This post will look at two numerical integration methods, the trapezoid rule and Romberg’s algorithm, and memoization. This post is a continuation of ideas from the recent posts on Lobatto integration and memoization.

Although the trapezoid rule is not typically very accurate, it can be in special instances, and Romberg combined it with extrapolation to create a very accurate method.

## Trapezoid rule

The trapezoid rule is the simplest numerical integration method. The only thing that could be simpler is Riemann sums. By replacing the rectangles of Riemann sums with trapezoids, you can make the approximation error an order of magnitude smaller.

The trapezoid rule is crude, and hardly recommended for practical use, with two exceptions. It can be remarkably efficient for periodic functions and for analytic functions that decay double exponentially. The trapezoid rule works so well in these cases that it’s common to transform a general function so that it has one of these forms so the trapezoid rule can be applied.

To be clear, the trapezoid rule for a given step size h may not be very accurate. But for periodic and double exponential functions the error decreases exponentially as h decreases.

Here’s an implementation of the trapezoid rule that follows the derivation directly.

    def trapezoid1(f, a, b, n):
        integral = 0
        h = (b-a)/n
        for i in range(n):
            integral += 0.5*h*(f(a + i*h) + f(a + (i+1)*h))
        return integral


This code approximates the integral of f(x) over [a, b] by adding up the areas of n trapezoids. Although we want to keep things simple, a slight change would make this code twice as efficient.

    def trapezoid2(f, a, b, n):
        integral = 0.5*( f(a) + f(b) )
        h = (b-a)/n
        for i in range(1, n):
            integral += f(a + i*h)
        return h*integral


Now we’re not evaluating f twice at every interior point.

## Estimating error

Suppose you’ve used the trapezoid rule once, then you decide to use it again with half as large a step size in order to compare the results. If the results are the same within your tolerance, then presumably you have your result. Someone could create a function where this comparison would be misleading, where the two results agree but both are way off. But this is unlikely to happen in practice. As Einstein said, God is subtle but he is not malicious.

If you cut your step size h in half, you double your number of integration points. So if you evaluated your integrand at n points the first time, you’ll evaluate it at 2n points the second time. But half of these points are duplicates. It would be more efficient to save the function evaluations from the first integration and reuse them in the second integration, only evaluating your function at the n new integration points.

It would be most efficient to write your code to directly save previous results, but using memoization would be easier and still more efficient than redundantly evaluating your integrand. We’ll illustrate this with Python code.

Now let’s integrate exp(cos(x)) over [0, π] with 4 and then 8 steps.

    from numpy import exp, cos, pi

    def f(x): return exp(cos(x))

    print( trapezoid2(f, 0, pi, 4) )
    print( trapezoid2(f, 0, pi, 8) )


This prints

    3.97746388
    3.97746326


So this suggests we’ve already found our integral to six decimal places. Why so fast? Because we’re integrating a periodic function. If we repeat our experiment with exp(x) we see that we don’t even get one decimal place agreement. The code

    print( trapezoid2(exp, 0, pi, 4 ) )
    print( trapezoid2(exp, 0, pi, 8 ) )


prints

    23.26
    22.42


## Eliminating redundancy

The function trapezoid2 eliminated some redundancy, but we still have redundant function evaluations when we call this function twice as we do above. When we call trapezoid2 with n = 4, we do 5 function evaluations. When we call it again with n = 8 we do 9 function evaluations, 4 of which we’ve done before.

As we did in the Lobatto quadrature example, we will have our integrand function sleep for 10 seconds to make the function calls obvious, and we will use memoization to have Python cache function evaluations for us.

    from time import sleep, time
    from functools import lru_cache

    @lru_cache()
    def f(x):
        sleep(10)
        return exp(cos(x))

    t0 = time()
    trapezoid2(f, 0, pi, 4)
    t1 = time()
    print(t1 - t0)
    trapezoid2(f, 0, pi, 8)
    t2 = time()
    print(t2 - t1)


This shows that the first integration takes 50 seconds and the second requires 40 seconds. The first integration requires 5 function evaluations and the second requires 9, but the latter is faster because it only requires 4 new function evaluations.

## Romberg integration

In the examples above, we doubled the number of integration intervals and compared results in order to estimate our numerical integration error. A natural next step would be to double the number of intervals again. Maybe by comparing three integrations we can see a pattern and project what the error would be if we did more integrations.

Werner Romberg took this a step further. Rather than doing a few integrations and eye-balling the results, he formalized the inference using Richardson extrapolation to project where the integrations are going. Specifically, his method applies the trapezoid rule at 2m points for increasing values of m. The method stops when either the maximum value of m has been reached or the difference between successive integral estimates is within tolerance. When Romberg’s method is appropriate, it converges very quickly and there is no need for m to be large.
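Here’s a minimal sketch of Romberg’s method (my own implementation, not SciPy’s): build trapezoid estimates with 2^m intervals, reusing previous evaluations, then apply Richardson extrapolation across each row of the table.

```python
import numpy as np

def romberg_table(f, a, b, max_m):
    # R[m][k]: trapezoid rule with 2^m intervals, extrapolated k times
    R = [[0.0]*(max_m + 1) for _ in range(max_m + 1)]
    h = b - a
    R[0][0] = 0.5*h*(f(a) + f(b))
    for m in range(1, max_m + 1):
        h /= 2
        # trapezoid refinement: reuse R[m-1][0], evaluate only new midpoints
        R[m][0] = 0.5*R[m-1][0] + h*sum(f(a + (2*i + 1)*h)
                                        for i in range(2**(m - 1)))
        # Richardson extrapolation across the row
        for k in range(1, m + 1):
            R[m][k] = R[m][k-1] + (R[m][k-1] - R[m-1][k-1])/(4**k - 1)
    return R[max_m][max_m]
```

With max_m = 5 (33 function evaluations) this reproduces the highly accurate result discussed below.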

To illustrate Romberg’s method, let’s go back to the example of integrating exp(x) over [0, π]. If we were to use the trapezoid rule repeatedly, we would get these results.

    Steps    Result
        1  37.920111
        2  26.516336
        4  23.267285
        8  22.424495
       16  22.211780
       32  22.158473


This doesn’t look promising. We don’t appear to have even the first decimal correct. But Romberg’s method applies Richardson extrapolation to the data above to produce a very accurate result.

    from scipy.integrate import romberg

    r = romberg(exp, 0, pi, divmax = 5)
    print("exact:   ", exp(pi) - 1)
    print("romberg: ", r)


This produces

    exact:    22.1406926327792
    romberg:  22.1406926327867


showing that although none of the trapezoid rule estimates are good to more than 3 significant figures, the extrapolated estimate is good to 12, almost 13, figures.

If you pass the argument show=True to romberg you can see the inner workings of the integration, including a report that the integrand was evaluated 33 times, i.e. 2^m + 1 times where m is given by divmax.

It seems mysterious how Richardson extrapolation could take the integral estimates above, good to three figures, and produce an estimate good to twelve figures. But if we plot the error in each estimate on a log scale it becomes more apparent what’s going on.

The errors follow nearly a straight line, and so the extrapolated error is “nearly” negative infinity. That is, since the log errors nearly follow a straight line going down, polynomial extrapolation produces a value whose log error is very large and negative.

# Scaling and memoization

The previous post explained that Lobatto’s integration method is more efficient than Gaussian quadrature when the end points of the interval need to be included as integration points. It mentioned that this is an advantage when you need to integrate over a sequence of contiguous intervals, say [1, 2] then [2, 3], because the function being integrated only needs to be evaluated at the common end points once.

This occurs in application, for example, when numerically solving differential equations. An integral might need to be evaluated at a sequence of contiguous intervals, one for each time step.

This post will illustrate the time savings from the combination of Lobatto integration and memoization, i.e. caching function evaluations. Some languages have a built-in feature for memoization. In other languages you may need to write your own memoization code.

## Scaling

In order to integrate functions over intervals other than [-1, 1] we need a change of variables to rescale the domain. We’ll incorporate that in the code below.

Here is our code to integrate a function f over an interval [a, b].

    def lobatto(f, a, b):
        # Change integration interval to [-1, 1]
        c = (b-a)/2
        d = (b+a)/2
        # Multiply by c because c is the
        # Jacobian of the change of variables
        return c*integrate(lambda x: f(c*x + d),
                           lobatto_points, lobatto_weights)


## Reducing function evaluations

Next, we create a function to integrate that takes 10 seconds to evaluate. This is an artificial example, but the time required for numerical integration often is dominated by function evaluations. Here we choose an example that makes this obvious.

    from time import sleep, time

    def my_slow_function(x):
        sleep(10)
        return x**3 + x**2


The following code integrates my_slow_function over three contiguous intervals.

    t0 = time()
    lobatto(my_slow_function, 1, 2)
    lobatto(my_slow_function, 2, 3)
    lobatto(my_slow_function, 3, 4)
    t1 = time()
    print("Elapsed time: ", t1-t0)


This code takes 150 seconds because each integration requires five function evaluations at 10 seconds each.

However, by adding one line of code we can reduce the run time to 130 seconds. We add the decorator functools.lru_cache() to ask Python to cache evaluations of our integrand.

    import functools

    @functools.lru_cache()
    def my_slow_function(x):
        sleep(10)
        return x**3 + x**2


Now the three integrations above take 130 seconds because my_slow_function is only evaluated at 2 and 3 one time each.

You could write your own code to cache function evaluations, and that might be worthwhile if efficiency is a priority, but it’s easy to let the language do it for you.

# Lobatto integration

A basic idea in numerical integration is that if a method integrates polynomials exactly, it should do well on polynomial-like functions [1]. The higher the degree of polynomial it integrates exactly, the more accurate we expect it will be on functions that behave like polynomials.

The best known example of this is Gaussian quadrature. However, this post shows why for some applications you might want to use Lobatto quadrature instead. Lobatto quadrature is similar in spirit to Gaussian quadrature, but allocates points to solve a slightly different optimization problem.

If you need to integrate a function over an interval, it matters a great deal where you choose to evaluate the function. The optimal choice, in terms of what polynomials you can integrate exactly, is to use Gaussian quadrature. By evaluating the integrand at n optimally chosen points you can integrate polynomials of degree 2n – 1 or less exactly.

So if Gaussian integration is optimal, why would you want to do anything else? The Gaussian integration points are all interior to the interval of integration. In some applications, you have to evaluate your integrand at the end points, and so you want to optimize subject to a constraint: how can you best allocate your integration points subject to the constraint that two of your integration points are the two ends of the interval? The solution is Lobatto quadrature, also called Gauss-Lobatto quadrature.

By evaluating the integrand at n points, two of which are the end points, Lobatto’s method exactly integrates polynomials of degree 2n-3.

Suppose for whatever reason you already know the value of the integrand evaluated at the end points. You’ve got m more function evaluations to spend. If you use those m points for Gaussian quadrature, you can exactly integrate polynomials of degree

2m – 1

or less. But if you use Lobatto quadrature, your m interior evaluations plus your two known values at the end points give you a total of m+2 function evaluations, and so can integrate polynomials of degree

2(m + 2) – 3 = 2m + 1

or less exactly, two degrees higher than if we had used Gaussian quadrature.

Next, suppose you only know the value of the function you’re integrating at one end point. Say you’ve already integrated f(x) over the interval [1, 2] using Lobatto quadrature, and now you want to integrate over [2, 3]. You already know the value f(2) from your previous integration.

Suppose you have m new function evaluations you can afford, and you don’t know f(3). If you use Lobatto quadrature, f(3) has to come out of your function evaluation budget, so you can afford m – 1 interior integration points. You then know the value of f(x) at m+1 points: f(2) came for free, you evaluated f(x) at m – 1 interior points and at x = 3. This lets you exactly integrate polynomials of degree

2(m + 1) – 3 = 2m – 1

or less, the same as Gaussian quadrature. But if you then need to integrate over [3, 4], knowing f(3) gives you a head start on the next integration, and so on.

## Weights and integration points

You can look up the weights and integration points for Gaussian quadrature and Lobatto quadrature in, for example, Abramowitz and Stegun.

There is a nice symmetry between the two integration methods: Gaussian quadrature uses integration points based on the zeros of Legendre polynomials, and weights that depend on the derivatives of these polynomials. Lobatto quadrature is the other way around: integration points are given by the zeros of derivatives of Legendre polynomials, and the weights involve the Legendre polynomials themselves.
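We can check this symmetry numerically. This sketch (my code, using NumPy’s Legendre utilities) computes the Gauss points as the roots of P₅ and the Lobatto points as the end points plus the roots of P₄′, with the Lobatto weights given by a standard formula, 2/(n(n−1)P₄(x)²):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

n = 5

# Gauss: roots of the Legendre polynomial P_n, with standard weights
gauss_x, gauss_w = leggauss(n)

# Lobatto: end points plus the roots of the derivative of P_{n-1}
P = Legendre.basis(n - 1)
lobatto_x = np.concatenate(([-1.0], np.sort(P.deriv().roots()), [1.0]))

# standard Gauss-Lobatto weight formula
lobatto_w = 2/(n*(n - 1)*P(lobatto_x)**2)
```

The computed points and weights match the tabulated values used in the code below.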

## Python example

Here we’ll implement the Gauss and Lobatto rules of order five. Most of the code is data on integration points and weights.

    gauss_points = [
        -0.906179845938664,
        -0.538469310105683,
         0.0,
        +0.538469310105683,
        +0.906179845938664
    ]

    gauss_weights = [
        0.23692688505618908,
        0.47862867049936647,
        0.5688888888888889,
        0.47862867049936647,
        0.23692688505618908
    ]

    lobatto_points = [
        -1.0,
        -0.6546536707079771,
         0.0,
        +0.6546536707079771,
        +1.0
    ]

    lobatto_weights = [
        0.1,
        0.5444444444444444,
        0.7111111111111111,
        0.5444444444444444,
        0.1
    ]


The integration points and weights are symmetric, so you could make the code more compact at the expense of making it a little more complicated. Putting the + in front of positive integration points is a little unconventional, but it emphasizes the symmetry by making the positive and negative values align vertically.

Here’s our integration code:

    def integrate(f, xs, ws):
        return sum(f(xs[i])*ws[i] for i in range(5))


where we pass it the function to integrate and either Gauss data or Lobatto data.

The following verifies that with 5 integration points, Gauss should be able to exactly integrate a 9th order polynomial, and Lobatto should be able to integrate a 7th order polynomial.

    print( integrate(lambda x: x**9 + 1,
                     gauss_points, gauss_weights) )

    print( integrate(lambda x: x**7 + 1,
                     lobatto_points, lobatto_weights) )


Both print 2.0 as expected. The integral of an odd function over [-1, 1] is zero, and the integral of 1 over the same interval is 2.

Now let’s use both to integrate cosine over [-1, 1].

    from numpy import cos

    print( integrate(cos, gauss_points, gauss_weights) )
    print( integrate(cos, lobatto_points, lobatto_weights) )


The exact integral is 2 sin(1). Here are the results.

    Exact:   1.682941969
    Gauss:   1.682941970
    Lobatto: 1.682942320


So Gauss is correct to 8, almost 9, decimal places, and Lobatto is correct to 5 decimal places.

Next, let’s hard code a 3rd order Gauss rule for comparison.

    def gauss3(f):
        k = 0.6**0.5
        s = f(-k)*5 + f(0)*8 + f(k)*5
        return s/9


We can verify that it integrates fifth order polynomials exactly:

    print(gauss3(lambda x: x**5 + 1))

and we can use it to integrate cosine:

    print(gauss3(cos))

This returns 1.68300, a little less accurate than the Lobatto rule above, illustrating that Lobatto is typically more accurate than Gauss with the same number of function evaluations interior to the interval of integration.


[1] Polynomials can’t have horizontal asymptotes, for example, and so we should not be surprised that a method that integrates high order polynomials exactly could still do poorly on, say, a normal probability density.

# Integral approximation trick

Here’s a simple integration approximation that works remarkably well in some contexts.

Suppose you have an integrand that looks roughly like a normal density, something with a single peak that drops off fairly quickly on either side of the peak. The majority of integrals that arise in basic applications of probability and statistics fit this description.

You can approximate the value of the integral as the area of a box. The height of the box is the value of the integrand at its peak. The width of the box is the FWHM: full width at half maximum, i.e. the distance between the two points where the integrand takes on half its maximum value.

This post will include several examples to illustrate how well, or how poorly, the technique works.

## Gaussian example

First let’s look at exp(-x²), the normal density function apart from some scaling factors, integrated over the whole real line. The maximum of the function is 1, occurring at x = 0. So half the maximum is 1/2, and exp(-x²) = 1/2 at x = ±√(log 2). The FWHM is thus 2√(log 2), and since our maximum is 1, the FWHM is the integral approximation. This gives 1.665, whereas the exact value is √π = 1.772.

## Polynomial example

Next, let’s look at integrating 1/(1 + x^(2n)) over the real line for positive integers n. In every case, the maximum is 1. The integrand takes on half its maximum value at x = ±1, so the FWHM is always 2. So we would approximate the integral as 2 in every case, independent of n.

When n = 1, the correct value of the integral is π and so our approximation of 2 is pretty bad. But that’s not surprising. We said above that our integrand needs to drop off fairly quickly, but that’s not the case here. We have a fat tailed distribution, the Cauchy, and so it’s not surprising that our approximation is poor.

However, as n increases, our integral drops off more quickly, and the approximation improves. When n = 2, the exact value of the integral is 2.221. When n = 3, the integral is 2.094. When n = 10 the integral is 2.008. And in the limit as n goes to infinity, the integral is 2 and our approximation is exact.
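Here’s a sketch verifying those numbers, using SciPy’s quad for the exact integrals; the FWHM approximation is 2 in every case, so the error is just the distance from 2:

```python
import numpy as np
from scipy.integrate import quad

errors = {}
for n in [1, 2, 3, 10]:
    # exact integral of 1/(1 + x^(2n)) over the real line
    numeric, _ = quad(lambda x: 1/(1 + x**(2*n)), -np.inf, np.inf)
    # FWHM approximation is 2 for every n
    errors[n] = abs(numeric - 2.0)
```

The error decreases monotonically as n grows, as described above.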

## Logistic example

The integrands in the examples above were all symmetric about the origin. That may give the misleading impression that our approximation only applies to symmetric integrals. The examples also all had a maximum of 1, which could give the impression that the integral approximation is just the FWHM, not the FWHM times the maximum. We’ll correct both of these impressions in the next example.

Our task is to approximate the integral

∫_{-∞}^∞ exp(ax)/(1 + exp(x))^(a+b) dx

which comes up in Bayesian logistic regression with an improper prior.

The integrand is symmetric about 0 when a = b. But otherwise the integrand is not symmetric; the peak occurs at log(a/b) rather than at 0, and the value at that peak is

a^a b^b/(a + b)^(a+b).
We’ll use the following Python code to help with our calculations.

    from numpy import log, exp
    from scipy.optimize import brentq

    def logistic_max(a, b):
        return a**a * b**b / (a+b)**(a+b)

    def fwhm(a, b):
        hm = 0.5*logistic_max(a, b)
        f = lambda x: a*x - (a+b)*log(1 + exp(x)) - log(hm)
        left = brentq(f, -2, 0)
        right = brentq(f, 0, 2)
        return right - left

    def approx(a, b):
        return logistic_max(a, b)*fwhm(a, b)


And now for a couple of examples. When a = 3 and b = 2, we get an approximate value of 0.0761, compared to an exact value of 0.0833, a relative error of 8.6%.

When a = 10 and b = 12, we get an approximate value of 2.647 × 10^-7, compared to an exact value of 2.835 × 10^-7, a relative error of 6.6%.
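As a check on those numbers, note that the substitution p = exp(x)/(1 + exp(x)) turns the integral into the Beta function B(a, b), which matches the exact values quoted above. This sketch (my code, with the same brentq brackets as before) recomputes the relative errors using that identity:

```python
from numpy import log, exp
from scipy.optimize import brentq
from scipy.special import beta

def logistic_max(a, b):
    # value of the integrand at its peak, x = log(a/b)
    return a**a * b**b / (a + b)**(a + b)

def fwhm(a, b):
    # full width at half maximum of the log-concave integrand
    hm = 0.5*logistic_max(a, b)
    f = lambda x: a*x - (a + b)*log(1 + exp(x)) - log(hm)
    return brentq(f, 0, 2) - brentq(f, -2, 0)

def approx(a, b):
    return logistic_max(a, b)*fwhm(a, b)

# exact value via the Beta function identity
rel_err = lambda a, b: abs(approx(a, b) - beta(a, b))/beta(a, b)
```

The relative errors come out near 8.6% and 6.6%, agreeing with the figures above.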