How to calculate binomial probabilities

Suppose you’re doing something that has probability of success p and probability of failure q = 1-p. If you repeat what you’re doing m+n times, the probability of m successes and n failures is given by

\frac{(m+n)!}{m!\, n!} p^m q^n
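
For example, with p = q = 1/2, m = 3 successes, and n = 2 failures, the probability works out to

\frac{5!}{3!\,2!} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^2 = 10 \cdot \frac{1}{32} = 0.3125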

Now suppose m and n are moderately large. The terms (m+n)! and m! n! will be huge, but the terms p^m and q^n will be tiny. The huge terms might overflow, and the tiny terms might underflow, even though the final result may be a moderate-sized number. The numbers m and n don’t have to be that large for this to happen since factorials grow very quickly. On a typical system, overflow happens if m+n > 170. How do you get to the end result without having to deal with impossibly large or small values along the way?
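
To see where the 170 cutoff comes from, here’s a quick check (a throwaway sketch, assuming the C99 isinf macro):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double factorial = 1.0;
        for (int n = 1; n <= 200; n++)
        {
            factorial *= n;
            if (isinf(factorial))
            {
                printf("%d! overflows a double\n", n);  /* prints 171 */
                return 0;
            }
        }
        return 0;
    }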

The trick is to work with logs. You want to first calculate log((m+n)!) - log(m!) - log(n!) + m log(p) + n log(q), then exponentiate the result. This pattern comes up constantly in statistical computing.

Libraries don’t always include functions for evaluating factorial or log factorial. They’re more likely to include functions for Γ(x) and its log. For positive integers n, Γ(n+1) = n!. Now suppose you have a function lgamma that computes log Γ(x). You might write something like this.

    #include <math.h>

    /* Probability of m successes and n failures, where p is the
       probability of success and q = 1 - p. Work on the log scale
       so the factorial terms never overflow. */
    double probability(double p, double q, int m, int n)
    { 
        double temp = lgamma(m + n + 1.0);            /* log((m+n)!) */
        temp -= lgamma(n + 1.0) + lgamma(m + 1.0);    /* - log(n!) - log(m!) */
        temp += m*log(p) + n*log(q);                  /* + m log(p) + n log(q) */
        return exp(temp);
    }
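
As a quick sanity check, calling this with the small example above should reproduce the hand computation (assuming the probability function is in the same file):

    #include <stdio.h>

    int main(void)
    {
        /* p = q = 1/2, m = 3 successes, n = 2 failures: expect 0.3125 */
        printf("%g\n", probability(0.5, 0.5, 3, 2));
        return 0;
    }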

The function lgamma was not part of the original ANSI standard library for C or C++ (C99 later added it), but it is part of the POSIX standard. On Unix-like systems, lgamma is included in the standard library. However, Microsoft does not include lgamma as part of the Visual C++ standard library. On Windows you have to either implement your own lgamma function or grab an implementation from somewhere like Boost.

Here’s something to watch out for with POSIX math libraries. I believe the POSIX standard does not require a function called gamma for evaluating the gamma function Γ(x). Instead, the standard requires functions lgamma for the log of the gamma function and tgamma for the gamma function itself. (The mnemonic is “t” for “true,” meaning that you really want the gamma function.) I wrote a little code that called gamma and tested it on OS X and Red Hat Linux this evening. In both cases gcc compiled the code without warning, even with the -Wall and -pedantic warning flags. But on the Mac, gamma was interpreted as tgamma and on the Linux box it was interpreted as lgamma. This means that gamma(10.0) would equal 362880 on the former and 12.8018 on the latter.
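
A small sketch makes the distinction concrete (assuming a C99/POSIX math library):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* tgamma is the "true" gamma function: Gamma(10) = 9! */
        printf("tgamma(10.0) = %g\n", tgamma(10.0));  /* 362880 */
        /* lgamma is its logarithm */
        printf("lgamma(10.0) = %g\n", lgamma(10.0));  /* about 12.8018 */
        return 0;
    }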

If you don’t have access to an implementation of log gamma, see How to compute log factorial.

Overflow and loss of precision

Suppose you need to evaluate the function f(x) = log(1 + e^x). The most obvious way to write this in C is something like the following:

    double f(double x) { return log(1 + exp(x)); }

This is so simple you don’t even test it. Then someone calls to say they’re getting strange output such as 1.#INF or inf, depending on the environment. What’s going on?

For large values of x, e^x is much bigger than 1, so f(x) is approximately log(e^x), which equals x. So, for example, f(1000) should return something very close to 1000. But instead, you get gibberish. In most environments a floating point number can be as large as about 10^308, but exp(1000) is about 2×10^434 and so the result overflows. Then the code takes the log of “infinity.”
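
You can reproduce the failure directly (a minimal sketch; the exact output text varies by platform):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double big = exp(1000.0);        /* about 2x10^434: overflows to infinity */
        printf("%g\n", big);             /* prints inf, or 1.#INF on older MSVC */
        printf("%g\n", log(1.0 + big));  /* the log of infinity is still infinity */
        return 0;
    }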

But f(1000) shouldn’t overflow; the result is about 1000. Our function result can be contained in a floating point number, but our intermediate result exp(1000) cannot be. We need some way of telling the computer “Hold on. I know this is going to be a huge number, but trust me for a minute. After I add 1 I’m going to take a log and bring the result back down to a medium-sized number.” Unfortunately computers don’t work that way.

One solution would be to see whether x is so large that exp(x) will overflow, and in that case f(x) could just return x. So our second attempt might look like this:

    double f(double x) { return (x > X_MAX) ? x : log(1 + exp(x)); }

I’ll come back later to what X_MAX should be. Suppose you’ve found a good value for this constant and release your code again. Then you get another call. They say your code is returning zero when it should not.

Someone is trying to compute f(-50). They know the answer is small, but it shouldn’t be zero. How can that be? Since you just learned about overflow, you suspect the opposite problem, underflow. But that’s not quite it. The result of exp(-50) does not underflow; it’s about 2×10^-22. But it is so small that machine precision cannot distinguish 1 + exp(-50) from 1. So the code returns log(1), which equals zero. This may or may not be a problem. The correct answer is very small, so the absolute error is small but the relative error is 100%. Whether that matters depends on what you’re going to do with the result. If you’re going to add it to a moderate-sized number, it’s no big deal. If you’re going to divide by it, it’s a very big deal.
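
You can verify the loss of resolution directly (another minimal sketch):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double y = exp(-50.0);           /* about 2x10^-22: tiny, but no underflow */
        printf("%d\n", 1.0 + y == 1.0);  /* prints 1: the sum rounds to exactly 1 */
        printf("%g\n", log(1.0 + y));    /* prints 0 instead of roughly 2x10^-22 */
        return 0;
    }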

Now what do you do? You think back to your calculus class and remember that for small x, log(1 + x) is approximately x, with error on the order of x^2. (To see this, expand log(1 + x) in a power series about x = 0: log(1 + x) = x - x^2/2 + x^3/3 - ⋯.) If x is so small that 1 + x equals 1 inside the computer, x^2 must be very small indeed. So f(x) could return exp(x) if exp(x) is sufficiently small. This gives our third attempt.

    double f(double x)
    { 
        if (x > X_MAX) return x;        /* 1 + exp(x) equals exp(x) to machine precision */
        if (x < X_MIN) return exp(x);   /* log(1 + exp(x)) is approximately exp(x) */
        return log(1.0 + exp(x));
    }

This is a big improvement. It can still underflow for large negative x, but in that case the function itself is underflowing, not just an intermediate result.

Now what should X_MAX and X_MIN be? For X_MAX we don’t really have to worry about exp(x) overflowing. We can stop when x is so large that exp(x) + 1 equals exp(x) to machine precision. In C there is a header file float.h that contains a constant DBL_EPSILON, the smallest number ε such that 1 + ε does not equal 1 in machine precision. Since 1 + exp(x) = exp(x)(1 + exp(-x)), this happens once exp(-x) drops below DBL_EPSILON, so we can set X_MAX equal to -log(DBL_EPSILON). Similarly, we can set X_MIN equal to log(DBL_EPSILON), since below that point exp(x) itself is less than DBL_EPSILON and 1 + exp(x) equals 1.
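
In code the cutoffs might look like this (a sketch; macros are used since -log(DBL_EPSILON) is not a compile-time constant in C):

    #include <float.h>
    #include <math.h>

    /* Above X_MAX, exp(x) + 1 equals exp(x) to machine precision. */
    #define X_MAX (-log(DBL_EPSILON))  /* about 36.04 for IEEE doubles */

    /* Below X_MIN, 1 + exp(x) equals 1 to machine precision. */
    #define X_MIN (log(DBL_EPSILON))   /* about -36.04 */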

There’s still a small problem. When x is large and negative, but not so negative that 1 + exp(x) equals 1 in the computer, we could lose precision in computing log(1 + exp(x)).

I have just posted an article on CodeProject entitled Avoiding Overflow, Underflow, and Loss of Precision that has more examples where the most direct method for evaluating a function may not work. Example 2 in that article explains why directly computing log(1 + y) can lose precision even if y is not so small that 1 + y equals 1 to machine precision, and shows how to compute log(1 + y) accurately in that case. Setting y = exp(x) explains how to finish the example here when x is moderately large and negative.
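
For reference, C99 and POSIX math libraries provide log1p, which computes log(1 + y) accurately even when y is tiny. Here is a sketch of a final version using it; with log1p the X_MIN cutoff is no longer needed, since log1p(exp(x)) stays accurate all the way down:

    #include <float.h>
    #include <math.h>

    #define X_MAX (-log(DBL_EPSILON))

    double f(double x)
    {
        /* Beyond X_MAX, 1 + exp(x) equals exp(x) in machine precision,
           so the answer is x and exp never overflows. */
        if (x > X_MAX) return x;
        /* log1p computes log(1 + exp(x)) without losing precision
           when exp(x) is small, covering the former X_MIN case too. */
        return log1p(exp(x));
    }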