Suppose you need to evaluate the function *f*(*x*) = log(1 +* e ^{x}*). The most obvious code to write something like this in C:

double f(double x) { return log(1 + exp(x)); }

This is so simple you don’t even test it. Then someone calls you to say they’re getting strange output such as `1.#INF`

depending on your environment. What’s going on?

For large values of *x*, *e ^{x}* is much bigger than 1, so

*f*(

*x*) is approximately log(

*e*) which equals

^{x}*x*. So, for example,

`f(1000)`

should return something very close to 1000. But instead, you get gibberish. In most environments a floating point number can be as large as about 10^{308}, but

`exp(1000)`

is about 2×10^{434}and so the result overflows. Then the code takes the log of “infinity.”

But `f(1000)`

shouldn’t overflow; the result is about 1000. Our function result can be contained in a floating point number, but our intermediate result `exp(1000)`

cannot be. We need some way of telling the computer “Hold on. I know this is going to be a huge number, but trust me for a minute. After I add 1 I’m going to take a log and bring the result back down to a medium sized number.” Unfortunately computers don’t work that way.

One solution would be to see whether `x`

is so large that `exp(x)`

will overflow, and in that case `f(x)`

could just return `x`

. So our second attempt might look like this:

double f(double x) { return (x > X_MAX) ? x : log(1 + exp(x)); }

I’ll come back later to what `X_MAX`

should be. Suppose you’ve found a good value for this constant and release your code again. Then you get another call. They say your code is returning zero when it should not.

Someone is trying to compute *f*(-50). They know the answer is small, but it shouldn’t be zero. How can that be? Since you just learned about overflow, you suspect the problem might have to do with the opposite problem, underflow. But that’s not quite it. The result of `exp(-50)`

does not underflow; it’s about 2×10^{-22}. But it is so small that machine precision cannot distinguish `1+exp(-50)`

from 1. So the code returns log(1), which equals zero. This may or may not be a problem. The correct answer is very small, so the *absolute* error is small but the *relative* error is 100%. Whether that is a problem depends on what you’re going to do with the result. If you’re going to add it to a moderate size number, no big deal. If you’re going to divide by it, it’s a very big deal.

Now what do you do? You think back to your calculus class and remember that for small *x*, log(1 + *x*) is approximately *x* with error on the order of *x*^{2}. (To see this, expand log(1 + *x*) in a power series centered at 1.) If x is so small that 1 + *x* equals 1 inside the computer, *x*^{2} must be very small indeed. So `f(x)`

could return `exp(x)`

if `exp(x)`

is sufficiently small. This gives our third attempt.

double f(double x) { if (x > X_MAX) return x; if (x < X_MIN) return exp(x); return log(1.0 + exp(x)); }

This is a big improvement. It can still underflow for large negative x, but in that case the function itself is underflowing, not just an intermediate result.

Now what do we make `X_MAX`

and `X_MIN`

? For `X_MAX`

we don’t really have to worry about `exp(x)`

overflowing. We can stop when `x`

is so large that `exp(x) + 1`

equals `exp(x)`

to machine precision. In C there is a header file `float.h`

that contains a constant `DBL_EPSILON`

which is the smallest number ε such that 1 + ε does not equal 1 in machine precision. So it turns out we can set `X_MAX`

equal to `-log(DBL_EPSILON)`

. Similarly we could set `X_MIN `

equal to `log(DBL_EPSILON)`

.

There’s still a small problem. When `x`

is large and negative, but not so negative that `1 + exp(x)`

equals 1 in the computer, we could lose precision in computing `log(1 + exp(x))`

.

I have just posted an article on CodeProject entitled Avoiding Overflow, Underflow, and Loss of Precision that has more examples where the most direct method for evaluating a function may not work. Example 2 in that paper explains why directly computing `log(1 + y)`

can be a problem even if `y`

is not so small that `1 + y`

equals 1 to machine precision. The article explains how to compute `log(1 + y)`

in that case. Setting `y = exp(x)`

explains how to finish the example here when `x`

is moderately large and negative.

## 2 thoughts on “Overflow and loss of precision”