Here is an example fiendishly designed to point out what could go wrong with floating point arithmetic, even with high precision.

I found the following example by Jean-Michel Muller in John Gustafson’s book End of Error. The task is to evaluate the following function at 15, 16, 17, and 9999.

Here *e*(0) is defined by continuity to be 1. That is, we define *e*(0) to be the limit of *e*(*x*) as *x* approaches 0. That limit exists, and it equals 1, so we define *e*(0) to be 1.

If you directly implement the functions above in C++, you will get 0 as the output, whether you use single, double, or even quadruple precision as the following code shows. However, the correct answer in each case is 1.

#include <iostream> #include <math.h> #include <quadmath.h> using namespace std; template <typename T> T e(T x) { // See footnote [1] return x == 0. ? 1. : (exp(x) - 1.)/x; } template <typename T> T q(T x) { return fabs(x - sqrt(x*x + 1.)) - 1./(x + sqrt(x*x + 1.)); } template <typename T> T h(T x) { return e( q(x)*q(x) ); } int main() { int x[] = {15, 16, 17, 9999}; for (int i = 0; i < 4; i++) { cout << h( float(x[i]) ) << endl; cout << h( double(x[i]) ) << endl; cout << h(__float128(x[i]) ) << endl; } }

A little algebra [2] shows that the function *q*(*x*) would return 0 in exact arithmetic, but not in floating point arithmetic. It returns an extremely small but non-zero number, and the numerator of `(exp(x) - 1.)/x`

evaluates to 0.

If `q(x)`

returned exactly zero, `h(x)`

would correctly return 1. Interestingly, if `q(x)`

were a little *less* accurate, returning a little larger value when it should return 0, *h* would be *more* accurate, returning a value close to 1.

I tried replacing `exp(x) - 1`

with `expm1(x)`

. This made no difference. (More on `expm1`

here.)

Incidentally, `bc -l`

gives the right result, no matter how small you set `scale`

.

define ee(x) { if (x == 0) return 1 else return (e(x) - 1)/x } define abs(x) { if (x > 0) return x else return -x } define q(x) { return abs(x - sqrt(x^2 + 1)) - 1/(x + sqrt(x^2 + 1)) } define h(x) { return ee( q(x)^2 ) }

**Update**: See the next post for another example, this time evaluating a polynomial.

## Related posts

[1] Some have suggested that this is a mistake, that you should never test floating point numbers for equality. It’s true that in general you should only test floating point numbers for approximate equality. But the code here is not a mistake; it’s part of the illustration.

[2] Several people have pointed out that you could fix this problem by doing a little algebra. But the point is not to fix it; Muller created this example to show that it is broken. He deliberately obfuscated the problem a little to make a point. This is an artificial example, meant to show in a simple setting the kind of the that could happen in a real problem. Sometimes you may subtract equal or nearly equal things without realizing it.