Floating point example by Rump

The previous post looked at an example of strange floating point behavior taking from book End of Error. This post looks at another example.

This example, by Siegfried Rump, asks us to evaluate

333.75 y⁶ + x² (11 x² y² – y⁶ – 121 y⁴ – 2) + 5.5 y⁸ + x/(2y)

at x = 77617 and y = 33096.

Here we evaluate Rump’s example in single, double, and quadruple precision.

#include <iostream>
#include <quadmath.h>

using namespace std;

template <typename T> T f(T x, T y) {
    T x2 = x*x;
    T y2 = y*y;
    T y4 = y2*y2;
    T y6 = y2*y4;
    T y8 = y2*y6;
    return 333.75*y6 + x2*(11*x2*y2 - y6 - 121*y4 - 2) + 5.5*y8 + x/(2*y);
}

int main() {  

    int x = 77617;
    int y = 33096;
    
    cout << f<float>(x, y) << endl;
    cout << f<double>(x, y) << endl;
    cout << (double) f<__float128>(x, y) << endl;

}

This gives three answers,

-1.85901e+030
-1.18059e+021
1.1726

none of which is remotely accurate. The exact answer is -54767/66192 = -0.827…

Python gives the same result as C++ with double precision, which isn’t surprising since Python floating point numbers are C doubles under the hood.

Where does the problem come from? The usual suspect: subtracting nearly equal numbers. Let’s split Rump’s expression into two parts.

s = 333.75 y⁶ + x²(11x²y² – y⁶ – 121y⁴ – 2)

t = 5.5y⁸ + x/(2y)

and look at them separately. We get

s = -7917111340668961361101134701524942850.00         
t =  7917111340668961361101134701524942849.1726...

The values –s and t agree to 36 figures, but quadruple precision has only 34 figures [1]. Subtracting these two numbers results in catastrophic loss of precision.

Rump went to some effort to conceal what’s going on, and the example is contrived to require just a little more than quad precision. However, his example illustrates things that come up in practice. For one, the values of x and y are not that big, and so we could be mislead into thinking the terms in the polynomial are not that big. For another, it’s not clear that you’re subtracting nearly equal numbers. I’ve been caught off guard by both of these in real applications.

Floating point numbers are a leaky abstraction. They can trip you up if you don’t know what you’re doing. But the same can be said of everything in computing. We can often ignore finite precision just as we can often ignore finite memory, for example. The proper attitude toward floating point numbers is somewhere between blissful ignorance and unnecessary fear.

5 thoughts on “Just evaluating a polynomial: how hard could it be?”

Chris McBrien

12 November 2019 at 09:33

I love these posts. Keep them coming.
David J. Littleboy

13 November 2019 at 08:21

Hmm. That expression is simple enough that, if you had an arbitrary precision integer library*, you could calculate the result exactly. (You’d need to multiply by 100 to make the coefficients integers and do something sensible with the x/2y term.)

Of course, that lacks generality something fierce, but generality wasn’t the question. And the intermediate results wouldn’t be all that big, 30 digits or so.

*: Memory has it that certain premodern LISP implementations (e.g. MACLISP) did have infinite precision integer arithmetic, so I’d think that one of the zillions of modern languages would too…
Wen Chen

13 November 2019 at 09:29

Hi John,

There is a tool called Herbie that can be used to detect when things may go awry with floating point expressions. Have you try to see if some of these examples with it?

https://herbie.uwplse.org/
Benoît St-Jean

24 November 2019 at 01:47

There’s an easy way to do this in Smalltalk : just use fractions to keep the necessary precision and end up with the exact result! My blog post: https://endormitoire.wordpress.com/2019/11/24/smalltalk-et-les-nombres/
quark

6 December 2019 at 06:22

Cool example, which tricks one to think double or even long double precision is enough, but it isn’t!
So, I tried this example using multiple-precision GNU MP library:

mpf_class mp_x(77617);
mpf_class mp_y(33096);

std::cout << f(mp_x, mp_y) << std::endl;

And it outputs the correct answer: -0.827396060.

Thanks for your post!

Comments are closed.

Similar posts

5 thoughts on “Just evaluating a polynomial: how hard could it be?”