Floating point numbers are a leaky abstraction

Joel Spolsky coined the term leaky abstraction for programming concepts that usually shield you from messy details but sometimes break down. A perfect abstraction is a black box that you never have to open. A leaky abstraction is a black box that you have to open up occasionally.

Floating point numbers, the computer representations of real numbers,  are leaky abstractions. They work remarkably well: you can usually pretend that a floating point type is a mathematical real number. But sometimes you can’t. The abstraction leaks, though not very often.

Most explanations I’ve heard for the limitations of machine numbers are pedantic. “There are only a finite number of floating point numbers so they can’t represent real numbers well.” That’s not much help. It doesn’t explain why floating point numbers actually do represent real numbers sufficiently well for most applications, and it doesn’t suggest where the abstraction might leak.

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10308, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton’s Law of Gravity is only known to six significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton’s gravitational constant, but still less than a floating point number. So when are 16 figures not enough? One problem area is subtraction. The other elementary operations — addition, multiplication, division — are very accurate. As long as you don’t overflow or underflow,  these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate.  If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations. For an example, see this post on calculating standard deviation. See also computing derivatives, the second example in  Five Tips for Floating Point Programming.

What about overflow or underflow? When do you need numbers bigger than 10308? Often you don’t. But in probability calculations, for example, you need them all the time unless you’re clever. It’s common in probability to compute a medium-sized number that is the product of an astronomically large number and an infinitesimally small number. The final result fits into a computer just fine, but the intermediate numbers might not due to overflow or underflow. For example, the maximum floating point number on most computers is somewhere between 170 factorial and 171 factorial. Such large factorials often appear in applications, often in ratios with other large factorials. See Avoiding Overflow, Underflow, and Loss of Precision for tricks on how to work with factorials that would overflow if computed directly.

Often you can afford to be blissfully ignorant of the details of floating point arithmetic, but sometimes you cannot. A great place to learn more is David Goldberg’s paper What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Update: See follow-up post, Anatomy of a floating point number.

Related posts:

How to compute binomial probabilities
NaN, 1.#IND, 1.#INF, and all that

24 thoughts on “Floating point numbers are a leaky abstraction

  1. Don’t forget a Bayesian classifier — it involves multiplying large numbers of small probabilities, then normalizing them. After getting strange results, and switching to BigDecimal, I learned that the usual solution is to do all calculations in log space.

  2. Not to mention the problems that arise because we think in decimal but compute in binary. Does 0.1 + 0.2 = 0.3? No, not in binary floats, it doesn’t.

  3. David Eisner: I updated the links to point to an updated version of Goldberg’s paper. Thanks for pointing out the error.

  4. I disagree with your idea that floating-point is supposed to represent “real numbers”. They represent rational numbers.

    If you want to approximate reals with rationals, that’s your choice, but you should at least understand the difference, and why you can get into trouble thereby.

  5. Jack: Every floating point number does represent a rational number, so I could see your point. But I’d still say that floating point numbers are trying to approximate real numbers. Arithmetic operations are designed to be approximately correct in the analytical sense of the real line, not exactly correct in the algebraic sense of a quotient field. I think fixed-point numbers are a better model for rational numbers.

  6. I’m surprised to not see the “move to the logspace” basic trick… which is to compute exp(sum log x_i) in place of prod(x_i). Works great for probabilities – they tend to be from some distribution and then central limit works for you…
    Also normalizing small (or large) numbers to sum up to, say, 1.0 is better accomplished by first subtracting the max of all the logarithms.

  7. Tomas: I agree that the “move to the logspace” trick is very important. I talk about it here in the context of computing binomial probabilities.

  8. Also, operations with floating point numbers are not pure functional on different machines/compilers.

  9. There’s a whole discipline — numerical computation — about calculating with floating point numbers efficiently and maximizing both accuracy and precision. A lot of the hard work is done with pencil and paper analysis before any code is written. The techniques discussed above are just a few of the things done in numerical analysis.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>