Floating point: Everything old is new again

In the early days of computing hardware (and actually before) mathematicians put a lot of effort into understanding and mitigating the limitations of floating point arithmetic. They would analyze mundane tasks such as adding a list of numbers and think carefully about the best way to carry out such tasks as accurately as possible.
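Summation is a good example of a mundane task that rewards careful thought. The post doesn't name a particular method, so the sketch below uses one classic technique, Kahan's compensated summation. It compares Kahan's method with a plain left-to-right loop, using Python's `math.fsum` as an accurate reference. The helper names `naive_sum` and `kahan_sum` are hypothetical.

```python
import math

def naive_sum(xs):
    """Add the numbers left to right with no error correction."""
    total = 0.0
    for x in xs:
        total += x
    return total

def kahan_sum(xs):
    """Kahan compensated summation: capture the rounding error of each
    addition and feed it back into the next one."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - comp            # correct the next term by the previous error
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total

# One large number followed by many tiny ones. Each tiny term is less than
# half the spacing between doubles near 1.0, so on its own it rounds away.
xs = [1.0] + [1e-16] * 100_000

print(naive_sum(xs))   # 1.0: every tiny term is lost
print(kahan_sum(xs))   # close to 1.00000000001
print(math.fsum(xs))   # accurate reference
```

The explicit loop in `naive_sum` matters: recent versions of Python's built-in `sum` already use a compensated algorithm for floats, so it would not show the problem.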

Now that most arithmetic is carried out in double precision, you can often get away with not thinking about such things. Except when you can’t. The vagaries of floating point computation still matter occasionally, even with double precision, though not as often as they did with single precision.
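Catastrophic cancellation is one way double precision can still bite. When you subtract two nearly equal numbers, most of the significant digits disappear. The sketch below uses the textbook example (1 − cos x)/x², whose limit as x → 0 is 1/2. It compares that form with the algebraically equivalent 2 sin²(x/2)/x², which avoids the subtraction. The function names are only for illustration.

```python
import math

def naive(x):
    # 1 - cos(x) subtracts two numbers that agree in nearly every bit
    return (1 - math.cos(x)) / x**2

def stable(x):
    # Same function via the identity 1 - cos(x) = 2 sin^2(x/2): no subtraction
    s = math.sin(x / 2)
    return 2 * s * s / x**2

x = 1e-8
print(naive(x))   # far from 0.5 (typically 0.0, because cos(1e-8) rounds to 1.0)
print(stable(x))  # about 0.5
```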

Although most computing has moved from single precision to double precision, there is increasing interest in going the opposite direction, from single precision to half precision. The main driver is neural networks. You don’t need a lot of precision in weights, and you’ve got a lot of numbers to store. So instead of taking 64 bits to store a double precision number, or 32 bits to store a single precision number, you might want to use a 16-bit or even 8-bit floating point number. That way you can fit more weights in memory at once.
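Python's standard `struct` module can pack IEEE half precision values with the format code `'e'`. That makes it easy to sketch both the storage savings and the precision cost. The `to_half` helper is a hypothetical name: it rounds a double to the nearest half precision value and converts it back.

```python
import struct

def to_half(x):
    """Round a double to the nearest IEEE 754 half precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Bytes per number: half, single, double
for code, name in [('e', 'half'), ('f', 'single'), ('d', 'double')]:
    print(name, struct.calcsize(code), 'bytes')

# Half precision has an 11-bit significand, so only about 3 decimal digits survive.
print(to_half(0.1))      # 0.0999755859375
print(to_half(3.14159))  # 3.140625
print(to_half(4097.0))   # 4096.0: adjacent half values here are 4 apart
```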

However, when you move to lower precision numbers, you now have to think again about the things numerical analysts thought about a couple of generations ago, such as different ways of rounding. You might think that floating point rounding could be modeled by random variables. If so, you’re in good company, because John von Neumann suggested this in 1947. But a few years later people began to realize that floating point rounding errors are not random. Or to be more precise, they began to realize that modeling rounding errors as random was inadequate; of course they knew that rounding errors weren’t literally random.
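Here is a small sketch of why round-to-nearest errors are not random. It repeatedly adds a small increment in emulated half precision. The emulation is an assumption made for illustration: each addition is done in double precision, and the result is then rounded to the nearest half precision value. The increment 0.0001 is less than half the spacing between half precision numbers just above 1 (2⁻¹⁰ ≈ 0.000977). As a result, every addition rounds back down in the same direction, and the running sum never moves.

```python
import struct

def to_half(x):
    """Round a double to the nearest IEEE 754 half precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

total = 1.0
for _ in range(1000):
    # Each step rounds to nearest, and the error has the same sign every time.
    total = to_half(total + 0.0001)

print(total)  # 1.0, although the exact answer is 1.1
```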

But what if rounding errors were random? This would lead to more error cancellation than we see in practice with floating point arithmetic. Stochastic rounding makes this happen on purpose: a value is rounded up or down at random, with probabilities chosen so that the rounded value is an unbiased estimator of the value it would like to represent but cannot represent exactly. Now the central limit theorem and all that come to your aid. More on applications of stochastic rounding here.

(To be pedantic for a moment, stochastic rounding isn’t truly random, but uses pseudorandom numbers to implement a procedure which is well modeled by randomness. Random is as random does.)
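Here is a minimal sketch of stochastic rounding to half precision, again emulated with Python's `struct` module. Suppose x lies between adjacent half precision values lo and hi. The sketch rounds up with probability (x − lo)/(hi − lo), so the expected result is exactly x. It assumes x is finite and within half precision range, and it uses Python's seeded pseudorandom generator. The function names are hypothetical.

The second experiment adds 0.0001 to 1.0 a thousand times. With round-to-nearest, the half precision sum would stay stuck at 1.0, because each increment is less than half the spacing between half values near 1. With stochastic rounding, the sum ends up near the exact 1.1.

```python
import random
import struct

def to_half(x):
    """Round a double to the nearest IEEE 754 half precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def half_bits(h):
    """Raw 16-bit pattern of a half precision value."""
    return struct.unpack('<H', struct.pack('<e', h))[0]

def from_bits(b):
    """Half precision value with the given 16-bit pattern."""
    return struct.unpack('<e', struct.pack('<H', b))[0]

def half_neighbors(x):
    """Adjacent half precision values lo <= x <= hi, assuming x >= 0."""
    r = to_half(x)
    if r == x:
        return r, r
    if r < x:
        return r, from_bits(half_bits(r) + 1)  # next half value up
    return from_bits(half_bits(r) - 1), r      # next half value down

def stochastic_round_half(x, rng):
    """Round x to a neighboring half value so that the expected result is x."""
    if x < 0:
        return -stochastic_round_half(-x, rng)  # handle sign by symmetry
    lo, hi = half_neighbors(x)
    if lo == hi:
        return lo
    p_up = (x - lo) / (hi - lo)
    return hi if rng.random() < p_up else lo

rng = random.Random(2024)  # pseudorandom, seeded for reproducibility

# Unbiasedness: the average of many stochastic roundings approaches x.
x = 1.0001
samples = [stochastic_round_half(x, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1.0001, while to_half(x) == 1.0

# Accumulation: add 0.0001 to 1.0 a thousand times in emulated half precision.
total = 1.0
for _ in range(1000):
    total = stochastic_round_half(total + 0.0001, rng)
print(total)  # near 1.1, with some random scatter
```

Any single result is off by up to one half precision spacing. The errors, however, no longer all point the same way, so they largely cancel over many operations.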
