**Deep learning** has spurred interest in **novel floating point formats**. Algorithms often don’t need as much precision as standard IEEE-754 doubles or even single precision floats. Lower precision makes it possible to hold more numbers in memory, reducing the time spent swapping numbers in and out of memory. Also, low-precision circuits are far less complex. Together these benefits can give a significant speedup.

Here I want to look at **bfloat16**, or BF16 for short, and compare it to 16-bit number formats I’ve written about previously, IEEE and posit. BF16 is becoming a de facto standard for deep learning. It is supported by several deep learning accelerators (such as Google’s TPU), and will be supported in Intel processors two generations from now.

## Bit layout

The BF16 format is sort of a cross between FP16 and FP32, the 16- and 32-bit formats defined in the IEEE 754-2008 standard, also known as half precision and single precision.

BF16 has 16 bits like FP16, but has the same number of exponent bits as FP32. Each number has 1 sign bit. The rest of the bits in each of the formats are allocated as in the table below.

| Format | Bits | Exponent | Fraction |
|--------|-----:|---------:|---------:|
| FP32   |   32 |        8 |       23 |
| FP16   |   16 |        5 |       10 |
| BF16   |   16 |        8 |        7 |

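BF16 has as many bits as an FP16, but as many *exponent* bits as an FP32. The latter makes conversion between BF16 and FP32 simple, since a BF16 number has the same sign and exponent fields as the FP32 number it approximates and only the fraction is shortened. The exceptions are some edge cases regarding denormalized numbers.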
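As a rough illustration of why the conversion is simple, here is a minimal sketch using only Python's standard `struct` module. The function names are made up for this example, and the narrowing step simply truncates the low 16 bits of the FP32 encoding (rounding toward zero). That is an assumption made for brevity; real hardware and libraries may round to nearest instead, and special values are not treated separately here.

```python
import math
import struct

def fp32_to_bf16_bits(x):
    """Return the 16-bit BF16 pattern obtained by truncating x's FP32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep the sign, 8 exponent bits, and top 7 fraction bits

def bf16_bits_to_fp32(bits16):
    """Widen a BF16 pattern to FP32 by appending 16 zero fraction bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x

def fields(bits16):
    """Split a BF16 pattern into (sign, biased exponent, fraction)."""
    return bits16 >> 15, (bits16 >> 7) & 0xFF, bits16 & 0x7F

print(hex(fp32_to_bf16_bits(1.0)))                      # 0x3f80
print(bf16_bits_to_fp32(fp32_to_bf16_bits(math.pi)))    # 3.140625
print(fields(fp32_to_bf16_bits(-2.0)))                  # (1, 128, 0)
```

Widening is exact because every BF16 value is also an FP32 value. Narrowing loses the low 16 fraction bits, which is why π comes back as 3.140625.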

## Precision

The epsilon value, the smallest number ε such that 1 + ε > 1 in machine representation, is 2^{–e} where *e* is the number of fraction bits. BF16 has much less precision near 1 than the other formats.

| Format |    Epsilon |
|--------|-----------:|
| FP32   | 0.00000012 |
| FP16   | 0.00097656 |
| BF16   | 0.00781250 |
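The epsilon values follow directly from the fraction widths in the bit layout table. The sketch below evaluates 2^{–e} for each format and then checks the FP16 case empirically by rounding through the standard library's half precision `struct` format (`"e"`). The helper names are illustrative only.

```python
import struct

# Fraction bits for each format, taken from the bit layout table
FRACTION_BITS = {"FP32": 23, "FP16": 10, "BF16": 7}

def epsilon(fraction_bits):
    """Spacing between 1 and the next larger representable number."""
    return 2.0 ** -fraction_bits

def round_to_fp16(x):
    """Round x to IEEE half precision and return the result as a Python float."""
    (y,) = struct.unpack("<e", struct.pack("<e", x))
    return y

for name, bits in FRACTION_BITS.items():
    print(f"{name}: {epsilon(bits):.8f}")

# Empirical check for FP16: adding epsilon to 1 is visible after rounding,
# while adding a quarter of epsilon is rounded away.
eps16 = epsilon(FRACTION_BITS["FP16"])
print(round_to_fp16(1 + eps16) > 1)
print(round_to_fp16(1 + eps16 / 4) > 1)
```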

## Dynamic range

The dynamic range of bfloat16 is similar to that of an IEEE single precision number. Relative to FP32, BF16 sacrifices precision to retain range. Range is mostly determined by the number of exponent bits, though not entirely.

Dynamic range in decades is the log base 10 of the ratio of the largest to smallest representable positive numbers. The dynamic ranges of the numeric formats are given below. (Python code to calculate dynamic range is given here.)

| Format |    DR |
|--------|------:|
| FP32   | 83.39 |
| BF16   | 78.57 |
| FP16   | 12.04 |
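For readers who want to reproduce these numbers, here is one possible sketch of the calculation; it is not necessarily the code referred to above. It assumes that each format follows IEEE conventions: the bias is 2^(k–1) – 1 for *k* exponent bits, the top exponent value is reserved for infinities and NaNs, and the smallest positive number is the smallest subnormal.

```python
from math import log10

# (exponent bits, fraction bits), as in the bit layout table
FORMATS = {"FP32": (8, 23), "BF16": (8, 7), "FP16": (5, 10)}

def dynamic_range(exponent_bits, fraction_bits):
    """Decades between the smallest subnormal and the largest finite value."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Largest finite value: maximum usable exponent with an all-ones fraction.
    largest = (2 - 2.0 ** -fraction_bits) * 2.0 ** bias
    # Smallest positive value: a subnormal with only the last fraction bit set.
    smallest = 2.0 ** (1 - bias - fraction_bits)
    return log10(largest / smallest)

for name, (e_bits, f_bits) in FORMATS.items():
    print(f"{name}: {dynamic_range(e_bits, f_bits):.2f}")
```

The fraction width enters through the subnormals, which is why BF16 has a somewhat smaller range than FP32 despite having the same number of exponent bits.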

## Comparison to posits

The precision and dynamic range of posit numbers depend on how many bits you allocate to the maximum exponent, denoted *es* by convention. (Note “maximum.” The number of exponent bits varies for different numbers.) This post explains the anatomy of a posit number.

Posit numbers can achieve more precision and more dynamic range than IEEE-like floating point numbers with the same number of bits. Of course there’s no free lunch. Posits represent large numbers with low precision and small numbers with high precision, but this trade-off is often what you’d want.

For an *n*-bit posit, a number near 1 uses one sign bit, two regime bits, and *es* exponent bits, so the number of fraction bits near 1 is *n* – 3 – *es* and epsilon is 2 to the exponent *es* + 3 – *n*. The dynamic range is

$$\log_{10}\left( 2^{2^{es}} \right)^{2n-4} = (2n - 4)\, 2^{es} \log_{10} 2$$

which is derived here. The dynamic range and epsilon values for 16-bit posits with *es* ranging from 1 to 4 are given in the table below.

| es |     DR |   epsilon |
|---:|-------:|----------:|
|  1 |  16.86 | 0.0002441 |
|  2 |  33.72 | 0.0004883 |
|  3 |  67.43 | 0.0009766 |
|  4 | 134.86 | 0.0019531 |

For all the values of *es* above, a 16-bit posit number has a smaller epsilon than BF16. Compared to FP16, which has 10 fraction bits, a 16-bit posit has more fraction bits near 1 when *es* is 1 or 2, the same number when *es* = 3, and one fewer when *es* = 4. The dynamic range of a 16-bit posit is larger than that of an FP16 for all values of *es*, and greater than that of BF16 and FP32 when *es* = 4.
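One way to check posit figures like these is to decode bit patterns directly from the definition of the format rather than trusting a closed-form expression. The decoder below is a minimal sketch for positive posits only, written for this illustration and not taken from any posit library. It reads the regime as a run of identical bits, then up to *es* exponent bits, then treats whatever remains as the fraction.

```python
from math import log10

def decode_posit(bits, n, es):
    """Decode a positive n-bit posit pattern (sign bit 0, not all zeros)."""
    body = format(bits, f"0{n}b")[1:]           # drop the sign bit
    first = body[0]
    run = len(body) - len(body.lstrip(first))   # length of the regime run
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]                       # skip the regime terminator bit
    exp_bits = rest[:es].ljust(es, "0")         # cut-off exponent bits count as 0
    frac_bits = rest[es:]
    exponent = int(exp_bits, 2) if es else 0
    fraction = int(frac_bits, 2) / 2 ** len(frac_bits) if frac_bits else 0.0
    return 2.0 ** (k * 2 ** es + exponent) * (1 + fraction)

def posit_summary(n, es):
    """Return (dynamic range in decades, epsilon) for an n-bit posit."""
    one = 1 << (n - 2)              # pattern 0 10 0...0 encodes 1.0
    maxpos = (1 << (n - 1)) - 1     # pattern 0 11...1
    minpos = 1                      # pattern 0 00...01
    eps = decode_posit(one + 1, n, es) - decode_posit(one, n, es)
    dr = log10(decode_posit(maxpos, n, es) / decode_posit(minpos, n, es))
    return dr, eps

for es in range(1, 5):
    dr, eps = posit_summary(16, es)
    print(f"es={es}: DR={dr:.2f}, epsilon={eps:.7f}")
```

Epsilon is measured here as the gap between 1 and the posit with the next larger bit pattern, so it does not depend on any formula for the number of fraction bits.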