Sequence alignment

In my previous post I illustrated the Levenshtein edit distance by comparing the opening paragraphs of Finnegans Wake by James Joyce and a parody by Adam Roberts.

In this post I’ll show how to align two sequences using the sequence alignment algorithms of Needleman-Wunsch and Hirschberg. These algorithms can be used to compare any sequences, though they are more often used to compare DNA sequences than impenetrable novels and parodies.

I’ll be using Gang Li’s implementation of these algorithms, available on github. I believe the two algorithms are supposed to produce the same results, since Hirschberg’s algorithm is a more space-efficient implementation of the Needleman-Wunsch algorithm, though the two implementations below produce slightly different results. I’ll give the output of Hirschberg’s algorithm.

Li’s alignment code uses lists of characters for input and output. I wrote a simple wrapper to take in strings and output strings.

    from alignment import Needleman, Hirschberg

    def compare(str1, str2):
        seq1 = list(str1)
        seq2 = list(str2)
        for algorithm in [Needleman(), Hirschberg()]:
            a, b = algorithm.align(seq1, seq2)
            print("".join(a))
            print("".join(b))
            print()
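
For example, comparing the two opening paragraphs looks like this, where joyce and roberts stand for the full paragraph strings quoted in the previous post:

    compare(joyce, roberts)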

The code inserts vertical bars to indicate spaces added for alignment. Here’s the result of using the Needleman-Wunsch algorithm on the opening paragraphs of Finnegans Wake and the parody Finnegans Ewok.

    |||riverrun, past Ev|e| and Adam'||||s,
    mov|i|er|un, past ||new and |||||hopes,

    from swe|rv||e of shore|||| to bend of
    from s||tr|ike of |||||back to bend of

    b|||ay, brings us by a commodius
    |jeday, brings us by a commodius

    vic|u||s of recirculation back to
    |||lucas of recirculation back to

    H|owth Ca||stle|||| and E|nvi||r|ons.
    |fo||||||rest||moon and |en||dor.||||

I mentioned in my previous post that I could compare the first four paragraphs easily, but I had some trouble aligning the fifth paragraphs. The fifth paragraphs of each version start out quite similar:

    Bygme||ster Fi|nnega||n, of the 
    Bygm|onster ||Ann||akin, of the 
    
    Stutte||r|||||||ing Hand, f|re|emen'|s
    ||||||Throatchokin| Hand, for|cemen|’s
    
    mau-rer, lived in the broadest way
    mau-rer, lived in the broadest way
    
    immarginable in his rushlit
    immarginable in his rushlit
    
    toofar-|||back for messuages before
    toofar| — back for messuages before 

but then Roberts takes the liberty of skipping over a large section of the original. This is what I suspected by looking at the two texts, but Hirschberg’s algorithm makes the edits obvious by showing two long sequences of vertical bars, one about 600 characters long and another about 90 characters long.

Levenshtein distance from Finnegans Wake to Return of the Jedi


I ran into a delightfully strange blog post today called Finnegans Ewok that edits the first few paragraphs of Finnegans Wake to make it into something like Return of the Jedi.

The author, Adam Roberts, said via Twitter “What I found interesting here was how little I had to change Joyce’s original text. Tweak a couple of names and basically leave it otherwise as was.”

So what I wanted to do is quantify just how much had to change using the Levenshtein distance, which is essentially the number of one-character changes necessary to transform one string into another.

Here’s the first paragraph from James Joyce:

riverrun, past Eve and Adam’s, from swerve of shore to bend of bay, brings us by a commodius vicus of recirculation back to Howth Castle and Environs.

And here’s the first paragraph from Adam Roberts:

movierun, past new and hopes, from strike of back to bend of jeday, brings us by a commodius lucas of recirculation back to forestmoon and endor.

The original paragraph is 150 characters, the parody is 145 characters, and the Levenshtein distance is 44.
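
If you want to reproduce these numbers yourself, here is a quick dynamic programming sketch of the Levenshtein distance (any Levenshtein library would do just as well):

    def levenshtein(a, b):
        # prev[j] is the distance between the first i-1 characters of a and the first j of b
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,               # delete ca
                                curr[j-1] + 1,             # insert cb
                                prev[j-1] + (ca != cb)))   # substitute
            prev = curr
        return prev[-1]

Applied to the two paragraphs above it should return 44.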

Here’s a summary of the results for the first four paragraphs.

    |-------+---------+----------|
    | Joyce | Roberts | Distance |
    |-------+---------+----------|
    |   150 |     145 |       44 |
    |   700 |     727 |      119 |
    |   594 |     615 |      145 |
    |  1053 |     986 |      333 |
    |-------+---------+----------|

The fifth paragraph seems to diverge more from Joyce. I may have gotten something misaligned, and reading enough of Finnegans Wake to debug the problem made my head hurt, so I stopped.

Update: See the next post for sequence alignment applied to the two sources. This lets you see not just the number of edits but what the edits are. This shows why I was having difficulty aligning the fifth paragraphs.


Comparing bfloat16 range and precision to other 16-bit numbers


Deep learning has spurred interest in novel floating point formats. Algorithms often don’t need as much precision as standard IEEE-754 doubles or even single precision floats. Lower precision makes it possible to hold more numbers in memory, reducing the time spent swapping numbers in and out of memory. Also, low-precision circuits are far less complex. Together these benefits can give a significant speedup.

Here I want to look at bfloat16, or BF16 for short, and compare it to 16-bit number formats I’ve written about previously, IEEE and posit. BF16 is becoming a de facto standard for deep learning. It is supported by several deep learning accelerators (such as Google’s TPU), and will be supported in Intel processors two generations from now.

Bit layout

The BF16 format is sort of a cross between FP16 and FP32, the 16- and 32-bit formats defined in the IEEE 754-2008 standard, also known as half precision and single precision.

BF16 has 16 bits like FP16, but has the same number of exponent bits as FP32. Each number has 1 sign bit. The rest of the bits in each of the formats are allocated as in the table below.

    |--------+------+----------+----------|
    | Format | Bits | Exponent | Fraction |
    |--------+------+----------+----------|
    | FP32   |   32 |        8 |       23 |
    | FP16   |   16 |        5 |       10 |
    | BF16   |   16 |        8 |        7 |
    |--------+------+----------+----------|

BF16 has as many bits as FP16, but as many exponent bits as FP32. The latter makes conversion between BF16 and FP32 simple, except for some edge cases regarding denormalized numbers.

Precision

The epsilon value, the smallest number ε such that 1 + ε > 1 in machine representation, is 2^-f where f is the number of fraction bits. BF16 has much less precision near 1 than the other formats.

    |--------+------------|
    | Format |    Epsilon |
    |--------+------------|
    | FP32   | 0.00000012 |
    | FP16   | 0.00097656 |
    | BF16   | 0.00781250 |
    |--------+------------|
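
These values are easy to check, since epsilon is 2 raised to minus the number of fraction bits:

    for fmt, fraction_bits in [("FP32", 23), ("FP16", 10), ("BF16", 7)]:
        print(fmt, f"{2.0**-fraction_bits:.8f}")
    # FP32 0.00000012
    # FP16 0.00097656
    # BF16 0.00781250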

Dynamic range

The dynamic range of bfloat16 is similar to that of an IEEE single precision number. Relative to FP32, BF16 sacrifices precision to retain range. Range is mostly determined by the number of exponent bits, though not entirely.

Dynamic range in decades is the log base 10 of the ratio of the largest to smallest representable positive numbers. The dynamic ranges of the numeric formats are given below. (Python code to calculate dynamic range is given here.)

    |--------+-------|
    | Format | DR    |
    |--------+-------|
    | FP32   | 83.38 |
    | BF16   | 78.57 |
    | FP16   | 12.04 |
    |--------+-------|
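
For the IEEE-like formats, the dynamic range in decades works out to (2^es - 2 + f) log10 2, where f is the number of fraction bits; the derivation is in the post comparing IEEE and posit formats. A quick check:

    from math import log10

    def dynamic_range(exponent_bits, fraction_bits):
        # decades from the smallest denormal to the largest finite value
        return (2**exponent_bits - 2 + fraction_bits) * log10(2)

    for fmt, es, f in [("FP32", 8, 23), ("BF16", 8, 7), ("FP16", 5, 10)]:
        print(fmt, round(dynamic_range(es, f), 2))   # about 83.4, 78.6, 12.0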

Comparison to posits

The precision and dynamic range of posit numbers depends on how many bits you allocate to the maximum exponent, denoted es by convention. (Note “maximum.” The number of exponent bits varies for different numbers.) This post explains the anatomy of a posit number.

Posit numbers can achieve more precision and more dynamic range than IEEE-like floating point numbers with the same number of bits. Of course there’s no free lunch. Posits represent large numbers with low precision and small numbers with high precision, but this trade-off is often what you’d want.

For an n-bit posit, the number of fraction bits near 1 is n – 2 – es and so epsilon is 2 to the exponent es – n – 2. The dynamic range is

\log_{10}\left( 2^{2^{\text{\small\emph{es}}} } \right)^{2n-4} = (2n-4)2^{es}\log_{10}2

which is derived here. The dynamic range and epsilon values for 16-bit posits with es ranging from 1 to 4 are given in the table below.

    |----+--------+-----------|
    | es |     DR |   epsilon |
    |----+--------+-----------|
    |  1 |  16.86 | 0.0000076 |
    |  2 |  33.72 | 0.0000153 |
    |  3 |  67.43 | 0.0000305 |
    |  4 | 134.86 | 0.0000610 |
    |----+--------+-----------|
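
As a quick check, the dynamic range column comes straight from the formula above:

    from math import log10

    n = 16
    for es in range(1, 5):
        print(es, round((2*n - 4) * 2**es * log10(2), 2))
    # prints 16.86, 33.72, 67.43, 134.86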

For all the values of es above, a 16-bit posit number has a smaller epsilon than either FP16 or BF16. The dynamic range of a 16-bit posit is larger than that of an FP16 for all values of es, and greater than that of BF16 and FP32 when es = 4.


What is proof-of-work?

The idea of proof of work (PoW) was first explained in a paper by Cynthia Dwork and Moni Naor [1], though the term “proof of work” came later [2]. It was first proposed as a way to deter spam, but it’s better known these days through its association with cryptocurrency.

If it cost more to send email, even a fraction of a cent per message, that could be enough to deter spammers. So suppose you want to charge anyone $0.005 to send you an email message. You don’t actually want to collect their money, you just want proof that they’d be willing to spend something to email you. You’re not even trying to block robots, you just want to block cheap robots.

So instead of asking for a micropayment, you could ask the sender to solve a puzzle, something that would require around $0.005 worth of computing resources. If you’re still getting too much spam, you could increase your rate by giving them a harder puzzle.

Dwork and Naor list several possible puzzles. The key is to find a puzzle that takes a fair amount of effort to solve but the solution is easy to verify.

Bitcoin uses hash problems for proof-of-work puzzles. Cryptographic hash functions are difficult to predict, and so you can’t do much better than brute force search if you want to come up with input whose hashed value has a specified pattern.

The goal is to add a fixed amount of additional text to a message such that when the hash function is applied, the resulting value is in some narrow range, such as requiring the first n bits to be zeros. The number n could be adjusted over time as needed to calibrate the problem difficulty. Verifying the solution requires computing only one hash, but finding the solution requires computing 2^n hashes on average.
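
Here is a small sketch of such a puzzle in Python, using SHA-256 and requiring the first n bits of the hash to be zero. This is just an illustration of the idea, not Bitcoin’s actual protocol.

    import hashlib
    from itertools import count

    def solve(message, n):
        """Find a nonce so that SHA-256(message + nonce) starts with n zero bits."""
        target = 1 << (256 - n)            # the hash must be below this value
        for nonce in count():
            h = hashlib.sha256(f"{message}{nonce}".encode()).digest()
            if int.from_bytes(h, "big") < target:
                return nonce

    def verify(message, nonce, n):
        h = hashlib.sha256(f"{message}{nonce}".encode()).digest()
        return int.from_bytes(h, "big") < (1 << (256 - n))

    nonce = solve("pay to send this email", 16)   # around 2^16 hashes on average
    print(nonce, verify("pay to send this email", nonce, 16))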


[1] Cynthia Dwork and Moni Naor (1993). “Pricing via Processing, or, Combatting Junk Mail”. Advances in Cryptology, CRYPTO ’92. Lecture Notes in Computer Science No. 740. Springer: 139–147.

[2] Markus Jakobsson and Ari Juels (1999). “Proofs of Work and Bread Pudding Protocols”. Communications and Multimedia Security. Kluwer Academic Publishers: 258–272.

Technological context

As you read this blog post, you probably have a hierarchy of contexts in the back of your mind. It comes so naturally to you that you don’t consciously think of it.

If you’re reading this in a web browser, you probably know what browser you’re using. If not, you’re at least aware that you are using a browser, even if you forget momentarily which one you have open. And you probably know what operating system is hosting your browser. You understand that you are reading a page on my site, that this page is not your browser, but content hosted by your browser. If you’ve subscribed via RSS or email, you know what email or RSS client you’re using and understand how this post is organized with respect to your other content.

Some people do not have this kind of context. Anything on their computer is simply “The Computer.” They don’t really understand what an operating system, web browser, or email client is. And they don’t need to know, most of the time. They can get their work done, but then occasionally they have inexplicable problems.

I’m not saying this to criticize or make fun of the people who don’t have the kind of technological context I’m talking about. It’s a remarkable achievement that software has gotten so easy to use that people can get along without knowing much technological context. But if this sort of thing is second nature to you, you might have a hard time understanding how a large number of people work.

You probably take it for granted that you can access the same web site from different computers. Some people do not. Their desktop at work is one computer, and their iPhone is a different computer. They don’t really understand what a web site is.

I know what a web browser is because I have been using computers since before there was a web. Old timers know what various technological components are because they’ve seen them develop. And “digital natives” know how to get things done because they’ve grown up with computers, though their gaps in context show occasionally. Seems like the people in the middle would have the hardest time, not having grown up with the technology but not having watched it develop either.

I’m writing this because I’m becoming increasingly aware of how difficult life can be for people who don’t have adequate mental models for technology. I imagine most of my readers are tech savvy, and they may have a hard time seeing what I’ve had a hard time seeing: that a lot of people don’t understand things we take for granted.

Source http://dilbert.com/strip/1995-06-24

It used to be that anybody who used a computer had to know some basic things. If you were a Unix user a generation ago, you might not know anything about the internals of Unix, but you at least knew that you were a Unix user. There were wizards and mere mortals, but the two groups shared more context than the most tech savvy and least tech savvy share today.

It’s good that people don’t need to know as much context, but occasionally it produces bewildering situations, both for the user and the person trying to help them.

Windows command line tips

I use Windows, Mac, and Linux, each for different reasons. When I run Windows, I like to have a shell that works sorta like bash, but doesn’t run in a subsystem. That is, I like to have the utility programs and command editing features that I’m used to from bash on Mac or Linux, but I want to run native Windows code and not a little version of Linux hosted by Windows. [1]

It took a while to find something I’m happy with. It’s easier to find Linux subsystems like Cygwin. But cmder does exactly what I want. It’s a bash-like shell running on Windows. I also use Windows ports of common Unix utilities. Since these are native Windows programs, I can run them and other Windows applications in the same environment. No error messages along the lines of “I know it looks like you’re running Windows, but you’re not really. So you can’t open that Word document.”

I’ve gotten Unix-like utilities for Windows from several sources. GOW (Gnu on Windows) is one source. I’ve also collected utilities from other miscellaneous sources.

Tab completion and file association

There’s one thing that was a little annoying about cmder: tab completion doesn’t work if you want to enter a file name. For example, if you want to open a Word document foo.docx from the basic Windows command prompt cmd.exe, you can type fo followed by the tab key and the file will open if foo.docx is the first file in your working directory that begins with “fo.”

In cmder, tab completion works for programs first, and then for file names. If you type in fo followed by the tab key, cmder will look for an application whose name starts with “fo.” If you wanted to move foo.docx somewhere, you could type mv fo and tab. In this context, cmder knows that “fo” is the beginning of a file name.

On Mac, you use the open command to open a file with its associated application. For example, on the Mac command line you’d type open foo.docx rather than just foo.docx to open the file with Word.

If there were something like open on Windows, then tab completion would work in cmder. And there is! It’s the start command. In fact, if you’re accustomed to using open on Mac, you could alias start to open on Windows [2]. So in cmder, you can type start fo and hit tab, and get tab completion for the file name.

Miscellaneous

The command assoc shows you which application is associated with a file extension. (Include the “.” when using this command: you’d type assoc .docx rather than assoc docx.)

You can direct Windows shell output to the clip command to put the output onto the Windows clipboard.

The control command opens the Windows control panel.

This post shows how to have branching logic in an Emacs config file so you can use the same config file across operating systems.


[1] Usually on Windows I want to run Windows. But if I do want to run Linux without having to go to another machine, I use WSL (Windows Subsystem for Linux) and I can use it from cmder. Since cmder supports multiple tabs, I can have one tab running ordinary cmd.exe and another tab running bash on WSL.

[2] In the directory containing cmder.exe, edit the file config/user-aliases.cmd. Add a line open=start $1.

Does computer science help you program?

The relationship between programming and computer science is hard to describe. Purists will say that computer science has nothing to do with programming, but that goes too far.

Computer science is about more than programming, but it is all motivated by getting computers to do things. With few exceptions, students major in computer science in college with the intention of becoming programmers.

I asked on Twitter yesterday how helpful people found computer science in writing software.

In a follow-up tweet I said “For this poll, basic CS would be data structures and analysis of algorithms. Advanced CS is anything after that.”

So about a quarter didn’t find computer science useful, but the rest either expected it to be useful or at least found the basic theory useful.

I suspect some of those who said they haven’t found (advanced) CS theory useful don’t know (advanced) CS theory. This isn’t a knock on them. It’s only the observation that you can’t use what you aren’t at least aware of. In fact, you may need to know something quite well before you can recognize an opportunity to use it. (More on that here.)

Many programmers are in jobs where they don’t have much need for computer science theory. I thought about making that a possible choice, something like “No, but I wish I were in a job that could use more theory.” Unfortunately Twitter survey responses have to be fairly short.

Of course this isn’t a scientific survey. (Even supposedly scientific surveys aren’t that great.) People who follow the CompSciFact twitter account have an interest in computer science. Maybe people who had strong feelings about CS, such as resentment for having to study something they didn’t want to or excitement for being able to use something they find interesting, were more likely to answer the question.

 

Making a career out of the chain rule

When I was a teenager, my uncle gave me a calculus book and told me that mastering calculus was the most important thing I could do for starting out in math. So I learned the basics of calculus from that book. Later I read Michael Spivak’s two calculus books. I took courses that built on calculus, and studied generalizations of calculus such as calculus with complex variables, calculus in Banach spaces, etc. I taught calculus. After a while, I started to feel maybe I’d mastered calculus.

Last year I started digging into automatic differentiation and questioned whether I really had mastered calculus. At a high level, automatic differentiation is “just” an application of the chain rule in many variables. But you can make a career out of exploring automatic differentiation (AD), and many people do. The basics of AD are not that complicated, but you can go down a deep rabbit hole exploring optimal ways to implement AD in various contexts.
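
To make “just the chain rule” concrete, here is a bare-bones sketch of forward-mode AD using dual numbers: each value carries its derivative, and every operation applies the chain rule. Real AD systems handle many more operations and do it far more efficiently.

    class Dual:
        """A number together with its derivative."""
        def __init__(self, value, deriv=0.0):
            self.value = value
            self.deriv = deriv

        def _wrap(self, other):
            return other if isinstance(other, Dual) else Dual(other)

        def __add__(self, other):
            other = self._wrap(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)

        __radd__ = __add__

        def __mul__(self, other):
            other = self._wrap(other)
            # product rule
            return Dual(self.value*other.value,
                        self.deriv*other.value + self.value*other.deriv)

        __rmul__ = __mul__

    def derivative(f, x):
        return f(Dual(x, 1.0)).deriv

    print( derivative(lambda x: x*x + 3*x, 2.0) )   # d/dx (x^2 + 3x) at x = 2 is 7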

You can make a career out of things that seem even simpler than AD. Thousands of people have made a career out of solving the equation Ax = b where A is a large matrix and the vector b is given. In high school you solve two equations in two unknowns, then three equations in three unknowns, and in principle you could keep going. But you don’t solve a sparse system of millions of equations the same way. When you consider efficiency, accuracy, limitations of computer arithmetic, parallel computing, etc. the humble equation Ax = b is not so simple to solve.
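
For a small taste of why the sparse case is different, here is a sketch using SciPy that solves the same tridiagonal system with a sparse solver and with a dense one. The sparse solver exploits the structure; the dense solve stores and factors all n² entries, which stops being an option long before you get to millions of unknowns.

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import spsolve

    n = 2000
    A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")  # tridiagonal
    b = np.ones(n)

    x_sparse = spsolve(A, b)                    # exploits the sparsity
    x_dense  = np.linalg.solve(A.toarray(), b)  # treats A as a dense matrix

    print(np.max(np.abs(x_sparse - x_dense)))   # tiny compared to the solution values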

As Richard Feynman said, nearly everything is really interesting if you go into it deeply enough.


Rise and fall of the Windows Empire

This morning I ran across the following graph via Horace Dediu.

Computer market share

I developed Windows software during the fattest part of the Windows curve. That was a great time to be in the Windows ecosystem.

Before that I was in an academic bubble. My world consisted primarily of Macs and various flavors of Unix. I had no idea that my world was a tiny minority. I had a PC at home, but mostly used it as a terminal to connect to remote Unix machines, and didn’t realize that Windows was so dominant.

When I found out I’d been very wrong in my perception of market share, I determined not to be naive again. Ever since then I regularly look around to keep an eye on the landscape.

The graph above combines desktop and mobile computers, and you may or may not think that’s appropriate. Relative to the desktop market, Windows remains dominant, but the desktop market itself has shrunk, not so much in absolute size as relative to the total computer market.

Last time I looked, about 70% of the traffic to this web site comes from desktops. I still consider the desktop the place for “real work,” and many people feel the same way.

It’s conventional to say the Roman Empire fell in 476 AD, but this would have been a surprise to those living in the eastern half of the empire who considered themselves to be Roman. The eastern (Byzantine) empire continued for another thousand years after the western empire fell.

The Windows empire hasn’t fallen, but has changed. Microsoft is doing well, as far as I know. I don’t keep up with Microsoft as much as I used to. I have no animosity toward Microsoft, but I’m no longer doing the kind of work their tools are intended for. I still use Windows—I’m writing this blog post from my Windows 10 machine—though I also use Mac, Linux, iOS, and Android.

 

The quadratic formula and low-precision arithmetic

What could be interesting about the lowly quadratic formula? It’s a formula after all. You just stick numbers into it.

Well, there’s an interesting wrinkle. When the linear coefficient b is large relative to the other coefficients, the quadratic formula can give wrong results when implemented in floating point arithmetic.

Quadratic formula and loss of precision

The quadratic formula says that the roots of

ax^2 + bx + c = 0

are given by

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

That’s true, but let’s see what happens when we have a = c = 1 and b = 10^8.

    from math import sqrt

    def quadratic(a, b, c):
        r = sqrt(b**2 - 4*a*c)
        return ((-b + r)/(2*a), (-b -r)/(2*a))

    print( quadratic(1, 1e8, 1) )

This returns

    (-7.450580596923828e-09, -100000000.0)

The first root is wrong by about 25%, though the second one is correct.

What happened? The quadratic equation violated the cardinal rule of numerical analysis: avoid subtracting nearly equal numbers. The more similar two numbers are, the more precision you can lose from subtracting them. In this case √(b² – 4ac) is very nearly equal to b.

If we ask Python to evaluate

    1e8 - sqrt(1e16-4)

we get 1.49e-8 when the correct answer would be 2.0e-8.

Improving precision for quadratic formula

The way to fix the problem is to rationalize the numerator of the quadratic formula by multiplying by 1 in the form

\frac{-b \mp \sqrt{b^2 - 4ac}}{-b \mp \sqrt{b^2 - 4ac}}

(The symbol ∓ is much less common than ±. It just means that if you take the + sign in the quadratic formula, take the – sign above, and vice versa.)

When we multiply by the expression above and simplify we get

\frac{2c}{-b \mp \sqrt{b^2 - 4ac}}

Let’s code this up in Python and try it out.

    def quadratic2(a, b, c):
        r = sqrt(b**2 - 4*a*c)
        return (2*c/(-b - r), 2*c/(-b+r))

    print( quadratic2(1, 1e8, 1) )

This returns

    (-1e-08, -134217728.0)

So is our new quadratic equation better? It gives the right answer for the first root, exact to within machine precision. But now the second root is wrong by 34%. Why is the second root wrong? Same reason as before: we subtracted two nearly equal numbers!

The familiar version of the quadratic formula computes the larger root correctly, and the new version computes the smaller root correctly. Neither version is better overall. We’d be no better or worse off always using the new quadratic formula than the old one. Each one is better when it avoids subtracting nearly equal numbers.

The solution is to use both quadratic formulas, using the appropriate one for the root you’re trying to calculate.
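
Here is one way to package that up, a sketch that assumes real roots and b ≠ 0. It computes the root that involves adding numbers of the same sign, then gets the other root from the product of the roots, c/a.

    from math import sqrt, copysign

    def quadratic_stable(a, b, c):
        # assumes real roots and b != 0
        r = sqrt(b**2 - 4*a*c)
        q = -0.5*(b + copysign(r, b))   # b and copysign(r, b) have the same sign, so no cancellation
        return (q/a, c/q)

    print( quadratic_stable(1, 1e8, 1) )   # (-100000000.0, -1e-08)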

Low precision arithmetic

Is this a practical concern? Yes, and here’s why: Everything old is new again.

The possible inaccuracy in the quadratic formula was serious before double precision (64-bit floating point) arithmetic became common. And back in the day, engineers were more likely to be familiar with the alternate form of the quadratic formula. You can still run into quadratic equations that give you trouble even in double precision arithmetic, like the example above, but it’s less likely to be a problem when you have more bits at your disposal.

Now we’re interested in low-precision arithmetic again. CPUs have gotten much faster, but moving bits around in memory has not. Relative to CPU speed, memory manipulation has gotten slower. That means we need to be more concerned with memory management and less concerned about floating point speed.

Not only is memory juggling slower relative to CPU, it also takes more energy. According to Gustafson, reading 64 bits from DRAM requires 65 times as much energy as doing a floating point combined multiply-add because it takes place off-chip. The table below, from page 6 of Gustafson’s book, gives the details. Using lower precision floating point saves energy because more numbers can be read in from memory in the same number of bits. (Here pJ = picojoules.)

    |----------------------------------------------+-----------------+----------|
    | Operation                                    | Energy consumed | Where    |
    |----------------------------------------------+-----------------+----------|
    | Perform a 64-bit floating point multiply-add | 64 pJ           | on-chip  |
    | Load or store 64 bits of register data       | 6 pJ            | on-chip  |
    | Read 64 bits from DRAM                       | 4200 pJ         | off-chip |
    |----------------------------------------------+-----------------+----------|

So we might be interested in low-precision arithmetic to save energy in a battery powered mobile device, or to save clock cycles in a server application manipulating a lot of data. This means that the numerical tricks that most people have forgotten about are relevant again.


Viability of unpopular programming languages

I said something about Perl 6 the other day, and someone replied asking whether anyone actually uses Perl 6. My first thought was I bet more people use Perl 6 than Haskell, and it’s well known that people use Haskell. I looked at the TIOBE Index to see whether that’s true. I won’t argue how well the index measures popularity, but for this post I’ll assume it’s a good enough proxy.

TIOBE doesn’t separate out variations on Perl [1]. What it calls Perl is 16th on the list this year, while Haskell comes in at 42nd. A few of the more obscure languages that TIOBE ranks higher than Haskell are Scratch, D, ABAP, Apex, and PL/I. Haskell has better public relations than all these languages.

There’s a lot more to viability than just popularity, though popularity matters. More users means more people to find bugs, write libraries, develop tools, answer questions, write tutorials, etc. But the benefit of community size is not linear. It goes through a sort of logistic S-curve. There’s some threshold size where the community is large enough for a language to be viable. And somewhere above that threshold you start hitting diminishing returns.

It’s interesting to look at some of the languages currently less popular than Haskell but more familiar: Common Lisp (63), Erlang (66), and F# (67). These show that popularity isn’t everything.

Common Lisp has been around since 1982, and was standardizing a language that had been in development since 1958. Erlang has been around since 1986. These languages have many of the benefits of popularity listed above, accumulated over time.

There is not a huge community devoted specifically to F#, but it shares tooling and libraries with C#, the 5th language on the list. (Maybe the number of F# developers is underestimated because F# is so closely related to C#, not syntactically but in terms of infrastructure.)

Common Lisp, Erlang, and F# would all be safer bets for a production software project than several more popular languages.


[1] At least I don’t think they do. TIOBE does separate out some versions of Lisp as separate languages. It’s possible they do consider Perl 6 a separate language that didn’t make the top rankings.

Larry Wall deliberately introduced many natural language principles in Perl. It seems that one feature that Perl has in common with natural languages is controversy over when two dialects of a language are sufficiently different to be considered separate languages. Advocates consider Perl 6 to be a separate language but outside observers, like TIOBE, may not.

 

Eight-bit floating point

Researchers have discovered that for some problems, deep neural networks (DNNs) can get by with low precision weights. Using fewer bits to represent weights means that more weights can fit in memory at once. This, as well as embedded systems, has renewed interest in low-precision floating point.

Microsoft mentioned its proprietary floating point formats ms-fp8 and ms-fp9 in connection with its Brainwave Project [1]. I haven’t been able to find any details about these formats, other than that they use two- and three-bit exponents (respectively?).

This post will look at what an 8-bit floating point number would look like if it followed the pattern of IEEE floats or posit numbers. In the notation of the previous post, we’ll look at ieee<8,2> and posit<8,0> numbers. (Update: Added a brief discussion of ieee<8,3>, ieee<8,4>, and posit<8,1> at the end.)

Eight-bit IEEE-like float

IEEE floating point reserves exponents of all 0’s and all 1’s for special purposes. That’s not much of a price to pay when there are many possible exponents, but with only four possible exponents, it seems very wasteful to devote half of them to special purposes. Maybe this is where Microsoft does something clever. But for this post, we’ll forge ahead with the analogy to larger IEEE floating point numbers.

There would be 191 representable finite numbers, counting the two representations of 0 as one number. There would be two infinities, positive and negative, and 62 ways to represent NaN.

The smallest non-zero number would be

2^-5 = 1/32 = 0.03125.

The largest value would have bit pattern 01011111 and value

4(1 – 2^-5) = 31/8 = 3.875.

This makes the dynamic range just over two decades.

Eight-bit posit

A posit<8, 0> has no exponent bits, just a sign bit, regime, and fraction. But in this case the useed value is 2, and so the regime acts like an exponent.

There are 255 representable finite numbers and one value corresponding to ±∞.

The smallest non-zero number would be 1/64 and the largest finite number would be 64. The dynamic range is 3.6 decades.
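
These numbers follow from the maxpos and minpos formulas in the posit anatomy post:

    from math import log10

    n, es = 8, 0
    useed = 2**(2**es)                      # = 2
    maxpos, minpos = useed**(n - 2), useed**(2 - n)
    print(maxpos, minpos, round(log10(maxpos/minpos), 2))   # 64 0.015625 3.61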

Update: Here is a list of all possible posit<8,0> numbers.

Distribution of values

The graphs below give the distribution of 8-bit IEEE-like numbers and 8-bit posits on a log scale.

eight bit IEEE distribution

eight bit posit distribution

The distribution of IEEE-like numbers is asymmetric because much of the dynamic range comes from denormalized numbers.

The distribution of posits is approximately symmetric. If a power of 2 is representable as a posit, so is its reciprocal. But you don’t have perfect symmetry because, for example, 3/2 is representable while 2/3 is not.

Other eight-bit formats

I had originally considered a 2-bit significand because Microsoft’s ms-fp8 format has a two-bit significand. After this post was first published it was suggested in the comments that an ieee<8, 4> float might be better than ieee<8, 2>, so let’s look at that. Let’s look at ieee<8, 3> too while we’re at it. And a posit<8, 1> too.

An ieee<8, 3> floating point number would have a maximum value of 7 and a minimum value of 2^-6 = 1/64, a dynamic range of 2.7 decades. It would have 223 finite values, including two zeros, as well as 2 infinities and 30 NaNs.

An ieee<8, 4> floating point number would have a maximum value of 120 and a minimum value of 2^-9 = 1/512, a dynamic range of 4.7 decades. It would have 239 finite values, including two zeros, as well as 2 infinities and 14 NaNs.

A posit<8, 1> would have a maximum value of 2^12 = 4096 and a minimum value of 1/4096, a dynamic range of 7.2 decades. Any 8-bit posit, regardless of the maximum number of exponent bits, will have 255 finite values and one infinity.

Near 1, an ieee<8, 4> has 3 significand bits, an ieee<8, 3> has 4, and a posit<8,1> has 4.

***

[1] Chung et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave. Available here.

Comparing range and precision of IEEE and posit

The IEEE standard 754-2008 defines several sizes of floating point numbers—half precision (binary16), single precision (binary32), double precision (binary64), quadruple precision (binary128), etc.—each with its own specification. Posit numbers, on the other hand, can be defined for any number of bits. However, the IEEE specifications share common patterns so that you could consistently define theoretical IEEE numbers that haven’t actually been specified, making them easier to compare to posit numbers.

An early post goes into the specification of posit numbers in detail. To recap briefly, a posit<n, es> number has n bits, a maximum of es of which are devoted to the exponent. The bits are divided into a sign bit, regime bits, exponent bits, and fraction bits. The sign bit is of course one bit, but the other components have variable lengths. We’ll come back to posits later for comparison.

IEEE floating point range and precision

We will write ieee<n, es> to denote a (possibly hypothetical) IEEE floating point number with n total bits and (exactly) es exponent bits. Such a number has one sign bit and n – es – 1 significand bits. Actual specifications exist for ieee<16, 5>, ieee<32, 8>, ieee<64, 11>, and ieee<128, 15>.

The exponent of a posit number is simply represented as an unsigned integer. The exponent of an IEEE floating point number equals the exponent bits interpreted as an unsigned integer minus a bias.

\text{bias} = 2^{es -1} - 1.

So the biases for half, single, double, and quad precision floats are 15, 127, 1023, and 16383 respectively. We could use the formula above to define the bias for a hypothetical format not yet specified, assuming the new format is consistent with existing formats in this regard.

The largest exponent, emax, is 2^(es-1) – 1 (also equal to the bias), and the smallest (most negative) exponent is emin = 2 – 2^(es-1). This accounts for 2^es – 2 possible exponents. The two remaining possibilities consist of all 1’s and all 0’s, and are reserved for special use. They represent, in combination with sign and significand bits, the special values ±0, ±∞, NaN, and denormalized numbers. (More on denormalized numbers shortly.)

The largest representable finite number has the maximum exponent and a significand of all 1’s. Its value is thus

2^{e_{\text{max}}} (2 - 2^{-s})

where s is the number of significand bits. And so the largest representable finite number is just slightly less than

2^{2^{es-1}}

We’ll use this as the largest representable value when calculating dynamic range below.

The smallest representable normalized number (normalized meaning the significand represents a number greater than or equal to 1) is

2^{e_{\text{min}}} = 2^{2 - 2^{es -1}}

However, it is possible to represent smaller values with denormalized numbers. Ordinarily the significand bits fff… represent a number 1.fff… But when the exponent bit pattern consists of all 0’s, the significand bits are interpreted as 0.fff… This means that the smallest denormalized number has a significand of all 0’s except for a 1 at the end. This represents a value of

2^{e_{\text{min}}} \cdot 2^{-s} = 2^{2 - 2^{es-1} - s}

where again s is the number of significand bits.

The dynamic range of an ieee<n, es> number is the log base 10 of the ratio of the largest to smallest representable numbers, smallest here including denormalized numbers.

\log_{10} \left(  \frac{2^{2^{es - 1} }}{2^{2 - 2^{es-1} - s}} \right) = \log_{10}2^{2^{es} - 2 + s} = (2^{es} - 2 + s) \log_{10}2

IEEE float and posit dynamic range at comparable precision

Which posit number should we compare with each IEEE number? We can’t simply compare ieee<n, es> with posit<n, es>. The value n means the same in both cases: the total number of bits. And although es does mean the number of exponent bits in both cases, they are not directly comparable because posits also have regime bits that are a special kind of exponent bits. In general a comparable posit number will have a smaller es value than its IEEE counterpart.

One way to compare IEEE floating point numbers and posit numbers is to choose a posit number format with comparable precision around 1. See the first post on posits for their dynamic range and precision near 1.

In the following table, the numeric headings are the number of bits in a number. The “sig” rows contain the number of significand bits in the representation of 1, and “DR” stands for dynamic range in decades.

|-----------+----+-----+------+-------|
|           | 16 |  32 |   64 |   128 |
|-----------+----+-----+------+-------|
| IEEE  es  |  5 |   8 |   11 |    15 |
| posit es  |  1 |   3 |    5 |     8 |
| IEEE  sig | 10 |  23 |   52 |   112 |
| posit sig | 12 |  26 |   56 |   117 |
| IEEE  DR  | 12 |  83 |  632 |  9897 |
| posit DR  | 17 | 144 | 1194 | 19420 |
|-----------+----+-----+------+-------|

Note that in each case the posit number has both more precision for numbers near 1 and a wider dynamic range.

It’s common to use a different set of posit es values that have a smaller dynamic range than their IEEE counterparts (except for 16 bits) but have more precision near 1.

|-----------+----+-----+------+-------|
|           | 16 |  32 |   64 |   128 |
|-----------+----+-----+------+-------|
| IEEE  es  |  5 |   8 |   11 |    15 |
| posit es  |  1 |   2 |    3 |     4 |
| IEEE  sig | 10 |  23 |   52 |   112 |
| posit sig | 12 |  27 |   58 |   121 |
| IEEE  DR  | 12 |  83 |  632 |  9897 |
| posit DR  | 17 |  72 |  299 |  1214 |
|-----------+----+-----+------+-------|

Python code

Here’s a little Python code if you’d like to experiment with other number formats.

    from math import log10

    def IEEE_dynamic_range(total_bits, exponent_bits):
        # number of significand bits
        s = total_bits - exponent_bits - 1
        return (2**exponent_bits + s - 2)*log10(2)

    def posit_dynamic_range(total_bits, max_exponent_bits):
        # (2n - 4) * 2^es * log10(2), the formula derived above
        return (2*total_bits - 4) * 2**max_exponent_bits * log10(2)
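
For example, this reproduces a couple of entries from the tables above:

    print( IEEE_dynamic_range(32, 8) )    # about 83.4, the FP32 entry
    print( posit_dynamic_range(32, 2) )   # about 72.2, the posit<32, 2> entry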

Next: See the next post for a detailed look at eight bit posits and IEEE-like floating point numbers.

Anatomy of a posit number

This post will introduce posit numbers, explain the interpretation of their bits, and discuss their dynamic range and precision.

Posit numbers are a new way to represent real numbers for computers, an alternative to the standard IEEE floating point formats. The primary advantage of posits is the ability to get more precision or dynamic range out of a given number of bits. If an application can switch from using 64-bit IEEE floats to using 32-bit posits, for example, it can fit twice as many numbers in memory at a time. That can make a big difference in the performance of applications that process large amounts of data.

Let’s back up and say what a posit number is.

Unums and posits

John Gustafson introduced unums (universal numbers) as a different way to represent real numbers using a finite number of bits, an alternative to IEEE floating point. See, for example, his 2015 book The End of Error. Posits are a hardware-friendly version of unums.

A conventional floating point number (IEEE 754) has a sign bit, a set of bits to represent the exponent, and a set of bits called the significand (formerly called the mantissa). For details, see Anatomy of a floating point number. For a given size number, the lengths of the various parts are fixed. A 64-bit floating point number, for example, has 1 sign bit, 11 exponent bits, and 52 bits for the significand.

A posit adds an additional category of bits, known as the regime. A posit has four parts

  1. sign bit
  2. regime
  3. exponent
  4. fraction

while an IEEE floating point number has a sign bit, exponent, and significand, the latter corresponding to the fraction part of a posit. Unlike IEEE numbers, the exponent and fraction parts of a posit do not have fixed length. The sign and regime bits have first priority. Next, the remaining bits, if any, go into the exponent. If there are still bits left after the exponent, the rest go into the fraction.

The main reference for this post is [1].

Bit pattern of a posit

To understand posits in more detail, and why they have certain advantages over conventional floating point numbers, we need to unpack their bit representation. A posit number type is specified by two numbers: the total number of bits n, and the maximum number of bits devoted to the exponent, es. (Yes, it’s a little odd to use a two-letter variable name, but that’s conventional in this context.) Together we say we have a posit<n, es> number.

Sign bit

As with an IEEE floating point number, the first bit of a posit is the sign bit. If the sign bit is 1, representing a negative number, take the two’s complement of the rest of the bits before unpacking the regime, exponent, and fraction bits.

Regime bits

After the sign bit come the regime bits. The number of regime bits is variable. There could be anywhere from 1 to n-1 regime bits. How do you know when the regime bits stop? When a run of identical bits ends, either because you run out of bits or because you run into an opposite bit.

If the first bit after the sign bit is a 0, then the regime bits continue until you run out of bits or encounter a 1. Similarly, if the first bit after the sign bit is a 1, the regime bits continue until you run out of bits or encounter a 0. The bit that indicates the end of a run is not included in the regime; the regime is a string of all 0’s or all 1’s.

Exponent bits

The sign bit and regime bits get first priority. If there are any bits left, the exponent bits are next in line.  There may be no exponent bits. The maximum number of exponent bits is specified by the number es. If there are at least es bits after the sign bit, regime bits, and the regime terminating bit, the next es bits belong to the exponent. If there are fewer than es bits left, what bits remain belong to the exponent.

Fraction bits

If there are any bits left after the sign bit, regime bits, regime terminating bit, and the exponent bits, they all belong to the fraction.

Interpreting the components of a posit

Next we look at how the components described above represent a real number.

Let b be the sign bit in a posit. The sign s of the number represented by the bit pattern is positive if this bit is 0 and negative otherwise.

s = (-1)^b

Let m be the number of bits in the regime, i.e. the length of the run of identical bits following the sign bit. Then let k = –m if the regime consists of all 0’s, and let k = m – 1 otherwise.

k = \left\{ \begin{array}{ll} -m & \text{ if regime has } m \text{ 0's} \\ m-1 & \text{ if regime has } m \text{ 1's} \end{array} \right.

The useed u of the posit is determined by es, the maximum exponent size.

u = 2^{2^{\text{\small\emph{es}}} }

The exponent e is simply the exponent bits interpreted as an unsigned integer.

The fraction f is 1 + the fraction bits interpreted as following a binary point. For example, if the fraction bits are 10011, then f = 1.10011 in binary.

Putting it all together, the value of the posit number is the product of the contributions from the sign bit, regime bits, exponent bits (if any), and fraction bits (if any).

x = s\, u^k\, 2^e f = (-1)^b \, f\, 2^{e + k2^{\text{\small\emph{es}}} }

Exceptional posits

There are two exceptional posits, both with all zeros after the sign bit. A string of n 0’s represents the number zero, and a 1 followed by n-1 0’s represents ±∞.

There’s only one zero for posit numbers, unlike IEEE floats that have two kinds of zero, one positive and one negative.

There’s also only one infinite posit number. For that reason you could say that posits represent projective real numbers rather than extended real numbers. IEEE floats have two kinds of infinities, positive and negative, as well as several kinds of non-numbers. Posits have only one entity that does not correspond to a real number, and that is ±∞.
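
Putting the pieces together, here is a sketch of a decoder for a posit bit string. It follows the steps described above, glossing over some conventions (for example, it simply reads whatever exponent bits remain rather than padding a truncated exponent field), so treat it as an illustration rather than a reference implementation.

    def decode_posit(bits, es):
        """Decode a string of 0's and 1's as a posit with at most es exponent bits."""
        n = len(bits)
        x = int(bits, 2)
        if x == 0:                        # all zeros: the number 0
            return 0.0
        if x == 1 << (n - 1):             # 1 followed by all zeros: +/- infinity
            return float("inf")

        sign = -1 if bits[0] == "1" else 1
        if sign == -1:                    # take the two's complement before unpacking
            x = (-x) & ((1 << n) - 1)
            bits = format(x, f"0{n}b")

        body = bits[1:]                   # everything after the sign bit
        run_bit = body[0]
        m = len(body) - len(body.lstrip(run_bit))     # length of the regime run
        k = -m if run_bit == "0" else m - 1

        rest = body[m + 1:]               # skip the regime and its terminating bit
        e_bits, f_bits = rest[:es], rest[es:]
        e = int(e_bits, 2) if e_bits else 0
        f = 1 + (int(f_bits, 2) / 2**len(f_bits) if f_bits else 0)

        useed = 2**(2**es)
        return sign * useed**k * 2**e * f

    print(decode_posit("01000000", 0))    # 1.0
    print(decode_posit("00000001", 0))    # 0.015625, minpos for posit<8, 0>
    print(decode_posit("01101100", 1))    # 12.0: regime 11, exponent 1, fraction 1.100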

Dynamic range and precision

The dynamic range and precision of a posit number depend on the value of es. The larger es is, the larger the contribution of the regime and exponent bits will be, and so the larger range of values one can represent. So increasing es increases dynamic range. Dynamic range, measured in decades, is the log base 10 of the ratio between the largest and smallest representable positive values.

However, increasing es means decreasing the number of bits available to the fraction, and so decreases precision. One of the benefits of posit numbers is this ability to pick es to adjust the trade-off between dynamic range and precision to meet your needs.

The largest representable finite posit is labeled maxpos. This value occurs when k is as large as possible, i.e. when all the bits after the sign bit are 1’s. In this case k = n – 2. So maxpos equals

u^{n-2} = \left( 2^{2^{\text{\small\emph{es}}} } \right)^{n-2}

The smallest representable positive number, minpos, occurs when k is as negative as possible, i.e. when the largest possible number of bits after the sign bit are 0’s. They can’t all be zeros or else we have the representation for the number 0, so there must be a 1 on the end. In this case m = n-2 and k = 2-n.

\mbox{minpos} = u^{2-n} = \left( 2^{2^{\text{\small\emph{es}}} } \right)^{2-n} = 1/\mbox{maxpos}

The dynamic range is given by the log base 10 of the ratio between maxpos and minpos.

\log_{10}\left( 2^{2^{\text{\small\emph{es}}} } \right)^{2n-4} = (2n-4)2^{es}\log_{10}2

For example, 16-bit posit with es = 1 has a dynamic range of 17 decades, whereas a 16-bit IEEE floating point number has a dynamic range of 12 decades. The former has a fraction of 12 bits for numbers near 1, while the latter has a significand of 10 bits. So a posit<16,1> number has both a greater dynamic range and greater precision (near 1) than its IEEE counterpart.

[Update: See this post for more on the dynamic range and precision of IEEE floats of various sizes and how posits compare.]

Note that the precision of a posit number depends on its size. This is the sense in which posits have tapered precision. Numbers near 1 have more precision, while extremely big numbers and extremely small numbers have less. This is often what you want. Typically the vast majority of numbers in a computation are roughly on the order of 1, while with the largest and smallest numbers, you mostly want them to not overflow or underflow.

Related post: Anatomy of a floating point number

***

[1] John L. Gustafson and Isaac Yonemoto. Beating Floating Point at its Own Game: Posit Arithmetic. DOI: 10.14529/jsfi170206