Summing an array of floating point numbers

Adding up an array of numbers is simple: just loop over the array, adding each element to an accumulator.

Usually that’s good enough, but sometimes you need more precision, either because your application needs a high-precision answer, or because your problem is poorly conditioned and you need a moderate-precision answer [1].

There are many algorithms for summing an array, which may be surprising if you haven’t thought about this before. This post will look at Kahan’s algorithm, one of the first non-obvious algorithms for summing a list of floating point numbers.

We will sum a list of numbers a couple ways using C’s float type, a 32-bit floating point number. Some might object that this is artificial. Does anyone use floats any more? Didn’t moving from float to double get rid of all our precision problems? No and no.

In fact, the use of 32-bit float types is on the rise. Moving data in and out of memory is now the bottleneck, not arithmetic operations per se, and float lets you pack more data into a given volume of memory. Not only that, there is growing interest in floating point types even smaller than 32-bit floats.

And while sometimes you need less precision than 32 bits, sometimes you need more precision than 64 bits. See, for example, this post.

Now for some code. Here is the obvious way to sum an array in C:

    float naive(float x[], int N) {
        float s = 0.0;
        for (int i = 0; i < N; i++) {
            s += x[i];
        }
        return s;
    }

Kahan’s algorithm is not much more complicated, but it looks a little mysterious and provides more accuracy.

    float kahan(float x[], int N) {
        float s = x[0];
        float c = 0.0;
        for (int i = 1; i < N; i++) {
            float y = x[i] - c;  /* correct the next term before adding it */
            float t = s + y;
            c = (t - s) - y;     /* recover the low-order bits lost in s + y */
            s = t;
        }
        return s;
    }

Now to compare the accuracy of these two methods, we’ll use a third method that uses a 64-bit accumulator. We will assume its value is accurate to at least 32 bits and use it as our exact result.

    double dsum(float x[], int N) {
        double s = 0.0;
        for (int i = 0; i < N; i++) {
            s += x[i];
        }
        return s;
    }

For our test problem, we’ll use the reciprocals of the first 100,000 positive integers rounded to 32 bits as our input. This problem was taken from [2].

    #include <stdio.h>
    int main() {
        int N = 100000;
        float x[N];
        for (int i = 1; i <= N; i++)
            x[i-1] = 1.0/i;
        printf("%.15g\n", naive(x, N));
        printf("%.15g\n", kahan(x, N));
        printf("%.15g\n", dsum(x, N));
        return 0;
    }

This produces the following output.


The naive method is correct to 6 significant figures while Kahan’s method is correct to 9 significant figures. The error in the former is 5394 times larger. In this example, Kahan’s method is as accurate as possible using 32-bit numbers.
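The same correction works at any precision. Here is a quick Python sketch of both loops; since Python floats are 64-bit doubles, the input below is chosen so that the naive sum loses precision at double precision (this is my own illustration, not the C experiment above):

```python
def naive(xs):
    s = 0.0
    for x in xs:
        s += x
    return s

def kahan(xs):
    s, c = 0.0, 0.0
    for x in xs:
        y = x - c        # apply the running correction
        t = s + y
        c = (t - s) - y  # the low-order bits lost in s + y
        s = t
    return s

# 1e-16 is below the precision of 1.0, so the naive sum never moves
xs = [1.0] + [1e-16] * 10**7
print(naive(xs))  # 1.0
print(kahan(xs))  # ~ 1.000000001, the correct sum
```

The compensation variable c carries the part of each addition that would otherwise be rounded away, just as in the C version above.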

Note that in our example we were not exactly trying to find the Nth harmonic number, the sum of the reciprocals of the first N positive numbers. Instead, we were using these integer reciprocals rounded to 32 bits as our input. Think of them as just data. The data happened to be produced by computing reciprocals and rounding the result to 32-bit floating point accuracy.

But what if we did want to compute the 100,000th harmonic number? There are methods for computing harmonic numbers directly, as described here.
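One such method is the asymptotic series for harmonic numbers, which lets you compute H_n without summing anything. Here is a sketch in Python; the constant is the Euler–Mascheroni constant γ:

```python
import math

def harmonic(n):
    # H_n = ln n + gamma + 1/(2n) - 1/(12 n^2) + O(1/n^4)
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    return math.log(n) + gamma + 1/(2*n) - 1/(12*n*n)

print(harmonic(100000))  # ~ 12.0901461298634
```

For n = 100,000 the truncation error of the series is far below double precision, so this agrees with the exact harmonic number to machine accuracy.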

Related posts

[1] The condition number of a problem is basically a measure of how much floating point errors are multiplied in the process of computing a result. A well-conditioned problem has a small condition number, and an ill-conditioned problem has a large condition number. With ill-conditioned problems, you may have to use extra precision in your intermediate calculations to get a result that’s accurate to ordinary precision.

[2] Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al.

Arbitrary precision is not a panacea

Having access to arbitrary precision arithmetic does not necessarily make numerical calculation difficulties go away. You still need to know what you’re doing.

Before you extend precision, you have to know how far to extend it. If you don’t extend it enough, your answers won’t be accurate enough. If you extend it too far, you waste resources. Maybe you’re OK wasting resources, so you extend precision more than necessary. But how far is more than necessary?
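Python’s decimal module makes it easy to see both sides of this. In the sketch below (my own illustration of the principle, not the bc session that follows), a term 25 places below 1 survives only if the working precision is extended far enough:

```python
from decimal import Decimal, getcontext

x = Decimal("1e-25")

getcontext().prec = 20       # 20 significant figures: not enough
print((Decimal(1) + x) - 1)  # 0E-19: the small term was rounded away

getcontext().prec = 30       # 30 significant figures: enough
print((Decimal(1) + x) - 1)  # 1E-25: now it survives
```

Knowing in advance that 20 digits fail and 30 digits suffice requires exactly the kind of analysis discussed in this post.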

As an example, consider the problem of finding values of n such that tan(n) > n discussed a couple days ago. After writing that post, someone pointed out that these values of n are one of the sequences on OEIS. One of the values on the OEIS site is

k = 1428599129020608582548671.

Let’s verify that tan(k) > k. We’ll use our old friend bc because it supports arbitrary precision. Our k is a 25-digit number, so let’s tell bc we want to work with 30 decimal places to give ourselves a little room. (bc automatically gives you a little more room than you ask for, but we’ll ask for even more.)

bc doesn’t have a tangent function, but it does have s() for sine and c() for cosine, so we’ll compute tangent as sine over cosine.

    $ bc -l
    scale = 30
    k = 1428599129020608582548671
    s(k)/c(k) - k

We expect this to return a positive value, but instead it returns a negative value.


So is OEIS wrong? Or is bc ignoring our scale value? It turns out neither is true.

You can’t directly compute the tangent of a large number. You use range reduction to reduce it to the problem of computing the tangent of a small angle where your algorithms will work. Since tangent has period π, we reduce k mod π by computing k – ⌊k/π⌋π. That is, we subtract off as many multiples of π as we can until we get a number between 0 and π. Going back to bc, we compute

    pi = 4*a(1)
    k/pi

This returns


and so we compute the tangent of

    t = 0.500000507033221370444866152761*pi
      = 1.570797919686740002588270178941

Since t is slightly larger than π/2, the tangent will be negative. We can’t have tan(t) greater than k because tan(t) isn’t even greater than 0. So where did things break down?

The calculation of pi was accurate to 30 significant figures, and the calculation of k/pi was accurate to 30 significant figures, given our value of pi. So far bc has performed as promised.

The computed value of k/pi is correct to 29 significant figures, 23 before the decimal place and 6 after. So when we take the fractional part, we only have six significant figures, and that’s not enough. That’s where things go wrong. We get a fractional part of k/π that’s greater than 0.5 in the seventh decimal place, while the exact value is less than 0.5 in the 25th decimal place. We needed 25 – 6 = 19 more significant figures.

This is the core difficulty with floating point calculation: subtracting nearly equal numbers loses precision. If two numbers agree to m decimal places, you can expect to lose m significant figures in the subtraction. The error in the subtraction will be small relative to the inputs, but not small relative to the result.
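Here is the phenomenon in two lines of Python. The addition below is correctly rounded and the subtraction is exact, yet the result bears little resemblance to 1.1:

```python
x = 1e16 + 1.1   # rounds to 1e16 + 2: doubles near 1e16 are spaced 2 apart
y = x - 1e16     # this subtraction is exact
print(y)         # 2.0, not 1.1
```

The error is small relative to x but large relative to y, exactly the situation described above.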

Notice that we computed k/pi to 29 significant figures, and given that output we computed the fractional part exactly. We accurately computed ⌊k/π⌋π, but lost precision when we subtracted that value from k.

Our error in computing k – ⌊k/π⌋π was small relative to k, but not relative to the final result. Our k is on the order of 10^25, and the error in our subtraction is on the order of 10^-7, but the result is on the order of 1. There’s no bug in bc. It carried out every calculation to the advertised accuracy, but it didn’t have enough working precision to produce the result we needed.

Related posts

Formatting numbers at the command line

The utility numfmt, part of Gnu Coreutils, formats numbers. The main uses are grouping digits and converting to and from unit suffixes like k for kilo and M for mega. This is somewhat useful for individual invocations, but like most command line utilities, the real value is using it as part of a pipeline.

The --grouping option will separate digits according to the rules of your locale. So on my computer

    numfmt --grouping 123456789

returns 123,456,789. On a French computer, it would return 123.456.789 because the French use commas as decimal separators and use periods to group digits [1].

You can also use numfmt to convert between ordinary numbers and numbers with units. Unfortunately, there’s some ambiguity regarding what units like kilo and mega mean. A kilogram is 1,000 grams, but a kilobyte is 2^10 = 1,024 bytes. (Units like kibi and mebi were introduced to remove this ambiguity, but the previous usage is firmly established.)

If you want to convert 2M to an ordinary number, you have to specify whether you mean 2 × 10^6 or 2 × 2^20. For the former, use --from=si (for Système international d’unités) and for the latter use --from=iec (for International Electrotechnical Commission).

    $ numfmt --from=si 2M
    2000000
    $ numfmt --from=iec 2M
    2097152
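The two conventions differ only in the base. Here is a hypothetical Python helper (`from_suffix` is my own name, not part of any library) that mimics the two options:

```python
def from_suffix(s, system="si"):
    # "si" suffixes multiply by powers of 1000, "iec" by powers of 1024
    suffixes = "KMGTPEZY"  # numfmt uses capital K, unlike the SI symbol k
    if s[-1] in suffixes:
        base = 1000 if system == "si" else 1024
        return int(s[:-1]) * base ** (suffixes.index(s[-1]) + 1)
    return int(s)

print(from_suffix("2M", "si"))   # 2000000
print(from_suffix("2M", "iec"))  # 2097152
```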

One possible gotcha is that the symbol for kilo is capital K rather than lower case k; all units from kilo to Yotta use a capital letter. Another is that there must be no space between the numerals and the suffix, e.g. 2G is legal but 2 G is not.

You can use Ki for kibi, Mi for mebi etc. if you use --from=iec-i.

    $ numfmt --from=iec-i 2Gi
    2147483648

To convert from ordinary numbers to numbers with units use the --to option.

    $ numfmt --to=iec 1000000
    977K

Related links

[1] I gave a presentation in France years ago. Much to my chagrin, the software I was demonstrating had a hard-coded assumption that the decimal separator was a period. I had tested the software on a French version of Windows, but had only entered integers and so I didn’t catch the problem.

To make matters worse, there was redundant input validation. Entering 3.14 rather than 3,14 would satisfy the code my team wrote, but the input was first validated by a Windows API which rejected 3.14 as invalid in the user’s locale.

Linear feedback shift registers

The previous post looked at an algorithm for generating De Bruijn sequences B(k, n) where k is a prime number. These are optimal sequences that contain every possible consecutive sequence of n symbols from an alphabet of size k. As noted near the end of that post, the case k = 2, i.e. binary sequences, is especially important in applications.

If we set k = 2, the generating algorithm is an example of a linear feedback shift register (LFSR) sequence.

Here’s the algorithm from the previous post:

  1. If (x1, x2, …, xn) = (0, 0, …, 0), return c1.
  2. If (x1, x2, …, xn) = (1, 0, …, 0), return 0.
  3. Otherwise return c1x1 + c2x2 + … + cnxn mod k.

Note that the last line says to return a linear combination of the previous symbols. That is, we operate on the latest n symbols, saving them to registers. We take the feedback from the previous outputs and compute a linear combination. We then shift the register content over by one and add the new output on the end. Hence the name “linear feedback shift register” sequence.

Note that the LFSR algorithm is a linear operator over the field GF(2), except for the special cases in steps 1 and 2. The algorithm couldn’t be entirely linear because it would get stuck; it would produce nothing but zeros forevermore once it encountered an input sequence of all zeros. So technically an LFSR is a “nearly always linear feedback shift register.” It’s linear for 2^n – 2 inputs and nonlinear for 2 special inputs.

An LFSR is more general than a binary De Bruijn sequence. For some choices of linear coefficients the output is a De Bruijn sequence, but not for others. If you associate the linear coefficients with polynomial coefficients (with a sign change, as noted in the previous post), then the LFSR sequence is a De Bruijn sequence if and only if the polynomial is a primitive polynomial of degree n.
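This is easy to check empirically. The sketch below (my own illustration) runs the three-step algorithm over GF(2) with coefficients c1 = c2 = 1, which come from the primitive polynomial x^4 + x + 1, and counts the distinct windows of length 4. Swapping in coefficients from the non-primitive polynomial x^4 + x^2 + 1 falls short of a full De Bruijn cycle:

```python
def debruijn_states(n, cs):
    # Run the modified-LFSR algorithm for 2^n steps and collect
    # every window (state) of n consecutive bits that appears.
    state, seen = (0,)*n, set()
    for _ in range(2**n):
        seen.add(state)
        if state == (0,)*n:
            out = cs[0] % 2
        elif state == (1,) + (0,)*(n-1):
            out = 0
        else:
            out = sum(c*x for c, x in zip(cs, state)) % 2
        state = state[1:] + (out,)
    return seen

print(len(debruijn_states(4, (1, 1, 0, 0))))  # 16: x^4 + x + 1 is primitive
print(len(debruijn_states(4, (1, 0, 1, 0))))  # 7: x^4 + x^2 + 1 is not
```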

A few months ago I wrote about LFSRs in the context of random number generation. LFSRs make very efficient random number generators, but they’re not appropriate for cryptography because their linear structure makes them vulnerable to attack. The idea of a shrinking generator is to use one random number generator to sample another generator. The two generators can both be LFSRs, but the combined generator is nonlinear because the sampling mechanism is nonlinear. The net result is that you can combine two efficient but insecure generators to create a new generator that is efficient and secure.
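Here is a toy sketch of the shrinking idea. Both registers are tiny 4-bit LFSRs using the recurrence next bit = x1 + x2 mod 2 from the x^4 + x + 1 example, so this is nowhere near secure; a real design would use two different, much longer registers:

```python
from itertools import islice

def lfsr(seed):
    # Plain binary LFSR: emit the oldest bit, feed back x1 + x2 mod 2.
    state = list(seed)
    while True:
        yield state[0]
        state = state[1:] + [(state[0] + state[1]) % 2]

def shrinking(a, s):
    # Keep a bit from generator A only when the selector S emits a 1.
    for bit, keep in zip(a, s):
        if keep:
            yield bit

out = list(islice(shrinking(lfsr([1, 0, 0, 0]), lfsr([0, 1, 1, 1])), 8))
print(out)
```

The nonlinearity comes entirely from the irregular sampling: an attacker observing the output doesn’t know which positions of A were kept.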

Related posts

Generating De Bruijn cycles with primitive polynomials

Last week I wrote about a way to use De Bruijn cycles. Today I’ll write about a way to generate De Bruijn cycles.

De Bruijn cycles

Start with an alphabet of k symbols. B(k, n) is the set of cycles that contain every possible sequence of n symbols and are as short as possible. Since there are k^n possible sequences of n symbols, and every one corresponds to some starting position in the De Bruijn cycle, an element of B(k, n) has to have at least k^n symbols. In fact, the elements of B(k, n) have exactly k^n symbols.

It’s not obvious that B(k, n) should be non-empty for all k and n, but it is. And there are algorithms that will take a k and an n and return an element of B(k, n).

The post from last week gave the example of a sequence in B(4, 3) that contains all triples of DNA base pairs:


Generating De Bruijn cycles

When k is a prime number, i.e. we’re working over an alphabet with a prime number of symbols, it is particularly simple to generate De Bruijn sequences [1]. For example, let’s work over the alphabet {0, 1, 2}. Then the following code will produce the next symbol in a sequence in B(3, 4).

    def B34(a,b,c,d):
        if (a,b,c,d) == (0,0,0,0):
            return 1
        if (a,b,c,d) == (1,0,0,0):
            return 0
        return (a+b) % 3

We can initialize the sequence wherever we like since it produces a cycle, but if we start with 0000 we get the following:


You can verify that every sequence of four elements from {0, 1, 2} is in there somewhere, provided you wrap the end around. For example, 1000 can be found by starting in the last position.
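The check can be automated. This sketch regenerates the cycle from the B34 function above and confirms that all 3^4 = 81 windows of length four are distinct:

```python
def B34(a, b, c, d):
    if (a, b, c, d) == (0, 0, 0, 0):
        return 1
    if (a, b, c, d) == (1, 0, 0, 0):
        return 0
    return (a + b) % 3

state, seq = (0, 0, 0, 0), []
for _ in range(3**4):
    x = B34(*state)
    seq.append(x)
    state = state[1:] + (x,)

# wrap around and collect every window of four consecutive symbols
windows = {tuple((seq + seq[:3])[i:i+4]) for i in range(len(seq))}
print(len(windows))  # 81, so every 4-tuple over {0, 1, 2} appears once
```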

Where did the algorithm above come from? How would you create an analogous algorithm for other values of k and n?

The algorithm goes back to Willem Mantel in 1894, and can be found, for example, in Knuth’s TAOCP Volume 4. Here is Mantel’s algorithm to generate an element of B(k, n) where k is prime. The function takes the latest n symbols in the De Bruijn cycle and returns the next symbol.

  1. If (x1, x2, …, xn) = (0, 0, …, 0), return c1.
  2. If (x1, x2, …, xn) = (1, 0, …, 0), return 0.
  3. Otherwise return c1x1 + c2x2 + … + cnxn mod k.

In our example above, c1 = c2 = 1 and c3 = c4 = 0, but how do you find the c‘s in general?

Primitive polynomials

To find the c‘s, first find a primitive polynomial of degree n over the prime field with k elements. Then the c‘s are the coefficients of the polynomial, with a sign change, if you write the polynomial in the following form.

x^n - c_n x^(n-1) - … - c_2 x - c_1

In the example above, I started with the polynomial

x^4 + 2x + 2

We can read the coefficients off this polynomial: c1 = c2 = -2 and c3 = c4 = 0. Since -2 and 1 are the same working mod 3, I used c1 = c2 = 1 above.

Backing up a bit, what is a primitive polynomial and how do you find them?

A primitive polynomial of degree n with coefficients in GF(k), the finite field with k elements, has leading coefficient 1 and has a root α that generates the multiplicative group of GF(k^n). That is, every nonzero element of GF(k^n) can be written as a power of α.

In my example, I found a primitive polynomial in GF(3^4) by typing polynomials into Mathematica until I found one that worked.

    In[1]:= PrimitivePolynomialQ[x^4 + 2 x + 2, 3]

    Out[1]= True

Since coefficients can only be 0, 1, or 2 when you’re working mod 3, it only took a few guesses to find a primitive polynomial [2].

Brute force guessing works fine when k and n are small, but clearly isn’t practical in general. There are algorithms for searching for primitive polynomials, but I’m not familiar with them.

The case where k = 2 and n may be large is particularly important in applications, and you can find tables where people have listed primitive binary polynomials, i.e. primitive polynomials in GF(2^n). It’s especially useful to find primitive polynomials with a lot of zero coefficients because, for example, this leads to less computation when producing De Bruijn cycles.

Finding sparse primitive binary polynomials is its own field of research. See, for example, Richard Brent’s project to find primitive binary trinomials, i.e. primitive binary polynomials with only three non-zero coefficients.

More on binary sequences in the next post on linear feedback shift registers.


[1] The same algorithm can be used when k is not prime but a prime power, because you can equate a sequence of length n from an alphabet with k = p^m elements with a sequence of length mn from an alphabet with p elements. For example, 4 is not prime, but we could have generated a De Bruijn sequence for DNA basepair triples by looking for binary sequences of length 6 and using 00 = A, 01 = C, 10 = G, and 11 = T.

[2] The number of primitive polynomials of degree n over GF(p) is φ(p^n – 1)/n where φ is Euler’s totient function. So in our case there were 8 polynomials we could have found.

Quantum supremacy and post-quantum crypto

Google announced today that it has demonstrated “quantum supremacy,” i.e. that they have solved a problem on a quantum computer that could not be solved on a classical computer. Google says

Our machine performed the target computation in 200 seconds, and from measurements in our experiment we determined that it would take the world’s fastest supercomputer 10,000 years to produce a similar output.

IBM disputes this claim. They don’t dispute that Google has computed something with a quantum computer that would take a lot of conventional computing power, only that it “would take the world’s fastest supercomputer 10,000 years” to solve. IBM says it would take 2.5 days.

Scott Aaronson gets into technical details of the disagreement between Google and IBM. He explains that the supercomputer in question is Oak Ridge National Labs’ Summit machine. It covers the area of two basketball courts and has 250 petabytes of disk storage. By exploiting its enormous disk capacity, Summit could simulate Google’s quantum calculation on classical hardware in “only” two and a half days. In a nutshell, it seems Google didn’t consider that you could trade off a gargantuan amount of storage for processor power. But as Aaronson points out, if Google’s machine added just a couple more qubits, even Summit couldn’t keep up on this particular problem.

So does this mean that all the world’s encryption systems are going to fail by the end of the week?

Google selected a problem that is ideal for a quantum computer. And why wouldn’t they? This is the natural thing to do. But they’re not on the verge of rendering public key encryption useless.

Google’s Sycamore processor used 54 qubits. According to this paper, it would take 20,000,000 qubits to factor 2048-bit semiprimes such as used in RSA encryption. So while Google has achieved a major milestone in quantum computing, public key encryption isn’t dead yet.

If and when large-scale quantum computing does become practical, encryption algorithms that depend on the difficulty of factoring integers will be broken. So will algorithms that depend on discrete logarithms, either working over integers or over elliptic curves.

Post-quantum encryption methods, methods that will remain secure even in a world of quantum computing (as far as we know), have been developed but not widely deployed. There’s a push to develop post-quantum methods now so that they’ll be ready by the time they’re needed. Once a new method has been proposed, it takes a long time for researchers to have confidence in it. It also takes a long time to develop efficient implementations that don’t introduce vulnerabilities.

The NSA recommends using existing methods with longer keys for now, then moving to quantum-resistant methods, i.e. not putting any more effort into developing new algorithms that are not quantum-resistant.

Here are some posts I’ve written about post-quantum encryption methods.

Splitting lines and numbering the pieces

As I mentioned in my computational survivalist post, I’m working on a project where I have a dedicated computer with little more than basic Unix tools, ported to Windows. It’s given me new appreciation for how the standard Unix tools fit together; I’ve had to rely on them for tasks I’d usually do a different way.

I’d seen the nl command before for numbering lines, but I thought “Why would you ever want to do that? If you want to see line numbers, use your editor.” That way of thinking looks at the tools one at a time, asking what each can do, rather than thinking about how they might work together.

Today, for the first time ever, I wanted to number lines from the command line. I had a delimited text file and wanted to see a numbered list of the column headings. I’ve written before about how you can extract columns using cut, but you have to know the number of a column to select it. So it would be nice to see a numbered list of column headings.

The data I’m working on is proprietary, so I downloaded a PUMS (Public Use Microdata Sample) file named ss04hak.csv from the US Census to illustrate instead. The first line of this file is


I want to grab the first line of this file, replace commas with newlines, and number the results. That’s what the following one-liner does.

    head -n 1 ss04hak.csv | sed "s/,/\n/g" | nl

The output looks like this:

     1  RT
     2  SERIALNO
     3  DIVISION
     4  MSACMSA
     5  PMSA
    ...
   100  FWATP
   101  FYBLP
Now if I wanted to look at a particular field, I could see the column number without putting my finger on my screen and counting. Then I could use that column number as an argument to cut -f.
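For the record, here is a rough Python equivalent of that pipeline, using a few of the column names from the output above as stand-in data:

```python
# stand-in for the first line of ss04hak.csv
header = "RT,SERIALNO,DIVISION,MSACMSA,PMSA"

# split on commas and number the pieces, like sed "s/,/\n/g" | nl
for i, name in enumerate(header.split(","), start=1):
    print(f"{i:6}\t{name}")
```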

File character counts

Once in a while I need to know what characters are in a file and how often each appears. One reason I might do this is to look for statistical anomalies. Another reason might be to see whether a file has any characters it’s not supposed to have, which is often the case.

A few days ago Fatih Karakurt left an elegant solution to this problem in a comment:

    fold -w1 file | sort | uniq -c

The fold command breaks the content of a file into lines 80 characters long by default, but you can specify the line width with the -w option. Setting that to 1 makes each character its own line. Then sort prepares the input for uniq, and the -c option causes uniq to display counts.

This works on ASCII files but not Unicode files. For a Unicode file, you might do something like the following Python code.

    import collections

    count = collections.Counter()
    with open("myfile", "r", encoding="utf8") as file:
        for line in file:
            for c in line.strip("\n"):
                count[ord(c)] += 1

    for p in sorted(count):
        print(chr(p), hex(p), count[p])

Survivalist vi

A few days ago I wrote about computational survivalists, people who prepare to be able to work on computers with only software that is available everywhere. Of course nothing is available everywhere, and so each person interprets “everywhere” to mean computers they anticipate using.

If you need to edit a text file on a Windows computer, you can count on Notepad being there. And if you know how to use Windows, you know how to use Notepad. You may not know advanced features (does Notepad even have advanced features?) but you can get around.

On a Unix-like computer, you can count on vi being there. It was written about four decades ago and ships with every Unix-like system. This post will explain the bare minimum to use vi in a pinch.


Arrow keys should work as expected. If for whatever reason your keyboard has no arrow keys or the keys don’t work, you can use the four home row keys h, j, k, and l as arrow keys. Use trial and error until you figure out which way each one moves.

vi has two modes, command mode and insert mode, and it opens in command mode by default. Typing an h, for example, does not insert an h into your text but instead moves the cursor to the left.


To edit a file, you can enter insert mode by typing i. Now the characters you type will be inserted into your file. What if you want to delete a character? You can delete too, but it’s still called “insert” mode. In insert mode the arrow keys still navigate, but typing an h, for example, will insert an h into your file. To go back to command mode to navigate more, press the Escape key.


If you’re not already in command mode, type ESC to enter command mode.

If you haven’t made any changes, you can type :q and Enter from command mode to exit. If you have made changes, you can use :wq to save the file and exit, and :q! to exit without saving.

Alternatives to Escape

If your keyboard doesn’t have an escape key, as on some Macs, you can use Control-[ as an escape key. (There’s a story behind why this works.)

Similar posts

For daily tips on using Unix, follow @UnixToolTip on Twitter.


Queueing theory and regular expressions


Queueing theory is the study of waiting in line. That may not sound very interesting, but the subject is full of surprises. For example, when a server is near capacity, adding a second server can cut backlog not just in half but by an order of magnitude or more. More on that here.
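That claim is easy to quantify with the Erlang C formula for M/M/c queues. Here is a sketch in Python; the arrival rate λ = 0.95 and service rate μ = 1 are made-up numbers for illustration:

```python
from math import factorial

def erlang_c(c, a):
    # Probability that an arriving customer must wait,
    # for an M/M/c queue with offered load a = lambda/mu < c.
    rho = a / c
    top = a**c / factorial(c)
    return top / ((1 - rho) * sum(a**k / factorial(k) for k in range(c)) + top)

def mean_wait(c, lam, mu):
    # Expected time spent waiting in the queue (excluding service).
    return erlang_c(c, lam / mu) / (c * mu - lam)

w1 = mean_wait(1, 0.95, 1.0)  # one server at 95% utilization
w2 = mean_wait(2, 0.95, 1.0)  # same traffic, two servers
print(w1, w2, w1 / w2)        # about 19 vs about 0.29, a ~65x improvement
```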

In this post, I’m not going to look at the content of queueing theory but just the name. When seeing the word queueing, many people assume it is misspelled and want to replace it with queuing. As I write this, my spell checker is recommending that change. But the large majority of people in the field call their subject queueing theory.

Queueing has five consecutive vowels. Are there any other English words with so many vowels in a row? To find out, I did an experiment like the classic examples from Kernighan and Pike demonstrating regular expressions by searching a dictionary for words with various patterns.

The first obstacle I ran into is that apparently Ubuntu doesn’t come with a dictionary file. However this post explains how to install one.

The second obstacle was that queueing wasn’t in the dictionary. I first searched the American English dictionary, and it didn’t include this particular word. But then I searched the British English dictionary and found it.

Now, to find all words with five consecutive vowels, I ran this:

    egrep '[aeiou]{5}' british-english

and got back only one result: queueing. So at least in this dictionary of 101,825 words, the only word with five (or more) consecutive vowels is queueing.
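The same experiment is easy to reproduce in Python with the re module; the word list here is a small stand-in for the dictionary file:

```python
import re

# a small stand-in for /usr/share/dict/british-english
words = ["queueing", "queuing", "Hawaiian", "onomatopoeia", "plateaued", "cat"]

five = [w for w in words if re.search(r"[aeiou]{5}", w)]
four = [w for w in words if re.search(r"[aeiou]{4}", w)]

print(five)  # ['queueing']
print(four)  # every word above except 'cat'
```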

Out of curiosity I checked how many words have at least four consecutive vowels. The command

    grep -E "[aeiou]{4}" british-english | wc -l

returns 40, and the corresponding command with the file american-english returns 39. The extra British match is queueing itself, which has more than four vowels in a row, so both dictionaries contain 39 words with exactly four consecutive vowels and no more. But are they the same 39 words? In fact they are. And one of the words is queuing, so apparently that is an acceptable spelling in both American and British English. But as a technical term in mathematics, the accepted spelling is queueing.

Some of the 39 words are plurals or possessives of other words. For example, the list begins with Hawaiian, Hawaiian’s, and Hawaiians. (Unix sorts capital letters before lower case letters.) After removing plurals and possessives, we have 22 words with four consecutive vowels:

  • Hawaiian
  • Iroquoian
  • Kauai
  • Kilauea
  • Louie
  • Montesquieu
  • Quaoar
  • Rouault
  • aqueous
  • gooier
  • gooiest
  • obsequious
  • obsequiously
  • obsequiousness
  • onomatopoeia
  • pharmacopoeia
  • plateaued
  • plateauing
  • queue
  • queued
  • queuing
  • sequoia

It’s interesting that three of the words are related to Hawaii: Kauai is a Hawaiian island and Kilauea is a Hawaiian volcano.

I’d never heard of Quaoar before. It’s a Kuiper belt object about half the size of Pluto.

More queueing theory posts