Computing parity of a binary word

The previous post mentioned adding a parity bit to a string of bits as a way of detecting errors. The parity of a binary word is 1 if the word contains an odd number of 1s and 0 if it contains an even number of ones.

Codes like the Hamming codes in the previous post can have multiple parity bits. In addition to the parity of a word, you might want to also look at the parity of a bit mask AND’d with the word. For example, the Hamming(7,4) code presented at the end of that post has three parity bits. For a four-bit integer x, the first parity bit is the parity bit of

    x & 0b1011

the second is the parity bit of

    x & 0b1101

and the third is the parity of

    x & 0b1110

These three bit masks are the last three columns of the matrix representation of the Hamming(7,4) code:

    1 0 0 0 1 1 1
    0 1 0 0 0 1 1
    0 0 1 0 1 0 1
    0 0 0 1 1 1 0

More generally, any linear operation on a vector of bits is given by multiplication by a binary matrix, where arithmetic is carried out mod 2. Matrix products can be defined in terms of inner products, and the inner product of words x and y is given by the parity of x&y.

Given that parity is important to compute, how would you compute it?

If you have a popcount function, you could read the last bit of the popcount. Since popcount counts the number of ones, the parity of x is 1 if popcount(x) is odd and 0 otherwise.

gcc extensions

In the earlier post on popcount, we mentioned three functions that gcc provides to compute popcount:


These three functions return popcount for an unsigned int, an unsigned long, and an unsigned long long respectively.

In each case the function will call a function provided by the target processor if it is available, and will run its own code otherwise.

There are three extensions for computing parity that are completely analogous:


Stand-alone code

If you want your own code for computing parity, Hacker’s Delight gives the following for a 32-bit integer x.

    y = x ^ (x >> 1);
    y = y ^ (y >> 2);
    y = y ^ (y >> 4);
    y = y ^ (y >> 8);
    y = y ^ (y >> 16);

More bit twiddling posts

A gentle introduction to Hamming codes

The previous post looked at how to choose five or six letters so that their Morse code representations are as distinct as possible. This post will build on the previous one to introduce Hamming codes. The problem of finding Hamming codes is much simpler in some ways, but also more general.

Morse code is complicated for a several reasons. First, it seems at first blush to have an alphabet of two symbols—dot and dash—but it actually has an alphabet of three symbols—dot, dash, and space—with a complicated constraints. Second, we’re trying to optimize the perceptual separation between letters, not the objective separation between signals on a wire. Third, dots and dashes have different lengths, and we have secondary objectives related to transmission time. Given the option, we would prefer to choose letters that reduce transmission time, but we would also like to choose letters that each have similar transmission time.

We will simplify the situation in this post by using exactly two symbols, 0 and 1, and using bit sequences of fixed length. We’ll also have one simple objective, maximizing Hamming distance separation, with no secondary objectives.


We’ll need to introduce some terminology to make the rest of the post more clear.

By alphabet we will mean the choice of symbols we use. For Morse code, the “alphabet” of symbols is dot, dash, and space. For a binary sequence, the alphabet is 0 and 1.

By word we will mean a sequence of symbols from our alphabet. Using this terminology, we would call ..-. in Morse code a word, even though it represents an English letter. You will also see the term vector used instead of word; mentally substitute vector for word everywhere if you find that easier.

The set of words we will use for transmitting messages is called a code. In the previous post, the code consisted originally of A, D, F, G, and V, and we looked at codes that would have been better by several criteria.

If we represent each English letter by a sequence of five bits, we would call the 0s and 1s the elements of our alphabet and the groups of five bits our words.

Binary codes

For this post we will look at words of a fixed length n. For example, we could encode English letters into words of 5 bits each since

25 = 32 > 26

though this would only give us Hamming distance separation of 1, i.e. many of the code words would differ by only one bit. If a single bit were accidentally flipped in transmission, we would not be able to tell. But if we were to use more bits per word, we could have more separation. For example, if we used words of six bits and used the last bit as a parity bit, then we could choose 26 words that have Hamming distance at least 2 from each other.

A(n, d)

The maximum number of binary words of length n, all separated by a Hamming distance of at least d, is denoted

A2(n, d).

The subscript 2 on A says that we’re working with an alphabet of 2 symbols, i.e. 0 and 1. If we were interested in an alphabet of size q, we would replace the 2 with a q.

This notation is a little sideways from our introduction, but closely related. We would like to know, for example, how many bits n we need do produce a code that has, say, 26 symbols separated by a Hamming distance d. That is, we’re thinking of our code size, such as 26, and the Hamming distance d, as being the independent variables and the number of bits n as the dependent variable. But it’s customary in coding theory to consider the word size n and the Hamming distance separation d as independent variables, and to consider Aq(n, d) as a function of n and d.

There is no way to compute Aq(n, d) in general, even for a fixed q such as q = 2. But their are upper and lower bounds, and exact values have been computed in many particular cases.

For example, A2(10, 4) = 40, and so using sequences of 10 bits, you can find a set of 40 words that are all a Hamming distance of at least 4 from each other, enough to encode all English letters (without regard to case) and 10 digits.

A2(15, 6) = 128, so with 15 bits you could find 128 words a distance of 6 apart, enough to encode all ASCII characters.

Hamming bound and perfect codes

As mentioned above, there is no convenient formula for A(n, d) but there are bounds. The Hamming bound says

A_q(n, d) \leq \frac{q^n}{\sum_{k=0}^t {n \choose k}(q-1)^k}

where the upper limit of the sum is

t = \left\lfloor \frac{d-1}{2} \right\rfloor

A code for which the Hamming bound is exact is called a perfect code.

Hamming codes

Hamming codes are perfect binary codes where d = 3.

Note that 3 is the minimum separation for error correction. If we simply add a parity bit, as mentioned above, we can detect errors, but we cannot correct them. If code words are a distance 2 apart, a word with one corrupted bit could be equidistant between two valid code words.

For example, suppose you encode 00010 as 000101, adding a parity bit of 1 at the end because the original sequence had an odd number of 1’s. And suppose you similarly encode 00011 as 000110. Now you receive 000100. You know that something is wrong because the parity bit is inconsistent with the previous bits. But you don’t know whether 000101 was transmitted and the 6th bit was corrupted or 000110 was transmitted and the 5th bit was corrupted.

With Hamming codes, n is always one less than a power of 2, i.e

n = 2m – 1

and m is the number of added bits. That is, the code will have

2mm – 1

data bits, and the number of distinct code words will be 2 raised to that power. Each of these words is separated by a Hamming distance of at least 3.

Incidentally, note that as m increases, the number of parity bits is growing linearly, but the number of data bits is growing exponentially. That is, the overhead of the parity bits relative to the code word size is going to zero.

Hamming(7,4) code

Let’s look at one Hamming code in detail. If m = 3, we have a code with 7-bit words: 4 data bits and 3 added bits. With 4 data bits we can have 16 different words, so the Hamming code with 7-bit words contains 16 words, each separated by a Hamming distance of 3. If we compute the right side of the Hamming bound we also get 16, i.e. A2(7, 3) = 16, demonstrating that the Hamming bound is exact.

We can encode a set of 4 bits by making them into a row vector and multiplying the vector on the right by the matrix

    1 0 0 0 1 1 1
    0 1 0 0 0 1 1
    0 0 1 0 1 0 1
    0 0 0 1 1 1 0

The matrix multiplication is defined using the field with two elements, so multiplication stays the same but 1 + 1 = 0.

Note that the 4 by 4 block on the left side of the matrix is the identity matrix. So the encoding of a string of four bits begins with the original 4 bits. The 16 words in our code are


Unless I’ve made an error in writing this up [1], each of these code words should differ from all other code words in at least 3 positions.

More on coding theory

[1] Errors happen all the time. That’s why we need error correcting codes!

ADFGVX cipher and Morse code separation

A century ago the German army used a field cipher that transmitted messages using only six letters: A, D, F, G, V, and X. These letters were chosen because their Morse code representations were distinct, thus reducing transmission error.

The ADFGVX cipher was an extension of an earlier ADFGV cipher. The ADFGV cipher was based on a 5 by 5 grid of letters. The ADFGVX extended the method to a 6 by 6 grid of letters and digits. A message was first encoded using the grid coordinates of the letters, then a transposition cipher was applied to the sequence of coordinates.

This post revisits the design of the ADFGVX cipher. Not the encryption method itself, but the choice of letters used for transmission. How would you quantify the difference between two Morse code characters? Given that method of quantification, how good was the choice of ADFGV or its extension ADFGVX? Could the Germans have done better?

Quantifying separation

There are several possible ways to quantify how distinct two Morse code signals are.

Time signal difference

My first thought was to compare the signals as a function of time.

There are differing conventions for how long a dot or dash should be, and how long the space between dots and dashes should be. For this post, I will assume a dot is one unit of time, a dash is three units of time, and the space between dots or dashes is one unit of time.

The letter A is represented by a dot followed by a dash. I’ll represent this as 10111: on for one unit of time for the dot, off for one unit of time for the space between the dot and the dash, and on for three units of time for the dash. D is dash dot dot, so that would be 1110101.

We could quantify the difference between two letters in Morse code as the Hamming distance between their representations as 0s and 1s, i.e. the number of positions in which the two letters differ. To compare A and D, for example, I’ll pad the A with a couple zeros on the end to make it the same length as D.

    A: 1011100
    D: 1110101
        x x  x

The distance is 3 because the two sequences differ in three positions. (Looking back at the previous post on popcount, you could compute the distance as the popcount of the XOR of the two bit patterns.)

A problem with this approach is that it seems to underestimate the perceived difference between F and G.

    F: ..-. 1010111010
    G: --.  1110111010

These only differ in the second bit, but they sound fairly different.

Symbolic difference

The example above suggests maybe we should compare the sequence of dots and dashes themselves rather than compare their corresponding time signals. By this measure F and G are distance 4 apart since they differ in every position.

Other possibilities

Comparing the symbol difference may over-estimate the difference between U (..-) and V (...-). We chould look at some combination of time signal difference and symbolic difference.

Or maybe the thing to do would be to look at something like the edit distance between letters. We could say that U and V are close because it only takes inserting a dot to turn a U into a V.

Was ADFGV optimal?

There are several choices of letters that would have been better than ADFGV by either way of measuring distance. For example, CELNU has better separation and takes about 14% less time to transmit than ADFGV.

Here are a couple tables that give the time distance (dT) and the symbol distance (dS) for ADFGV and for CELNU.

    | Pair | dT | dS |
    | AD   |  3 |  3 |
    | AF   |  4 |  3 |
    | AG   |  5 |  2 |
    | AV   |  4 |  3 |
    | DF   |  3 |  3 |
    | DG   |  2 |  1 |
    | DV   |  3 |  2 |
    | FG   |  1 |  4 |
    | FV   |  2 |  2 |
    | GV   |  3 |  3 |
    | Pair | dT | dS |
    | CE   |  7 |  4 |
    | CL   |  4 |  3 |
    | CN   |  4 |  2 |
    | CU   |  5 |  2 |
    | EL   |  5 |  3 |
    | EN   |  3 |  2 |
    | EU   |  4 |  2 |
    | LN   |  4 |  4 |
    | LU   |  3 |  3 |
    | NU   |  3 |  2 |

For six letters, CELNOU is faster to transmit than ADFGVX. It has a minimum time distance separation of 3 and a minimum symbol distance 2.

Time spread

This is an update in response to a comment that suggested instead of minimizing transmission time of a set of letters, you might want to pick letters that are most similar in transmission time. It takes much longer to transmit C (-.-.) than E (.), and this could make CELNU harder to transcribe than ADFGV.

So I went back to the script I’m using and added time spread, the maximum transmission time minus the minimum transmission time, as a criterion. The ADFGV set has a spread of 4 because V takes 4 time units longer to transmit than A. CELNU has a spread of 10.

There are 210 choices of 5 letters that have time distance greater than 1, symbol distance greater than 1, and spread equal to 4. That is, these candidates are more distinct than ADFGV and have the same spread.

It takes 44 time units to transmit ADFGV. Twelve of the 210 candidates identified above require 42 or 40 time units. There are five that take 40 time units:


Looking at sets of six letters, there are 464 candidates that have better separation than ADFGVX and equal time spread. One of these, AGLNUX, is an extension of one of the 5-letter candidates above.

The best 6-letter are ABLNUV and AGLNUV. They are better than ADFGVX by all the criteria discussed above. They both have time distance separation 2 (compared to 1), symbol distance separation 2 (compared to 1), time spread 4, (compared to 6) and transmission time 50 (compared to 56).

More Morse code posts

ChaCha RNG with fewer rounds

ChaCha is a CSPRING, a cryptographically secure pseudorandom number generator. When used in cryptography, ChaCha typically carries out 20 rounds of its internal scrambling process. Google’s Adiantum encryption system uses ChaCha with 12 rounds.

The runtime for ChaCha is proportional to the number of rounds, so you don’t want to do more rounds than necessary if you’re concerned with speed. Adiantum was developed for mobile devices and so Google wanted to reduce the number of rounds while maintaining a margin of cryptographic safety.

Cryptographer Jean-Philippe Aumasson suggests 8 rounds of ChaCha is plenty. He reports that there is a known attack on ChaCha with 6 rounds that requires on the order of 2116 operations, but that ChaCha with 5 rounds is definitely breakable, requiring on the order of only 216 operations.

Three rounds of ChaCha are not enough [1], but four rounds are enough to satisfy DIEHARDER, PractRand, and TestU01 [2]. This illustrates the gap between adequate statistical quality and adequate cryptographic quality: Four rounds of ChaCha are apparently enough to produce the former but five rounds are not enough to produce the latter.

There doesn’t seem to be any reason to use ChaCha with four rounds. If you don’t need cryptographic security, then there are faster RNGs you could use. If you do need security, four rounds are not enough.

ChaCha with six rounds seems like a good compromise if you want an RNG that is fast enough for general use and that that also has reasonably good cryptographic quality. If you want more safety margin for cryptographic quality, you might want to go up to eight rounds.

What a difference one round makes

One thing I find interesting about random number generation and block encryption is that a single round of obfuscation can make a huge difference. ChaCha(3) fails statistical tests but ChaCha(4) is fine. ChaCha(5) is easily broken but ChaCha(6) is not.

Interesting failures

Often random number generators are either good or bad; they pass the standard test suites or they clearly fail. ChaCha(3) is interesting in that it is somewhere in between. As the results in the footnotes show, ChaCha(3) is an intermediate case. DIEHARDER hints at problems, but small crush thinks everything is fine. The full crush battery however does find problems.

The decisive failure of the Fourier tests is understandable: low-quality generators often fail spectral tests. But the results of the simple poker test are harder to understand. What is it about simulating poker that makes ChaCha(3) fail spectacularly? And in both cases, one more round of ChaCha fixes the problems.

Related posts

[1] ChaCha(3) passes passes DIEHARDER, though five of the tests passed with a “weak” pass. Passes TestU01 small crush but fails full crush:

========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        32-bit stdin
 Number of statistics:  144
 Total CPU time:   00:39:05.05
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 25  SimpPoker, d = 64                eps
 26  SimpPoker, d = 64                eps
 27  CouponCollector, d = 4          7.7e-4
 28  CouponCollector, d = 4          7.4e-4
 29  CouponCollector, d = 16          eps
 33  Gap, r = 0                      2.7e-8
 51  WeightDistrib, r = 0             eps
 52  WeightDistrib, r = 8           5.2e-15
 53  WeightDistrib, r = 16           2.1e-5
 55  SumCollector                     eps
 69  RandomWalk1 H (L = 10000)       1.1e-4
 74  Fourier3, r = 0               1.3e-144
 75  Fourier3, r = 20               6.9e-44
 80  HammingWeight2, r = 0           2.8e-6
 83  HammingCorr, L = 300           3.2e-10
 84  HammingCorr, L = 1200          6.1e-13
 All other tests were passed

ChaCha(3) also fails PractRand decisively.

RNG_test using PractRand version 0.94
RNG = chacha(3), seed = 0x7221236f
test set = core, folding = standard (32 bit)

rng=chacha(3), seed=0x7221236f
length= 128 megabytes (2^27 bytes), time= 2.0 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/32]BCFN(2+0,13-6,T)         R= +31.0  p =  4.7e-11   VERY SUSPICIOUS
  [Low1/32]BCFN(2+1,13-6,T)         R= +27.4  p =  8.3e-10   VERY SUSPICIOUS
  [Low1/32]BCFN(2+2,13-6,T)         R= +52.7  p =  1.8e-18    FAIL !
  [Low1/32]BCFN(2+3,13-6,T)         R= +47.6  p =  9.5e-17    FAIL !
  [Low1/32]BCFN(2+4,13-7,T)         R= +26.1  p =  1.7e-8   very suspicious
  [Low1/32]DC6-9x1Bytes-1           R= +26.3  p =  1.3e-14    FAIL !
  [Low1/32]FPF-14+6/16:all          R=  +8.6  p =  2.2e-7   very suspicious
  ...and 147 test result(s) without anomalies

[2] ChaCha(4) passed TestU01 small crush and full crush. It passed PractRand using up to 512 GB.

Popcount: counting 1’s in a bit stream

Sometimes you need to count the number of 1’s in a stream of bits. The most direct application would be summarizing yes/no data packed into bits. It’s also useful in writing efficient, low-level bit twiddling code. But there are less direct applications as well. For example, three weeks ago this came up in a post I wrote about Pascal’s triangle.

The number of odd integers in the nth row of Pascal’s triangle equals 2b where b is the number of 1’s in the binary representation of n.

The function that takes a bit stream and returns the number of 1’s is commonly called popcount, short for population count.

Formula using floor

Here’s an interesting formula for popcount taken from Hacker’s Delight:

\mbox{popcount}(x) = x - \sum_{n=1}^\infty \left\lfloor \frac{x}{2^n} \right\rfloor

The sum is actually finite since after a certain point all the terms are zero. For example, if x = 13 (1101 in binary), then the right side is

13 – 6 – 3 – 1

which of course is 3.

Computing with gcc extensions

The gcc compiler has a function __builtin_popcount that takes an unsigned integer x and returns the number of bits in x set to 1. If the target platform has a chip instruction for computing popcount then compiler will generate code to call this instruction. Otherwise it uses library code.

For example, the following code prints 6, 1, and 2.

    #include <stdio.h>

    int main() {
        for (unsigned int x = 63; x < 66; x++)
            printf("%d\n", __builtin_popcount(x));

There are also functions __builtin_popcountl and __builtin_popcountll for unsigned long and unsigned long long arguments.

Computing in C

If you want to use your own C code to compute popcount rather than relying on compiler extensions, here are a couple possibilities taken from Hacker’s Delight. First, one that only requires unsigned ints.

    int pop1(unsigned x) {
        x -= ((x >> 1) & 0x55555555);
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
        x = (x + (x >> 4)) & 0x0F0F0F0F;
        x += (x >> 8);
        x += (x >> 16);    
        return x & 0x0000003F;

And here is a more elegant version that uses an unsigned long long.

    int pop2(unsigned x) {
        unsigned long long y;
        y = x * 0x0002000400080010ULL;
        y = y & 0x1111111111111111ULL;
        y = y * 0x1111111111111111ULL;
        y = y >> 60;
        return y;

A brief comment on hysteresis

You might hear hysteresis described as a phenomena where the solution to a differential equation depends on its history. But that doesn’t make sense: the solution to a differential equation always depends on its history. The solution at any point in time depends (only) on its immediately preceding state. You can take the state at any particular time, say position and velocity, as the initial condition for the solution going forward.

But sometimes there’s something about the solution to a differential equation that does not depend on its history. For a driven linear oscillator, the frequency of the solution depends only on the frequency of the driving function. But this is not necessarily true of a nonlinear oscillator. The same driving frequency might correspond to two different solution frequencies, depending on whether the frequency of the driving function increased to its final value or decreased to that value.

Hysteresis is interesting, but it’s more remarkable that linear system does not exhibit hysteresis. Why should the solution frequency depend only on the driving frequency and not on the state of the system?

The future of system with hysteresis still depends only on its immediately preceding state. But there may be a thin set of states that lead to solutions with a given frequency, and the easiest way to arrive in such a set may be to sneak up on it slowly.

Related posts

Safe Harbor ain’t gonna cut it

There are two ways to deidentify data to satisfy HIPAA:

  • Safe Harbor, § 164.514(b)(2), and
  • Expert Determination, § 164.514(b)(1).

And for reasons explained here, you may need to be concerned with HIPAA even if you’re not a “covered entity” under the statute.

To comply with Safe Harbor, your data may not contain any of eighteen categories of information. Most of these are obvious: direct identifiers such as name, phone number, email address, etc. But some restrictions under Safe Harbor are less obvious and more difficult to comply with.

For example, under Safe Harbor you need to remove

All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.

This would make it impossible, for example, to look at seasonal trends in medical procedures because you would only have data to the resolution of a year. But with a more sophisticated approach, e.g. differential privacy, it would be possible to answer such questions while providing better privacy for individuals. See how here.

If you need to comply with HIPAA, or analogous state laws such as TMPRA, and you can’t follow Safe Harbor, your alternative is expert determination. If you’d like to discuss expert determination, let’s talk.



Inverse congruence RNG

Linear congruence random number generators have the form

xn+1 = a xn + b mod p

Inverse congruence generators have the form

xn+1 = a xn-1 + b mod p

were x-1 means the modular inverse of x, i.e. the value y such that xy = 1 mod p. It is possible that x = 0 and so it doesn’t have an inverse. In that case the generator returns b.

Linear congruence generators are quite common. Inverse congruence generators are much less common.

Inverse congruence generators were first proposed by J. Eichenauer and J. Lehn in 1986. (The authors call them inversive congruential generators; I find “inverse congruence” easier to say than “inversive congruential.”)

An advantage of such generators is that they have none of the lattice structure that can be troublesome for linear congruence generators. This could be useful, for example, in high-dimensional Monte Carlo integration.

These generators are slow, especially in Python. They could be implemented more efficiently in C, and would be fast enough for applications like Monte Carlo integration where random number generation isn’t typically the bottleneck.

The parameters a, b, and p have to satisfy certain algebraic properties. The parameters I’ll use in the post were taken from [1]. With these parameters the generator has maximal period p.

The generator naturally returns integers between 0 and p. If that’s what you want, then the state and the random output are the same.

If you want to generate uniformly distributed real numbers, replace x above with the RNG state, and let x be the state divided by p. Since p has 63 bits and a floating point number has 53 significant bits, all the bits of the result should be good.

Here’s a Python implementation for uniform floating point values. We pass in x and the state each time to avoid global variables.

    from sympy import mod_inverse

    p = 2**63 - 25
    a = 5520335699031059059
    b = 2752743153957480735

    def icg(pair):
        x, state = pair
        if state == 0:
            return (b/p, b)
            state = (a*mod_inverse(state, p) + b)%p
        return (state/p, state)

If you’re running Python 3.8 or later, you can replace

    mod_inverse(state, p)


    pow(state, -1, p)

and not import sympy.

Here’s an example of calling icg to print 10 floating point values.

    x, state = 1, 1
    for _ in range(10):
        (x, state) = icg((x, state))

If we want to generate random bits rather than floating point numbers, we have to work a little harder. If we just take the top 32 bits, our results will be slightly biased since our output is uniformly distributed between 0 and p, not between 0 and a power of 2. This might not be a problem, depending on the application, since p is so near a power of 2.

To prevent this bias, I used a acceptance-rejection sampling method, trying again when the method generates a value above the largest multiple of 232 less than p. The code should nearly always accept.

    T = 2**32
    m = p - p%T

    def icg32(pair):
        x, state = pair
        if state == 0:
            return (b % T, b)
        state = (a*mod_inverse(state, p) + b)%p
        while state > m:  # rare
            state = (a*mod_inverse(state, p) + b)%p
        return (state % T, state)

You could replace the last line above with

    return state >> (63-32)

which should be more efficient.

The generator passes the four most commonly used test suites

  • PractRand
  • TestU01

whether you reduce the state mod 32 or take the top 32 bits of the state. I ran the “small crush” battery of the TestU01 test suite and all tests passed. I didn’t take the time to run the larger “crush” and “big crush” batteries because they require two and three orders of magnitude longer.

More on random number generation

[1] Jürgen Eichenauer-Herrmann. Inversive Congruential Pseudorandom Numbers: A Tutorial. International Statistical Review, Vol. 60, No. 2, pp. 167–176.

A better adaptive Runge-Kutta method

This is the third post in a series on Runge-Kutta methods. The first post in the series introduces Runge-Kutta methods and Butcher tableau. The next post looked at Fehlberg’s adaptive Runge-Kutta method, first published in 1969. This post looks at a similar method from Dormand and Prince in 1980.

Like Fehlberg’s method, the method of Dormand and Prince can be summarized in a big, intimidating tableau, which we will display below. However we will discuss three differences between the methods:

  • Order 4(5) vs 5(4)
  • Derivative reuse
  • Precision / computation ratio

Dormand Prince tableau

Here’s the Butcher tableau for the Dormand-Prince method in all it’s glory:

Dormand-Prince tableau

The only detail of the table that will be important below is that 7th and 8th rows are identical.

Order 4(5) vs order 5(4)

Fehlberg’s method, a.k.a. RKF45, computes each update to the solution using a 4th order Runge-Kutta method, and uses a 5th order Runge-Kutta method to estimate the error.

The method of Dormand and Prince also uses 4th and 5th order Runge-Kutta methods, but in the opposite way. The fifth order method is used to advance the solution, and the 4th order method is used for comparison to estimate error.

Derivative reuse

The work in solving

y' = f(t, y)

by a Runge-Kutta method is roughly proportional to the number of stages. Dormand-Prince is a 7-stage method while Fehlberg is a 6-stage method, so it would seem that the latter is more efficient. However, if you look back at the Dormand-Prince tableau, the last row above the horizontal line equals the first row below the line. That means that the last evaluation of f at one step can be reused at the first evaluation of f at the next step.

Precision per unit work

In their book Solving Differential Equations, vol. 1, Hairer et al compare several adaptive Runge-Kutta methods, including Fehlberg (RKF45) and Dormand-Prince, and conclude that the latter produces more precision per unit work.

We again see that the [Fehlberg] method underestimates the local error. Further, with the use of local extrapolation, the advantage of RKF4(5) melts away to a large extent. The best method of all these is without a doubt the coefficient set of Dormand and Prince.

More differential equation posts

How to estimate ODE solver error

This post brings together several themes I’ve been writing about lately: caching function evaluations, error estimation, and Runge-Kutta methods.

A few days ago I wrote about how Runge-Kutta methods can all be summarized by a set of numbers called the Butcher tableau. These methods solve

y' = f(t, y)

by evaluating f at some partial step, then evaluating f again at some new partial step and some linear combination of values from previous steps, etc. In the preface to J. C. Butcher’s book on differential equations, JM Sanz-Serna describes this structure as follows.

Runge-Kutta schemes … are highly nonlinear with a remarkable Matrioshka doll structure, where the vector field has to be evaluated an an expression that involves the vector field evaluated at and expression that involves the vector field …

Once all the “Matrioshka dolls” are lined up, i.e. all the intermediate results have been calculated, the final estimate is a linear combination of these values.

Here’s the clever idea behind adaptive solvers: Create two Runge-Kutta methods of different orders that depend on the same intermediate results. Then both can be computed without new function evaluations, and the results compared. The difference between the results can be used as an estimate of the local error. Then you can adjust your step size accordingly and try again.

I’m going to present two adaptive Runge-Kutta schemes. I’ll go over Fehlberg’s method in this post and a variation in the next post that has some advantages over Fehlberg’s method.

Runge-Kutta-Felhberg (RKF45)

Fehlberg’s method, commonly known as RKF45, starts with a six-stage Runge Kutta method whose coefficients are given by the following tableau.

\begin{array} {c|ccccccc} 0\\ \frac{1}{4} & \frac{1}{4} \\ \frac{3}{8} & \frac{3}{32} & \frac{9}{32} \\ \frac{12}{13} & \frac{1932}{2197} & \frac{7296}{2197} & 0 \\ 1 & \frac{439}{216} & -8 & \frac{3680}{513} & -\frac{845}{4104} \\ \frac{1}{2} & -\frac{8}{27} & 2 & -\frac{3544}{2565} & \frac{1859}{4104} & -\frac{11}{40} \\ \hline y_1 & \frac{25}{216} & 0 & \frac{1408}{2565} & \frac{2197}{1404} & -\frac{1}{5} & 0 & \\ \end{array}

The meaning of the tableau is described here. Imagine the effort it took Erwin Felhberg to discover this in 1969.

What’s important about the tableau for this post is that the coefficients above the horizontal line are used to create six numbers, the k‘s in the notation of the post referenced above. The k‘s are multiplied by the coefficients below the horizontal line to produce the solution of the differential equation at the next step. This is the “4” of RKF45.

RKF45 then applies another method which reuses all the k‘s in a different linear combination. This is summarized in the following variation of the Butcher tableau.

\begin{array} {c|ccccccc} 0\\ 1/4 & 1/4 \\ 3/8 & 3/32 & 9/32 \\ 12/13 & 1932/2197 & 7296/2197 & 0 \\ 1 & 439/216 & -8 & 3680/513 & -845/4104 \\ 1/2 & -8/27 & 2 & -3544/2565 & 1859/4104 & -11/40 \\ \hline y_1 & 25/216 & 0 & 1408/2565 & 2197/1404 & -1/5 & 0 & \\ \hline \hat{y} & 16/135 & 0 & 6656/12825 & 28561/56430 & -9/50 & 2/55 \end{array}

This is essentially two tableau combined into one. The first is as above. The second is like the one above but with a different bottom row. The bottom row gives the coefficients for a Runge-Kutta method corresponding to the “5” part of RKF45.

The 4 stands for 4th order, i.e. the local error for a step size h is O(h4). The 5 stands for 5th order, i.e. local error O(h5). RKF45 is two different methods, but they share so much computation that the second one almost comes for free; it does not require any new function evaluations, only taking a linear combination of six numbers.

However, RK45 is only “free” if you’ve gone to the effort of using a six-stage method. The amount of computation is roughly proportional to the number of stages, so we do about 50% more work to have RKF45 with an error estimate than doing the most common 4th order RK method. So if you knew exactly what step size to use, basic RK would be more efficient. But how could you know the optimal step size a priori?

By guiding us to choose the right step size, the extra work in RKF45 more than pays for itself. It could save a lot of computation that would come from using too small a step size, or prevent inaccurate results due to using too large a step size. Or maybe both: maybe a differential equation needs small steps at one period of time and can use larger steps at another period of time.

For more information on Felhberg’s method, see Solving Differential Equations by Hairer et al.

Related posts