Beta inequalities with integer parameters

Suppose X is a beta(a, b) random variable and Y is a beta(cd) random variable. Define the function

g(a, b, c, d) = Prob(X > Y).

At one point I spent a lot of time developing accurate and efficient ways to compute g under various circumstances. I did this because when I worked at MD Anderson Cancer Center we spent thousands of CPU-hours computing this function inside clinical trial simulations.

For this post I want to outline how to compute g in a minimal environment, i.e. a programming language with no math libraries, when the parameters a, b, c, and d are all integers. The emphasis is on portability, not efficiency. (If we had access to math libraries, we could solve the problem for general parameters using numerical integration.)

In this technical report I develop several symmetries and recurrence relationships for the function g. The symmetries can be summarized as

g(a, b, c, d) = g(d, c, b, a) = g(d, b, c, a) = 1 − g(c, d, a, b).

Without loss of generality we assume a is the smallest argument. If not, apply the symmetries above to make the first argument the smallest.

We will develop a recursive algorithm for computing the function g by recurring on a. The algorithm will terminate when a = 1, and so we want a to be the smallest parameter.

In the technical report cited above, I develop the recurrence relation

g(a + 1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a


h(a, b, c, d) = B(a + c, b + d) / B(a, b) B(c, d)

and B(x, y) is the beta function.

The equation above explains how to compute g(a + 1, b, c, d) after you’ve already computed g(a, b, c, d), which is what I often needed to do in applications. But we want to do the opposite to create a recursive algorithm:

g(a, b, c, d) = g(a – 1, b, c, d) + h(a – 1, b, c, d)/(a – 1)

for a > 1.

For the base case, when a = 1, we have

g(1, b, cd) = B(c, b + d) / B(c, d).

All that’s left to develop our code is to evaluate the Beta function.


B(x, y) = Γ(x + y) / Γ(x) Γ(y)


Γ(x)  = (x – 1)!.

So all we need is a way to compute the factorial, and that’s such a common task that it has become a cliché.

The symmetries and recurrence relation above hold for non-integer parameters, but you need a math library to compute the gamma function if the parameters are not integers. If you do have a math library, the algorithm here won’t be the most efficient one unless you need to evaluate g for a sequence of parameters that differ by small integers.


For more on random inequalities, here’s the first of a nine-part sequence of blog posts.

Unix via etymology

There are similarities across Unix tools that I’ve seldom seen explicitly pointed out. For example, the dollar sign $ refers to the end of things in multiple contexts. In regular expressions, it marks the end of a string. In sed, it refers to last line of a file. In vi it is the command to move to the end of a line.

Think of various Unix utilities lined up in columns: a column for grep, a column for less, a column for awk, etc. I’m thinking about patterns that cut horizontally across the tools.

Tools are usually presented vertically, one tool at a time, and for obvious reasons. If you need to use grep, for example, then you want to learn grep, not study a dozen tools simultaneously, accumulating horizontal cross sections like plastic laid down by a 3D printer, until you finally know what you need to get your job done.

On the other hand, I think it would be useful to point out some of the similarities across tools that experienced users pick up eventually but seldom verbalize. At a minimum, this would make an interesting blog post. (That’s not what this post is; I’m just throwing out an idea.) There could be enough material to make a book or a course.

I used the word etymology in the title, but that’s not exactly what I mean. Maybe I should have said commonality. I’m more interested in pedagogy than history. There are books on Unix history, but that’s not quite what I have in mind. I have in mind a reader who would appreciate help mentally organizing a big pile of syntax, not someone who is necessarily a history buff.

If the actual etymology isn’t too complicated, then history and pedagogy can coincide. For example, it’s helpful to know that sed and vi have common syntax that they inherited from ed. But pseudo-history, a sort of historical fiction, might be more useful than factually correct history with all its rabbit trails and dead ends. This would be a narrative that tells a story of how things could have developed, even while acknowledging that the actual course of events was more complicated.

Related post: Command option patterns

Functions in bc

The previous post discussed how you would plan an attempt to set a record in computing ζ(3), also known as Apéry’s constant. Specifically that post looked at how to choose your algorithm and how to anticipate the number of terms to use.

Now suppose you wanted to actually do the calculation. This post will carry out a calculation using the Unix utility bc. I often use bc for extended precision calculation, but mostly for simple things [1].

Calculating Apéry’s constant will require a function to compute binomial coefficients, something not built into bc, and so this post will illustrate how to write custom functions in bc.

(If you wanted to make a serious attempt at setting a record computing Apéry’s constant, or anything else, you would probably use an extended precision library written in C MPFR or write something even lower level; you would not use bc.)

Apéry’s series

The series we want to evaluate is

\zeta(3) = \frac{5}{2} \sum_{n=1}^\infty (-1)^{n-1} \frac{1}{\dbinom{2n}{n} n^3}

To compute this we need to compute the binomial coefficients in the denominator, and to do that we need to compute factorials.

From calculations in the previous post we estimate that summing n terms of this series will give us 2n bits of precision, or 2n/3 decimal places. So if we carry out our calculations to n decimal places, that gives us more precision than we need: truncation error is much greater than numerical error.

bc code

Here’s the code, which I saved in the file zeta3.b. I went for simplicity over efficiency. See [2] for a way to make the code much more efficient.

    # inefficient but simple factorial
    define fact(x) {
        if (x <= 1)
            return (1);
        return (x*fact(x-1));

    # binomial coefficient n choose k
    define binom(n, k) {
        return (fact(n) / (fact(k)*fact(n-k)));

    define term(n) {
        return ((-1)^(n-1)/(n^3 * binom(2*n, n)))

    define zeta3(n) {
        scale = n
        sum = 0
        for (i = 1; i <= n; ++i)
            sum += term(i);
        return (2.5*sum);

Now say we want 100 decimal places of ζ(3). Then we should need to sum about 150 terms of the series above. Let’s sum 160 terms just to be safe. I run the code above as follows [3].

    $ bc -lq zeta3.b

This returns


How well did we do?

I tested this by computing ζ(3) to 120 decimals in Mathematica with

    N[RiemannZeta[3], 120]

and subtracting the value returned by bc. This shows that the error in our calculation above is approximately 10-102. We wanted at least 100 decimal places of precision and we got 102.


[1] I like bc because it’s simple. It’s a little too simple, but given that almost all software errs on the side of being too complicated, I’m OK with bc being a little too simple. See this post where I used (coined?) the phrase controversially simple.

[2] Not only is the recursive implementation of factorial inefficient, computing factorials from scratch each time, even by a more efficient algorithm, is not optimal. The more efficient thing to do is compute each new coefficient by starting with the previous one. For example, once we’ve already computed the binomial coefficient (200, 100), then we can multiply by 202*201/101² in order to get the binomial coefficient (202, 101).

Along the same lines, computing (-1)^(n-1) is wasteful. When bootstrapping each binomial coefficient from the previous one, multiply by -1 as well.

[3] Why the options -lq? The -l option does two things: it loads the math library and it sets the precision variable scale to 20. I always use the -l flag because the default scale is zero, which lops off the decimal part of all floating point numbers. Strange default behavior! I also often need the math library. Turns out -l wasn’t needed here because we explicitly set scale in the function zeta3, and we don’t use the math library.

I also use the -q flag out of habit. It starts bc in quiet mode, suppressing the version and copyright announcement.

Planning a world record calculation

Before carrying out a big calculation, you want to have an idea how long various approaches would take. This post will illustrate this by planning an attempt to calculate Apéry’s constant

\zeta(3) = \sum_{n=1}^\infty \frac{1}{n^3}

to enormous precision. This constant has been computed to many decimal places, in part because it’s an open question whether the number has a closed form. Maybe you’d like to set a new record for computing it.

This post was motivated by a question someone asked me in response to this post: What the advantage is in having an approximation for binomial coefficients when we can just calculate them exactly? This post will show these approximations in action.

Apéry’s series

First of all, calculating ζ(3) directly from the definition above is not the way to go. The series converges too slowly for that. If you compute the sum of the first N terms, you get about 2 log10 N decimal places. This post explains why. We can do much better, with the number of decimal places achieved being proportional to N rather than log N.

A couple days ago I wrote about a sequence rapidly converging series for ζ(3), starting with Apéry’s series.

\zeta(3) = \frac{5}{2} \sum_{n=1}^\infty (-1)^{n-1} \frac{1}{\dbinom{2n}{n} n^3}

Suppose you want to compute ζ(3) to a million decimal places using this series. How would you do this?

Since this is an alternating series, the truncation error from summing only the first N terms is bounded by the N+1 term. So you could sum terms until you get a term less than


Great, but how long might that take?

Without some estimate of binomial coefficients, there’s no way to know how long this would take without computing a bunch of binomial coefficients. But with a little theory, you can do a back-of-the-envelope estimate.

Estimating how many terms to use

For large n, the binomial coefficient in the denominator of the nth term is approximately 4n/√(πn). So when you include the n³ term, you can see that the denominator of the nth term in the series is greater than 4n = 22n. In other words, each term gives you at least 2 bits of precision.

If you want 1,000,000 digits, that’s approximately 3,000,000 bits. So to get a million digits of ζ(3) you’d need to sum about 1,500,000 terms of Apéry’s series. By comparison, summing 1,500,000 terms in the definition of ζ(3) would give you between 12 and 13 digits precision.

More efficient series

Apéry’s series is only the first of a sequence of rapidly converging series for ζ(3). The series get more complicated but more efficient as a parameter s increases. Is it worthwhile to use a larger value of s? If you were planning to devote, say, a month of computing resources to try to set a record, it would be a good idea to spend a few hours estimating whether another approach would get more work done in that month.

By combining information here and here you could find out that the series corresponding to s = 2 has terms whose denominator is on the order of 33n. To estimate how many terms you’d need for a million digits of precision, you’d need to solve the equation

33n = 10106,

and by taking logs you find this is 1,047,951. In other words, about 1/3 fewer terms than using Apéry’s series. That was a win, so maybe you go on to estimate whether you should go up to s = 3. The key information you need in these calculations is the estimate on binomial coefficients given here.

Incidentally, computing ζ(3) to a million decimal places would have given you the world record in 1997. At the moment, the record is on the order of a trillion digits.

Update: The next post shows how to actually carry out (on a smaller scale) the calculations described here using the Unix utility bc.

Parallel versus sequential binding

If someone tells you they want to replace A’s with B’s and B’s with A’s, they are implicitly talking about parallel assignment. They almost certainly don’t mean “Replace all A’s with B’s. Then replace all B’s with A’s.”

They expect the name of the Swedish pop group ABBA to be turned into “BAAB”, but if you carry out the steps sequentially, you first get “BBBB” and then “AAAA”.

Variations on this problem come up all the time. For example, it came up on a project I was working on with associative computing. Documentation specified a pair of assignment operations for an array of bits, and it was unclear whether the assignment was sequential or parallel. A colleague asked “Is this let or let star?”, alluding to a convention in Lisp programming. In Common Lisp, LET* executes a list of assignments sequentially, and LET executes a list of assignments in parallel.

This weekend I wrote a post about reversing Morse code sequences. If you transmit Morse code backwards, several pairs of letters are swapped. For example, G’s (--.) become W’s (.--) and vice versa.

To search for Morse code palindromes, words with symmetric Morse code representations, I searched a dictionary using parallel replacement. I did this by reversing the letters in a word and then replacing the letters by the letters with the reversed Morse code. You could do the latter step with the tr utility included with any Unix-like system.


This says “Replace A with N, B with V, …, N with A, V with B, … all in parallel.” If tr did sequential replacement, the result would not be what we want, analogous to turning ABBA into AAAA above.

Not only does tr handle swaps, it handles more complex permutations. If you wanted to rotate A to B, B to C, and C to A in parallel, it handles this just fine.

    $ echo ALCIBIADES | tr ABC BCA

It helps to have a name for this pattern. A formal description is sequential versus parallel binding, but I like the slang LET versus LET*. More people will understand the more verbose description, but if a group has to use the terms often, maybe they could adopt the LET vs LET* slang. And more importantly, maybe they could adopt the naming convention of using a star, or some other marker, to distinguish sequential and parallel APIs.

(Full disclosure: I initially got LET and LET* backward. So maybe the star thing isn’t so mnemonic.)

Related posts

Differential Equations and Department Stores

Howard Aiken on the uses of computers, 1955:

If it should turn out that the basic logics of a machine designed for the numerical solution of differential equations coincide with the basic logics of a machine intended to make bills for a department store, I would regard this as the most amazing coincidence I have ever encountered.

Update: Some people have read the quote above and thought Aiken was ignorant of the work of Turing et al. I assumed he was speaking in terms of what was practical rather than what was possible, which is apparently correct.

Thanks to Anatoly Vorobey in the comments below, I found a paper that goes into more background. From that paper:

Aiken’s theme in the lecture … was that a machine designed primarily for scientific use was far from ideal for business computing. … For example, scientific computing (Aiken pointed out) involves relatively small amounts of data and complex processing, whereas business computing involves large amounts of data and relatively shallow processing.

Also a crypto library

The home page for the OpenSSL project says

OpenSSL is a robust, commercial-grade, and full-featured toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols. It is also a general-purpose cryptography library. …

If you’ve never heard of the project before, you would rightly suppose that OpenSSL implements SSL (and its successor TLS). But you might not realize that OpenSSL “is also a general-purpose cryptography library.”

After thinking about it a bit, you might realize that software implementing SSL must have some encryption capability, but it doesn’t follow that this capability would necessarily be readily accessible. In fact, OpenSSL has implements a lot of cryptography algorithms and makes them easy to use from the command line. For example, this post shows how to compute hash functions using the openssl command.

Earlier today I wrote a thread on @CompSciFact about the famous example of encrypting an image of the Linux mascot Tux using ECB (Electronic Code Book) mode. As the saying goes, you should never use ECB “because you can see the penguin.”

Original encrypted Tux image

I wanted to try reproducing the example, and my first thought was to use Python. But setting up encryption libraries is a fairly lengthy process, while AES encryption using openssl is a one-liner.

My encrypted Tux image

You can still see the outline of Tux, but my penguin looks quite different from the famous example for a variety of reasons. For starters, I don’t know what key was used in the original image. Also, there are a variety of ways to extract the data from an image, encrypt it, and put it back. I basically followed Filippo Valsorda’s post The ECB Penguin but I had to make a few changes to get it to work due to changes in GIMP since that post was written.

Naive compression of genetic data

There are special compression algorithms for genetic sequence data, but I was curious how well simply zipping a text file would work.

I downloaded a 14 MB text file containing DNA sequence data from a fruit fly and compressed it as a zip file and as a 7z file. The result was about 3.5 MB, which is basically no compression beyond the obvious.

The file contains millions of A’s, C’s, G’s, and T’s and nothing more [0]. (I prepared the file by removing comments and line breaks.)


There are only four possible characters, so each carries two bits of information [1], but is encoded in an 8-bit ASCII character. The most obvious encoding would pack four letters into a byte. That would compress a 14 MB text file down to a 3.5 MB binary file.

Here are the exact numbers.

| file     |     size |  ratio|
| original | 14401122 | 1.000 |
| zip      |  3875361 | 3.716 |
| 7z       |  3419294 | 4.212 |

So the 7z format does a little better than simply packing groups of four letters into a byte, but the zip format does not.

There is repetition in genome sequences, but apparently generic compression software is unable to exploit this repetition. You can do much better with compression algorithms designed for genetic data.


The plot thickens. Imran Haque kindly let me know that I overlooked the N’s in the data. These are placeholders for unresolved bases. You can’t simply encode each base using two bits because you need five states.

The number of N’s is small—at least in this example, though I imagine they would be more common in lower quality data—and so the calculation in footnote [1] is still approximately correct [2]. There are about two bits of information in each base, but this is on average. Shannon would say you can expect to compress your text file by at least 75%. But you can’t simply represent each base with two bits as I suggested because you need to make room for the possibility of a null base.

Note that the total information of a file is not the number of symbols times the information per symbol, unless all the symbols are independent. In the case of genetic data, the symbols are not independent, and so more compression is possible.


[0] Or so I thought. See the Update section.

[1] The letter frequencies are not exactly equal, but close: 26.75% A, 23.67% C, 23.94% G, and 25.58% T. The Shannon entropy is 1.998, so essentially two bits.

[2] N’s account for 0.04% of the data. Accounting for the N’s increases the Shannon entropy to 2.002.

Offline documentation

It’s simpler to search the web than to search software-specific documentation. You can just type your query into a search engine and not have to be bothered by the differences in offline documentation systems for different software. But there are a couple disadvantages.

First, the result may not be that relevant. For example, maybe you have a question about LaTeX typesetting and you get back results about rubber. And even if the result is relevant, it might not be accurate or up-to-date.

Second, you might not always be online. You might lose your internet connection, or you might deliberately stay offline for a block of time in order to concentrate better.

A convenient way to grab online documentation for a lot of software packages is to use Dash for macOS or Zeal on Windows and Linux.

If you use a particular piece of software a lot, you probably want to learn how to use its native documentation system. It’s hard to do this for lots of different tools, hence the popularity of the generic web search, but it’s worthwhile for a small number of high priority tools.

Finding computer algebra algorithms with computer algebra

I ran across an interesting footnote in Wolfram Koepf’s book Computer Algebra.

Gosper’s algorithm [1] was probably the first algorithm which would not have been found without computer algebra. Gosper writes in his paper: “Without the support of MACSYMA and its developer, I could not have collected the experiences necessary to provoke the conjectures that lead to the algorithm.”

When I first glanced at the footnote I misread it, coming away with the impression that Gosper said he could not have proved that his algorithm was correct without computer algebra. But he said he would not have formulated his algorithm without his experience using MACSYMA.

If a sum of hypergeometric terms is itself hypergeometric, Gosper’s algorithm gives a way to find the sum. Gosper’s algorithm, discovered using computer algebra, is itself an important computer algebra algorithm.

Related posts

[1] Koepf cites Gosper as R. W. Gosper, Jr. Decision procedure for indefinite hypergeometric summation. Proc. Natl. Acad. Sci. USA, 75, 1978, 40–42.