Combinators and Catalan numbers

I seem to run into Catalan numbers fairly regularly. This evening I ran into them again while reading Stephen Wolfram’s book Combinators: A Centennial View [1]. Here’s the passage.

The number of combinator expressions of size n grows like

{2, 4, 16, 80, 448, 2688, 16896, 109824, 732160, 4978688}

or in general

2^n CatalanNumber[n-1] = 2^n Binomial[2n - 2, n - 1]/n

or asymptotically:

\frac{8^n}{4\,\sqrt{\pi}\,n^{3/2}}

Specifically, Wolfram is speaking about S and K combinators. (See a brief etymology of combinator names here.)

The Catalan numbers are defined by

C_n = \frac{1}{n+1} {2n \choose n}

Replacing n with n-1 above shows that

CatalanNumber[n-1] = Binomial[2n - 2, n - 1]/n

as claimed above.

The formula for the number of combinators is easy to derive [2]. There are 2^n ways to write down a string of n characters that are either S or K. Then the number of ways to add parentheses is the (n-1)st Catalan number, as explained here.
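Here’s a quick Python check of the formula against the sequence above. (The function name num_expressions is mine, just for illustration.)

    from math import comb

    def num_expressions(n):
        # 2^n choices of symbols times Catalan(n-1) ways to parenthesize
        return 2**n * comb(2*n - 2, n - 1) // n

    print([num_expressions(n) for n in range(1, 11)])
    # [2, 4, 16, 80, 448, 2688, 16896, 109824, 732160, 4978688]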

Wolfram’s asymptotic expression is closely related to a blog post I wrote two weeks ago called kn choose n. That post looked at asymptotic formulas for

f(n, k) = \binom{kn}{n}

and Catalan numbers correspond to f(n, 2)/(n + 1). That post showed that

f(n,2) \approx \frac{4^n}{\sqrt{\pi n}}

The asymptotic estimate stated in Wolfram’s book follows by replacing n with n-1 above and multiplying by 2^n/n. (Asymptotically there’s no difference between n and n-1 in the denominator.)
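Here’s a rough numerical check that the asymptotic expression tracks the exact count (function names are just for illustration); working with logarithms avoids overflowing floating point for large n.

    from math import comb, log, pi, sqrt

    def exact(n):
        return 2**n * comb(2*n - 2, n - 1) // n

    # log of the asymptotic estimate 8^n / (4 sqrt(pi) n^(3/2))
    for n in [10, 100, 1000]:
        approx_log = n * log(8) - log(4 * sqrt(pi)) - 1.5 * log(n)
        print(n, log(exact(n)) - approx_log)  # differences shrink toward 0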


[1] Believe it or not, I’m reading this in connection with a consulting project. When Wolfram’s book first came out, my reaction was something like “Interesting, though it’ll never be relevant to me.” Famous last words.

[2] Later in the book Wolfram considers combinators made only using S. This is even simpler since there are no choices of symbols; the only choice is where to put the parentheses, and CatalanNumber[n-1] tells you how many ways that can be done.

Planetary code golf

Suppose you’re asked to write a function that takes a number and returns a planet. We’ll number the planets in order from the sun, starting at 1, and for our purposes Pluto is the 9th planet.

Here’s an obvious solution:

    def planet(n):
        planets = [
            "Mercury",
            "Venus",
            "Earth",
            "Mars",
            "Jupiter",
            "Saturn",
            "Uranus",
            "Neptune",
            "Pluto"
        ]
        # subtract 1 to correct for unit offset
        return planets[n-1] 

Now let’s have some fun with this and play a little code golf. Here’s a much shorter program.

    def planet(n): return chr(0x263e+n)

I was deliberately ambiguous at the top of the post, saying the code should return a planet. The first function naturally interprets that to mean the name of a planet, but the second function takes it to mean the symbol of a planet.

The symbol for Mercury has Unicode value U+263F, and Unicode lists the symbols for the planets in order from the sun. So the Unicode character for the nth planet would be the character for Mercury plus n if we numbered the planets starting from 0. But conventionally we call Mercury the 1st planet, not the 0th planet, so we have to subtract one. That’s why the code contains 0x263e rather than 0x263f.

We could make the function just a tiny bit shorter by using a lambda expression and using a decimal number for 0x263e.

    planet = lambda n: chr(9790+n)

Display issues

Here’s a brief way to list the symbols from the Python REPL.

    >>> [chr(0x263f+n) for n in range(9)]
    ['☿', '♀', '♁', '♂', '♃', '♄', '♅', '♆', '♇']

You may see the symbols for Venus and Mars above rendered with a colored background, depending on your browser and OS.

Here’s what the lines above looked like at my command line.

[screen shot of the REPL]

Here’s what the symbols looked like when I posted them on Twitter.

[symbols for planets as rendered on Twitter]

For reasons explained here, some software interprets some characters as emoji rather than literal characters. The symbols for Venus and Mars are used as emoji for female and male respectively, based on the use of the symbols in biology. Incidentally, these symbols were used to represent planets long before biologists used them to represent sexes.
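If you want the text glyphs rather than the emoji, Unicode variation selectors can help. Here’s a small sketch; appending U+FE0E (VARIATION SELECTOR-15) requests the plain text glyph, though whether this works depends on the font and the rendering software.

    venus, mars = "\u2640", "\u2642"
    # U+FE0E asks the renderer for text presentation; U+FE0F asks for emoji
    print(venus + "\ufe0e", mars + "\ufe0e")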


Beta inequalities with integer parameters

Suppose X is a beta(a, b) random variable and Y is a beta(c, d) random variable. Define the function

g(a, b, c, d) = Prob(X > Y).

At one point I spent a lot of time developing accurate and efficient ways to compute g under various circumstances. I did this because when I worked at MD Anderson Cancer Center we spent thousands of CPU-hours computing this function inside clinical trial simulations.

For this post I want to outline how to compute g in a minimal environment, i.e. a programming language with no math libraries, when the parameters a, b, c, and d are all integers. The emphasis is on portability, not efficiency. (If we had access to math libraries, we could solve the problem for general parameters using numerical integration.)

In this technical report I develop several symmetries and recurrence relationships for the function g. The symmetries can be summarized as

g(a, b, c, d) = g(d, c, b, a) = g(d, b, c, a) = 1 − g(c, d, a, b).

Without loss of generality we assume a is the smallest argument. If not, apply the symmetries above to make the first argument the smallest.

We will develop a recursive algorithm for computing the function g by recurring on a. The algorithm will terminate when a = 1, and so we want a to be the smallest parameter.

In the technical report cited above, I develop the recurrence relation

g(a + 1, b, c, d) = g(a, b, c, d) + h(a, b, c, d)/a

where

h(a, b, c, d) = B(a + c, b + d) / (B(a, b) B(c, d))

and B(x, y) is the beta function.

The equation above explains how to compute g(a + 1, b, c, d) after you’ve already computed g(a, b, c, d), which is what I often needed to do in applications. But we want to do the opposite to create a recursive algorithm:

g(a, b, c, d) = g(a - 1, b, c, d) + h(a - 1, b, c, d)/(a - 1)

for a > 1.

For the base case, when a = 1, we have

g(1, b, c, d) = B(c, b + d) / B(c, d).

All that’s left to develop our code is to evaluate the beta function.

Now

B(x, y) = Γ(x) Γ(y) / Γ(x + y)

and

Γ(x) = (x - 1)!.

So all we need is a way to compute the factorial, and that’s such a common task that it has become a cliché.
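Putting the pieces together, here is a minimal Python sketch of the algorithm. It uses the standard fractions module for exact arithmetic; the names g, h, and beta follow the notation above, the hand-rolled fact keeps with the no-math-library spirit, and the branch for the case where b is smallest combines two of the symmetries listed earlier.

    from fractions import Fraction

    def fact(n):
        # factorial by simple product; fine for small integer parameters
        p = 1
        for i in range(2, n + 1):
            p *= i
        return p

    def beta(x, y):
        # B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y), with Gamma(n) = (n - 1)!
        return Fraction(fact(x - 1) * fact(y - 1), fact(x + y - 1))

    def h(a, b, c, d):
        return beta(a + c, b + d) / (beta(a, b) * beta(c, d))

    def g(a, b, c, d):
        # Prob(X > Y) for X ~ beta(a, b) and Y ~ beta(c, d), integer parameters
        m = min(a, b, c, d)
        if a != m:
            # apply the symmetries so the smallest parameter comes first
            if d == m:
                return g(d, c, b, a)
            if c == m:
                return 1 - g(c, d, a, b)
            return 1 - g(b, d, a, c)  # b is smallest
        if a == 1:
            return beta(c, b + d) / beta(c, d)
        return g(a - 1, b, c, d) + h(a - 1, b, c, d) / (a - 1)

    print(g(2, 1, 1, 1))  # 2/3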

The symmetries and recurrence relation above hold for non-integer parameters, but you need a math library to compute the gamma function if the parameters are not integers. If you do have a math library, the algorithm here won’t be the most efficient one unless you need to evaluate g for a sequence of parameters that differ by small integers.

***

For more on random inequalities, here’s the first of a nine-part sequence of blog posts.

Unix via etymology

There are similarities across Unix tools that I’ve seldom seen explicitly pointed out. For example, the dollar sign $ refers to the end of things in multiple contexts. In regular expressions, it marks the end of a string. In sed, it refers to the last line of a file. In vi it is the command to move to the end of a line.

Think of various Unix utilities lined up in columns: a column for grep, a column for less, a column for awk, etc. I’m thinking about patterns that cut horizontally across the tools.

Tools are usually presented vertically, one tool at a time, and for obvious reasons. If you need to use grep, for example, then you want to learn grep, not study a dozen tools simultaneously, accumulating horizontal cross sections like plastic laid down by a 3D printer, until you finally know what you need to get your job done.

On the other hand, I think it would be useful to point out some of the similarities across tools that experienced users pick up eventually but seldom verbalize. At a minimum, this would make an interesting blog post. (That’s not what this post is; I’m just throwing out an idea.) There could be enough material to make a book or a course.

I used the word etymology in the title, but that’s not exactly what I mean. Maybe I should have said commonality. I’m more interested in pedagogy than history. There are books on Unix history, but that’s not quite what I have in mind. I have in mind a reader who would appreciate help mentally organizing a big pile of syntax, not someone who is necessarily a history buff.

If the actual etymology isn’t too complicated, then history and pedagogy can coincide. For example, it’s helpful to know that sed and vi have common syntax that they inherited from ed. But pseudo-history, a sort of historical fiction, might be more useful than factually correct history with all its rabbit trails and dead ends. This would be a narrative that tells a story of how things could have developed, even while acknowledging that the actual course of events was more complicated.

Related post: Command option patterns

Carbon nanotube-like plots

A few days ago I wrote a brief post showing an interesting pattern that comes from plotting sin(1), sin(2), sin(3), etc.

That post uses a logarithmic scale on the horizontal axis. You get a different, but also interesting, pattern when you use a linear scale.

Someone commented that this looks like a projection of a carbon nanotube, and I think they’re right.

Let’s generalize this by plotting sin(kn) for integer n. The plot above corresponds to k = 1. Tiny changes to k make a big difference in the appearance of the plot.

Here’s k = 0.995:

And here’s k = 1.005:

And here’s an animated plot, continuously changing k from 0.99 to 1.01.

This animation was created using the following Mathematica code.

    Animate[ListPlot[Table[Sin[k n], {n, 1, 1000}]], {k, 0.99, 1.01}, 
            AnimationRate -> 0.001, TrackedSymbols :> {k}]
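If you’d rather experiment in Python, here is a rough equivalent for the static plots (this is my sketch, not the code used to produce the figures above):

    import matplotlib.pyplot as plt
    import numpy as np

    # plot sin(k n) for n = 1, ..., 1000 at a few values of k near 1
    n = np.arange(1, 1001)
    for k in [0.995, 1.0, 1.005]:
        plt.figure()
        plt.scatter(n, np.sin(k * n), s=1)
        plt.title(f"k = {k}")
    plt.show()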

Why is the word problem hard?

This post is about the word problem. When mathematicians talk about “the word problem” we’re not talking about elementary math problems expressed in prose, such as “If Johnny has three apples, ….”

The word problem in algebra is to decide whether two strings of symbols are equivalent given a set of algebraic rules. For example, given the rule ab = ba, the strings abab and aabb are equivalent. I go into a longer description here.

I know that the word problem is hard, but I hadn’t given much thought to why it is hard. It has always seemed more like a warning sign than a topic to look into.

“Why don’t we just …?”

“That won’t work because … is equivalent to the word problem.”

In logic terminology, the word problem is undecidable. There can be no algorithm to solve the word problem in general, though there are algorithms for special cases.

So why is the word problem undecidable? As with most (all?) things that have been shown to be undecidable, the word problem is undecidable because the halting problem is undecidable. If there were an algorithm that could solve the word problem in general, you could use it to solve the halting problem. The halting problem is to write a program that can determine whether another arbitrary program, given its input, will eventually halt, and it can be shown that no such program can exist.

The impulse for writing this post came from stumbling across Keith Conrad’s expository article Why word problems are hard. His article goes into some of the algebraic background of the problem—free groups, (in)finitely generated groups, etc.—and gives some flavor for why the word problem is harder than it looks. The bottom line is that the word problem is hard because the halting problem is hard, but Conrad’s article gives you a little more of an idea of what’s going on than just citing a logic theorem.

I still don’t have a cocktail-party answer for why the word problem is hard. Suppose a bright high school student who had heard of the word problem were at a cocktail party (drinking a Coke, of course) and asked why the word problem is hard. Suppose also that this student had not heard of the halting problem. Would the simplest explanation be to start by explaining the halting problem?

Suppose we change the setting a bit. You’re teaching a group theory course and the students know about groups and generators, but not about the halting problem. How might you give them a feel for why the word problem is hard? You might ask them to read Keith Conrad’s paper and point out that it shows that problems simpler than the word problem are harder than they seem at first.


Much less than, Much greater than

The symbols ≪ and ≫ may be confusing the first time you see them, but they’re very handy.

The symbol ≪ means “much less than,” and its counterpart ≫ means “much greater than”. Here’s a little table showing how to produce the symbols.

    |-------------------+---------+-------+----------|
    |                   | Unicode | LaTeX | HTML     |
    |-------------------+---------+-------+----------|
    | Much less than    | U+226A  | \ll   | &#x226A; |
    | Much greater than | U+226B  | \gg   | &#x226B; |
    |-------------------+---------+-------+----------|

Of course “much” depends on context. Is 5 much less than 7? It is if you’re describing the height of people in feet, but maybe not in the context of prices of hamburgers in dollars.

Sometimes you’ll see ≫, or more likely >> (two greater than symbols), as slang for “is much better than.” For example, someone might say “prototype >> powerpoint” to convey that a working prototype is much better than a PowerPoint pitch deck.

The symbols ≪ and ≫ can make people uncomfortable because they’re insider jargon. You have to know the context to understand how to interpret them, but they’re very handy if you are an insider. All jargon is like this.

Below are some examples of ≪ and ≫ in practice.

Square root approximation

You might see somewhere that for |b| ≪ a, the following approximation holds:

\sqrt{a + b} \approx \sqrt{a} + \frac{b}{2\sqrt{a}}

So when is |b| much less than a? That’s up to you. If, in your context, you decide that b/a is small, the approximation error will be an order of magnitude smaller than b/a.

Suppose you need to know √103 to a couple decimal places. Here a = 100 and b = 3. The ratio b/a = 0.03, and your error should be small relative to 0.03, so the approximation above should be good enough. Let’s see if that’s right.

The approximation above gives

√103 ≈ √100 + 3/(2√100) = 10 + 3/20 = 10.15

and the exact value of √103 is 10.14889…, so we got two correct decimal places and nearly got three.
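Here’s the same calculation as a quick Python check:

    import math

    a, b = 100, 3
    approx = math.sqrt(a) + b / (2 * math.sqrt(a))
    print(approx)                     # 10.15
    print(math.sqrt(a + b))           # 10.148891565092219
    print(approx - math.sqrt(a + b))  # error of about 0.0011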

Sine approximation

Rather than saying a variable is “small,” we might say it is much less than 1. For example, you may see

sin θ ≈ θ

for |θ| ≪ 1. If θ is small, the error in the approximation above is very small.

A few years ago I wrote a 700-word blog post unpacking in detail what the previous sentence means. A lot of people memorize “You can replace sin θ with θ for small angles” without thoroughly understanding what this means. How small is small enough? The post explains how to know.
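Here’s a small numerical check of the sine approximation:

    import math

    theta = 0.1
    print(math.sin(theta))          # 0.09983341664682815
    print(theta - math.sin(theta))  # about 1.67e-4, roughly theta**3 / 6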

Stirling’s formula

Sometimes you see something like n ≫ 1 to indicate that n must be large. For example, Stirling’s formula for factorials says

n! \sim \sqrt{2\pi n} \left(\frac{n}{e} \right)^n

for n ≫ 1. For instance, if n = 10, the approximation above has an error of less than 1%.

Note that the approximation error above is small relative to the exact value. The relative error is small, not the absolute error. The error in the example is more than 30,000, but this value is small relative to 10! = 3,628,800.
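Here’s the n = 10 example as a quick Python check:

    import math

    n = 10
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e)**n
    exact = math.factorial(n)
    print(exact - stirling)            # a little over 30,000
    print((exact - stirling) / exact)  # about 0.008: less than 1% relative error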

Asymmetry between small and large

It’s often harder to tell from context when something is large than when it is small.

If an approximation holds for |x| ≪ 1, there’s often an implicit power series in the background, and the error is on the order of x². That’s the case in our square root approximation above. The sine approximation is even better, with error on the order of x³.

But if an approximation holds for x ≫ 1, there’s often an implicit asymptotic series in the background, and these are more subtle. You likely need more context to know how large x needs to be for a particular application.


Integer sines

The following graph plots sin(1), sin(2), sin(3), etc. It is based on a graph I found on page 42 of Analysis by Its History by Hairer and Wanner.

Here’s the Python code that produced the plot.

    import matplotlib.pyplot as plt
    from numpy import arange, sin

    # plot sin(n) for n = 1, 2, 3, ... with a log scale on the horizontal axis
    x = arange(1, 3000)
    plt.scatter(x, sin(x), s=1)
    plt.xscale("log")
    plt.savefig("int_sin.png")

Double, triple, quadruple, …

I recently needed a word for “multiply by 13” that was parallel to quadruple for “multiply by 4”, so I made up triskaidekaduple by analogy with triskaidekaphobia. That got me to wondering how you make words for multiples higher than four.

The best answer is probably “don’t.” Your chances of being understood drop sharply after quadruple. But despite the warning sign saying “hic sunt dracones” we forge ahead.

Double, triple, and quadruple are based on Latin names for numbers, so we should keep using Latin prefixes. Next would be quintuple, but I expect you would likely be understood if you said pentuple based on Greek penta-.

Next would be sextuple, septuple, and octuple. These terms are understandable, particularly in the context of multiple births: sextuplets, septuplets, and octuplets.

But now we hit a brick wall. The Latin prefix for nine is novem-, and it’s unlikely anyone would understand novemple or anything like that. The Greek prefix ennea- is no better. Enneauple? Enneaduple?

(The Latin prefix novem- is recognizable from November, which is the 11th month, so does that mean novem- stands for 11? No, November really is the ninth month, or at least it was when the year started in March.

The only example I can think of for a word starting with ennea- is the enneagram personality classification system.)

The prefixes after novem- are equally obscure. But if we jump to 13, some people will have heard of triskaidekaphobia. This comes from the Greek tris kai deka (three and ten). But I would only use triskaidekaduple tongue-in-cheek.


Functions in bc

The previous post discussed how you would plan an attempt to set a record in computing ζ(3), also known as Apéry’s constant. Specifically that post looked at how to choose your algorithm and how to anticipate the number of terms to use.

Now suppose you wanted to actually do the calculation. This post will carry out a calculation using the Unix utility bc. I often use bc for extended precision calculation, but mostly for simple things [1].

Calculating Apéry’s constant will require a function to compute binomial coefficients, something not built into bc, and so this post will illustrate how to write custom functions in bc.

(If you wanted to make a serious attempt at setting a record computing Apéry’s constant, or anything else, you would probably use an extended precision library written in C, such as MPFR, or write something even lower level; you would not use bc.)

Apéry’s series

The series we want to evaluate is

\zeta(3) = \frac{5}{2} \sum_{n=1}^\infty (-1)^{n-1} \frac{1}{\dbinom{2n}{n} n^3}

To compute this we need to compute the binomial coefficients in the denominator, and to do that we need to compute factorials.

From calculations in the previous post we estimate that summing n terms of this series will give us 2n bits of precision, or 2n/3 decimal places. So if we carry out our calculations to n decimal places, that gives us more precision than we need: truncation error is much greater than numerical error.

bc code

Here’s the code, which I saved in the file zeta3.b. I went for simplicity over efficiency. See [2] for a way to make the code much more efficient.

    # inefficient but simple factorial
    define fact(x) {
        if (x <= 1)
            return (1);
        return (x*fact(x-1));
    }

    # binomial coefficient n choose k
    define binom(n, k) {
        return (fact(n) / (fact(k)*fact(n-k)));
    }

    define term(n) {
        return ((-1)^(n-1)/(n^3 * binom(2*n, n)))
    }

    define zeta3(n) {
        scale = n
        sum = 0
        for (i = 1; i <= n; ++i)
            sum += term(i);
        return (2.5*sum);
    }

Now say we want 100 decimal places of ζ(3). Then we should need to sum about 150 terms of the series above. Let’s sum 160 terms just to be safe. I run the code above as follows [3].

    $ bc -lq zeta3.b
    zeta3(160)

This returns

    1.202056903159594285399738161511449990764986292340498881792271555341\
83820578631309018645587360933525814493279797483589388315482364276014\
64239236562218818483914927

How well did we do?

I tested this by computing ζ(3) to 120 decimals in Mathematica with

    N[RiemannZeta[3], 120]

and subtracting the value returned by bc. This shows that the error in our calculation above is approximately 10^-102. We wanted at least 100 decimal places of precision and we got 102.
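That’s just what the truncation error predicts, since the first omitted term sets the size of the error. Here’s a quick check, in Python since bc has no built-in binomial function:

    from math import comb, log10

    # magnitude of the first omitted term, n = 161
    n = 161
    print(-log10(n**3 * comb(2*n, n)))  # about -102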

Notes

[1] I like bc because it’s simple. It’s a little too simple, but given that almost all software errs on the side of being too complicated, I’m OK with bc being a little too simple. See this post where I used (coined?) the phrase controversially simple.

[2] Not only is the recursive implementation of factorial inefficient, computing factorials from scratch each time, even by a more efficient algorithm, is not optimal. The more efficient thing to do is compute each new coefficient by starting with the previous one. For example, once we’ve already computed the binomial coefficient (200, 100), then we can multiply by 202*201/101² in order to get the binomial coefficient (202, 101).

Along the same lines, computing (-1)^(n-1) is wasteful. When bootstrapping each binomial coefficient from the previous one, multiply by -1 as well.
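Here’s a sketch of that bootstrapping idea in Python, with exact rationals standing in for bc’s fixed-precision decimals:

    from fractions import Fraction

    def zeta3(n_terms):
        total = Fraction(0)
        binom = 1   # will hold binom(2n, n); starts from binom(0, 0) = 1
        sign = -1   # flips to +1 at n = 1, giving (-1)^(n-1)
        for n in range(1, n_terms + 1):
            # bootstrap: binom(2n, n) = binom(2n-2, n-1) * (2n-1) * (2n) / n^2
            binom = binom * (2*n - 1) * (2*n) // (n * n)
            sign = -sign
            total += Fraction(sign, n**3 * binom)
        return Fraction(5, 2) * total

    print(float(zeta3(20)))  # 1.2020569...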

[3] Why the options -lq? The -l option does two things: it loads the math library and it sets the precision variable scale to 20. I always use the -l flag because the default scale is zero, which lops off the decimal part of all floating point numbers. Strange default behavior! I also often need the math library. Turns out -l wasn’t needed here because we explicitly set scale in the function zeta3, and we don’t use the math library.

I also use the -q flag out of habit. It starts bc in quiet mode, suppressing the version and copyright announcement.