Mr. Bell and Bell numbers

One day Eric Temple Bell (1883–1960) was looking at the power series for the double exponential function, exp(exp(x)) and noticed a similarity to the power series for exp(x). You can find his account in [1]. He would have calculated the series by hand, but we have the advantage of software like Mathematica.

We can get the first five terms of the series, centered at 0, with the command

    Series[Exp[Exp[x]], {x, 0, 5}]

This give us

e+e x+e x^2+\frac{5 e x^3}{6}+\frac{5 e x^4}{8}+\frac{13 e x^5}{30}+ \ldots

If you pull out the factor of e from each term, and change the denominators to match those in the power series for exp(x) you get

e\left( 1 + \frac{1}{1!} x + \frac{2}{2!} x^2 + \frac{5}{3!} x^3 + \frac{15}{4!}x^4 + \frac{52}{5!} x^5 + \ldots\right)

with integers in all the numerators. It’s not obvious a priori that these numbers should even be integers, but they are,

Bell called the sequence numerators the exponential numbers: 1, 1, 2, 5, 15, 52, … The sequence is now known as the Bell numbers despite Bell’s modesty. Bell wasn’t the first to study this sequence of numbers, but he developed their properties more fully.

Applications

Bell numbers come up a lot in applications, which is why Bell wasn’t the first to notice them. (He may have been the first to come to them via their exponential generating function.) For example, the nth Bell number Bn is the number of ways to partition a set of n labeled items. Bn is also the nth moment of a Poisson random variable with λ = 1.

Bell’s triangle

There is a construction of Bell numbers analogous to Pascal’s triangle. Charles Sanders Peirce discovered what we now call Bell’s triangle fifty years before Bell discovered the Bell numbers.

To create Bell’s triangle, start with a row containing only 1.

The first number in each successive row is set to the last number in the previous row.

Then fill in the rest of the row by adding the number to the left and the number directly above

    1
    1  2
    2  3  5
    5  7 10 15
   15 20 27 37 52
   …

The numbers in the first column are the Bell numbers.

Related posts

[1] E. T. Bell. Exponential Numbers. The American Mathematical Monthly, Vol. 41, No. 7 (Aug. – Sep., 1934), pp. 411–419.

How many ways can you triangulate a regular polygon?

In this post we want to count the number of ways to divide a regular polygon [1] into triangles by connecting vertices with straight lines that do not cross.

Squares

For a square, there are two possibilities: we either connect the NW and SE corners,

or we connect the SW and NE corners.

Pentagons

For a pentagon, we pick one vertex and connect it to both non-adjacent vertices.

We can do this for any vertex, so there are five possible triangulations. All five triangulations are rotations of the same triangulation. What if we consider these rotations as equivalent? We’ll get to that later.

Hexagons

For a hexagon, things are more interesting. We can again pick any vertex and connect it to all non-adjacent vertices, giving six triangulations.

But there are more possibilities. We could connect every other vertex, creating an equilateral triangle inside. We can do this two ways, connecting either the even-numbered vertices or the odd-numbered vertices. Either triangulation is a rotation of the other.

We can also connect the vertices in a zig-zag pattern, creating an N-shaped pattern inside. We could also rotate this triangulation one or two turns. (Three turns gives us the same pattern again.)

Finally, we could also connect the vertices creating a backward N pattern.

General case

So to recap, we have 2 ways to triangulate a square, 5 ways to triangulate a pentagon, and 6 + 2 + 3 + 3 = 14 ways to triangulate a hexagon. Also, there is only 1 way to triangulate a triangle: do nothing.

Let Cn be the number of ways to triangulate a regular (n + 2)-gon. Then we have C1 = 1, C2 = 2, C3 = 5, and C4 = 14.

In general,

C_n = \frac{1}{n+1}\binom{2n}{n}

which is the nth Catalan number.

Catalan numbers are the answers to a large number of questions. For example, Cn is also the number of ways to fully parenthesize a product of n + 1 terms, and the number of full binary trees with n + 1 nodes.

The Catalan numbers have been very well studied, and we know that asymptotically

C \sim \frac{4^n}{n^{3/2} \sqrt{\pi}}

so we can estimate Cn for large n. For example, we could use the formula above to estimate the number of ways to triangulate a 100-gon to be 5.84 ×1055. The 98th Catalan number is closer to 5.77 ×1055. Two takeaways: Catalan numbers grow very quickly, and we can estimate them within an order of magnitude using the asymptotic formula.

Equivalence classes

Now let’s go back and count the number of triangulations again, considering some variations on a triangulation to be the same triangulation.

We’ll consider rotations of the same triangulation to count only once. So, for example, we’ll say there is only one triangulation of a pentagon and four triangulations of a hexagon. If we consider mirror images to be the same triangulation, then there are three triangulations of a hexagon, counting the N pattern and the backward N pattern to be the same.

Grouping rotations

The number of equivalence classes of n-gon triangulations, grouping rotations together, is OEIS sequence A001683. Note that the sequence starts at 2.

OEIS gives a formula for this sequence:

a(n) = \frac{1}{2n}C_{n-2} + \frac{1}{4}C_{n/2-1} + \frac{1}{2} C_{\lceil (n+1)/2\rceil - 2} + \frac{1}{3} C_{n/3 - 1}
where Cx is zero when x is not an integer. So a(6) = 4, as expected.

Grouping rotations and reflections

The number of equivalence classes of n-gon triangulations, grouping rotations and reflections together, is OEIS sequence A000207. Note that the sequence starts at 3.

OEIS gives a formula for this sequence as well:

a(n) = \frac{1}{2n}C_{n-2} + \frac{1}{4}C_{n/2-1} + \frac{1}{2} C_{\lceil (n+1)/2\rceil - 2} + \frac{1}{3} C_{n/3 - 1}

As before, Cx is zero when x is not an integer. This gives a(6) = 3, as expected.

The formula on the OEIS page is a little confusing since it uses C(n) to denote Cn−2 .

Related posts

[1] Our polygons do not need to be regular, but they do need to be convex.

Gluons, quarks, letters, and envelopes

Yesterday I wrote a couple of posts about a combinatorics question that lead to OEIS sequence A000255. That page has this intriguing line:

This comment derives from a family of recurrences found by Malin Sjodahl for a combinatorial problem for certain quark and gluon diagrams (Feb 27 2010)

I love how pulling on a thread can lead you to unexpected places. What in the world does my little permutation problem have to do with particle physics‽

I found Malin Sjödahl’s website, and apparently the citation above refers to this paper from 2009. The author’s site doesn’t list any papers from 2010. Maybe the paper was published electronically in 2009 and in print in 2010.

Counting tensors

The paper is mostly opaque to me since I don’t know particle physics, but at one point Sjödahl says

The problem of finding all such topologies is equivalent to the number of ways of mapping N elements to each other without mapping a single one to itself …

and says that the solution is

N! \sum_{i=0}^N \frac{(-1)^i}{i!} \to \frac{N!}{e}

Sjödahl is not counting physical permutations but the number possible tensors associated with gluons and quark interaction diagrams.

The right-hand side above is essentially the same as the asymptotic estimate for the function Q(n) in the previous post.

I didn’t find the recurrence that the OEIS comment alluded to. Perhaps I’m not looking at the same paper. Perhaps I’m not looking hard enough because I’m skimming a paper whose contents I don’t understand.

The Montmort letter problem

In more mathematical terminology, Sjödahl is counting the number of permutations of N objects with no fixed point, known as the number derangements of a set N objects.

If you divide by the number of possible permutations N! you have the probability that a random permutation moves every object.

This was historically known as the Montmort letter problem, named after Pierre-Remond Montmort who asked the following question. If N letters are randomly assigned to N envelopes, what is the probability that no letter ends up in the correct envelope?

The probability converges to 1/e as N approaches infinity. It approaches 1/e quickly, and so this is a good approximation even for moderate N. More on the rate of convergence in the next post.

Previous posts in this series

Example using a generating function to find asymptotic behavior

The previous post looked at how to compute Q(n), the number of permutations of 1, 2, 3, …, n + 1 that contain no consecutive integers. We found a way to numerically compute Q(n) but no analytic expression that would let us compute asymptotics.

The sequence Q(n) is sequence A000255 in OEIS, and OEIS gives the exponential generating function of Q:

\sum_{n=0}^\infty \frac{Q(n)}{n!} = \frac{\exp(-x)}{(1-x)^2}

We can use this function and Theorem 5.2.1 from [1] to get the asymptotic form of Q(n). According to that theorem, the coefficient of xn in our generating function is asymptotically the same as the coefficient of xn in the principle part at the singularity at 1. This principle part is

\frac{1}{e(1-x)^2} + \frac{1}{e(1-x)} = \frac{1}{e}\left(2 + 3x + 4x^2 + 5x^3 + \cdots\right)

and so the coefficient of xn is (n + 2)/e.

So

Q(n) \sim \frac{n + 2}{e} n!

and Q(n) / (n + 1)! approaches 1/e for large n, as conjectured in the previous post.

[1] Herbert S. Wilf. Generatingfunctionology. Second edition.

Permutations with no consecutive elements

I was working on something this week that made me wonder how many permutations break up all consecutive elements. For example, if we’re looking at permutations of abcde then acebd counts, but acdbe does not. I’d like to count the number of such permutations, and estimate for large N the number of permutations of N elements with no consecutive terms.

A brief search lead to OEIS sequence A000255 which gives a recursive formula for the answer. Before looking at that, let’s define some terms and do some brute force calculation to get a feel for the problem.

Brute force calculation

Given a positive integer n, define Q(n) to be the number of partitions p of 1, 2, 3, …, n + 1 such that for no k does p(k + 1) = p(k) + 1.

from itertools import permutations

def no_consec(perm):
    for k in range(len(perm) - 1):
        if perm[k+1] == perm[k] + 1:
            return False
    return True

def Q(n):
    count = 0
    for p in permutations(range(n+1)):
        if no_consec(p):
            count += 1
    return count

We could use this code to compute Q(n) for moderate-sized n, but the time required to compute Q(n) this way is proportional to n! and so that won’t scale well.

Recurrence relation

OEIS sequence A000255 gives the values of what we’re calling Q(n), and the first few values match the output of the code above. But OEIS does better, giving a recursive solution for Q:

Q(n) = n Q(n − 1) + (n − 1) Q(n − 2)

for n > 1 with initial conditions Q(0) = Q(1) = 1.

As with Fibonacci numbers, you could implement this easily as a recursive function

def Q2(n):
    if n in[0, 1]:
        return 1
    return n*Q2(n-1) + (n-1)*Q2(n-2)

but it is much faster to implement iteratively.

def Q3(n):
    q = [1]*(n+2)
    for k in range(2, n+1):
        q[k] = k*q[k-1] + (k-1)*q[k-2]
    return q[n]

You could make the recursive implementation faster with memoization, using lru_cache from functools. However, memoization does not prevent recursion; it just speeds up recursion using a cache, and you can still get an error saying maximum recursion depth exceeded.

Asymptotics

We can calculate Q(n) for large n, but we don’t have an analytic expression [1] that will let us find the asymptotic behavior. I intended to work that out in my next post.

Empirically, it seems Q(n) approaches (n + 1)!/e as n goes to infinity, but I don’t have a proof of that at the time of writing. If this is correct, this means that in the limit the probability of a permutation having no fixed point equals the probability of it having no consecutive values.

Update: Proof

***

[1] There is an analytic expression for Q(n) in terms of a hypergeometric function

Q(n) = (−1)n 2F0(2, −n; ; 1)

but this expression isn’t helpful in finding the asymptotic behavior.

Separable functions in different contexts

I was skimming through the book Mathematical Reflections [1] recently. He was discussing a set of generalizations [2] of the Star of David theorem from combinatorics.

\gcd\left(\binom{n - 1}{r - 1}, \binom{n}{r+1}, \binom{n+1}{r}\right) = \gcd\left(\binom{n-1}{r}, \binom{n+1}{r+1}, \binom{n}{r-1}\right)

The theorem is so named because if you draw a Star of David by connecting points in Pascal’s triangle then each side corresponds to the vertices of a triangle.

diagram illustrating the Star of David Theorem

One such theorem was the following.

\binom{n - \ell}{r - \ell} \binom{n - k}{r} \binom{n - k - \ell}{r - k} = \binom{n-k}{r-k} \binom{n - \ell}{r} \binom{n - k - \ell}{r - \ell}

This theorem also has a geometric interpretation, connecting vertices within Pascal’s triangle.

The authors point out that the binomial coefficient is a separable function of three variables, and that their generalized Star of David theorem is true for any separable function of three variables.

The binomial coefficient C(nk) is a function of two variables, but you can think of it as a function of three variables: n, k, and nk. That is

{n \choose k} = f(n) \, g(k) \, g(n-k)

where f(n) = n! and g(k) = 1/k!.

I was surprised to see the term separable function outside of a PDE context. My graduate work was in partial differential equations, and so when I hear separable function my mind goes to separation of variables as a technique for solving PDEs.

Coincidentally, I was looking a separable coordinate systems recently. These are coordinate systems in which the Helmholtz equation can be solved by separable function, i.e. a coordinate system in which the separation of variables technique will work. The Laplacian can take on very different forms in different coordinate systems, and if possible you’d like to choose a coordinate system in which a PDE you care about is separable.

Related posts

[1] Peter Hilton, Derek Holton, and Jean Pedersen. Mathematical Reflections. Springer, 1996.

[2] Hilton et al refer to a set of theorems as generalizations of the Star of David theorem, but these theorems are not strictly generalizations in the sense that the original theorem is clearly a special case of the generalized theorems. The theorems are related, and I imagine with more effort I could see how to prove the older theorem from the newer ones, but it’s not immediately obvious.

A variation on Rock, Paper, Scissors

Imagine in a game of Rock, Paper, Scissors one player is free to play as usual but the other is required to choose each option the same number of times. That is, in 3n rounds of the game, the disadvantaged player much choose Rock n times, Paper n times, and Scissors n times.

Obviously the unrestricted player would be expected to win more games, but how many more? At least one, because the unrestricted player knows what the restricted player will do on the last round.

If n is large, then in the early rounds of the game it makes little difference that one of the players is restricted. The restriction doesn’t give the unrestricted player much exploitable information. But in the later rounds of the game, the limitations on the restricted player’s moves increase the other players chances of winning more.

It turns out that the if the unrestricted player uses an optimal strategy, he can expect to win O(√n) more rounds than he loses. More precisely, the expected advantage approaches 1.4658√n as n grows. You can find a thorough analysis of the problem in this paper.

Related posts

Up-down permutations

An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example

1, 5, 3, 4, 2

is an up-down permutation of 1, 2, 3, 4, 5 because

1 < 5 > 3 < 4 > 2.

Up-down permutations are also called alternating permutations for obvious reasons. The number of up-down permutations of a set of size n is often denoted A(n) because these permutations were first studied by Désiré André. It is also sometimes denoted En, where the E alludes to Euler.

André found the generating function for A(n):

\sum_{n=0}^\infty A(n) \frac{x^n}{n!} = \sec x + \tan x.

You could read this as saying sec(x) + tan(x) is the exponential generating function for A(n) or as saying sec(x) + tan(x) is the ordinary generating function for

P(n) =\frac{A(n)}{n!}

where P(n) is the proportion of all permutations of the digits 1 through n which are alternating permutations.

It’s amazing that up-down permutations have anything to do with trig functions, but they do.

Knowing the (exponential) generating function for A(n) lets us use generating function techniques to prove various things about these numbers. For example, we can use the generating function to discover their asymptotic behavior.

The secant and tangent functions are well behaved at 0, but both have a singularity at π/2. This means the generating function above is not just a formal power series but an analytic power series with radius of convergence π/2, the distance from the center of the power series to the closest singularity. It follows that

P(n) = {\cal O}\left( \left( \frac{2}{\pi} \right)^n \right)

and so the proportion of permutations which are alternating permutations decreases exponentially for large n.

Related posts

Ruzsa distance

A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group.

Subset difference

Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A B is defined the set

A - B = \{a - b \mid a \in A, b \in B \}

As is customary with Abelian groups, we denote the group operation by + and a b means the group operation applied to a and the inverse of b.

For example, let G be the group of integers mod 10. Let A = {1, 5} and B = {3, 7}. Then A B is the set {2, 4, 8}. There are only three elements because 1 − 3 and 5 − 7 are both congruent to 8.

Ruzsa distance

The Ruzsa distance between two subsets of an Abelian group is defined by

d(A,B) = \log \frac{|A-B|}{|A|^{1/2}\, |B|^{1/2}}

where |S| denotes the number of elements in a set S.

The Ruzsa distance is not a metric, but it fails in an unusual way. the four axioms of a metric are

  1. d(x, x) = 0
  2. d(x, y) > 0 unless x = y
  3. d(x, y) = d(y, x)
  4. d(x, z) ≤ d(x, y) + d(y, z)

The first axiom is usually trivial, but it’s the only one that doesn’t hold for Ruzsa distance. In particular, the last axiom, the triangle inequality, does hold.

To see that the first axiom does not always hold, lets again let G be the integers mod 10 and let A = {1, 3}. Then A A is the set {0, 2, 8} and d(A, A) = log 3/2 > 0.

Sometimes d(A, A) does equal zero. If A = G then A A = A, and so d(A, A) = log 1 = 0.

Entropic Ruzsa distance

If we switch from subsets of G to random variables taking values in G we can define an analog of Ruzsa distance between random variables X and Y, the entropic Ruzsa distance

d_{\text{ent}}(X, Y) = H(X' + Y') - \frac{1}{2}H(X) - \frac{1}{2}H(Y)

where X′ and Y′ are independent copies of X and Y and H is Shannon entropy. For more on entropic Ruzsa distance, see this post by Terence Tao.

Note that if A and B are subsets of G, and X and Y are uniform random variables with support on A and B respectively, then the negative terms above correspond to the log of 1/|A|½ |B|½.  The H(X′ + Y′) term isn’t the log of |AB| though because for one thing its a sum rather than a difference. For another, the sum of uniform random variables may not be uniform: there may be more than one way to end up at a particular sum, and so sum values will have higher probability.

 

The 10th Dedekind number

The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be

M(9) = 286386577668298411128469151667598498812366.

The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, and M(3) = 20. But as n increases, M(n) increases rapidly, with M(9) being on the order of 1041.

Although computing the Dedekind numbers exactly is difficult—M(8) was computed in 1991 and M(9) now in 2023—there is an explicit formula for these numbers, and much is known about their asymptotic growth. This post speculates about what M(10) might be.

Write the number k in binary and let bik be its ith bit:

b_i^k=\left\lfloor\frac{k}{2^i}\right\rfloor - 2\left\lfloor\frac{k}{2^{i+1}}\right\rfloor

Then the nth Dedekind number is given by

M(n)=\sum_{k=1}^{2^{2^n}} \prod_{j=1}^{2^n-1} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)

and so

M(10)=\sum_{k=1}^{2^{1024}} \prod_{j=1}^{1023} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)

In principle, all you have to do to compute M(10) is evaluate the sum above. However, since this sum has more than 10308 terms, it would take a while.

What can we say about M(10) without computing it? The number of monotone Boolean functions of n variables is less than the total number of Boolean functions of n variables, which equals

2^{2^n}

That tells us M(10) < 1.8 × 10308.

There are more useful bounds. It has been proven that

{n\choose \lfloor n/2\rfloor}\le \log_2 M(n)\le {n\choose \lfloor n/2\rfloor}\left(1+O\left(\frac{\log n}{n}\right)\right)

This gives us a definite lower bound but not a definite upper bound. We know M(10) ≥ 2252 which is approximately 7.237 × 1075, but we don’t know what the big-O term is. All we know is that for sufficiently large n, this term is smaller than some multiple of log(n)/n. How large does n need to be and what is this constant? I don’t know. Maybe researchers in this area have some partial results.

Let’s take a guess at the upper bound by seeing what the big-O term was for M(9). Find k such that

\log_2 M(9) = \binom{9}{4}\left(1 + k \frac{\log 9}{9}\right)

We get

k = \left(\frac{\log_2M(9)}{126} - 1 \right)\frac{9}{\log 9} \approx 0.3809

and we can use this to guess that

\log_2 M(10) \stackrel{?}{=} \binom{10}{5}\left(1 + 0.3809 \frac{\log 10}{10}\right) \approx 274.1

which would imply M(10) = 3.253 × 1082.

So to recap, we know for certain that M(10) is between 7.237 × 1075 and 1.8 × 10308, and our guess based on the heuristic above is that M(10) = 3.253 × 1082.