Three things about dominoes

Here are three things about dominoes, two easy and one more advanced.


First, how many pieces are there in a set of dominoes? A domino corresponds to an unordered pair of numbers from 0 to n. The most popular form has n = 6, but there are variations with other values of n. You can show that the number of dominoes is

{ n+1\choose 2} + n + 1

This is because there are n+1 possible numbers (since blanks are a possibility) and each one is either a double or not. The number of ways to choose two distinct numbers is the binomial coefficient and the number of doubles is n+1.

Another way to look at this is that we are selecting two things from a set of n+1 things with replacement and so the number of possibilities is

\left({ n+1\choose 2}\right) = {n+2 \choose 2}

where the symbol on the left is Stanley’s symbol for selection with replacement.

In any case, there are 28 dominoes when n = 6, 55 when n = 9, and 91 when n = 12.

Magic squares

There are a couple ways to make a magic square of sorts from a set of dominoes. To read more about this, see this post.

Domino magic square


How many ways can you cover an m by n chess board with dominoes? The answer turns out to be

\sqrt{\prod_{k=1}^m \prod_{\ell=1}^n \left( 2\cos\left(\frac{\pi k}{m+1} \right) + 2i \cos\left(\frac{\pi \ell}{n+1} \right) \right)}

See this post for details.

Stirling numbers, including negative arguments

Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there.

Stirling numbers of the second kind

The Stirling number of the second kind

S_2(n,k) = \left\{ {n \atop k} \right\}

counts the number of ways to partition a set of n elements into k non-empty subsets. The notation on the left is easier to use inline, and the subscript reminds us that we’re talking about Stirling numbers of the second kind. The notation on the right suggests that we’re dealing with something analogous to binomial coefficients, and the curly braces suggest this thing might have something to do with counting sets.

Since the nth Bell number counts the total number of ways to partition a set of n elements into any number of non-empty subsets, we have

B_n = \sum_{k=1}^n \left\{ {n \atop k}\right\}

Another connection to Bell is that S2(n, k) is the sum of the coefficients in the partial exponential Bell polynomial Bn, k.

Stirling numbers of the first kind

The Stirling numbers of the first kind

S_1(n,k) = \left[ {n \atop k} \right]

count how many ways to partition a set into cycles rather than subsets. A cycle is a sort of ordered subset. The order of elements matters, but only a circular way. A cycle of size k is a way to place k items evenly around a circle, where two cycles are considered the same if you can rotate one into the other. So, for example, [1, 2, 3] and [2, 3, 1] represent the same cycle, but [1, 2, 3] and [1, 3, 2] represent different cycles.

Since a set with at least three elements can be arranged into multiple cycles, Stirling numbers of the first kind are greater than or equal to Stirling numbers of the second kind, given the same arguments.

Addition identities

We started out by saying Stirling numbers were like binomial coefficients, and here we show that they satisfy addition identities similar to binomial coefficients.

For binomial coefficients we have

{n \choose k} = {n - 1 \choose k} + {n-1 \choose k-1}.

To see this, imagine we start with the set of the numbers 1 through n. How many ways can we select a subset of k items? We have selections that exclude the number 1 and selections that include the number 1. These correspond to the two terms on the right side of the identity.

The analogous identities for Stirling numbers are

\left\{{n \atop k} \right\}= k \left\{ {n-1 \atop k} \right\} + \left\{ {n-1 \atop k-1} \right\}

\left[ {n \atop k} \right]= (n-1) \left[ {n-1 \atop k} \right] + \left[ {n-1 \atop k-1} \right]
The combinatorial proofs of these identities are similar to the argument above for binomial coefficients. If you want to partition the numbers 1 through n into k subsets (or cycles), either 1 is in a subset (cycle) by itself or not.

More general arguments

Everything above has implicitly assumed n and k were positive, or at least non-negative, numbers. Let’s look first at how we might handle zero arguments, then negative arguments.

It works out best if we define S1(0, k) and S2(0, k) to be 1 if k is 0 and zero otherwise. Also we define S1(n, 0) and S2(n, 0) to be 1 if n is 0 and zero otherwise. (See Concrete Mathematics.)

When either n or k is zero the combinatoric interpretation is strained. If we let either be negative, there is no combinatorial interpretation, though it can still be useful to consider negative arguments, much like one does with binomial coefficients.

We can take the addition theorems above, which are theorems for non-negative arguments, and treat them as definitions for negative arguments. When we do, we get the following beautiful dual relationship between Stirling numbers of the first and second kinds:

\left\{{n \atop k} \right\}= \left[ {-k \atop -n} \right]

Related posts

Bell numbers

The nth Bell number is the number of ways to partition a set of n labeled items. It’s also equal to the following sum.

B_n = \frac{1}{e}\sum_{k=0}^\infty \frac{k^n}{k!}

You may have to look at that sum twice to see it correctly. It looks a lot like the sum for en except the roles of k and n are reversed in the numerator.

The sum reminded me of the equation

\frac{d}{de}e^x = x e^{x-1}

that I blogged about a few weeks ago. It’s correct, but you may have to stare at it a little while to see why.

Incidentally, the nth Bell number is the nth moment of a Poisson random variable with mean 1.

There’s also a connection with Bell polynomials. The nth Bell number is the sum of the coefficients of the nth complete Bell polynomial. The nth Bell polynomial is sum over k of the partial exponential Bell polynomials Bn,k. The partial (exponential) Bell polynomials are defined here.

Computing Bell numbers

You can compute Bell numbers in Python with sympy.bell and in Mathematical with BellB. You can compute Bell numbers by hand, or write your own function in a language that doesn’t provide one, with the recursion relation B0 = 1 and

B_n = \sum_{k=0}^{n-1}{ n-1 \choose k}B_k

for n > 0. Bell numbers grow very quickly, faster than exponential, so you’ll need extended precision integers if you want to compute very many Bell numbers.

For theoretical purposes, it is sometimes helpful to work with the exponential generating function of the Bell numbers. The nth Bell number is n! times the coefficient of xn in exp(exp(x) – 1).

\sum_{n=0}^\infty \frac{B_n}{n!} x^n = \exp(\exp(x) -1)

An impractical but amusing way to compute Bell numbers would be via simulation, finding the expected value of nth powers of random draws from a Poisson distribution with mean 1.

Combinatorics, just beyond the basics

Most basic combinatorial problems can be solved in terms of multiplication, permutations, and combinations. The next step beyond the basics, in my experience, is counting selections with replacement. Often when I run into a problem that is not quite transparent, it boils down to this.

Examples of selection with replacement

Here are three problems that reduce to counting selections with replacement.

Looking ahead in an experiment

For example, suppose you’re running an experiment in which you randomize to n different treatments and you want to know how many ways the next k subjects can be assigned to treatments. So if you had treatments A, B, and C, and five subjects, you could assign all five to A, or four to A and one to B, etc. for a total of 21 possibilities. Your choosing 5 things from a set of 3, with replacement because you can (and will) assign the same treatment more than once.

Partial derivatives

For another example, if you’re taking the kth order partial derivatives of a function of n variables, you’re choosing k things (variables to differentiate with respect to) from a set of n (the variables). Equality of mixed partials for smooth functions says that all that matters is how many times you differentiate with respect to each variable. Order doesn’t matter: differentiating with respect to x and then with respect to y gives the same result as taking the derivatives in the opposite order, as long as the function you’re differentiating has enough derivatives. I wrote about this example here.

Sums over even terms

I recently had an expression come up that required summing over n distinct variables, each with a non-negative even value, and summing to 2k. How many ways can that be done? As many as dividing all the variable values in half and asking that they sum to k. Here the thing being chosen the variable, and since the indexes sum to k, I have a total of k choices to make, with replacement since I can chose a variable more than once. So again I’m choosing k things with replacement from a set of size n.


I wrote up a set of notes on sampling with replacement that I won’t duplicate here, but in a nutshell:

\left( {n \choose k} \right) = {n + k - 1 \choose k}

The symbol on the left is Richard Stanley’s suggested notation for the number of ways to select k things with replacement from a set of size n. It turns out that this equals the expression on the right side. The derivation isn’t too long, but it’s not completely obvious either. See the aforementioned notes for details.

I started by saying basic combinatorial problems can be reduced to multiplication, permutations, and combinations. Sampling with replacement can be reduced to a combination, the right side of the equation above, but with non-obvious arguments. Hence the need to introduce a new symbol, the one on the right, that maps directly to problem statements.

Enumerating possibilities

Sometimes you just need to count how many ways one can select k things with replacement from a set of size n. But often you need to actually enumerate the possibilities, say to loop over them in software.

In conducting a clinical trial, you may want to enumerate all the ways pending data could turn out. If you’re going to act the same way, no matter how the pending data work out, there’s no need to wait for the missing data before proceeding. This is something I did many times when I worked at MD Anderson Cancer Center.

When you evaluate a multivariate Taylor series, you need to carry out a sum that has a term for corresponding to each partial derivative. You could naively sum over all possible derivatives, not taking into account equality of mixed partials, but that could be very inefficient, as described here.

The example above summing over even partitions of an even number also comes out of a problem with multivariate Taylor series, one in which odd terms drop out by symmetry.

To find algorithms for enumerating selections with replacement, see Knuth’s TAOCP, volume 4, Fascicle 3.

Related posts

Inverting a power series

Inverting a power series

As a student, one of the things that seemed curious to me about power series was that a function might have a simple power series, but its inverse could be much more complicated. For example, the coefficients in the series for arctangent have a very simple pattern

\arctan(x) = x - \frac{1}{3}x^3 + \frac{1}{5} x^5 - \frac{1}{7}x^7 + \cdots = \sum_{n=0}^\infty \frac{(-1)^{n}x^n}{2n+1}

but the coefficients in the series for tangent are mysterious.

\tan x = x + \frac{1}{3}x^3 + \frac{2}{15} x^5 + \frac{17}{315} x^7 + \frac{62}{2835} x^9 + \cdots

There’s no obvious pattern to the coefficients. It is possible to write the sum in closed form, but this requires introducing the Bernoulli numbers which are only slightly less mysterious than the power series for tangent.

\tan x = \sum_{n=1}^\infty (-1)^{n-1} \frac{4^n(4^n-1)}{(2n)!} B_{2n}\, x^{2n-1}

To a calculus student, this is bad news: a simple, familiar function has a complicated power series. But this is good news for combinatorics. Reading the equation from right to left, it says a complicated sequence has a simple generating function!

Computing the coefficients

The example above suggests that inverting a power series, i.e. starting with the power series for a function and computing the power series for its inverse, is going to be complicated. We introduce exponential Bell polynomials to encapsulate some of the complexity, analogous to introducing Bernoulli numbers above.

Assume the function we want to invert, f(x), satisfies f(0) = 0 and its derivative satisfies f‘(0) ≠ 0. Assume f and its inverse g have the series representations

f(x) = \sum_{k=0}^\infty f_k \frac{x^k}{k!}


g(y) = \sum_{k=0}^\infty g_k \frac{y^k}{k!}

Note that this isn’t the power series per se. The coefficients here are k! times the ordinary power series coefficients. (Compare with ordinary and exponential generating functions explained here.)

Then you can compute the series coefficients for g from the coefficients for f as follows. We have g1 = 1/f1 (things start out easy!) and for n ≥ 2,

g_n = \frac{1}{f_1^n}\sum_{k=1}^{n-1} (-1)^k n^{\bar{k}} B_{n-1, k}(\hat{f_1}, \hat{f_2}, \ldots, \hat{f}_{n-k})

where the B‘s are exponential Bell polynomials,

\hat{f}_k = \frac{f_{k+1}}{(k+1)f_1}


n^{\bar{k}} = n(n+1)(n+2)\cdots(n+k-1)

is the kth rising power of n. (More on rising and falling powers here.)


Let’s do an example in Python. Suppose we want a series for the inverse of the gamma function near 2. The equations above assume we’re working in a neighborhood of 0 and that our function is 0 at 0. So we will invert the series for f(x) = Γ(x+2) – 1 and then adjust the result to find the inverse of Γ(x).

import numpy as np
from scipy.special import gamma
from sympy import factorial, bell 

def rising_power(a, k):
    return gamma(a+k)/gamma(a)

# Taylor series coefficients for gamma centered at 2
gamma_taylor_coeff = [
N = len(gamma_taylor_coeff)

f_exp_coeff = np.zeros(N)
for i in range(1, N):
    f_exp_coeff[i] = gamma_taylor_coeff[i]*factorial(i)

# Verify the theoretical requirements on f
assert( f_exp_coeff[0] == 0 )
assert( f_exp_coeff[1] != 0 )

f_hat = np.zeros(N-1)
for k in range(1, N-1):
    f_hat[k] = f_exp_coeff[k+1] / ((k+1)*f_exp_coeff[1])
g_exp_coeff = np.zeros(N)
g_exp_coeff[1] = 1/f_exp_coeff[1]

for n in range(2, N):
    s = 0
    for k in range(1, n):
        # Note that the last argument to bell must be a list,
        # and not a NumPy array.
        b = bell(n-1, k, list(f_hat[1:(n-k+1)]))
        s += (-1)**k * rising_power(n, k) * b
    g_exp_coeff[n] = s / f_exp_coeff[1]**n

def g(x):
    return sum([
        x**i * g_exp_coeff[i] / factorial(i) for i in range(1, N)

def gamma_inverse(x):
    return 2. + g(x-1.)

# Verify you end up where you started when you do a round trip
for x in [1.9, 2.0, 2.1]:
    print( gamma_inverse(gamma(x)) )



Bell polynomials: partial, ordinary, and exponential

Bell polynomials come up naturally in a variety of contexts: combinatorics, analysis, statistics, etc. Unfortunately, the variations on Bell polynomials are confusingly named.

Analogy with differential equations

There are Bell polynomials of one variable and Bell polynomials of several variables. The latter are sometimes called “partial” Bell polynomials. In differential equations, “ordinary” means univariate (ODEs) and “partial” means multivariate (PDEs). So if “partial” Bell polynomials are the multivariate form, you might assume that “ordinary” Bell polynomials are the univariate form. Unfortunately that’s not the case.

It seems that the “partial” in the context of Bell polynomials was taken from differential equations, but the term “ordinary” was not. Ordinary and partial are not opposites in the context of Bell polynomials. A Bell polynomial can be ordinary and partial!

Analogy with generating functions

“Ordinary” in this context is the opposite of “exponential,” not the opposite of “partial.” The analogy is with generating functions, not differential equations. The ordinary generating function of a sequence multiplies the nth term by xn and sums. The exponential generating function of a sequence multiplies the nth term by xn/n! and sums. For example, the ordinary generating function for the Fibonacci numbers is

F(x) = \sum_{n=0}^\infty F_n x^n = \frac{x}{1 - x - x^2}

while the exponential generating function is

f(x) = \sum_{n=0}^\infty F_n \frac{x^n}{n!} = \frac{\exp(r_+x) - \exp(r_-x)}{\sqrt{5}}


 r_{\pm} = \frac{1 \pm \sqrt{5}}{2}

The definitions of exponential and ordinary Bell polynomials are given below, but the difference between the two that I wish to point out is that the former divides the kth polynomial argument by k! while the latter does not. They also differ by a scaling factor. The exponential form of Bn,k has a factor of n! where the ordinary form has a factor of k!.

“Ordinary” as a technical term

Based on the colloquial meaning of “ordinary” you might assume that it is the default. And indeed that is the case with generating functions. Without further qualification, generating function means ordinary generating function. You’ll primarily see explicit references to ordinary generating functions in a context where it’s necessary to distinguish them from some other kind of generating function. Usually the word “ordinary” will be written in parentheses to emphasize that an otherwise implicit assumption is being made explicit. In short, “ordinary” and “customary” correspond in the context of generating functions.

But in the context of Bell polynomials, it seems that the exponential form is more customary. At least in some sources, an unqualified reference to Bell polynomials refers to the exponential variety. That’s the case in SymPy where the function bell() implements exponential Bell polynomials and there is no function to compute ordinary Bell polynomials.


Now for the actual definitions. We can make our definitions more concise if we first define multi-index notation. A multi-index

\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)

is a set of n non-negative integers. The norm of a multi-index is defined by

|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n

and the factorial of a multi-index is defined by

\alpha! = \prod_{k=1}^n \alpha_k!

If x = (x1, x2, …, xn) is an n-tuple of real numbers then x raised to the power of a multi-index α is defined by

x^\alpha = \prod_{k=1}^n x_k^{\alpha_k}

The ordinary Bell polynomial Bn,k is

B_{n,k}(x) = \sum \frac{k!}{\alpha!} x^\alpha

where x is a (nk +1)-tuple and α ranges over all multi-indices subject to two constraints: the norm of α is k and

\sum_{i=1}^{n-k+1} i \alpha_i = n

The exponential Bell polynomials can then be defined in terms of the ordinary bell polynomials by

B^\mathrm{exp}_{n,k}(x_1, x_2, \ldots, x_{n-k+1}) = \frac{n!}{k!} B^\mathrm{ord}_{n,k}\left( \frac{x_1}{1!}, \frac{x_2}{2!}, \cdots, \frac{x_{n-k+1}}{(n-k+1)!} \right )

Related posts

Moment generating functions and connections to other things

This post relates moment generating functions to the Laplace transform and to exponential generating functions. It also brings in connections to the z-transform and the Fourier transform.

Thanks to Brian Borchers who suggested the subject of this post in a comment on a previous post on transforms and convolutions.

Moment generating functions

The moment generating function (MGF) of a random variable X is defined as the expected value of exp(tX). By the so-called rule of the unconscious statistician we have

M_X(t) \equiv \mathrm{E}[e^{tX}] = \int_{-\infty}^\infty e^{tx} f_X(x)\, dx

where fX is the probability density function of the random variable X. The function MX is called the moment generating function of X because it’s nth derivative, evaluated at 0, gives the nth moment of X, i.e. the expected value of Xn.

Laplace transforms

If we flip the sign on t in the integral above, we have the two-sided Laplace transform of fX. That is, the moment generating function of X at t is the two-sided Laplace transform of fX at –t. If the density function is zero for negative values, then the two-sided Laplace transform reduces to the more common (one-sided) Laplace transform.

Exponential generating functions

Since the derivatives of MX at zero are the moments of X, the power series for MX is the exponential generating function for the moments. We have

M_X(t) = m_0 + m_1t + \frac{m_2}{2}t^2 + \frac{m_3}{3!} t^3 + \cdots

where mn is the nth moment of X.

Other generating functions

This terminology needs a little explanation since we’re using “generating function” two or three different ways. The “moment generating function” is the function defined above and only appears in probability. In combinatorics, the (ordinary) generating function of a sequence is the power series whose coefficient of xn is the nth term of the sequence. The exponential generating function is similar, except that each term is divided by n!. This is called the exponential generating series because it looks like the power series for the exponential function. Indeed, the exponential function is the exponential generating function for the sequence of all 1’s.

The equation above shows that MX is the exponential generating function for mn and the ordinary generating function for mn/n!.

If a random variable Y is defined on the integers, then the (ordinary) generating function for the sequence Prob(Yn) is called, naturally enough, the probability generating function for Y.

The z-transform of a sequence, common in electrical engineering, is the (ordinary) generating function of the sequence, but with x replaced with 1/z.

Characteristic functions

The characteristic function of a random variable is a variation on the moment generating function. Rather than use the expected value of tX, it uses the expected value of itX. This means the characteristic function of a random variable is the Fourier transform of its density function.

Characteristic functions are easier to work with than moment generating functions. We haven’t talked about when moment generating functions exist, but it’s clear from the integral above that the right tail of the density has to go to zero faster than ex, which isn’t the case for fat-tailed distributions. That’s not a problem for the characteristic function because the Fourier transform exists for any density function. This is another example of how complex variables simplify problems.

Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says

2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives is a formula for something well known, 2n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

    from sympy.ntheory import mobius, divisors

    def num_primitive(n):
        return sum( [mobius(n/d)*2**d for d in divisors(n)] )

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

    from sympy.ntheory import factorint

    def mobius(n):
        exponents = factorint(n).values()
        lenexp = len(exponents)
        m = 0 if lenexp == 0 else max(exponents)
        return 0 if m > 1 else (-1)**lenexp

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.

Related: Applied number theory

Making change

How many ways can you make change for a dollar? This post points to two approaches to the problem, one computational and one analytic.

SICP gives a Scheme program to solve the problem:

(define (count-change amount) (cc amount 5))

(define (cc amount kinds-of-coins)
    (cond ((= amount 0) 1)
    ((or (< amount 0) (= kinds-of-coins 0)) 0)
    (else (+ (cc amount
                 (- kinds-of-coins 1))
             (cc (- amount

(define (first-denomination kinds-of-coins)
    (cond ((= kinds-of-coins 1) 1)
          ((= kinds-of-coins 2) 5)
          ((= kinds-of-coins 3) 10)
          ((= kinds-of-coins 4) 25)
          ((= kinds-of-coins 5) 50)))

Concrete Mathematics explains that the number of ways to make change for an amount of n cents is the coefficient of zn in the power series for the following:

\frac{1}{(1 - z)(1 - z^5)(1 - z^{10})(1 - z^{25})(1 - z^{50})}

Later on the book gives a more explicit but complicated formula for the coefficients.

Both show that there are 292 ways to make change for a dollar.

Counting magic squares

How many k × k magic squares are possible? If you start from a liberal definition of magic square, there’s an elegant result. For the purposes of this post, a magic square is a square arrangement of non-negative numbers such that the rows and columns all sum to the same non-negative number m called the magic constant. Note that this allows the possibility that numbers will be repeated, and this places no restriction on the diagonals.

With this admittedly non-standard definition, the number of k × k magic squares with magic constant m is always a polynomial in m of degree no more than (k – 1)2. For k = 3, the result is

(m + 1)(m + 2)(m2 + 3m + 4)/8

There is no general formula for all k, but there is an algorithm for finding a formula for each value of k.

Source: The Concrete Tetrahedron

Update: I had reported the polynomial degree as k + 1, but looking back at Concrete Tetrahedron, that book lists the order as (k + 1)2. However, the paper cited in the comments lists the exponent as (k – 1)2, which I believe is correct.

Related posts: