Sampling with replacement until you’ve seen everything

Suppose you have a standard deck of 52 cards. You pull out a card, put it back in the deck, shuffle, and pull out another card. How long would you expect to do this until you’ve seen every card?

Here’s a variation on the same problem. Suppose you’re a park ranger keeping data on tagged wolves in a national park. Each day you catch a wolf at random, collect some data, and release it. If there are 100 wolves in the park, how long until you’ve seen every wolf at least once?

We’ll solve this problem via simulation, then exactly.


The Python code to solve the problem via simulation is straight forward. Here we’ll simulate our ranger catching and releasing one of 100 wolves. We’ll repeat our simulation five times to get an idea how much the results vary.

    import numpy as np
    from numpy.random import default_rng

    rng = default_rng()
    num_runs = 5
    N = 100 # number of unique items

    for run in range(num_runs):
        tally = np.zeros(N,dtype=int)
        trials = 0
        while min(tally) == 0:
            tally[rng.integers(N)] += 1
            trials += 1

When I ran this, I got 846, 842, 676, 398, and 420.

Exact solution

Suppose we have a desk of N cards, and we sample the deck with replacement M times. We can select the first card N ways. Ditto for the second, third, and all the rest of the cards. so there are NM possible sequences of card draws.

How many of these sequences include each card at least once? To answer the question, we look at Richard Stanley’s twelvefold way. We’re looking for the number of ways to allocate M balls into N urns such that each urn contains at least one ball. The answer is

N! S(M, N)

where S(M, N) is the Stirling number of the second kind with parameters M and N [1].

So the probability of having seen each card at least once after selecting M cards randomly with replacement is

N! S(M, N) / NM.

Computing Stirling numbers

We’ve reduced our problem to the problem of computing Stirling numbers (of the second kind), so how do we do that?

We can compute Stirling numbers in terms of binomial coefficients as follows.

S(n,k) = \frac{1}{k!}\sum_{i = 0}^k (-1)^i {k \choose i} (k-i)^n

Now we have a practical problem if we want to use this formula for larger values of n and k. If we work with floating point numbers, we’re likely to run into overflow. This is the perennial with probability calculations. You may need to compute a moderate-sized number as the ratio of two enormous numbers. Even though the final result is between 0 and 1, the intermediate results may be too large to store in a floating point number. And even if we don’t overflow, we may lose precision because we’re working with an alternating sum.

The usual way to deal with overflow is to work on a log scale. But that won’t work for Stirling numbers because we’re adding terms together, not multiplying them.

One way to solve our problem—not the most efficient way but a way that will work—is to work with integers. This is easy in Python because integers by default can be arbitrarily large.

There are two ways to compute binomial coefficients in Python: scipy.special.binom and math.comb. The former uses floating point operations and the latter uses integers, so we need to use the latter.

We can compute the numerator of our probability with

    from math import comb
    def k_factorial_stirling(n, k):
        return sum((-1)**i * comb(k, i)*(k-i)**n for i in range(k+1))    

Then our probability is

    k_factorial_stirling(M, N) / N**M

If we compute the probability that our ranger has seen all 100 wolves after catching and releasing 500 times, we get 0.5116. So the median number of catch and release cycles is somewhere around 500, which is consistent with our simulation above.

Note that the numerator and denominator in our calculation are both on the order of 101000 while the largest floating point number is on the order of 10308, so we would have overflowed if we’d used binom instead of comb.

Related posts

[1] It’s unfortunate that these were called the “second kind” because they’re the kind that come up most often in application.

Image: “Fastening a Collar” by USFWS Mountain Prairie is licensed under CC BY 2.0 .

Self-Orthogonal Latin Squares

The other day I wrote about orthogonal Latin squares. Two Latin squares are orthogonal if the list of pairs of corresponding elements in the two squares contains no repeats.

A self-orthogonal Latin square (SOLS) is a Latin square that is orthogonal to its transpose.

Here’s an example of a self-orthogonal Latin square:

    1 7 6 5 4 3 2
    3 2 1 7 6 5 4
    5 4 3 2 1 7 6
    7 6 5 4 3 2 1
    2 1 7 6 5 4 3
    4 3 2 1 7 6 5
    6 5 4 3 2 1 7

To see that this Latin square is orthogonal to itself, we’ll concatenate the square and its transpose.

    11 73 65 57 42 34 26 
    37 22 14 76 61 53 45 
    56 41 33 25 17 72 64 
    75 67 52 44 36 21 13 
    24 16 71 63 55 47 32 
    43 35 27 12 74 66 51 
    62 54 46 31 23 15 77

Each pair of digits is unique.

Magic squares

As we discussed in the earlier post, you can make a magic square out of a pair of orthogonal Latin squares. If we interpret the pairs of digits above as base 10 numbers, we have a magic square because each row, column, and diagonal has the digits 1 through 7 in the 1s place and the same set of digits in the 10s place.

Typically an n × n magic square is filled with the numbers 1 through n². We can make such a magic square with a few adjustments.

First, we subtract 1 from every element of our original square before we combine it with its transpose. This gives us the following.

    00 62 54 46 31 23 15 
    26 11 03 65 50 42 34 
    45 30 22 14 06 61 53 
    64 56 41 33 25 10 02 
    13 05 60 52 44 36 21 
    32 24 16 01 63 55 40 
    51 43 35 20 12 04 66 

If we interpret the elements above as base 7 numbers, then the entries are the numbers 0 through 66seven which equals 48ten. If we add 1 to every entry we get all the numbers from 1 through 100seven = 49ten. Written in base 10, this is the following.

     1 45 40 35 23 18 13 
    21  9  4 48 36 31 26 
    34 22 17 12  7 44 39 
    47 42 30 25 20  8  3 
    11  6 43 38 33 28 16 
    24 19 14  2 46 41 29 
    37 32 27 15 10  5 49 

Possible sizes

It’s surprising that orthogonal Latin squares exist as often as they do. Self-orthogonal Latin squares are more rare, and yet they exist for all sizes except n = 2, 3, or 6. (Source: The CRC Handbook of Combinatorial Designs.)

It’s hard to prove that self-orthogonal Latin squares exist, but it’s easy to verify the sizes where they don’t exist. There is no self-orthogonal Latin square of size 6 because there is no pair of orthogonal Latin squares of size 6. The latter is a hard theorem to prove, but given it, the former is trivial.

There are orthogonal Latin squares of size 3, but no self-orthogonal Latin squares of size 3. And it’s not hard to see why. Suppose there were such a square. Without loss of generality we can label the top row with 1, 2, and 3. Let x be the entry directly below 1.

    1 2 3
    x * *
    * * *

What could x be? Not 1, because elements in a column of a Latin square are unique. It can’t be 2 either, because otherwise when you join the square and its transpose would have two 22s. So x must be 3, and the element below x must be 2. But then when you join the square and its transpose you have two 32s. There’s not enough wiggle room for a 3 × 3 Latin square to be self-orthogonal.

Greco-Latin squares and magic squares

Suppose you create an n × n Latin square filled with the first n letters of the Roman alphabet. This means that each letter appears exactly once in each row and in each column.

You could repeat the same exercise only using the Greek alphabet.

Is it possible to find two n × n Latin squares, one filled with Roman letters and the other filled with Greek letters, so that when you combine corresponding entries, each combination of Roman and Greek letters appears exactly once? If so, the combination is called a Greco-Latin square.

Whether Greco-Latin squares of size n exist depends on n. But before we explore that, let’s give an example.

Here are two Latin squares, one filled with the first four Roman letters and the other filled with the first four Greek letters.

    A B C D   α β γ δ
    D C B A   γ δ α β
    B A D C   δ γ β α
    C D A B   β α δ γ

If we concatenate the two matrices, we get

    Aα Bβ Cγ Dδ   
    Dγ Cδ Bα Aβ   
    Bδ Aγ Dβ Cα   
    Cβ Dα Aδ Bγ   

and each of the two-letter entries is unique. So we’ve shown that a Greco-Latin square exists for n = 4.

Mutually Orthogonal Latin Squares (MOLS)

The more modern name for Greco-Latin squares is “mutually orthogonal Latin squares,” abbreviated MOLS. We say two Latin squares M and N are mutually orthogonal if the list of pairs (Mij, Nij,) contains no duplicates.

Euler’s work

Euler showed how to construct Greco-Latin squares when n is odd and when n is a multiple of 4. He conjectured that no Greco-Latin squares exist when n = 4k + 2.

His conjecture was incorrect. For example, there is a Greco-Latin square of size 10. And in fact, Greco-Latin squares exist for all n except n = 2 and n = 6.

Magic squares

Suppose you have two orthogonal Latin squares M and N of size n, and suppose that the diagonal elements of both squares contain no duplicates.

Use the numbers 0 through n-1 rather than Greek or Latin letters to fill the squares and interpret the Latin squares as matrices. Then the matrix

nM + N

is a magic square. This is equivalent to taking the corresponding Greco-Latin square and interpreting the entries as base n numbers.

For example, using the Latin squares above, replace A and α with 0, B and β with 1, C and γ with 2, and D and γ with 3. Then the Greco-Roman square

    Aα Bβ Cγ Dδ   
    Dγ Cδ Bα Aβ   
    Bδ Aγ Dβ Cα   
    Cβ Dα Aδ Bγ   


    00 11 22 33   
    32 23 10 01   
    13 02 31 20   
    21 30 03 12   

with the entries being base 4 numbers. The rows and columns clearly have the same sum because they all have the same set of numbers in the 1s place and in the 4s place. Written in base 10, the magic square above is

     0  5 10 15   
    14 11  4  1   
     7  2 13  8   
     9 12  3  6   

It’s conventional for magic squares to be filled with consecutive numbers starting with 1, so you could add 1 to every entry above if you’d like.

Related posts

Latin squares and 3D chess

In a n×n Latin square, each of the numbers 1 through n appears exactly once in each row and column. For example, the 5 × 5 square below is a Latin square.

12345, 21534, 34251, 45123, 53412

If we placed a rook on each square numbered 1, the rooks would not attack each other since no two rooks would be in the same row or the same column. The same would be true if we moved all the rooks to squares numbered 2, or 3, etc.

Now imagine a n×n×n chess cube, a stack of n chessboards, each board being n×n. For example, we could have a stack of eight standard 8×8 boards.

In our 3D chess cube, rooks can move any number of squares in either the x or y direction within a board, and they can also move vertically in the z direction.

Put a rook on every square of the bottom board. Then take the rooks on squares numbered 2 and move them to the second level. Take the rooks on squares numbered 3 and move them to the third level, and so forth.

Now we have n×n rooks, and none is in a position to attack the other. To see this, pick a particular level k. None of the rooks on level k are attacking each other because level k is a Latin square. And none of the rooks can attack vertically because we started with all the rooks on the bottom level and lifted them up by various amounts; there’s only one rook in each vertical column.

Next let’s suppose we have n×n rooks arranged in 3D so that none is attacking any of the others. Label the rooks on level k with a k. Now push all the rooks straight down vertically to the first level. There can only be one rook on each square because no rooks were attacking each other vertically.

Number each square by the number of its rook. I claim the result is a Latin square. There can only be one k in each row and column because all the ks started out on level k, and none were attacking each other in the x or y direction.

Related posts

How many Latin squares are there?

12345, 21534, 34251, 45123, 53412

A Latin square is an n × n grid with a number from 1 to n in each cell, such that no number appears twice in a row or twice in a column.

It’s not obvious that Latin squares exist for all n, but they do, and in fact there are a lot of them. The exact number is known only for n ≤ 11. See [1] for the known values. There are upper and lower bounds for all n, and this post will look at how good the bounds are.

A particularly simple result is that number of Latin squares of size n is bounded below by the superfactorial of n [2]. That is, if Ln is the number of Latin squares of size n and the superfactorial of n is defined by

S(n) = 1! × 2! × 3! × … × n!



A reduced Latin square is a Latin square in which the elements of the first row and first column are in numerical order. The image at the top of the post is a reduced Latin square. If Rn is the number of reduced Latin squares of size n then

Ln = n! (n-1)! Rn

and so we can divide the lower bound on Sn by n! (n-1)! to get a lower bound on Rn:


Ronald Alter [2] gives the following upper bound on Rn:

R_n \leq \prod_{k=1}^{n-1} (n-k)^{k-1}(n-k)!

Here’s now the lower bound, exact value, and upper bound compare for n up to 6.

    | n | lower | exact |   upper |
    | 1 |     1 |     1 |       1 |
    | 2 |     1 |     1 |       1 |
    | 3 |     1 |     1 |       2 |
    | 4 |     2 |     4 |      24 |
    | 5 |    12 |    56 |    3456 |
    | 6 |   288 |  9408 | 9553280 |

The numbers get very big for larger n. so let’s switch over to a log scale.

Here’s a plot corresponding to the logarithms of the values above, including all known values of Rn.

The lower and upper bounds are remarkably symmetric about the exact value, which suggests that their average would make a good estimate. Let’s look at a plot.

Indeed, the average of the logs of the bounds is very close to the log of the actual value. This says the number of reduced Latin squares of size n is approximately the geometric mean of its upper and lower bounds, at least for n up to 11.

We can factor S(n-2) out of the upper bound on Rn when computing the geometric mean, and this gives us the approximation

R_n \approx S(n-2)\sqrt{(n-1)!} \,\,\prod_{k=1}^{n-1} (n-k)^{(k-1)/2}


[1] OEIS A000315

[2] Ronald Alter. The American Mathematical Monthly. Vol. 82, No. 6 (Jun. – Jul., 1975), pp. 632-634

Change of basis and Stirling numbers

Polynomials form a vector space—the sum of two polynomials is a polynomial etc.—and the most natural basis for this vector space is powers of x:

1, x, x², x³, …

But the power basis is not the only possible basis, and often not the most useful basis in application.

Falling powers

In some applications the falling powers of x are a more useful basis. For positive integers n, the nth falling power of x is defined to be

x^{\underbar{\small{\emph{n}}}) = x(x-1)(x-2)\cdots(x-n+1)

Falling powers come up in combinatorics, in the calculus of finite differences, and in hypergeometric functions.

Change of basis

Since we have two bases for the vector space of polynomials, we can ask about the matrices that represent the change of basis from one to the other, and here’s where we see an interesting connection.

The entries of these matrices are numbers that come up in other applications, namely the Stirling numbers. You can think of Stirling numbers as variations on binomial coefficients. More on Stirling numbers here.

In summation notation, we have

\begin{align*} x^{\underbar{\small{\emph{n}}}} &= \sum_{k=0}^n S_1(n,k)x^{\text{\small{\emph{k}}}} \\ x^{\text{\small{\emph{n}}}} &= \sum_{k=0}^n S_2(n,k)x^{\underbar{\small\emph{n}}} \\ \end{align*}

where the S1 are the (signed) Stirling numbers of the 1st kind, and the S2 are the Stirling numbers of the 2nd kind.

(There are two conventions for defining Stirling numbers of the 1st kind, differing by a factor of (-1)n-k.)

Matrix form

This means the (ij)th element of matrix representing the change of basis from the power basis to the falling power basis is S1(i, j) and the (i, j)th entry of the matrix for the opposite change of basis is S2(i, j). These are lower triangular matrices because S1(i, j) and S2(i, j) are zero for j > i.

These are infinite matrices since there’s no limit to the degree of a polynomial. But if we limit our attention to polynomials of degree less than m, we take the upper left m by m submatrix of the infinite matrix. For example, if we look at polynomials of degree 4 or less, we have

\begin{bmatrix} x^\underbar{\tiny{0}} \\ x^\underbar{\tiny{1}} \\ x^\underbar{\tiny{2}} \\ x^\underbar{\tiny{3}} \\ x^\underbar{\tiny{4}} \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & -3 & 1 & 0 \\ 0 & -6 & 11 & -6 & 1 \\ \end{bmatrix} \begin{bmatrix} x^\text{\tiny{0}} \\ x^\text{\tiny{1}} \\ x^\text{\tiny{2}} \\ x^\text{\tiny{3}} \\ x^\text{\tiny{4}} \\ \end{bmatrix}

to convert from powers to falling powers, and

\begin{bmatrix} x^\text{\tiny{0}} \\ x^\text{\tiny{1}} \\ x^\text{\tiny{2}} \\ x^\text{\tiny{3}} \\ x^\text{\tiny{4}} \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 3 & 1 & 0 \\ 0 & 1 & 7 & 6 & 1 \\ \end{bmatrix} \begin{bmatrix} x^\underbar{\tiny{0}} \\ x^\underbar{\tiny{1}} \\ x^\underbar{\tiny{2}} \\ x^\underbar{\tiny{3}} \\ x^\underbar{\tiny{4}} \\ \end{bmatrix}

going from falling powers to powers.

Incidentally, if we filled a matrix with unsigned Stirling numbers of the 1st kind, we would have the change of basis matrix going from the power basis to rising powers defined by

x^{\overline{n}} = x(x+1)(x+2)\cdots(x+n+1)

It may be hard to see, but there’s a bar on top of the exponent n for rising powers whereas before we had a bar under the n for falling powers.

Related posts

How many ways to make rock, paper, scissors, lizard, Spock?

In The Big Bang Theory, Sheldon Cooper explains an extension of the game Rock, Paper, Scissors by introducing two more possibilities, Lizard and Spock, so the game becomes Rock, Paper, Scissors, Lizard, Spock. Sam Kass and Karen Bryla invented the game before it became widely known via the television show.

The diagram below summarizes the rules:

Rules of rock, paper, scissors, lizard, spock

Alternative rules

Imagine yourself in the position of Sam Kass and Karen Bryla designing the game. You first try adding one extra move, but it turns out that’s not possible.

The main property of Rock, Paper, Scissors is that no player has a winning strategy. That implies you can only add an even number of extra moves, keeping the total number of moves odd. That way, for every move one player makes, the other player has an equal number of winning and losing moves. Otherwise some moves would be better and others worse. So you can’t add just one move, say Lizard. You have to add two (or four, or six, …).

How many ways could you assign rules to an extension of Rock, Paper, Scissors adding two more moves?

Number the possible moves 1 through 5. We will make a table of which moves beat which, with +1 in the (ik) position if move i beats move j and -1 if j beats i. There will be zeros on the diagonal since it’s a tie if both players make the same move.

Let’s start with the original Rock, Paper, Scissors. In order for this game to not have a winning strategy, the table must be filled in as below, with the only option being to set a = 1 or a = -1.

|   |  1 |  2 |  3 |
| 1 |  0 |  a | -a |
| 2 | -a |  0 |  a |
| 3 |  a | -a |  0 |

If 1, 2, and 3 correspond to Rock, Paper, and Scissors, then a = 1 according to the usual rules, but we’ll allow the possibility that the usual rules are reversed. (If you don’t like that, just set a = 1).

Next we fill in the rest of the moves. The table must be skew-symmetric, i.e. the (ij) element must be the negative of the (ji) element, because if (ij) is a winning move then (ji) is a losing move and vice versa. Also, the rows and columns must sum to zero. Together these requirements greatly reduce the number of possibilities.

|   |  1 |  2 |  3 |  4 |  5 |
| 1 |  0 |  a | -a |  b | -b |
| 2 | -a |  0 |  a |  c | -c |
| 3 |  a | -a |  0 |  d | -d |
| 4 | -b | -c | -d |  0 |  d |
| 5 |  b |  c |  d | -d |  0 |

The values of a, b, and c may each be chosen independently to be 1 or -1. If bc = 0, then d can be chosen freely. Otherwise b and c have the same sign, and d must have the opposite sign. So all together there are 12 possibilities (6 if you insist a = 1). These are listed below.

| a | b | c | d |
| + | + | + | - |
| + | + | - | + |
| + | + | - | - |
| + | - | + | + |
| + | - | + | - |
| + | - | - | + |
| - | + | + | - |
| - | + | - | + |
| - | + | - | - |
| - | - | + | + |
| - | - | + | - |
| - | - | - | + |

The version of the rules used on The Big Bang Theory corresponds to the second row of the table above: a = b = d = 1 and c = -1.

Simpler solution

Here’s another way to count the number of possible designs. Suppose we start with tradition Rock, Paper, Scissors, corresponding to a = 1 in the notation above. Now let’s add the rules for Lizard. We can pick any two things from {Rock, Paper, Scissors, Spock} and say that Lizard beats them, and then the other two things must beat Lizard. There are 6 ways to choose 2 things from a set of 4.

Once we’ve decided the rules for Lizard, we have no choice regarding Spock. Spock’s rules must be the opposite of Lizard’s rules in order to balance everything out. If  we decide Lizard beats Rock, for example, then Rock must beat Spock so two things beat Rock and Rock beats two things.

If we’re willing to consider reversing the usual rules of Rock, Paper, Scissors, i.e. setting a = -1, then there are 6 more possibilities, for a total of 12.

Adding two more moves

By the way, we can see from the approach above how to add more moves. If we wanted to add Stephen Hawking and Velociraptor to our game, then we have 20 choices: we choose 3 things out of 6 for Hawking to beat, and the rest of the rules are determined by these choices. Velociraptor has to be the anti-Hawking. If we decide that Hawking beats Paper, Lizard, and Spock, then we’d get the rules in the diagram below.

Extending rock, paper, scissors, lizard, Spock with two more moves

Fair subsets

You might want to design the game so that for any subset of three moves you have a game with no winning strategy. Here’s an example why. If the subset (1 , 2, 4) is a fair game, then a = c. But if the subset (2, 3, 4) is a fair game, then a-c. So one of the two games must have a winning strategy.

Graphing the rules

The first graph above was made with GraphVis using the code below.

digraph rock {
    node [fontname = "helvetica"]

    "Rock"     -> "Scissors" 
    "Rock"     -> "Lizard"
    "Paper"    -> "Rock"
    "Paper"    -> "Spock"
    "Scissors" -> "Paper"
    "Scissors" -> "Lizard"
    "Lizard"   -> "Spock"
    "Lizard"   -> "Paper"
    "Spock"    -> "Rock"
    "Spock"    -> "Scissors"

    layout = circo

Save the code to a file, say rock.gv, then run the command

dot -Tpng rock.gv > rock.png

to produce a PNG file.

Related posts

Three things about dominoes

Here are three things about dominoes, two easy and one more advanced.


First, how many pieces are there in a set of dominoes? A domino corresponds to an unordered pair of numbers from 0 to n. The most popular form has n = 6, but there are variations with other values of n. You can show that the number of dominoes is

{ n+1\choose 2} + n + 1

This is because there are n+1 possible numbers (since blanks are a possibility) and each one is either a double or not. The number of ways to choose two distinct numbers is the binomial coefficient and the number of doubles is n+1.

Another way to look at this is that we are selecting two things from a set of n+1 things with replacement and so the number of possibilities is

\left({ n+1\choose 2}\right) = {n+2 \choose 2}

where the symbol on the left is Stanley’s symbol for selection with replacement.

In any case, there are 28 dominoes when n = 6, 55 when n = 9, and 91 when n = 12.

Magic squares

There are a couple ways to make a magic square of sorts from a set of dominoes. To read more about this, see this post.

Domino magic square


How many ways can you cover an m by n chess board with dominoes? The answer turns out to be

\sqrt{\prod_{k=1}^m \prod_{\ell=1}^n \left( 2\cos\left(\frac{\pi k}{m+1} \right) + 2i \cos\left(\frac{\pi \ell}{n+1} \right) \right)}

See this post for details.

Stirling numbers, including negative arguments

Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there.

Stirling numbers of the second kind

The Stirling number of the second kind

S_2(n,k) = \left\{ {n \atop k} \right\}

counts the number of ways to partition a set of n elements into k non-empty subsets. The notation on the left is easier to use inline, and the subscript reminds us that we’re talking about Stirling numbers of the second kind. The notation on the right suggests that we’re dealing with something analogous to binomial coefficients, and the curly braces suggest this thing might have something to do with counting sets.

Since the nth Bell number counts the total number of ways to partition a set of n elements into any number of non-empty subsets, we have

B_n = \sum_{k=1}^n \left\{ {n \atop k}\right\}

Another connection to Bell is that S2(n, k) is the sum of the coefficients in the partial exponential Bell polynomial Bn, k.

Stirling numbers of the first kind

The Stirling numbers of the first kind

S_1(n,k) = \left[ {n \atop k} \right]

count how many ways to partition a set into cycles rather than subsets. A cycle is a sort of ordered subset. The order of elements matters, but only a circular way. A cycle of size k is a way to place k items evenly around a circle, where two cycles are considered the same if you can rotate one into the other. So, for example, [1, 2, 3] and [2, 3, 1] represent the same cycle, but [1, 2, 3] and [1, 3, 2] represent different cycles.

Since a set with at least three elements can be arranged into multiple cycles, Stirling numbers of the first kind are greater than or equal to Stirling numbers of the second kind, given the same arguments.

Addition identities

We started out by saying Stirling numbers were like binomial coefficients, and here we show that they satisfy addition identities similar to binomial coefficients.

For binomial coefficients we have

{n \choose k} = {n - 1 \choose k} + {n-1 \choose k-1}.

To see this, imagine we start with the set of the numbers 1 through n. How many ways can we select a subset of k items? We have selections that exclude the number 1 and selections that include the number 1. These correspond to the two terms on the right side of the identity.

The analogous identities for Stirling numbers are

\left\{{n \atop k} \right\}= k \left\{ {n-1 \atop k} \right\} + \left\{ {n-1 \atop k-1} \right\}

\left[ {n \atop k} \right]= (n-1) \left[ {n-1 \atop k} \right] + \left[ {n-1 \atop k-1} \right]
The combinatorial proofs of these identities are similar to the argument above for binomial coefficients. If you want to partition the numbers 1 through n into k subsets (or cycles), either 1 is in a subset (cycle) by itself or not.

More general arguments

Everything above has implicitly assumed n and k were positive, or at least non-negative, numbers. Let’s look first at how we might handle zero arguments, then negative arguments.

It works out best if we define S1(0, k) and S2(0, k) to be 1 if k is 0 and zero otherwise. Also we define S1(n, 0) and S2(n, 0) to be 1 if n is 0 and zero otherwise. (See Concrete Mathematics.)

When either n or k is zero the combinatoric interpretation is strained. If we let either be negative, there is no combinatorial interpretation, though it can still be useful to consider negative arguments, much like one does with binomial coefficients.

We can take the addition theorems above, which are theorems for non-negative arguments, and treat them as definitions for negative arguments. When we do, we get the following beautiful dual relationship between Stirling numbers of the first and second kinds:

\left\{{n \atop k} \right\}= \left[ {-k \atop -n} \right]

Related posts

Bell numbers

The nth Bell number is the number of ways to partition a set of n labeled items. It’s also equal to the following sum.

B_n = \frac{1}{e}\sum_{k=0}^\infty \frac{k^n}{k!}

You may have to look at that sum twice to see it correctly. It looks a lot like the sum for en except the roles of k and n are reversed in the numerator.

The sum reminded me of the equation

\frac{d}{de}e^x = x e^{x-1}

that I blogged about a few weeks ago. It’s correct, but you may have to stare at it a little while to see why.

Incidentally, the nth Bell number is the nth moment of a Poisson random variable with mean 1.

There’s also a connection with Bell polynomials. The nth Bell number is the sum of the coefficients of the nth complete Bell polynomial. The nth Bell polynomial is sum over k of the partial exponential Bell polynomials Bn,k. The partial (exponential) Bell polynomials are defined here.

Computing Bell numbers

You can compute Bell numbers in Python with sympy.bell and in Mathematical with BellB. You can compute Bell numbers by hand, or write your own function in a language that doesn’t provide one, with the recursion relation B0 = 1 and

B_n = \sum_{k=0}^{n-1}{ n-1 \choose k}B_k

for n > 0. Bell numbers grow very quickly, faster than exponential, so you’ll need extended precision integers if you want to compute very many Bell numbers.

For theoretical purposes, it is sometimes helpful to work with the exponential generating function of the Bell numbers. The nth Bell number is n! times the coefficient of xn in exp(exp(x) – 1).

\sum_{n=0}^\infty \frac{B_n}{n!} x^n = \exp(\exp(x) -1)

An impractical but amusing way to compute Bell numbers would be via simulation, finding the expected value of nth powers of random draws from a Poisson distribution with mean 1.