# Up-down permutations

An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example

1, 5, 3, 4, 2

is an up-down permutation of 1, 2, 3, 4, 5 because

1 < 5 > 3 < 4 > 2.

Up-down permutations are also called alternating permutations for obvious reasons. The number of up-down permutations of a set of size n is often denoted A(n) because these permutations were first studied by Désiré André. It is also sometimes denoted En, where the E alludes to Euler.

André found the generating function for A(n):

You could read this as saying sec(x) + tan(x) is the exponential generating function for A(n) or as saying sec(x) + tan(x) is the ordinary generating function for

where P(n) is the proportion of all permutations of the digits 1 through n which are alternating permutations.

It’s amazing that up-down permutations have anything to do with trig functions, but they do.

Knowing the (exponential) generating function for A(n) lets us use generating function techniques to prove various things about these number. For example, we can use the generating function to discover their asymptotic behavior.

The secant and tangent functions are well behaved at 0, but both have a singularity at π/2. This means the generating function above is not just a formal power series but an analytic power series with radius of convergence π/2, the distance from the center of the power series to the closest singularity. It follows that

and so the proportion of permutations which are alternating permutations decreases exponentially for large n.

# Ruzsa distance

A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group.

## Subset difference

Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A B is defined the set

As is customary with Abelian groups, we denote the group operation by + and a b means the group operation applied to a and the inverse of b.

For example, let G be the group of integers mod 10. Let A = {1, 5} and B = {3, 7}. Then A B is the set {2, 4, 8}. There are only three elements because 1 − 3 and 5 − 7 are both congruent to 8.

## Ruzsa distance

The Ruzsa distance between two subsets of an Abelian group is defined by

where |S| denotes the number of elements in a set S.

The Ruzsa distance is not a metric, but it fails in an unusual way. the four axioms of a metric are

1. d(x, x) = 0
2. d(x, y) > 0 unless x = y
3. d(x, y) = d(y, x)
4. d(x, z) ≤ d(x, y) + d(y, z)

The first axiom is usually trivial, but it’s the only one that doesn’t hold for Ruzsa distance. In particular, the last axiom, the triangle inequality, does hold.

To see that the first axiom does not always hold, lets again let G be the integers mod 10 and let A = {1, 3}. Then A A is the set {0, 2, 8} and d(A, A) = log 3/2 > 0.

Sometimes d(A, A) does equal zero. If A = G then A A = A, and so d(A, A) = log 1 = 0.

## Entropic Ruzsa distance

If we switch from subsets of G to random variables taking values in G we can define an analog of Ruzsa distance between random variables X and Y, the entropic Ruzsa distance

where X′ and Y′ are independent copies of X and Y and H is Shannon entropy. For more on entropic Ruzsa distance, see this post by Terence Tao.

Note that if A and B are subsets of G, and X and Y are uniform random variables with support on A and B respectively, then the negative terms above correspond to the log of 1/|A|½ |B|½.  The H(X′ + Y′) term isn’t the log of |AB| though because for one thing its a sum rather than a difference. For another, the sum of uniform random variables may not be uniform: there may be more than one way to end up at a particular sum, and so sum values will have higher probability.

# The 10th Dedekind number

The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be

M(9) = 286386577668298411128469151667598498812366.

The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, and M(3) = 20. But as n increases, M(n) increases rapidly, with M(9) being on the order of 1041.

Although computing the Dedekind numbers exactly is difficult—M(8) was computed in 1991 and M(9) now in 2023—there is an explicit formula for these numbers, and much is known about their asymptotic growth. This post speculates about what M(10) might be.

Write the number k in binary and let bik be its ith bit:

Then the nth Dedekind number is given by

and so

In principle, all you have to do to compute M(10) is evaluate the sum above. However, since this sum has more than 10308 terms, it would take a while.

What can we say about M(10) without computing it? The number of monotone Boolean functions of n variables is less than the total number of Boolean functions of n variables, which equals

That tells us M(10) < 1.8 × 10308.

There are more useful bounds. It has been proven that

This gives us a definite lower bound but not a definite upper bound. We know M(10) ≥ 2252 which is approximately 7.237 × 1075, but we don’t know what the big-O term is. All we know is that for sufficiently large n, this term is smaller than some multiple of log(n)/n. How large does n need to be and what is this constant? I don’t know. Maybe researchers in this area have some partial results.

Let’s take a guess at the upper bound by seeing what the big-O term was for M(9). Find k such that

We get

and we can use this to guess that

which would imply M(10) = 3.253 × 1082.

So to recap, we know for certain that M(10) is between 7.237 × 1075 and 1.8 × 10308, and our guess based on the heuristic above is that M(10) = 3.253 × 1082.

# Computing Stirling numbers with limited integers

A couple days ago I wrote a post about a probability problem that involved calculating Stirling numbers. There are two kinds of Stirling numbers, creatively called “Stirling numbers of the first kind” and “Stirling numbers of the second kind.” The second kind come up more often in application, and so when authors say “Stirling numbers” without qualification, they mean the second kind. That’s the convention I’ll use here.

The Stirling number S(n, k) can be computed via the following series, though we will take a different approach for reasons explained below.

## Problem statement

This made me think of the following problem. Suppose you want to calculate Stirling numbers. You want to use integer arithmetic in order to get exact results.

And suppose you can use integers as large as you like, but you have to specify the maximum integer size before you start calculating. Imagine you have to pay \$N to work with N-digit integers, so you want to not use a larger value of N than necessary.

## Intermediate results

There are a couple problems with using the equation above. Direct implementation of the equation would first calculate k! S(n, k) first, then divide by k! to get S(n, k). This would be costly because it would require calculating a number k! times bigger than the desired final result. (You could refactor the equation so that there is no division by k! at the end, but this gives an expression that includes non-integer terms.)

In the earlier post, I actually needed to calculate k! S(n, k) rather than S(n, k) and so this wasn’t a problem. This brings up the second problem with the equation above. It’s an alternating series, and it’s not clear whether evaluating the series produces intermediate results that are larger than the final result.

## Recursion

Stirling numbers can be computed recursively using

with the base cases S(n, n) = 1 for n ≥ 0 and S(n, 0) = S(0, n) for n > 0. This computes Stirling numbers via a sum of smaller Stirling numbers, and so the intermediate results are no larger than the final result.

So now we’re left with the question of how big Stirling numbers can be. Before computing S(n, k) you need an upper bound on how many digits it will have.

## Bounding Stirling numbers

Stirling numbers can be bound in terms of binomial coefficients.

This reduces the problem to finding an upper bound on binomial coefficients. I wrote about a simple bound on binomial coefficients here.

# Hypergeometric distribution symmetry

One of these days I’d like to read Feller’s probability book slowly. He often says clever things in passing that are easy to miss.

Here’s an example from Feller [1] that I overlooked until I saw it cited elsewhere.

Suppose an urn contains n marbles, n1 red and n2 black. When r marbles are drawn from the urn, the probability that precisely k of the marbles are red follows a hypergeometric distribution.

Feller mentions that this expression can be rewritten as

What just happened? The roles of the total number of red marbles n1 and the sample size r, have been swapped.

The proof is trivial—expand both expressions using factorials and observe that they’re equal—but that alone doesn’t give much insight. It just shows that two jumbles of symbols represent the same quantity.

The combinatorial interpretation is more interesting and more useful. It says that you have two ways to evaluate the probability: focusing on the sample or focusing on the population of red marbles.

The former perspective says we could multiply the number of ways to choose k marbles from the supply of n1 red marbles and rk black marbles from the supply of nn1 black marbles, then divide by the number of ways to choose a sample of r marbles from an urn of n marbles.

The latter perspective says that rather than partitioning our sample of r marbles, we could think of partitioning the population of red marbles into two groups: those included in our sample and those not in our sample.

I played around with adding color to make it easier to see how the terms move between the two expressions. I colored the number of red marbles red and the sample size blue.

Moving from left to right, the red variable goes down and the blue variable goes up.

## Related posts

[1] William Feller. An Introduction to Probability Theory and Its Applications. Volume 1. Third edition. Page 44.

# Shuffle product

The shuffle product of two words, w1 and w2, written

w1 Ш w2,

is the set of all words formed by the letters in w1 and w2, preserving the order of each word’s letters. The name comes from the analogy with doing a riffle shuffle of two decks of cards.

For example, bcd Ш ae, the shuffle product of bcd and ae, would be all permutations of abcde in which the consonants appear in alphabetical order and the vowels are also in alphabetical order. So abecd and baecd would be included, but badec would not be because the d and c are in the wrong order.

## Side note on Ш

Incidentally, the symbol for shuffle product is the Cyrillic letter sha (Ш, U+0428), the only Cyrillic letter commonly used in mathematics, at least internationally. Presumably Russian mathematicians use other Cyrillic letters, but the only Cyrillic letter an American mathematician, for example, is likely to use is Ш.

The uses of Ш that I’m aware of are the Dirac comb distribution, the Tate–Shafarevich group, and the shuffle product.

## Duplicate letters

What is the shuffle product of words containing duplicate letters? For example, what about the shuffle product of bread and crumb? Each word contains an r. The shuffle product, defined above as a set, doesn’t distinguish between the two rs. But another way to define the shuffle product is as a formal sum, with coefficients that count duplicates.

Imagine coloring the letters in abc blue and the letters in cde red. Then abccde and abccde would count as two different possibilities, one with blue c followed by red c, and one the other way around. This term in the formal sum would be 2abccde, the two capturing that there are two ways to arrive at this word.

You could also have duplicate letters within a single word. So in banana, for example, you could imagine coloring each a a different color and coloring the two ns different colors.

## Mathematica code

shuffleW[s1_, s2_] :=
Module[{p, tp, ord},
p = Permutations@Join[1 & /@ s1, 0 & /@ s2]\[Transpose];
tp = BitXor[p, 1];
ord = Accumulate[p] p + (Accumulate[tp] + Length[s1]) tp;
Outer[Part, {Join[s1, s2]}, ord, 1][[1]]\[Transpose]]


This code takes two lists of characters and returns a list of lists of characters. You can use this to compute both senses of the shuffle product. For example, let’s compute abc Ш ac.

The Mathematica command

    shuffleW[{a, b, c}, {a, c}]

returns a list of 10 lists:


{{a, b, c, a, c}, {a, b, a, c, c}, {a, b, a, c, c},
{a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c},
{a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c},
{a, c, a, b, c}}


If we ask for the union of the set above with Union[%] we get

    {{a, a, b, c, c}, {a, a, c, b, c}, {a, b, a, c, c},
{a, b, c, a, c}, {a, c, a, b, c}}


So using the set definition, we could say

abc Ш ac = {aabcc, aacbc, abacc, abcac, acabc}.

Using the formal sum definition we could say

abc Ш ac = 4aabcc + 2aacbc + 2abacc + abcacacabc.

# Reverse engineering options

This weekend I saw a sign in the window of a Burger King™ that made me think of an interesting problem. If you know the number of possibilities like this, how would you reverse engineer what the options that created the possibilities?

In the example above, there are 211,184 = 213×33 possible answers, and so most likely there are 13 questions that have a yes or no answer, and 3 questions that have 3 possible answers each. But there are other possibilities in principle, a lot of other possibilities.

Maybe there are 15 questions: 1 with 4 possible answers, 12 with 2 possible answers, and 3 with 3 possible answers. Or maybe there are again 15 questions, but one of the questions has 9 answers. If there are 14 questions, maybe one question has 8 answers, or 27 answers, or maybe two questions have 6 answers, … it gets complicated.

## More general problem

How would you address this kind of problem in general?

Suppose there is some list of questions Q1, Q2, …, Qk where there are qi possible answers to question Qi. Then the product of all the qi equals the total number of possible responses P to the questions. Note that we don’t know the number of questions k; all we know is the product P. Let’s call the ordered list {q1, q2, …, qk} of number of responses to questions a survey.

There are as many possible surveys as there are multiplicative partitions of P. The number of possible surveys only depends on the exponents in the prime factorization of P, not on the prime factors themselves. It would make no difference if in the example above we had 513×73 possibilities rather than 213×33 possibilities. We’d reason the same way when counting the possible surveys. The number of questions is somewhere between 1 and the sum of the exponents in the prime factorization.

We have to decide how we want to count surveys. For example, if we do have 13 questions with binary answers and 3 questions with trinary answers, does it matter which questions have three answers? If it does, we multiply the number of possibilities by the binomial coefficient C(16, 3) because we choose which of the 16 total questions have three possible answers. And then we have to reason about the case of 15 questions, 14 questions, etc.

Multiplicative partitions are complicated, and its not possible to write down simple expressions for the number of multiplicative partitions of numbers with many prime factors. But in the special case of only two distinct prime factors, i.e.

then there is a relatively simple expression for the total number of possible surveys, assuming order matters [1]:

This works out to 3,760,128 in the example above. Curiously,

3760128 = 221184 × 17.

But note that our result does not depend on the primes in the factorization of P, only on their exponents. There are no p‘s in the summation above, only i‘s and j‘s. As we noted earlier, we’d come to the same count if we’d started with P = 513×73. The factors of 2 and 3 in our count arrive for different reasons than being the primes in the factorization of P.

## Related posts

[1] A. Knopfmacher and M. E. Mays. A survey of factorization counting functions, International Journal of Number Theory. Vol. 01, No. 04, pp. 563-581 (2005)

# Design of experiments and design theory

Design of experiments is a branch of statistics, and design theory is a branch of combinatorics, and yet they overlap quite a bit.

It’s hard to say precisely what design theory is, but it’s consider with whether objects can be arranged in certain ways, and if so how many ways this can be done. Design theory is pure mathematics, but it is of interest to people working in ares of applied mathematics such as coding theory and statistics.

Here’s a recap of posts I’ve written recently related to design of experiments and design theory.

## Design of Experiments

A few weeks ago I wrote about fractional factorial design. Then later I wrote about response surface models. Then a diagram from central composite design, a popular design in response surface methodology, was one the diagrams in a post I wrote about visually similar diagrams from separate areas of application.

I wrote two posts about pitfalls with A/B testing. One shows how play-the-winner sequential testing has the same problems as Condorcet’s voter paradox, with the order of the tests potentially determining the final winner. More seriously, A/B testing cannot detect interaction effects which may be critical.

## ANSI and Military Standards

There are several civilian and military standards related to design of experiments. The first of these was MIL-STD-105. The US military has retired this standard in favor of the civilian standard ASQ/ANSI Z1.4 which is virtually identical.

Similarly, the US military standard MIL-STD-414 was replaced by the very similar civilian standard ASQ/ANSI Z1.9. This post looks at the mean-range method for estimating variation which these two standards reference.

## Design Theory

I wrote a couple posts on Room squares, one on Room squares in general and one on Thomas Room’s original design now known as a Room square. Room squares are used in tournament designs.

I wrote a couple posts about Costas arrays, an introduction and a post on creating Costas arrays in Mathematica.

## Latin Squares

Latin squares and Greco-Latin squares a part of design theory and a part of design of experiments. Here are several posts on Latin and Greco-Latin squares.

# The Chicken McNugget Monoid

When McDonalds first introduced Chicken McNuggets, you could buy McNuggets in boxes of 6, 9, or 20. The Chicken McNugget problem is to determine which numbers of McNuggets you can and cannot buy. A number n is a McNugget number if it is possible to buy exactly that many McNuggets (using the original boxes).

There are only finitely many non-McNugget numbers, and these are listed in OEIS sequence A065003.

Note that if you order eight boxes of 6 nuggets you get 48 nuggets, and so if you order more than eight boxes, in any combination, you get more than 48 nuggets. So the following program will compute all McNugget numbers less than 48.

    ns = set()
for i in range(8):
for j in range(8):
for k in range(8):
n = 6*i + 9*j + 20*k
if n <= 48:

comp = set(range(48)) - ns


The non-McNugget numbers less than 48 are stored in comp, the set complement of ns. These numbers are

1, 2, 3, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 22, 23, 25, 28, 31, 34, 37, 43.

It turns out 48 and 49 are also McNugget numbers. You can verify this by changing “48” to “50” in the code above and looking at ns. This means that 44, 45, 46, 47, 48, and 49, a run of 6 consecutive numbers are all McNugget numbers, and so you can add boxes of 6 to these numbers to get any greater number. This shows that 43 is the largest non-McNugget number.

The set of McNugget numbers is contains 0, my favorite number of McNuggets, and is closed under addition, and so it forms a monoid.

Update: The next post generalizes this one, giving a little theory and more general code.

Source: Factoring in the Chicken McNugget monoid. arXiv:1709.01606

# Efficiently testing a black box

Suppose you have a black box that takes three bits as input and produces one bit as output. You could think of the input bits as positions of toggle switches, and the output bit as a light attached to the box that is either on or off.

## Full factorial design

Now suppose that only one combination of 3 bits produces a successful output. There’s one way to set the switches that makes the light turn on. You can find the successful input by brute force if you test all 2³ = 8 possible inputs. In statistical lingo, you are conducting an experiment with a factorial design, i.e. you test all combinations of inputs.

In the chart above, each row is an experimental run and each column is a bit. I used − and + rather than 0 and 1 because it is conventional in this topic to use a minus sign to indicate that a factor is not present and a plus sign to indicate that it is present.

## Fractional factorial design

Now suppose your black box takes 4 bits as inputs, but only 3 of them matter. One of the bits does nothing, but you don’t know which bit that is. You could use a factorial design again, testing all 24 = 16 possible inputs. But there’s a more clever approach that requires only 8 guesses. In statistical jargon, this is a fractional factorial design.

No matter which three bits the output depends on, all combinations of those three bits will be tested. Said another way, if you delete any one of the four columns, the remaining columns contain all combinations of those bits.

## Replications

Now suppose your black box takes 8 bits. Again only 3 bits matter, but you don’t know which 3. How many runs do you need to do to be certain of finding the successful combination of 3 bits? It’s clear that you need at least 8 runs: if you know that the first three bits are the important ones, for example, you still need 8 runs. And it’s also clear that you could go back to brute force and try all 28 = 256 possible inputs, but the example above raises your hopes that you could get by with less than 256 runs. Could you still get by with 8? That’s too much to hope for, but you could use only 16 runs.

Note that since this design works for 8 factors, it also works for fewer factors. If you had 5, 6, or 7 factors, you could use the first 5, 6, or 7 columns of the design above.

This design has some redundancy: every combination of three particular bits is tested twice. This is unfortunate in our setting because we are assuming the black box is deterministic: the right combination of switches will always turn the light on. But what if the right combination of switches probably turns on the light? Then redundancy is a good thing. If there’s an 80% chance that the right combination will work, then there’s a 96% chance that at least one of the two tests of the right combination will work.

Fractional factorial experiment designs are usually used with the assumption that there are random effects, and so redundancy is a good thing.

You want to test each main effect, i.e. each single bit, and interaction effects, i.e. combinations of bits, such as pairs of bits or triples of bits. But you assume that not all possible interactions are important; otherwise you’d need a full factorial design. You typically hit diminishing returns with interactions quickly: pairs of effects are often important, combinations of three effects are less common, and rarely would an experiment consider fourth order interactions.

If only main effects and pairs of main effects matter, and you have a moderately large number of factors n, a fractional factorial design can let you use a lot less than 2n runs while giving you as many replications of main and interaction effects as you want.

## Verification

The following Python code verifies that the designs above have the claimed properties.

    import numpy as np
from itertools import combinations

def verify(matrix, k):
"verify that every choice of k columns has 2^k unique rows"
nrows, ncols = matrix.shape
for (a, b, c) in combinations(range(ncols), k):
s = set()
for r in range(nrows):
if len(s) != 2**k:
print("problem with columns", a, b, c)
print("number of unique elements: ", len(s))
print("should be", 2**k)
return
print("pass")

m = [
[-1, -1, -1, -1],
[-1, -1, +1, +1],
[-1, +1, -1, +1],
[-1, +1, +1, -1],
[+1, -1, -1, +1],
[+1, -1, +1, -1],
[+1, +1, -1, -1],
[+1, +1, +1, +1]
]

verify(np.matrix(m), 3)

m = [
[-1, -1, -1, -1, -1, -1, -1, -1],
[+1, -1, -1, -1, -1, +1, +1, +1],
[-1, +1, -1, -1, +1, -1, +1, +1],
[+1, +1, -1, -1, +1, +1, -1, -1],
[-1, -1, +1, -1, +1, +1, +1, -1],
[+1, -1, +1, -1, +1, -1, -1, +1],
[-1, +1, +1, -1, -1, +1, -1, +1],
[+1, +1, +1, -1, -1, -1, +1, -1],
[-1, -1, -1, +1, +1, +1, -1, +1],
[+1, -1, -1, +1, +1, -1, +1, -1],
[-1, +1, -1, +1, -1, +1, +1, -1],
[+1, +1, -1, +1, -1, -1, -1, +1],
[-1, -1, +1, +1, -1, -1, +1, +1],
[+1, -1, +1, +1, -1, +1, -1, -1],
[-1, +1, +1, +1, +1, -1, -1, -1],
[+1, +1, +1, +1, +1, +1, +1, +1],
]

verify(np.matrix(m), 3)