Separable functions in different contexts

I was skimming through the book Mathematical Reflections [1] recently. The authors discuss a set of generalizations [2] of the Star of David theorem from combinatorics.

\gcd\left(\binom{n - 1}{r - 1}, \binom{n}{r+1}, \binom{n+1}{r}\right) = \gcd\left(\binom{n-1}{r}, \binom{n+1}{r+1}, \binom{n}{r-1}\right)

The theorem is so named because if you draw a Star of David centered on an entry in Pascal’s triangle, each side of the equation above corresponds to the gcd of the binomial coefficients at the vertices of one of the two overlapping triangles.

[Diagram illustrating the Star of David theorem]

One such theorem was the following.

\binom{n - \ell}{r - \ell} \binom{n - k}{r} \binom{n - k - \ell}{r - k} = \binom{n-k}{r-k} \binom{n - \ell}{r} \binom{n - k - \ell}{r - \ell}

This theorem also has a geometric interpretation, connecting vertices within Pascal’s triangle.

The authors point out that the binomial coefficient is a separable function of three variables, and that their generalized Star of David theorem is true for any separable function of three variables.

The binomial coefficient C(n, k) is a function of two variables, but you can think of it as a function of three variables: n, k, and n − k. That is

{n \choose k} = f(n) \, g(k) \, g(n-k)

where f(n) = n! and g(k) = 1/k!.
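As a quick sanity check (not a proof), here’s a short Python script that verifies the identity above for binomial coefficients over a small range of parameters. The ranges and the guard condition are arbitrary choices of mine.

    from math import comb
    from itertools import product

    # Spot-check the generalized Star of David identity for binomial coefficients.
    for n, r, k, l in product(range(1, 13), repeat=4):
        if k > r or l > r or k + l > n:
            continue  # keep every argument of comb nonnegative
        lhs = comb(n - l, r - l) * comb(n - k, r) * comb(n - k - l, r - k)
        rhs = comb(n - k, r - k) * comb(n - l, r) * comb(n - k - l, r - l)
        assert lhs == rhs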

I was surprised to see the term separable function outside of a PDE context. My graduate work was in partial differential equations, and so when I hear separable function my mind goes to separation of variables as a technique for solving PDEs.

Coincidentally, I was looking at separable coordinate systems recently. These are coordinate systems in which the Helmholtz equation can be solved by a separable function, i.e. coordinate systems in which the separation of variables technique works. The Laplacian can take on very different forms in different coordinate systems, and if possible you’d like to choose a coordinate system in which a PDE you care about is separable.

Related posts

[1] Peter Hilton, Derek Holton, and Jean Pedersen. Mathematical Reflections. Springer, 1996.

[2] Hilton et al. refer to a set of theorems as generalizations of the Star of David theorem, but these theorems are not strictly generalizations, since the original theorem is not obviously a special case of the newer theorems. The theorems are related, and I imagine with more effort I could see how to prove the older theorem from the newer ones, but it’s not immediately obvious.

A variation on Rock, Paper, Scissors

Imagine in a game of Rock, Paper, Scissors one player is free to play as usual but the other is required to choose each option the same number of times. That is, in 3n rounds of the game, the disadvantaged player must choose Rock n times, Paper n times, and Scissors n times.

Obviously the unrestricted player would be expected to win more games, but how many more? At least one, because the unrestricted player knows what the restricted player will do on the last round.

If n is large, then in the early rounds of the game it makes little difference that one of the players is restricted. The restriction doesn’t give the unrestricted player much exploitable information. But in the later rounds of the game, the limitations on the restricted player’s moves give the other player a better chance of winning.

It turns out that if the unrestricted player uses an optimal strategy, he can expect to win O(√n) more rounds than he loses. More precisely, the expected advantage approaches 1.4658√n as n grows. You can find a thorough analysis of the problem in this paper.
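To get an empirical feel for the problem, here is a Monte Carlo sketch in Python. It assumes the restricted player draws uniformly from his remaining moves and gives the unrestricted player a simple greedy counter-strategy; this is my own heuristic, not the optimal strategy analyzed in the paper, so its advantage may fall short of 1.4658√n.

    import random

    def advantage(n, trials=1000):
        """Estimate the unrestricted player's expected net wins over 3n rounds.

        The restricted player picks uniformly among moves still available;
        the unrestricted player greedily counters the restricted player's
        most plentiful remaining move. Moves: 0 = rock, 1 = paper, 2 = scissors,
        and (x + 1) % 3 beats x.
        """
        total = 0
        for _ in range(trials):
            remaining = [n, n, n]
            net = 0
            for _ in range(3 * n):
                target = remaining.index(max(remaining))
                u = (target + 1) % 3   # the move that beats the most plentiful option
                r = random.choices([0, 1, 2], weights=remaining)[0]
                remaining[r] -= 1
                if (r + 1) % 3 == u:
                    net += 1           # unrestricted player wins the round
                elif (u + 1) % 3 == r:
                    net -= 1           # restricted player wins the round
            total += net
        return total / trials

    print(advantage(100))   # compare with 1.4658 * sqrt(100) ≈ 14.7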

Related posts

Up-down permutations

An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example

1, 5, 3, 4, 2

is an up-down permutation of 1, 2, 3, 4, 5 because

1 < 5 > 3 < 4 > 2.

Up-down permutations are also called alternating permutations for obvious reasons. The number of up-down permutations of a set of size n is often denoted A(n) because these permutations were first studied by Désiré André. It is also sometimes denoted En, where the E alludes to Euler.

André found the generating function for A(n):

\sum_{n=0}^\infty A(n) \frac{x^n}{n!} = \sec x + \tan x.

You could read this as saying sec(x) + tan(x) is the exponential generating function for A(n) or as saying sec(x) + tan(x) is the ordinary generating function for

P(n) =\frac{A(n)}{n!}

where P(n) is the proportion of all permutations of the numbers 1 through n which are alternating permutations.

It’s amazing that up-down permutations have anything to do with trig functions, but they do.
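You can verify the connection by brute force for small n. Here’s a sketch in Python; it uses sympy to expand the Taylor series of sec x + tan x, which is an implementation choice of mine, not part of André’s argument.

    from itertools import permutations
    from math import factorial
    import sympy as sp

    def A(n):
        """Count up-down permutations of 1, 2, ..., n by brute force."""
        if n == 0:
            return 1
        def up_down(p):
            return all((p[i] < p[i+1]) == (i % 2 == 0) for i in range(n - 1))
        return sum(up_down(p) for p in permutations(range(1, n + 1)))

    x = sp.symbols('x')
    series = sp.series(sp.sec(x) + sp.tan(x), x, 0, 8).removeO()
    for n in range(8):
        # brute-force count vs. n! times the nth Taylor coefficient
        print(n, A(n), series.coeff(x, n) * factorial(n))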

Knowing the (exponential) generating function for A(n) lets us use generating function techniques to prove various things about these numbers. For example, we can use the generating function to discover their asymptotic behavior.

The secant and tangent functions are well behaved at 0, but both have a singularity at π/2. This means the generating function above is not just a formal power series but an analytic power series with radius of convergence π/2, the distance from the center of the power series to the closest singularity. It follows that

P(n) = {\cal O}\left( \left( \frac{2}{\pi} \right)^n \right)

and so the proportion of permutations which are alternating permutations decreases exponentially for large n.

Related posts

Ruzsa distance

A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group.

Subset difference

Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A − B is defined to be the set

A - B = \{a - b \mid a \in A, b \in B \}

As is customary with Abelian groups, we denote the group operation by + and a − b means the group operation applied to a and the inverse of b.

For example, let G be the group of integers mod 10. Let A = {1, 5} and B = {3, 7}. Then A − B is the set {2, 4, 8}. There are only three elements because 1 − 3 and 5 − 7 are both congruent to 8.

Ruzsa distance

The Ruzsa distance between two subsets of an Abelian group is defined by

d(A,B) = \log \frac{|A-B|}{|A|^{1/2}\, |B|^{1/2}}

where |S| denotes the number of elements in a set S.

The Ruzsa distance is not a metric, but it fails in an unusual way. The four axioms of a metric are

  1. d(x, x) = 0
  2. d(x, y) > 0 unless x = y
  3. d(x, y) = d(y, x)
  4. d(x, z) ≤ d(x, y) + d(y, z)

The first axiom is usually trivial, but it’s the only one that doesn’t hold for Ruzsa distance. In particular, the last axiom, the triangle inequality, does hold.

To see that the first axiom does not always hold, let’s again let G be the integers mod 10 and let A = {1, 3}. Then A − A is the set {0, 2, 8} and d(A, A) = log 3/2 > 0.

Sometimes d(A, A) does equal zero. If A = G then A − A = A, and so d(A, A) = log 1 = 0.
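Here’s a short Python sketch that computes the Ruzsa distance for subsets of the integers mod n and reproduces the examples above.

    from math import log, sqrt

    def diff_set(A, B, n):
        """Subset difference A - B in the integers mod n."""
        return {(a - b) % n for a in A for b in B}

    def ruzsa(A, B, n):
        """Ruzsa distance between subsets A and B of the integers mod n."""
        return log(len(diff_set(A, B, n)) / sqrt(len(A) * len(B)))

    print(diff_set({1, 5}, {3, 7}, 10))   # {8, 2, 4}
    print(ruzsa({1, 3}, {1, 3}, 10))      # log(3/2) ≈ 0.405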

Entropic Ruzsa distance

If we switch from subsets of G to random variables taking values in G we can define an analog of Ruzsa distance between random variables X and Y, the entropic Ruzsa distance

d_{\text{ent}}(X, Y) = H(X' + Y') - \frac{1}{2}H(X) - \frac{1}{2}H(Y)

where X′ and Y′ are independent copies of X and Y and H is Shannon entropy. For more on entropic Ruzsa distance, see this post by Terence Tao.

Note that if A and B are subsets of G, and X and Y are uniform random variables with support on A and B respectively, then the negative terms above correspond to the log of 1/(|A|½ |B|½). The H(X′ + Y′) term isn’t the log of |A − B| though, because for one thing it’s a sum rather than a difference. For another, the sum of uniform random variables may not be uniform: there may be more than one way to end up at a particular sum, and so some sums will have higher probability than others.
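As a concrete illustration of the definition, here is a sketch that computes the entropic Ruzsa distance between independent uniform random variables supported on subsets A and B of the integers mod n. This is my own illustration, not code from Tao’s post.

    from math import log
    from collections import Counter
    from itertools import product

    def entropy(probs):
        """Shannon entropy (in nats) of a probability distribution."""
        return -sum(p * log(p) for p in probs if p > 0)

    def entropic_ruzsa(A, B, n):
        """Entropic Ruzsa distance for uniform random variables on A and B (mod n)."""
        sums = Counter((a + b) % n for a, b in product(A, B))   # distribution of X' + Y'
        total = len(A) * len(B)
        H_sum = entropy([count / total for count in sums.values()])
        return H_sum - 0.5 * log(len(A)) - 0.5 * log(len(B))

    print(entropic_ruzsa({1, 5}, {3, 7}, 10))   # ≈ 0.347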

 

The 10th Dedekind number

The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be

M(9) = 286386577668298411128469151667598498812366.

The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, and M(3) = 20. But as n increases, M(n) increases rapidly, with M(9) being on the order of 10^41.
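For small n you can count monotone Boolean functions directly. Here’s a brute-force sketch in Python; it is only feasible through about n = 4, where it already has to examine 2^16 candidate functions.

    from itertools import product

    def dedekind(n):
        """Count monotone Boolean functions of n variables by brute force."""
        points = list(product([0, 1], repeat=n))            # all 2^n inputs
        # ordered pairs (i, j) with input i <= input j componentwise
        leq = [(i, j) for i, p in enumerate(points)
                      for j, q in enumerate(points)
                      if all(a <= b for a, b in zip(p, q))]
        count = 0
        for f in product([0, 1], repeat=len(points)):       # all Boolean functions
            if all(f[i] <= f[j] for i, j in leq):
                count += 1
        return count

    print([dedekind(n) for n in range(5)])   # [2, 3, 6, 20, 168]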

Although computing the Dedekind numbers exactly is difficult—M(8) was computed in 1991 and M(9) now in 2023—there is an explicit formula for these numbers, and much is known about their asymptotic growth. This post speculates about what M(10) might be.

Write the number k in binary and let b_i^k be its ith bit:

b_i^k=\left\lfloor\frac{k}{2^i}\right\rfloor - 2\left\lfloor\frac{k}{2^{i+1}}\right\rfloor

Then the nth Dedekind number is given by

M(n)=\sum_{k=1}^{2^{2^n}} \prod_{j=1}^{2^n-1} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)

and so

M(10)=\sum_{k=1}^{2^{1024}} \prod_{j=1}^{1023} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)

In principle, all you have to do to compute M(10) is evaluate the sum above. However, since this sum has more than 10^308 terms, it would take a while.

What can we say about M(10) without computing it? The number of monotone Boolean functions of n variables is less than the total number of Boolean functions of n variables, which equals

2^{2^n}

That tells us M(10) < 1.8 × 10^308.

There are more useful bounds. It has been proven that

{n\choose \lfloor n/2\rfloor}\le \log_2 M(n)\le {n\choose \lfloor n/2\rfloor}\left(1+O\left(\frac{\log n}{n}\right)\right)

This gives us a definite lower bound but not a definite upper bound. We know M(10) ≥ 2^252, which is approximately 7.237 × 10^75, but we don’t know what the big-O term is. All we know is that for sufficiently large n, this term is smaller than some multiple of log(n)/n. How large does n need to be and what is this constant? I don’t know. Maybe researchers in this area have some partial results.

Let’s take a guess at the upper bound by seeing what the big-O term was for M(9). Find k such that

\log_2 M(9) = \binom{9}{4}\left(1 + k \frac{\log 9}{9}\right)

We get

k = \left(\frac{\log_2M(9)}{126} - 1 \right)\frac{9}{\log 9} \approx 0.3809

and we can use this to guess that

\log_2 M(10) \stackrel{?}{=} \binom{10}{5}\left(1 + 0.3809 \frac{\log 10}{10}\right) \approx 274.1

which would imply M(10) ≈ 3.253 × 10^82.
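The arithmetic behind this guess is short enough to write out. Here it is in Python, with log denoting the natural logarithm as in the expressions above.

    from math import log, log2, comb

    M9 = 286386577668298411128469151667598498812366
    k = (log2(M9) / comb(9, 4) - 1) * 9 / log(9)
    guess_log2_M10 = comb(10, 5) * (1 + k * log(10) / 10)
    print(k)                     # ≈ 0.3809
    print(guess_log2_M10)        # ≈ 274.1
    print(2 ** guess_log2_M10)   # ≈ 3.25e+82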

So to recap, we know for certain that M(10) is between 7.237 × 10^75 and 1.8 × 10^308, and our guess based on the heuristic above is that M(10) ≈ 3.253 × 10^82.

Computing Stirling numbers with limited integers

A couple days ago I wrote a post about a probability problem that involved calculating Stirling numbers. There are two kinds of Stirling numbers, creatively called “Stirling numbers of the first kind” and “Stirling numbers of the second kind.” The second kind come up more often in applications, and so when authors say “Stirling numbers” without qualification, they mean the second kind. That’s the convention I’ll use here.

The Stirling number S(n, k) can be computed via the following series, though we will take a different approach for reasons explained below.

S(n,k) = \frac{1}{k!}\sum_{i = 0}^k (-1)^i {k \choose i} (k-i)^n

Problem statement

This made me think of the following problem. Suppose you want to calculate Stirling numbers. You want to use integer arithmetic in order to get exact results.

And suppose you can use integers as large as you like, but you have to specify the maximum integer size before you start calculating. Imagine you have to pay $N to work with N-digit integers, so you want to not use a larger value of N than necessary.

Intermediate results

There are a couple problems with using the equation above. Direct implementation of the equation would calculate k! S(n, k) first, then divide by k! to get S(n, k). This would be costly because it would require calculating a number k! times bigger than the desired final result. (You could refactor the equation so that there is no division by k! at the end, but this gives an expression that includes non-integer terms.)

In the earlier post, I actually needed to calculate k! S(n, k) rather than S(n, k) and so this wasn’t a problem. This brings up the second problem with the equation above. It’s an alternating series, and it’s not clear whether evaluating the series produces intermediate results that are larger than the final result.

Recursion

Stirling numbers can be computed recursively using

S(n+1, k) = k S(n,k) + S(n, k-1)

with the base cases S(n, n) = 1 for n ≥ 0 and S(n, 0) = S(0, n) = 0 for n > 0. This computes Stirling numbers via a sum of smaller Stirling numbers, and so the intermediate results are no larger than the final result.
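Here’s a sketch of the recursion in Python. Python integers have arbitrary precision, so this doesn’t model the fixed-size constraint, but it shows the computation whose intermediate results stay within the size of the final answer.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n, k):
        """Stirling number of the second kind via the recursion above.

        Each value is a nonnegative combination of smaller Stirling numbers,
        so no intermediate result exceeds the final answer.
        """
        if n == k:
            return 1          # covers S(0, 0) = 1 as well
        if n == 0 or k == 0:
            return 0
        return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

    print(stirling2(10, 4))   # 34105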

So now we’re left with the question of how big Stirling numbers can be. Before computing S(n, k) you need an upper bound on how many digits it will have.

Bounding Stirling numbers

Stirling numbers can be bounded in terms of binomial coefficients.

S(n, k) \leq \frac{1}{2} \binom{n}{k}k^{n-k}

This reduces the problem to finding an upper bound on binomial coefficients. I wrote about a simple bound on binomial coefficients here.
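If you need to pre-specify an integer size, you can turn the bound above into a bound on the number of decimal digits. Here’s a sketch that works with the logarithm of the bound so the large number itself is never formed.

    from math import comb, log10, floor

    def stirling_digit_bound(n, k):
        """Upper bound on the number of decimal digits of S(n, k),
        using the bound above; the code assumes 1 <= k <= n."""
        bound_log10 = log10(0.5) + log10(comb(n, k)) + (n - k) * log10(k)
        return floor(bound_log10) + 1

    print(stirling_digit_bound(100, 40))   # digits to budget before computing S(100, 40)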

Hypergeometric distribution symmetry

One of these days I’d like to read Feller’s probability book slowly. He often says clever things in passing that are easy to miss.

Here’s an example from Feller [1] that I overlooked until I saw it cited elsewhere.

Suppose an urn contains n marbles, n1 red and n2 black. When r marbles are drawn from the urn, the probability that precisely k of the marbles are red follows a hypergeometric distribution.

\frac{\binom{n_1}{k}\binom{n-n_1}{r-k}} {\binom{n}{r}}

Feller mentions that this expression can be rewritten as

 \frac{\binom{r}{k} \binom{n-r}{n_1 - k}} { \binom{n}{n_1} }

What just happened? The roles of the total number of red marbles n1 and the sample size r have been swapped.

The proof is trivial—expand both expressions using factorials and observe that they’re equal—but that alone doesn’t give much insight. It just shows that two jumbles of symbols represent the same quantity.
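In that spirit, here’s a quick numerical confirmation of the identity using exact binomial coefficients over a small range of parameters.

    from math import comb
    from itertools import product

    # Check that the two hypergeometric expressions agree (cross-multiplied to stay exact).
    for n, n1, r in product(range(1, 13), repeat=3):
        if n1 > n or r > n:
            continue
        for k in range(min(n1, r) + 1):
            assert (comb(n1, k) * comb(n - n1, r - k) * comb(n, n1)
                    == comb(r, k) * comb(n - r, n1 - k) * comb(n, r))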

The combinatorial interpretation is more interesting and more useful. It says that you have two ways to evaluate the probability: focusing on the sample or focusing on the population of red marbles.

The former perspective says we could multiply the number of ways to choose k marbles from the supply of n1 red marbles and r − k black marbles from the supply of n − n1 black marbles, then divide by the number of ways to choose a sample of r marbles from an urn of n marbles.

The latter perspective says that rather than partitioning our sample of r marbles, we could think of partitioning the population of red marbles into two groups: those included in our sample and those not in our sample.

I played around with adding color to make it easier to see how the terms move between the two expressions. I colored the number of red marbles red and the sample size blue.

\[ \frac{\binom{{\color{red} n_1}}{k}\binom{n-{\color{red} n_1}}{{\color{blue} r}-k}} {\binom{n}{{\color{blue}r}}} = \frac{\binom{{\color{blue} r}}{k} \binom{n-{\color{blue} r}}{{\color{red} n_1} - k}} { \binom{n}{{\color{red} n_1}} } \]

Moving from left to right, the red variable goes down and the blue variable goes up.

Related posts

[1] William Feller. An Introduction to Probability Theory and Its Applications. Volume 1. Third edition. Page 44.

Shuffle product

The shuffle product of two words, w1 and w2, written

w1 Ш w2,

is the set of all words formed by interleaving the letters in w1 and w2, preserving the order of each word’s letters. The name comes from the analogy with doing a riffle shuffle of two decks of cards.

For example, bcd Ш ae, the shuffle product of bcd and ae, would be all permutations of abcde in which the consonants appear in alphabetical order and the vowels are also in alphabetical order. So abecd and baecd would be included, but badec would not be because the d and c are in the wrong order.

Side note on Ш

Incidentally, the symbol for shuffle product is the Cyrillic letter sha (Ш, U+0428), the only Cyrillic letter commonly used in mathematics, at least internationally. Presumably Russian mathematicians use other Cyrillic letters, but the only Cyrillic letter an American mathematician, for example, is likely to use is Ш.

The uses of Ш that I’m aware of are the Dirac comb distribution, the Tate–Shafarevich group, and the shuffle product.

Duplicate letters

What is the shuffle product of words containing duplicate letters? For example, what about the shuffle product of bread and crumb? Each word contains an r. The shuffle product, defined above as a set, doesn’t distinguish between the two rs. But another way to define the shuffle product is as a formal sum, with coefficients that count duplicates.

Imagine coloring the letters in abc blue and the letters in cde red. Then the two copies of abccde, one with the blue c before the red c and one with the red c first, would count as two different possibilities. This term in the formal sum would be 2abccde, the coefficient 2 capturing that there are two ways to arrive at this word.

You could also have duplicate letters within a single word. So in banana, for example, you could imagine coloring each a a different color and coloring the two ns different colors.
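Before getting to the Mathematica code below, here is a short Python sketch of the formal-sum definition. It returns a Counter whose values are the coefficients, so duplicate letters are accounted for automatically.

    from collections import Counter

    def shuffle(w1, w2):
        """Shuffle product as a formal sum: maps each word to its coefficient."""
        if not w1:
            return Counter({w2: 1})
        if not w2:
            return Counter({w1: 1})
        result = Counter()
        # The first letter of each result comes from either w1 or w2.
        for word, c in shuffle(w1[1:], w2).items():
            result[w1[0] + word] += c
        for word, c in shuffle(w1, w2[1:]).items():
            result[w2[0] + word] += c
        return result

    print(shuffle("abc", "ac"))
    # coefficients: 4 aabcc, 2 aacbc, 2 abacc, 1 abcac, 1 acabc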

Mathematica code

This page gives an implementation of the shuffle product in Mathematica.

shuffleW[s1_, s2_] := 
     Module[{p, tp, ord}, 
          p = Permutations@Join[1 & /@ s1, 0 & /@ s2]\[Transpose];
          tp = BitXor[p, 1];
          ord = Accumulate[p] p + (Accumulate[tp] + Length[s1]) tp;
          Outer[Part, {Join[s1, s2]}, ord, 1][[1]]\[Transpose]]

This code takes two lists of characters and returns a list of lists of characters. You can use this to compute both senses of the shuffle product. For example, let’s compute abc Ш ac.

The Mathematica command

    shuffleW[{a, b, c}, {a, c}]

returns a list of 10 lists:

 
    {{a, b, c, a, c}, {a, b, a, c, c}, {a, b, a, c, c}, 
     {a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c}, 
     {a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c}, 
     {a, c, a, b, c}}

If we ask for the union of the set above with Union[%] we get

    {{a, a, b, c, c}, {a, a, c, b, c}, {a, b, a, c, c}, 
     {a, b, c, a, c}, {a, c, a, b, c}}

So using the set definition, we could say

abc Ш ac = {aabcc, aacbc, abacc, abcac, acabc}.

Using the formal sum definition we could say

abc Ш ac = 4aabcc + 2aacbc + 2abacc + abcac + acabc.

Related posts

Reverse engineering options

There are 221,184 ways to have a Whopper. You Rule.

This weekend I saw a sign in the window of a Burger King™ that made me think of an interesting problem. If you know a number of possibilities like this, how would you reverse engineer the options that created the possibilities?

In the example above, there are 221,184 = 2^13 × 3^3 possible answers, and so most likely there are 13 questions that have a yes or no answer, and 3 questions that have 3 possible answers each. But there are other possibilities in principle, a lot of other possibilities.

Maybe there are 15 questions: 1 with 4 possible answers, 11 with 2 possible answers, and 3 with 3 possible answers. Or maybe there are again 15 questions, but one of the questions has 9 answers. If there are 14 questions, maybe one question has 8 answers, or 27 answers, or maybe two questions have 6 answers, … it gets complicated.

More general problem

How would you address this kind of problem in general?

Suppose there is some list of questions Q1, Q2, …, Qk where there are qi possible answers to question Qi. Then the product of all the qi equals the total number of possible responses P to the questions. Note that we don’t know the number of questions k; all we know is the product P. Let’s call the ordered list {q1, q2, …, qk} of the numbers of responses to the questions a survey.

There are as many possible surveys as there are multiplicative partitions of P. The number of possible surveys only depends on the exponents in the prime factorization of P, not on the prime factors themselves. It would make no difference if in the example above we had 5^13 × 7^3 possibilities rather than 2^13 × 3^3 possibilities. We’d reason the same way when counting the possible surveys. The number of questions is somewhere between 1 and the sum of the exponents in the prime factorization.

We have to decide how we want to count surveys. For example, if we do have 13 questions with binary answers and 3 questions with trinary answers, does it matter which questions have three answers? If it does, we multiply the number of possibilities by the binomial coefficient C(16, 3) because we choose which of the 16 total questions have three possible answers. And then we have to reason about the case of 15 questions, 14 questions, etc.

Multiplicative partitions are complicated, and it’s not possible to write down simple expressions for the number of multiplicative partitions of numbers with many prime factors. But in the special case of only two distinct prime factors, i.e.

P = p_1^i \, p_2^j

then there is a relatively simple expression for the total number of possible surveys, assuming order matters [1]:

2^{i+j-1} \sum_{k=0}^j {i \choose k}{j \choose k} 2^{-k}

This works out to 3,760,128 in the example above. Curiously,

3760128 = 221184 × 17.

But note that our result does not depend on the primes in the factorization of P, only on their exponents. There are no p’s in the summation above, only i’s and j’s. As we noted earlier, we’d come to the same count if we’d started with P = 5^13 × 7^3. The factors of 2 and 3 in our count arise for reasons unrelated to the primes 2 and 3 in the factorization of P.
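Here’s a sketch that evaluates the closed-form count and checks it against a direct recursive count of ordered factorizations. The function names are mine, not from the paper.

    from math import comb
    from functools import lru_cache

    def survey_count(i, j):
        """Closed-form count of surveys for P = p^i * q^j when order matters,
        per the formula above; assumes i >= 1 so the powers of two stay nonnegative."""
        return sum(comb(i, k) * comb(j, k) * 2 ** (i + j - 1 - k) for k in range(j + 1))

    @lru_cache(maxsize=None)
    def ordered_factorizations(n):
        """Number of ways to write n as an ordered product of factors > 1."""
        if n == 1:
            return 1
        return sum(ordered_factorizations(n // d) for d in range(2, n + 1) if n % d == 0)

    print(survey_count(13, 3))                    # 3760128
    print(ordered_factorizations(2**13 * 3**3))   # 3760128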

Related posts

[1] A. Knopfmacher and M. E. Mays. A survey of factorization counting functions. International Journal of Number Theory, Vol. 01, No. 04, pp. 563–581 (2005).

Design of experiments and design theory

Design of experiments is a branch of statistics, and design theory is a branch of combinatorics, and yet they overlap quite a bit.

It’s hard to say precisely what design theory is, but it’s concerned with whether objects can be arranged in certain ways, and if so, how many ways this can be done. Design theory is pure mathematics, but it is of interest to people working in areas of applied mathematics such as coding theory and statistics.

Here’s a recap of posts I’ve written recently related to design of experiments and design theory.

Design of Experiments

A few weeks ago I wrote about fractional factorial design. Then later I wrote about response surface models. Then a diagram from central composite design, a popular design in response surface methodology, was one of the diagrams in a post I wrote about visually similar diagrams from separate areas of application.

I wrote two posts about pitfalls with A/B testing. One shows how play-the-winner sequential testing has the same problems as Condorcet’s voter paradox, with the order of the tests potentially determining the final winner. More seriously, A/B testing cannot detect interaction effects which may be critical.

ANSI and Military Standards

There are several civilian and military standards related to design of experiments. The first of these was MIL-STD-105. The US military has retired this standard in favor of the civilian standard ASQ/ANSI Z1.4 which is virtually identical.

Similarly, the US military standard MIL-STD-414 was replaced by the very similar civilian standard ASQ/ANSI Z1.9. This post looks at the mean-range method for estimating variation which these two standards reference.

Design Theory

I wrote a couple posts on Room squares, one on Room squares in general and one on Thomas Room’s original design now known as a Room square. Room squares are used in tournament designs.

I wrote a couple posts about Costas arrays, an introduction and a post on creating Costas arrays in Mathematica.

Latin Squares

Latin squares and Greco-Latin squares are part of design theory and part of design of experiments. Here are several posts on Latin and Greco-Latin squares.