Moment generating functions and connections to other things

This post relates moment generating functions to the Laplace transform and to exponential generating functions. It also brings in connections to the z-transform and the Fourier transform.

Thanks to Brian Borchers who suggested the subject of this post in a comment on a previous post on transforms and convolutions.

Moment generating functions

The moment generating function (MGF) of a random variable X is defined as the expected value of exp(tX). By the so-called rule of the unconscious statistician we have

M_X(t) \equiv \mathrm{E}[e^{tX}] = \int_{-\infty}^\infty e^{tx} f_X(x)\, dx

where fX is the probability density function of the random variable X. The function MX is called the moment generating function of X because it’s nth derivative, evaluated at 0, gives the nth moment of X, i.e. the expected value of Xn.

Laplace transforms

If we flip the sign on t in the integral above, we have the two-sided Laplace transform of fX. That is, the moment generating function of X at t is the two-sided Laplace transform of fX at −t. If the density function is zero for negative values, then the two-sided Laplace transform reduces to the more common (one-sided) Laplace transform.

Exponential generating functions

Since the derivatives of MX at zero are the moments of X, the power series for MX is the exponential generating function for the moments. We have

M_X(t) = m_0 + m_1t + \frac{m_2}{2}t^2 + \frac{m_3}{3!} t^3 + \cdots

where mn is the nth moment of X.

Other generating functions

This terminology needs a little explanation since we’re using “generating function” two or three different ways. The “moment generating function” is the function defined above and only appears in probability. In combinatorics, the (ordinary) generating function of a sequence is the power series whose coefficient of xn is the nth term of the sequence. The exponential generating function is similar, except that each term is divided by n!. This is called the exponential generating series because it looks like the power series for the exponential function. Indeed, the exponential function is the exponential generating function for the sequence of all 1’s.

The equation above shows that MX is the exponential generating function for mn and the ordinary generating function for mn/n!.

If a random variable Y is defined on the integers, then the (ordinary) generating function for the sequence Prob(Yn) is called, naturally enough, the probability generating function for Y.

The z-transform of a sequence, common in electrical engineering, is the (ordinary) generating function of the sequence, but with x replaced with 1/z.

Characteristic functions

The characteristic function of a random variable is a variation on the moment generating function. Rather than use the expected value of tX, it uses the expected value of itX. This means the characteristic function of a random variable is the Fourier transform of its density function.

Characteristic functions are easier to work with than moment generating functions. We haven’t talked about when moment generating functions exist, but it’s clear from the integral above that the right tail of the density has to go to zero faster than ex, which isn’t the case for fat-tailed distributions. That’s not a problem for the characteristic function because the Fourier transform exists for any density function. This is another example of how complex variables simplify problems.

Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says

2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives is a formula for something well known, 2n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

    from sympy.ntheory import mobius, divisors

    def num_primitive(n):
        return sum( mobius(n/d)*2**d for d in divisors(n) )

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

    from sympy.ntheory import factorint

    def mobius(n):
        exponents = factorint(n).values()
        lenexp = len(exponents)
        m = 0 if lenexp == 0 else max(exponents)
        return 0 if m > 1 else (-1)**lenexp

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.

Related: Applied number theory

Making change

How many ways can you make change for a dollar? This post points to two approaches to the problem, one computational and one analytic.

SICP gives a Scheme program to solve the problem:

(define (count-change amount) (cc amount 5))

(define (cc amount kinds-of-coins)
    (cond ((= amount 0) 1)
    ((or (< amount 0) (= kinds-of-coins 0)) 0)
    (else (+ (cc amount
                 (- kinds-of-coins 1))
             (cc (- amount
                    (first-denomination
                     kinds-of-coins))
                     kinds-of-coins)))))

(define (first-denomination kinds-of-coins)
    (cond ((= kinds-of-coins 1) 1)
          ((= kinds-of-coins 2) 5)
          ((= kinds-of-coins 3) 10)
          ((= kinds-of-coins 4) 25)
          ((= kinds-of-coins 5) 50)))

Concrete Mathematics explains that the number of ways to make change for an amount of n cents is the coefficient of zn in the power series for the following:

\frac{1}{(1 - z)(1 - z^5)(1 - z^{10})(1 - z^{25})(1 - z^{50})}

Later on the book gives a more explicit but complicated formula for the coefficients.

Both show that there are 292 ways to make change for a dollar.

Counting magic squares

How many k × k magic squares are possible? If you start from a liberal definition of magic square, there’s an elegant result. For the purposes of this post, a magic square is a square arrangement of non-negative numbers such that the rows and columns all sum to the same non-negative number m called the magic constant. Note that this allows the possibility that numbers will be repeated, and this places no restriction on the diagonals.

With this admittedly non-standard definition, the number of k × k magic squares with magic constant m is always a polynomial in m of degree no more than (k − 1)2. For k = 3, the result is

(m + 1)(m + 2)(m2 + 3m + 4)/8

There is no general formula for all k, but there is an algorithm for finding a formula for each value of k.

Source: The Concrete Tetrahedron

Update: I had reported the polynomial degree as k + 1, but looking back at Concrete Tetrahedron, that book lists the order as (k + 1)2. However, the paper cited in the comments lists the exponent as (k − 1)2, which I believe is correct.

More on magic squares