Multiplication theorem rabbit hole

When I started blogging I was reluctant to allow comments. It seemed like this would open the door to a flood of spam. Indeed it does, but nearly all of it can be filtered out automatically.

The comments that aren’t spam have generally been high quality. A comment on my post about the sawtooth function a few days ago has sent me down a fascinating rabbit hole.

The sawtooth function belongs to a general class of functions called replicative functions, functions that satisfy the equation

f(nx) = \sum_{i=0}^{n-1} f\left(x + \frac{i}{n} \right )

for positive integers n.

I ran across this by reading Knuth’s TAOCP. Knuth gives an understated comment in section 1.2.4 exercise 40 that “it may be interesting to study also the more general class of functions” for which the left-hand side above includes constants that depend on n, i.e.

a_n f(nx) + b_n = \sum_{i=0}^{n-1} f\left(x + \frac{i}{n} \right )

where the constants aₙ and bₙ depend on n but not on x.

Bernoulli polynomials

The first Bernoulli polynomial B₁(x) = x – ½ is a replicative function, and the only continuous replicative functions are multiples of B₁ (proved in exercise 40 referenced above).

The rest of the Bernoulli polynomials Bₘ(x) are generalized replicative functions, with aₙ = n¹⁻ᵐ and bₙ = 0. This was discovered by Joseph Raabe in the nineteenth century.
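
Here’s a minimal SymPy sketch of Raabe’s identity for one (arbitrary) choice of m and n:

    # Check n^(1-m) * B_m(n x) == sum_i B_m(x + i/n) symbolically for one m and n.
    from sympy import Rational, bernoulli, simplify, symbols

    x = symbols('x')
    m, n = 4, 3

    lhs = Rational(1, n**(m - 1)) * bernoulli(m, n*x)
    rhs = sum(bernoulli(m, x + Rational(i, n)) for i in range(n))
    print(simplify(lhs - rhs))  # prints 0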

Gamma and digamma functions

Gauss’s multiplication theorem for the gamma function says that for positive integers n,

\Gamma(nz) = (2\pi)^{(1-n)/2}\, n^{nz - 1/2} \prod_{k=0}^{n-1} \Gamma\left(z + \frac{k}{n} \right )

This is equation 6.1.20 in A&S. If we take logs of both sides, we find that log Γ is almost a generalized replicative function.

\log \Gamma(nz) = \frac{1-n}{2}\log(2\pi) + \left(nz - \frac{1}{2}\right)\log n + \sum_{k=0}^{n-1} \log \Gamma\left(z + \frac{k}{n} \right )

The nz log n term on the right depends on z, which keeps log Γ from satisfying the definition, but if we take derivatives, this term becomes the constant n log n.

This means that the derivative of log Γ, a.k.a. the ψ function, a.k.a. the digamma function, is a generalized replicative function with aₙ = n and bₙ = –n log n.

The derivative of a generalized replicative function is another function in the same class, so the higher derivatives of log Γ, known as the polygamma functions, are also generalized replicative functions.
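
Here’s a quick numerical sanity check of the digamma case using SciPy, a minimal sketch with an arbitrary choice of z and n:

    # Check n*psi(n z) - n*log(n) == sum_k psi(z + k/n), i.e. a_n = n, b_n = -n log n.
    import numpy as np
    from scipy.special import digamma

    z, n = 0.7, 5
    lhs = n * digamma(n * z) - n * np.log(n)
    rhs = sum(digamma(z + k / n) for k in range(n))
    print(lhs, rhs)  # the two values agree to floating point precision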

Hurwitz zeta function

The Hurwitz zeta function ζ(s, z) is defined by

\zeta(s, z) = \sum_{k=0}^\infty \frac{1}{(z+k)^s}

The Riemann zeta function is the special case of the Hurwitz zeta function with z = 1.

The Hurwitz zeta function is related to the polygamma functions by

\psi^{(m)}(z)= (-1)^{m+1} m! \zeta (m+1,z)

It follows from this that for fixed s, ζ(s, z) is a generalized replicative function of z. You can show that aₙ = nˢ and bₙ = 0.
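
Here’s the corresponding spot check for the Hurwitz zeta function, a minimal sketch using SciPy’s Hurwitz zeta (which requires s > 1 for the series to converge):

    # Check n^s * zeta(s, n z) == sum_k zeta(s, z + k/n), i.e. a_n = n^s, b_n = 0.
    from scipy.special import zeta  # zeta(s, q) is the Hurwitz zeta function

    s, z, n = 3.0, 0.7, 4
    lhs = n**s * zeta(s, n * z)
    rhs = sum(zeta(s, z + k / n) for k in range(n))
    print(lhs, rhs)  # agree to floating point precision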

Trig functions

See the next post for a couple examples involving trig functions: cot πx and csc² πx.


Gamma and the Pi function

The gamma function satisfies

Γ(n+1) = n!

for all non-negative integers n, and extends to an analytic function in the complex plane with the exception of singularities at the non-positive integers [1]. Incidentally, going back to the previous post, this is an example of a theorem that would have to be amended if 0! were not defined to be 1.

Wouldn’t it be simpler if the gamma function were defined so that its value at n, not at n + 1, extended factorial? Well, some things would be simpler, but other things would not.

The Pi function defined by

Π(z) = Γ(z + 1)

extends factorial with no shift by 1 to keep track of, and some theorems are simpler to state in terms of the Pi function than the gamma function. For example, it’s simpler to say that the volume of the unit ball in n dimensions is

π^(n/2) / Π(n/2)

than to say it’s

π^(n/2) / Γ(n/2 + 1).

The former has an easy-to-remember form, with lower case π on top and capital π on bottom.

The reflection identity is also a little simpler in the form

Π(z) Π(-z) = 1/sinc(z)

than in the form

Γ(z) Γ(1-z) = π / sin(πz)

The drawback to the former is that you have to remember to use the convention

sinc(z) = sin(πz) / πz

because some sources define sinc with the factor of π and some without. Neither convention makes a large majority of theorems simpler, so there’s no clear winner. [2]
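
Since NumPy’s sinc uses the normalized convention above, a quick numerical check of the reflection identity is easy. Here’s a minimal sketch, with an arbitrary choice of z:

    # Check Pi(z) * Pi(-z) == 1/sinc(z), where numpy.sinc(z) = sin(pi z)/(pi z).
    import numpy as np
    from scipy.special import gamma

    def Pi(z):
        return gamma(z + 1)

    z = 0.3
    print(Pi(z) * Pi(-z), 1 / np.sinc(z))  # the two values agree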

Fortunately the Pi function has a different name and isn’t an alternative convention for defining the gamma function. That would be terribly confusing. [3]


[1] The Pi function is the unique way to extend factorial to an analytic function, given some regularity assumptions. See Wielandt’s theorem and the Bohr-Mollerup theorem.

[2] Fourier transforms may be the worst example in mathematics of the lack of a clear winner among conventions. Multiple conventions thrive because each makes some things easier to state. (See this page for a sort of Rosetta stone translating between the various Fourier transform conventions.) In fact, the lack of standardization for the sinc function is related to the lack of standardization for the Fourier transform: you’d like to define the sinc function so that it has the simplest Fourier transform under your convention.

[3] I’m fuzzy on the history of all this, but I think what we now call the Pi function was an alternative definition of the gamma function briefly, but fortunately that got ironed out quickly.

Why is it defined that way?

There are numerous conventions in mathematics that students continually question.

  • Why isn’t 1 a prime number?
  • Why is 0! defined to be 1?
  • Why is an empty sum 0 and an empty product 1?
  • Why can’t you just say 1/0 = ∞?
  • Etc.

There are good reasons for the existing conventions, and they usually boil down to this: On the whole, theorems are more simply stated with these conventions than with the alternatives. For example, if you defined 0! to be some other value, say 0, then there would be countless theorems that would have to be amended with language of the form “… except when n is zero, in which case …”

In short, the existing conventions simplify things more than they complicate them. But that doesn’t mean that everything is simpler under the standard conventions. The next post gives an example along these lines.


Sawtooth and replicative functions

Here’s something I ran across in Volume 2 of Donald Knuth’s The Art of Computer Programming.

Knuth defines the sawtooth function by

((x)) = x – (⌊x⌋ + ⌈x⌉)/2.

Here’s a plot.

This is an interesting way to write the sawtooth function, one that could make it easier to prove things about, such as the following theorem. For positive integers n, we have

((nx)) = ((x)) + \left(\left(x + \frac{1}{n}\right)\right) +\cdots+ \left(\left(x + \frac{n-1}{n}\right)\right)

It’s not at all obvious, at least to me, that this should be true.

Knuth does not give a proof but leaves the proof to Exercise 2 in Section 3.3.3. I suppose you could prove it by slugging it out with floor and ceiling identities, but I looked at the solutions to see whether Knuth has a more elegant proof.

The solution for Exercise 2 says to see exercises 38 and 39 in Volume 1, section 1.2.4. These exercises break the problem of proving the theorem above into smaller parts. They also generalize the problem, defining a function f to be replicative if it satisfies the equation

f(nx) = \sum_{i=0}^{n-1} f\left(x + \frac{i}{n} \right )

The theorem above says that the sawtooth function is replicative. Other examples of replicative functions include the floor function and the function that takes x to x – ½.
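
Here’s a quick numerical check that the sawtooth function is replicative, a minimal sketch with an arbitrary choice of x and n:

    # Verify ((n x)) == sum_i (( x + i/n )) for one x and n.
    import math

    def saw(x):
        return x - (math.floor(x) + math.ceil(x)) / 2

    x, n = 0.3217, 7
    lhs = saw(n * x)
    rhs = sum(saw(x + i / n) for i in range(n))
    print(lhs, rhs)  # equal up to floating point rounding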


Aquinas on epicycles

C. S. Lewis quotes Thomas Aquinas in The Discarded Image:

In astronomy an account is given of eccentricities and epicycles on the ground that if their assumption is made the sensible appearances as regards celestial motion can be saved. But this is not a strict proof since for all we know they could also be saved by some different assumption.

Estimating normal tail extreme probabilities

In the previous post I said the probability of a normal distribution being 50 standard deviations from its mean was absurdly small, but I didn’t quantify it.

You can’t simply fire up something like R and ask for the probability. The actual value is smaller than can be represented by a floating point number, so you’ll just get zero.

Though there’s no practical reason to compute the probability of such events, we’ll do it anyway. The technique shown here is useful for less extreme cases that could be practical.

Let Z be a normal random variable with mean 0 and standard deviation 1. We need to estimate the complementary CDF [1], given by

 \Phi^c(t) = P(Z > t) = \frac{1}{\sqrt{2\pi}} \int_t^\infty e^{-x^2/2}\, dx.

for t = 50. This post gives us the upper and lower bounds we need:

\frac{t}{t^2 +1} < \sqrt{2\pi} e^{t^2/2} \Phi^c(t) < \frac{1}{t}.

The same post also gives tighter bounds, which are useful for less extreme cases, but not necessary here.

The bounds above tell us the probability of Z taking on a value of 50 or more is approximately

\frac{e^{-1250}}{50\sqrt{2\pi}}

We’d like to write this value in scientific notation. We can’t directly evaluate it with most software because, as we said above, it’s too small. If you were to type exp(-1250) in Python, for example, it would return exactly 0. We can, however, evaluate the log base 10 of the expression above using Python.

    >>> from math import exp, pi, log10
    >>> -1250*log10(exp(1)) - 0.5*log10(2*pi) - log10(50)
    -544.9661623175799

So our probability is about 10⁻⁵⁴⁵. This is far smaller than the probability of selecting a particular elementary particle at random from the universe. (See here.)
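
As a cross check, SciPy’s log survival function for the normal distribution works on a log scale and so sidesteps the underflow. A minimal sketch:

    # norm.logsf(50) returns the natural log of P(Z > 50); convert to log base 10.
    from math import e, log10
    from scipy.stats import norm

    print(norm.logsf(50) * log10(e))  # about -544.97, matching the estimate above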


[1] Numerical software packages will provide functions for CDFs and CCDFs, e.g. for Φ and Φᶜ = 1 – Φ. This may seem unnecessary, and it would be if we could compute with infinite precision. But it is possible, and often necessary, to compute Φᶜ(x) for values of x so large that all precision would be lost when evaluating 1 – Φ(x). For less extreme values of x we wouldn’t lose all precision, but we’d lose some. See the discussion of erf and erfc and other functions that seem redundant here.

50 sigma events for t distributions

I had a recent discussion with someone concerning a 50 sigma event, and that conversation prompted this post.

When people count “sigmas” they usually have normal distributions in mind. And six-sigma events are so rare for normal random variables that it’s more likely the phenomenon under consideration doesn’t have a normal distribution. As I explain here,

If you see a six-sigma event, that’s evidence that it wasn’t really a six-sigma event.

If six-sigma events are hard to believe, 50-sigma events are absurd. But not if you change the distribution assumption.

Chebyshev’s inequality says that if a random variable X has mean μ and variance σ², then

Prob( |X – μ| ≥ kσ) ≤ 1/k².

Not only that, it is possible to construct a random variable so that the inequality above is tight, i.e. it is possible to construct a random variable X such that the probability of it being 50 standard deviations away from its mean equals 1/2500. However, this requires constructing a custom random variable just to make equality hold. No distribution that you run into in the wild will have such a high probability of being that far out from its mean.
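
For example, here is a three-point distribution, constructed just for illustration, that attains the bound at k = 50:

    # X takes the values -50, 0, 50 with probabilities 1/5000, 4998/5000, 1/5000.
    # Then X has mean 0, variance 1, and P(|X| >= 50 sigma) = 1/2500 exactly.
    from fractions import Fraction

    probs = {-50: Fraction(1, 5000), 0: Fraction(4998, 5000), 50: Fraction(1, 5000)}
    mean = sum(x * p for x, p in probs.items())
    var = sum((x - mean)**2 * p for x, p in probs.items())
    tail = sum(p for x, p in probs.items() if abs(x) >= 50)
    print(mean, var, tail)  # 0 1 1/2500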

I wondered what the probability of a 50-sigma event would be for a random variable with such a fat tail that it barely has a finite variance. I chose to look at Student t distributions because the variance of a Student t distribution with ν degrees of freedom is ν/(ν – 2) for ν > 2, so the variance blows up as ν decreases toward 2. The smaller ν is, the fatter the tails.

The probability of a t random variable with 3 degrees of freedom being 50 sigmas from its mean is about 3.4 × 10⁻⁶. The corresponding probability for 30 degrees of freedom is 6.7 × 10⁻³¹. So the former is rare but plausible, while the latter is so small as to be hard to believe.

This suggests that 50-sigma events are more likely with low degrees of freedom. And that’s sorta true, but not entirely. For 2.001 degrees of freedom, the probability of a 50-sigma event is 2 × 10⁻⁷, i.e. the probability is smaller for ν = 2.001 than for ν = 3.
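
These probabilities are easy to reproduce with SciPy. Here’s a minimal sketch, taking a 50-sigma event to mean |T| ≥ 50σ with σ = √(ν/(ν – 2)):

    # Two-sided probability that a Student t random variable is at least
    # 50 sigma from its mean, where sigma = sqrt(nu/(nu - 2)) for nu > 2.
    import numpy as np
    from scipy.stats import t

    for nu in [2.001, 3, 30]:
        sigma = np.sqrt(nu / (nu - 2))
        print(nu, 2 * t.sf(50 * sigma, nu))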

Here’s a plot of the probability of 50-sigma events for a t distribution with ν degrees of freedom.

Apparently the probability peaks somewhere around ν = 2.25 then decreases exponentially as ν increases.

This isn’t what I expected, but it makes sense after the fact. As ν increases, the tails get thinner, but σ also gets smaller. We have two competing effects. For ν less than 2.25 the size of σ matters more. For larger values of ν, the thinness of tails matters more.

Guide to the recent flurry of posts

I wrote six blog posts this weekend, and they’re all related. Here’s how.

Friday evening I wrote a blog post about a strange random number generator based on factorials. The next day my electricity went out, and that led me to think how I would have written the factorial RNG post without electricity. That led to two threads: interpolation in tables and calculations with floors.

The interpolation thread has two posts. The first looks at error estimates for polynomial interpolation, and shows that numerical error can easily dominate approximation error in interpolation. The second shows that gaps in tables of sines and cosines can be filled in better than by interpolation in general.

The floor calculation thread first led to this post, which states a useful theorem from Concrete Mathematics and uses a slight generalization of the theorem to justify a numerical calculation. Then a comment on an earlier post led to a new post giving simple and tight bounds on the number of trailing zeros in a factorial.

You can read each of these posts by itself, but I thought some people would appreciate seeing how they fit together.

Trailing zeros of factorials, revisited

I needed to know the number of trailing zeros in n! for this post, and I showed there that the number is

\sum_{i=1}^\infty \left\lfloor \frac{n}{5^i} \right\rfloor

Jonathan left a comment in a follow-up post giving a brilliantly simple approximation to the sum above:

… you can do extremely well when calculating the number of trailing 0’s by dropping the floor functions and using all the terms of the infinite series, which leaves you with just n/4.

In symbols,

\sum_{i=1}^\infty \left\lfloor \frac{n}{5^i} \right\rfloor \approx \sum_{i=1}^\infty \frac{n}{5^i} = \frac{n}{4}

And in case n is not a multiple of 4, we can take the floor of n/4 as our estimate.

For example, let n = 8324228646. Then the exact number of trailing zeros in n! is 2081057156, and ⌊n/4⌋ = 2081057161 is only off by 5.

The approximation above is also an upper bound since removing a floor either makes no difference or makes things bigger.

We can make the upper bound a tiny bit tighter by noting that the infinite sum on the left is really a sum up to k = ⌊log₅ n⌋ and so

\begin{align*} \sum_{i=1}^\infty \left\lfloor \frac{n}{5^i} \right\rfloor &= \sum_{i=1}^k \left\lfloor \frac{n}{5^i} \right\rfloor \\ &\leq \sum_{i=1}^k \frac{n}{5^i} \\ &= \frac{n}{4}(1 - 5^{-k}) \end{align*}

The end result never differs from n/4 by more than 1, so it’s hardly worth it. However, the calculation above can be modified slightly to get a lower bound.

Since ⌊x⌋ > x – 1, we have

\begin{align*} \sum_{i=1}^\infty \left\lfloor \frac{n}{5^i} \right\rfloor &= \sum_{i=1}^k \left\lfloor \frac{n}{5^i} \right\rfloor \\ &> \sum_{i=1}^k \left( \frac{n}{5^i} - 1 \right )\\ &= \frac{n}{4}(1 - 5^{-k}) -k \end{align*}

So the number of trailing zeros in n! is greater than

n(1 – 5⁻ᵏ)/4 – k

and no more than

n(1 – 5⁻ᵏ)/4.

Applying this to the example above where n = 8324228646, we have k = 14, a lower bound of 2081057147, and an upper bound of 2081057161. The exact number is 2081057156. Our lower bound was 9 below the exact value, and the upper bound was 5 above the exact value.
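
Here’s a short script that reproduces these numbers, a minimal sketch that recomputes the exact count by summing floors:

    # Exact number of trailing zeros of n!, with the lower and upper bounds
    # n(1 - 5^-k)/4 - k and n(1 - 5^-k)/4, where k = floor(log_5 n).
    from math import floor, log

    def trailing_zeros(n):
        count, power = 0, 5
        while power <= n:
            count += n // power
            power *= 5
        return count

    n = 8324228646
    k = floor(log(n, 5))
    upper = n * (1 - 5**-k) / 4
    lower = upper - k
    print(trailing_zeros(n), lower, upper)  # the exact count lies between the bounds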

Filling in gaps in a trig table

The previous post shows how you could use linear interpolation to fill in gaps in a table of logarithms. You could do the same for a table of sines and cosines, but there’s a better way. As before, we’ll assume you’re working by hand with just pencil, paper, and a handbook of tables.

Linear interpolation

Suppose you want to find the sine of 12.3456° and you have a table of sines for angles in increments of 0.1°. In Table 4.10 of A&S we find

sin 12.3° = 0.21303 03862 74977
sin 12.4° = 0.21473 53271  67063

If we were to use linear interpolation, we’d estimate

sin 12.3456° = sin 12.3° + 0.456(sin 12.4° – sin 12.3°) = 0.21380 78393 21768

which is accurate to six decimal places.

Better approach

Another approach would be to use the identity

sin(θ + φ) = sin θ cos φ + cos θ sin φ

rather than linear interpolation, setting θ = 12.3° and φ = 0.0456°. We can look up the sine and cosine of θ in our table, but how do we find the sine and cosine of φ?

The cosine is easy: set it to 1. For a small angle x (in radians) the cosine of x is approximately 1 with an error of less than x²/2. In radians,

φ = 0.0456 π/180 = 0.00079 58701 38909

and so the truncation error in approximating cos φ with 1 is about 3 × 10⁻⁷.

Computing the sine of φ is easy, but it requires converting φ to radians. You could probably find the conversion factor in your handbook, e.g. in Table 1.1 of A&S.

0.0456° = 0.0456 × 0.01745 32925 19943 radians

Once φ is in radians, sin φ ≈ φ with an error of less than φ³/6 (see here).

Putting the pieces together we have

sin(θ + φ) = sin 12.3° × 1 + cos 12.3° × φ

Using the numbers above, this gives us 0.21380785249034476, which is off by about 6 × 10⁻⁸.
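
Here’s a short script that carries out both calculations, a minimal sketch that uses library sines and cosines in place of table lookups:

    # Compare linear interpolation with the addition formula (cos phi ~ 1)
    # for sin 12.3456 degrees.
    from math import sin, cos, radians

    exact = sin(radians(12.3456))
    s1, s2 = sin(radians(12.3)), sin(radians(12.4))

    linear = s1 + 0.456 * (s2 - s1)
    addition = s1 + cos(radians(12.3)) * radians(0.0456)

    print(exact - linear, exact - addition)  # both errors are below 10^-7 in absolute value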

More accuracy

If we want even more accuracy, we need to find the weakest link in our calculation. The error in approximating sin φ as φ is on the order of φ³ while the error in approximating cos φ as 1 is on the order of φ², so the latter is the largest source of error.

If we approximate cos φ as 1 – φ²/2, the error is on the order of φ⁴, and the weakest link would be the sine approximation, which has error on the order of φ³, which is still quite small. The overall error in computing sin 12.3456° would be less than 10⁻¹⁰ if we use this higher order approximation for cos φ.

Compare and contrast

Let’s go back to approximating the cosine of a small angle by 1 and compare the two approximation approaches above.

Linear interpolation:

sin 12.3456° = sin 12.3° + 0.456(sin 12.4° – sin 12.3°)

Addition formula:

sin 12.3456° = sin 12.3° + 0.0456 (π/180) (cos 12.3°)

The second terms in the two approaches are

0.0456(sin 12.4° – sin 12.3°)/0.1

and

0.0456 (π/180) (cos 12.3°).

The two are similar because

(sin 12.4° – sin 12.3°)/0.1 ≈ (π/180) (cos 12.3°).

The term on the left is the difference quotient for sine at 12.3° with step h = 0.1 and the term on the right is the derivative of sine at 12.3°.

Wait, isn’t the derivative of sine just cosine? It is when you’re working in radians, which is why calculus almost always uses radians, but when you work in degrees, the derivative of sine is π/180 times cosine.

What this shows is that when you approximate the cosine of a small angle as 1, the sum formula reduces to a first-order Taylor approximation.