# Quadruple factorials and Legendre functions

Last week I claimed that double, triple, and quadruple factorials came up in applications. The previous post showed how triple factorials come up in solutions to Airy’s equation. This post will show how quadruple factorials come up in solutions to Legendre’s equation.

Legendre’s differential equation is

The Legendre functions Pν and Qν are independent solutions to the equation for given values of ν.

When ν is of the form n + ½ for an integer n, the values of Pν and Qν at 0 involve quadruple factorials and also Gauss’s constant.

For example, if ν = 1/2, 5/2, 9/2, …, then Pν(0) is given by

Source:  An Atlas of Functions

# Triple factorials and Airy functions

Last week I wrote in a post on multifactorials in which I said that

Double factorials come up fairly often, and sometimes triple, quadruple, or higher multifactorials do too.

This post gives a couple examples of triple factorials in practice.

One example I wrote about last week. Triple factorial comes up when evaluating the gamma function at an integer plus 1/3. As shown here,

Another example involves solutions to Airy’s differential equation

One pair of independent solutions consists of the Airy functions Ai(x) and Bi(x). Another pair consists of the functions

given in A&S 10.4.3. Because both pairs of functions solve the same linear ODE, each is a linear combination of the other.

Notice that the numerators are triple factorials, and so the series above can be rewritten as

The next post gives an example of quadruple factorials in action.

# Gauss’s constant

I hadn’t heard of Gauss’s constant until recently. I imagine I’d seen it before and paid no attention. But once I paid attention, I started seeing it more often. There’s a psychology term for this—reticular activation?—like when you buy a green Toyota and suddenly you see green Toyotas everywhere.

Our starting point is the arithmetic-geometric mean or AGM. It takes two numbers, takes the arithmetic (ordinary) mean and the geometric mean, then repeats the process over and over. The limit of this process is the AGM of the two numbers.

Gauss’s constant is the reciprocal of the AGM of 1 and √2.

Gauss’s constant can be expressed in terms of the Gamma function:

## Exponential sums

Last week I wrote about the sum

which we will see shortly is related to Gauss’s constant.

We can the definition of g in terms of Γ(1/4) and the reflection identify for the gamma function to show that the sum above can be written in terms of Gauss’s constant:

The alternating version of the sum is also related to g:

## Integrals

Another place where g comes up is in integrals of hyperbolic functions. For example

Here

## Beta probabilities

Another place Gauss’s constant comes up is in special values of beta distribution probabilities.

Define the incomplete beta function by

If X is a random variable with a beta(a, b) distribution, then the CDF of X is the normalized incomplete beta function, i.e.

A couple special values of B involve Gauss’s constant, namely

and

Source:  An Atlas of Functions

## Ubiquitous constant

The constant M defined by 1/M = √2 g is the so-called “ubiquitous constant,” though this seems like an unjustified name for such an arcane constant. Perhaps there is some context in which the constant is ubiquitous in the colloquial sense.

# Multiple angle identity for cotangent

The previous post discusses generalized replicative functions, functions that satisfy

Donald Knuth says in Exercise 40 of Section 1.2.4 of TAOCP that the functions cot πx and csc² πx fall into this class of functions.

If this is true of cot πx then it follows by differentiation that it is true of csc² πx as well, so we will focus on the former.

I believe that for f(x) = cot(πx) we have an = n and bn = 0, but I don’t have a proof. If this is correct then

Here’s a Python script that draws values of n and x at random and verifies that the left and right sides above are equal to at least 10 decimal places. It’s not a proof, but it’s good evidence.

    import numpy as np

def cot(x): return 1/np.tan(x)

def f1(n, x): return n*cot(n*np.pi*x)

def f2(n, x): return sum(cot(np.pi*(x + k/n)) for k in range(n))

np.random.seed(20211011)
for _ in range(1000):
n = np.random.randint(1, 50)
x = np.random.random()
a, b = f1(n,x), f2(n, x)
assert( abs(a - b) <= abs(a)*1e-10 )


Note that NumPy doesn’t have a cotangent function. See this post for quirks of how trig functions are supported in various environments.

An actual proof is left as an exercise for the reader. Knuth left it as an exercise too, but he didn’t say what the a‘s are, so I’m being nicer than he was. :)

If you come up with a proof, please share it in the comments.

## Update

and substitute πx for x. Take the logarithm and then the derivative of both sides, divide by π, and you get the identity we’re trying to prove.

# Multiplication theorem rabbit hole

When I started blogging I was reluctant to allow comments. It seems this would open the door to a flood of spam. Indeed it does, but nearly all of it can be filtered out automatically.

The comments that aren’t spam have generally been high quality. A comment on my post about the sawtooth function a few days ago has sent me down a fascinating rabbit hole.The sawtooth function belongs to a general class of functions called replicative functions, functions that satisfy the equation

for positive integers n.

I ran across this by reading Knuth’s TAOCP. Knuth gives an understated comment in section 1.2.4 exercise 40 that “it may be interesting to study also the more general class of functions” for which the left-hand side above includes constants that depend on n, i.e.

where the constants an and bn depend on n but not on x.

## Bernoulli polynomials

The first Bernoulli polynomial B1(x) = x – ½ is a replicative function, and the only continuous replicative functions are multiples of B1 (proved in exercise 40 referenced above).

The rest of the Bernoulli polynomials Bm(x) are generalized replicative functions, with an = nm-1 and bn = 0. This was discovered by Joseph Raabe in the nineteenth century.

## Gamma and digamma functions

Gauss’s multiplication theorem for the gamma function says that for positive n,

This is equation 6.1.20 in A&S. If we take logs of both sides, we find that log Γ is a almost a generalized replicative function.

The factor of n log(n) z prevents the definition from holding, but if we take derivatives, this term goes away.

This means that the derivative of log Γ, a.k.a the ψ function, a.k.a. the digamma function, is a generalized replicative function with an = n and bn = –n log n.

The derivative of a generalized replicative function is another function in the same class, so the higher derivatives of log Γ, known as the polygamma functions, are also generalized replicative functions.

## Hurwitz zeta function

The Hurwitz zeta function ζ(s, z) is defined by

The Riemann zeta function is the special case of the Hurwitz zeta function with z = 1.

The Hurwitz zeta function is related to the polygamma functions by

It follows from this that for fixed s, ζ(s, z) is a generalized replicative function of z. You can show that an = ns and bn = 0.

## Trig functions

See the next post for a couple examples involving trig functions: cot πx and csc² πx .

# Gamma and the Pi function

The gamma function satisfies

Γ(n+1) = n!

for all non-negative integers n, and extends to an analytic function in the complex plane with the exception of singularities at the non-positive integers [1]. Incidentally, going back to the previous post, this is an example of a theorem that would have to be amended if 0! were not defined to be 1.

Wouldn’t it be simpler if the gamma function were defined so that it’s value at n, not at n+1, extended factorial? Well, some things would be simpler, but other things would not.

The Pi function defined by

Π(z) = Γ(z + 1)

extends factorial with no extra factor of 1 to keep up with, and some theorems are simpler to state in terms of the Pi function than the gamma function. For example, it’s simpler to say that the volume of a unit n-sphere is

(π/2)n/2 / Π(n/2)

than to say it’s

(π/2)n/2 / Γ(n/2 + 1).

The former has an easy-to-remember form, with lower case π on top and capital π on bottom.

The reflection identity is also a little simpler in the form

Π(z) Π(-z) = 1/sinc(z)

than in the form

Γ(z) Γ(1-z) = π / sin(πz)

The drawback to the former is that you have to remember to use the convention

sinc(z) = sin(πz) / πz

because some sources define sinc with the factor of π and some without. Neither convention makes a large majority of theorems simpler, so there’s no clear winner. [2]

Fortunately the Pi function has a different name and isn’t an alternative convention for defining the gamma function. That would be terribly confusing. [3]

## Related posts

[1] The Pi function is the unique way to extend factorial to an analytic function, given some regularity assumptions. See Wielandt’s theorem and the Bohr-Mollerup theorems.

[2] Fourier transforms may be the worst example in mathematics of no clear convention winner. Multiple conventions thrive because each makes some some things easier to state. (See this page for a sort of Rosetta stone translating between the various Fourier transform conventions.) In fact, the lack of standardization for the sinc function is related to the lack of standardization for the Fourier transform: you’d like to define the sinc function so that it has the simplest Fourier transform under your convention.

[3] I’m fuzzy on the history of all this, but I think what we now call the Pi function was an alternative definition of the gamma function briefly, but fortunately that got ironed out quickly.

# Computing ζ(3)

I’ve started reading Paul Nahin’s new book “In Pursuit of ζ(3).” The actual title is “In Pursuit of Zeta-3.” I can understand why a publisher would go with such a title, but I assume most people who read this blog are not afraid of Greek letters.

I’ve enjoyed reading several of Nahin’s books, so I was pleased to see he had a new one.

Nahin’s latest book is about attempts to find a closed-form expression for the sum

If you replace the “3” with “s” in the sum above, you have an expression for the Riemann zeta function ζ(s), and so ζ(3) is a convenient way of describing the sum and putting it in context [1]. There are closed-form expressions for ζ(s) when s is an even integer [2] but so far not nobody has found one for s = 3.

The value of ζ(3) is sometimes called Apéry’s constant because in 1978 Roger Apéry was the first to prove that it is an irrational number.

The names “ζ(3)” and “Apéry’s constant” are anachronisms because Euler studied the sum in the 18th century, but Riemann came along in the 19th century and Apéry in the 20th.

## Naive calculation

The most obvious way to compute ζ(3) numerically is to replace the upper limit of the infinite sum with a large number N. This works, but it’s very inefficient.

Let’s calculate the sum with upper limit 100 with a little Python code.

    >>> sum(n**-3 for n in range(1, 101))
1.2020074006596781


We can judge the accuracy of this approximation using the implementation of the zeta function in SciPy.

    >>> from scipy.special import zeta
>>> zeta(3)
1.2020569031595942


We can predict how accurate our sum will be by estimating the tail of the sum using the integral test as follows.

The integral on the left evaluates to N-2/2 and the integral on the right evaluates to (N+1)-2/2.

This says that if we sum up to N, our error will be between (N+1)-2/2 and N-2/2.

We can estimate from this that if we want d decimal places, we’ll need to sum 10d/2 terms. So to get 16 decimal places, we’d have to sum 100,000,000 terms. This is certainly not what the call to zeta(3) above is doing.

## A better approach

Nahin’s book derives an equation for ζ(3) by Ernst Kummer (1810–1893):

Because the denominator in the sum is a 5th degree polynomial, we get faster convergence than we do from directly evaluating the definition of ζ(3). We could find an interval around the error using the integral test above, and it would show that the error is O(N-4).

Let’s try this out in Python.

    >>> f = lambda n: (n**3 * (n**2-1))**-1
>>> 1.25 - sum(f(n) for n in range(2, 101))
1.2020569056101726
>>> _ - zeta(3)
2.4505784068651337e-09


This says we get 8 correct decimal places from summing 100 terms, which is just what we’d expect from our error estimate.

## Doing even better

Kummer’s approach is better than naively computing ζ(3) from its definition, but there are much more efficient methods.

With naive summation up to N, we get

2 log10 N

correct decimals. With Kummer’s sum we get twice as many,

4 log10 N

correct decimals.

But there are methods for which the number of correct decimals is linear in N rather than logarithmic. Wikipedia mentions a method that gives 5.04 N decimal places.

Update: See this post for a sequence of efficient algorithms.

## Related posts

[1] That’s how the Riemann zeta function is defined for s with real part greater than 1. For the rest of the complex plane, except s = 1, the Riemann zeta function is defined by analytic continuation. The sum is not valid for s with real part less than 1, and so, for example, the sum makes no sense at s = -1. But the analytic continuation of the sum is valid at -1, and equals -1/12 there. This leads to the rigorous justification of the otherwise nonsensical expression

1 + 2 + 3 + … = -1/12.

[2] For s = 2n where n is a positive integer,

For s = 0 and negative integers, we have

Here the B‘s are the Bernoulli numbers, which have closed-form expressions.

# Computing logs with the AGM

In the previous post I said that Jonathan and Peter Borwein figured out how to use the rapid convergence of the AGM to compute various functions, including logarithms. This post will show how to compute logarithms using the AGM.

First I need to define the Jacobi theta functions

Note that if q is very small, the series above converge very quickly. We will pick q so small that θ2(q) and θ3(q) are trivial to compute with sufficient accuracy.

Suppose we want to compute the natural log of 10 to a thousand digits. We can use the following equation from [1].

We want q to be small, so we want 1/q to be large. We set q = 10n to compute log 10n and divide the result by n.

We can show from the definition of the theta functions that

Since we want 1000 digit accuracy, we can set q = 10-250. Then θ22(q4) = 4×10-500 and θ32(q4) = 1 to about 2000 digits of accuracy.

We can calculate the result we want with just 20 iterations of the AGM as the following bc code shows.

    define agm(a, b, n) {
auto i, c, d
for (i = 1; i <= n; i++) {
c = (a + b)/2
d = sqrt(a*b)
a = c
b = d
}
return a
}

scale=2000
# a(1) = arctan(1) = pi/4
x = a(1)/(agm(4*10^-500, 1, 20)*250) - l(10) # error
l(x)/l(10) # log_10(error)


This returns -999.03…, so we were basically able to get 1000 digits, though the last digit or two might be dodgy. Let’s try again with a smaller value of q, setting 10-300.

    x = a(1)/(agm(4*10^-600,1, 20)*300) - l(10); l(x)/l(10)

This returns -1199.03… and so we have about 1200 digits. We were using 2000 digit precision in our calculations, but our approximation θ32(q4) = 1 wasn’t good to quite 2000 digits. Using a smaller value of q fixed that.

## Related posts

[1] The Borwein brothers, Pi and the AGM. arXiv

# The magic AGM box

Suppose you are visited by aliens from halfway across the galaxy. After asking you a lot of questions, they give you a parting gift, little black boxes can compute

xx²/2 + x³/3 – …

with unbelievable speed and accuracy. You say thank you and your visitors vanish.

You get back home and wonder what you can do with your black boxes. The series the boxes compute looks vaguely familiar, but you can’t remember what it is. You call up a friend and he tells you that it’s the Taylor series for log(1 + x).

OK, so now what?

Your friend tells you he can predict what the boxes will output. He tells you, for example, that if you enter 0.5 it will output 0.4054651081081644. You try it, and your friend is right. At least partly. If you ask your box to give you only 16 digits, it will give you exactly what your friend said it would. But you could ask it for a thousand digits or a million digits or a billion digits, something your friend cannot do, at least not quickly.

Then you realize your friend has things backward. The way to exploit these boxes is not to compute logs on your laptop to predict their output, but to use the boxes instead of your laptop.

So you have a way to compute logs. You can bootstrap that to compute inverse logs, i.e. exp(x). And you can bootstrap that to compute sines and cosines. You try to compute anything you can starting from logs.

## Enter the AGM

The preceding story was an introduction to the AGM, the arithmetic-geometric mean. It is the limit of alternatingly taking ordinary and geometric means. More on that here. What I want to focus on here is that the AGM can be computed extremely quickly.

Each iteration in the process of computing the AGM doubles the number of correct figures in the answer. Suppose you want to compute its output to a billion decimal places, and you’ve calculated a million decimal places. You need to compute 999,000,000 more decimal places, but you’re nearly there! Ten more steps and you’ll have all billion decimal places.

If you want to compute something to millions of digits, it would make sense to try to compute it in terms of the AGM. This was the research program of the brothers Jonathan and Peter Borwein. Much of this research was codified in their book Pi and the AGM. They used the AGM to compute π to crazy precision, but that wasn’t their goal per se.

Computing π was a demonstration project for a deeper agenda. While describing the work of the Borwein brothers, Richard Brent said

… theorems about π are often just the tips of “mathematical icebergs”—much of interest lies hidden beneath the surface.

The AGM of x and y equals

where K is the “complete elliptic integral of the first kind.” You might reasonably think “Great. I’ll keep that in mind if I ever need to compute the compute elliptic integral of the first kind, whatever that is.” But K is like the log function in the story above, something that can be bootstrapped to compute other things.

The aliens Gauss, Lagrange, and Legendre gave the Borweins the AGM black box, and the Borweins figured out how to use it to compute π, but also log, exp, cos, etc. The Borwein algorithms may not be the most efficient if you only want, say, 16 digits of precision. But as you need more precision, eventually they become the algorithms to use.

See the next post for an example using the AGM to compute logarithms to 1000 digits.

And see the one after that for a way to compute π with the AGM.

# Lambert W strikes again

I was studying a statistics paper the other day in which the author said to solve

t log( 1 + n/t ) = k

for t as part of an algorithm. Assume 0 < k < n.

## Is this well posed?

First of all, can this equation be solved for t? Second, if there is a solution, is it unique?

It’s not hard to show that as a function of t, the left side approaches 0 as t approaches 0, and it approaches n as t goes to infinity. So there is a solution for all k between 0 and n. The restriction on k is necessary since the left side cannot exceed n.

With a little more work one can show that the derivative is always positive, so the left side is a monotonically increasing function, and so the solution given each value of k is unique.

## Analytic solution

Now if we fix n, we can think of the equation above as defining t as a function of k. Can we solve for t exactly? I suspected that there might be a solution in terms of the Lambert W function because if you divide by t and exponentiate, you get an equation that smells like the equation

z = w exp(w)

defining the function W(z). It turns out this is indeed the case.

    Solve[t Log[1 + n/t] ==  k, t]

we get

Great! There’s a closed-form solution, if you accept using the W function as being closed form.

## Problems with the solution

I found the solution in Mathematica, but I tested it in Python to make sure both platforms define W the same way.

    from numpy import log, exp
from scipy.special import lambertw

def f(t, n):
return t*log(1 + n/t)

def g(k, n):
r = k/n
return -k/(lambertw(-r*exp(-r)) + r)

n, k = 10, 8
t = g(k, n)
print(f(t, n))


This should print k, so it prints 8, right? No, it prints 10.

What’s up with that?

If we look back at the equation for t above, we see that the W function is being evaluated at x exp(x) where x = –k/n, so we should get –k/n back since W(x exp(x)) = x by definition. But that means our denominator is zero, and so the equation doesn’t make sense!

Things are getting worse. At first we had a wrong value, but at least it was finite!

The problem is not a difference between Mathematica and Python.

## Resolution

The problem is we’ve glossed over a branch cut of the W function. To make a long story short, we were using the principle branch of the W function, but we should have used a different branch.

Let’s go back to where I asked Mathematica

    Solve[t Log[1 + n/t] == k, t]

I ran the solution through TeXForm to get the TeX code that produced the image for the solution equation. I made a few aesthetic changes to the TeX, but it was essentially Mathematica’s output.

Without the TeXForm, Mathematica’s solution was in terms of ProductLog, not in terms of W; the TeXForm function turned ProductLog into W. If you look up ProductLog, it will tell you

ProductLog[z] gives the principal solution for w in z = wew.

The principle solution. So we should be on the alert for difficulties with branches. There are two real solutions to z = wew for some values of z, and we have to choose the right one. For example, if z = -0.1, the w could be -0.1118 or -3.5772.

Mathematica gave me the wrong branch. But to be fair, it did try to warn me.

Solve::ifun: Inverse functions are being used by Solve, so some solutions may not be found; use Reduce for complete solution information.

The solution is to use the -1 branch. In Mathematica’s notation, the branch comes before the argument. In SciPy, the branch comes second. So to fix the code above, we need to change

    lambertw(-r*exp(-r))

to

   lambertw(-r*exp(-r), -1)

and then the code will be correct.

If x is negative, and we use the -1 branch of W, then

W-1(x exp(x)) ≠ x

and so we’re not dividing by zero in our solution.