Mathematical plot twist: q-binomial coefficients

The simplest instance of a q-analog in math is to replace a positive integer n by a polynomial in q that converges to n as q → 1. Specifically, we associate with n the polynomial

[n]q = 1 + q + q² + q³ + … + qn-1.

which obviously converges to n when q → 1.

More generally we can associate with a real number a the rational polynomial

[a]q = (1 – qa) / (1 – q).

When a is a positive integer, the two definitions are the same. When a is not an integer, you can see by L’Hôpital’s rule that

limq → 1 [a]q = a.

Motivation

Why would you do this? I believe the motivation was to be able to define a limit of a series where the most direct approach fails or is difficult to work with. Move over to the land of q-analogs, do all your manipulations there where requiring |q| < 1 avoids problems with convergence, then take the limit as q goes to 1 when you’re done.

q-factorials and q-binomial coefficients

The factorial of a positive integer is

n! = 1 × 2 × 3 × … × n

The q-factorial of n replaces each term in the definition of factorial with its q-analog:

[n]q! = [1]q! [2]q! [3]q! … [n]q!

(It helps to think of []q! as a single operator defining the q-factorial as above. This is not the factorial of [n]q.

n! / (k! (nk)!)

and define the q-binomial coefficient by replacing factorials with q-factorials.

[n]q! / ([k]q! [nk]q!)

You can apply the same process to rising and falling powers, and then replace the rising powers in the definition of a hypergeometric function to define q-analogs of these functions.

(The q-analogs of hypergeometric functions are confusingly called “basic hypergeometric functions.” They are not basic in the sense of simple examples; they’re not examples at all but new functions. And they’re not basic in the sense of a basis in linear algebra. Instead, the name comes from q being the base of an exponent.)

Plot twist

And now for the plot twist promised in the title.

I said above that the q-analogs were motivated by subtle problems with convergence. There’s the idea in the background that eventually someone is going to let q approach 1 in a limit. But what if they don’t?

The theorems proved about q-analogs were first developed with the implicit assumption that |q| < 1, but at some point someone realized that the theorems don’t actually require this, and are generally true for any q. (Maybe you have to avoid q = 1 to not divide by 0 or something like that.)

Someone observed that q-analog theorems are useful when q is a prime power. I wonder who first thought of this and why. It could have been inspired by the notation, since q often denotes a prime power in the context of number theory. It could have been as simple as a sort of mathematical pun. What if I take the symbol q in this context and interpret it the way q is interpreted in another context?

Now suppose we have a finite field F with q elements; a theorem says q has to be a prime power. Form an n-dimensional vector space over F. How many subspaces of dimension k are there in this vector space? The answer is the q-binomial coefficient of n and k. I explore this further in the next post.

Three error function series

A common technique for numerically evaluating functions is to use a power series for small arguments and an asymptotic series for large arguments. This might be refined by using a third approach, such as rational approximation, in the middle.

The error function erf(x) has alternating series on both ends: its power series and asymptotic series both are alternating series, and so both are examples that go along with the previous post.

The error function is also a hypergeometric function, so it’s also an example of other things I’ve written about recently, though with a small twist.

The error function is a minor variation on the normal distribution CDF Φ. Analysts prefer to work with erf(x) while statisticians prefer to work with Φ(x). See these notes for translating between the two.

Power series

The power series for the error function is

The series converges for all x, but it is impractical for large x. For every x the alternating series theorem eventually applies, but for large x you may have to go far out in the series before this happens.

Asymptotic series

The complementary error function erfc(x) is given by

erfc(x) = 1 – erf(x).

Even though the relation between the two functions is trivial, it’s theoretically and numerically useful to have names for both functions. See this post on math functions that don’t seem necessary for more explanation.

The function erfc(x) has an alternating asymptotic series.

Here a!!, a double factorial, is the product of the positive integers less than or equal to a and of the same parity as a. Since in our case a = 2n – 1, a is odd and so we have the product of the positive odd integers up to and including 2n – 1. The first term in the series involves (-1)!! which is defined to be 1: there are no positive integers less than or equal to -1, so the double factorial of -1 is an empty product, and empty products are defined to be 1.

Confluent hypergeometric series

The error function is a hypergeometric function, but of a special kind. (As is often the case, it’s not literally a hypergeometric function but trivially related to a hypergeometric function.)

The term “hypergeometric function” is overloaded to mean two different things. The strictest use of the term is for functions F(a, b; c; x) where the parameters a and b specify terms in the numerator of coefficients and the parameter c specifies terms in the denominator. You can find full details here.

The more general use of the term hypergeometric function refers to functions that can have any number of numerator and denominator terms. Again, full details are here.

The special case of one numerator parameter and one denominator parameter is known as a confluent hypergeometric function. Why “confluent”? The name comes from the use of these functions in solving differential equations. Confluent hypergeometric functions correspond to a case in which two singularities of a differential equation merge, like the confluence of two rivers. More on hypergeometric functions and differential equations here.

Without further ado, here are two ways to relate the error function to confluent hypergeometric functions.

The middle expression is the same as power series above. The third expression is new.

Escaping the unit disk

Hypergeometric functions are defined in terms of their power series representation. This power series converges inside the unit circle and diverges outside. The functions extend to most [1] of the complex plane, and do so in a computationally nice way.

Analytic continuation is often sort of existential: we prove that it can be done, but we may not be able to say much explicitly about the result. But for hypergeometric functions, there is an explicit formula.

This formula is so complicated that it’s possible to miss its basic structure. Here’s an outline version:

This says that F evaluated at a point z outside of the unit circle is equal to the sum of two terms involving F evaluated at 1/z, a point inside the unit circle. This means you can evaluate F everywhere [1] if you can evaluate it inside the unit circle.

The parameters in each of the F‘s above are different, so the formula doesn’t relate a single function’s value at z to the same function’s values at 1/z. Instead, it relates the value of one instance of a class of functions at z to values of other instances of the class at 1/z.

And now without further ado, here is the analytic continuation formula in all its glory:

[1] There may be a singularity at 1, and there may be a branch cut along the real axis from 1 to infinity.

Connections

One of my favorite things in math is connecting together things that do not seem related at first. This post will elaborate on a connection between two recent posts.

Gauss’s constant

A couple weeks ago I wrote about Gauss’s constant

and several things it was related to. We can calculate g with n iterations of the process defining the AGM (arithmetic-geometric mean) as follows.

    def f(n):
a, b = 1, 2**0.5
for _ in range(n):
a, b = 0.5*(a+b), (a*b)**0.5
return 1/a


This process converges very quickly, calculating g to full machine precision when n = 4.

Elliptic integrals

Gauss’s constant, and the AGM more generally, can be computed in terms of an elliptic integral, and that in turn can be computed in terms of a hypergeometric function.

Elliptic integrals are so named because historically they are related to the problem of finding the perimeter of an ellipse. The particular elliptic integral we’re interested here is Legendre’s K function

(There are several conventions for parameterizing elliptic integrals, so this definition may not match what you see elsewhere. See last section.)

With this definition of K,

and by the symmetry of the AGM, you can reverse the order of x and y. It follows that

Hypergeometric functions

The previous post briefly introduced hypergeometric functions. The function K is related to Gauss’s hypergeometric function by

Expanding F as a series shows the connection to double factorials.

This means you could numerically evaluate a power series to compute Gauss’s constant, thought that would be inefficient. Because the AGM converges so quickly, you’d want to evaluate elliptic integrals in terms of the AGM, not evaluate the AGM in terms of elliptic integrals.

Conventions

There is some confusing terminology and variation in conventions when defining elliptic integrals. We’ve defined K here in terms of the “elliptic modulus” k. Some software, such as Mathematica and SciPy, define K in terms of “the parameter” m = k². So to evaluate K above using Mathematica, you’d need to square the argument first.

    g = EllipticK[0.5] Sqrt[2]/Pi

or

    g = 2 EllipticK[-1]/Pi

Similarly, in Python we’d write

    from scipy.special import ellipk
from math import pi
g = ellipk(0.5)*2**0.5/pi


or

    g = 2*ellipk(-1)/pi

I’ve said in several blog posts that multi-factorials come up often in practice, and given examples. This post will give a glimpse of why this is.

Rising powers

The kth rising power of a is

(a)k = a (a+1) (a + 2) … (a + k – 1).

So, for example, (5)3 = 5*6*7 and (1)k = n!.

Double factorials

Now let’s look at rising powers of fractions.

(1/2)k = (1/2) (3/2) (5/2) … (2k-1)/2 = (2k-1)!! / 2k.

This means double factorials of odd numbers can be written as rising powers.

(2k-1)!! = 2k (1/2)k .

Now for even numbers,

(2k)!! = 2k k! = 2k (1)k

and so even double factorials can be written in terms of rising powers too. This means that the term

(2k – 1)!! / (2k)!!

from the series in the previous post could be written as

(1/2)k / (1)k .

Hmm. Ratios of rising powers sound useful.

Generalizing

From the discussion above you can imagine how to create triple factorials out of rising powers of 1/3 and 2/3. In fact it’s easy to see that all multifactorials can be made out of rising powers of rational numbers.

Ratios of multifactorials come up often in coefficients of power series, and these can be rewritten as ratios of rising powers. Maybe we should have a name for functions whose coefficients are ratios of rising powers. And we do: hypergeometric functions. You can find the exact definition here.

At first the definition of a hypergeometric function seems arbitrary. I hope my previous posts, and especially this post, has been a sort of warm up to motivate why you might be interested in such functions.

Applications

In an old post I quoted W. W. Sawyer saying

There must be many universities today where 95 percent, if not 100 percent, of the functions studied by physics, engineering, and even mathematics students, are covered by the single symbol F(a, b; c; x).

The notation F(a, b; c; x) here denotes the hypergeometric function whose kth power series coefficient is

(a)k (b)k / (c)k  k!

More general hypergeometric functions have two lists of parameters, separated by a semicolon. The first set corresponds to rising powers in the numerator, and the second set corresponds to rising powers in the denominator.

The exponential function is the simplest hypergeometric function: both parameter lists are empty, leaving only the k! term in the denominator.

The arcsine function mentioned above is an example of a function that is not hypergeometric per se, but can be written simply in terms of a hypergeometric function:

sin-1(x) = x F(1/2, 1/2; 3/2, x²).

This is more what Sawyer had in mind. Elementary functions like sine, cosine, and log note pure hypergeometric functions but are simply related to hypergeometric functions, similar to arcsine. The same is true of more advanced functions, like elliptic integrals and the CDFs of common probability distributions.

More on hypergeometric functions

Last week I claimed that double, triple, and quadruple factorials came up in applications. The previous post showed how triple factorials come up in solutions to Airy’s equation. This post will show how quadruple factorials come up in solutions to Legendre’s equation.

Legendre’s differential equation is

The Legendre functions Pν and Qν are independent solutions to the equation for given values of ν.

When ν is of the form n + ½ for an integer n, the values of Pν and Qν at 0 involve quadruple factorials and also Gauss’s constant.

For example, if ν = 1/2, 5/2, 9/2, …, then Pν(0) is given by

Source:  An Atlas of Functions

Triple factorials and Airy functions

Last week I wrote in a post on multifactorials in which I said that

Double factorials come up fairly often, and sometimes triple, quadruple, or higher multifactorials do too.

This post gives a couple examples of triple factorials in practice.

One example I wrote about last week. Triple factorial comes up when evaluating the gamma function at an integer plus 1/3. As shown here,

Another example involves solutions to Airy’s differential equation

One pair of independent solutions consists of the Airy functions Ai(x) and Bi(x). Another pair consists of the functions

given in A&S 10.4.3. Because both pairs of functions solve the same linear ODE, each is a linear combination of the other.

Notice that the numerators are triple factorials, and so the series above can be rewritten as

The next post gives an example of quadruple factorials in action.

Gauss’s constant

I hadn’t heard of Gauss’s constant until recently. I imagine I’d seen it before and paid no attention. But once I paid attention, I started seeing it more often. There’s a psychology term for this—reticular activation?—like when you buy a green Toyota and suddenly you see green Toyotas everywhere.

Our starting point is the arithmetic-geometric mean or AGM. It takes two numbers, takes the arithmetic (ordinary) mean and the geometric mean, then repeats the process over and over. The limit of this process is the AGM of the two numbers.

Gauss’s constant is the reciprocal of the AGM of 1 and √2.

Gauss’s constant can be expressed in terms of the Gamma function:

Exponential sums

Last week I wrote about the sum

which we will see shortly is related to Gauss’s constant.

We can the definition of g in terms of Γ(1/4) and the reflection identify for the gamma function to show that the sum above can be written in terms of Gauss’s constant:

The alternating version of the sum is also related to g:

Integrals

Another place where g comes up is in integrals of hyperbolic functions. For example

Here

Beta probabilities

Another place Gauss’s constant comes up is in special values of beta distribution probabilities.

Define the incomplete beta function by

If X is a random variable with a beta(a, b) distribution, then the CDF of X is the normalized incomplete beta function, i.e.

A couple special values of B involve Gauss’s constant, namely

and

Source:  An Atlas of Functions

Ubiquitous constant

The constant M defined by 1/M = √2 g is the so-called “ubiquitous constant,” though this seems like an unjustified name for such an arcane constant. Perhaps there is some context in which the constant is ubiquitous in the colloquial sense.

Multiple angle identity for cotangent

The previous post discusses generalized replicative functions, functions that satisfy

Donald Knuth says in Exercise 40 of Section 1.2.4 of TAOCP that the functions cot πx and csc² πx fall into this class of functions.

If this is true of cot πx then it follows by differentiation that it is true of csc² πx as well, so we will focus on the former.

I believe that for f(x) = cot(πx) we have an = n and bn = 0, but I don’t have a proof. If this is correct then

Here’s a Python script that draws values of n and x at random and verifies that the left and right sides above are equal to at least 10 decimal places. It’s not a proof, but it’s good evidence.

    import numpy as np

def cot(x): return 1/np.tan(x)

def f1(n, x): return n*cot(n*np.pi*x)

def f2(n, x): return sum(cot(np.pi*(x + k/n)) for k in range(n))

np.random.seed(20211011)
for _ in range(1000):
n = np.random.randint(1, 50)
x = np.random.random()
a, b = f1(n,x), f2(n, x)
assert( abs(a - b) <= abs(a)*1e-10 )


Note that NumPy doesn’t have a cotangent function. See this post for quirks of how trig functions are supported in various environments.

An actual proof is left as an exercise for the reader. Knuth left it as an exercise too, but he didn’t say what the a‘s are, so I’m being nicer than he was. :)

If you come up with a proof, please share it in the comments.

Update

and substitute πx for x. Take the logarithm and then the derivative of both sides, divide by π, and you get the identity we’re trying to prove.

Multiplication theorem rabbit hole

When I started blogging I was reluctant to allow comments. It seems this would open the door to a flood of spam. Indeed it does, but nearly all of it can be filtered out automatically.

The comments that aren’t spam have generally been high quality. A comment on my post about the sawtooth function a few days ago has sent me down a fascinating rabbit hole.The sawtooth function belongs to a general class of functions called replicative functions, functions that satisfy the equation

for positive integers n.

I ran across this by reading Knuth’s TAOCP. Knuth gives an understated comment in section 1.2.4 exercise 40 that “it may be interesting to study also the more general class of functions” for which the left-hand side above includes constants that depend on n, i.e.

where the constants an and bn depend on n but not on x.

Bernoulli polynomials

The first Bernoulli polynomial B1(x) = x – ½ is a replicative function, and the only continuous replicative functions are multiples of B1 (proved in exercise 40 referenced above).

The rest of the Bernoulli polynomials Bm(x) are generalized replicative functions, with an = nm-1 and bn = 0. This was discovered by Joseph Raabe in the nineteenth century.

Gamma and digamma functions

Gauss’s multiplication theorem for the gamma function says that for positive n,

This is equation 6.1.20 in A&S. If we take logs of both sides, we find that log Γ is a almost a generalized replicative function.

The factor of n log(n) z prevents the definition from holding, but if we take derivatives, this term goes away.

This means that the derivative of log Γ, a.k.a the ψ function, a.k.a. the digamma function, is a generalized replicative function with an = n and bn = –n log n.

The derivative of a generalized replicative function is another function in the same class, so the higher derivatives of log Γ, known as the polygamma functions, are also generalized replicative functions.

Hurwitz zeta function

The Hurwitz zeta function ζ(s, z) is defined by

The Riemann zeta function is the special case of the Hurwitz zeta function with z = 1.

The Hurwitz zeta function is related to the polygamma functions by

It follows from this that for fixed s, ζ(s, z) is a generalized replicative function of z. You can show that an = ns and bn = 0.

Trig functions

See the next post for a couple examples involving trig functions: cot πx and csc² πx .