Dilogarithm, polylogarithm, and related functions

The functions dilogarithm, trilogarithm, and more generally polylogarithm are meant to be generalizations of the logarithm. I first came across the dilogarithm in college when I was evaluating some integral with Mathematica, and they’ve paid a visit occasionally ever since.

Unfortunately polylogarithms are defined in several slightly different and incompatible ways. I’ll start by following An Atlas of Functions and then mention differences in A&S, SciPy, and Mathematica.

Polylogarithms

According to Atlas,

Polylogarithms are themselves special cases of Lerch’s function. Also known as Jonquière’s functions (Ernest Jean Philippe Fauque de Jonquières, 1820–1901, French naval officer and mathematician), they appear in the Feynman diagrams of particle physics.

The idea is to introduce an extra parameter ν in the power series for natural log:

\mbox{polyln}_\nu(x) = -\sum_{j=1}^\infty \frac{(1-x)^j}{j^\nu}

When ν = 1 we get the ordinary logarithm, i.e. polyln1 = log. Then polyln2 is the dilogarithm diln, and polyln3 is the trilogarithm triln.

One advantage of the definition in Atlas is that the logarithm is a special case of the polylogarithm. Other conventions don’t have this property.

Other conventions

The venerable A&S defines dilogarithms in a way that’s equivalent to the negative of the definition above and does not define polylogarithms of any other order. SciPy’s special function library follows A&S. SciPy uses the name spence for the dilogarithm for reasons we’ll get to shortly.

Mathematica has the function PolyLog[ν, x] that evaluates to

\mbox{Li}_\nu(x) = \sum_{j=1}^\infty \frac{x^j}{j^\nu}

So polylnν above corresponds to -PolyLog[ν, -x] in Mathematica. Matlab’s polylog is the same as Mathematica’s PolyLog.

Relation to other functions

Spence’s integral is the function of x given by the integral

\int_1^x \frac{\log t}{t-1}\, dt

and equals diln(x). Note that the SciPy function spence returns the negative of the integral above.

The Lerch function mentioned above is named for Mathias Lerch (1860–1922) and is defined by the integral

\Phi(x, \nu, u) = \frac{1}{\Gamma(\nu)} \int_0^\infty \frac{t^{\nu-1} \exp(-ut)}{1 - x\exp(-t)} \, dt

The connection with polylogarithms is easier to see from the series expansion:

\Phi(x, \nu, u) = \sum_{j=0}^\infty \frac{x^j}{(j+u)^\nu}

The connection with polylogarithms is then

\Phi(x, \nu,1) = -\frac{1}{x} \mbox{polyln}_\nu(1-x)

Note that the Lerch function also generalizes the Hurwitz zeta function, which in turn generalizes the Riemann zeta function. When x = 1, the Lerch function reduces to ζ(ν, u).

Hypergeometric bootstrapping: implement one, get seven free

Suppose you have software to compute one hypergeometric function. How many more hypergeometric functions can you compute from it?

Hypergeometric functions satisfy a lot of identities, so you can bootstrap one such function into many more. That’s one reason they’re so useful in applications. For this post, I want to focus on just three formulas, the so-called linear transformation formulas that relate one hypergeometric function to another. (There are more linear formulas that relate one hypergeometric function to a combination of two others. I may consider those in a future post, but not today.)

Linear transformations

The classical hypergeometric function has three parameters. The linear transformations discussed above relate the function with parameters (abc) to those with parameters

  • (c – ac – bc)
  • (ac – bc)
  • (bc – ac)

Here are the linear transformations in detail:

\begin{eqnarray*} F(a, b; c; z) &=& (1-z)^{c -a-b} F(c-a, c-b; c; z) \\ &=& (1-z)^{-a} F\left(a, c-b; c; \frac{z}{z-1}\right) \\ &=& (1-z)^{-b} F\left(b, c-a; c; \frac{z}{z-1}\right) \end{eqnarray*}

How many more?

The three transformations above are the ones listed in Handbook of Mathematical Functions by Abramowitz and Stegun. How many more transformations can we create by combining them? To answer this, we represent each of the transformations with a matrix. The transformations correspond to multiplying the matrix by the column vector  (abc).

A &=& \left( \begin{array}{rrr} -1 & 0 & \phantom{-}1 \\ 0 & -1 & 1 \\ 0 & 0 & 1 \end{array} \right) \\ B &=& \left( \begin{array}{rrr} \phantom{-}1 & 0 & 0 \\ 0 & -1 & \phantom{-}1 \\ 0 & 0 & 1 \end{array} \right) \\ C &=& \left( \begin{array}{rrr} 0 & \phantom{-}1 & 0 \\ -1 & 0 & \phantom{-}1 \\ 0 & 0 & 1 \end{array} \right)

The question of how many transformations we can come up with can be recast as asking what is the order of the group generated by AB, and C. (I’m jumping ahead a little by presuming these matrices generate a group. Conceivably the don’t have inverses, but in fact they do. The inverses of AB, and C are AB, and AC respectively.)

A little experimentation reveals that there are eight transformations: I, ABCDABEACFBC, and GCB. So in addition to the identity I, the do-nothing transformation, we have found four more: DEF, and G. These last four correspond to taking  (abc) to

  • (c – abc)
  • (ac – bc)
  • (bac)
  • (c – bc – ac)

This means that if we have software to compute the hypergeometric function with one fixed set of parameters, we can bootstrap that into computing the hypergeometric function with up to seven more sets of parameters. (Some of the seven new combinations could be repetitive if, for example, ab.)

Identifying the group

We have uncovered a group of order 8, and there are only 5 groups that size. We should be able to find a familiar group isomorphic to our group of transformations.

The five groups of order 8 are Z8 (cyclic group), Z4×Z2 (direct product), E8 (elementary Abelian group), D8 (dihedral group), and the quaternion group. The first three are Abelian, and our group is not, so our group must be isomorphic to either the quaternion group or the dihedral group. The quaternion group has only 1 element of order 2, and the dihedral group has 5 elements of order 2. Our group has five elements of order 2 (ABDFG) and so it must be isomorphic to the dihedral group of order 8, the rotations and reflections of a square. (Some authors call this D8, because the group has eight elements. Others call it D4, because it is the group based on a 4-sided figure.)

Dihedral group of order 8

 

Special function resources

This week’s resource post: some pages of notes on special functions:

See also blog posts tagged special functions.

I tweet about special functions occasionally on AnalysisFact

Last week: HTML, LaTeX, and Unicode

Next week: Python resources

Uses for orthogonal polynomials

When I interviewed Daniel Spielman at this year’s Heidelberg Laureate Forum, we began our conversation by looking for common mathematical ground. The first thing that came up was orthogonal polynomials. (If you’re wondering what it means for two polynomials to be orthogonal, see here.)

JC: Orthogonal polynomials are kind of a lost art, a topic that was common knowledge among mathematicians maybe 50 or 100 years ago and now they’re obscure.

DS: The first course I taught I spent a few lectures on orthogonal polynomials because they kept coming up as the solutions to problems in different areas that I cared about. Chebyshev polynomials come up in understanding solving systems of linear equations, such as if you want to understand how the conjugate gradient method behaves. The analysis of error correcting codes and sphere packing has a lot of orthogonal polynomials in it. They came up in a course in multi-linear algebra I had in grad school. And they come up in matching polynomials of graphs, which is something people don’t study much anymore. … They’re coming back. They come up a lot in random matrix theory. … There are certain things that come up again and again and again so you got to know what they are.

* * *

More from my interview with Daniel Spielman:

 

Multiple zeta

The Riemann zeta function, introduced by Leonard Euler, is defined by

\zeta(k) = \sum n^{-k}

where the sum is over all positive integers n.

Euler also introduced a multivariate generalization of the zeta function

\zeta(k_1, \ldots, k_r) = \sum n_1^{-k_1}\cdots n_r^{-k_r}

where the sum is over all decreasing k-tuples of positive integers. This generalized zeta function satisfies the following beautiful identity:

 \zeta(a)\,\zeta(b) = \zeta(a, b) + \zeta(b, a) + \zeta(a+b)

The multivariate zeta function and identities such as the one above are important in number theory and are the subject of open conjectures.

Source

Relating Airy and Bessel functions

The Airy functions Ai(x) and Bi(x) are independent solutions to the differential equation

y'' - xy = 0

For negative x they act something like sin(x) and cos(x). For positive x they act something like exp(x) and exp(-x). This isn’t surprising if you look at the differential equation. If you replace x with a negative constant, you get sines and cosines, and if you replace it with a positive constant, you get positive and negative exponentials.

The Airy functions can be related to Bessel functions as follows:

\mathrm{Ai}(x) = \left\{ 	\begin{array}{ll} 		\frac{1}{3}\sqrt{\phantom{-}x} \left(I_{-1/3}(\hat{x}) - I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br /><br /><br /><br />
                \\<br /><br /><br /><br />
		\frac{1}{3}\sqrt{-x} \left(J_{-1/3}(\hat{x}) + J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 	\end{array} \right.

and

\mathrm{Bi}(x) = \left\{ 	\begin{array}{ll} 		\sqrt{\phantom{-}x/3} \left(I_{-1/3}(\hat{x}) + I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br />                 \\<br /> 		\sqrt{-x/3} \left(J_{-1/3}(\hat{x}) - J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 	\end{array} \right.

Here J is a “Bessel function of the first kind” and I is a “modified Bessel function of the first kind.” Also

\hat{x} = \frac{2}{3} \left(\sqrt{|x|}\right)^3

To verify the equations above, and to show how to compute these functions in Python, here’s some code.

The SciPy function airy computes both functions, and their first derivatives, at once. I assume that’s because it doesn’t take much longer to compute all four functions than to compute one. The code for Ai2 and Bi2 below uses np.where instead of if ... else so that it can operate on NumPy vectors all at once. You can plot Ai and Ai2 and see that the two curves lie on top of each other. The same holds for Bi and Bi2.

 

from scipy.special import airy, jv, iv
from numpy import sqrt, where

def Ai(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return ai

def Bi(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return bi

def Ai2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        third*sqrt( x)*(iv(-third, hatx) - iv(third, hatx)),
        third*sqrt(-x)*(jv(-third, hatx) + jv(third, hatx)))

def Bi2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        sqrt( x/3.0)*(iv(-third, hatx) + iv(third, hatx)),
        sqrt(-x/3.0)*(jv(-third, hatx) - jv(third, hatx)))

 

There is a problem with Ai2 and Bi2: they return nan at 0. A more careful implementation would avoid this problem, but that’s not necessary since these functions are only for illustration. In practice, you’d simply use airy and it does the right thing at 0.

Related links:

Fibonacci numbers and orthogonal polynomials

The famous Fibonacci numbers are defined by the initial conditions

F0 = 0,  F1 = 1

and the recurrence relation

Fn = Fn-1 + Fn-2

for n > 1.

The Fibonacci polynomials are defined similarly. The have the same initial conditions

F0(x) = 0, F1(x) = 1

but a slightly different recurrence relation

Fn(x) = x Fn-1(x) + Fn-2(x)

for n > 1.

Several families of orthogonal polynomials satisfy a similar recurrence relationship

Fn(x) = p(x) Fn-1(x) + c(n) Fn-2(x).

The table below gives the values of p(x) and c(n) for a few families of polynomials.

Family p(x) c(n) P0 P1(x)
Fibonacci x 1 0 1
Chebyshev T 2x -1 1 x
Chebyshev U 2x -1 1 2x
Hermite 2x 2 – 2n 1 2x

There are two common definitions of the Hermite polynomials, sometimes called the physicist convention and the probabilist convention. The table above gives the physicist convention. For the probabilist definition, change the 2’s to 1’s in the last row and leave the 1 alone.

(If you haven’t heard of orthogonal polynomials, you might be wondering “orthogonal to what?”. These polynomials are orthogonal in the sense of an inner product formed by multiplying two polynomials and a weight, then integrating. Orthogonal polynomials differ by the the weight and the limits of integration. For example, the (physicist) Hermite polynomials are orthogonal with weight function exp(-x2) and integrating over the entire real line. The probabilist Hermite polynomials are very similar, but use the standard normal distribution density exp(-x2/2)/√(2π) as the weight instead of exp(-x2).)

Giant components and the Lambert W-function

The Lambert W-function is the function w(z) implicitly defined by w exp(w) = z. When I first saw this, I thought that I’d seen something like this come up many times and that it would be really handy to know that this function has a name and has many software implementations1. Since then, however, I’ve hardly ever run into the W-function in application. But here’s an application I ran into recently.

A “giant component” in a random network is a component whose size grows in proportion to the size of the network. For a Poisson random network, let c = p(n – 1) be the expected number of vertices connected to any given vertex. If c > 1 then as n goes to infinity, there will be a giant component with probability 1. S, the proportion of nodes inside the a giant component, satisfies the equation

S = 1 – exp(-cS).

S plays a role similar to that of w in the definition of the W-function, which suggests the W-function might come in handy. And in fact, this book gives this solution for S:

S = 1 + w( –c exp(-c) )/c.

There are a couple issues. First, it’s not obvious that solution is correct. Second, we need to be more careful about how we define w(z) when z is negative.

Given some value of c, let S = 1 + w( –c exp(-c) )/c. Then

w( –c exp(-c) ) = –c(1 – S).

Apply the function f(x) = x exp( x ) to both sides of the equation above. By the definition of the W-function, we have

c exp(-c) = –c(1 – S) exp( –c(1 – S) ).

From there it easily follows that

S = 1 – exp(-cS).

Now let’s look more carefully at the definition of the W-function. For positive real z, w exp(w) = z has a unique real solution defining w. But in the calculations above, we have evaluated w at –c exp(-c), which is negative since c > 1. For negative arguments, we have to pick a branch of w. Which branch guarantees that S = 1 – exp(-cS)? Any branch2. No matter which solution to w exp(w) = z we pick, the resulting S satisfies the giant component equation. However, the wrong branch might result in a meaningless solution. Since S is a proportion, S should be between 0 and 1.

Let’s go back and say that for our purposes, we will take w(z) to mean the real number no less than -1 satisfying w exp(w) = z. Then for c > 1, the solution S = 1 + w( –c exp(-c) )/c is between 0 and 1. You can show that S = 1 – exp(-cS) has a unique solution for 0 < S < 1, so any other branch would not give a solution in this interval.

Related links:

* * *

1 For example, in Python the Lambert W-function is scipy.special.lambertw. The function takes two arguments: z and an integer denoting the branch. By default, the branch argument is set to 0, giving the branch discussed in this post.

2 For -1/e < z < 0, there are two real branches of w(z), but there are also infinitely many complex branches.

Addition formulas for Bessel functions

When trying to understand a complex formula, it helps to first ask what is being related before asking how they are related.

This post will look at addition theorems for Bessel functions. They related the values of Bessel functions at two points to the values at a third point.

Let a, b, and c be lengths of the sides of a triangle and let θ be the angle between the sides of length a and b. The triangle could be degenerate, i.e. θ could be π. The addition theorems here give the value of Bessel functions at c in terms of values of Bessel functions evaluated at b and c.

The first addition theorem says that

J0(c) = ∑ Jm(a) Jm(b) exp(imθ)

where the sum is over all integer values of m.

This says that the value of the Bessel function J0 at c is determined by a sort of inner product of values of Bessel functions of all integer orders at a and b. It’s not exactly an inner product because it is not positive definite. (Sometimes you’ll see the formula above written with a factor λ multiplying a, b, and c. This is because you can scale every side of a triangle by the same amount and not change the angles.)

Define the vectors x, y, and z to be the values of all Bessel functions evaluated at a, b, and c respectively. That is, for integer k, xk = Jk(a) and similar for y and z. Also, define the vector w by wk = exp(ikθ). Then the first addition theorem says

z0 = ∑ xm ym wm.

This is a little unsatisfying because it relates the value of one particular Bessel function at c to the values of all Bessel functions at a and b. We’d like to relate all Bessel function values at c to the values at a and b. That is, we’d like to relate the whole vector z to the vectors x and y.

Define the vector v by vk = exp(inψ) where ψ is the angle between the sides of length b and c. Then the formula we’re looking for is

zn = v-nxn+m ym wm.

Related links:

Math tools for the next 20 years

Igor Carron commented on his blog that

… the mathematical tools that we will use in the next 20 years are for the most part probably in our hands already.

He compares this to progress in treating leukemia: survival rates increased dramatically over the last 40 years, not primarily by developing new drugs, but by better applying drugs that were available in 1970.

I find that plausible. I’ve gotten a lot of mileage out of math that was well known 100 years ago.

Related post: Doing good work with bad tools

Castles and quantum mechanics

How are castles and quantum mechanics related? One connection is rook polynomials.

The rook is the chess piece that looks like a castle, and used to be called a castle. It can move vertically or horizontally, any number of spaces.

A rook polynomial is a polynomial whose coefficients give the number of ways rooks can be arranged on a chess board without attacking each other. The coefficient of xk in the polynomial Rm,n(x) is the number of ways you can arrange k rooks on an m by n chessboard such that no two rooks are in the same row or column.

The rook polynomials are related to the Laguerre polynomials by

Rm,n(x) = n! xn Lnmn(-1/x)

where Lnk(x) is an “associated Laguerre polynomial.” These polynomials satisfy Laguerre’s differential equation

x y” + (n+1-x) y‘ + k y = 0,

an equation that comes up in numerous contexts in physics. In quantum mechanics, these polynomials arise in the solution of the Schrödinger equation for the hydrogen atom.

Related: Relations between special functions

Six analysis and probability diagrams

Here are a few diagrams I’ve created that summarize relationships in analysis and probability. Click on a thumbnail image to go to a page with the full image and explanatory text.

Special functions

Gamma and related functions

Probability distributions

Conjugate priors

Convergence theorems

Bessel functions

 

Related: Visualizing category theory concept dependencies

Doubly periodic functions

Functions like sine and cosine are periodic. For example, sin(x + 2πn) = sin(x) for all x and any integer n, and so the period of sine is 2π. But what happens if you look at sine or cosine as functions of a complex variable? They’re still periodic if you shift left or right, but not if you shift up or down. If you move up or down, i.e. in a pure imaginary direction, sines and cosines become unbounded.

Doubly periodic functions are periodic in two directions. Formally, a function f(z) of complex variable is doubly periodic if there exist two constants ω1 and ω2 such that

f(z) = f(z + ω1) = f(z + ω2)

for all z. The ratio ω1 / ω2 cannot be real; otherwise the two periods point in the same direction. For the rest of this post, I’ll assume ω1 = 1 and ω2 = i. Such functions are periodic in the horizontal (real-axis) and vertical (imaginary-axis) directions. They repeat everywhere their behavior on the unit square.

What do doubly periodic functions look like? It depends on what restrictions we place on the functions. When we’re working with complex functions, we’re typically interested in functions that are analytic, i.e. differentiable as complex functions.

Only constant functions can be doubly periodic and analytic everywhere. Why? Our functions take on over and over the values they take on over the unit square. If a function is analytic over the (closed) unit square then it’s bounded over that square, and since it’s doubly periodic, it’s bounded everywhere. By Liouville’s theorem, the only bounded analytic functions are constants.

This says that to find interesting doubly periodic functions, we’ll have to relax our requirements. Instead of requiring functions to be analytic everywhere, we will require them to be analytic except at isolated singularities. That is, the functions are allowed to blow up at a finite number of points. There’s a rich set of such functions, known as elliptic functions.

There are two well-known families of elliptic functions. One is the Weierstrass ℘ function (TeX symbol wp, Unicode U+2118) and its derivatives. The other is the Jacobi functions sn, cn, and dn. These functions have names resembling familiar trig functions because the Jacobi functions are in some ways analogous to trig functions.

It turns out that all elliptic functions can be written as combinations either of the Weierstrass function and its derivative or combinations of Jacobi functions. Roughly speaking, Weierstrass functions are easier to work with theoretically and Jacobi functions are easier to work with numerically.

Related posts:

College math in a single symbol

From Prelude to Mathematics by W. W. Sawyer (1955):

There must be many universities today where 95 percent, if not 100 percent, of the functions studied by physics, engineering, and even mathematics students, are covered by the single symbol F(a, b; c; x).

The symbol Sawyer refers to is the hypergeometric function. (There are hypergeometric functions with any number of parameters, but the case with three parameters is so common that it is often called “the” hypergeometric function.) The most commonly used functions in application — trig functions, exp, log, the error function, Bessel functions, etc. — are either hypergeometric functions or closely related to hypergeometric functions. Sawyer continues:

I do not wish to imply that the hypergeometric function is the only function about which mathematics knows anything. That is far from being true. … but the valley inhabited by schoolboys, by engineers, by physicists, and by students of elementary mathematics, is the valley of the Hypergeometric Function, and its boundaries are (but for one or two small clefts explored by pioneers) virgin rock.

Related links: