Turning a trick into a technique

Someone said a technique is a trick that works twice.

I wanted to see if I could get anything interesting by turning the trick in the previous post into a technique. The trick created a high-order approximation by subtracting a multiple of one even function from another. Even functions only have even-order terms, and by using the right multiple you can cancel out the second-order term as well.

For an example, I’d like to approximate the Bessel function J0(x) by the better known cosine function. Both are even functions.

J0(x) = 1 − x²/4 + x⁴/64 − …
cos(x) = 1 − x²/2 + x⁴/24 − …

and so

2 J0(x) − cos(x) = 1 − x⁴/96 + …

which means

J0(x) ≈ (1 + cos(x))/2

is an excellent approximation for small x.

Let’s try this for a couple examples.

J0(0.2) = 0.990025 and (1 + cos(0.2))/2 = 0.990033.

J0(0.05) = 0.99937510 and (1 + cos(0.05))/2 = 0.99937513.
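We can run this check with SciPy, whose j0 function computes the Bessel function J0 (a quick sketch; the sample points are arbitrary):

```python
from numpy import cos
from scipy.special import j0

for x in [0.05, 0.2, 0.5]:
    approx = (1 + cos(x))/2
    # the error should be roughly x**4/192, from the series above
    print(x, j0(x), approx, abs(j0(x) - approx))
```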

Closed-form solution to the nonlinear pendulum equation

The previous post looks at the nonlinear pendulum equation and what difference it makes to the solutions if you linearize the equation.

If the initial displacement is small enough, you can simply replace sin θ with θ. If the initial displacement is larger, you can improve the accuracy quite a bit by solving the linearized equation and then adjusting the period.

You can also find an exact solution, but not in terms of elementary functions; you have to use Jacobi elliptic functions. These are functions somewhat analogous to trig functions, though it’s not helpful to try to pin down the analogies. For example, the Jacobi function sn is like the sine function in some ways but very different in others, depending on the range of arguments.

We start with the differential equation

θ″(t) + c² sin( θ(t) ) = 0

where c² = g/L, i.e. the gravitational acceleration divided by pendulum length, and initial conditions θ(0) = θ0 and θ′(0) = 0. We assume −π < θ0 < π.

Then the solution is

θ(t) = 2 arcsin( a cd(ct | m ) )

where a = sin(θ0/2), m = a², and cd is one of the 12 Jacobi elliptic functions. Note that cd, like all the Jacobi functions, has an argument and a parameter. In the equation above the argument is ct and the parameter is m.

The last plot in the previous post was misleading, showing roughly equal parts genuine difference and error from solving the differential equation numerically. Here’s the code that was used to solve the nonlinear equation.

from scipy.special import ellipj, ellipk
from numpy import sin, cos, pi, linspace, arcsin
from scipy.integrate import solve_ivp

def exact_period(θ):
    # period of the nonlinear pendulum (with c = 1) relative to 2π
    return 2*ellipk(sin(θ/2)**2)/pi

def nonlinear_ode(t, z):
    # rewrite θ″ + sin θ = 0 as a first-order system in (θ, θ′)
    x, y = z
    return [y, -sin(x)]

theta0 = pi/3
b = 2*pi*exact_period(theta0)
t = linspace(0, 2*b, 2000)

sol = solve_ivp(nonlinear_ode, [0, 2*b], [theta0, 0], t_eval=t)

The solution is contained in sol.y[0].

Let’s compare the numerical solution to the exact solution.

def f(t, c, theta0):
    a = sin(theta0/2)
    m = a**2
    sn, cn, dn, ph = ellipj(c*t, m)
    return 2*arcsin(a*cn/dn)

There are a couple things to note about the code. First, SciPy doesn’t implement the cd function, but it can be computed as cn/dn. Second, the function ellipj returns four functions at once because it takes about as much time to calculate all four as it does to compute one of them.

Here is a plot of the error in solving the differential equation.

And here is the difference between the exact solution to the nonlinear pendulum equation and the stretched solution to the linear equation.

A lesser-known characterization of the gamma function

The gamma function Γ(z) extends the factorial function from integers to complex numbers. (Technically, Γ(z + 1) extends factorial.) There are other ways to extend the factorial function, so what makes the gamma function the right choice?

The most common answer is the Bohr-Mollerup theorem. This theorem says that if f: (0, ∞) → (0, ∞) satisfies

  1. f(x + 1) = x f(x)
  2. f(1) = 1
  3. log f is convex

then f(x) = Γ(x). The theorem applies on the positive real axis, and there is a unique holomorphic continuation of this function to the complex plane.

But the Bohr-Mollerup theorem is not the only theorem characterizing the gamma function. Another is due to Helmut Wielandt. His theorem says that if f is holomorphic in the right half-plane and

  1. f(z + 1) = z f(z)
  2. f(1) = 1
  3. f(z) is bounded for {z: 1 ≤ Re z ≤ 2}

then f(z) = Γ(z). In short, Wielandt replaces the log-convexity for positive reals with the requirement that f is bounded in a strip of the complex plane.

You might wonder what is the bound alluded to in Wielandt’s theorem. You can show from the integral definition of Γ(z) that

|Γ(z)| ≤ |Γ(Re z)|

for z in the right half-plane. So the bound on the complex strip {z: 1 ≤ Re z ≤ 2} equals the bound on the real interval [1, 2], which is 1.
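We can spot-check the bound numerically, relying on the fact that scipy.special.gamma accepts complex arguments (a minimal sketch; the sample points are arbitrary):

```python
from scipy.special import gamma

# check |Γ(z)| ≤ Γ(Re z) at a few points, including the strip 1 ≤ Re z ≤ 2
for re in [1.0, 1.5, 2.0]:
    for im in [0.0, 1.0, 10.0]:
        z = complex(re, im)
        assert abs(gamma(z)) <= gamma(re) + 1e-12
print("bound holds at all sample points")
```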

Γ(1/n)

If n is a positive integer, then rounding Γ(1/n) up to the nearest integer gives n. In symbols,

\left\lceil \Gamma\left( \tfrac{1}{n}\right) \right\rceil = n

We can illustrate this with the following Python code.

>>> from scipy.special import gamma
>>> from math import ceil
>>> for n in range(1, 101):
    ... assert(ceil(gamma(1/n)) == n)

You can find a full proof in [1]. I’ll give a partial proof that may be more informative than the full proof.

The asymptotic expansion of the gamma function near zero is

\Gamma(z) = \frac{1}{z} - \gamma + {\cal O}(z)

where γ is the Euler-Mascheroni constant.

So when we set z = 1/n we find Γ(1/n) ≈ n − γ + O(1/n). Since 0 < γ < 1, the theorem above is true for sufficiently large n. And it turns out “sufficiently large” can be replaced with n ≥ 1.
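We can watch the error in the approximation n − γ shrink numerically (a quick sketch; euler_gamma is NumPy’s value of the Euler-Mascheroni constant):

```python
from scipy.special import gamma
from numpy import euler_gamma

for n in [1, 2, 10, 100]:
    # Γ(1/n) minus its leading approximation n − γ; shrinks like 1/n
    print(n, gamma(1/n) - (n - euler_gamma))
```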

[1] Gamma at reciprocals of integers: 12225. American Mathematical Monthly. October 2022. pp 789–790.

Trigamma

Guy wearing a ΓΓΓ shirt

The most important mathematical function after the basics is the gamma function. If I could add one function to a calculator that has trig functions, log, and exponential, it would be the gamma function. Or maybe the log of the gamma function; it’s often more useful than the gamma function itself because it doesn’t overflow as easily.

The derivative of the log gamma function is the digamma function, denoted ψ. It comes up often in applications. I just did a quick search and found I’ve written six posts containing the word “digamma.”

\psi(z) = \frac{d}{dz} \log \Gamma(z)

The derivative of the digamma function ψ′ is the trigamma function.

\psi^\prime(z) = \frac{d}{dz} \psi(z) = \frac{d^2}{dz^2} \log \Gamma(z)

The trigamma function, and higher derivatives of the digamma function, appear in applications. I remember, for example, a researcher asking me to add the trigamma function to the mathematical library I wrote for the biostatistics department at MD Anderson.

I was thinking about the trigamma function because I ran across the following series for the function [1].

\psi^\prime (z+1) = \frac{1}{(z + 1)} + \frac{1!}{2} \frac{1}{(z+1)^{\bar{2}}} + \frac{2!}{3}\frac{1}{(z+1)^{\bar{3}}} + \cdots

Note the bars on top of the exponents: the denominators are rising powers of z + 1, not ordinary powers.

The series converges uniformly for Re(z) > −1 + δ for δ > 0 [2], and it converges quickly for large z.
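Here is a sketch that sums the series, accumulating the rising powers as it goes, and compares the result with SciPy’s polygamma (polygamma(1, x) is the trigamma function ψ′(x)); the number of terms is an arbitrary choice:

```python
from scipy.special import polygamma

def trigamma_series(z, nterms=40):
    # partial sum of ψ′(z+1) = Σ (k−1)!/k · 1/(z+1)^(k rising)
    total = 0.0
    factorial = 1.0   # holds (k−1)! at the top of each pass
    rising = 1.0      # accumulates (z+1)(z+2)...(z+k)
    for k in range(1, nterms + 1):
        rising *= z + k
        total += factorial / (k * rising)
        factorial *= k
    return total

print(trigamma_series(5.0), polygamma(1, 6.0))
```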

When I saw the title of the paper I thought it sounded like a Greek fraternity. There is a Tri-Delta fraternity, but as far as I know there is no Tri-Gamma fraternity.


[1] Harold Ruben. A Note on the Trigamma Function. The American Mathematical Monthly. Vol 83, No. 8. p. 622.

[2] It may seem unnecessary to say Re(z) > −1 + δ for δ > 0. Couldn’t you just say for Re(z) > −1? Pointwise, yes, but uniform convergence requires the real part of z to be bounded away from −1 by a fixed amount, regardless of the imaginary part of z.

An integral theorem of Gauss

Gauss proved in 1818 that the value of the integral

\int_0^{\pi/2} \left( x^2 \sin^2\theta + y^2 \cos^2 \theta \right)^{-1/2} \, d\theta

is unchanged if x and y are replaced by (x + y)/2 and √(xy), i.e. if you replace x and y with their arithmetic mean and geometric mean [1].

So, for example, if you wanted to compute

\int_0^{\pi/2} \left( 9 \sin^2\theta + 49 \cos^2 \theta \right)^{-1/2} \, d\theta

you could instead compute

\int_0^{\pi/2} \left( 25 \sin^2\theta + 21 \cos^2 \theta \right)^{-1/2} \, d\theta

Notice that the coefficients of sin² θ and cos² θ are more similar in the second integral. It would be nice if the two coefficients were equal because then the integrand would be a constant, independent of θ, and we could evaluate the integral. Maybe if we apply Gauss’ theorem again and again, the coefficients will become more nearly equal.

We started with x = 3 and y = 7. Then we had x = 5 and y = √21 = 4.5826. If we compute the arithmetic and geometric means again, we get x = 4.7913 and y = 4.7867. If we do this one more time we get x ≈ y ≈ 4.78901. The values of x and y still differ, but only after the sixth decimal place.

It would seem that if we keep replacing x and y by their arithmetic and geometric means, we will converge to a constant. And indeed we do. This constant is called the arithmetic-geometric mean of x and y, denoted AGM(x, y). This means that the value of the integral above is π/(2 AGM(x, y)). Because the iteration leading to the AGM converges quickly, this provides an efficient numerical algorithm for computing the integral at the top of the post.
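Here is a sketch of that algorithm: iterate the two means until they agree, then compare π/(2 AGM(3, 7)) with direct numerical integration.

```python
from math import sqrt, sin, cos, pi
from scipy.integrate import quad

def agm(x, y, tol=1e-15):
    # replace (x, y) with their arithmetic and geometric means until converged
    while abs(x - y) > tol:
        x, y = (x + y)/2, sqrt(x*y)
    return x

# the integral at the top of the post with x = 3, y = 7
integrand = lambda t: (9*sin(t)**2 + 49*cos(t)**2)**-0.5
numeric, _ = quad(integrand, 0, pi/2)
print(numeric, pi/(2*agm(3, 7)))
```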

This is something I’ve written about several times, though in less concrete terms. See, for example, here. Using more advanced terminology, the AGM gives an efficient way to evaluate elliptic integrals.


[1] B. C. Carlson, Invariance of an Integral Average of a Logarithm. The American Mathematical Monthly, Vol. 82, No. 4 (April 1975), pp. 379–382.

Legendre polynomials

The previous post mentioned Legendre polynomials. This post will give a brief introduction to these polynomials and a couple hints of how they are used in applications.

One way to define the Legendre polynomials is as follows.

  • P0(x) = 1
  • Pk are orthogonal on [−1, 1].
  • Pk(1) = 1 for all k ≥ 0.

The middle bullet point means

\int_{-1}^1 P_m(x) P_n(x) \, dx = 0

if m ≠ n. The requirement that each Pk is orthogonal to each of its predecessors determines Pk up to a constant, and the condition Pk(1) = 1 determines this constant.
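These defining properties are easy to check numerically with SciPy’s eval_legendre (a sketch, using quad for the integrals and checking just the first few polynomials):

```python
from scipy.special import eval_legendre
from scipy.integrate import quad

for m in range(4):
    # endpoint normalization P_m(1) = 1
    assert abs(eval_legendre(m, 1.0) - 1) < 1e-12
    # orthogonality against the other polynomials
    for n in range(4):
        if m == n:
            continue
        ip, _ = quad(lambda x: eval_legendre(m, x)*eval_legendre(n, x), -1, 1)
        assert abs(ip) < 1e-10
print("all checks passed")
```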

Here’s a plot of the first few Legendre polynomials.

There’s an interesting pattern that appears in the white space of a graph like the one above when you plot a large number of Legendre polynomials. See this post.

The Legendre polynomial Pk(x) satisfies Legendre’s differential equation; that’s what motivated them.

(1 - x^2) y^{\prime\prime} -2x y^\prime + k(k+1)y = 0

This differential equation comes up in the context of spherical harmonics.

Next I’ll describe a geometric motivation for the Legendre polynomials. Suppose you have a triangle with one side of unit length and two longer sides of length r and y.

You can find y in terms of r by using the law of cosines:

y = \sqrt{1 - 2r \cos \theta + r^2}

But suppose you want to find 1/y in terms of a series in 1/r. (This may seem like an arbitrary problem, but it comes up in applications.) Then the Legendre polynomials give you the coefficients of the series.

\frac{1}{y} = \frac{P_0(\cos\theta)}{r} + \frac{P_1(\cos\theta)}{r^2} + \frac{P_2(\cos\theta)}{r^3} + \cdots
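Here’s a quick numerical check of this expansion (a sketch; the values of r and θ are arbitrary):

```python
from math import sqrt, cos
from scipy.special import eval_legendre

r, theta = 5.0, 0.7
y = sqrt(1 - 2*r*cos(theta) + r**2)
# partial sum of Σ P_n(cos θ)/r^(n+1); terms decay like 1/r^n
series = sum(eval_legendre(n, cos(theta))/r**(n + 1) for n in range(30))
print(1/y, series)
```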

Source: Keith Oldham et al. An Atlas of Functions. 2nd edition.


Reproducing a hand-drawn plot

The plots in old (i.e. pre-computer) math books are impressive. These plots took a lot of effort to produce, and so they weren’t created lightly. Consequently they tend to be aesthetically and mathematically interesting. A few weeks ago I recreated a plot from A&S using Mathematica, and today I’d like to do the same for the plot below from a different source [1].

Here is my approximate reproduction.

I’ll give the mathematical details below for those who are interested.

The plot shows “normalized associated Legendre functions of the first kind.” There are two families of Legendre polynomials, denoted P and Q; we’re interested in the former. “Associated” means polynomials that are derived from the Legendre polynomials by taking derivatives. Normalized means the polynomials are multiplied by constants so that their squares integrate to 1 over [−1, 1].

Mathematica has a function LegendreP[n, m, x] that implements the associated Legendre polynomials Pnm(x). I didn’t see that Mathematica has a function for the normalized version of these functions, so I rolled my own.

f[n_, m_, x_] := (-1)^n LegendreP[n, m, x] 
        Sqrt[(2 n + 1) Factorial[n - m]/(2 Factorial[n + m])]

I added the alternating sign term up front after discovering that apparently the original plot used a different convention for defining Pnm than Mathematica uses.

I make my plot by stacking the plots created by

Plot[Table[f[n, n, x], {n, 1, 8}],  {x, 0, 1}]

and

Plot[Table[f[n + 4, n, x], {n, 0, 4}],  {x, 0, 1}]

The original plot shows P4(x). I used the fact that this equals P40(x) to simplify the code. I also left out the flat line plotting P0 because I thought that looked better.

Related post: Duplicating a Hankel function plot from A&S.

[1] Tables Of Functions With Formulae And Curves by Fritz Emde, published in 1945. Available on Archive.org.

Multiple angle asymmetry

The cosine of a multiple of θ can be written as a polynomial in cos θ. For example,

cos 3θ = 4 cos³ θ − 3 cos θ

and

cos 4θ = 8 cos⁴ θ − 8 cos² θ + 1.

But it may or may not be possible to write the sine of a multiple of θ as a polynomial in sin θ. For example,

sin 3θ = −4 sin³ θ + 3 sin θ

but

sin 4θ = −8 sin³ θ cos θ + 4 sin θ cos θ

It turns out cos nθ can always be written as a polynomial in cos θ, but sin nθ can be written as a polynomial in sin θ if and only if n is odd. We will prove this, say more about sin nθ for even n, then be more specific about the polynomials alluded to.

Proof

We start by writing exp(inθ) two different ways:

cos nθ + i sin nθ = (cos θ + i sin θ)n

The real part of the left hand side is cos nθ and the real part of the right hand side contains powers of cos θ and even powers of sin θ. We can convert the latter to cosines by replacing sin² θ with 1 − cos² θ.

The imaginary part of the left hand side is sin nθ. If n is odd, the right hand side involves odd powers of sin θ and even powers of cos θ, in which case we can replace the even powers of cos θ with even powers of sin θ. But if n is even, every term in the imaginary part will involve odd powers of sin θ and odd powers of cos θ. Every odd power of cos θ can be turned into terms involving a single cos θ and an odd power of sin θ.

We’ve proven a little more than we set out to prove. When n is even, we cannot write sin nθ as a polynomial in sin θ, but we can write it as cos θ multiplied by an odd degree polynomial in sin θ. Alternatively, we could write sin nθ as sin θ multiplied by an odd degree polynomial in cos θ.

Naming polynomials

The polynomials alluded to above are not arbitrary polynomials. They are well-studied polynomials with many special properties. Yesterday’s post on Chebyshev polynomials defined Tn(x) as the nth degree polynomial for which

Tn(cos θ) = cos nθ.

That post didn’t prove that the right hand side is a polynomial, but this post did. The polynomials Tn(x) are known as Chebyshev polynomials of the first kind, or sometimes simply Chebyshev polynomials since they come up in applications more often than the other kinds.

Yesterday’s post also defined Chebyshev polynomials of the second kind by

Un(cos θ) sin θ = sin (n+1)θ.

So when we say cos nθ can be written as a polynomial in cos θ, we can be more specific: that polynomial is Tn.

And when we say sin nθ can be written as sin θ times a polynomial in cos θ, we can also be more specific:

sin nθ = sin θ Un−1(cos θ).
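Both identities are easy to verify numerically with SciPy’s Chebyshev evaluators (a quick sketch; the value of θ is arbitrary):

```python
from math import sin, cos
from scipy.special import eval_chebyt, eval_chebyu

theta = 0.9
for n in range(1, 8):
    # cos nθ = T_n(cos θ)
    assert abs(cos(n*theta) - eval_chebyt(n, cos(theta))) < 1e-12
    # sin nθ = sin θ · U_{n−1}(cos θ)
    assert abs(sin(n*theta) - sin(theta)*eval_chebyu(n - 1, cos(theta))) < 1e-12
print("identities verified")
```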

Solving trigonometric equations

A couple years ago I wrote about systematically solving trigonometric equations. That post showed that any polynomial involving sines and cosines of multiples of θ could be reduced to a polynomial in sin θ and cos θ. The results in this post let us say more about this polynomial, that we can write it in terms of Chebyshev polynomials. This might allow us to apply some of the numerous identities these polynomials satisfy and find useful structure.


Posthumous Chebyshev Polynomials

Two families of orthogonal polynomials are named after Chebyshev because he explored their properties. These are prosaically named Chebyshev polynomials of the first and second kind.

I recently learned there are Chebyshev polynomials of the third and fourth kind as well. You might call these posthumous Chebyshev polynomials. They were not developed by Mr. Chebyshev, but they bear a family resemblance to the polynomials he did develop.

The four kinds of Chebyshev polynomials may be defined in order as follows.

\begin{align*} T_n(\cos\theta) &= \cos n\theta \\ U_n(\cos\theta) &= \frac{\sin (n+1)\theta}{\sin \theta} \\ V_n(\cos\theta) &= \frac{\cos \left(n+\frac{1}{2}\right)\theta}{\cos \frac{1}{2}\theta} \\ W_n(\cos\theta) &= \frac{\sin \left(n+\frac{1}{2}\right)\theta}{\sin \frac{1}{2}\theta} \\ \end{align*}

It’s not obvious that these definitions even make sense, but in each case the right hand side can be expanded into a sum of powers of cos θ, i.e. a polynomial in cos θ. [1]

All four kinds of Chebyshev polynomials satisfy the same recurrence relation

P_n(x) = 2x\,P_{n-1}(x) - P_{n-2}(x)

for n ≥ 2 and P0 = 1 but with different values of P1, namely x, 2x, 2x − 1, and 2x + 1 respectively [2].

Plots

We can implement Chebyshev polynomials of the third kind using the recurrence relation above.

def V(n, x):
    if n == 0: return 1
    if n == 1: return 2*x - 1
    return 2*x*V(n-1, x) - V(n-2, x)

Here is a plot of Vn(x) for n = 0, 1, 2, 3, 4.

The code for implementing Chebyshev polynomials of the fourth kind is the same, except the middle line becomes

    if n == 1: return 2*x + 1

Here is the corresponding plot.

Square roots

The Chebyshev polynomials of the first and third kind, and polynomials of the second and fourth kind, are related as follows:

\begin{align*} V_n(x)&=\sqrt\frac{2}{1+x}T_{2n+1}\left(\sqrt\frac{x+1}{2}\right) \\ W_n(x)&=U_{2n}\left(\sqrt\frac{x+1}{2}\right) \end{align*}

To see that the expressions on the right hand side really are polynomials, note that Chebyshev polynomials of the first and second kinds are odd for odd orders and even for even orders [3]. This means that in the first equation, every term in T2n + 1 has a factor of √(1 + x) that is canceled out by the 1/√(1 + x) term up front. In the second equation, there are only even powers of the radical term so all the radicals go away.
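Here is a numerical check of the first of these identities (a sketch; the recurrence definition of V is repeated so the snippet is self-contained, and the value of x is arbitrary):

```python
from math import sqrt
from scipy.special import eval_chebyt

def V(n, x):
    # Chebyshev polynomial of the third kind via its recurrence
    if n == 0: return 1
    if n == 1: return 2*x - 1
    return 2*x*V(n-1, x) - V(n-2, x)

x = 0.3
for n in range(6):
    rhs = sqrt(2/(1 + x)) * eval_chebyt(2*n + 1, sqrt((x + 1)/2))
    assert abs(V(n, x) - rhs) < 1e-12
print("identity verified")
```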

You could take the pair of equations above as the definition of Chebyshev polynomials of the third and fourth kind, but the similarity between these polynomials and the original Chebyshev polynomials is more apparent in the definition above using sines and cosines.

The square roots hint at how these polynomials first came up in applications. According to [2], Chebyshev polynomials of the third and fourth kind

have been called “airfoil polynomials”, since they are appropriate for approximating the single square root singularities that occur at the sharp end of an airfoil.

Dirichlet kernel

There’s an interesting connection between Chebyshev polynomials of the fourth kind and Fourier series.

The right hand side of the definition of Wn is known in Fourier analysis as Dn, the Dirichlet kernel of order n.

D_n(\theta) = \frac{\sin \left(n+\frac{1}{2}\right)\theta}{\sin \frac{1}{2}\theta}

The nth order Fourier series approximation of f, i.e. the sum of terms −n through n in the Fourier series for f, is the convolution of f with Dn divided by 2π.

(D_n * f)(\theta) = 2\pi \sum_{k=-n}^n \hat{f}(k) \exp(ik\theta)

Note that Dn(θ) is a function of θ, not of x. The equation Wn(cos θ) = Dn(θ) defines Wn(x) where x = cos θ. To put it another way, Dn(θ) is not a polynomial, but it can be expanded into a polynomial in cos θ.
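Here is a quick check that Wn(cos θ) = Dn(θ), using the recurrence for W and the formula for the Dirichlet kernel (the value of θ is arbitrary):

```python
from math import sin, cos

def W(n, x):
    # Chebyshev polynomial of the fourth kind via its recurrence
    if n == 0: return 1
    if n == 1: return 2*x + 1
    return 2*x*W(n-1, x) - W(n-2, x)

def D(n, theta):
    # Dirichlet kernel of order n
    return sin((n + 0.5)*theta) / sin(0.5*theta)

theta = 1.1
for n in range(6):
    assert abs(W(n, cos(theta)) - D(n, theta)) < 1e-12
print("W_n(cos θ) matches D_n(θ)")
```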


[1] Each function on the right hand side is an even function, which implies it’s at least plausible that each can be written as powers of cos θ. In fact you can apply multiple angle trig identities to work out the polynomials in cos θ.

[2] J.C. Mason and G.H. Elliott. Near-minimax complex approximation by four kinds of Chebyshev polynomial expansion. Journal of Computational and Applied Mathematics 46 (1993) 291–300

[3] This is not true of Chebyshev polynomials of the third and fourth kind. To see this note that V1(x) = 2x − 1, and W1(x) = 2x + 1, neither of which is an odd function.