Lambert W strikes again

I was studying a statistics paper the other day in which the author said to solve

t log( 1 + n/t ) = k

for t as part of an algorithm. Assume 0 < k < n.

Is this well posed?

First of all, can this equation be solved for t? Second, if there is a solution, is it unique?

It’s not hard to show that as a function of t, the left side approaches 0 as t approaches 0, and it approaches n as t goes to infinity. So there is a solution for all k between 0 and n. The restriction on k is necessary since the left side cannot exceed n.

With a little more work one can show that the derivative is always positive, so the left side is a monotonically increasing function, and so the solution given each value of k is unique.
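
Before hunting for a closed form, it's easy to confirm the existence and uniqueness argument numerically. Here's a sketch using SciPy's brentq root finder; the bracket endpoints are my own choice, simply taken wide enough to guarantee a sign change.

```python
from numpy import log
from scipy.optimize import brentq

def f(t, n):
    # left side of the equation t log(1 + n/t) = k
    return t*log(1 + n/t)

def solve_t(k, n):
    # f(., n) increases monotonically from 0 toward n,
    # so any bracket wide enough to straddle k works
    return brentq(lambda t: f(t, n) - k, 1e-12, 1e12)

n, k = 10, 8
t = solve_t(k, n)
print(t, f(t, n))  # f(t, n) comes back as (approximately) k
```

Since the left side is monotone, brentq cannot wander off to a second root; whatever it returns is the unique solution.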

Analytic solution

Now if we fix n, we can think of the equation above as defining t as a function of k. Can we solve for t exactly? I suspected that there might be a solution in terms of the Lambert W function because if you divide by t and exponentiate, you get an equation that smells like the equation

z = w exp(w)

defining the function W(z). It turns out this is indeed the case.

If we ask Mathematica

    Solve[t Log[1 + n/t] ==  k, t]

we get

t\to -\frac{k}{W\left(-\frac{k}{n} \exp(-k/n)\right)+\frac{k}{n}}

Great! There’s a closed-form solution, if you accept using the W function as being closed form.

Problems with the solution

I found the solution in Mathematica, but I tested it in Python to make sure both platforms define W the same way.

    from numpy import log, exp
    from scipy.special import lambertw

    def f(t, n):
        return t*log(1 + n/t)

    def g(k, n):
        r = k/n
        return -k/(lambertw(-r*exp(-r)) + r)

    n, k = 10, 8
    t = g(k, n)
    print(f(t, n))

This should print k, so it prints 8, right? No, it prints 10.

What’s up with that?

If we look back at the equation for t above, we see that the W function is being evaluated at x exp(x) where x = –k/n, so we should get –k/n back since W(x exp(x)) = x by definition. But that means our denominator is zero, and so the equation doesn’t make sense!

Things are getting worse. At first we had a wrong value, but at least it was finite!

The problem is not a difference between Mathematica and Python.


The problem is we’ve glossed over a branch cut of the W function. To make a long story short, we were using the principal branch of the W function, but we should have used a different branch.

Let’s go back to where I asked Mathematica

    Solve[t Log[1 + n/t] == k, t]

I ran the solution through TeXForm to get the TeX code that produced the image for the solution equation. I made a few aesthetic changes to the TeX, but it was essentially Mathematica’s output.

Without the TeXForm, Mathematica’s solution was in terms of ProductLog, not in terms of W; the TeXForm function turned ProductLog into W. If you look up ProductLog, it will tell you

ProductLog[z] gives the principal solution for w in z = w exp(w).

The principal solution. So we should be on the alert for difficulties with branches. There are two real solutions to z = w exp(w) for some values of z, and we have to choose the right one. For example, if z = -0.1, then w could be either -0.1118 or -3.5772.

Mathematica gave me the wrong branch. But to be fair, it did try to warn me.

Solve::ifun: Inverse functions are being used by Solve, so some solutions may not be found; use Reduce for complete solution information.

The solution is to use the -1 branch. In Mathematica’s notation, the branch comes before the argument. In SciPy, the branch comes second. So to fix the code above, we need to change the call to

    lambertw(-r*exp(-r), -1)

and then the code will be correct.

If -1 < x < 0 and we use the -1 branch of W, then

W-1(x exp(x)) ≠ x

because the -1 branch returns the other real solution, the one with w ≤ -1, and so we’re not dividing by zero in our solution.
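
Putting it all together, here is the corrected Python from above. SciPy’s lambertw returns a complex value, so we also take the real part; everything else is unchanged.

```python
from numpy import log, exp
from scipy.special import lambertw

def f(t, n):
    return t*log(1 + n/t)

def g(k, n):
    r = k/n
    # use the -1 branch of W; lambertw returns a complex number,
    # so take the real part
    return -k/(lambertw(-r*exp(-r), -1).real + r)

n, k = 10, 8
t = g(k, n)
print(f(t, n))  # approximately 8, as it should be
```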

More posts with Lambert W

Simple approximation for surface area of an ellipsoid

After writing the previous post, I wondered where else you might be able to use r-means to create accurate approximations. I thought maybe this would apply to the surface area of an ellipsoid, and a little searching around showed that Knud Thomsen thought of this in 2004.

The general equation for the surface of an ellipsoid is

\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1

If two of the semi-axes a, b, and c are equal then there are formulas for the area in terms of elementary functions. But in general, the area requires special functions known as “incomplete elliptic integrals.” For more details, see this post.

Here I want to focus on Knud Thomsen’s approximation for the surface area and its connection to the previous post.

The previous post reported that the perimeter of an ellipse is 2πr where r is the effective radius. An ellipse doesn’t have a radius unless it’s a circle, but we can define something that acts like a radius by defining it to be the perimeter over 2π. It turns out that

r \approx \mathfrak{M}_{3/2}(x)

makes a very good approximation for the effective radius, where x = (a, b). Here the r-mean is defined by

\mathfrak{M}_r(x) = \left( \frac{1}{n} \sum_{i=1}^n x_i^r \right)^{1/r}

using the notation of Hardy, Littlewood, and Pólya.

This suggests we define an effective radius for an ellipsoid and look for an approximation to the effective radius in terms of an r-mean. The area of a sphere is 4πr², so we define the effective radius of an ellipsoid as √(A/4π), where A is its surface area. The effective radius of a sphere is then its radius.

Thomsen found that

r^2 \approx \mathfrak{M}_p(x)

where x = (ab, bc, ac) and p = 1.6. More explicitly,

\text{Area} \approx 4\pi \left(\frac{(ab)^{1.6} + (bc)^{1.6} + (ac)^{1.6}}{3} \right )^{1/1.6}

Note that it’s the square of the effective radius that is an r-mean, and that the argument to the mean is not simply (a, b, c). I would have naively tried looking for a value of p so that the r-mean of (a, b, c) gives a good approximation to the effective radius. In hindsight, it makes sense in terms of dimensional analysis that the inputs to the mean have units of area, and so the output has units of area.

The maximum relative error in Thomsen’s approximation is 1.178% with p = 1.6. You can tweak the value of p to reduce the worst-case error, but the value of 1.6 is optimal for approximately spherical ellipsoids.

For an example, let’s compute the surface area of Saturn, the planet with the largest equatorial bulge in our solar system. Saturn is an oblate spheroid with equatorial diameter about 11% greater than its polar diameter.

Assuming Saturn is a perfect ellipsoid, Thomsen’s approximation over-estimates its surface area by about 4 parts per million. By comparison, approximating Saturn as a sphere under-estimates its area by 2 parts per thousand. The code for these calculations, based on the code here, is given at the bottom of the post.

Related posts

Appendix: Python code

    from numpy import pi, sin, cos, arccos
    from scipy.special import ellipkinc, ellipeinc

    # Saturn's radii in km
    equatorial_radius = 60268
    polar_radius = 54364

    a = b = equatorial_radius
    c = polar_radius

    # exact surface area of an oblate spheroid (a = b > c)
    phi = arccos(c/a)
    m = 1  # the elliptic integral parameter reduces to 1 when a = b

    temp = ellipeinc(phi, m)*sin(phi)**2 + ellipkinc(phi, m)*cos(phi)**2
    ellipsoid_area = 2*pi*(c**2 + a*b*temp/sin(phi))

    def rmean(r, x):
        n = len(x)
        return (sum(t**r for t in x)/n)**(1/r)

    # Thomsen's approximation with p = 1.6
    approx_ellipsoid_area = 4*pi*rmean(1.6, (a*b, b*c, a*c))

    # sphere with the same volume as the ellipsoid
    r = (a*b*c)**(1/3)
    sphere_area = 4*pi*r**2

    def rel_error(exact, approx):
        return (exact - approx)/approx

    print( rel_error(ellipsoid_area, sphere_area) )
    print( rel_error(ellipsoid_area, approx_ellipsoid_area) )

Simple approximation for perimeter of an ellipse

The perimeter of an ellipse cannot be computed in closed form. That is, no finite combination of elementary functions will give you the exact value. But we will present a simple approximation that is remarkably accurate.

So this post has two parts: exact calculation, and simple approximation.

Exact perimeter

The perimeter can be computed exactly in terms of an elliptic integral. The name’s not a coincidence: elliptic integrals are so named because they were motivated by trying to find the perimeter of an ellipse.

Let a be the length of the major semi-axis and b the length of the minor semi-axis. So a > b, and if our ellipse is actually a circle, a = b = r.

If ε is the eccentricity of our ellipse,

ε² = 1 – b² / a²

and the perimeter is given by

p = 4a E(ε²)

where E is the “complete elliptic integral of the second kind.” This function has popped up several times on this blog, most recently in the post about why serpentine walls save bricks.

You could calculate the perimeter of an ellipse in Python as follows.

    from scipy.special import ellipe

    def perimeter(a, b):
        assert(a > b > 0)
        return 4*a*ellipe(1 - (b/a)**2)

Simple approximation

But what if you don’t have a scientific library like SciPy at hand? Is there a simple approximation to the perimeter?

In fact there is. In [1] the authors present the approximation

p \approx 2 \pi \left(\frac{a^{3/2} + b^{3/2}}{2} \right )^{2/3}

If we define the effective radius as r = p/2π, then the approximation above says

r \approx \mathfrak{M}_{3/2}(x)

in the notation of Hardy, Littlewood, and Pólya, where x = (a, b).

We could code this up in Python as follows.

    def rmean(r, x):
        n = len(x)
        return (sum(t**r for t in x)/n)**(1/r)

    def approx(a, b):
        return 2*pi*rmean(1.5, (a, b))

It would have been less code to write up approx directly without defining rmean, but we’ll use rmean again in the next post.

For eccentricity less than 0.05, the error is less than 10⁻¹⁵, i.e. the approximation is correct to all the precision in a double precision floating point number. Even for fairly large eccentricity, the approximation is remarkably good.

The eccentricity of Pluto’s orbit is approximately 0.25. So the approximation above could compute the perimeter of Pluto’s orbit to nine significant figures, if the orbit were exactly an ellipse. A bigger source of error would be the failure of the orbit to be perfectly elliptical.

If an ellipse has major axes several times longer than its minor axis, the approximation here is less accurate, but still perhaps good enough, depending on the use. There is an approximation due to Ramanujan that works well even for much more eccentric ellipses, though it’s a little more complicated.
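
For comparison, here is a sketch of Ramanujan’s first approximation alongside the exact value computed with SciPy; the test ellipse is my own choice.

```python
from math import pi, sqrt
from scipy.special import ellipe

def perimeter(a, b):
    # exact perimeter via the complete elliptic integral of the second kind
    return 4*a*ellipe(1 - (b/a)**2)

def ramanujan(a, b):
    # Ramanujan's first approximation to the perimeter of an ellipse
    return pi*(3*(a + b) - sqrt((3*a + b)*(a + 3*b)))

a, b = 3, 1  # major axis three times the minor axis
print(perimeter(a, b), ramanujan(a, b))
```

Even for this rather eccentric ellipse the two values agree to several significant figures.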

To my mind, the most interesting thing here is the connection to r-norms. I’ll explore that more in the next post, applying r-norms to the surface area of an ellipsoid.

[1] Carl E. Linderholm and Arthur C. Segal. An Overlooked Series for the Elliptic Perimeter. Mathematics Magazine, June 1995, pp. 216-220

Bessel determinants

The Bessel functions J and Y are analogous to sine and cosine. Bessel functions come up in polar coordinates the way sines and cosines come up in rectangular coordinates. There are J and Y functions of various orders, conventionally written with a subscript ν.

Plotting Bessel functions J_3 and Y_3

I recently ran across a curious set of relations between these functions and their derivatives. Let’s start with the following terms defined by 2 by 2 determinants.

\begin{align*} p_\nu &= \begin{vmatrix} J_\nu(a) & J_\nu(b) \\ Y_\nu(a) & Y_\nu(b) \\ \end{vmatrix} \\ q_\nu &= \begin{vmatrix} J_\nu(a) & J'_\nu(b) \\ Y_\nu(a) & Y'_\nu(b) \\ \end{vmatrix} \\ r_\nu &= \begin{vmatrix} J'_\nu(a) & J_\nu(b) \\ Y'_\nu(a) & Y_\nu(b) \\ \end{vmatrix} \\ s_\nu &= \begin{vmatrix} J'_\nu(a) & J'_\nu(b) \\ Y'_\nu(a) & Y'_\nu(b) \\ \end{vmatrix} \end{align*}

There are a lot of symmetries in the definitions above. First, every term has a subscript ν, so you can ignore those for now.

Second, every determinant has J‘s on the top row and Y‘s on the bottom.

 \begin{vmatrix} J & J \\ Y & Y \\ \end{vmatrix}

Third, every determinant has a‘s in the first column and b‘s in the second.

 \begin{vmatrix} a & b \\ a & b \\ \end{vmatrix}

And finally, the primes, indicating derivatives, have the following pattern.

 \begin{align*} p &= \begin{vmatrix} \phantom{\cdot} & \phantom{\cdot} \\ \phantom{\cdot} & \phantom{\cdot} \\ \end{vmatrix} \\ q &= \begin{vmatrix} \phantom{\cdot} & \phantom{\cdot}' \\ \phantom{\cdot} & \phantom{\cdot}' \\ \end{vmatrix} \\ r &= \begin{vmatrix} \phantom{\cdot}' & \phantom{\cdot} \\ \phantom{\cdot}' & \phantom{\cdot} \\ \end{vmatrix} \\ s &= \begin{vmatrix} \phantom{\cdot}' & \phantom{\cdot}' \\ \phantom{\cdot}' & \phantom{\cdot}' \\ \end{vmatrix} \end{align*}

Now that we have all these definitions in place, there are several identities that the p’s, q’s, r’s, and s‘s satisfy. Some of them depend on the orders, and that’s why we included the ν subscripts. You can find various relations in A&S equation 9.1.32, but I wanted to point out one that’s particularly elegant, A&S equation 9.1.34:

\begin{vmatrix} p_\nu & q_\nu \\ r_\nu & s_\nu \end{vmatrix} = \frac{4}{\pi^2ab}

This equation, a determinant of terms defined by determinants, reduces to a simple form that depends only on where the Bessel functions are evaluated, i.e. a and b, but not on their order ν.

Python example

The Python code below will show how to call Bessel functions and their derivatives and will illustrate the equations above.

    from math import pi
    from scipy.special import jv, yv, jvp, yvp
    def example(n, a, b):
        ja = jv(n, a)
        jb = jv(n, b)
        ya = yv(n, a)
        yb = yv(n, b)
        jap = jvp(n, a)
        jbp = jvp(n, b)
        yap = yvp(n, a)
        ybp = yvp(n, b)
        p = ja  * yb  - jb  * ya
        q = ja  * ybp - jbp * ya
        r = jap * yb  - jb  * yap
        s = jap * ybp - jbp * yap
        print(p*s - q*r)
        print(4/(pi**2 * a * b))
    example(3, 16, 21)

This prints the value of p*s − q*r followed by the value of 4/(π²ab). The two results differ slightly in the last decimal place due to rounding error.

The order of a Bessel function can be any real number [1], and the arguments can be any complex number. Here’s another example with more general arguments.

    example(3.14, 16-3j, 21+1.2j)    

This prints

    (0.0011738907214344785 + 0.00015140286689885318j)
    (0.0011738907214345800 + 0.00015140286689880630j)

with the real and imaginary parts agreeing to 15 decimal places.

Related posts

[1] You can define Bessel functions with complex order, but SciPy doesn’t support that.

Superfactorial

The factorial of a positive integer n is the product of the numbers from 1 up to and including n:

n! = 1 × 2 × 3 × … × n.

The superfactorial of n is the product of the factorials of the numbers from 1 up to and including n:

S(n) = 1! × 2! × 3! × … × n!.

For example,

S(5) = 1! 2! 3! 4! 5! = 1 × 2 × 6 × 24 × 120 = 34560.
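
As a quick check, a direct Python implementation:

```python
from math import factorial, prod

def superfactorial(n):
    # S(n) = 1! * 2! * ... * n!
    return prod(factorial(k) for k in range(1, n + 1))

print(superfactorial(5))  # prints 34560
```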

Here are three examples of where superfactorial pops up.

Vandermonde determinant

If V is the n by n matrix whose (i, j) entry is i^(j−1) then its determinant is S(n−1). For instance,

\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16& 64 \end{vmatrix} = S(3) = 3!\, 2!\, 1! = 12

V is an example of a Vandermonde matrix.
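
Here’s a numerical check of the determinant identity, using the 4 by 4 example above:

```python
import numpy as np
from math import factorial, prod

def S(n):
    return prod(factorial(k) for k in range(1, n + 1))

n = 4
# Vandermonde matrix whose (i, j) entry is i**(j-1), with 1-based indices
V = np.array([[i**(j - 1) for j in range(1, n + 1)]
              for i in range(1, n + 1)])
print(round(np.linalg.det(V)), S(n - 1))  # both are 12
```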

Permutation tensor

One way to define the permutation symbol uses superfactorial:

\epsilon_{i_1 i_2 \cdots i_n} = \frac{1}{S(n-1)} \prod_{1 \leq k < j \leq n} (i_j - i_k)

Barnes G-function

The Barnes G-function extends superfactorial to the complex plane analogously to how the gamma function extends factorial. For positive integers n,

G(n) = \prod_{k=1}^{n-1}\Gamma(k) = S(n-2)
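
For positive integers this is easy to verify numerically with the gamma function from the standard library:

```python
from math import gamma, factorial, prod

def S(n):
    # superfactorial: 1! * 2! * ... * n!
    return prod(factorial(k) for k in range(1, n + 1))

def G(n):
    # Barnes G at a positive integer: product of Gamma(k) for k = 1, ..., n-1
    return prod(gamma(k) for k in range(1, n))

for n in range(3, 9):
    assert round(G(n)) == S(n - 2)
```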

Here’s a plot of G(x)

produced by

    Plot[BarnesG[x], {x, -2, 4}]

in Mathematica.

More posts related to factorial

Negative space graph

Here is a plot of the first 30 Chebyshev polynomials. Notice the interesting patterns in the white space.

Plot of first 30 Chebyshev polynomials

Forman Acton famously described Chebyshev polynomials as “cosine curves with a somewhat disturbed horizontal scale.” However, plotting cosines with frequencies 1 to 30 gives you pretty much a solid square. Something about the way Chebyshev polynomials disturb the horizontal scale creates the interesting pattern in negative space. (The distortion is different for each polynomial; otherwise the cosine picture would be a rescaling of the Chebyshev picture.)
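
Here is a sketch of how such a plot can be generated, using the identity Tn(x) = cos(n arccos x) rather than expanding the polynomials; the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-1, 1, 1000)
for n in range(1, 31):
    # T_n(x) = cos(n arccos x) on [-1, 1]
    plt.plot(x, np.cos(n*np.arccos(x)), color="blue", linewidth=0.5)
plt.savefig("chebyshev30.png")
```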

I found the example above in a book that referenced a book by Theodore Rivlin. There’s a new edition of Rivlin’s book coming out in August, so maybe it will say something about the gaps.

Update: Here are analogous graphs for Legendre polynomials, plotting the even and odd ordered polynomials separately. They also have conspicuous holes, but they don’t fill the unit square the way Chebyshev polynomials do.

Even order Legendre polynomials

Odd order Legendre polynomials

More on Chebyshev polynomials

To integrate the impossible integral

Don Quixote

In the Broadway musical Man of La Mancha, Don Quixote sings

To dream the impossible dream
To fight the unbeatable foe
To bear with unbearable sorrow
To run where the brave dare not go

Yesterday my daughter asked me to integrate the impossible integral, and this post has a few thoughts on the quixotic quest to run where the brave calculus student dare not go.

The problem apparently required computing the indefinite integral of the square root of sine:

\int \sqrt{\sin x} \,dx

I say apparently for reasons that will soon be clear.

My first thought was that some sort of clever u-substitution would work. This was coming from a homework problem, so I was in the mindset of expecting every problem to be doable. If I ran across this integral while I had my consultant hat on rather than my father/tutor hat, I would have thought “It probably doesn’t have an elementary closed form because most integrals don’t.”

It turns out I should have had my professional hat on, because the integral does indeed not have an elementary closed form. There is a well-known function which is an antiderivative of √sin, but it’s not an elementary function. (“Well-known” relative to professional mathematicians, not relative to calculus students.)

You can evaluate the integral as

\int \sqrt{\sin x} \,dx = -2 E\left(\frac{\pi - 2x}{4}\, \middle|\, 2 \right) + C

where E is Legendre’s “elliptic integral of the second kind.”
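
Although the antiderivative isn’t elementary, the definite integral over [0, π/2] has a classical closed form via gamma functions, which gives a way to sanity-check numerical integration. (This check is my addition, not part of the original homework problem.)

```python
from math import gamma, pi, sqrt, sin
from scipy.integrate import quad

# numerical integral of sqrt(sin x) over [0, pi/2]
num, _ = quad(lambda x: sqrt(sin(x)), 0, pi/2)

# classical Beta function reduction: the integral of sin^p x over [0, pi/2]
# equals (sqrt(pi)/2) * Gamma((p+1)/2) / Gamma(p/2 + 1); here p = 1/2
closed = (sqrt(pi)/2)*gamma(0.75)/gamma(1.25)
print(num, closed)  # both about 1.198
```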

I assume it wasn’t actually necessary to compute the antiderivative, though I didn’t see the full problem. In the artificial world of the calculus classroom, everything works out nicely. And this is a shame.

For one thing, it creates a false impression. Most integrals are “impossible” in the sense of not having an elementary closed form. Students who go on to use calculus for anything more than artificial homework problems may incorrectly assume they’ve done something wrong when they encounter an integral in the wild.

For another, it’s a missed opportunity. Or maybe two or three missed opportunities. These may or may not be appropriate to bring up, depending on how receptive the audience is. (But we said at the beginning we would “run where the brave calculus student dare not go.”)

It’s a missed opportunity to emphasize what integrals really are, to separate meaning from calculation technique. Is the integral of √sin impossible? Not in the sense that it doesn’t exist. Yes, there are functions whose derivative is √sin, at least if we limit ourselves to a range where sine is not negative. No, the integral does not exist in the sense of a finite combination of functions a calculus student would have seen.

It’s also a missed opportunity to show that we can define new functions as the solution to problems we can’t solve otherwise. The elliptic integrals mentioned above are functions defined in terms of integrals that cannot be computed in more elementary terms. By giving the function a name, we can compare where it comes up in other contexts. For example, I wrote a few months ago about the problem of finding the length of a serpentine wall. This also pops out of a calculus problem that doesn’t have an elementary solution, and in fact it also has a solution in terms of elliptic integrals.

Finally, it’s a missed opportunity to give a glimpse of the wider mathematical landscape. If not every elementary function has an elementary antiderivative, do they at least have antiderivatives in terms of special functions such as elliptic integrals? No, but that’s a good question.

Any continuous function has an antiderivative, and that antiderivative might be an elementary function, or it might be a combination of elementary functions and familiar special functions. Or it might be an obscure special function, something not exactly common, but something that has been named and studied before. Or it might be a perfectly valid function that hasn’t been given a name yet. Maybe it’s too specialized to deserve a name, or maybe you’ve discovered something that comes up repeatedly and deserves to be cataloged.

This touches more broadly on the subject of what functions exist versus what functions have been named. Students implicitly assume these two categories are the same. Here’s an example of the same phenomenon in probability. It also touches on the gray area between what has been named and what hasn’t, and how you decide whether something deserves to be named.

Update: The next post gives a simple approximation to the integral in this post.

Chebyshev’s other polynomials

There are two sequences of polynomials named after Chebyshev, and the first is so much more common that when authors say “Chebyshev polynomial” with no further qualification, they mean Chebyshev polynomials of the first kind. These are denoted with Tn, so they get Chebyshev’s initial [1]. The Chebyshev polynomials of the second kind are denoted Un.

Chebyshev polynomials of the first kind are closely related to cosines. It would be nice to say that Chebyshev polynomials of the second kind are to sines what Chebyshev polynomials of the first kind are to cosines. That would be tidy, but it’s not true. There are relationships between the two kinds of Chebyshev polynomials, but they’re not that symmetric.

It is true that Chebyshev polynomials of the second kind satisfy a relation somewhat analogous to the relation

\cos n\theta = T_n(\cos\theta)

for his polynomials of the first kind, and it involves sines:

\frac{\sin n\theta}{\sin \theta} = U_{n-1}(\cos\theta)

We can prove this with the equation we’ve been discussing in several posts lately, so there is yet more juice to squeeze from this lemon.

Once again we start with the equation

\cos n\theta + i\sin n\theta = \sum_{j=0}^n {n\choose j} i^j(\cos\theta)^{n-j} (\sin\theta)^j

and take the imaginary part of both sides. The odd terms of the sum contribute to the imaginary part, so we can assume j = 2k + 1. We make the replacement

(\sin\theta)^{2k+1} = (1 - \cos^2\theta)^k \sin\theta

and so we’re left with a polynomial in cos θ, except for an extra factor of sin θ in every term.

This shows that sin nθ / sin θ is a polynomial in cos θ, and in fact a polynomial of degree n-1. Given the theorem that

\frac{\sin n\theta}{\sin \theta} = U_{n-1}(\cos\theta)

it follows that the polynomial in question must be Un-1.
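
The identity is easy to spot-check with SciPy’s eval_chebyu; the test angle below is arbitrary.

```python
import numpy as np
from scipy.special import eval_chebyu

theta = 0.7  # arbitrary test angle
for n in range(1, 8):
    lhs = np.sin(n*theta)/np.sin(theta)
    rhs = eval_chebyu(n - 1, np.cos(theta))
    assert abs(lhs - rhs) < 1e-12
```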

More special function posts

[1] Chebyshev has been transliterated from the Russian as Chebysheff, Chebychov, Chebyshov, Tchebychev, Tchebycheff, Tschebyschev, Tschebyschef, Tschebyscheff, Chebychev, etc. It is conventional now to use “Chebyshev” as the name, at least in English, and to use “T” for the polynomials.

More juice in the lemon

There’s more juice left in the lemon we’ve been squeezing lately.

A few days ago I first brought up the equation

\cos n\theta + i\sin n\theta = \sum_{j=0}^n {n\choose j} i^j(\cos\theta)^{n-j} (\sin\theta)^j

which holds because both sides equal exp(inθ).

Then a couple days ago I concluded a blog post by noting that by taking the real part of this equation and replacing sin²θ with 1 – cos²θ one could express cos nθ as a polynomial in cos θ,

\cos n\theta = \sum {n\choose 2k} (-1)^k (\cos\theta)^{n-2k} (1 - \cos^2\theta)^k

and in fact this polynomial is the nth Chebyshev polynomial Tn since these polynomials satisfy

\cos n\theta = T_n(\cos\theta)

Now in this post I’d like to prove a relationship between Chebyshev polynomials and sines starting with the same raw material. The relationship between Chebyshev polynomials and cosines is well known, even a matter of definition depending on where you start, but the connection to sines is less well known.

Let’s go back to the equation at the top of the post, replace n with 2n + 1, and take the imaginary part of both sides. The odd terms of the sum contribute to the imaginary part, so we sum over j = 2ℓ + 1.

\begin{align*} \sin((2n+1)\theta) &= \sum_{\ell} (-1)^\ell {2n+1 \choose 2\ell + 1} (\cos\theta)^{2n-2\ell} (\sin\theta)^{2\ell + 1} \\ &= \sum_k (-1)^{n-k}{2n + 1 \choose 2k} (\sin \theta)^{2n + 1 -2k} (\cos\theta)^{2k} \\ &= (-1)^n \sum_k (-1)^k{2n + 1 \choose 2k} (\sin \theta)^{2n + 1 -2k} (1 -\sin^2\theta)^{k} \end{align*}

Here we did a change of variables k = n – ℓ.

The final expression is the expression we began with, only evaluated at sin θ instead of cos θ. That is,

 \sin((2n+1)\theta) = (-1)^n T_{2n+1}(\sin\theta)

So for all n we have

\cos n\theta = T_n(\cos\theta)

and for odd n we also have

 \sin n\theta = \pm \,\,T_n(\sin\theta)

The sign is positive when n is congruent to 1 mod 4 and negative when n is congruent to 3 mod 4.
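
A quick numerical check of both the odd-order identity and the sign pattern, using SciPy’s eval_chebyt; the test angle is arbitrary.

```python
import numpy as np
from scipy.special import eval_chebyt

theta = 0.3  # arbitrary test angle
for n in [1, 3, 5, 7, 9, 11]:
    sign = 1 if n % 4 == 1 else -1
    # sin(n*theta) = +/- T_n(sin(theta)) for odd n
    assert abs(np.sin(n*theta) - sign*eval_chebyt(n, np.sin(theta))) < 1e-12
```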

More Chebyshev posts

Product of Chebyshev polynomials

Chebyshev polynomials satisfy a lot of identities, much like trig functions do. This post will look briefly at just one such identity.

Chebyshev polynomials Tn are defined for n = 0 and 1 by

T0(x) = 1
T1(x) = x

and for larger n using the recurrence relation

Tn+1(x) = 2xTn(x) – Tn-1(x)

This implies

T2(x) = 2xT1(x) – T0(x) = 2x² – 1
T3(x) = 2xT2(x) – T1(x) = 4x³ – 3x
T4(x) = 2xT3(x) – T2(x) = 8x⁴ – 8x² + 1

and so forth.
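
The recurrence is straightforward to implement. Here is a sketch that builds the coefficient list of Tn, lowest degree first; the function name is my own.

```python
def chebyshev_T(n):
    # returns coefficients of T_n, lowest degree first
    if n == 0:
        return [1]
    if n == 1:
        return [0, 1]
    Tprev, Tcur = [1], [0, 1]
    for _ in range(n - 1):
        # 2x * Tcur: shift coefficients up one degree and double them
        twoxT = [0] + [2*c for c in Tcur]
        # subtract Tprev, padded with zeros to the same length
        pad = Tprev + [0]*(len(twoxT) - len(Tprev))
        Tprev, Tcur = Tcur, [a - b for a, b in zip(twoxT, pad)]
    return Tcur

print(chebyshev_T(4))  # [1, 0, -8, 0, 8], i.e. 8x^4 - 8x^2 + 1
```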

Now for the identity for this post. If m ≥ n, then

2 Tm Tn = Tm+n + Tm−n.

In other words, the product of the mth and nth Chebyshev polynomials is the average of the (m + n)th and (m − n)th Chebyshev polynomials. For example,

2 T3(x) T1(x) = 2 (4x³ – 3x) x = T4(x) + T2(x)

The identity above is not at all apparent from the recursive definition of Chebyshev polynomials, but it follows quickly from the fact that

Tn(cos θ) = cos nθ.

Proof: Let θ = arccos x. Then

2 Tm(x) Tn(x)
= 2 Tm(cos θ) Tn(cos θ)
= 2 cos mθ cos nθ
= cos (m+n)θ + cos (m−n)θ
= Tm+n(cos θ) + Tm−n(cos θ)
= Tm+n(x) + Tm−n(x)

You might object that this only shows that the first and last line are equal for values of x that are cosines of some angle, i.e. values of x in [-1, 1]. But if two polynomials agree on an interval, they agree everywhere. In fact, you don’t need an entire interval. For polynomials of degree m+n, as above, it is enough that they agree on m + n + 1 points. (Along those lines, see Binomial coefficient trick.)

The close association between Chebyshev polynomials and cosines means you can often prove Chebyshev identities via trig identities as we did above.
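
SciPy makes it easy to spot-check the product identity numerically; the test values below are arbitrary.

```python
from scipy.special import eval_chebyt

m, n, x = 5, 3, 0.42  # arbitrary test values with m >= n
lhs = 2*eval_chebyt(m, x)*eval_chebyt(n, x)
rhs = eval_chebyt(m + n, x) + eval_chebyt(m - n, x)
print(lhs, rhs)  # the two values agree to machine precision
```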

Along those lines, we could have taken

Tn(cos θ) = cos nθ

as the definition of Chebyshev polynomials and then proved the recurrence relation above as a theorem, using trig identities in the proof.

Forman Acton suggested in his book Numerical Methods that Work that you should think of Chebyshev polynomials as “cosine curves with a somewhat disturbed horizontal scale.”