Lebesgue constants

I alluded to Lebesgue constants in the previous post without giving them a name. There I said that the bound on order n interpolation error has the form

ch^{n+1} + \lambda \delta

where h is the spacing between interpolation points and δ is the error in the tabulated values. The constant c depends on the function f being interpolated, and to a lesser extent on n. The constant λ is independent of f but depends on n and on the relative spacing between the interpolation nodes. This post will look closer at λ.

Given a set of n + 1 nodes T

a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b

define

\ell_j(x) := \prod_{\begin{smallmatrix}i=0\\ j\neq i\end{smallmatrix}}^{n} \frac{x-x_i}{x_j-x_i}

Then the Lebesgue function is defined by

\lambda_n(x) = \sum_{j=0}^n |\ell_j(x)|

and the Lebesgue constant for the grid is the maximum value of the Lebesgue function

\Lambda_n(T)=\max_{x\in[a,b]} \lambda_n(x)

The values of Λ are difficult to compute, but there are nice asymptotic expressions for Λ when the grid is evenly spaced:

\Lambda_n \sim \frac{2^{n+1}}{n \log n}

When the grid points are at the roots of a Chebyshev polynomial then

\Lambda_n \approx \frac{2}{\pi} \log(n + 1) + 1

The previous post mentioned the cases n = 11 and n = 29 for evenly spaced grids. The corresponding values of Λ are approximately 155 and 10,995,642. So 11th order interpolation is amplifying the rounding error in the tabulated points by a factor of 155, which might be acceptable. But 29th order interpolation is amplifying the rounding error by a factor of over 10 million.

The corresponding values of Λ for Chebyshev-spaced nodes are 2.58 and 3.17. Chebyshev spacing is clearly better for high-order interpolation, when you have that option.
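These claims are easy to check numerically. Here's a minimal sketch (the function and variable names are my own) that estimates the Lebesgue constant by evaluating the Lebesgue function on a fine grid:

```python
import numpy as np

def lebesgue_constant(nodes, a=-1.0, b=1.0, samples=5000):
    """Estimate the Lebesgue constant of a node set by maximizing
    the Lebesgue function over a fine grid on [a, b]."""
    x = np.linspace(a, b, samples)
    lam = np.zeros_like(x)
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        # |ell_j(x)|, the j-th Lagrange basis polynomial in absolute value
        lam += np.abs(np.prod((x[:, None] - others) / (xj - others), axis=1))
    return lam.max()

n = 11
equispaced = np.linspace(-1, 1, n + 1)
chebyshev = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # roots of T_12

print(lebesgue_constant(equispaced))  # grows exponentially in n
print(lebesgue_constant(chebyshev))   # stays near (2/pi) log(n + 1) + 1
```

The grid maximum slightly underestimates the true constant, but the contrast between the two node sets is unmistakable, and it widens rapidly as n grows.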

How much precision can you squeeze out of a table?

Richard Feynman said that almost everything becomes interesting if you look into it deeply enough. Looking up numbers in a table is certainly not interesting, but it becomes more interesting when you dig into how well you can fill in the gaps.

If you want to know the value of a tabulated function between values of x given in the table, you have to use interpolation. Linear interpolation is often adequate, but you could get more accurate results using higher-order interpolation.

Suppose you have a function f(x) tabulated at x = 3.00, 3.01, 3.02, …, 3.99, 4.00 and you want to approximate the value of the function at π. You could approximate f(π) using the values of f(3.14) and f(3.15) with linear interpolation, but you could also take advantage of more points in the table. For example, you could use cubic interpolation to calculate f(π) using f(3.13), f(3.14), f(3.15), and f(3.16). Or you could use 29th degree interpolation with the values of f at 3.00, 3.01, 3.02, …, 3.29.

The Lagrange interpolation theorem lets you compute an upper bound on your interpolation error. However, the theorem assumes the values at each of the tabulated points are exact. And for ordinary use, you can assume the tabulated values are exact. The biggest source of error is typically the size of the gap between tabulated x values, not the precision of the tabulated values. Tables were designed so this is true [1].

The bound on order n interpolation error has the form

ch^{n+1} + \lambda \delta

where h is the spacing between interpolation points and δ is the error in the tabulated values. The value of c depends on the derivatives of the function you’re interpolating [2]. The value of λ is at least 1 since λδ is the “interpolation” error at the tabulated points.

The accuracy of an interpolated value cannot be better than δ in general, and so you pick the value of n that makes ch^{n+1} less than δ. Any higher value of n is not helpful. And in fact higher values of n are harmful since λ grows exponentially as a function of n [3].

See the next post for mathematical details regarding the λs.

Examples

Let’s look at a specific example. Here’s a piece of a table of natural logarithms from Abramowitz and Stegun (A&S).

Here h = 10⁻³, and so linear interpolation would give you an error on the order of h² = 10⁻⁶. You’re never going to get error less than 10⁻¹⁵ since that’s the error in the tabulated values, so 4th order interpolation gives you about as much precision as you’re going to get. Carefully bounding the error would require using the values of c and λ above that are specific to this context. In fact, the interpolation error is on the order of 10⁻⁸ using 5th order interpolation, and that’s the best you can do.

I’ll briefly mention a couple more examples from A&S. The book includes a table of sine values, tabulated to 23 decimal places, in increments of h = 0.001 radians. A rough estimate suggests 7th order interpolation is as high as you should go, and in fact the book indicates that 7th order interpolation will give you 9 figures of accuracy.

Another table from A&S gives values of the Bessel function J₀ with 15-digit values in increments of h = 0.1. It says that 11th order interpolation will give you four decimal places of precision. In this case, fairly high-order interpolation is useful and even necessary. A large number of decimal places are needed in the tabulated values relative to the output precision because the spacing between points is so wide.
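The rough estimates in these examples follow a simple rule: increase the order only until h^{n+1} drops below δ. Here’s a small sketch of that rule in Python, taking c = 1 for simplicity (a deliberate simplification; the true cutoff also depends on the derivatives of f):

```python
def max_useful_order(h, delta):
    """Smallest order n with h^(n+1) <= delta, taking c = 1.
    Past this point, higher orders only amplify the rounding
    error in the tabulated values through lambda."""
    n = 0
    while h ** (n + 1) > delta * (1 + 1e-9):  # small tolerance for float fuzz
        n += 1
    return n

# Log table: spacing 10^-3, values good to 15 decimal places
print(max_useful_order(1e-3, 1e-15))  # 4

# A&S sine table: spacing 0.001 radians, values to 23 decimal places
print(max_useful_order(1e-3, 1e-23))  # 7
```

The outputs match the orders mentioned above: 4th order for the log table and 7th order for the sine table.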

Related posts

[1] I say were because of course people rarely look up function values in tables anymore. Tables and interpolation are still widely used, just not directly by people; computers do the lookup and interpolation on their behalf.

[2] For functions like sine, the value of c doesn’t grow with n, and in fact decreases slowly as n increases. But for other functions, c can grow with n, which can cause problems like the Runge phenomenon.

[3] The constant λ grows exponentially with n for evenly spaced interpolation points, and values in a table are evenly spaced. The constant λ grows only logarithmically for Chebyshev spacing, but Chebyshev spacing isn’t practical for a general purpose table.

Mendeleev’s inequality

Dmitri Mendeleev is best known for creating the first periodic table of chemical elements. He also discovered an interesting mathematical theorem. Empirical research led him to a question about interpolation, which in turn led him to a theorem about polynomials and their derivatives.

I ran across Mendeleev’s theorem via a paper by Boas [1]. The opening paragraph describes what Mendeleev was working on.

Some years after the chemist Mendeleev invented the periodic table of the elements he made a study of the specific gravity of a solution as a function of the percentage of the dissolved substance. This function is of some practical importance: for example, it is used in testing beer and wine for alcoholic content, and in testing the cooling system of an automobile for concentration of anti-freeze; but present-day physical chemists do not seem to find it as interesting as Mendeleev did.

Mendeleev fit his data by patching together quadratic polynomials, i.e. he used quadratic splines. A question about the slopes of these splines led to the following.

Theorem (Mendeleev): Let P(x) be a quadratic polynomial on [−1, 1] such that |P(x)| ≤ 1. Then |P′(x)| ≤ 4.

Mendeleev showed his result to mathematician Andrey Markov who generalized it to the following.

Theorem (Markov): If P(x) is a real polynomial of degree n, and |P(x)| ≤ 1 on [−1, 1] then |P′(x)| ≤ n² on [−1, 1].

Both inequalities are sharp, with equality if and only if P(x) = ±Tₙ(x), the nth Chebyshev polynomial. In the special case of Mendeleev’s inequality, equality holds for

T₂(x) = 2x² − 1.

Andrey Markov’s brother Vladimir proved an extension of Andrey’s theorem to higher derivatives.
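Both inequalities above are easy to sanity-check numerically with NumPy’s Chebyshev module. The sketch below confirms that Tₙ is bounded by 1 on [−1, 1] while its derivative reaches the Markov bound n² at the endpoint x = 1:

```python
import numpy as np
from numpy.polynomial import chebyshev

x = np.linspace(-1, 1, 1001)
for n in range(1, 8):
    T_n = chebyshev.Chebyshev.basis(n)
    # T_n stays within [-1, 1] on the interval ...
    assert np.max(np.abs(T_n(x))) <= 1 + 1e-12
    # ... while its derivative attains n^2 at the endpoint x = 1
    assert abs(T_n.deriv()(1.0) - n**2) < 1e-8

print("Markov's bound is attained by the Chebyshev polynomials")
```

The n = 2 case is Mendeleev’s inequality: T₂′(x) = 4x, which reaches 4 at x = 1.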

Related posts

[1] R. P. Boas, Jr. Inequalities for the Derivatives of Polynomials. Mathematics Magazine, Vol. 42, No. 4 (Sep., 1969), pp. 165–174

Patching functions together

The previous post looked at a function formed by patching together the function

f(x) = log(1 + x)

for positive x and

f(x) = x

for negative x. The functions have a mediocre fit, which may be adequate for some applications, such as plotting data points, but not for others, such as visual design. Here’s a plot.

There’s something not quite smooth about the plot. It’s continuous, even differentiable, at 0, but still there’s sort of a kink at 0.

The way to quantify smoothness is to count how many times you can differentiate a function and have a continuous result. We would say the function above is C¹ but not C² because its first derivative is continuous but not differentiable. If two functions have the same number of derivatives, you can distinguish their smoothness by looking at the size of the derivatives. Or if you really want to get sophisticated, you can use Sobolev norms.
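You can verify this numerically with one-sided difference quotients at 0: the first derivatives from the left and the right agree, but the second derivatives do not. A small sketch:

```python
import math

def f(x):
    # log(1 + x) for positive x patched with x for negative x
    return math.log1p(x) if x >= 0 else x

h = 1e-5
# one-sided first derivatives at 0: both approach 1
d1_right = (f(h) - f(0)) / h
d1_left = (f(0) - f(-h)) / h
# one-sided second derivatives at 0: -1 from the right, 0 from the left
d2_right = (f(2*h) - 2*f(h) + f(0)) / h**2
d2_left = (f(0) - 2*f(-h) + f(-2*h)) / h**2

print(d1_right, d1_left)   # both close to 1
print(d2_right, d2_left)   # about -1 versus exactly 0
```

The jump in the second derivative, from 0 on the left to −1 on the right, is the kink your eye picks up.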

As a rule of thumb, your vision can detect a discontinuity in the second derivative, maybe the third derivative in some particularly sensitive contexts. For physical applications, you also want continuous second derivatives (acceleration). For example, you want a roller coaster track to be more than just continuous and once differentiable.

Natural cubic splines are often used in applications because they patch cubic polynomials together so that the result is C².

You can patch together smooth functions somewhat arbitrarily, but not analytic functions. If you wanted to extend the function log(1 + x) analytically, your only choice is log(1 + x). More on that here.

If you patch together a flat line and a circle, as in the corners of a rounded rectangle, the curve is C¹ but not C²: the curvature jumps from 0 to 1/r, where r is the radius of the circle. A squircle is similar to a rounded rectangle but has smoothly varying curvature.

Related posts

Interpolation instability

You would think that interpolating at more points would give you a more accurate approximation. There’s a famous example by Runge that shows this is not always the case. If you interpolate the function 1/(1 + x²) over the interval [−5, 5], as you add more interpolation points the maximum interpolation error actually increases.

Here’s an example from a post I wrote a few years ago, using 16 interpolation points. This is the same function, rescaled to live on the interval [−1, 1].

There are two things that make Runge’s example work.

  1. The interpolation nodes are evenly spaced.
  2. The function has singularities in the complex plane too close to the interval of interpolation.

If you use Chebyshev nodes rather than evenly spaced nodes, the problem goes away.

And if the function being interpolated is analytic in a certain region around the unit interval (described here) the problem also goes away, at least in theory. In practice, using floating point arithmetic, you can see rapid divergence even though in theory the interpolants converge [1]. That is the point of this post.

Example

So let’s try interpolating exp(−9x²) on the interval [−1, 1]. This function is analytic everywhere, so the interpolating polynomials should converge as the number of interpolation points n goes to infinity. Here’s a plot of what happens when n = 50.

Looks like all is well. No need to worry. But let’s press our luck and try n = 100.

Hmm. Something is wrong.

In fact things are far worse than the plot suggests. The largest interpolant value is over 700,000,000. It just doesn’t show in the plot.

Interpolating at evenly spaced points is badly conditioned. There would be no problem if we could compute with infinite precision, but with finite precision interpolation errors can grow exponentially.
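Here is a sketch of the experiment (function names are my own), evaluating the interpolant directly in Lagrange form. This is the straightforward but numerically naive approach, and it reproduces the blow-up:

```python
import numpy as np

def lagrange_eval(nodes, values, x):
    """Evaluate the interpolating polynomial in Lagrange form.
    Mathematically exact, but numerically fragile at high degree."""
    p = np.zeros_like(x)
    for j, (xj, yj) in enumerate(zip(nodes, values)):
        others = np.delete(nodes, j)
        # the j-th Lagrange basis polynomial evaluated at every x
        ell = np.prod((x[:, None] - others) / (xj - others), axis=1)
        p += yj * ell
    return p

f = lambda t: np.exp(-9 * t**2)
x = np.linspace(-1, 1, 1001)

for n in (50, 100):
    nodes = np.linspace(-1, 1, n + 1)
    err = np.max(np.abs(lagrange_eval(nodes, f(nodes), x) - f(x)))
    print(n, err)
```

At n = 50 the maximum error is small, but at n = 100 the computed interpolant deviates wildly from the function, even though in exact arithmetic it would be extremely accurate.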

Personal anecdote

Years ago I learned that someone was interpolating exp(−x²) at a large number of evenly spaced points in an application. If memory serves, they wanted to integrate the interpolant in order to compute probabilities.

I warned them that this was a bad idea because of Runge phenomena.

I was wrong on theoretical grounds, because exp(−x²) is analytic. I didn’t know about Runge’s convergence theorem.

But I was right on numerical grounds, because I also didn’t know about the numerical instability demonstrated above.

So I made two errors that sorta canceled out.

Related posts

[1] Approximation Theory and Approximation Practice by Lloyd N. Trefethen

Discrete Taylor series

Newton’s interpolation formula looks awfully complicated until you introduce the right notation. With the right notation, it looks like a Taylor series. Not only is this notation simpler and more memorable, it also suggests extensions.

The notation we need comes in two parts. First, we need the forward difference operator Δ defined by

\Delta f(x) = f(x+1) - f(x)

and its extension Δᵏ defined by applying Δ to a function k times.

The other piece of notation we need is falling powers, denoted with a little bar under the exponent:

 x^{\underline{k}} = x(x-1)(x-2) \cdots (x - k + 1)

The Δ operator is meant to resemble the differential operator D and falling powers are meant to resemble ordinary powers. With this notation, Newton’s interpolation formula based at a

f(x) = \sum_{k=0}^\infty \frac{\Delta^k f(a)}{k!} (x - a)^{\underline{k}}

looks analogous to Taylor’s formula for a power series based at a

f(x) = \sum_{k=0}^\infty \frac{D^k f(a)}{k!} (x - a)^k.

Newton’s formula applies to polynomials, and the infinite sum is actually a finite sum because Δᵏf(a) is 0 for all k beyond some point.

Newton’s formula is a discrete analog of Taylor’s formula because it only uses the values of f at discrete points, i.e. at the integers (shifted by a) and only involves finite operations: finite differences do not involve limits.
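Here is a sketch of the discrete formula in Python: compute the forward differences Δᵏf(a) by repeated subtraction, then sum the series with falling powers in place of ordinary powers. For a polynomial the series terminates, so the sum reproduces f exactly, even at non-integer x:

```python
from math import factorial

def falling_power(x, k):
    """x to the falling k: x(x-1)(x-2)...(x-k+1)."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def newton_series(f, a, x, terms):
    """Evaluate sum_k Delta^k f(a) / k! * (x - a)^{falling k}."""
    # tabulated values of f at a, a+1, ..., a+terms-1
    vals = [f(a + i) for i in range(terms)]
    total = 0.0
    for k in range(terms):
        # vals[0] is currently Delta^k f(a)
        total += vals[0] / factorial(k) * falling_power(x - a, k)
        # one more pass of differencing
        vals = [vals[i + 1] - vals[i] for i in range(len(vals) - 1)]
    return total

# For a cubic the series terminates after four terms
f = lambda t: t**3 - 2*t + 1
print(newton_series(f, 0, 2.5, 5))  # 11.625
print(f(2.5))                       # 11.625
```

Note that the formula only consumed the values of f at the integers 0 through 4, yet it recovered f at the non-integer point 2.5.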

Convergence

As I mentioned on @AlgebraFact this morning,

It’s often useful to write a finite series as an infinite series with a finite number of non-zero coefficients.

This eliminates the need to explicitly track the number of terms, and may suggest what to do next.

Writing Newton’s formula as an infinite series keeps us from having to write down one version for linear interpolation, another version for quadratic interpolation, another for cubic interpolation, etc. (It’s a good exercise to write out these special cases when you’re new to the topic, but then remember the infinite series version going forward.)

As for suggesting what to do next, it’s natural to explore what happens if the infinite series really is infinite, i.e. if f is not a polynomial. Under what circumstances does the series converge? If it does converge to something, does it necessarily converge to f(x) at each x?

The example f(x) = sin(πx) shows that Newton’s theorem can’t always hold, because for this function, with a = 0, the series on the right-hand side of Newton’s theorem is identically zero: all the Δᵏ terms are zero. But Carlson’s theorem [1] essentially says that for an entire function that grows more slowly than sin(πx) along the imaginary axis, the series in Newton’s theorem converges to f.

Saying that a function is “entire” means that it is analytic in the entire complex plane. This means that Taylor’s series above has to converge everywhere in order for Newton’s series to converge [2].

Related posts

[1] Carlson with no e. Not to be confused with Carleson’s theorem on the convergence of Fourier series.

[2] Carlson’s original theorem requires f to be entire. Later refinements show that it’s sufficient for f to be analytic in the open right half plane and continuous on the closed right half plane.

Interpolation and the cotanc function

This weekend I wrote three posts related to interpolation:

The first post looks at reducing the size of mathematical tables by switching from linear to quadratic interpolation. The immediate application is obsolete, but the principles apply to contemporary problems.

The second post looks at alternatives to Lagrange interpolation that are much better suited to hand calculation. The final post is a tangent off the middle post.

Tau and sigma functions

In the process of writing the posts above, I looked at Chambers Six Figure Mathematical Tables from 1964. There I saw a couple curious functions I hadn’t run into before, functions the author called τ and σ.

τ(x) = x cot x
σ(x) = x csc x

So why introduce these two functions? The cotangent and cosecant functions have a singularity at 0, and so it’s difficult to tabulate and interpolate these functions. I touched on something similar in my recent post on interpolating the gamma function: because the function grows rapidly, linear interpolation gives bad results. Interpolating the log of the gamma function gives much better results.

Chambers tabulates τ(x) and σ(x) because these functions are easy to interpolate.

The cotanc function

I’ll refer to Chambers’ τ function as the cotanc function. This is a whimsical choice, not a name used anywhere else as far as I know. The reason for the name is as follows. The sinc function

sinc(x) = sin(x)/x

comes up frequently in signal processing, largely because it’s the Fourier transform of the indicator function of an interval. There are a few other functions that tack a c onto the end of a function to indicate it has been divided by x, such as the jinc function.

The function tan(x)/x is sometimes called the tanc function, though this name is far less common than sinc. The cotangent function is the reciprocal of the tangent function, so I’m calling the reciprocal of the tanc function the cotanc function. Maybe it should be called the cotank function just for fun. For category theorists this brings up images of a tank that fires backward.

Practicality of the cotanc function

As noted above, the cotangent function is ill-behaved near 0, but the cotanc function is very nicely behaved near 0. The cotanc function has singularities at non-zero multiples of π but multiplying by x removes the singularity at 0.

As noted here, interpolation error depends on the size of the derivatives of the function being interpolated. Since the cotanc function is flat and smooth, it has small derivatives and thus small interpolation error.
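A quick numerical illustration of the payoff (the function names are my own): linearly interpolating cot x directly between table entries near 0 gives garbage, while interpolating the cotanc function and then dividing by x recovers the answer to several digits:

```python
import math

def cot(x):
    return math.cos(x) / math.sin(x)

def cotanc(x):
    # x cot x, with the removable singularity at 0 filled in by its limit
    if abs(x) < 1e-8:
        return 1.0 - x * x / 3.0  # leading terms of the series
    return x * cot(x)

x0, x1, x = 0.01, 0.02, 0.015
exact = cot(x)

# linear interpolation of cot directly: badly wrong near the pole at 0
direct = (cot(x0) + cot(x1)) / 2

# interpolate the well-behaved cotanc function, then divide by x
via_tau = ((cotanc(x0) + cotanc(x1)) / 2) / x

print(abs(direct - exact))   # error around 8
print(abs(via_tau - exact))  # error under 0.001
```

The direct interpolation is off by about 8 out of a true value of roughly 67, while going through τ is accurate to better than three decimal places, with the same two table lookups.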

Binomial coefficients with non-integer arguments

When n and r are positive integers, with n ≥ r, there is an intuitive interpretation of the binomial coefficient C(n, r), namely the number of ways to select r things from a set of n things. For this reason C(n, r) is usually pronounced “n choose r.”

But what might something like C(4.3, 2) mean? The number of ways to choose two giraffes out of a set of 4.3 giraffes?! There is no combinatorial interpretation for binomial coefficients like these, though they regularly come up in applications.

It is possible to define binomial coefficients when n and r are real or even complex numbers. These more general binomial coefficients are in this liminal zone of topics that come up regularly, but not so regularly that they’re widely known. I wrote an article about this a decade ago, and I’ve had numerous occasions to link to it ever since.

The previous post implicitly includes an application of general binomial coefficients. The post alludes to coefficients that come up in Bessel’s interpolation formula but doesn’t explicitly say what they are. These coefficients Bₖ can be defined in terms of the Gaussian interpolation coefficients, which are in turn defined by binomial coefficients with non-integer arguments.

\begin{eqnarray*} G_{2n} &=& {p + n - 1 \choose 2n} \\ G_{2n+1} &=& {p + n \choose 2n + 1} \\ B_{2n} &=& \frac{1}{2}G_{2n} \\ B_{2n+1} &=& G_{2n+1} - \frac{1}{2} G_{2n} \end{eqnarray*}

Note that 0 < p < 1.

The coefficients in Everett’s interpolation formula can also be expressed simply in terms of the Gauss coefficients.

\begin{eqnarray*} E_{2n} &=& G_{2n} - G_{2n+1} \\ F_{2n} &=& G_{2n+1} \\ \end{eqnarray*}
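These definitions are straightforward to code up. The sketch below (function names are my own) implements C(x, k) for real x via the product formula and checks that the n = 0 Everett coefficients reduce to linear interpolation:

```python
from math import factorial

def binom(x, k):
    """Binomial coefficient C(x, k) for real x and non-negative integer k,
    via the product formula x(x-1)...(x-k+1)/k!."""
    result = 1.0
    for i in range(k):
        result *= x - i
    return result / factorial(k)

def gauss_coefficient(m, p):
    """Gaussian interpolation coefficient G_m for 0 < p < 1."""
    if m % 2 == 0:
        n = m // 2
        return binom(p + n - 1, m)   # G_{2n}
    n = (m - 1) // 2
    return binom(p + n, m)           # G_{2n+1}

p = 0.3
# the n = 0 terms of Everett's formula reduce to linear interpolation
E0 = gauss_coefficient(0, p) - gauss_coefficient(1, p)  # = 1 - p
F0 = gauss_coefficient(1, p)                            # = p
print(E0, F0)

print(binom(4.3, 2))  # 4.3 * 3.3 / 2 = 7.095
```

So "4.3 choose 2" is a perfectly concrete number, 7.095, even though it counts nothing.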

Bessel, Everett, and Lagrange interpolation

I never heard of Bessel or Everett interpolation until long after college. I saw Lagrange interpolation several times. Why Lagrange and not Bessel or Everett?

First of all, Bessel interpolation and Everett interpolation are not different kinds of interpolation; they are different algorithms for carrying out the same interpolation as Lagrange. There is a unique polynomial of degree at most n fitting a function at n + 1 points, and all three methods evaluate this same polynomial.

The advantages of Bessel’s approach or Everett’s approach to interpolation are practical, not theoretical. In particular, these algorithms are practical when interpolating functions from tables, by hand. This was a lost art by the time I went to college. Presumably some of the older faculty had spent hours computing with tables, but they never said anything about it.

Bessel interpolation and Everett interpolation are minor variations on the same theme. This post will describe both at such a high level that there’s no difference between them. This post is high-level because that’s exactly what seems to be missing in the literature. You can easily find older books that go into great detail, but I believe you’ll have a harder time finding a survey presentation like this.

Suppose you have a function f(x) that you want to evaluate at some value p. Without loss of generality we can assume the function has been shifted and scaled so that f is tabulated at integers and p lies between 0 and 1.

Everett’s formula (and Bessel’s) cleanly separates the parts of the interpolation formula that depend on f and the parts that depend on p.

The values of the function f enter through the differences of the values of f at consecutive integers, and differences of these differences, etc. These differences are easy to calculate by hand, and sometimes were provided by the table publisher.

The functions of p are independent of the function being interpolated, so these functions, the coefficients in Bessel’s formula and Everett’s formula, could be tabulated as well.

If the function differences are tabulated, and the functions that depend on p are tabulated, you could apply polynomial interpolation without ever having to explicitly evaluate a polynomial. You’d just need to evaluate a sort of dot product, a sum of numbers that depend on f multiplied by numbers that depend on p.

Another advantage of Bessel’s and Everett’s interpolation formulas is that they can be seen as a sequence of refinements. First you obtain the linear interpolation result, then refine it to get the quadratic interpolation result, then add terms to get the cubic interpolation result, etc.

This has several advantages. First, you have the satisfaction of seeing progress along the way; Lagrange interpolation may give you nothing useful until you’re done. Related to this is a kind of error checking: you have a sense of where the calculations are going, and intermediate results that violate your expectations are likely to be errors. Finally, you can carry out the calculations for the smallest terms with less precision. You can use fewer and fewer digits in your hand calculations as you compute successive refinements to your result.
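To make the dot-product structure concrete, here is a sketch of Everett’s formula truncated at second differences (the function name is mine). It combines four tabulated values using coefficients that depend only on p, and it is exact for cubic polynomials:

```python
def everett(f_m1, f0, f1, f2, p):
    """Everett interpolation at 0 < p < 1 from tabulated values
    f(-1), f(0), f(1), f(2), truncated at second central differences.
    Exact for polynomials up to degree 3."""
    # second central differences at 0 and at 1 (depend only on f)
    d2f0 = f1 - 2 * f0 + f_m1
    d2f1 = f2 - 2 * f1 + f0
    # coefficients (depend only on p): binomial coefficients
    # with non-integer upper argument
    G2 = p * (p - 1) / 2            # C(p, 2)
    G3 = (p + 1) * p * (p - 1) / 6  # C(p + 1, 3)
    E0, F0 = 1 - p, p
    E2, F2 = G2 - G3, G3
    # the interpolation is a dot product of differences and coefficients
    return E0 * f0 + F0 * f1 + E2 * d2f0 + F2 * d2f1

# exact for cubics: interpolate f(x) = x^3 - x at p = 0.3
f = lambda t: t**3 - t
print(everett(f(-1), f(0), f(1), f(2), 0.3))  # matches f(0.3) = -0.273
print(f(0.3))
```

Dropping the E₂ and F₂ terms leaves ordinary linear interpolation, which is the "sequence of refinements" structure described above.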

Related posts

Compression and interpolation

Data compression is everywhere. We’re unaware of it when it is done well. We only become aware of it when it is pushed too far, such as when a photo looks grainy or fuzzy because it was compressed too much.

The basic idea of data compression is to not transmit the raw data but to transmit some of the data along with instructions for how to approximately reconstruct the rest [1].

Fifty years ago scientists were concerned with a different application of compression: reducing the size of mathematical tables. Books of tabulated functions are obsolete now, but the principles used in producing these tables are still very much relevant. We use compression and interpolation far more often now, though it’s almost always invisibly executed by software.

Compressing tables

In this post I want to expand on comments about compression that Forman Acton made in his book Numerical Methods That Work.

Many persons are unaware of the considerable compression in a table that even the use of quadratic interpolation permits. A table of sin x covering the first quadrant, for example, requires 541 pages if it is to be linearly interpolable to eight decimal places. If quadratic interpolation is used, the same table takes only one page having entries at one-degree intervals with functions of the first and second differences being recorded together with the sine itself.

Acton goes on to mention the advantage of condensing shelf space by a factor of 500. We no longer care about saving shelf space, but we may care very much about saving memory in an embedded device.

Quadratic interpolation does allow more compression than linear interpolation, but not by a factor of 500. I admire Acton’s numerical methods book, but I’m afraid he got this one wrong.

Interpolation error bound

In order to test Acton’s claim we will need the following theorem on interpolation error [2].

Let f be a function such that f⁽ⁿ⁺¹⁾ is continuous on [a, b] and satisfies |f⁽ⁿ⁺¹⁾(x)| ≤ M. Let p be the polynomial of degree ≤ n that interpolates f at n + 1 equally spaced nodes in [a, b], including the end points. Then on [a, b],

|f(x) - p(x)| \leq \frac{1}{4(n+1)} M \left(\frac{b-a}{n}\right)^{n+1}

Quadratic interpolation error

Acton claims that quadratic interpolation at intervals of one degree is adequate to produce eight decimal places of accuracy. Quadratic interpolation means n = 2.

We have our function tabulated at evenly spaced points a distance h = π/180 radians apart. Quadratic interpolation requires function values at three points, so b − a = 2h = π/90. The third derivative of sine is negative cosine, so M = 1.

This gives an error bound of 4.43 × 10⁻⁷, so this would give slightly better than six decimal place accuracy, not eight.

Linear interpolation error

Suppose we wanted to create a table of sine values so that linear interpolation would give results accurate to eight decimal places.
In the interpolation error formula we have M = 1 as before, and now n = 1. We would need to tabulate sine at enough points that the spacing h = b − a is small enough that the error is less than 5 × 10⁻⁹. It follows that h = 0.0002 radians. Covering a range of π/2 radians in increments of 0.0002 radians would require 7854 function values. Acton implicitly assumes 90 values to a page, so this would take about 87 pages.

Abramowitz and Stegun devotes 32 pages to tabulating sine and cosine at increments of 0.001 radian. This does not always guarantee eight decimal place accuracy using linear interpolation, but it does guarantee at least seven places (more on that here), which is better than a table at one degree increments would deliver using quadratic interpolation. So it would have been more accurate for Acton to say quadratic interpolation reduces the number of pages by a factor of 30 rather than 500.

Cubic interpolation error

If we have a table of sine values at one degree increments, how much accuracy could we get using cubic interpolation? In that case we’d apply the interpolation error theorem with n = 3 and b − a = 3(π/180) = π/60. Then the error bound is 5.8 × 10⁻⁹. This would usually give you eight decimal place accuracy, so perhaps Acton carried out the calculation for cubic interpolation rather than quadratic interpolation.
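All three calculations above come from the same one-line bound with node spacing (b − a)/n = h, so they’re easy to reproduce. A sketch with M = 1:

```python
import math

def error_bound(n, h, M=1.0):
    """Interpolation error bound for degree-n interpolation at n + 1
    equally spaced nodes a distance h apart: M h^(n+1) / (4(n+1))."""
    return M * h ** (n + 1) / (4 * (n + 1))

h = math.pi / 180  # one-degree spacing in radians

print(error_bound(2, h))       # quadratic: about 4.43e-7, six places
print(error_bound(3, h))       # cubic: about 5.8e-9, eight places
print(error_bound(1, 0.0002))  # linear at h = 0.0002: 5e-9
```

The quadratic bound misses eight-place accuracy by more than an order of magnitude, while the cubic bound just clears it, which supports the guess that Acton’s factor of 500 came from the cubic calculation.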

Related posts

[1] This is what’s known as lossy compression; some information is lost in the compression process. Lossless compression also replaces the original data with a description that can be used to reproduce the data, but in this case the reconstruction process is perfect.

[2] Ward Cheney and David Kincaid. Numerical Methods and Computation. Third edition.