Conceptual vs Numerical

One of the things that makes numerical computation interesting is that it often reverses the usual conceptual order of things, using advanced math to compute things that are introduced earlier.

Here’s an example I recently ran across [1]. The hyperbolic functions are defined in terms of the exponential function:

But it may be more efficient to compute the exponential function in terms of hyperbolic functions. Specifically,

Why would you want to compute the simple thing on the left in terms of the more complicated thing on the right? Because hyperbolic sine is an odd function. This means that half its power series coefficients are zero: an odd function only has odd powers in its power series.

Suppose you need to compute the exponential function in an environment where you only have a limited number of registers to store constants. You can get more accuracy out of the same number of registers by using them to store coefficients in the power series for hyperbolic sine than for exp.

If we store n coefficients for sinh, we can include powers of x up to 2n – 1. And since the coefficient of x2n is zero, the error in our Taylor approximation is O(x2n+1). See this post for more on this trick.

If we stored n coefficients for exp, we could include powers of x up to n-1, and our error would be O(xn).

To make things concrete, suppose n = 3 and x = 0.01. The error in the exp function would be on the order of 10-6, but the error in the sinh function would be on the order of 10-14. That is, we could almost compute exp to single precision and sinh to almost double precision.

(I’m glossing over a detail here. Would you really need to store, for example, the 1 coefficient in front of x in either series? For simplicity in the argument above I’ve implicitly said yes. Whether you’d actually need to store it depends on the details of your implementation.)

The error estimate above uses big-oh arguments. Let’s do an actual calculation to see what we get.

    from math import exp, sinh, sqrt

def exp_taylor(x):
return 1 + x + 0.5*x**2

def sinh_taylor(x):
return x + x**3/6 + x**5/120

def exp_via_sinh(x):
s = sinh_taylor(x)
return s + sqrt(s*s + 1)

def print_error(approx, exact):
print(f"Computed: {approx}")
print(f"Exact:    {exact}")
print(f"Error:    {approx - exact}")

x = 0.01
approx1 = exp_taylor(x)
approx2 = exp_via_sinh(x)
exact   = exp(x)

print("Computing exp directly:\n")
print_error(approx1, exact)

print()

print("Computing exp via sinh:\n")
print_error(approx2, exact)


This produces

    Computing exp directly:

Computed: 1.0100500000000001
Exact:    1.010050167084168
Error:    -1.6708416783473012e-07

Computing exp via sinh:

Computed: 1.0100501670841682
Exact:    1.010050167084168
Error:    2.220446049250313e-16


Our errors are roughly what we expected from our big-oh arguments.

More numerical analysis posts

[1] I saw this identity in Matters Computational: Ideas, Algorithms, Source Code by Jörg Arndt.

Banned math book

Courant & Hilbert is a classic applied math textbook, still in print nearly a century after the first edition came out. The actual title of the book is Methods of Mathematical Physics, but everyone calls it Courant & Hilbert after the authors, Richard Courant and David Hilbert. I was surprised to find out recently that this was once a banned book. How could there be anything controversial about a math book? It doesn’t get into any controversial applications of math; it applies math to physics problems, but doesn’t apply the physics to anything in particular.

The book was first published in Germany in 1924 under the title Methoden der mathematischen Physik. Courant says in the preface

… I had been forced to leave Germany and was fortunate and grateful to be given the opportunities open in the United States. During the Second World War the German book became unavailable and was even suppressed by the National Socialist rulers of Germany. The survival of the book was secured when the United States Government seized the copyright and licensed a reprint issued by Interscience Publishers.

Courant’s language is remarkably restrained under the circumstances.

I wondered why the book was banned. Was Courant Jewish? I’d never considered this before, because I couldn’t care less about the ethnicity of authors. Jew or Greek, bond or free, male or female, I just care about their content. The Nazis, however, did care. According to his Wikipedia biography, Courant fled Germany not because of his Jewish ancestry but because of his affiliation with the wrong political party.

***

I never had Courant & Hilbert as a textbook, but I was familiar with it as a student. I vaguely remember that the library copy was in high demand and that I considered buying a copy, though it was too expensive for my means at the time. I recently bought a copy now that the book is cheaper and my means have improved.

I covered most of the material in Courant & Hilbert in graduate school, albeit in a more abstract form. As I mentioned the other day, my education was somewhat top-down; I learned about things first in an abstract setting and got down to particulars later, moving from soft analysis to hard analysis.

One quick anecdote along these lines. I read somewhere that David Hilbert was at a conference where someone referred to a Hilbert space and he asked the speaker what such a thing was. Hilbert’s work had motivated the definition of a Hilbert space, but Mr. Hilbert thought in more concrete terms.

Sinc approximation

If a function is smooth and has thin tails, it can be well approximated by sinc functions. These approximations are frequently used in applications, such as signal processing and numerical integration. This post will illustrate sinc approximation with the function exp(-x²).

The sinc approximation for a function f(x) is given by

where sinc(x) = sin(πx)/πx.

Do you get more accuracy from sampling more densely or by sampling over a wider range? You need to do both. As the number of sample points n increases, you want h to decrease something like 1/√n and the range to increase something like √n.

According to [1], the best trade-off between smaller h and larger n depends on the norm you use to measure approximation error. If you’re measuring error with the Lebesgue p-norm [2], choose h to be

where q is the conjugate exponent to p, i.e.

When p = 2, q is also 2. For the sup norm, p = ∞ and q = 1.

So the range between the smallest and largest sample will be

Here are a few plots to show how quickly the sinc approximations converge for exp(-x²). We’ll look at n = 1 (i.e. three sample points) , n = 2 (five sample points) and n = 4 (nine sample points). For n > 1, the sinc approximations are so good that the plots are hard to distinguish. So we’ll show the plot of exp(-x²) and its approximation for n = 1 then show the error curves.

And now the error plots.

Note that the vertical scales are different in each subplot. The error for n = 3 is two orders of magnitude smaller than the error for n = 1.

Related posts

[1] Masaaki Sugihara. Near Optimality of the Sinc Approximation. Mathematics of Computation, Vol. 72, No. 242 (Apr., 2003), pp. 767-786.

[2] The reference above doesn’t use the p-norms on the real line but on a strip in the complex plane containing the real line, and with a weight function that penalizes errors exponentially in the tails.

Manipulating sums

This post is a list of five spinoffs of my previous post. Except for the last point it doesn’t build on the previous post per se, but I’ll use a sum from that post to illustrate five things:

2. Simplifying sums by generalizing binomial coefficients
3. A bit of notation from Iverson
4. Changing variables in a sum
5. Chebyshev polynomials come up again.

Let’s get started. The equation I want to focus on is the following.

Putting things under a sum in LaTeX

I said in the previous post that you could equate the real parts of the left and right side to show that cos nθ can be written as sums and products of cos θ and sin θ. To write this in LaTeX we’d say

The command that makes it possible to put two lines of stuff under the summation is \substack. Here’s the LaTeX code that produce the summation sign and the text below it.

    \sum_{\substack{0 \leq j \leq n \\ j \text{ even}}}

Binomial coefficients

We can simplify the sum by removing the limits on j and implicitly letting j run over all integers:

This is because the binomial coefficient term is zero when j > n or j < 0. (See this post for an explanation of more general binomial coefficients.)

Indicator functions

It’s often helpful to turn a sum over a complicated region into a more complicated sum over a simpler region. That is, it’s often easier to deal with complications in the summand than in the domain of summation.

In our case, instead of summing over even integers, we could sum over all integers, if we multiply the summand by a function that is 0 on odd numbers and 1 on even numbers. That is, we multiply by the indicator function of the even integers. The indicator function of a set is 1 on that set and 0 everywhere else.

Kenneth Iverson’s notation uses a Boolean expression in brackets to indicate the function that is 1 if the condition is true and 0 otherwise. So [j even] means the function that is 1 when j is even and 0 when j is odd. So we could write our sum as follows.

Change of variables

We could get rid of the requirement that j is even by replacing j with 2k for a new variable k. Now our sum is

Notice a couple things. For one thing we were table to write (-1)k rather than i2k.

More importantly, the change of variables was trivial because the sum runs over all integers. If we had explicit limits on j, we would have to change them to different explicit limits on k.

Changing limits of summation is error-prone. This happens a lot, for example, when computing power series solutions for differential equations, and there are mnemonics for reducing errors such as “limits and exponents move in opposite directions.” These complications go away when you sum over all integers.

Chebyshev strikes again

GlennF left a comment on the previous post to the effect that the sum we’ve been talking about reduces to a Chebyshev polynomial.

Since the powers of sin θ are all even, we can replace sin²θ with 1 – cos²θ and get the following.

Now the left side is a polynomial in cos θ, call it P(cos θ). Then P = Tn, the nth Chebyshev polynomial because as explained here, one way to define the Chebyshev polynomials is by

If you don’t like that definition, you could use another definition and the equation becomes a theorem.

Analogy between Fibonacci and Chebyshev

Quick observation: I recently noticed that Chebyshev polynomials and Fibonacci numbers have analogous formulas.

The nth Chebyshev polynomial satisfies

for |x| ≥ 1, and the nth Fibonacci number is given by

There’s probably a way to explain the similarity in terms of the recurrence relations that both sequences satisfy.

When is round-trip floating point radix conversion exact?

Suppose you store a floating point number in memory, print it out in human-readable base 10, and read it back in. When can the original number be recovered exactly?

D. W. Matula answered this question more generally in 1968 [1].

Suppose we start with base β with p places of precision and convert to base γ with q places of precision, rounding to nearest, then convert back to the original base β. Matula’s theorem says that if there are no positive integers i and j such that

βi = γj

then a necessary and sufficient condition for the round-trip to be exact (assuming no overflow or underflow) is that

γq-1 > βp.

In the case of floating point numbers (type double in C) we have β = 2 and p = 53. (See Anatomy of a floating point number.) We’re printing to base γ = 10. No positive power of 10 is also a power of 2, so Matula’s condition on the two bases holds.

If we print out q = 17 decimal places, then

1016 > 253

and so round-trip conversion will be exact if both conversions round to nearest. If q is any smaller, some round-trip conversions will not be exact.

You can also verify that for a single precision floating point number (p = 24 bits precision) you need q = 9 decimal digits, and for a quad precision number (p = 113 bits precision) you need q = 36 decimal digits [2].

Looking back at Matula’s theorem, clearly we need

γq ≥ βp.

Why? Because the right side is the number of base β fractions and the left side is the number of base γ fractions. You can’t have a one-to-one map from a larger space into a smaller space. So the inequality above is necessary, but not sufficient. However, it’s almost sufficient. We just need one more base γ figure, i.e. we Matula tells us

γq-1 > βp

is sufficient. In terms of base 2 and base 10, we need at least 16 decimals to represent 53 bits. The surprising thing is that one more decimal is enough to guarantee that round-trip conversions are exact. It’s not obvious a priori that any finite number of extra decimals is always enough, but in fact just one more is enough; there’s no “table maker’s dilemma” here.

Here’s an example to show the extra decimal is necessary. Suppose p = 5. There are more 2-digit numbers than 5-bit numbers, but if we only use two digits then round-trip radix conversion will not always be exact. For example, the number 17/16 written in binary is 1.0001two, and has five significant bits. The decimal equivalent is 1.0625ten, which rounded to two significant digits is 1.1ten. But the nearest binary number to 1.1ten with 5 significant bits is 1.0010two = 1.125ten. In short, rounding to nearest gives

1.0001two -> 1.1ten -> 1.0010two

and so we don’t end up back where we started.

More floating point posts

[1] D. W. Matula. In-and-out conversions. Communications of the ACM, 11(1):47–50. January 1968. Cited in Handbook of Floating-point Arithmetic by Jean-Mihel Muller et al.

[2] The number of bits allocated for the fractional part of a floating point number is 1 less than the precision: the leading figure is always 1, so IEEE formats save one bit by not storing the leading bit, leaving it implicit. So, for example, a C double has 53 bits precision, but 52 bits of the 64 bits in a double are allocated to storing the fraction.

Chebyshev approximation

In the previous post I mentioned that Remez algorithm computes the best polynomial approximation to a given function f as measured by the maximum norm. That is, for a given n, it finds the polynomial p of order n that minimizes the absolute error

|| f – p ||.

The Mathematica function MiniMaxApproximation minimizes the relative error by minimizing

|| (fp) / f ||.

As was pointed out in the comments to the previous post, Chebyshev approximation produces a nearly optimal approximation, coming close to minimizing the absolute error. The Chebyshev approximation can be computed more easily and the results are easier to understand.

To form a Chebyshev approximation, we expand a function in a series of Chebyshev polynomials, analogous to expanding a function in a Fourier series, and keep the first few terms. Like sines and cosines, Chebyshev polynomials are orthogonal functions, and so Chebyshev series are analogous to Fourier series. (If you find it puzzling to hear of functions being orthogonal to each other, see this post.)

Here is Mathematica code to find and plot the Chebyshev approximation to ex over [-1, 1]. First, here are the coefficients.

    weight[x_] := 2/(Pi Sqrt[1 - x^2])
c = Table[
Integrate[ Exp[x] ChebyshevT[n, x] weight[x], {x, -1, 1}],
{n, 0, 5}]


The coefficients turn out to be exactly expressible in terms of Bessel functions, but typically you’d need to do a numerical integration with NIntegrate.

Now we use the Chebyshev coefficients to evaluate the Chebyshev approximation.

    p[x_] := Evaluate[c . Table[ChebyshevT[n - 1, x], {n, Length[c]}]]
- First[c]/2


You could see the approximating polynomial with

    Simplify[N[p[x]]]

which displays

    1.00004 + 1.00002 x + 0.499197 x^2 + 0.166489 x^3 + 0.0437939 x^4 +
0.00868682 x^5

The code

    Plot[Exp[x] - p[x], {x, -1, 1}]

shows the error in approximating the exponential function with the polynomial above.

Note that the plot has nearly equal ripples; the optimal approximation would have exactly equal ripples. The Chebyshev approximation is not optimal, but it is close. The absolute error is smaller than that of MiniMaxApproximation by a factor of about e.

There are bounds on how different the error can be between the best polynomial approximation and the Chebyshev series approximation. For polynomials of degree n, the Chebyshev error is never more than

4 + 4 log(n + 1)/π

times the Chebyshev series approximation error. See Theorem 16.1 in Approximation Theory and Approximation Practice by Lloyd N. Trefethen.

Generalization of power polynomials

A while back I wrote about the Mittag-Leffler function which is a sort of generalization of the exponential function. There are also Mittag-Leffler polynomials that are a sort of generalization of the powers of x; more on that shortly.

Recursive definition

The Mittag-Leffler polynomials can be defined recursively by M0(x) = 1
and

for n > 0.

Contrast with orthogonal polynomials

This is an unusual recurrence if you’re used to orthogonal polynomials, which come up more often in application. For example, Chebyshev polynomials satisfy

and Hermite polynomials satisfy

as I used as an example here.

All orthogonal polynomials satisfy a two-term recurrence like this where the value of each polynomial can be found from the value of the previous two polynomials.

Notice that with orthogonal polynomial recurrences the argument x doesn’t change, but the degrees of polynomials do. But with Mittag-Leffler polynomials the opposite is true: there’s only one polynomial on the right side, evaluated at three different points: x+1, x, and x-1.

Generalized binomial theorem

Here’s the sense in which the Mittag-Leffler polynomials generalize the power function. If we let pn(x) = xn be the power function, then the binomial theorem says

Something like the binomial theorem holds if we replace pn with Mn:

Exponential Christmas wreath

Today’s exponential sum looks sorta like a Christmas wreath with candles.

Let me back up and say what “today’s exponential sum” means. I have a page that plots a graph each day, where the function being graphed takes parameters from the date. It plots the partial sums of

where m is the month, d is the day, and y is the last two digits of the year.

Sometimes by coincidence the plots happen to look like something associated with the date, such as wreaths around Christmas. The plot for Christmas Eve this year looks like a branch of holly, and there’s a nice wreath on New Year’s Eve.

As I blogged about before, the images on Easter sometimes look like crosses. The image for Easter 2018 looked like an Iron Cross, and the image for Easter 2019 was a much more ornate cross. But the image for Easter 2020 looks nothing like a cross.

Since you can describe a point in the plane with two numbers, why would you choose to use three numbers? Why would you ever want to use a coordinate system with more coordinates than necessary?

Barycentric coordinates

One way to indicate the location of a point inside a triangle is to give the distance to each of the vertices. These three distances are called barycentric coordinates. Why would you use three numbers when two would do?

Barycentric coordinates make some things much simpler. For example, the coordinates of the three vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1) for any triangle. The points inside are written as convex combinations of the vertices. The coordinates of the center of mass, the barycenter, are (1/3, 1/3, 1/3). The vertices are treated symmetrically, even if the triangle is far from symmetric.

Barycentric coordinates are useful in applications, such as computer graphics and finite element analysis, because they are relative coordinates. When a triangle moves or is rescaled, you only need to keep track of where the vertices went; the coordinates of the points inside relative to the vertices haven’t changed.

This can be generalized to more dimensions. For example, you could describe a point in a tetrahedron with four coordinates, more in higher dimensions you could describe a point in an n-simplex by the convex combination coefficients of the n vertices.

Barycentric coordinates are related to Dirichlet probability distributions. When you have n probabilities that sum to 1, you’ve got n-1 degrees of freedom. But it often simplifies things to work with n variables. As with the discussion of triangles above, the extra variable makes expressions more symmetric.

Quaternions and rotations

A point in three dimensional space can be described with three numbers, but it’s often useful to think of the usual three coordinates as the vector part of a quaternion, a set of four numbers.

Suppose you have a point

a = (x, y, z)

and you want to rotate it by an angle θ around an axis given by a unit vector

b = (u, v, w).

You can compute the rotation by associating the point a with the quaternion

p = (0, x, y, z)

and the axis b with the quaternion

q = (cos(θ/2), sin(θ/2) u, sin(θ/2) v, sin(θ/2) w)

The image of a is then given by the quaternion

q p q-1.

This quaternion will have zero real part, and so the Euclidean coordinates are given by the vector part, the last three components.

Of course the product above is a quaternion product, which is not commutative. That’s why the q and the q-1 don’t cancel out.

Using quaternions for rotations has several advantages over using rotation matrices. First, the quaternion representation is more compact, describing a rotation with four real numbers rather than nine. Second, the quaternion calculation can be better behaved numerically. But most importantly, the quaternion approach avoids gimbal lock, a sort of singularity in representing rotations.

Projective planes

In applications of algebra, such as elliptic curve cryptography, you often need to add “points at infinity” to make things work out. To formalize this, you add an extra coordinate. So while an elliptic curve is usually presented as an equation such as

y² = x³ + ax + b,

it’s more formally an equation in three variables

y²z = x³ + axz² + bz³.

Points in the projective plane have coordinates (x, y, z) where points are considered equivalent if they differ by a non-zero multiple, i.e. (x, y, z) is considered the same point as (λx, λy, λz) for any non-zero λ.

You can often ignore the z, choosing λ so that the z coordinate is 1. But when you need to work with the point at infinity in a uniform way, you bring out the full coordinates. Now the “point at infinity” is not some mysterious entity, but simply the point (0, 1, 0).

Common themes

Projective coordinates, like barycentric coordinates, introduce symmetry. With the addition of an extra coordinate, the three coordinates all behave similarly, with no reason to distinguish any coordinate as special. And as with quanternion rotations, projective coordinates make singularities go away, which is consequence of symmetry.