# Manipulating sums

This post is a list of five spinoffs of my previous post. Except for the last point it doesn’t build on the previous post per se, but I’ll use a sum from that post to illustrate five things:

1. Putting multiple things under a summation sign in LaTeX
2. Simplifying sums by generalizing binomial coefficients
3. A bit of notation from Iverson
4. Changing variables in a sum
5. Chebyshev polynomials come up again.

Let’s get started. The equation I want to focus on is the following.

## Putting things under a sum in LaTeX

I said in the previous post that you could equate the real parts of the left and right side to show that cos nθ can be written as sums and products of cos θ and sin θ. To write this in LaTeX we’d say

The command that makes it possible to put two lines of stuff under the summation is \substack. Here’s the LaTeX code that produce the summation sign and the text below it.

    \sum_{\substack{0 \leq j \leq n \\ j \text{ even}}}

## Binomial coefficients

We can simply the sum by removing the limits on j and implicitly letting j run over all integers:

This is because the binomial coefficient term is zero when j > n or j < 0. (See this post for an explanation of more general binomial coefficients.)

## Indicator functions

It’s often helpful to turn a sum over a complicated region into a more complicated sum over a simpler region. That is, it’s often easier to deal with complications in the summand than in the domain of summation.

In our case, instead of summing over even integers, we could sum over all integers, if we multiply the summand by a function that is 0 on odd numbers and 1 on even numbers. That is, we multiply by the indicator function of the even integers. The indicator function of a set is 1 on that set and 0 everywhere else.

Kenneth Iverson’s notation uses a Boolean expression in brackets to indicate the function that is 1 if the condition is true and 0 otherwise. So [j even] means the function that is 1 when j is even and 0 when j is odd. So we could write our sum as follows.

## Change of variables

We could get rid of the requirement that j is even by replacing j with 2k for a new variable k. Now our sum is

Notice a couple things. For one thing we were table to write (-1)k rather than i2k.

More importantly, the change of variables was trivial because the sum runs over all integers. If we had explicit limits on j, we would have to change them to different explicit limits on k.

Changing limits of summation is error-prone. This happens a lot, for example, when computing power series solutions for differential equations, and there are mnemonics for reducing errors such as “limits and exponents move in opposite directions.” These complications go away when you sum over all integers.

## Chebyshev strikes again

GlennF left a comment on the previous post to the effect that the sum we’ve been talking about reduces to a Chebyshev polynomial.

Since the powers of sin θ are all even, we can replace sin²θ with 1 – cos²θ and get the following.

Now the left side is a polynomial in cos θ, call it P(cos θ). Then P = Tn, the nth Chebyshev polynomial because as explained here, one way to define the Chebyshev polynomials is by

If you don’t like that definition, you could use another definition and the equation becomes a theorem.

# Analogy between Fibonacci and Chebyshev

Quick observation: I recently noticed that Chebyshev polynomials and Fibonacci numbers have analogous formulas.

The nth Chebyshev polynomial satisfies

for |x| ≥ 1, and the nth Fibonacci number is given by

There’s probably a way to explain the similarity in terms of the recurrence relations that both sequences satisfy.

# When is round-trip floating point radix conversion exact?

Suppose you store a floating point number in memory, print it out in human-readable base 10, and read it back in. When can the original number be recovered exactly?

D. W. Matula answered this question more generally in 1968 [1].

Suppose we start with base β with p places of precision and convert to base γ with q places of precision, rounding to nearest, then convert back to the original base β. Matula’s theorem says that if there are no positive integers i and j such that

βi = γj

then a necessary and sufficient condition for the round-trip to be exact (assuming no overflow or underflow) is that

γq-1 > βp.

In the case of floating point numbers (type double in C) we have β = 2 and p = 53. (See Anatomy of a floating point number.) We’re printing to base γ = 10. No positive power of 10 is also a power of 2, so Matula’s condition on the two bases holds.

If we print out q = 17 decimal places, then

1016 > 253

and so round-trip conversion will be exact if both conversions round to nearest. If q is any smaller, some round-trip conversions will not be exact.

You can also verify that for a single precision floating point number (p = 24 bits precision) you need q = 9 decimal digits, and for a quad precision number (p = 113 bits precision) you need q = 36 decimal digits [2].

Looking back at Matula’s theorem, clearly we need

γq ≥ βp.

Why? Because the right side is the number of base β fractions and the left side is the number of base γ fractions. You can’t have a one-to-one map from a larger space into a smaller space. So the inequality above is necessary, but not sufficient. However, it’s almost sufficient. We just need one more base γ figure, i.e. we Matula tells us

γq-1 > βp

is sufficient. In terms of base 2 and base 10, we need at least 16 decimals to represent 53 bits. The surprising thing is that one more decimal is enough to guarantee that round-trip conversions are exact. It’s not obvious a priori that any finite number of extra decimals is always enough, but in fact just one more is enough; there’s no “table maker’s dilemma” here.

Here’s an example to show the extra decimal is necessary. Suppose p = 5. There are more 2-digit numbers than 5-bit numbers, but if we only use two digits then round-trip radix conversion will not always be exact. For example, the number 17/16 written in binary is 1.0001two, and has five significant bits. The decimal equivalent is 1.0625ten, which rounded to two significant digits is 1.1ten. But the nearest binary number to 1.1ten with 5 significant bits is 1.0010two = 1.125ten. In short, rounding to nearest gives

1.0001two -> 1.1ten -> 1.0010two

and so we don’t end up back where we started.

## More floating point posts

[1] D. W. Matula. In-and-out conversions. Communications of the ACM, 11(1):47–50. January 1968. Cited in Handbook of Floating-point Arithmetic by Jean-Mihel Muller et al.

[2] The number of bits allocated for the fractional part of a floating point number is 1 less than the precision: the leading figure is always 1, so IEEE formats save one bit by not storing the leading bit, leaving it implicit. So, for example, a C double has 53 bits precision, but 52 bits of the 64 bits in a double are are allocated to storing the fraction.

# Chebyshev approximation

In the previous post I mentioned that Remez algorithm computes the best polynomial approximation to a given function f as measured by the maximum norm. That is, for a given n, it finds the polynomial p of order n that minimizes the absolute error

|| f – p ||.

The Mathematica function MiniMaxApproximation minimizes the relative error by minimizing

|| (fp) / f ||.

As was pointed out in the comments to the previous post, Chebyshev approximation produces a nearly optimal approximation, coming close to minimizing the absolute error. The Chebyshev approximation can be computed more easily and the results are easier to understand.

To form a Chebyshev approximation, we expand a function in a series of Chebyshev polynomials, analogous to expanding a function in a Fourier series, and keep the first few terms. Like sines and cosines, Chebyshev polynomials are orthogonal functions, and so Chebyshev series are analogous to Fourier series. (If you find it puzzling to hear of functions being orthogonal to each other, see this post.)

Here is Mathematica code to find and plot the Chebyshev approximation to ex over [-1, 1]. First, here are the coefficients.

    weight[x_] := 2/(Pi Sqrt[1 - x^2])
c = Table[
Integrate[ Exp[x] ChebyshevT[n, x] weight[x], {x, -1, 1}],
{n, 0, 5}]


The coefficients turn out to be exactly expressible in terms of Bessel functions, but typically you’d need to do a numerical integration with NIntegrate.

Now we use the Chebyshev coefficients to evaluate the Chebyshev approximation.

    p[x_] := Evaluate[c . Table[ChebyshevT[n - 1, x], {n, Length[c]}]]
- First[c]/2


You could see the approximating polynomial with

    Simplify[N[p[x]]]

which displays

    1.00004 + 1.00002 x + 0.499197 x^2 + 0.166489 x^3 + 0.0437939 x^4 +
0.00868682 x^5

The code

    Plot[Exp[x] - p[x], {x, -1, 1}]

shows the error in approximating the exponential function with the polynomial above.

Note that the plot has nearly equal ripples; the optimal approximation would have exactly equal ripples. The Chebyshev approximation is not optimal, but it is close. The absolute error is smaller than that of MiniMaxApproximation by a factor of about e.

There are bounds on how different the error can be between the best polynomial approximation and the Chebyshev series approximation. For polynomials of degree n, the Chebyshev error is never more than

4 + 4 log(n + 1)/π

times the Chebyshev series approximation error. See Theorem 16.1 in Approximation Theory and Approximation Practice by Lloyd N. Trefethen.

# Generalization of power polynomials

A while back I wrote about the Mittag-Leffler function which is a sort of generalization of the exponential function. There are also Mittag-Leffler polynomials that are a sort of generalization of the powers of x; more on that shortly.

## Recursive definition

The Mittag-Leffler polynomials can be defined recursively by M0(x) = 1
and

for n > 0.

## Contrast with orthogonal polynomials

This is an unusual recurrence if you’re used to orthogonal polynomials, which come up more often in application. For example, Chebyshev polynomials satisfy

and Hermite polynomials satisfy

as I used as an example here.

All orthogonal polynomials satisfy a two-term recurrence like this where the value of each polynomial can be found from the value of the previous two polynomials.

Notice that with orthogonal polynomial recurrences the argument x doesn’t change, but the degrees of polynomials do. But with Mittag-Leffler polynomials the opposite is true: there’s only one polynomial on the right side, evaluated at three different points: x+1, x, and x-1.

## Generalized binomial theorem

Here’s the sense in which the Mittag-Leffler polynomials generalize the power function. If we let pn(x) = xn be the power function, then the binomial theorem says

Something like the binomial theorem holds if we replace pn with Mn:

# Exponential Christmas wreath

Today’s exponential sum looks sorta like a Christmas wreath with candles.

Let me back up and say what “today’s exponential sum” means. I have a page that plots a graph each day, where the function being graphed takes parameters from the date. It plots the partial sums of

where m is the month, d is the day, and y is the last two digits of the year.

Sometimes by coincidence the plots happen to look like something associated with the date, such as wreaths around Christmas. The plot for Christmas Eve this year looks like a branch of holly, and there’s a nice wreath on New Year’s Eve.

As I blogged about before, the images on Easter sometimes look like crosses. The image for Easter 2018 looked like an Iron Cross, and the image for Easter 2019 was a much more ornate cross. But the image for Easter 2020 looks nothing like a cross.

# Advantages of redundant coordinates

Since you can describe a point in the plane with two numbers, why would you choose to use three numbers? Why would you ever want to use a coordinate system with more coordinates than necessary?

## Barycentric coordinates

One way to indicate the location of a point inside a triangle is to give the distance to each of the vertices. These three distances are called barycentric coordinates. Why would you use three numbers when two would do?

Barycentric coordinates make some things much simpler. For example, the coordinates of the three vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1) for any triangle. The points inside are written as convex combinations of the vertices. The coordinates of the center of mass, the barycenter, are (1/3, 1/3, 1/3). The vertices are treated symmetrically, even if the triangle is far from symmetric.

Barycentric coordinates are useful in applications, such as computer graphics and finite element analysis, because they are relative coordinates. When a triangle moves or is rescaled, you only need to keep track of where the vertices went; the coordinates of the points inside relative to the vertices haven’t changed.

This can be generalized to more dimensions. For example, you could describe a point in a tetrahedron with four coordinates, more in higher dimensions you could describe a point in an n-simplex by the convex combination coefficients of the n vertices.

Barycentric coordinates are related to Dirichlet probability distributions. When you have n probabilities that sum to 1, you’ve got n-1 degrees of freedom. But it often simplifies things to work with n variables. As with the discussion of triangles above, the extra variable makes expressions more symmetric.

## Quaternions and rotations

A point in three dimensional space can be described with three numbers, but it’s often useful to think of the usual three coordinates as the vector part of a quaternion, a set of four numbers.

Suppose you have a point

a = (x, y, z)

and you want to rotate it by an angle θ around an axis given by a unit vector

b = (u, v, w).

You can compute the rotation by associating the point a with the quaternion

p = (0, x, y, z)

and the axis b with the quaternion

q = (cos(θ/2), sin(θ/2) u, sin(θ/2) v, sin(θ/2) w)

The image of a is then given by the quaternion

q p q-1.

This quaternion will have zero real part, and so the Euclidean coordinates are given by the vector part, the last three components.

Of course the product above is a quaternion product, which is not commutative. That’s why the q and the q-1 don’t cancel out.

Using quaternions for rotations has several advantages over using rotation matrices. First, the quaternion representation is more compact, describing a rotation with four real numbers rather than nine. Second, the quaternion calculation can be better behaved numerically. But most importantly, the quaternion approach avoids gimbal lock, a sort of singularity in representing rotations.

## Projective planes

In applications of algebra, such as elliptic curve cryptography, you often need to add “points at infinity” to make things work out. To formalize this, you add an extra coordinate. So while an elliptic curve is usually presented as an equation such as

y² = x³ + ax + b,

it’s more formally an equation in three variables

y²z = x³ + axz² + bz³.

Points in the projective plane have coordinates (x, y, z) where points are considered equivalent if they differ by a non-zero multiple, i.e. (x, y, z) is considered the same point as (λx, λy, λz) for any non-zero λ.

You can often ignore the z, choosing λ so that the z coordinate is 1. But when you need to work with the point at infinity in a uniform way, you bring out the full coordinates. Now the “point at infinity” is not some mysterious entity, but simply the point (0, 1, 0).

## Common themes

Projective coordinates, like barycentric coordinates, introduce symmetry. With the addition of an extra coordinate, the three coordinates all behave similarly, with no reason to distinguish any coordinate as special. And as with quanternion rotations, projective coordinates make singularities go away, which is consequence of symmetry.

# Illustrating Cayley-Hamilton with Python

If you take a square matrix M, subtract x from the elements on the diagonal, and take the determinant, you get a polynomial in x called the characteristic polynomial of M. For example, let

Then

The characteristic equation is the equation that sets the characteristic polynomial to zero. The roots of this polynomial are eigenvalues of the matrix.

The Cayley-Hamilton theorem says that if you take the original matrix and stick it into the polynomial, you’ll get the zero matrix.

In brief, a matrix satisfies its own characteristic equation. Note that for this to hold we interpret constants, like 12 and 0, as corresponding multiples of the identity matrix.

You could verify the Cayley-Hamilton theorem in Python using scipy.linalg.funm to compute a polynomial function of a matrix.

>>> from scipy import array
>>> from scipy.linalg import funm
>>> m = array([[5, -2], [1, 2]])
>>> funm(m, lambda x: x**2 - 7*x + 12)


This returns a zero matrix.

I imagine funm is factoring M into something like PDP-1 where D is a diagonal matrix. Then

f(M) = P f(D) P-1.

This is because f can be applied to a diagonal matrix by simply applying f to each diagonal entry independently. You could use this to prove the Cayley-Hamilton theorem for diagonalizable matrices.

# Fractal via bit twiddling

The Sierpinski triangle has come up several times on this blog. Here’s yet another way to produce a Sierpinski triangle, this time by taking the bitwise-and of two integers. The ith bit of x&y is 1 if and only if the ith bit of x and the ith bit of y are both 1.

The following C program prints an asterisk when the bitwise-and of i and j is zero.

    #include <stdio.h>

int main() {
int N = 32;
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++)
printf("%c", (i&j) ? ' ' : '*');
printf("\n");
}
}


Here’s a screenshot of the output.

# Stochastic rounding and privacy

Suppose ages in some database are reported in decades: 0, 10, 20, etc. You need to add a 27 year old woman to the data set. How do you record her age? A reasonable approach would to round-to-nearest. In this case, 27 would be rounded up to 30.

Another approach would be stochastic rounding. In our example, we would round this woman’s age up to 30 with 70% probability and round it down to 20 with 30% probability. The recorded value is a random variable whose expected value is exactly 27.

Suppose we were to add a large number of 27 year olds to the database. With round-to-nearest, the average value would be 30 because all the values are 30. With stochastic rounding, about 30% of the ages would be recorded as 20 and about 70% would be recorded as 30. The average would likely be close to 27.

Next, suppose we add people to the database of varying ages. Stochastic rounding would record every person’s age using a random variable whose expected value is their age. If someone’s age is a d+x where d is a decade, i.e. a multiple of 10, and 0 < x < 10, then we would record their age as d with probability 1-x/10 and d+10 with probability x/10. There would be no bias in the reported age.

Round-to-nearest will be biased unless ages are uniformly distributed in each decade. Suppose, for example, our data is on undergraduate students. We would expect a lot more students in their early twenties than in their late twenties.

Now let’s turn things around. Instead of looking at recorded age given actual age, let’s look at actual age given recorded age. Suppose someone’s age is recorded as 30. What does that tell you about them?

With round-to-nearest, it tells you that they certainly are between 25 and 35. With stochastic rounding, they could be anywhere between 20 and 40. The probability distribution on this interval could be computed from Bayes’ theorem, depending on the prior distribution of ages on this interval. That is, if you know in general how ages are distributed over the interval (20, 40), you could use Bayes’ theorem to compute the posterior distribution on age, given that age was recorded as 30.

Stochastic rounding preserves more information than round-to-nearest on average, but less information in the case of a particular individual.