Generalization of power polynomials

A while back I wrote about the Mittag-Leffler function which is a sort of generalization of the exponential function. There are also Mittag-Leffler polynomials that are a sort of generalization of the powers of x; more on that shortly.

Recursive definition

The Mittag-Leffler polynomials can be defined recursively by M₀(x) = 1
and

M_{n+1}(x) = \frac{x}{2}\left(M_n(x+1) + 2M_n(x) + M_n(x-1) \right )

for n ≥ 0.

Contrast with orthogonal polynomials

This is an unusual recurrence if you’re used to orthogonal polynomials, which come up more often in applications. For example, Chebyshev polynomials satisfy

T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)

and Hermite polynomials satisfy

 H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)

as I used as an example here.

All orthogonal polynomials satisfy a three-term recurrence like this, where the value of each polynomial can be found from the values of the previous two polynomials.

Notice that with orthogonal polynomial recurrences the argument x doesn’t change, but the degrees of polynomials do. But with Mittag-Leffler polynomials the opposite is true: there’s only one polynomial on the right side, evaluated at three different points: x+1, x, and x-1.

Generalized binomial theorem

Here’s the sense in which the Mittag-Leffler polynomials generalize the power function. If we let pₙ(x) = xⁿ be the power function, then the binomial theorem says

p_n(x+y) = \sum_{k=0}^n {n\choose k}\, p_{k}(x)\, p_{n-k}(y).

Something like the binomial theorem holds if we replace pₙ with Mₙ:

M_n(x+y) = \sum_{k=0}^n {n\choose k}\, M_{k}(x)\, M_{n-k}(y).
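
Here’s a small Python sketch, using sympy, that builds the polynomials from the recurrence above and checks this binomial-type identity for one value of n. The function name and the choice n = 5 are just for illustration.

    from sympy import symbols, binomial, expand, Integer

    x, y = symbols('x y')

    def mittag_leffler_polys(n, var):
        # Build M_0, ..., M_n from M_{k+1}(v) = (v/2)(M_k(v+1) + 2 M_k(v) + M_k(v-1))
        polys = [Integer(1)]
        for _ in range(n):
            p = polys[-1]
            polys.append(expand(var/2 * (p.subs(var, var + 1) + 2*p + p.subs(var, var - 1))))
        return polys

    n = 5
    Mx = mittag_leffler_polys(n, x)
    My = mittag_leffler_polys(n, y)

    lhs = Mx[n].subs(x, x + y)
    rhs = sum(binomial(n, k) * Mx[k] * My[n - k] for k in range(n + 1))
    print(expand(lhs - rhs))   # prints 0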

Related posts

Exponential Christmas wreath

Today’s exponential sum looks sorta like a Christmas wreath with candles.

Exponential sum 2019-12-20

Let me back up and say what “today’s exponential sum” means. I have a page that plots a graph each day, where the function being graphed takes parameters from the date. It plots the partial sums of

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

where m is the month, d is the day, and y is the last two digits of the year.
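
Here’s a rough matplotlib sketch of how such a plot could be produced. The number of terms N below is my own choice, not necessarily what the page uses.

    import numpy as np
    import matplotlib.pyplot as plt

    m, d, y = 12, 20, 19          # month, day, last two digits of the year
    N = 2 * m * d * y             # assumed number of terms; the page may use something else

    n = np.arange(N + 1)
    z = np.exp(2j * np.pi * (n/m + n**2/d + n**3/y))
    s = np.cumsum(z)              # partial sums of the exponential sum

    plt.plot(s.real, s.imag)
    plt.axis('equal')
    plt.show()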

Sometimes by coincidence the plots happen to look like something associated with the date, such as wreaths around Christmas. The plot for Christmas Eve this year looks like a branch of holly, and there’s a nice wreath on New Year’s Eve.

As I blogged about before, the images on Easter sometimes look like crosses. The image for Easter 2018 looked like an Iron Cross, and the image for Easter 2019 was a much more ornate cross. But the image for Easter 2020 looks nothing like a cross.

Advantages of redundant coordinates

Since you can describe a point in the plane with two numbers, why would you choose to use three numbers? Why would you ever want to use a coordinate system with more coordinates than necessary?

Barycentric coordinates

One way to indicate the location of a point inside a triangle is to give the weights that express the point as a weighted average of the three vertices. These three weights are called barycentric coordinates. Why would you use three numbers when two would do?

Barycentric coordinates make some things much simpler. For example, the coordinates of the three vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1) for any triangle. The points inside are written as convex combinations of the vertices. The coordinates of the center of mass, the barycenter, are (1/3, 1/3, 1/3). The vertices are treated symmetrically, even if the triangle is far from symmetric.

Barycentric coordinates are useful in applications, such as computer graphics and finite element analysis, because they are relative coordinates. When a triangle moves or is rescaled, you only need to keep track of where the vertices went; the coordinates of the points inside relative to the vertices haven’t changed.
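
Here’s a small numerical sketch of that invariance. The function below is just one way to compute barycentric coordinates, by solving a little linear system; the triangle and the affine map are arbitrary choices.

    import numpy as np

    def barycentric(p, a, b, c):
        # Solve p = l1*a + l2*b + l3*c with l1 + l2 + l3 = 1
        T = np.column_stack([a - c, b - c])
        l1, l2 = np.linalg.solve(T, p - c)
        return np.array([l1, l2, 1 - l1 - l2])

    a, b, c = np.array([0., 0.]), np.array([4., 0.]), np.array([1., 3.])
    p = (a + b + c) / 3                      # the barycenter
    print(barycentric(p, a, b, c))           # [1/3, 1/3, 1/3]

    # Move and reshape the triangle with an affine map; the coordinates don't change
    A = np.array([[2., 1.], [0., 3.]])
    shift = np.array([5., -2.])
    print(barycentric(A @ p + shift, A @ a + shift, A @ b + shift, A @ c + shift))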

This can be generalized to more dimensions. For example, you could describe a point in a tetrahedron with four coordinates, and more generally you could describe a point in an n-simplex by the convex combination coefficients of its n + 1 vertices.

Barycentric coordinates are related to Dirichlet probability distributions. When you have n probabilities that sum to 1, you’ve got n-1 degrees of freedom. But it often simplifies things to work with n variables. As with the discussion of triangles above, the extra variable makes expressions more symmetric.

Quaternions and rotations

A point in three dimensional space can be described with three numbers, but it’s often useful to think of the usual three coordinates as the vector part of a quaternion, a set of four numbers.

Suppose you have a point

a = (x, y, z)

and you want to rotate it by an angle θ around an axis given by a unit vector

b = (u, v, w).

You can compute the rotation by associating the point a with the quaternion

p = (0, x, y, z)

and the axis b with the quaternion

q = (cos(θ/2), sin(θ/2) u, sin(θ/2) v, sin(θ/2) w)

The image of a is then given by the quaternion

q p q⁻¹.

This quaternion will have zero real part, and so the Euclidean coordinates are given by the vector part, the last three components.

Of course the product above is a quaternion product, which is not commutative. That’s why the q and the q⁻¹ don’t cancel out.
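
Here’s a short Python sketch of the computation described above, with the quaternion product written out by hand rather than pulled from a library.

    import numpy as np

    def quat_mult(q1, q2):
        # Hamilton product of quaternions stored as (w, x, y, z)
        w1, x1, y1, z1 = q1
        w2, x2, y2, z2 = q2
        return np.array([
            w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2,
        ])

    def rotate(point, axis, theta):
        axis = np.asarray(axis, dtype=float)
        axis /= np.linalg.norm(axis)
        q = np.concatenate([[np.cos(theta/2)], np.sin(theta/2) * axis])
        q_inv = q * np.array([1, -1, -1, -1])   # conjugate = inverse for a unit quaternion
        p = np.concatenate([[0.0], point])
        return quat_mult(quat_mult(q, p), q_inv)[1:]   # vector part of q p q^-1

    # Rotating (1, 0, 0) by 90 degrees about the z-axis should give (0, 1, 0)
    print(rotate([1.0, 0.0, 0.0], [0, 0, 1], np.pi/2))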

Using quaternions for rotations has several advantages over using rotation matrices. First, the quaternion representation is more compact, describing a rotation with four real numbers rather than nine. Second, the quaternion calculation can be better behaved numerically. But most importantly, the quaternion approach avoids gimbal lock, a sort of singularity in representing rotations.

Projective planes

In applications of algebra, such as elliptic curve cryptography, you often need to add “points at infinity” to make things work out. To formalize this, you add an extra coordinate. So while an elliptic curve is usually presented as an equation such as

y² = x³ + ax + b,

it’s more formally an equation in three variables

y²z = x³ + axz² + bz³.

Points in the projective plane have coordinates (x, y, z) where points are considered equivalent if they differ by a non-zero multiple, i.e. (x, y, z) is considered the same point as (λx, λy, λz) for any non-zero λ.

You can often ignore the z, choosing λ so that the z coordinate is 1. But when you need to work with the point at infinity in a uniform way, you bring out the full coordinates. Now the “point at infinity” is not some mysterious entity, but simply the point (0, 1, 0).
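
Here’s a quick sympy sketch illustrating these statements for the homogeneous form of the curve.

    from sympy import symbols, expand

    x, y, z, a, b, lam = symbols('x y z a b lambda')

    F = y**2*z - (x**3 + a*x*z**2 + b*z**3)

    # Setting z = 1 recovers the affine equation y^2 = x^3 + a x + b
    print(F.subs(z, 1))

    # F is homogeneous of degree 3, so scaling (x, y, z) by lambda scales F by lambda^3,
    # and (x, y, z) and (lambda x, lambda y, lambda z) lie on the curve together
    scaled = F.subs([(x, lam*x), (y, lam*y), (z, lam*z)], simultaneous=True)
    print(expand(scaled - lam**3 * F))   # 0

    # The point at infinity (0, 1, 0) lies on the curve
    print(F.subs([(x, 0), (y, 1), (z, 0)], simultaneous=True))   # 0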

Common themes

Projective coordinates, like barycentric coordinates, introduce symmetry. With the addition of an extra coordinate, the three coordinates all behave similarly, with no reason to distinguish any coordinate as special. And as with quaternion rotations, projective coordinates make singularities go away, which is a consequence of symmetry.

Related posts

Illustrating Cayley-Hamilton with Python

If you take a square matrix M, subtract x from the elements on the diagonal, and take the determinant, you get a polynomial in x called the characteristic polynomial of M. For example, let

M = \left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right]

Then

\left| \begin{matrix} 5-x & -2 \\ 1 & 2-x \end{matrix} \right| = x^2 - 7x + 12

The characteristic equation is the equation that sets the characteristic polynomial to zero. The roots of this polynomial are eigenvalues of the matrix.

The Cayley-Hamilton theorem says that if you take the original matrix and stick it into the polynomial, you’ll get the zero matrix.

\left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right]^2 - 7\left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right] + 12\left[ \begin{matrix} 1 & 0 \\ 0 & 1\end{matrix} \right] = \left[ \begin{matrix} 0 & 0 \\ 0 & 0\end{matrix} \right]

In brief, a matrix satisfies its own characteristic equation. Note that for this to hold we interpret constants, like 12 and 0, as corresponding multiples of the identity matrix.

You could verify the Cayley-Hamilton theorem in Python using scipy.linalg.funm to compute a polynomial function of a matrix.

>>> import numpy as np
>>> from scipy.linalg import funm
>>> m = np.array([[5, -2], [1, 2]])
>>> funm(m, lambda x: x**2 - 7*x + 12)

This returns a zero matrix.
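
You could also check the result directly with ordinary matrix arithmetic, interpreting the constant term as 12 times the identity matrix:

>>> m @ m - 7*m + 12*np.eye(2)
array([[0., 0.],
       [0., 0.]])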

I imagine funm is factoring M into something like PDP⁻¹ where D is a diagonal matrix. Then

f(M) = P f(D) P⁻¹.

This is because f can be applied to a diagonal matrix by simply applying f to each diagonal entry independently. You could use this to prove the Cayley-Hamilton theorem for diagonalizable matrices.

Related posts

Fractal via bit twiddling

The Sierpinski triangle has come up several times on this blog. Here’s yet another way to produce a Sierpinski triangle, this time by taking the bitwise-and of two integers. The ith bit of x&y is 1 if and only if the ith bit of x and the ith bit of y are both 1.

The following C program prints an asterisk when the bitwise-and of i and j is zero.

    #include <stdio.h>

    int main() {
        int N = 32;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                printf("%c", (i&j) ? ' ' : '*');
            printf("\n");
        }
    }

Here’s a screenshot of the output.

screen shot that looks like Sierpinski triangle

More posts on Sierpinski triangle

Stochastic rounding and privacy

Suppose ages in some database are reported in decades: 0, 10, 20, etc. You need to add a 27 year old woman to the data set. How do you record her age? A reasonable approach would be to round to the nearest decade. In this case, 27 would be rounded up to 30.

Another approach would be stochastic rounding. In our example, we would round this woman’s age up to 30 with 70% probability and round it down to 20 with 30% probability. The recorded value is a random variable whose expected value is exactly 27.

Suppose we were to add a large number of 27 year olds to the database. With round-to-nearest, the average value would be 30 because all the values are 30. With stochastic rounding, about 30% of the ages would be recorded as 20 and about 70% would be recorded as 30. The average would likely be close to 27.

Next, suppose we add people to the database of varying ages. Stochastic rounding would record every person’s age using a random variable whose expected value is their age. If someone’s age is d + x, where d is a decade, i.e. a multiple of 10, and 0 < x < 10, then we would record their age as d with probability 1 - x/10 and as d + 10 with probability x/10. There would be no bias in the reported age.
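
Here’s a quick simulation sketch of the scheme just described, assuming ages are rounded to decades. The seed and sample size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1234)   # arbitrary seed

    def stochastic_round(ages, base=10):
        # Round each age down to a multiple of `base` with probability 1 - x/base
        # and up with probability x/base, where x is the remainder
        d = base * (ages // base)
        x = ages - d
        return d + base * (rng.random(ages.shape) < x / base)

    ages = np.full(100_000, 27)
    print(stochastic_round(ages).mean())   # close to 27
    print(np.round(ages, -1).mean())       # exactly 30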

Round-to-nearest will be biased unless ages are uniformly distributed in each decade. Suppose, for example, our data is on undergraduate students. We would expect a lot more students in their early twenties than in their late twenties.

Now let’s turn things around. Instead of looking at recorded age given actual age, let’s look at actual age given recorded age. Suppose someone’s age is recorded as 30. What does that tell you about them?

With round-to-nearest, it tells you that they certainly are between 25 and 35. With stochastic rounding, they could be anywhere between 20 and 40. The probability distribution on this interval could be computed from Bayes’ theorem, depending on the prior distribution of ages on this interval. That is, if you know in general how ages are distributed over the interval (20, 40), you could use Bayes’ theorem to compute the posterior distribution on age, given that age was recorded as 30.

Stochastic rounding preserves more information than round-to-nearest on average, but less information in the case of a particular individual.

More privacy posts

Crinkle Crankle Calculus

A crinkle crankle wall

A crinkle crankle wall, also called a serpentine wall, is a wavy wall that may seem to sacrifice some efficiency for aesthetics. The curves add visual interest, but they use more material than a straight wall. Except they don’t! They use more bricks than a straight wall of the same thickness, but they don’t have to be as thick.

Crinkle crankle walls resist horizontal forces, like wind, better than a straight wall would. So if the alternative to a crinkle crankle wall one brick thick is a straight wall two or more bricks thick, the former saves material. How much material?

The amount of material used in the wall is proportional to the product of its length and thickness. Suppose the wall is shaped like a sine wave and consider a section of wall 2π long. If the wall is in the shape of a sin(x), then we need to find the arc length of this curve. This works out to the following integral.

\int_0^{2\pi} \sqrt{1 + a^2 \cos^2(x)}\, dx

The parameter a is the amplitude of the sine wave. If a = 0, we have a flat wave, i.e. a straight wall, and so the length of this segment is 2π = 6.2832. If a = 1, the integral is 7.6404. So a section of wall is 22% longer, but it uses 50% less material per unit length than a wall two bricks thick.

The integral above cannot be computed in closed form in terms of elementary functions, so this would make a good homework exercise for a class covering numerical integration.

The integral can be computed in terms of special functions. It equals 4 E(-a²) where E is the “complete elliptic integral of the second kind.” This function is implemented as EllipticE in Mathematica and as scipy.special.ellipe in Python.

As the amplitude a increases, the arc length of a section of wall increases. You could solve for the value of a to give you whatever arc length you like. For example, if a = 1.4422 then the length is twice that of a straight line. So a crinkle crankle wall with amplitude 1.4422 uses about as many bricks as a straight wall twice as thick.
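
Here’s a small Python sketch that checks the numbers above, computing the arc length both by numerical integration and via the elliptic integral.

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import ellipe

    def arc_length(a):
        # Arc length of y = a sin(x) over one period, 0 <= x <= 2 pi
        return quad(lambda x: np.sqrt(1 + a**2 * np.cos(x)**2), 0, 2*np.pi)[0]

    for a in [0, 1, 1.4422]:
        print(a, arc_length(a), 4*ellipe(-a**2))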

***

Photo “Crinkle crankle wall, Fulbourn” by Bob Jones is licensed under CC BY-SA 2.0

Inverse trig function implementations

Programming languages differ in the names they use for inverse trig functions, and in some cases differ in their return values for the same inputs.

Choice of range

If sin(θ) = x, then θ is the inverse sine of x, right? Maybe. Depends on how you define the inverse sine. If -1 ≤ x ≤ 1, then there are infinitely many θ’s whose sine is x. So you have to make a choice.

The conventional choice is for inverse sine to return a value of θ in the closed interval

[-π/2, π/2].

Inverse tangent uses the same choice. However, the end points aren’t possible outputs, so inverse tangent typically returns a value in the open interval

(-π/2, π/2).

For inverse cosine, the usual choice is to return values in the closed interval

[0, π].
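
For example, here’s a quick look at the endpoints of these ranges using Python’s math module.

    import math

    print(math.asin(-1), math.asin(1))    # -pi/2, pi/2
    print(math.acos(-1), math.acos(1))    #  pi,   0
    print(math.atan(float('inf')))        #  pi/2, approached only in the limit for finite inputs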

Naming conventions

Aside from how to define inverse trig functions, there’s the matter of what to call them. The inverse of tangent, for example, can be written tan⁻¹, arctan, or atan. You might even see arctg or atn or some other variation.

Most programming languages I’m aware of use atan etc. Python uses atan, but SciPy uses arctan. Mathematica uses ArcTan.

Software implementations typically follow the conventions above, and will return the same value for the same inputs, as long as inputs and outputs are real numbers.

Arctangent with two arguments

Many programming languages have a function atan2 that takes two arguments, y and x, and returns an angle whose tangent is y/x. Note that the y argument to atan2 comes first: top-down order, numerator then denominator, not alphabetical order.

However, unlike the one-argument atan function, the return value may not be in the interval (-π/2, π/2). Instead, atan2 returns an angle in the same quadrant as the point (x, y). For example,

    atan2(-1, 1)

returns -π/4; because the point (1, -1) is in the 4th quadrant, it returns an angle in the 4th quadrant. But

    atan2(1, -1)

returns 3π/4. Here the point (-1, 1) and the angle 3π/4 are in the 2nd quadrant.

In Common Lisp, there’s no function named atan2; you just call atan with two arguments. Mathematica is similar: you call ArcTan with two arguments. However, Mathematica uses the opposite convention and takes x as its first argument and y as its second.

Complex values

Some software libraries support complex numbers and some do not. And among those that do support complex numbers, the return values may differ. This is because, as above, you have choices to make when extending these functions to complex inputs and outputs.

For example, in Python,

    math.asin(2)

returns a domain error and

    scipy.arcsin(2)

returns 1.5707964 + 1.3169578i.

In Common Lisp,

    (asin 2)

returns 1.5707964 – 1.3169578i. Both SciPy and Common Lisp return complex numbers whose sine equals 2, but they follow different conventions for what number to choose. That is, both

    scipy.sin(1.5707964 - 1.3169578j)
    scipy.sin(1.5707964 + 1.3169578j)

in Python and

    (sin #C(1.5707964 -1.3169578))
    (sin #C(1.5707964 +1.3169578))

in Common Lisp both return 2, aside from rounding error.

In C99, using the function casin (complex arc sine) from <complex.h>, the declaration

    double complex z = casin(2);

sets z to 1.5707964 + 1.3169578i. The Mathematica call

    ArcSin[2.]

returns 1.5707964 – 1.3169578i.

Branch cuts

The Common Lisp standardization committee did a very careful job of defining math functions for complex arguments. I’ve written before about this here. You don’t need to know anything about Lisp to read that post.

The committee ultimately decided to first rigorously define the two-argument arctangent function and bootstrap everything else from there. The post above explains how.

Other programming languages have made different choices and so may produce different values, as demonstrated above. I mention the Common Lisp convention because they did a great job of considering every detail, such as how to handle +0 and -0 in IEEE floating point.

arctan( k tan(x) )

I recently had to work with the function

f(x; k) = arctan( k tan(x) )

The semicolon between x and k is meant to imply that we’re thinking of x as the argument and k as a parameter.

This function is an example of the coming full circle theme I’ve written about several times. Here’s how a novice, a journeyman, and an expert might approach studying our function.

  • Novice: arctan( k tan(x) ) = kx.
  • Journeyman: You can’t do that!
  • Expert: arctan( k tan(x) ) ≈ kx for small x.

Novices often move symbols around without thinking about their meaning, and so someone might pull the k outside (why not?) and notice that arctan( tan(x) ) = x.

Someone with a budding mathematical conscience might conclude that, since our function is nonlinear in x and in k, there’s not much that can be said without more work.

Someone with more experience might see that both tan(x) and arctan(x) have the form x + O(x³) and so

arctan( k tan(x) ) ≈ kx

should be a very good approximation for small x.
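
Here’s a quick numerical check of that approximation; the error falls off like x³. The value k = 3 is an arbitrary choice.

    import numpy as np

    k = 3
    for x in [0.1, 0.01, 0.001]:
        exact = np.arctan(k * np.tan(x))
        print(x, exact, k*x, exact - k*x)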

Here’s a plot of our function for varying values of k.

Plot of atan( k tan(x) ) for varying k

Each is fairly flat in the middle, with slope equal to its value of k.
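
Here’s a matplotlib sketch along the lines of the plot above; the particular values of k are my own choice, not necessarily the ones plotted. The curve for the largest k already looks close to a step function, which leads to the next observation.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-np.pi/2 + 0.01, np.pi/2 - 0.01, 500)
    for k in [0.5, 1, 2, 5, 20]:
        plt.plot(x, np.arctan(k * np.tan(x)), label=f"k = {k}")
    plt.legend()
    plt.xlabel("x")
    plt.show()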

As k increases, f(x; k) becomes more and more like a step function, equal to -π/2 for negative x and π/2 for positive x, i.e.

arctan( k tan(x) ) ≈ sgn(x) π/2

for large k. Here again we might have discussion like above.

  • Novice: Set k = ±∞. Then ±∞ tan(x) = ±∞ and arctan(±∞) = ±π/2.
  • Journeyman: You can’t do that!
  • Expert: Well, you can if you interpret everything in terms of limits.

Related posts

The hardest logarithm to compute

Suppose you want to compute the natural logarithms of every floating point number, correctly truncated to a floating point result. Here by floating point number we mean an IEEE standard 64-bit float, what C calls a double. Which logarithm is hardest to compute?

We’ll get to the hardest logarithm shortly, but we’ll first start with a warm up problem. Suppose you needed to compute the first digit of a number z. You know that z equals 2 + ε where ε is either 10⁻¹⁰⁰ or -10⁻¹⁰⁰. If ε is positive, you should return 2. If ε is negative, you should return 1. You know z to very high precision a priori, but to return the first digit correctly, you need to compute z to 100 decimal places.

In this post we’re truncating floating point numbers, i.e. rounding down, but similar considerations apply when rounding to nearest, for example if you wanted to compute 0.5 + ε rounded to the nearest integer.

In general, it may not be possible to know how accurately you need to compute a value in order to round it correctly. William Kahan called this the table maker’s dilemma.

Worst case for logarithm

Lefèvre and Muller [1] found that the floating point number whose logarithm is hardest to compute is

x = 1.0110001010101000100001100001001101100010100110110110 × 2⁶⁷⁸

where the fractional part is written in binary. The log of x, again written in binary, is

111010110.0100011110011110101110100111110010010111000100000000000000000000000000000000000000000000000000000000000000000111 …

The first 53 significant bits, all the bits a floating point number can represent [2], are followed by 65 zeros. We have to compute the log with a precision of more than 118 bits in order to know whether the last bit of the fraction part of the float representation should be a 0 or a 1. Usually you don’t need nearly that much precision, but this is the worst case.

Mathematica code

Here is Mathematica code to verify the result above. I don’t know how to give Mathematica floating point numbers in binary, but I know how to give it integers, so I’ll multiply the fraction part by 2⁵² and take 52 from the exponent.

n = FromDigits[
  "10110001010101000100001100001001101100010100110110110", 2]
x = n  2^(678 - 52)
BaseForm[N[Log[x], 40], 2]

This computes the log of x to 40 significant decimal figures, between 132 and 133 bits, and gives the result above.
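
Here’s a sketch of the same computation in Python using mpmath, working to 200 bits of precision and printing the leading bits of the result. The bit-extraction loop is just a quick way to display the binary expansion.

    from mpmath import mp, mpf, log

    mp.prec = 200   # well beyond the 118 bits needed

    n = int("10110001010101000100001100001001101100010100110110110", 2)
    x = mpf(n) * mpf(2)**(678 - 52)
    y = log(x)

    # Display the integer part and the first 120 fractional bits of log(x)
    intpart = int(y)
    frac = y - intpart
    bits = []
    for _ in range(120):
        frac *= 2
        b = int(frac)
        bits.append(str(b))
        frac -= b
    print(bin(intpart)[2:] + "." + "".join(bits))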

Related posts

A few days ago I wrote about another example by Jean-Michel Muller: A floating point oddity.

William Kahan also came up recently in my post on summing an array of floating point numbers.

***

[1] Vincent Lefèvre, Jean-Michel Muller. Worst Cases for Correct Rounding of the Elementary Functions in Double Precision. November 2000. Research Report 2000-35. Available here.

[2] There are 52 bits in the fraction of an IEEE double. The leading bit of the significand is 1, unless the number is subnormal, so that bit is left implicit, squeezing out one extra bit of precision. That is, storing 52 bits gives us 53 bits of precision.