Generalization of power polynomials

A while back I wrote about the Mittag-Leffler function, which is a sort of generalization of the exponential function. There are also Mittag-Leffler polynomials, which generalize the powers of x in an analogous way; more on that shortly.

Recursive definition

The Mittag-Leffler polynomials can be defined recursively by M0(x) = 1
and

M_{n+1}(x) = \frac{x}{2}\left(M_n(x+1) + 2M_n(x) + M_n(x-1) \right )

for n ≥ 0.
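
Here's a short sketch (not from the original post) that applies the recurrence symbolically with SymPy to generate the first few polynomials:

    from sympy import symbols, expand, sympify

    x = symbols('x')

    def mittag_leffler(n):
        # return [M_0, ..., M_n] by applying the recurrence above
        M = [sympify(1)]
        for _ in range(n):
            Mk = M[-1]
            M.append(expand(x/2 * (Mk.subs(x, x + 1) + 2*Mk + Mk.subs(x, x - 1))))
        return M

    print(mittag_leffler(3))  # [1, 2*x, 4*x**2, 8*x**3 + 4*x]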

Contrast with orthogonal polynomials

This is an unusual recurrence if you’re used to orthogonal polynomials, which come up more often in application. For example, Chebyshev polynomials satisfy

T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)

and Hermite polynomials satisfy

 H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)

as I used as an example here.

All orthogonal polynomials satisfy a three-term recurrence like this, where each polynomial can be computed from the values of the previous two polynomials.

Notice that with orthogonal polynomial recurrences the argument x doesn’t change, but the degrees of polynomials do. But with Mittag-Leffler polynomials the opposite is true: there’s only one polynomial on the right side, evaluated at three different points: x+1, x, and x-1.

Generalized binomial theorem

Here’s the sense in which the Mittag-Leffler polynomials generalize the power function. If we let pn(x) = x^n be the power function, then the binomial theorem says

p_n(x+y) = \sum_{k=0}^n {n\choose k}\, p_{k}(x)\, p_{n-k}(y).

Something like the binomial theorem holds if we replace pn with Mn:

M_n(x+y) = \sum_{k=0}^n {n\choose k}\, M_{k}(x)\, M_{n-k}(y).
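
As a quick check (a sketch reusing the mittag_leffler helper above), we can verify this identity symbolically for a small value of n:

    from sympy import symbols, binomial, expand

    y = symbols('y')
    n = 4
    M = mittag_leffler(n)  # helper defined in the sketch above
    lhs = M[n].subs(x, x + y)
    rhs = sum(binomial(n, k) * M[k] * M[n - k].subs(x, y) for k in range(n + 1))
    print(expand(lhs - rhs))  # prints 0, confirming the identity for n = 4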


A different view of the Lorenz system

The Lorenz system is a canonical example of chaos. Small changes in initial conditions eventually lead to huge changes in the solutions.

And yet discussions of the Lorenz system rarely show this directly. Instead, they show trajectories of the system, which make beautiful images but do not demonstrate the effect of small changes to initial conditions. Or they demonstrate it in two or three dimensions, where it’s harder to see.

If you’ve seen the Lorenz system before, this is probably the image that comes to mind.

Lorenz x,z trajectories

This plots (x(t), z(t)) for the solutions to the system of differential equations

x′ = σ(y − x)
y′ = x(ρ − z) − y
z′ = xy − βz

where σ = 10, ρ = 28, β = 8/3. You could use other values of these parameters, but these were the values originally used by Edward Lorenz in 1963.

The following plots, while not nearly as attractive, are more informative regarding sensitive dependence on initial conditions.

x component for slightly different initial conditions

In this plot, x1 is the x-component of the solution to the Lorenz system with initial condition

(1, 1, 1)

and x2 the x-component corresponding to initial condition

(1, 1, 1.00001).

The top plot shows x1 and the bottom plot shows x1 − x2.

Notice first how erratic the x component is. That might not be apparent from looking at plots such as the plot of (x, z) above.

Next, notice that for two solutions that start off slightly different in the z component, the solutions are nearly identical at first: the difference between the two solutions is zero as far as the eye can tell. But soon the difference between the two solutions has about the same magnitude as the solutions themselves.

Below is the Python code used to make the two plots.

    from numpy import linspace
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    def lorenz(t, xyz):
        x, y, z = xyz
        s, r, b = 10, 28, 8/3  # parameters Lorenz used
        return [s*(y-x), x*(r-z) - y, x*y - b*z]

    a, b = 0, 40
    t = linspace(a, b, 4000)

    sol1 = solve_ivp(lorenz, [a, b], [1,1,1], t_eval=t)
    sol2 = solve_ivp(lorenz, [a, b], [1,1,1.00001], t_eval=t)

    plt.plot(sol1.y[0], sol1.y[2])
    plt.xlabel("$x$")
    plt.ylabel("$z$")
    plt.savefig("lorenz_xz.png")
    plt.close()

    plt.subplot(211)
    plt.plot(sol1.t, sol1.y[0])
    plt.xlabel("$t$")
    plt.ylabel("$x_1(t)$")
    plt.subplot(212)
    plt.plot(sol1.t, sol1.y[0] - sol2.y[0])
    plt.ylabel("$x_1(t) - x_2(t)$")
    plt.xlabel("$t$")
    plt.savefig("lorenz_x.png")

One important thing to note about the Lorenz system is that it was not contrived to show chaos. Meteorologist and mathematician Edward Lorenz was led to the system of differential equations that bears his name in the course of his work modeling weather. Lorenz understandably assumed that small changes in initial conditions would lead to small changes in the solutions, until numerical solutions convinced him otherwise. Chaos was a shocking discovery, not his goal.


ODE bifurcation example

A few days ago I wrote about bifurcation for a discrete system. This post looks at a continuous example taken from the book Exploring ODEs.

We’ll consider solutions to the differential equation

y'' = 2\left( \frac{t}{150}-2\right)y  - 4y^3

for two different initial conditions: y(0) = 0.02, y‘(0) = 0 and y(0) = 0.05, y‘(0) = 0.

We’ll solve the ODE with both initial conditions for 0 ≤ t ≤ 600 with the following Python code.

    from numpy import linspace
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    def ode(t, z):
        y, yp = z
        return [yp, 2*(-2 + t/150)*y - 4*y**3]

    a, b = 0, 600
    t = linspace(a, b, 900)

    sol1 = solve_ivp(ode, [a, b], [0.02, 0], t_eval=t)
    sol2 = solve_ivp(ode, [a, b], [0.05, 0], t_eval=t)

First let’s look at the solutions for 0 ≤ t ≤ 200.

Here’s the code that produced the plots.

    plt.subplot(211)
    plt.plot(sol1.t[:300], sol1.y[0][:300])
    plt.subplot(212)
    plt.plot(sol2.t[:300], sol2.y[0][:300])
    plt.show()

Note that the vertical scales of the two plots are different in order to show both solutions relative to their initial value for y. The latter solution starts out 2.5x larger, and essentially stays 2.5x larger. The two solutions are qualitatively very similar.

Something unexpected happens if we continue the solutions for larger values of t.

Here’s the code that produced the plots.

    plt.subplot(211)
    plt.plot(sol1.t, sol1.y[0])
    plt.ylim([-1,1])
    plt.subplot(212)
    plt.plot(sol2.t, sol2.y[0])
    plt.ylim([-1,1])
    plt.show()

Now the vertical scales of the two plots are the same. The solutions hit a bifurcation point around t = 300.


Software metric outliers

Goodhart’s law says “When a measure becomes a target, it ceases to be a good measure.” That is, when people are rewarded on the basis of some metric, they’ll learn how to improve that metric, but not necessarily in a way that increases what you’re after. Here are two examples of Goodhart’s law related to software development.

  • If you use lines of code to measure software developer productivity, Goodhart’s law predicts that you’ll get more lines of code, but not more productivity. In fact, you decrease productivity by discouraging code reuse.
  • If you evaluate developers by number of comments, you’re likely to get verbose, unhelpful comments that make code harder to maintain.

Despite their flaws and the potential for perverse incentives, I claim it’s worth looking at metric outliers.

When I managed a software development team, I ran software that computed the complexity [1] of all the functions in our code base. One function was 100x more complex than anything else anyone had written. That drew my attention to a problem I was unaware of, or at least had underestimated.

If one function is 50% more complex than another, as measured by the software metric, that does not mean that the former is more complex in terms of the mental burden it places on human readers. But a difference of 10,000% is worth investigating. Maybe there’s a good explanation—maybe it’s machine-generated code, for example—but more likely it’s an indication that there’s a problem.

Code reviews are far better than code metrics. But code metrics could suggest code that should be a priority to review.


[1] McCabe complexity, a.k.a. cyclomatic complexity. Essentially the number of paths through a function.
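
Complexity metrics like this are easy to approximate. Here’s a rough sketch (not the tool I used) that counts branch points with Python’s ast module as a crude stand-in for cyclomatic complexity:

    import ast
    import textwrap

    def rough_complexity(source):
        # 1 + number of branch points, a crude proxy for McCabe complexity
        tree = ast.parse(textwrap.dedent(source))
        branches = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)
        return 1 + sum(isinstance(node, branches) for node in ast.walk(tree))

    code = """
    def f(x):
        if x > 0 and x < 10:
            return x
        for i in range(x):
            print(i)
        return 0
    """
    print(rough_complexity(code))  # 4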

Scaling up and down

There’s a worn-out analogy in software development that you cannot build a skyscraper the same way you build a dog house. The idea is that techniques that will work on a small scale will not work on a larger scale. You need more formality to build large software systems.

The analogy is always applied in one direction: up. It’s always an exhortation to use techniques appropriate for larger projects.

But the analogy works in the other direction as well: it’s inappropriate to build a dog house the same way you’d build a skyscraper. It would be possible to build a dog house the way you’d build a skyscraper, but it would be very expensive. Amateur carpentry methods don’t scale up, but professional construction methods don’t scale down economically.

Bias for over-engineering

There’s a bias toward over-engineering because it works, albeit inefficiently, whereas under-engineering does not. You can use a sledgehammer to do a hammer’s job. It’ll be clumsy, and you might hurt yourself, but it can work. And there are tasks where a hammer just won’t get the job done.

Another reason for the bias toward over-engineering is asymmetric risk. If an over-engineered approach fails, you’ll face less criticism than if a simpler approach fails. As the old saying goes, nobody got fired for choosing IBM.

Context required

Simple solutions require context to appreciate. If you do something simple, you’re open to the criticism “But that won’t scale!” You have to defend your solution by explaining that it will scale far enough, and that it avoids costs associated with scaling further than necessary.

Suppose a group is debating whether to walk or drive to lunch. Someone advocating driving requires less context to make his point. He can simply say “Driving is faster than walking,” which is generally true. The burden is on the person advocating walking to explain why walking would actually be faster under the circumstances.

Writing prompt

I was using some database-like features in Emacs org-mode this morning, and that’s what prompted me to write this post. I can just hear someone say “That won’t scale!” I often get this reaction when I write about a simple, low-tech way to do something on a small scale.

Using a text file as a database doesn’t scale. But I have 88 rows, so I think I’ll be OK. A relational database would be better for storing millions of records, but that’s not what I’m working on at the moment.


Cobweb plots

Cobweb plots are a way of visualizing iterations of a function.

cobweb plot for iterating cosine

For a function f and a starting point x, you plot (x, f(x)) as usual. Then since f(x) will be the next value of x, you convert it to an x by drawing a horizontal line from (x, f(x)) to (f(x), f(x)). In other words, you convert the previous y value to an x value by moving to where a horizontal line intersects the line y = x. Then you go up from the new x to f applied to the new x. The Python code below makes this all explicit.

Update: I made a couple changes after this post was first published. I added the dotted line y = x to the plots, and I changed the aspect ratio from the default to 1 to make the horizontal and vertical scales the same.

    import matplotlib.pyplot as plt
    from numpy import cos, linspace
    
    def cobweb(f, x0, N, a=0, b=1):
        # plot the function being iterated
        t = linspace(a, b, N)
        plt.plot(t, f(t), 'k')

        # plot the dotted line y = x
        plt.plot(t, t, "k:")
    
        # plot the iterates
        x, y = x0, f(x0)
        for _ in range(N):
            fy = f(y)        
            plt.plot([x, y], [y,  y], 'b', linewidth=1)
            plt.plot([y, y], [y, fy], 'b', linewidth=1)
            x, y = y, fy
            
        plt.gca().set_aspect(1)
        plt.show()
        plt.close()

The plot above was made by calling

    cobweb(cos, 1, 20)

to produce the cobweb plot for 20 iterations of cosine starting with x = 1. There’s one fixed point, and the cobweb plot spirals into that fixed point.

Next let’s look at several iterations of the logistic map f(x) = rx(1 – x) for differing values of r.

    # one fixed point
    cobweb(lambda x: 2.9*x*(1-x), 0.1, 100)

    # converging to two-point attractor
    cobweb(lambda x: 3.1*x*(1-x), 0.1, 100)

    # starting exactly on the attractor
    cobweb(lambda x: 3.1*x*(1-x), 0.558, 100)

    # in the chaotic region.
    cobweb(lambda x: 4.0*x*(1-x), 0.1, 100)

The logistic map also has one stable fixed point if r ≤ 3. In the plot below, r = 2.9.

cobweb plot for logistic map, r = 2.9

Next we set r = 3.1. We start at x = 0.1 and converge to the two attractor points.

cobweb plot for logistic map, r = 3.1

If we start exactly on one of the attractor points the cobweb plot is simply a square.

cobweb plot for logistic map, r = 3.1, starting on attractor point

Finally, when we set r = 4 we’re in the chaotic region.

cobweb plot for logistic map, r = 4


CCPA and expert determination

California’s new CCPA (California Consumer Privacy Act) may become more like HIPAA. In particular, a proposed amendment would apply HIPAA’s standards of expert determination to CCPA.

According to this article,

The California State Senate’s Health Committee recently approved California AB 713, which would amend the California Consumer Privacy Act (CCPA) to except from CCPA requirements additional categories of health information, including data de-identified in accordance with HIPAA and certain medical research data.

Some businesses have been looking to HIPAA by analogy for how to comply with CCPA. HIPAA has been on the books much longer, and what it means to comply with HIPAA is more clearly stated, in regulation itself and in guidance documents. AB 713 would make this appeal to HIPAA more than an analogy.

In particular, CCPA would now have a notion of expert determination. AB 713 explicitly refers to

The deidentification methodology described in Section 164.514(b)(1) of Title 45 of the Code of Federal Regulations, commonly known as the HIPAA expert determination method.

Emphasis added. Taken from 1798.130 (a)(5)(D)(i).

Parsing AB 713

The amendment is hard to read because it doesn’t contain many complete sentences. The portion quoted above doesn’t have a verb. We have to go up to (a) in the hierarchy before we can find a clear subject and verb:

… a business shall …

It’s not clear to me what the amendment is saying. Rather than trying to parse this myself, I’ll quote what the article linked above says.

AB 713 would except from CCPA requirements de-identified health information when … The information is de-identified in accordance with a HIPAA de-identification method [and two other conditions].

Expert determination

I am not a lawyer; I advise lawyers on statistical matters. I offer statistical advice, not legal advice.

If your lawyer determines that you need HIPAA-style expert determination to comply with CCPA, I can help. I have provided expert determination for many companies and would welcome the opportunity to provide this service for your company as well.

If you’d like to discuss expert determination, either for HIPAA or for CCPA, let’s talk.

Variation on cosine fixed point

If you enter any number into a calculator and repeatedly press the cosine key, you’ll eventually get 0.73908513, assuming your calculator is in radian mode [1]. And once you get this value, taking the cosine more times won’t change the number. This is the first example of a fixed point of an iteration that many people see.

Sam Walter posted a variation on the cosine fixed point problem on Twitter recently. If you add up the distance from each iteration value to the fixed point, the sum converges. Furthermore, if you consider this sum as a function of the starting point x, it defines a differentiable function of x.

I was curious what this function looks like, so I wrote the following Python code. First we find the fixed point.

    from math import cos

    epsilon = 1e-16
    x, cx = 1, cos(1)
    while abs(x - cx) > epsilon:
        x, cx = cx, cos(cx)

    alpha = x # fixed point

Now let’s evaluate the function described above.

    def f(x):
        s = 0
        cx = cos(x)
        delta = alpha - cx
        while abs(delta) > epsilon:
            s += delta
            x, cx = cx, cos(cx)
            delta = alpha - cx
        return s

Here’s what the plot of our function looks like.

As we’d expect, it crosses the x-axis at the fixed point.

If we sum the absolute distance to the fixed point at each iteration, we get the following function with a minimum at the fixed point.
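
In code, this is a one-line change to the function above (a minimal sketch):

    def g(x):
        s = 0
        cx = cos(x)
        delta = alpha - cx
        while abs(delta) > epsilon:
            s += abs(delta)  # absolute distance to the fixed point
            x, cx = cx, cos(cx)
            delta = alpha - cx
        return s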

Python’s new walrus operator

Incidentally, we can make the code slightly more compact if we take advantage of a new feature in Python 3.8, affectionately known as the “walrus operator.” The := operator lets you do assignments inside a loop test, among other places, much like is commonly done in C.

    def f(x):
        s, cx = 0, cos(x)
        while abs(delta := alpha - cx) > 1e-16:
            s += delta
            x, cx = cx, cos(cx)
        return s

Now we don’t need to compute delta before the loop, and we can update delta and check its absolute value in the same place.

I’m not sure how I feel about the walrus operator. It could make code easier to read or harder to read, depending on context.

One advantage of Python’s walrus over its C analog is that it prevents bugs due to typing = when you mean to type ==. If you really want to do an assignment inside a Boolean test, you’ll need an extra character, the colon in front of the equal sign; using an ordinary equal sign alone is a syntax error.


[1] If your calculator is in degree mode, you’ll still get a fixed point, but the fixed point will be 0.99984774 degrees. Note that this is not simply the fixed point for cosine of an angle in radians, converted to degrees. Cosine of x degrees is a different function than cosine of x radians and it has a different fixed point.
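
Here’s a quick way to check that value (a sketch, not from the original post):

    from math import cos, radians

    # iterate cosine-in-degrees; it settles near 0.99984774 after a few steps
    x = 1.0
    for _ in range(100):
        x = cos(radians(x))
    print(x)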

Stable and unstable recurrence relations

The previous post looked at computing recurrence relations. That post ends with a warning that recursive evaluations may or may not be numerically stable. This post will give examples that illustrate stability and instability.

There are two kinds of Bessel functions, denoted J and Y. These are called Bessel functions of the first and second kinds respectively. These functions carry a subscript n denoting their order. Both kinds of Bessel functions satisfy the same recurrence relation:

f_{n+1}(x) - \frac{2n}{x}\, f_n(x) + f_{n-1}(x) = 0

where f is J or Y.

If you apply the recurrence relation in the increasing direction, it is unstable for J but stable for Y.

If you apply the recurrence relation in the opposite direction, it is stable for J and unstable for Y.

We will illustrate the above claims using the following Python code. Since both kinds of Bessel function satisfy the same recurrence, we pass the Bessel function in as a function argument. SciPy implements Bessel functions of the first kind as jv and Bessel functions of the second kind as yv. [1]

    from numpy import pi, zeros
    from scipy.special import jv, yv

    def report(k, computed, exact):
        print(k, computed, exact, abs(computed - exact)/exact)

    def bessel_up(x, n, f):
        a, b = f(0, x), f(1, x)
        for k in range(2, n+1):
            a, b = b, 2*(k-1)*b/x - a
            report(k, b, f(k,x))     

    def bessel_down(x, n, f):
        a, b = f(n,x), f(n-1,x)
        for k in range(n-2, -1, -1):
            a, b = b, 2*(k+1)*b/x - a
            report(k, b, f(k,x))

We try this out as follows:

    bessel_up(1, 20, jv)
    bessel_down(1, 20, jv)
    bessel_up(1, 20, yv)
    bessel_down(1, 20, yv)

When we compute Jn(1) using bessel_up, the relative error starts out small and grows to about 1% when n = 9. The relative error increases rapidly from there. When n = 10, the relative error is 356%.

For n = 20, the recurrence gives a value of 316894.36 while the true value is 3.87e-25, i.e. the computed value is 30 orders of magnitude larger than the correct value!

When we use bessel_down, the results are correct to full precision.

Next we compute Yn(1) using bessel_up; the results are correct to full precision.

When we compute Yn(1) using bessel_down, the results are about as bad as computing Jn(1) using bessel_up. We compute Y0(1) as about 5.7e+27, while the correct value is roughly 0.088.

There are functions, such as Legendre polynomials, whose recurrence relations are stable in either direction, at least for some range of inputs. But it would be naive to assume that a recurrence is stable without some exploration.

Miller’s algorithm

There is a trick for using the downward recurrence for Bessel functions known as Miller’s algorithm. It sounds crazy at first: Assume JN(x) = 1 and JN+1(x) = 0 for some large N, and run the recurrence downward.

Since we don’t know JN(x), our results will be off by some constant proportion. But there’s a way to find out what that proportionality constant is using the relation described here.

1 = J_0(x) + 2J_2(x) + 2J_4(x) + 2J_6(x) + \cdots

We add up our computed values for the terms on the right side, then divide by that sum to normalize our estimates. Miller’s recurrence algorithm applies more generally to other recurrences where the downward recurrence is stable and there exists a normalization identity analogous to the one for Bessel functions.

The following code lets us experiment with Miller’s algorithm.

    def miller(x, N):
        j = zeros(N) # array to store values
        
        a, b = 0, 1
        for k in range(N-1, -1, -1):
            a, b = b, 2*(k+1)*b/x - a
            j[k] = b
            
        norm = j[0] + sum(2*j[k] for k in range(2,N,2))
        j /= norm
        
        for k in range(N-1, -1, -1):
            report(k, j[k], jv(k, x))

When we call miller(pi, 20) we see that Miller’s method computes Jn(π) accurately. The error starts out moderately small and decreases until the results are accurate to floating point precision.

    |----+------------|
    |  k | rel. error |
    |----+------------|
    | 19 |   3.91e-07 |
    | 17 |   2.35e-09 |
    | 16 |   2.17e-11 |
    | 15 |   2.23e-13 |
    | 14 |   3.51e-15 |
    |----+------------|

For smaller k the relative error is also around 10⁻¹⁵, i.e. essentially full precision.

[1] Why do the SciPy names end in “v”? The order of a Bessel function does not have to be an integer. It could be any real number, and the customary mathematical notation is to use a Greek letter ν (nu) as a subscript rather than n as a reminder that the subscript might not represent an integer. Since a Greek ν looks similar to an English v, SciPy uses v as a sort of approximation of ν.

Recurrence relations and Python

A video by Raymond Hettinger points out that simultaneous assignment makes it much easier to understand code that evaluates a recurrence relation. His examples were in Python, but the same principle applies to any language supporting simultaneous evaluation.

The simplest example of simultaneous evaluation is swapping two variables:

   a, b = b, a

Compare this to

    temp = a
    a = b
    b = temp

The latter is more code, but more importantly it exposes intermediate steps that could be confusing if the code were more complicated. This is the case when evaluating recurrence relations.

The most famous recurrence relation is the Fibonacci sequence, but I’ll use a different example because Fibonacci is overdone. Also, I think it helps to see a slightly more complicated example.

As I wrote earlier here, the first two Hermite polynomials are given by H0(x) = 1 and H1(x) = x. The rest are given via the recurrence relation

H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)

If we have a particular value of x, say x = 3, and we want to find H10(x), we could do so as follows.

    x = 3
    a, b = 1, x
    for n in range(2, 11):
        a, b = b, x*b - (n-1)*a  # H_n = x H_{n-1} - (n-1) H_{n-2}

After this code runs, b will contain H10(3). [1]

At each iteration, the calculations on the right side of the equal sign are carried out, then the assignments to the elements on the left side are made. You don’t have to write explicit and confusing code with variables like a_old and a_new.
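
As a quick sanity check (not in the original post), we can wrap the loop in a function and compare it against the explicit formulas for the first few Hermite polynomials:

    def hermite(n, x):
        # evaluate H_n(x) via the recurrence H_k(x) = x H_{k-1}(x) - (k-1) H_{k-2}(x)
        a, b = 1, x
        if n == 0:
            return a
        for k in range(2, n + 1):
            a, b = b, x*b - (k - 1)*a
        return b

    x = 3
    print(hermite(2, x), x**2 - 1)            # H_2(x) = x^2 - 1
    print(hermite(3, x), x**3 - 3*x)          # H_3(x) = x^3 - 3x
    print(hermite(4, x), x**4 - 6*x**2 + 3)   # H_4(x) = x^4 - 6x^2 + 3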

Three-term recurrence relations come up constantly in application. For example, all orthogonal polynomials satisfy some three-term recurrence analogous to the one given above for Hermite polynomials.

Power series solutions for differential equations lead to recurrence relations for the coefficients. Second order ODEs give rise to three-term recurrence relations. In general nth order ODEs give rise to n+1 term relations. By far the most common value of n in application is 2, but higher values come up occasionally.

A four-term recurrence arising from a third-order ODE could be implemented in Python with code of the form

   a, b, c = b, c, f(a,b,c)

See this post for an example of a recently discovered three-term recurrence relation.

One final comment on recurrence relations: recurrences that hold in theory may not be useful in practice. Repeated evaluation of a recurrence might have problems with numerical stability. That’s the topic for the next post.

***

[1] If you’re not used to Python’s range generator, you might be surprised that the second argument is 11 rather than 10. This is because range(a, b) uses half-open intervals, i.e. it generates values n with a ≤ n < b. This has its advantages. For example, it composes well under unions: the union of range(a,b) and range(b,c) is range(a,c). This wouldn’t be the case if range included its upper limit.