Mathematical stability vs numerical stability

Is 0 a stable fixed point of f(x) = 4x (1-x)?

If you set x = 0 and compute f(x) you will get exactly 0. Apply f a thousand times and you’ll never get anything but zero.

But this does not mean 0 is a stable attractor, and in fact it is not stable.

It’s easy to get mathematical stability and numerical stability confused, because the latter often illustrates the former. In this post I point out that the points on a particular unstable orbit cannot be represented exactly in floating point numbers, so iterations that start at the floating  point representations of these points will drift away.

In the example here, 0 can be represented exactly, and so we do have a computationally stable fixed point. But 0 is not a stable fixed point in the mathematical sense. Any starting point close to 0 but not exactly 0 will eventually go all over the place.

If we start at some very small ε > 0, then f(ε) ≈ 4ε. Every iteration multiplies the result by approximately 4, and eventually the result is large enough that the approximation no longer holds.

For example, suppose we start with x = 10-100. After 100 iterations we have

x = 4100 10-100 = 1.6 × 10-40.

If we were to plot this, we wouldn’t see any movement. But by the time we iterate our function 200 times, the result is chaotic. I get the following:

Does chaos imply period 3?

Sharkovskii’s theorem says that if a continuous map f from an interval I to itself has a point with period 3, then it has a point with period 5. And if it has a point with period 5, then it has points with order 7, etc. The theorem has a chain of implications that all begins with 3. If you have points with period 3, you have points of all periods.

Now suppose you want to go “upstream” in Sharkovskii’s chain of implications. If you have a point with period 5, do you have a point with period 3? The answer is no: a map can have points with period 5 without having points of period 3, though it necessarily has points with all other positive integer periods except 3.

There’s an example illustrating the claim above that I’ve seen in multiple places, but I haven’t seen it presented graphically. You could work through the example analytically, but here I present it graphically.

This is the example function written in Python.

    def f(x):
        assert(1 <= x <= 5)
        if x < 2:
            return 1 + 2*x
        if x < 3:
            return 7 - x
        if x < 4:
            return 10 - 2*x
        if x <= 5:
            return 6 - x

Here’s a graphical demonstration that f has a fixed point, but no points of period 3.

The only point fixed under applying f three times is the point that was already fixed under applying f once.

This graph shows that f has points with period 5:

By Sharkovskii’s theorem f must have points with all other periods, except 3. Here’s a demonstration that it has points with period 6.

The map f is chaotic, but it does not have a point with period 3.

Logistic map

Let’s look at the most famous chaotic map, the logistic map.

f(x) = rx (1 – x)

where x is in [0, 1] and r is in [0. 4].

logistic bifurcation diagram

The images above shows orbits as r ranges over [0, 4]. Clearly f has points with period 2. There’s a whole interval of values of r that lead to points with period 2, roughly for r between 3 and 3.5. And we can see for r a little bigger there are points of period 4. But is there any point with period 3?

We can look for points of period 3 at the end of the plot, where r = 4, using Mathematica.


    f[x_] := 4 x (1 - x)

and look for points where f³(x) = x using

    Factor[f[f[f[x]]] - x]

This shows that the solutions are the roots of

x (-3 + 4 x) (-7 + 56x – 112x² + 64 x³) (-3 + 36x – 96x² + 64x³)

The first two roots are fixed points, points of period 1, but the roots of the two cubic factors are points with period 3.

The cubics clearly have all their roots in the interval [0,1] and we could find their numerical values with

    NSolve[f[f[f[x]]] == x, x]

Although the roots are fixed points, they are unstable fixed points, as demonstrated at the bottom the next post.

Related posts

Sarkovsky’s theorem

The previous post explained what is meant by period three implies chaos. This post is a follow-on that looks at Sarkovsky’s theorem, which is mostly a generalization of that theorem, but not entirely [1].

First of all, Mr. Sarkovsky is variously known Sharkovsky, Sharkovskii, etc. As with many Slavic names, his name can be anglicized multiple ways. You might use the regular expression Sh?arkovsk(ii|y) in a search.

The theorem in the previous post, by Li and Yorke, says that if a continuous function from a closed interval to itself has a point with period three, it has points with all positive periods. This was published in 1975.

Unbeknownst to Li and Yorke, and everyone else in the West at the time, Sarkovsky had published a more general result in 1964 in a Ukrainian journal. He demonstrated a total order on the positive integers so that the existence of a point with a given period implies the existence of points with all periods further down the sequence. The sequence starts with 3, and every other positive integer is in the sequence somewhere, so period 3 implies the rest.

Sarkivsky showed that period 3 implies period 5, period 5 implies period 7, period 7 implies period 9, etc. If a continuous map of an interval to itself has a point of odd period n > 1, it has points with order given by all odd numbers larger than n. That is, Sarkovsky’s order starts out

3 > 5 > 7 > …

The sequence continues

… 2×3 > 2×5 > 2×7 > …


… 2²×3 > 2²×5 > 2²×7 > …


… 2³×3 > 2³×5 > 2³×7 > …

and so on for all powers of 2 times odd numbers greater than 1.

The sequence ends with the powers of 2 in reverse order

… 2³ > 2² > 1.

Here’s Python code to determine whether period m implies period n, assuming m and n are not equal.

from sympy import factorint

# Return whether m comes befor n in Sarkovsky order
def before(m, n):

    assert(m != n)

    if m == 1 or n == 1:
        return m > n

    m_factors = factorint(m)
    n_factors = factorint(n)

    m_odd = 2 not in m_factors
    n_odd = 2 not in n_factors
    m_power_of_2 = len(m_factors) == 1 and not m_odd
    n_power_of_2 = len(n_factors) == 1 and not n_odd

    if m_odd:
        return m < n if n_odd else True if m_power_of_2: return m > n if n_power_of_2 else False

    # m is even and not a power of 2    
    if n_odd:
        return False
    if n_power_of_2:
        return True
    if m_factors[2] < n_factors[2]: return True if m_factors[2] > n_factors[2]:
        return False  
    return m < n

Next post: Can you swim “upstream” in Sarkovsky’s order?

[1] There are two parts to the paper of Li and Yorke. First, that period three implies all other periods. This is a very special case of Sarkovsky’s theorem. But Li and Yorke also proved that period three implies an uncountable number of non-periodic points, which is not part of Sarkovsky’s paper.

Period three implies chaos

One of the most famous theorems in chaos theory, maybe the most famous, is that “period three implies chaos.” This compact statement comes from the title of a paper [1] by the same name. But what does it mean?

This post will look at what the statement means, and the next post will look at a generalization.


First of all, the theorem refers to period in the sense of function iterations, not in the sense of translations. And it applies to particular points, not to the function as a whole.

The sine function is periodic in the sense that it doesn’t change if you shift it by 2π. That is,

sin(2π+x) = sin(x)

for all x. The sine function has period 2π in the sense of translations.

In dynamical systems, period refers to getting the same result, not when you shift a function, but when you apply it to itself. A point x has period n under a function f if applying the function n times gives you x back, but applying it any less than n times does not.

So, for example, the function f(x) = –x has period 2 for non-zero x, and period 1 for x = 0.


Period three specifically means

f( f( f(x) ) ) = x




xf( f(x) ).

Note that this is a property of x, not a property of f per se. That is, it is a property of f that one such x exists, but it’s not true of all points.

In fact it’s necessarily far from true for all points, which leads to what we mean by chaos.


If f is a continuous function from some interval I to itself, it cannot be the case that all points have period 3.

If one point has period 3, then some other point must have period 4. And another point must have period 97. And some point has period 1776.

If some point in I has period 3, then there are points in I that have period n for all positive n. Buy one, get infinitely many for free.

And there are some points that are not periodic, but every point in I is arbitrarily close to a point that is periodic. That is, the periodic points are dense in the interval. That is what is meant by chaos [2].

This is really an amazing theorem. It says that if there is one point that satisfies a simple condition (period three) then the function as a whole must have very complicated behavior as far as iteration is concerned.


If the existence of a point with period 3 implies the existence of points with every other period, what can you say about, for example, period 10? That’s the subject of the next post.

[1] Li, T.Y.; Yorke, J.A. (1975). “Period Three Implies Chaos” American Mathematical Monthly. 82 (10): 985–92.

[2] There is no single definition of chaos. Some take the existence of dense periodic orbits as the defining characteristic of a chaotic system. See, for example, [3].

[3] Bernd Aulbach and Bernd Kieninger. On Three Definitions of Chaos. Nonlinear Dynamics and Systems Theory, 1(1) (2001) 23–37

Unexpected square wave

Last night a friend from Vanderbilt, Father John Rickert, sent me a curious math problem. (John was a PhD student in the math department while I was a postdoc. He went on to be a Catholic priest after graduating.) He said that if you look at iterates of

f(x) = exp( sin(x) )

the plots become square.

Here’s a plot to start the discussion, looking at f(x), f(f(x)), f(f(f(x))), and f(f(f(f(x)))).

One thing that is apparent is that there’s a big difference between applying f once and applying it at least two times. The former can range between 1/e and e, but the latter must be between exp(1/e) and e.

The plots overlap in a way that’s hard to understand, so let’s spread them out, plotting one iteration at a time.

Now we can see the plots becoming flatter as the number of iterations increases. We can also see that even and odd iterations are roughly mirror images of each other. This suggests we should make a plot of at just even or just odd iterates. Here we go:

This shows more clearly how the plots are squaring off pretty quickly as the number of iterations increases.

My first thought when John showed me his plot was that it must have something to do with the contraction mapping theorem. And I suppose it does, but not in a simple way. The function f(x) is not a contraction mapping for any x. But f(f(x)) is a contraction mapping for some x, and further iterates are contractions for more values of x.

Update: See the next post calculates the fixed points of f(f(x)) and demonstrates how the stable fixed points converge and the unstable fixed point does not.

My second thought was that it would be interesting to look at the Fourier transform of these iterates. For any finite number of iterations, the result is a periodic, analytic function, and so eventually the Fourier coefficients must go to zero rapidly. But I suspect they initially go to zero slowly, like those of a square wave would. I intend to update this post after I’ve had a chance to look at the Fourier coefficients.

Update: As expected, the Fourier coefficients decay slowly. I plotted the Fourier sine coefficients for f(f(f(f(x)))) using Mathematica.

    f[x_] := Exp[Sin[x]]
    g[x_] := f[f[f[f[x]]]]
    ListPlot[ Table[
        NFourierSinCoefficient[g[x], x, i], 
        {i, 30}]

This produced the plot below.

The even-numbered coefficients are zero, but the odd-numbered coefficients are going to zero very slowly. Since the function g(x) is infinitely differentiable, for any k you can go far enough out in the Fourier series that the coefficients go to zero faster than nk. But initially the Fourier coefficients decay like 1/n, which is typical of a discontinuous function, like a square wave, rather than an infinitely differentiable function.

By the way, if you replace sine with cosine, you get similar plots, but with half as many flat spots per period.

Related posts

A brief comment on hysteresis

You might hear hysteresis described as a phenomena where the solution to a differential equation depends on its history. But that doesn’t make sense: the solution to a differential equation always depends on its history. The solution at any point in time depends (only) on its immediately preceding state. You can take the state at any particular time, say position and velocity, as the initial condition for the solution going forward.

But sometimes there’s something about the solution to a differential equation that does not depend on its history. For a driven linear oscillator, the frequency of the solution depends only on the frequency of the driving function. But this is not necessarily true of a nonlinear oscillator. The same driving frequency might correspond to two different solution frequencies, depending on whether the frequency of the driving function increased to its final value or decreased to that value.

Hysteresis is interesting, but it’s more remarkable that linear system does not exhibit hysteresis. Why should the solution frequency depend only on the driving frequency and not on the state of the system?

The future of system with hysteresis still depends only on its immediately preceding state. But there may be a thin set of states that lead to solutions with a given frequency, and the easiest way to arrive in such a set may be to sneak up on it slowly.

Related posts

Behold! The Brusselator!

Having watched a few episodes of Phineas and Ferb, when I see “Brusselator” I imagine Dr. Doofenschmertz saying “Behold! The Brusselator!”

But the Brusselator is considerably older than Phineas and Ferb. It goes back to Belgian scientists René Lefever and Grégoire Nicolis in 1971 [1] who combined “Brussels” and “oscillator” to name the system after their institution, Université Libre de Bruxelles. It’s a dynamical system that has its origins in modeling chemical reactions.

\begin{align*} x' &= A + x^2 y - (B+1)x \\ y' &= Bx - x^2 y \end{align*}

The phase plot below shows that as you start from multiple initial conditions, you always end up on the same limit cycle.

Brusselator phase plot

Here’s the Python code that produced the plot.

    from scipy import linspace
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    A, B = 1, 3

    def brusselator(t, z):
        x, y = z
        return [A + x*x*y - (B+1)*x, B*x - x*x*y]

    a, b = 0, 10
    t = linspace(a, b, 1000)

    for x0 in range(0, 6):
        for y0 in [0, 3]:
            sol = solve_ivp(brusselator, [a, b], [x0, y0], t_eval=t)
            plt.plot(sol.y[0], sol.y[1], ":", color="tab:blue")

More dynamical systems posts

[1] R. Lefever and G. Nicholis. Chemical instabilities and sustained oscillations. Journal of Theoretical Biology. Volume 30, Issue 2, February 1971, Pages 267-284

A different view of the Lorenz system

The Lorenz system is a canonical example of chaos. Small changes in initial conditions eventually lead to huge changes in the solutions.

And yet discussions of the Lorenz system don’t simply show this. Instead, they show trajectories of the system, which make beautiful images, but do not demonstrate the effect of small changes to initial conditions. Or they demonstrate it in two or three dimensions where it’s harder to see.

If you’ve seen the Lorenz system before, this is probably the image that comes to mind.

Lorenz x,z trajectories

This plots (x(t), z(t)) for the solutions to the system of differential equations

x‘= σ(yx)
y‘ = x(ρ – z) – y
z‘ = xy – βz

where σ = 10, ρ = 28, β = 8/3. You could use other values of these parameters, but these were the values originally used by Edward Lorenz in 1963.

The following plots, while not nearly as attractive, are more informative regarding sensitive dependence on initial conditions.

x component for slighly different initial conditions

In this plot, x1 is the x-component of the solution to the Lorenz system with initial condition

(1, 1, 1)

and x2 the x-component corresponding to initial condition

(1, 1, 1.00001).

The top plot is x1 and the bottom plot is x1x2.

Notice first how erratic the x component is. That might not be apparent from looking at plots such as the plot of (x, z) above.

Next, notice that for two solutions that start off slightly different in the z component, the solutions are nearly identical at first: the difference between the two solutions is zero as far as the eye can tell. But soon the difference between the two solutions has about the same magnitude as the solutions themselves.

Below is the Python code used to make the two plots.

    from scipy import linspace
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    def lorenz(t, xyz):
        x, y, z = xyz
        s, r, b = 10, 28, 8/3. # parameters Lorentz used
        return [s*(y-x), x*(r-z) - y, x*y - b*z]

    a, b = 0, 40
    t = linspace(a, b, 4000)

    sol1 = solve_ivp(lorenz, [a, b], [1,1,1], t_eval=t)
    sol2 = solve_ivp(lorenz, [a, b], [1,1,1.00001], t_eval=t)

    plt.plot(sol1.y[0], sol1.y[2])

    plt.plot(sol1.t, sol1.y[0])
    plt.plot(sol1.t, sol1.y[0] - sol2.y[0])
    plt.ylabel("$x_1(t) - x_2(t)$")

One important thing to note about the Lorenz system is that it was not contrived to show chaos. Meteorologist and mathematician Edward Lorenz was lead to the system of differential equations that bears his name in the course of his work modeling weather. Lorenz understandably assumed that small changes in initial conditions would lead to small changes in the solutions until numerical solutions convinced him otherwise. Chaos was a shocking discovery, not his goal.

More dynamical systems posts

Cobweb plots

Cobweb plots are a way of visualizing iterations of a function.

cobweb plot for iterating cosine

For a function f and a starting point x, you plot (x, f(x)) as usual. Then since f(x) will be the next value of x, you convert it to an x by drawing a horizontal line from (x, f(x)) to (f(x), f(x)). In other words, you convert the previous y value to an x value by moving to where a horizontal line intersects the line y = x. Then you go up from the new x to f applied to the new x. The Python code below makes this all explicit.

Update: I made a couple changes after this post was first published. I added the dotted line y = x to the plots, and I changed the aspect ratio from the default to 1 to make the horizontal and vertical scales the same.

    import matplotlib.pyplot as plt
    from scipy import cos, linspace
    def cobweb(f, x0, N, a=0, b=1):
        # plot the function being iterated
        t = linspace(a, b, N)
        plt.plot(t, f(t), 'k')

        # plot the dotted line y = x
        plt.plot(t, t, "k:")
        # plot the iterates
        x, y = x0, f(x0)
        for _ in range(N):
            fy = f(y)        
            plt.plot([x, y], [y,  y], 'b', linewidth=1)
            plt.plot([y, y], [y, fy], 'b', linewidth=1)
            x, y = y, fy

The plot above was made by calling

    cobweb(cos, 1, 20)

to produce the cobweb plot for 20 iterations of cosine starting with x = 1. There’s one fixed point, and the cobweb plot spirals into that fixed point.

Next let’s look at several iterations of the logistic map f(x) = rx(1 – x) for differing values of r.

    # one fixed point
    cobweb(lambda x: 2.9*x*(1-x), 0.1, 100)

    # converging to two-point attractor
    cobweb(lambda x: 3.1*x*(1-x), 0.1, 100)

    # starting exactly on the attractor
    cobweb(lambda x: 3.1*x*(1-x), 0.558, 100)

    # in the chaotic region.
    cobweb(lambda x: 4.0*x*(1-x), 0.1, 100)

The logistic map also has one stable fixed point if r ≤ 3. In the plot below, r = 2.9.

cobweb plot for logistic map, r = 2.9

Next we set r = 3.1. We start at x = 0.1 and converge to the two attractor points.

cobweb plot for logistic map, r = 3.1

If we start exactly on one of the attractor points the cobweb plot is simply a square.

cobweb plot for logistic map, r = 3.1, starting on attractor point

Finally, when we set r = 4 we’re in the chaotic region.

cobweb plot for logistic map, r = 4

More posts on iterated functions

Logistic trajectories

This post is a follow-on to the post on how to make the logistic bifurcation diagram below.

logistic bifurcation diagram

That post plotting the attractors for iterations of

f(x) = r x(1 – x).

This post will plot a few trajectories over time for differing starting points and varying values of r.

For values of r less than 3, the iterations converge to the same point regardless of where they start. The fixed point only depends on r. In the plot below, r = 2.9. We start from two different initial points, x = 0.55 and x = 0.71. Both trajectories oscillate toward a common limiting value. (As explained in the earlier post, the limiting value is (r-1)/r = 0.6786.)

For values of r a little greater than 3, the iterations oscillate between two values. Here we chose r = 3.1, and again we start from x = 0.55 and x = 0.71.

two-point attractor

For some larger value of r the iterations rotate between four values, then eight, etc. But for even larger values of r the trajectories are chaotic. Nearby starting points lead to divergent iterations. In the plot below, r = 3.7, which is in the chaotic region. We start with points close to each other, x = 0.5001 and x = 0.5002.

chaotic trajectories

Notice how the blue crosses and green x’s are on top of each other for the first 30 iterations or so, then they diverge.

These plots were created using the following Python code.

import numpy as np
import matplotlib.pyplot as plt

def f(x, r):
    return r*x*(1-x)

def time_plot(x0, y0, r, name):
    N = 80
    x = np.zeros(N)
    y = np.zeros(N)
    x[0] = x0
    y[0] = y0
    for i in range(1,N):
        x[i] = f(x[i-1], r)
        y[i] = f(y[i-1], r)

    t = np.arange(N)
    plt.plot(t, x, "b+")
    plt.plot(t, y, "gx")
    plt.legend([str(x0), str(y0)])
    plt.title(f"r = {r}")

More on sensitivity