Period three implies chaos

One of the most famous theorems in chaos theory, maybe the most famous, is that “period three implies chaos.” This compact statement comes from the title of a paper [1] by the same name. But what does it mean?

This post will look at what the statement means, and the next post will look at a generalization.


First of all, the theorem refers to period in the sense of function iterations, not in the sense of translations. And it applies to particular points, not to the function as a whole.

The sine function is periodic in the sense that it doesn’t change if you shift it by 2π. That is,

sin(2π+x) = sin(x)

for all x. The sine function has period 2π in the sense of translations.

In dynamical systems, period refers to getting the same result, not when you shift a function, but when you apply it to itself. A point x has period n under a function f if applying the function n times gives you x back, but applying it any less than n times does not.

So, for example, the function f(x) = –x has period 2 for non-zero x, and period 1 for x = 0.


Period three specifically means

f( f( f(x) ) ) = x




xf( f(x) ).

Note that this is a property of x, not a property of f per se. That is, it is a property of f that one such x exists, but it’s not true of all points.

In fact it’s necessarily far from true for all points, which leads to what we mean by chaos.


If f is a continuous function from some interval I to itself, it cannot be the case that all points have period 3.

If one point has period 3, then some other point must have period 4. And another point must have period 97. And some point has period 1776.

If some point in I has period 3, then there are points in I that have period n for all positive n. Buy one, get infinitely many for free.

And there are some points that are not periodic, but every point in I is arbitrarily close to a point that is periodic. That is, the periodic points are dense in the interval. That is what is meant by chaos [2].

This is really an amazing theorem. It says that if there is one point that satisfies a simple condition (period three) then the function as a whole must have very complicated behavior as far as iteration is concerned.


If the existence of a point with period 3 implies the existence of points with every other period, what can you say about, for example, period 10? That’s the subject of the next post.

More posts on dynamical systems and chaos.

[1] Li, T.Y.; Yorke, J.A. (1975). “Period Three Implies Chaos” American Mathematical Monthly. 82 (10): 985–92.

[2] There is no single definition of chaos. Some take the existence of dense periodic orbits as the defining characteristic of a chaotic system. See, for example, [3].

[3] Bernd Aulbach and Bernd Kieninger. On Three Definitions of Chaos. Nonlinear Dynamics and Systems Theory, 1(1) (2001) 23–37

Unexpected square wave

Last night a friend from Vanderbilt, Father John Rickert, sent me a curious math problem. (John was a PhD student in the math department while I was a postdoc. He went on to become a Catholic priest after graduating.) He said that if you look at iterates of

f(x) = exp( sin(x) )

the plots become square.

Here’s a plot to start the discussion, looking at f(x), f(f(x)), f(f(f(x))), and f(f(f(f(x)))).

One thing that is apparent is that there’s a big difference between applying f once and applying it at least two times. The former can range between 1/e and e, but the latter must be between exp(1/e) and e.

The plots overlap in a way that’s hard to understand, so let’s spread them out, plotting one iteration at a time.

Now we can see the plots becoming flatter as the number of iterations increases. We can also see that even and odd iterations are roughly mirror images of each other. This suggests we should make a plot of at just even or just odd iterates. Here we go:

This shows more clearly how the plots are squaring off pretty quickly as the number of iterations increases.

My first thought when John showed me his plot was that it must have something to do with the contraction mapping theorem. And I suppose it does, but not in a simple way. The function f(x) is not a contraction mapping for any x. But f(f(x)) is a contraction mapping for some x, and further iterates are contractions for more values of x.

Update: See the next post calculates the fixed points of f(f(x)) and demonstrates how the stable fixed points converge and the unstable fixed point does not.

My second thought was that it would be interesting to look at the Fourier transform of these iterates. For any finite number of iterations, the result is a periodic, analytic function, and so eventually the Fourier coefficients must go to zero rapidly. But I suspect they initially go to zero slowly, like those of a square wave would. I intend to update this post after I’ve had a chance to look at the Fourier coefficients.

Update: As expected, the Fourier coefficients decay slowly. I plotted the Fourier sine coefficients for f(f(f(f(x)))) using Mathematica.

    f[x_] := Exp[Sin[x]]
    g[x_] := f[f[f[f[x]]]]
    ListPlot[ Table[
        NFourierSinCoefficient[g[x], x, i], 
        {i, 30}]

This produced the plot below.

The even-numbered coefficients are zero, but the odd-numbered coefficients are going to zero very slowly. Since the function g(x) is infinitely differentiable, for any k you can go far enough out in the Fourier series that the coefficients go to zero faster than nk. But initially the Fourier coefficients decay like 1/n, which is typical of a discontinuous function, like a square wave, rather than an infinitely differentiable function.

By the way, if you replace sine with cosine, you get similar plots, but with half as many flat spots per period.

Related posts

A brief comment on hysteresis

You might hear hysteresis described as a phenomena where the solution to a differential equation depends on its history. But that doesn’t make sense: the solution to a differential equation always depends on its history. The solution at any point in time depends (only) on its immediately preceding state. You can take the state at any particular time, say position and velocity, as the initial condition for the solution going forward.

But sometimes there’s something about the solution to a differential equation that does not depend on its history. For a driven linear oscillator, the frequency of the solution depends only on the frequency of the driving function. But this is not necessarily true of a nonlinear oscillator. The same driving frequency might correspond to two different solution frequencies, depending on whether the frequency of the driving function increased to its final value or decreased to that value.

Hysteresis is interesting, but it’s more remarkable that linear system does not exhibit hysteresis. Why should the solution frequency depend only on the driving frequency and not on the state of the system?

The future of system with hysteresis still depends only on its immediately preceding state. But there may be a thin set of states that lead to solutions with a given frequency, and the easiest way to arrive in such a set may be to sneak up on it slowly.

Related posts

Behold! The Brusselator!

Having watched a few episodes of Phineas and Ferb, when I see “Brusselator” I imagine Dr. Doofenschmertz saying “Behold! The Brusselator!”

But the Brusselator is considerably older than Phineas and Ferb. It goes back to Belgian scientists René Lefever and Grégoire Nicolis in 1971 [1] who combined “Brussels” and “oscillator” to name the system after their institution, Université Libre de Bruxelles. It’s a dynamical system that has its origins in modeling chemical reactions.

\begin{align*} x' &= A + x^2 y - (B+1)x \\ y' &= Bx - x^2 y \end{align*}

The phase plot below shows that as you start from multiple initial conditions, you always end up on the same limit cycle.

Brusselator phase plot

Here’s the Python code that produced the plot.

    from scipy import linspace
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    A, B = 1, 3

    def brusselator(t, z):
        x, y = z
        return [A + x*x*y - (B+1)*x, B*x - x*x*y]

    a, b = 0, 10
    t = linspace(a, b, 1000)

    for x0 in range(0, 6):
        for y0 in [0, 3]:
            sol = solve_ivp(brusselator, [a, b], [x0, y0], t_eval=t)
            plt.plot(sol.y[0], sol.y[1], ":", color="tab:blue")

More dynamical systems posts

[1] R. Lefever and G. Nicholis. Chemical instabilities and sustained oscillations. Journal of Theoretical Biology. Volume 30, Issue 2, February 1971, Pages 267-284

A different view of the Lorenz system

The Lorenz system is a canonical example of chaos. Small changes in initial conditions eventually lead to huge changes in the solutions.

And yet discussions of the Lorenz system don’t simply show this. Instead, they show trajectories of the system, which make beautiful images, but do not demonstrate the effect of small changes to initial conditions. Or they demonstrate it in two or three dimensions where it’s harder to see.

If you’ve seen the Lorenz system before, this is probably the image that comes to mind.

Lorenz x,z trajectories

This plots (x(t), z(t)) for the solutions to the system of differential equations

x‘= σ(yx)
y‘ = x(ρ – z) – y
z‘ = xy – βz

where σ = 10, ρ = 28, β = 8/3. You could use other values of these parameters, but these were the values originally used by Edward Lorenz in 1963.

The following plots, while not nearly as attractive, are more informative regarding sensitive dependence on initial conditions.

x component for slighly different initial conditions

In this plot, x1 is the x-component of the solution to the Lorenz system with initial condition

(1, 1, 1)

and x2 the x-component corresponding to initial condition

(1, 1, 1.00001).

The top plot is x1 and the bottom plot is x1x2.

Notice first how erratic the x component is. That might not be apparent from looking at plots such as the plot of (x, z) above.

Next, notice that for two solutions that start off slightly different in the z component, the solutions are nearly identical at first: the difference between the two solutions is zero as far as the eye can tell. But soon the difference between the two solutions has about the same magnitude as the solutions themselves.

Below is the Python code used to make the two plots.

    from scipy import linspace
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    def lorenz(t, xyz):
        x, y, z = xyz
        s, r, b = 10, 28, 8/3. # parameters Lorentz used
        return [s*(y-x), x*(r-z) - y, x*y - b*z]

    a, b = 0, 40
    t = linspace(a, b, 4000)

    sol1 = solve_ivp(lorenz, [a, b], [1,1,1], t_eval=t)
    sol2 = solve_ivp(lorenz, [a, b], [1,1,1.00001], t_eval=t)

    plt.plot(sol1.y[0], sol1.y[2])

    plt.plot(sol1.t, sol1.y[0])
    plt.plot(sol1.t, sol1.y[0] - sol2.y[0])
    plt.ylabel("$x_1(t) - x_2(t)$")

One important thing to note about the Lorenz system is that it was not contrived to show chaos. Meteorologist and mathematician Edward Lorenz was lead to the system of differential equations that bears his name in the course of his work modeling weather. Lorenz understandably assumed that small changes in initial conditions would lead to small changes in the solutions until numerical solutions convinced him otherwise. Chaos was a shocking discovery, not his goal.

More dynamical systems posts

Cobweb plots

Cobweb plots are a way of visualizing iterations of a function.

cobweb plot for iterating cosine

For a function f and a starting point x, you plot (x, f(x)) as usual. Then since f(x) will be the next value of x, you convert it to an x by drawing a horizontal line from (x, f(x)) to (f(x), f(x)). In other words, you convert the previous y value to an x value by moving to where a horizontal line intersects the line y = x. Then you go up from the new x to f applied to the new x. The Python code below makes this all explicit.

Update: I made a couple changes after this post was first published. I added the dotted line y = x to the plots, and I changed the aspect ratio from the default to 1 to make the horizontal and vertical scales the same.

    import matplotlib.pyplot as plt
    from scipy import cos, linspace
    def cobweb(f, x0, N, a=0, b=1):
        # plot the function being iterated
        t = linspace(a, b, N)
        plt.plot(t, f(t), 'k')

        # plot the dotted line y = x
        plt.plot(t, t, "k:")
        # plot the iterates
        x, y = x0, f(x0)
        for _ in range(N):
            fy = f(y)        
            plt.plot([x, y], [y,  y], 'b', linewidth=1)
            plt.plot([y, y], [y, fy], 'b', linewidth=1)
            x, y = y, fy

The plot above was made by calling

    cobweb(cos, 1, 20)

to produce the cobweb plot for 20 iterations of cosine starting with x = 1. There’s one fixed point, and the cobweb plot spirals into that fixed point.

Next let’s look at several iterations of the logistic map f(x) = rx(1 – x) for differing values of r.

    # one fixed point
    cobweb(lambda x: 2.9*x*(1-x), 0.1, 100)

    # converging to two-point attractor
    cobweb(lambda x: 3.1*x*(1-x), 0.1, 100)

    # starting exactly on the attractor
    cobweb(lambda x: 3.1*x*(1-x), 0.558, 100)

    # in the chaotic region.
    cobweb(lambda x: 4.0*x*(1-x), 0.1, 100)

The logistic map also has one stable fixed point if r ≤ 3. In the plot below, r = 2.9.

cobweb plot for logistic map, r = 2.9

Next we set r = 3.1. We start at x = 0.1 and converge to the two attractor points.

cobweb plot for logistic map, r = 3.1

If we start exactly on one of the attractor points the cobweb plot is simply a square.

cobweb plot for logistic map, r = 3.1, starting on attractor point

Finally, when we set r = 4 we’re in the chaotic region.

cobweb plot for logistic map, r = 4

More posts on iterated functions

Logistic trajectories

This post is a follow-on to the post on how to make the logistic bifurcation diagram below.

logistic bifurcation diagram

That post plotting the attractors for iterations of

f(x) = r x(1 – x).

This post will plot a few trajectories over time for differing starting points and varying values of r.

For values of r less than 3, the iterations converge to the same point regardless of where they start. The fixed point only depends on r. In the plot below, r = 2.9. We start from two different initial points, x = 0.55 and x = 0.71. Both trajectories oscillate toward a common limiting value. (As explained in the earlier post, the limiting value is (r-1)/r = 0.6786.)

For values of r a little greater than 3, the iterations oscillate between two values. Here we chose r = 3.1, and again we start from x = 0.55 and x = 0.71.

two-point attractor

For some larger value of r the iterations rotate between four values, then eight, etc. But for even larger values of r the trajectories are chaotic. Nearby starting points lead to divergent iterations. In the plot below, r = 3.7, which is in the chaotic region. We start with points close to each other, x = 0.5001 and x = 0.5002.

chaotic trajectories

Notice how the blue crosses and green x’s are on top of each other for the first 30 iterations or so, then they diverge.

These plots were created using the following Python code.

import numpy as np
import matplotlib.pyplot as plt

def f(x, r):
    return r*x*(1-x)

def time_plot(x0, y0, r, name):
    N = 80
    x = np.zeros(N)
    y = np.zeros(N)
    x[0] = x0
    y[0] = y0
    for i in range(1,N):
        x[i] = f(x[i-1], r)
        y[i] = f(y[i-1], r)

    t = np.arange(N)
    plt.plot(t, x, "b+")
    plt.plot(t, y, "gx")
    plt.legend([str(x0), str(y0)])
    plt.title(f"r = {r}")

More on sensitivity

Logistic bifurcation diagram in detail

The image below is famous. I’ve seen it many times, but rarely with a detailed explanation. If you’ve ever seen this image but weren’t sure exactly what it means, this post is for you.

logistic bifurcation diagram

This complex image comes from iterating a simple function

f(x) = r x(1 – x)

known as the logistic map. The iterates of the function can behave differently for different values of the parameter r.

We’ll start by fixing a value of r, with 0 ≤ r < 1. For any starting point 0 ≤ x ≤ 1, f(x) is going to be smaller than x by at least a factor of r, i.e.

0 ≤ f(x) ≤ rx.

Every time we apply f the result decreases by a factor of r.

0 ≤ f( f(x) ) ≤ r²x.

As we do apply f more and more times, the result converges to 0.

For r ≥ 1 it’s not as clear what will happen as we iterate f, so let’s look at a little bit of code to help see what’s going on.

    def f(x, r):
        return r*x*(1-x)

    def iter(x, r, n):
        for _ in range(n):
            x = f(x, r)
        return x

We can see that for 1 ≤ r ≤ 3, and 0 ≤ x ≤ 1, the iterations of f converge to a fixed point, and the value of that fixed point increases with r.

    >>> iter(0.1, 1.1, 100)

    # more iterations have little effect
    >>> iter(0.1, 1.1, 200)

    # different starting point has little effect
    >>> iter(0.2, 1.1, 200)

    # increasing r increases the fixed point
    >>> iter(0.2, 1.2, 200)

Incidentally, for 1 ≤ r ≤ 3, the fixed point equals (r – 1)/r. [1]

When r is a little bigger than 3, things get more interesting. Instead of a single fixed point, the iterates alternate between a pair of points, an attractor consisting of two points.

    >>> iter(0.2, 3.1, 200) 
    >>> iter(0.2, 3.1, 201) 
    >>> iter(0.2, 3.1, 202) 

How can we write code to detect an attractor? We can look at the set of points when we get, say when we iterate 1000, then 1001, etc. up to 1009 times. If this set has less than 10 elements, the iterates must have returned to one of the values. We’ll round the elements in our set to four digits so we actually get repeated values, not just values that differ by an extremely small amount.

    def attractors(x, r):
        return {round(iter(x, r, n), 4) for n in range(1000,1010)}

This is crude but it’ll do for now. We’ll make it more efficient, and we’ll handle the possibility of more than 10 points in an attractor.

Somewhere around r = 3.5 we get not just two points but four points.

    >>> attractors(0.1, 3.5)
    {0.3828, 0.5009, 0.8269, 0.875}

As r gets larger the number of points keeps doubling [2] until chaos sets in.

The function attractors is simple but not very efficient. After doing 1000 iterations, it starts over from scratch to do 1001 iterations etc. And it assumes there are no more than 10 fixed points. The following revision speeds things up significantly.

    def attractors2(x, r):
        x = iter(x, r, 100)
        x0 = round(x, 4)
        ts = {x0}
        for c in range(1000):
            x = f(x, r)
            xr = round(x, 4)
            if xr in ts:
                return ts

Notice we put a cap of 1000 on the number of points in the attractor for a given r. For some values of r and x, there is no finite set of attracting points.

Finally, here’s the code that produced the image above.

    import numpy as np
    import matplotlib.pyplot as plt
    rs = np.linspace(0, 4, 1000)
    for r in rs:
        ts = attractors2(0.1, r)
        for t in ts:
            plt.plot(r, t, "ko", markersize=1)

Update: See this post for graphs showing the trajectory of points over time for varying values of r.

More dynamical system posts

[1] Why is this only for 1 ≤ r ≤ 3? Shouldn’t (r – 1)/r be a fixed point for larger r as well? It is, but it’s not a stable fixed point. If x is ever the slightest bit different from (r – 1)/r the iterates will diverge from this point. This post has glossed over some fine points, such as what could happen on a set of measure zero for r > 3.

[2] The spacing between bifurcations decreases roughly geometrically until the system becomes chaotic. The ratio of one spacing to the next reaches a limit known as Feigenbaum’s constant, approximately 4.669. Playing with the code in this post and trying to estimate this constant directly will not get you very far. Feigenbaum’s constant has been computed to many decimal places, but by using indirect methods.

Calculating the period of Van der Pol oscillators

A few days ago I wrote about how to solve differential equations with SciPy’s ivp_solve function using Van der Pol’s equation as the example. Van der Pol’s equation is

{d^2x \over dt^2}-\mu(1-x^2){dx \over dt}+x= 0

The parameter μ controls the amount of nonlinear damping. For any initial condition, the solution approach a periodic solution. The limiting periodic function does not depend on the initial condition [1] but does depend on μ. Here are the plots for μ  = 0, 1, and 2 from the previous post.

Van der Pol oscillator solutions as a function of time

A couple questions come to mind. First, how quickly do the solutions become periodic? Second, how does the period depend on μ? To address these questions, we’ll use an optional argument to ivp_solve we didn’t need in the earlier post.

Using events in ivp_solve

For ivp_solve an event is a function of the time t and the solution y whose roots the solver will report. To determine the period, we’ll look at where the solution is zero; our event function is trivial since we want to find the roots of the solution itself.

Recall from the earlier post that we cast our second order ODE as a pair of first order ODEs, and so our solution is a vector, the function x and its derivative. So to find roots of the solution, we look at what the solver sees as the first component of the solver. So here’s our event function:

    def root(t, y): return y[0]

Let’s set μ = 2 and find the zeros of the solution over the interval [0, 40], starting from the initial condition x(0) = 1, x‘(0) = 0.

    mu = 2
    sol = solve_ivp(vdp, [0, 40], [1, 0], events=root)
    zeros = sol.t_events[0]

Here we reuse the vdp function from the previous post about the Van der Pol oscillator.

To estimate the period of the limit cycle we look at the spacing between zeros, and how that spacing is changing.

    spacing = zeros[1:] - zeros[:-1]
    deltas = spacing[1:] - spacing[:-1]

If we plot the deltas we see that the zero spacings quickly approach a constant value. Zero crossings are half periods, so the period of the limit cycle is twice the limiting spacing between zeros.

Van der pol period deltas

Theoretical results

If μ = 0 the Van der Pol oscillator reduces to a simple harmonic oscillator and the period is 2π. As μ increases, the period increases.

For relatively small μ we can calculate the period as above, but as μ increases this becomes more difficult numerically [2]. But one can easily show that the period is asymptotically

T ~ (3 – 2 log 2) μ

as μ goes to infinity. A more refined estimate due to Mary Cartwright is

T ~ (3 – 2 log 2) μ + 2π/μ1/3

for large μ.

More VdP posts

[1] There is a trivial solution, x = 0, corresponding to the initial conditions x(0) = x‘(0) = 0. Otherwise, every set of initial conditions leads to a solution that converges to the periodic attractor.

[2] To see why large values of μ are a problem numerically, here’s a plot of a solution for μ = 100.

Solution to Van der Pol for large damping parameter mu

The solution is differentiable everywhere, but the derivative changes so abruptly at the maxima and minima that it is discontinuous for practical purposes.

Solving Van der Pol equation with ivp_solve

Van der Pol’s differential equation is

{d^2x \over dt^2}-\mu(1-x^2){dx \over dt}+x= 0

The equation describes a system with nonlinear damping, the degree of nonlinearity given by μ. If μ = 0 the system is linear and undamped, but as μ increases the strength of the nonlinearity increases. We will plot the phase portrait for the solution to Van der Pol’s equation in Python using SciPy’s new ODE solver ivp_solve.

The function ivp_solve does not solve second-order systems of equations directly. It solves systems of first-order equations, but a second-order differential equation can be recast as a pair of first-order equations by introducing the first derivative as a new variable.

\begin{align*} {dx \over dt} &= y \\ {dy \over dt}&= \mu(1-x^2)y -x \\ \end{align*}

Since y is the derivative of x, the phase portrait is just the plot of (x, y).

Phase portait of Van der Pol oscillator

If μ = 0, we have a simple harmonic oscillator and the phase portrait is simply a circle. For larger values of μ the solutions enter limiting cycles, but the cycles are more complicated than just circles. These limiting cycles are periodic attractors: every non-trivial solution converges to the limit cycle.

Here’s the Python code that made the plot.

from scipy import linspace
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt

def vdp(t, z):
    x, y = z
    return [y, mu*(1 - x**2)*y - x]

a, b = 0, 10

mus = [0, 1, 2]
styles = ["-", "--", ":"]
t = linspace(a, b, 500)

for mu, style in zip(mus, styles):
    sol = solve_ivp(vdp, [a, b], [1, 0], t_eval=t)
    plt.plot(sol.y[0], sol.y[1], style)
# make a little extra horizontal room for legend
plt.legend([f"$\mu={m}$" for m in mus])

To plot the solutions as a function of time, rather than plotting phase portraits, change the line

    plt.plot(sol.y[0], sol.y[1], style)


    plt.plot(sol.t, sol.y[0], style)

and comment out the line setting xlim This gives the following plot.

Van der Pol oscillator solutions as a function of time

More dynamical system posts