Sensitive dependence on initial conditions

The following problem illustrates how the smallest changes to a problem can have large consequences. As explained at the end of the post, this problem is a little artificial, but it illustrates difficulties that come up in realistic problems.

Suppose you have a simple differential equation,

y'' - y = 0

where y(t) gives displacement in meters at a time measured in seconds. Assume initial conditions y(0) = 1 and y'(0) = -1.

Then the unique solution is y(t) = exp(-t). The solution is a decreasing, positive solution. But a numerical solution to the equation might turn around and start increasing, or it might go negative.

Suppose the initial conditions are y(0) = 1 as before but now y'(0) = -1 ± ε. You could think of the ε as a tiny measurement error in the initial velocity, or a limitation in representing the initial velocity as a number in a computer.

The following graph shows the solutions corresponding to ε = 10-6, 0, and -10-6.

plot of 3 solutions, epsilon positive, negative, zero

The exact solution is

 y(t) = \left(1 \mp \frac{\varepsilon}{2}\right)\exp(-t) \pm \frac{\varepsilon}{2}\exp(t)

If the initial velocity is exactly -1, then ε = 0 and we only have the solution exp(-t). But if the initial velocity is a little more than -1, there is an exponentially increasing component of the solution that will eventually overtake the exponentially decreasing component. And if the initial velocity is a little less than -1, there is a negative component increasing exponentially in magnitude. Eventually it will overtake the positive component and cause the solution to become negative.

The solution starts to misbehave (i.e. change direction or go negative) at

t = \frac{1}{2} \log\left( \frac{2}{\varepsilon} \mp 1 \right)

If ε is small, 2/ε is much larger than 1, and so the final 1 above makes little difference. So we could say the solution goes bad at approximately

t = \frac{1}{2} \log\left( \frac{2}{\varepsilon} \right)

whether the error is positive or negative. If ε is ±0.000001, for example, the solution goes bad around t = 7.25 seconds. (The term we dropped only effects the 6th decimal place.)

On a small time scale, less than 7.25 seconds, we would not notice the effect of the (initially) small exponentially growing component. But eventually it matters, a lot. We can delay the point where the solution goes bad by controlling the initial velocity more carefully, but this doesn’t buy us much time. If we make ε ten times smaller, we postpone the time when the solution goes bad to 8.41, only a little over a second later.

[Looking at the plot, the solution does not obviously go wrong at t = 7.25. But it does go qualitatively wrong at that point: a solution that should be positive and decreasing becomes either negative or increasing.]

Trying to extend the quality of the solution by making ε smaller is a fool’s errand. Every order of magnitude decrease in ε only prolongs the inevitable by an extra second or so.

This is a somewhat artificial example. The equation and initial conditions were selected to make the sensitive dependence obvious. The sensitive dependence is itself sensitive: small changes to the problem, such as adding a small first derivative term, make it better behaved.

However, sensitive dependence is a practical phenomena. Numerical methods can, for example, pick up an initially small component of an increasing solution while trying to compute a decreasing solution. Sometimes the numerical sensitivity is a reflection of a sensitive physical system. But if the physical system is stable and the numerical solution is not, the problem may not be numerical. The problem may be a bad model. The numerical difficulties may be trying to tell you something, in which case increasing the accuracy of the numerical method may hide a more basic problem.

Related post: Approximating a solution that doesn’t exist

Relating Airy and Bessel functions

The Airy functions Ai(x) and Bi(x) are independent solutions to the differential equation

y'' - xy = 0

For negative x they act something like sin(x) and cos(x). For positive x they act something like exp(x) and exp(-x). This isn’t surprising if you look at the differential equation. If you replace x with a negative constant, you get sines and cosines, and if you replace it with a positive constant, you get positive and negative exponentials.

The Airy functions can be related to Bessel functions as follows:

\mathrm{Ai}(x) = \left\{ 	\begin{array}{ll} 		\frac{1}{3}\sqrt{\phantom{-}x} \left(I_{-1/3}(\hat{x}) - I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br /><br /><br /><br />
                \\<br /><br /><br /><br />
		\frac{1}{3}\sqrt{-x} \left(J_{-1/3}(\hat{x}) + J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 	\end{array} \right.


\mathrm{Bi}(x) = \left\{ 	\begin{array}{ll} 		\sqrt{\phantom{-}x/3} \left(I_{-1/3}(\hat{x}) + I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\<br />                 \\<br /> 		\sqrt{-x/3} \left(J_{-1/3}(\hat{x}) - J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 	\end{array} \right.

Here J is a “Bessel function of the first kind” and I is a “modified Bessel function of the first kind.” Also

\hat{x} = \frac{2}{3} \left(\sqrt{|x|}\right)^3

To verify the equations above, and to show how to compute these functions in Python, here’s some code.

The SciPy function airy computes both functions, and their first derivatives, at once. I assume that’s because it doesn’t take much longer to compute all four functions than to compute one. The code for Ai2 and Bi2 below uses np.where instead of if ... else so that it can operate on NumPy vectors all at once. You can plot Ai and Ai2 and see that the two curves lie on top of each other. The same holds for Bi and Bi2.


from scipy.special import airy, jv, iv
from numpy import sqrt, where

def Ai(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return ai

def Bi(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return bi

def Ai2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        third*sqrt( x)*(iv(-third, hatx) - iv(third, hatx)),
        third*sqrt(-x)*(jv(-third, hatx) + jv(third, hatx)))

def Bi2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        sqrt( x/3.0)*(iv(-third, hatx) + iv(third, hatx)),
        sqrt(-x/3.0)*(jv(-third, hatx) - jv(third, hatx)))


There is a problem with Ai2 and Bi2: they return nan at 0. A more careful implementation would avoid this problem, but that’s not necessary since these functions are only for illustration. In practice, you’d simply use airy and it does the right thing at 0.

Related links:

Diagram of Bessel function relationships
Bessel functions in SciPy

Hilbert space methods for PDE

When I was in grad school, my advisor asked me to study his out-of-print book, Hilbert Space Methods in Partial Differential Equations. I believe I had a photocopy of a photocopy; I don’t recall ever seeing the original book. I pored over that stack of copies line by line while preparing for my qualifying exams.

Then this evening I was browsing a used book store and was shocked to find a copy of the book, a Dover reprint.

It was an odd feeling to find what was once a precious and mysterious book available for $5.99 as part of a rag-tag assortment of mostly elementary/popular used math books.

Related posts:

Odd little bookshops
Nonlinear is not a hypothesis
Mathematical genealogy

Mean residual time

If something has survived this far, how much longer is it expected to survive? That’s the question answered by mean residual time.

For a positive random variable X, the mean residual time for X is a function eX(t) given by

e_X(t) = E(X - t \mid X > t) = \int_t^\infty  \frac{1 - F_X(x)}{1-F_X(t)} \, dx

provided the expectation and integral converge. Here F(t) is the CDF, the probability that X is less than t.

For an exponential distribution, the mean residual time is constant. For a Pareto (power law) distribution, the mean residual time is proportional to t. This has an interesting consequence, known as the Lindy effect.

Now let’s turn things around. Given function a function e(t), can we find a density function for a positive random variable with that mean residual time? Yes.

The equation above yields a differential equation for F, the CDF of the distribution.

If we differentiate both sides of

e(t) (1 - F(t)) = \int_t^\infty 1 - F(x)\, dx

with respect to t and rearrange, we get the first order differential equation

F'(t) + g(t)\, F(t) = g(t)


g(t) = \frac{e'(t) + 1}{e(t)}

The initial condition must be F(0) = 0 because we’re looking for the distribution of a positive random variable, i.e. the probability of X being less than zero must be 0. The solution is then

F(t) = 1 - \frac{e(0)}{e(t)} \exp\left( -\int_0^t \frac{dx}{e(x)} \right)

This means that for a desired mean residual time, you can use the equation above to create a CDF function to match. The derivative of the CDF function gives the PDF function, so differentiate both sides to get the density.

Pulp science fiction and vibrations

This morning I ran across Pulp-o-mizer and decided my series of posts on mechanical vibrations could use a little sensational promotion.

The posts are

  • Part I: Introduction and free, undamped vibrations.
  • Part II: Free, damped vibrations (under-damping, critical damping, over-damping)
  • Part III: Forced, undamped vibrations (beats, resonance)
  • Part IV: Forced, damped vibrations (excitation frequency, response frequency, steady state solutions)

Free damped vibrations

This is the second post in a series on vibrations determine by the equation

m u'' + γ u' + k u = F cos ωt

The first post in the series looked at the simplest case, γ = 0 and F = 0. Now we’ll look at the more realistic and more interesting case of damping γ > 0. We won’t consider forcing yet, so F = 0 and our equation reduces to

m u'' + γ u' + k u = 0.

The effect of damping obviously depends on the amount of damping, i.e. the size of γ relative to other terms. Damping removes energy from the system and so the amplitude of the oscillations goes to zero over time, regardless of the amount of damping. However, the system can have three qualitatively different behaviors: under-damping, critical damping, and over-damping.

The characteristic polynomial of the equation is

m x2 + γ x + k

This polynomial has either two complex roots, one repeated real root, or two real roots depending on whether the discriminant γ2 – 4mk is negative, zero, or positive. These three states correspond to under-damping, critical damping, and over-damping.


When damping is small, the system vibrates at first approximately as if there were no damping, but the amplitude of the solutions decreases exponentially. As long as γ2 < 4mk the system is under-damped and the solution is

u(t) = R exp( –γt/2m ) cos( μt– φ )


μ = √(4mk – γ2) / 2m.

The amplitude R and the phase φ are determined by the initial conditions.

As an example, let γ =1, m = 1, and k = 5. This implies μ =  √19/2. Set the initial conditions so that R = 6 and φ = 1. The solid blue line below is the solution 6 exp(-t/2) cos(t– 1), and the dotted green lines above and below are the exponential envelope 6 exp(-t/2).

And here is a video of the mass-spring-dashpot system in action made using the code discussed here.

Note that the solution is not simply an oscillation at the natural frequency multiplied by an exponential term. The damping changes the frequency of the periodic term, though the change is small when the damping is small.

The factor μ is called the quasi-frequency. Recall from the previous post that the natural frequency, the frequency of the solution with no damping, is denoted ω0. The quasi-frequency is related to the natural frequency by

μ = √(1 – γ2/4mk) ω0 ≈ (1 – γ2/8mk) ω0.

The approximation above holds for small γ since a Taylor expansion shows that √(1 – x) ≈ 1 – x/2 for small x.

This says that the quasi-frequency decreases the natural frequency by a factor of approximately γ2/8mk. So when γ is small there is little change to the natural frequency, but the amount of change grows quadratically as γ increases.

Critical damping

Critical damping occurs when γ2 = 4mk. In this case the solution is given by

u(t) = (A + Bt) exp( – γt/2m).

The position of the mass will asymptotically approach 0. Depending on the initial velocity, it will either go monotonically to zero or reach some maximum displacement before it turns around and goes to 0.

For an example, let m = 1 and k = 5 as before. For critical damping, γ must equal √20. Choose initial conditions so that A = 1 and B = 10.


When γ2 > 4mk the system is over-damped. When damping is that large, the system does not oscillate at all.

The solution is

u(t) = A exp(r1 t) + B exp(r2 t)

where r1 and r2 are the two roots of

m x2 + γ x + k = 0.

Because γ > 0, both roots are negative and solutions decay exponentially to 0.

Coming up

The next two posts in the series will add a forcing term, i.e. F > 0 in our differential equation. The next post will look at undamped driven vibrations, and the final post will look at damped driven vibrations.

Fourier series before Fourier

I always thought that Fourier was the first to come up with the idea of expressing general functions as infinite sums of sines and cosines. Apparently this isn’t true.

The idea that various functions can be described in terms of Fourier series … was for the first time proposed by Daniel Bernoulli (1700–1782) to solve the one-dimensional wave equation (the equation of motion of a string) about 50 years before Fourier. … However, no one contemporaneous to D. Bernoulli accepted the idea as a general method, and soon the study was forgotten.

Source: The Nonlinear World

Perhaps Fourier’s name stuck to the idea because he developed it further than Bernoulli did.

Related posts:

Fourier’s personal heat problem
Generalized Fourier transforms

Mechanical vibrations

My favorite topic in an introductory differential equations course is mechanical and electrical vibrations. I enjoyed learning about it as a student and I enjoyed teaching it later. (Or more accurately, I enjoyed being exposed to it as a student and really learning it later when I had to teach it.)

I find this subject interesting for three reasons.

  1. The same equations describe a variety of mechanical and electrical systems.
  2. You can get practical use out of some relatively simple math.
  3. The solutions display wide variety of behavior as you vary the coefficients.

This is the first of a four-part series of posts on mechanical vibrations. The posts won’t be consecutive: I’ll write about other things in between.

Continue reading

Differential Equations and the City

This afternoon I got a review copy of X and the City: Modeling Aspects of Urban Life by John A. Adam. It’s a book about mathematical model, taking all its examples from urban life: public transportation, growth, pollution, etc. I’ve only skimmed through the book so far, but it looks like most of the applications involve differential equations. Some depend on algebra or probability.

The book looks interesting. I hope to say more about the book once I’ve had a chance to read it. The examples are all short, so it may be any easy book to read a little at a time.

I also got a review copy of The Book of Inkscape today, and I’m expecting several other books soon. It may take a while to get through these since this is a busy time for me. When it rains, it pours.

Castles and quantum mechanics

How are castles and quantum mechanics related? One connection is rook polynomials.

The rook is the chess piece that looks like a castle, and used to be called a castle. It can move vertically or horizontally, any number of spaces.

A rook polynomial is a polynomial whose coefficients give the number of ways rooks can be arranged on a chess board without attacking each other. The coefficient of xk in the polynomial Rm,n(x) is the number of ways you can arrange k rooks on an m by n chessboard such that no two rooks are in the same row or column.

The rook polynomials are related to the Laguerre polynomials by

Rm,n(x) = n! xn Lnm-n(-1/x)

where Lnk(x) is an “associated Laguerre polynomial.” These polynomials satisfy Laguerre’s differential equation

x y” + (n+1-x) y‘ + k y = 0,

an equation that comes up in numerous contexts in physics. In quantum mechanics, these polynomials arise in the solution of the Schrödinger equation for the hydrogen atom.


Relations between special functions

Easiest and hardest classes to teach

I’ve taught a variety of math classes, and statistics has been the hardest to teach. The thing I find most challenging is coming up with homework problems. Most exercises are either blatantly artificial or extremely tedious. It’s hard to find moderately realistic problems that don’t take too long to work out.

The course I’ve found easiest to teach has been differential equations. The course has a flat structure: there’s a list of techniques to cover, all roughly the same level of difficulty. There are no deep analytic or philosophical issues to skirt around as there are in statistics. And it’s not hard to come up with practical applications that can be worked out fairly easily.

Related post:

Impure math

Boundary conditions are the hard part

What we call “differential equations” are usually not just differential equations. They also have associated initial conditions or boundary conditions.

With ordinary differential equations (ODEs), the initial conditions are often an afterthought. First you find a full set of solutions, then you plug in initial conditions to get a specific solution.

Partial differential equations (PDEs) have boundary conditions (and maybe initial conditions too). Since people typically learn ODEs first, they come to PDEs expecting boundary values to play a role analogous to ODEs. In a very limited sense they do, but in general boundary values are quite different.

The hard part about PDEs is not the PDEs themselves; the hard part is the boundary conditions. Finding solutions to differential equations in the interior of a domain is easy compared to making the equations have the specified behavior on the boundary.

No model can take everything into account. You have to draw some box around that part of the world that you’re going to model and specify what happens when your imaginary box meets the rest of the universe. That’s the hard part.

Related posts:

Three views of differential equations
Nonlinear is not a hypothesis
Approximating a solution that does not exist