Period of a nonlinear pendulum

The term “nonlinear pendulum” is analogous to a retronym, a new name for an old thing to distinguish it from a new variation. For example, once upon a time a guitar was just a guitar. Now such a guitar is called an acoustic guitar to distinguish it from an electric guitar. Similarly, analog signal processing is a retronym to distinguish what was once the only kind of signal processing from the new arrival, digital signal processing.

The equation of motion for a pendulum is nonlinear. If the initial angle of displacement is sufficiently small, the linearized form of the equation is adequate for most applications. This linearized approximation is better known than the more accurate original equation, and so the un-linearized equation is known as the nonlinear pendulum equation.

The (nonlinear) equation of motion for a pendulum is the differential equation

\theta'' + \frac{g}{\ell}\sin \theta = 0

where g is the acceleration due to gravity and ℓ is the length of the pendulum. For small initial displacement θ0 the linear approximation

\theta'' + \frac{g}{\ell} \theta = 0

works well. The smaller θ0 is the more accurate the linear approximation is.

Linear and nonlinear period

The period of a pendulum obtained by solving the linearized equation is

T = 2\pi \sqrt{\frac{\ell}{g}}

The solution to the nonlinear pendulum equation is also periodic, though the solution is a combination of Jacobi functions rather than a combination of trig functions. The difference between the two solutions is small when θ0 is small, but becomes more significant as θ0 increases.

The difference in the periods is more evident than the difference in shape for the two waves. The period of the nonlinear solution is longer than that of the linearized solution. Here’s a plot of the solutions to the linear and nonlinear equations, with ℓ = g and θ0 = 1.

The period for the nonlinear pendulum is given by

T = 2\pi \sqrt{\frac{\ell}{g}}\, f(\theta_0)

where f is an increasing function, equal to 1 at θ = 0.

The exact form of f involves special functions, and so there is naturally a lot of interest in approximations to f. The exact value is given by

f(\theta) = \frac{2}{\pi}K(\sin(\theta/2)) = \frac{1}{\text{AGM}(1, \cos(\theta/2))}

where K is the “complete elliptic integral of the first kind” and AGM is the arithmetic-geometric mean.

The AGM of two numbers is found by taking their ordinary (arithmetic) mean and geometric mean, then repeating the process. This process converges very rapidly, and so doing one step of the iteration gives a good approximation. If that’s not good enough, doing two steps gives an even better approximation, and so on. In fact, a common approximation for f(θ) is to do half a step, taking the geometric mean of 1 and cos(θ/2), i.e.

f(\theta) \approx \frac{1}{\sqrt{\cos(\theta/2)}}

To see how accurate this approximation is, let’s plot the exact and approximate values of f.

The two curves can hardly be distinguished visually, so let’s look at a plot of their difference.

Here’s the code that produced the two plots above.

from scipy.special import ellipk
from numpy import sin, cos, pi, linspace
import matplotlib.pyplot as plt

def exact(θ):
    return 2*ellipk(sin(θ/2)**2)/pi

def approx(θ):
    return cos(θ/2)**-0.5

t = linspace(0, 1.5)
plt.plot(t, exact(t))
plt.plot(t, approx(t))
plt.legend(["exact", "approx"])

plt.plot(t, exact(t) - approx(t))
plt.ylabel("approximation error")

NB: There are two conventions for defining the complete elliptic integral of the first kind. SciPy uses a convention for K that requires us to square the argument.

Driven oscillations

The differences between the linearized and nonlinear equation become more apparent when there is a forcing function, i.e. when the right-hand side of the differential equations is not zero. Here are the solutions when the forcing function is cos(2t).

Now the solutions not only differ in their period, the shapes of the solutions are substantially different. The linear solutions are well-behaved but the nonlinear solutions can be chaotic with sensitive dependence on initial conditions. This remains true if a damping term is added.

Related posts

Rational solution to Korteweg–De Vries equation

Students seeing differential equations for the first time expect every equation to have a nice closed-form solution, because up to that point in their education nearly every problem they’ve seen has been contrived to have a nice closed-form solution.

Once you resign yourself to the fact that a differential equation will rarely have a closed form solution, it’s a treat when you run across one that does. This is especially true for nonlinear equations.

The Korteweg–De Vries (KdV) equation is

u_t - 6 u\, u_x + u_{xxx} = 0

is such a treat. I wrote a few days ago about the sech² solution to the KdV equation.

u(x,t) = -\frac{v}{2} \,\text{sech}^2\left(\frac{\sqrt{v}}{2} (x - vt - a)\right )

There’s also a rational solution:

u(x, t) = 6 x \frac{ \left(x^3-24 t\right)}{\left(x^3 + 12 t\right)^2}

We can verify this is a solution to the KdV equation reusing the Mathematica code from the earlier post.

    u[x_, t_] := u[x_, t_] := 6 x (x^3 - 24 t)/(x^3 + 12 t)^2
    Simplify[ D[u[x, t], {t, 1}] 
            - 6 u[x, t] D[u[x, t], {x, 1}] 
            + D[u[x, t], {x, 3}] ]

This simplifies to 0.

Here’s a plot:

The top of the plot looks like a two-lane road on top of a mountain ridge, with a sinkhole in the middle of the road.

The “road” is a artifact of plotting. The solution is singular along the curve x³ + 12t= 0, and Mathematica had to chop the top of the graph off because it can’t plot an infinitely tall function.


Solitons and the KdV equation

Rarely does a nonlinear differential equation, especially a nonlinear partial differential equation, have a closed-form solution. But that is the case for the Korteweg–De Vries equation.

(Technically I should say it’s rare for a naturally-occurring nonlinear differential equation to have a closed-form solution. You can always start with a solution and cook up a contrived differential equation that it satisfies, but that differential equation will not be one that is interesting in applications.)

The Korteweg–De Vries (KdV) equation is

u_t - 6 u\, u_x + u_{xxx} = 0

This equation is used to model shallow water waves.

The KdV equation is a third-order PDE, which is unusual but not unheard of. As I wrote about earlier in the context of ODEs, third order linear equations are virtually nonexistent in application, but third order nonlinear equations are not so uncommon.

The function

u(x,t) = -\frac{v}{2} \,\text{sech}^2\left(\frac{\sqrt{v}}{2} (x - vt - a)\right )

is a solution to the KdV equation. It is an example of a class of functions known as solitons.

You can increase the amplitude by increasing v, but when you do you also increase the velocity of the wave. You can’t vary amplitude and velocity independently because the KdV equation is nonlinear.

Verifying by hand that this is a solution is tedious, so I’ll show the verification in Mathematica.

    u[x_, t_] := - (v/2) Sech[Sqrt[v] (x - v t - a)/2]^2
    Simplify[  D[u[x, t], {t, 1}]  
               - 6 u[x, t] D[u[x, t], {x, 1}] 
               + D[u[x, t], {x, 3}]  ]

This returns 0. (Without the Simplify[] command it returns a mess, but the mess simplifies to 0.)

Here’s a plot of the soliton with a = 0 and v = 1.

Here’s a slice with x = 0.

This looks remarkably like the density function of a Gaussian turned upside down. Here’s a plot with the soliton (in blue) with the density of a normal random variable with variance 5/2, scaled to have the same amplitude at 0.

Related posts

When a function cannot be extended

The relation between a function and its power series is subtle. In a calculus class you’ll see equations of the form “series = function” which may need some footnotes. Maybe the series only represents the function over part of its domain: the function extends further than the power series representation.

Starting with the power series, we can ask whether the function it represents extends further than the series representation.

This video does a nice job of explaining why a particular function cannot be extended beyond the disk on which the series converges.

Toward the end, the video explains how its main example is a member of a broader class of functions that have no analytic continuation. The technical term, which the video does not use, is lacunary series [1]. When the gaps in a power series grow faster than linearly, the series cannot be extended beyond its radius of convergence.

Lacunary series make interesting images since the behavior of the function becomes complicated toward the edge of the domain. The video gives some nice examples. The image above comes from this post and the following image comes from this post.

Differential equations

The video mentions Hadamard’s gap theorem. I believe his gap theorem was a spin-off of his work on Laplace’s equation. See this post on Hadamard’s counterexample to the Dirichlet principle for the Laplacian.

The motivation for a LOT of classical math was differential equations. I didn’t realize this as a student. Years later I’d run into something and think “So that is why this person was interested in that problem,” such as why Hadamard would care about whether power series could be extended.

Hadamard wanted to solve a differential equation on a disk with boundary conditions specified on the rim. It’s going to be a problem if the series representation of the solution doesn’t extend to the rim.

Related posts

[1] Lacuna is the Latin word for a hole or a pit. The word came to be use metaphorically for a gap, such as a gap in a manuscript. Later mathematicians used this term for power series with increasing gaps between non-zero terms.

Test functions

Test functions are how you can make sense of functions that aren’t really functions.

The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended real numbers where a function can take on the value ∞.

You can make things like the delta function rigorous by saying they’re not functions of real numbers, but functions that operate on other functions, i.e. test functions. More on that here. These functions acting on test functions are called generalized functions or distributions. (This this post for how this kind of distribution differs from a probability distribution, but is analogous.)

Analogy with test charges

To say rigorously how these generalized functions behave you show how they act on test functions φ. Test functions are analogous to test charges: you can describe an electric field by saying what force it would exert on a test charge.

A test charge is somewhat idealized. It has to be so small that it tests a field without effecting it. This isn’t really possible, but you can think of it as a limit. You’re looking at the limit of the force per unit charge as the charge goes to zero.

Similarly, a test function is ideal in that it very well behaved so the generalized functions that act on it can be badly behaved. Test functions are infinitely differentiable, and they either have compact support or have extremely thin tails, depending on context.

Analogy with category theory

While writing the previous post I thought about an analogy between distribution theory and category theory. I worked with distribution theory a lot in grad school, and I found it natural that the definition of a distribution depended on how it acted in relation to something else, i.e. how it acted on all test functions.

But I found category definitions that involved extraneous objects puzzling. For example, the product of two objects is a third object such that for any fourth object (!) a certain diagram commutes. Why is this superfluous object doing injecting itself into the definition? If I’d thought of it as a test object then I would have found the definition more palatable.

As with distribution theory, you’re defining something by how it relates to all elements of some collection. But in distribution theory, your distributions and your test functions are very distinct things. In category theory, your test objects are peers of the thing you’re testing.

Software and the Allee effect

The Allee effect is named after Warder Clyde Allee who added a term to the famous logistic equation. His added term is highlighted in blue.

\frac{dN}{dt} = r N {\color{red}\left( \frac{N}{A} - 1 \right)} \left( 1 - \frac{N}{K} \right)

Here N is the population of a species over time, r is the intrinsic rate of increase, K is the carrying capacity, and A is the critical point.

If you remove Allee’s term, you get an equation saying that the rate of growth of a population is proportional to the current population size, and so growth starts out exponential, and a term (1 – N/K), which says growth slows down as the population approaches its carrying capacity.

Allee’s term (N/A – 1) says that the rate of growth becomes negative when the population falls below some threshold A. When there are too few individuals, survival becomes more difficult.

Software metaphor

I thought of the Allee effect as a metaphor for software technology after writing my previous post. In general, problems become easier to solve over time. Software development may become harder because the problems developers solve are changing, but solving old problems typically gets easier. Algorithms improve and get wrapped up for convenience. There’s something like logistic growth where tasks get easier to solve, but improvement slows down over time.

If a problem is specialized, it can run into something like the Allee effect. It becomes harder over time because fewer people are interested in it. Software isn’t maintained as fast as it degrades. Fewer people have experience with it. It’s harder to be a COBOL programmer, for example, than it used to be. But this can also apply to much more current problems. A problem that was hot five years ago can be harder to solve now than it was then, for reasons the previous post discusses.

Complex differential equations

Differential equations in the complex plane are different from differential equations on the real line.

Suppose you have an nth order linear differential equation as follows.

y^{(n)} + a_{n-1}y^{(n-1)} + a_{n-2}y^{(n-2)} + \cdots a_0y = 0

The real case

If the a‘s are continuous, real-valued functions on an open interval of the real line, then there are n linearly independent solutions over that interval. The coefficients do not need to be differentiable for the solutions to be differentiable.

The complex case

If the a‘s are continuous, complex-valued functions on an connected open of the real line, then there may not be n linearly independent solutions over that interval.

If the a‘s are all analytic, then there are indeed n independent solutions. But if any one of the a‘s is merely continuous and not analytic, there will be fewer than n independent solutions. There may be as many as n-1 solutions, or there may not be any solutions at all.

Source: A. K. Bose. Linear Differential Equations on the Complex Plane. The American Mathematical Monthly, Vol. 89, No. 4 (Apr., 1982), pp 244–246.

What good is a DE once you have the solution?

In an undergraduate course on differential equations, you see an equation, you find a solution. Wax on, wax off. It would be reasonable to assume that the differential equation is simply an obstacle to overcome and that once you have the solution, the differential equation has done its job and can be discarded.

It would seem backward to ask whether a function satisfies a differential equation. It is backward, but that doesn’t mean it’s a bad idea. It can be very useful to know that a function you’re interested in satisfies a nice differential equation. (It’s trivial to make up differential equations for any function; the key is to discover a nice differential equation, a simple relationship between a function and its derivatives.)

The previous post gave an example of this. There we said that although Halley’s root-finding method converges faster than Newton’s method, it also requires one more function evaluation, namely the second derivative of the function whose root you’re finding. But when that function satisfies a second order differential equation, you can get the second derivative for free once you’ve evaluated the function and its derivative. Well, at a deep discount at least if not free.

In that post we gave the example of a Bessel function. Once you have evaluated a Bessel function and its first derivative at a point. you can calculate the second derivative in much less time that it took to evaluate the Bessel function itself.

Bessel functions are not unique in this regard. Hypergeometric functions are a large and useful class of functions, and these functions all satisfy Riemann’s differential equation.

All the classic orthogonal polynomials—Chebyshev polynomials, Jacobi polynomials, Laguerre polynomials, Hermite polynomials, etc.—satisfy a linear second order differential equation. For example, the Legendre polynomials Pn satisfy

(1 - x^2) y''(x) - 2 x y'(x) + \nu (\nu + 1) y = 0

And its not just Bessel functions, hypergeometric functions, and orthogonal polynomials. Most special functions satisfy a useful differential equation. More specifically, a linear second order differential equation.

So far the only reason given for wanting to know what differential equation a function satisfies is to speed up calculations: evaluate two functions and get a third one for cheap. But there are many other reasons to care what differential equation a function satisfies. For example, a common way to show that two functions are equal is to show that they satisfy the same differential equation with the same initial conditions. This is applying a uniqueness theorem for differential equations, but many other equations about differential equations could be useful to apply.

Related posts

Halley’s variation on Newton’s method

Newton’s method for solving f(x) = 0 converges quickly once it starts to converge. Specifically, it converges quadratically: the error is roughly squared at each step. This means that the number of correct decimal places doubles at each iteration of Newton’s method.

Doubling the number of decimal places is nice, but what if we could triple the number of correct decimals at each step? Edmond Halley, best known for the comet that bears his name, came up with a way to do just that. Newton’s method converges quadratically, but Halley’s method converges cubically.

Newton’s method updates iterations using the rule

x_{n+1} = x_n + \frac{f(x_n)}{f'(x_n)}

Halley’s variation uses the rule

x_{n+1} = x_n - \frac{1}{\dfrac{f'(x_n)}{f(x_n)} - \dfrac{f''(x)}{2f'(x)}}

Halley’s method makes more progress per iteration, but it also takes more work per iteration, and so in practice it’s not usually an improvement over Newton’s method. But in some cases it could be.

Evaluating efficiency

In root-finding problems, the function f is usually relatively time-consuming to evaluate. We can assume nearly all the work in root-finding goes into evaluating f and its derivatives, and we can ignore the rest of the arithmetic in the method.

Newton’s method requires evaluating the function f and its derivative f ′. Halley’s method requires these function evaluates as well and also requires evaluating the second derivative f ′′.

Three iterations of Newton’s method make as much progress toward finding a root as two iterations of Halley’s method. But since Newton’s method requires two function evaluations and Halley’s method requires three, both require six function evaluations to make the same amount of progress. So in general Halley’s method is not an improvement over Newton’s method.

When Halley might be better

Suppose we had a way to quickly compute the second derivative of f at a point from the values of f and its first derivative at that point, rather than by having to evaluate the second derivative explicitly. In that case two iterations of Halley’s method might make as much progress as three iterations of Newton’s method but with less effort, four function evaluations rather than six.

Many of the functions that are used most in applied mathematics satisfy 2nd order differential equations, and so it is common to be able to compute the second derivative from the function and its derivative.

For example, suppose we want to find the zeros of Bessel functions Jν. Bessel functions satisfy Bessel’s differential equation; that’s where Bessel functions get their name.

x^2 y'' + x y' + \left(x^2 - \nu^2 \right) y = 0

This means that once we have evaluated Jν and its first derivative, we can compute the second derivative from these values as follows.

J_{\nu}''(x_n) = \frac{(\nu^2 - x_n^2)J_\nu(x_n) - x_nJ_\nu'(x_n)}{x_n^2}

Maybe we’re not working with a well-studied function like a Bessel function but rather we are numerically solving a differential equation of our own creation and we want to know where the solution is zero. Halley’s method would be a natural fit because our numerical method will give us the values of y and y ′ at each step.

Related posts

Drag equation exponent variation

The motion of a falling body of mass m is given by

m \frac{dv}{dt} = mg - kv^r

where the term −kvr accounts for drag due to air resistance. One can derive r = 2 under simple physical assumptions, but if I remember correctly other values of r may be more realistic in certain circumstances. I don’t know much about the physics here; if you know about the use of other values of r, please let me know by leaving a comment.

Terminal velocity

When r = 1 or r = 2 the differential equation above can be solved in terms of elementary functions, but otherwise it cannot. Nevertheless one can show that for all values of r the object reaches a terminal velocity, and calculate that velocity without explicitly solving the differential equation. William Waterhouse demonstrated this in a one-page article [1]. He rewrites the equation to look at time as a function of velocity rather than velocity as a function of time

\frac{dt}{dv} = \frac{1}{g} \frac{1}{1 - (k/mg)v^r}

and concludes

t = \frac{1}{g} \int_0^v \frac{dv}{1 - (k/mg)v^r} + t_0

He notes that the integral diverges as v approaches


and so that is the terminal velocity, i.e. it takes an infinite amount of time to achieve this velocity. Waterhouse recommends this derivation as “a good example of deriving information about a problem without knowing an explicit solution.”

I would add that such an approach is the norm, not an exception. A closed-form solution to a differential equation in nice when you can get it, but usually not possible. And even when you can find a closed-form solution, you may be able to achieve your goal more directly by not using it.

Hypergeometric solution

I suspected the differential equation could be solved for general values of r using special functions, and that is the case. Mathematica was able to evaluate the integral for t as a function of v in terms of a hypergeometric function.

v \, _2F_1\left(1,\frac{1}{r};1+\frac{1}{r};\frac{g k v^r}{m}\right)

When I asked Mathematica to solve the differential equation directly, it said that the solution is the inverse function of the function above. Apparently Waterhouse and Mathematica agree that it is easier to think of t as a function of v rather than the original formulation.

The notation 2F1 indicates there are two upper parameters and one lower parameter. In our application, the upper parameters are 1 and 1/r, the lower parameter is 1 + 1/r, and the function is evaluated at gkvr/m. You can find a brief introduction to hypergeometric functions here. A hypergeometric function 2F1 has a singularity at 1, and so we could derive the terminal velocity from the explicit solution.

Mathematica implementation

Let c = gk/m. Then we can express velocity as a function of time in Mathematica by

    f[r_, c_, v_] := InverseFunction
        #1 Hypergeometric2F1[1, 1/r, 1 + 1/r, c #1^r] &

and use this to make plots of the velocity for various values of c and r.

The following sets c = 2 and varies r over 1, 1.1, 1.2, … 2.

    Plot[Table[f[2, d/10, t], {d, 10, 20}], {t, 0, 4}, PlotRange -> All]

Here’s the output.

The terminal velocity decreases as r increases. The opposite is true for c < 1.

[1] William C. Waterhouse. A Fact about Falling Bodies. Mathematics Magazine, Vol. 44, No. 1 (Jan., 1971), pp. 33–34. The article straddles two pages, but takes up less than half of each page.