Closed-form solution to the nonlinear pendulum equation

The previous post looks at the nonlinear pendulum equation and what difference it makes to the solutions if you linearize the equation.

If the initial displacement is small enough, you can simply replace sin θ with θ. If the initial displacement is larger, you can improve the accuracy quite a bit by solving the linearized equation and then adjusting the period.

You can also find an exact solution, but not in terms of elementary functions; you have to use Jacobi elliptic functions. These functions are somewhat analogous to trig functions, though the analogies only go so far. For example, the Jacobi function sn behaves like the sine function in some ways but very differently in others, depending on the range of arguments.

We start with the differential equation

θ″(t) + c² sin( θ(t) ) = 0

where c² = g/L, i.e. the acceleration due to gravity divided by the pendulum length, and initial conditions θ(0) = θ0 and θ′(0) = 0. We assume −π < θ0 < π.

Then the solution is

θ(t) = 2 arcsin( a cd(ct | m ) )

where a = sin(θ0/2), m = a², and cd is one of the 12 Jacobi elliptic functions. Note that cd, like all the Jacobi functions, takes an argument and a parameter. In the equation above the argument is ct and the parameter is m.

The last plot in the previous post was misleading, showing roughly equal parts genuine difference and error from solving the differential equation numerically. Here’s the code that was used to solve the nonlinear equation.

from scipy.special import ellipj, ellipk
from numpy import sin, cos, pi, linspace, arcsin
from scipy.integrate import solve_ivp

def exact_period(θ):
    # period of the nonlinear pendulum divided by 2π, in units where c = 1
    return 2*ellipk(sin(θ/2)**2)/pi

def nonlinear_ode(t, z):
    # first-order system equivalent of θ″ + sin θ = 0
    x, y = z
    return [y, -sin(x)]

theta0 = pi/3
b = 2*pi*exact_period(theta0) # one full period
t = linspace(0, 2*b, 2000)

sol = solve_ivp(nonlinear_ode, [0, 2*b], [theta0, 0], t_eval=t)

The solution is contained in sol.y[0].

Let’s compare the numerical solution to the exact solution.

def f(t, c, theta0):
    # exact solution θ(t) = 2 arcsin(a cd(ct | m))
    a = sin(theta0/2)
    m = a**2
    sn, cn, dn, ph = ellipj(c*t, m)
    return 2*arcsin(a*cn/dn) # cd = cn/dn

There are a couple things to note about the code. First, SciPy doesn’t implement the cd function directly, but it can be computed as cn/dn. Second, the function ellipj returns all four values at once because it takes about as much time to compute all four as it does to compute one of them.
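For a quick sanity check, here’s a minimal sketch that measures the gap between the numerical solution above and the exact solution. It assumes the code blocks above have been run, and it passes c = 1 since the ODE code solves θ″ = −sin θ.

import numpy as np

# maximum gap between the numerical and exact solutions on the grid t
print(np.max(np.abs(sol.y[0] - f(t, 1, theta0))))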

Here is a plot of the error in solving the differential equation.

And here is the difference between the exact solution to the nonlinear pendulum equation and the stretched solution to the linear equation.

How nonlinearity affects a pendulum

The equation of motion for a pendulum is the differential equation

\theta'' + \frac{g}{\ell}\sin \theta = 0

where g is the acceleration due to gravity and ℓ is the length of the pendulum. When this is presented in an introductory physics class, the instructor will immediately say something like “we’re only interested in the case where θ is small, so we can rewrite the equation as

\theta'' + \frac{g}{\ell} \theta = 0

Questions

This raises a lot of questions, or at least it should.

  1. Why not leave sin θ alone?
  2. What justifies replacing sin θ with just θ?
  3. How small does θ have to be for this to be OK?
  4. How do the solutions to the exact and approximate equations differ?

First, sine is a nonlinear function, making the differential equation nonlinear. The nonlinear pendulum equation cannot be solved using mathematics that students in an introductory physics class have seen. There is a closed-form solution, but only if you extend “closed-form” to mean more than the elementary functions a student would see in a calculus class.

Second, the approximation is justified because sin θ ≈ θ when θ is small. That’s true, but it’s more subtle than it may appear. Here’s a post unpacking that.

The third question doesn’t have a simple answer, though simple answers are often given. An instructor could make up an answer on the spot and say “less than 10 degrees” or something like that. A more thorough answer requires answering the fourth question.

I addressed how nonlinearity affects the solutions in a post a couple years ago. This post will expand a bit on that one.

Longer period

The primary difference between the nonlinear and linear pendulum equations is that the solutions to the nonlinear equation have longer periods. The solution to the linear equation is a cosine. Solving the equation determines the frequency, amplitude, and phase shift of the cosine, but qualitatively it’s just a cosine. The solution to the corresponding nonlinear equation, with sin θ rather than θ, is not exactly a cosine, but it looks a lot like a cosine, only the period is a little longer [1].

OK, the nonlinear pendulum has a longer period, but how much longer? The period is increased by a factor f(θ0) where θ0 is the initial displacement.

You can find the exact answer in my earlier post. The exact answer depends on a special function called the “complete elliptic integral of the first kind,” but to a good approximation

f(\theta) \approx \frac{1}{\sqrt{\cos(\theta/2)}}

The earlier post compares this approximation to the exact function.

Linear solution with adjusted period

Since the nonlinear pendulum equation is roughly the same as the linear equation with a longer period, you can approximate the solution to the nonlinear equation by solving the linear equation but increasing the period. How good is that approximation?

Let’s do an example with θ0 = 60° = π/3 radians. Then sin θ0 = 0.866 but θ0, in radians, is 1.047, so we definitely can’t say sin θ0 is approximately θ0. To make things simple, let’s set ℓ = g. Also, assume the pendulum starts from rest, i.e. θ'(0) = 0.

Here’s a plot of the solutions to the nonlinear and linear equations.

Obviously the solution to the nonlinear equation has a longer period. In fact it’s 7.32% longer. (The approximation above would have estimated 7.46%.)
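Those percentages are easy to check with SciPy. Here’s a short sketch; note that SciPy’s ellipk takes the parameter m = sin²(θ0/2), not the modulus.

from numpy import pi, sin, cos, sqrt
from scipy.special import ellipk

theta0 = pi/3
exact = 2*ellipk(sin(theta0/2)**2)/pi # exact period stretch factor
approx = 1/sqrt(cos(theta0/2))        # approximation above
print(exact - 1, approx - 1)          # 0.0732... and 0.0746...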

Here’s a plot comparing the solution of the nonlinear equation and the solution to the linear equation with the period stretched by 7.32%.

The solutions differ by less than the width of the plotting line, so the difference is too small to see. But we can see there’s a difference when we subtract the two solutions.

Here’s a plot of the difference between the two solutions.

Update: The plot above is misleading. Part of what it shows is numerical error from solving the pendulum equation. When we redo the plot using the exact solution the error is about half as large. And the error is periodic, as we’d expect. See this post for more on the exact solution using Jacobi functions and the differential equation solver that was used to make the original plot.

Related posts

[1] The period of a pendulum depends on its length ℓ, and so we can think of the nonlinear term as effectively replacing ℓ with a longer effective length ℓeff.

 

Differential equation with a small delay

In grad school I specialized in differential equations, but I never worked with delay-differential equations, equations in which the derivative of a solution depends not only on its current state but also on its state at a previous time. The first time I worked with a delay-differential equation came a couple decades later, when I did some modeling work for a pharmaceutical company.

Large delays can change the qualitative behavior of a differential equation, but it seems plausible that sufficiently small delays should not. This is correct, and we will show just how small “sufficiently small” is in a simple special case. We’ll look at the equation

x′(t) = a x(t) + b x(t − τ)

where the coefficients a and b are non-zero real constants and the delay τ is a positive constant. Then [1] proves that the equation above has the same qualitative behavior as the same equation with the delay removed, i.e. with τ = 0, provided τ is small enough. Here “small enough” means

−1/e < bτ exp(−aτ) < e

and

aτ < 1.

There is a further hypothesis for the theorem cited above, a technical condition on the initial data. The solution to a first order delay-differential equation like the one we’re looking at here is not determined by an initial condition x(0) = x0 alone. We have to specify the solution over the interval [−τ, 0]. This can be any function of t, apart from a technical condition that rules out only a nowhere-dense set of initial functions. See [1] for details.

Example

Let’s look at a specific example,

x′(t) = −3 x(t) + 2 x(t − τ)

with the initial condition x(t) = t for t ≤ 1, so that x(1) = 1. If there were no delay, i.e. if τ = 0, the solution for t ≥ 1 would be x(t) = exp(1 − t). In this case the solution monotonically decays to zero.

The theorem above says we should expect the same behavior as long as

−1/e < 2τ exp(3τ) < e

which holds as long as τ < 0.404218.
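Here’s a quick way to compute that threshold numerically, solving 2τ exp(3τ) = e with SciPy’s brentq root finder.

from numpy import e, exp
from scipy.optimize import brentq

# the largest delay satisfying 2τ exp(3τ) < e
print(brentq(lambda tau: 2*tau*exp(3*tau) - e, 0, 1)) # 0.404218...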

Let’s solve our equation for the case τ = 0.4 using Mathematica.

tau = 0.4
solution = NDSolveValue[
    {x'[t] == -3 x[t] + 2 x[t - tau], x[t /; t <= 1] == t }, 
    x, {t, 0, 10}]
Plot[solution[t], {t, 0, 10}, PlotRange -> All]

This produces the following plot.

The solution initially ramps up to 1, because that’s what we specified, but it seems that eventually the solution monotonically decays to 0, just as when τ = 0.

When we change the delay to τ = 3 and rerun the code we get oscillations.

[1] R. D. Driver, D. W. Sasser, M. L. Slater. The Equation x′(t) = ax(t) + bx(t − τ) with “Small” Delay. The American Mathematical Monthly, Vol. 80, No. 9 (Nov. 1973), pp. 990–995.

Bowie integrator and the nonlinear pendulum

I recently learned of Bowie’s numerical method for solving ordinary differential equations of the form

y″ = f(y)

via Alex Scarazzini’s master’s thesis [1].

The only reference I’ve been able to find for the method, other than [1], is the NASA Orbital Flight Handbook from 1963. The handbook describes the method as “a method employed by C. Bowie and incorporated in many Martin programs” and says nothing more about its origin.

Martin Company

What does it mean by “Martin programs”? The first line of the foreword of the manual says

This handbook has been produced by the Space Systems Division of the Martin Company under Contract NAS8-S03l with the George C. Marshall Space Flight Center of the National Aeronautics and Space Administration.

The Martin Company was the Glenn L. Martin Company, which became Martin Marietta after merging with American-Marietta Corporation in 1961. The handbook was written after the merger but used the older name. Martin Marietta would go on to become Lockheed Martin in 1995.

Bowie’s method was used “in many Martin programs” and yet is practically unknown in academic circles. Scarazzini’s thesis shows the method works well for his problem.

Nonlinear pendulum

My first thought when I saw the form of differential equations Bowie’s method solves was the nonlinear pendulum equation

y″ = − sin(y)

where the initial displacement y(0) is too large for the approximation sin θ ≈ θ to be sufficiently accurate. I wrote some Python code to try out Bowie’s method on this equation.

import numpy as np
from scipy.special import ellipk

N = 100
y  = np.zeros(N)
yp = np.zeros(N) # y'

y[0] = 1
yp[0] = 0

T = 4*ellipk(np.sin(y[0]/2)**2) # exact period of the nonlinear pendulum
h = T/N

f   = lambda x: -np.sin(x)
fp  = lambda x: -np.cos(x) # f'
fpp = lambda x:  np.sin(x) # f''

for n in range(0, N-1):
    y[n+1] = y[n] + h*yp[n] + 0.5*h**2*f(y[n]) + \
              (h**3/6)*fp(y[n])*yp[n] + \
              (h**4/24)*(fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n]))
    yp[n+1] = yp[n] + h*f(y[n]) + 0.5*h**2*fp(y[n])*yp[n] + \
              (h**3/6)*(fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n]))

(Update: The code above was updated January 4, 2026 to fix a bug.)

Here’s a graph of the numerical solution.

The solution looks like a cosine, but it isn’t exactly. As I explain here,

The solution to the nonlinear pendulum equation is also periodic, though the solution is a combination of Jacobi functions rather than a combination of trig functions. The difference between the two solutions is small when θ0 is small, but becomes more significant as θ0 increases.

The difference in the periods is more evident than the difference in shape for the two waves. The period of the nonlinear solution is longer than that of the linearized solution.

That’s why the period T in the code is not

2π ≈ 6.28

but rather

4 K(sin²(θ0/2)) ≈ 6.70.

You’ll also see the period of the nonlinear pendulum given as 4 K(sin(θ0/2)). As pointed out in the article linked above,

There are two conventions for defining the complete elliptic integral of the first kind. SciPy uses a convention for K that requires us to square the argument.
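To make the convention concrete, here’s a quick check with SciPy, whose ellipk takes the parameter m = k² rather than the modulus k.

import numpy as np
from scipy.special import ellipk

theta0 = 1 # initial displacement y(0) = 1 used in the code above
print(4*ellipk(np.sin(theta0/2)**2)) # 6.70...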

Related posts

[1] Alex Scarazzini. 3D Visualization of a Schwarzschild Black Hole Environment. University of Bern. August 2025.

Weak derivatives

There are numerous memes floating around with the words “Being weak is nothing to be ashamed of; staying weak is.” Or some variation. I thought about this meme in the context of weak derivatives.

Being weak is nothing to be ashamed of; staying weak is.

The last couple posts have talked about distributions, also called generalized functions. The delta function, for example, is not actually a function but a generalized function, a linear functional on a space of test functions.

Delta function meme

Distribution theory lets you take derivatives of functions that don’t have a derivative in the classical sense. View the function as a regular distribution, take its derivative in the sense of distributions, and if this derivative is itself a regular distribution, i.e. corresponds to an ordinary function, that function is called the weak derivative of the original function.
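For example, the absolute value function has no classical derivative at 0, but it has a weak derivative, the sign function. The sign function in turn has no weak derivative: its derivative as a distribution is 2δ, which is not a regular distribution.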

You can use distribution theory to complete a space of functions analogous to how the real numbers complete the rational numbers.

To show that an equation has a rational solution, you might first show that it has a real solution, then show that the real solution is in fact a rational. To state the strategy more abstractly, to find a solution in a small space, you first look for solutions in a larger space where solutions are easier to find. Then you see whether the solution you found lies in the smaller space.

This is the modern strategy for studying differential equations. You first show that a differential equation has a solution in a weak sense, then if possible prove a regularity result that shows the solution is a classical solution. There’s no shame in finding a weak solution. But from a classical perspective, there’s shame in stopping there.

Related posts

ODE to Fisher’s transform

I was calculating a correlation coefficient this afternoon and ran into something interesting.

Suppose you have two uncorrelated random variables X and Y. If you draw, say, a thousand samples each from X and Y and compute Pearson’s correlation coefficient, you almost certainly will not get 0, though you very likely will get a small number.

How do you find a confidence interval around a correlation coefficient?

Sample correlation coefficient values do not follow a normal distribution, though the distribution is approximately normal if the population correlation is small. The distribution gets further from normal as the correlation gets close to 1 or −1.

Enter Fisher’s transformation. If you run the sample correlation coefficient r through the function

½ log((1 + r)/(1 − r)) = arctanh(r)

you get something that has a distribution closer to the normal distribution. You find a confidence interval for the transformed variable, then undo the transformation.
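Here’s a minimal sketch of the procedure, using the standard approximation that the transformed variable has standard error 1/√(n − 3); the sample values are made up for illustration.

import numpy as np
from scipy.stats import norm

r, n = 0.35, 1000      # sample correlation and sample size (made up)
z = np.arctanh(r)      # Fisher transform
se = 1/np.sqrt(n - 3)  # approximate standard error on the transformed scale
lo, hi = z + norm.ppf([0.025, 0.975])*se
print(np.tanh([lo, hi])) # 95% confidence interval for the correlation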

Now where did the Fisher transform come from?

I don’t know whether this was Fisher’s derivation, but Hotelling came up with the following derivation. Assume you apply a transform G(r) to the correlation coefficient. Write an asymptotic expansion for the skewness κ3 of G(r) and set the first term equal to zero. This leads to the ordinary differential equation

3(1 − r²) G″(r) − 6r G′(r) = 0

which has the solution G(r) = arctanh(r).
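It’s easy to verify with SymPy that arctanh satisfies the equation above.

from sympy import symbols, atanh, diff, simplify

r = symbols('r')
G = atanh(r)
print(simplify(3*(1 - r**2)*diff(G, r, 2) - 6*r*diff(G, r))) # 0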

I found this interesting because I’ve worked with differential equations and with statistics, but I’ve rarely seen them overlap.

 

Differential equation on a doughnut

Here’s a differential equation from [1] that’s interesting to play with.

\begin{align*} x^\prime &= -my + nxz \\ y^\prime &= \phantom{-} mx + nyz \\ z^\prime &= (n/2)(1 + z^2 - x^2 - y^2) \\ \end{align*}

Even though it’s a nonlinear system, it has a closed-form solution, namely

\begin{align*} x(t) &= \frac{2a\cos(mt) - 2b \sin(mt)}{\Delta - 2c \sin(n t) + (2 - \Delta) \cos(n t)} \\ y(t) &= \frac{2a \sin(mt) + 2b \cos(mt)} {\Delta - 2 c \sin(n t) + (2 - \Delta) \cos(n t)} \\ z(t) &= \frac{2c \cos(nt) + (2 - \Delta) \sin(nt)}{\Delta - 2c \sin(nt) + (2 - \Delta) \cos(n t)} \\ \end{align*}

where (a, b, c) is the solution at t = 0 and Δ = 1 + a² + b² + c².

The solutions lie on a torus (doughnut). If m and n are coprime integers then each solution forms a closed loop. If the ratio m/n is irrational then the solutions are dense on the torus.

Here’s an example with parameters a = 1, b = 1, c = 3, m = 3, and n = 5.

And now with parameters a = 1, b = 1, c = 0.3, m = 4, and n = 5.

And finally with parameters a = 1, b = 1, c = 0.3, m = π, and n = 5.

Note that when m = 2 and n = 3 the trajectory traces out a trefoil knot.
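Here’s a sketch of how such plots can be made, tracing the closed-form solution with Matplotlib. The parameters are the ones from the first example above; since m and n are integers, one period of length 2π closes the loop.

import numpy as np
import matplotlib.pyplot as plt

a, b, c, m, n = 1, 1, 3, 3, 5
Delta = 1 + a**2 + b**2 + c**2

t = np.linspace(0, 2*np.pi, 4000)
denom = Delta - 2*c*np.sin(n*t) + (2 - Delta)*np.cos(n*t)
x = (2*a*np.cos(m*t) - 2*b*np.sin(m*t))/denom
y = (2*a*np.sin(m*t) + 2*b*np.cos(m*t))/denom
z = (2*c*np.cos(n*t) + (2 - Delta)*np.sin(n*t))/denom

ax = plt.figure().add_subplot(projection='3d')
ax.plot(x, y, z)
plt.show()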

Related posts

[1] Richard Parris. A Three-Dimensional System with Knotted Trajectories. The American Mathematical Monthly, Vol. 84, No. 6 (Jun.–Jul. 1977), pp. 468–469.

The biggest math symbol

The biggest math symbol that I can think of is the Riemann P-symbol

w(z)=P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

The symbol is also known as the Papperitz symbol because Erwin Papperitz invented the symbol for expressing solutions to Bernard Riemann’s differential equation.

Before writing out Riemann’s differential equation, we note that the equation has regular singular points at a, b, and c. In fact, that is its defining feature: it is the most general linear second order differential equation with three regular singular points. The parameters a, b, and c enter the equation as roots of expressions in the denominators; that’s as it has to be if these are the singular points.

The way the Greek letter parameters enter Riemann’s equation is more complicated, but there is a good reason for the complication: the notation makes solutions transform as simply as possible under a bilinear (Möbius) transformation. This is important because Möbius transformations are the conformal automorphisms of the Riemann sphere.

To be specific, let

m(z) = \frac{Az + B}{Cz + D}

be a Möbius transformation. Then

P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\} = P \left\{ \begin{matrix} m(a) & m(b) & m(c) & \; \\ \alpha & \beta & \gamma & m(z) \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

Since the parameters on the top row of the P-symbol are the locations of the singularities, transforming a solution moves the singularities, and the parameters on the top row become the new locations of those singularities. Importantly, the rest of the parameters do not change.

Now with the motivation aside, we’ll write out Riemann’s differential equation in all its glory.

\frac{d^2w}{dz^2} + p(z) \frac{dw}{dz} + q(z) \frac{w}{(z-a)(z-b)(z-c)} = 0

where

p(z) = \frac{1-\alpha-\alpha^\prime}{z-a} + \frac{1-\beta-\beta^\prime}{z-b} + \frac{1-\gamma-\gamma^\prime}{z-c}

and

q(z) = \frac{\alpha\alpha^\prime (a-b)(a-c)} {z-a} +\frac{\beta\beta^\prime (b-c)(b-a)} {z-b} +\frac{\gamma\gamma^\prime (c-a)(c-b)} {z-c}

Related posts

Eigenvalues of the Laplacian on a square

What are the solutions to the equation

uxx + uyy + λu = 0

on the unit square with the requirement that u(x, y) = 0 on the boundary?

It’s easy to see that the functions

u(x, y) = sin(mπx) sin(nπy)

are solutions with

λ = (m² + n²)π²

for positive integers m and n. It’s not so obvious that these are the only solutions, but we’ll take that on faith.
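Here’s a quick SymPy check that these functions satisfy the equation with the eigenvalues claimed.

from sympy import symbols, sin, pi, diff, simplify

x, y, m, n = symbols('x y m n')
u = sin(m*pi*x)*sin(n*pi*y)
lam = (m**2 + n**2)*pi**2
print(simplify(diff(u, x, 2) + diff(u, y, 2) + lam*u)) # 0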

The previous post looked at Pólya’s bounds on the eigenvalues of the Laplace operator Δ for a general region D, with only the requirement that copies of D can tile the plane without overlapping. Surely squares satisfy this requirement, so the problem in this post is a special case of the problem in the previous post. So how do they compare?

Pólya gives lower bounds on the kth eigenvalue, so how do we order the numbers of the form (m² + n²)π²?

There’s a theorem for counting the numbers up to x that are a sum of two squares, the Landau-Ramanujan theorem. At first it seems to contradict Pólya’s bounds. That’s because Landau-Ramanujan counts each such number only once, but eigenvalues need to be counted with multiplicity, once for each way the number can be written as a sum of two squares.

For example, 25π² counts as an eigenvalue twice, corresponding to (m, n) = (3, 4) and (4, 3).

About how large is the kth eigenvalue? If non-negative integers m and n satisfy

(m² + n²)π² ≤ x

then (m, n) is inside a circle of radius √x/π. For large x, the number of such pairs is approximately the area of the part of that circle in the first quadrant, which is x / 4π. So the kth eigenvalue is approximately 4πk, which matches Pólya’s lower bound of 4πk for a region of area 1.
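To check the estimate numerically, the following sketch lists the eigenvalues with multiplicity and compares the kth eigenvalue to 4πk.

import numpy as np

M = 200 # large enough for the eigenvalues inspected below
lam = np.sort([(i*i + j*j)*np.pi**2
               for i in range(1, M) for j in range(1, M)])

for k in [10, 100, 1000]:
    print(k, lam[k-1], 4*np.pi*k) # kth eigenvalue and Pólya’s lower bound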

The sound of drums that tile the plane

The vibration of a thin membrane is often modeled by the PDE

Δu + λu = 0

where u is the height of the membrane and Δ is the Laplacian. Solutions only exist for certain values of λ, the eigenvalues of Δ. You could think of u as giving the height of a vibrating drum head and the λs as frequencies of vibration.

The λs depend on the shape of the drumhead D and the boundary conditions. If we clamp down the drumhead at the rim, i.e. specify that u equals 0 on the boundary ∂D, we call this Dirichlet boundary conditions. If the drumhead is free to vibrate at the rim, we do not specify the height on ∂D, but we do specify that the membrane is flat there, i.e. that the normal derivative ∂u/∂n equals 0, and we call this Neumann boundary conditions.

George Pólya [1] gives lower bounds on the λs under Dirichlet boundary conditions and upper bounds on the λs under Neumann boundary conditions. His theorems require that D be bounded and that it is possible to tile the plane with congruent copies of D. For example, D could be a rectangle. Or it could have curved sides, like a figure in an Escher drawing.

Let A be the area of D. Under Dirichlet boundary conditions the kth eigenvalue is bounded below by

λk ≥ 4πk / A.

Under Neumann boundary conditions, the kth eigenvalue is bounded above by

λk ≤ 4π(k − 1) / A.

Update: See the next post for how the theorem in this post compares to the special case of a square.

Related posts

[1] G. Pólya. On the Eigenvalues of Vibrating Membranes. Proceedings of the London Mathematical Society, Volume s3-11, Issue 1, 1961, Pages 419–433.