Bowie integrator and the nonlinear pendulum

I recently learned of Bowie’s numerical method for solving ordinary differential equations of the form

y″ = f(y)

via Alex Scarazzini’s master’s thesis [1].

The only reference I’ve been able to find for the method, other than [1], is the NASA Orbital Flight Handbook from 1963. The handbook describes the method as “a method employed by C. Bowie and incorporated in many Martin programs” and says nothing more about its origin.

Martin Company

What does it mean by “Martin programs”? The first line of the foreword of the manual says

This handbook has been produced by the Space Systems Division of the Martin Company under Contract NAS8-5031 with the George C. Marshall Space Flight Center of the National Aeronautics and Space Administration.

The Martin Company was the Glenn L. Martin Company, which became Martin Marietta after merging with American-Marietta Corporation in 1961. The handbook was written after the merger but used the older name. Martin Marietta would go on to become Lockheed Martin in 1995.

Bowie’s method was used “in many Martin programs” and yet is practically unknown in academic circles. Scarazzini’s thesis shows the method works well for his problem.

Nonlinear pendulum

My first thought when I saw the form of differential equations Bowie’s method solves was the nonlinear pendulum equation

y″ = − sin(y)

where the initial displacement y(0) is too large for the approximation sin θ ≈ θ to be sufficiently accurate. I wrote some Python code to try out Bowie’s method on this equation.

import numpy as np
from scipy.special import ellipk

N = 100
y  = np.zeros(N)
yp = np.zeros(N) # y'

y[0] = 1
yp[0] = 0

T = 4*ellipk(np.sin(y[0]/2)**2) # period of the nonlinear pendulum
h = T/N

f   = lambda x: -np.sin(x)
fp  = lambda x: -np.cos(x) # f'
fpp = lambda x:  np.sin(x) # f''

for n in range(0, N-1):
    y[n+1] = y[n] + h*yp[n] + 0.5*h**2*f(y[n]) + \
              (h**3/6)*fp(y[n])*yp[n] + \
              (h**4/24)*(fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n]))
    yp[n+1] = yp[n] + h*f(y[n]) + 0.5*h**2*fp(y[n])*yp[n] + \
              (h**3/6)*(fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n]))

(Update: The code above was updated January 4, 2026 to fix a bug.)
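As a sanity check — this is my own addition, not part of Bowie’s method or the original program — we can package the update formulas as a step function and compare one full period against SciPy’s adaptive integrator. The helper name `bowie_step` is mine.

```python
import numpy as np
from scipy.special import ellipk
from scipy.integrate import solve_ivp

f   = lambda x: -np.sin(x)
fp  = lambda x: -np.cos(x)
fpp = lambda x:  np.sin(x)

def bowie_step(y, yp, h):
    """One step of the fourth-order Taylor update for y'' = f(y)."""
    ynew = (y + h*yp + 0.5*h**2*f(y)
            + (h**3/6)*fp(y)*yp
            + (h**4/24)*(fpp(y)*yp**2 + fp(y)*f(y)))
    ypnew = (yp + h*f(y) + 0.5*h**2*fp(y)*yp
             + (h**3/6)*(fpp(y)*yp**2 + fp(y)*f(y)))
    return ynew, ypnew

T = 4*ellipk(np.sin(0.5)**2)   # period for y(0) = 1, y'(0) = 0
h = T/100

y, yp = 1.0, 0.0
for _ in range(100):           # integrate over one full period
    y, yp = bowie_step(y, yp, h)

# Reference solution from an adaptive Runge-Kutta integrator
sol = solve_ivp(lambda t, u: [u[1], -np.sin(u[0])], [0, T], [1, 0],
                rtol=1e-10, atol=1e-12)
print(abs(y - sol.y[0, -1]))   # small: the two integrators agree
```

After one period the solution should return close to its initial value y(0) = 1, and it does.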

Here’s a graph of the numerical solution.

The solution looks like a cosine, but it isn’t exactly. As I explain here,

The solution to the nonlinear pendulum equation is also periodic, though the solution is a combination of Jacobi functions rather than a combination of trig functions. The difference between the two solutions is small when θ0 is small, but becomes more significant as θ0 increases.

The difference in the periods is more evident than the difference in shape for the two waves. The period of the nonlinear solution is longer than that of the linearized solution.

That’s why the period T in the code is not

2π ≈ 6.28

but rather

4 K(sin²(θ0/2)) ≈ 6.70.

You’ll also see the period of the nonlinear pendulum given as 4 K(sin θ0/2). As pointed out in the article linked above,

There are two conventions for defining the complete elliptic integral of the first kind. SciPy uses a convention for K that requires us to square the argument.

Related posts

[1] Alex Scarazzini. 3D Visualization of a Schwarzschild Black Hole Environment. Master’s thesis, University of Bern. August 2025.

Weak derivatives

There are numerous memes floating around with the words “Being weak is nothing to be ashamed of; staying weak is.” Or some variation. I thought about this meme in the context of weak derivatives.

Being weak is nothing to be ashamed of; staying weak is.

The last couple posts have talked about distributions, also called generalized functions. The delta function, for example, is not actually a function but a generalized function, a linear functional on a space of test functions.


Distribution theory lets you take derivatives of functions that don’t have a derivative in the classical sense. View the function as a regular distribution, take its derivative as a distribution, and if this derivative is a regular distribution, that function is called a weak derivative of the original function.

You can use distribution theory to complete a space of functions analogous to how the real numbers complete the rational numbers.

To show that an equation has a rational solution, you might first show that it has a real solution, then show that the real solution is in fact a rational. To state the strategy more abstractly, to find a solution in a small space, you first look for solutions in a larger space where solutions are easier to find. Then you see whether the solution you found lies in the smaller space.

This is the modern strategy for studying differential equations. You first show that a differential equation has a solution in a weak sense, then if possible prove a regularity result that shows the solution is a classical solution. There’s no shame in finding a weak solution. But from a classical perspective, there’s shame in stopping there.

Related posts

ODE to Fisher’s transform

I was calculating a correlation coefficient this afternoon and ran into something interesting.

Suppose you have two uncorrelated random variables X and Y. If you draw, say, a thousand samples each from X and Y and compute Pearson’s correlation coefficient, you almost certainly will not get 0, though you very likely will get a small number.

How do you find a confidence interval around a correlation coefficient?

Sample correlation coefficient values do not follow a normal distribution, though the distribution is approximately normal if the population correlation is small. The distribution gets further from normal as the correlation gets close to 1 or −1.

Enter Fisher’s transformation. If you run the sample correlation coefficient r through the function

½ log((1 + r)/(1 − r)) = arctanh(r)

you get something that has a distribution closer to the normal distribution. You find a confidence interval for the transformed variable, then undo the transformation.
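Here is a sketch of that recipe in Python. The 1/√(n − 3) standard error for arctanh(r) is the standard approximation; the function name `fisher_ci` is my own.

```python
import numpy as np

def fisher_ci(r, n, z=1.96):
    """Approximate 95% confidence interval for a correlation
    via Fisher's transform."""
    zr = np.arctanh(r)        # 0.5*log((1+r)/(1-r))
    se = 1/np.sqrt(n - 3)     # approximate standard error of arctanh(r)
    lo, hi = zr - z*se, zr + z*se
    return np.tanh(lo), np.tanh(hi)   # undo the transformation

lo, hi = fisher_ci(0.3, 1000)
print(lo, hi)   # roughly (0.24, 0.36)
```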

Now where did the Fisher transform come from?

I don’t know whether this was Fisher’s derivation, but Hotelling came up with the following derivation. Assume you apply a transform G(r) to the correlation coefficient. Write an asymptotic expansion for the skewness κ3 and set the first term equal to zero. This leads to the ordinary differential equation

3(1 − r²) G″(r) − 6r G′(r) = 0

which has the solution G(r) = arctanh(r).
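We can check symbolically that arctanh satisfies the equation 3(1 − r²) G″ − 6r G′ = 0. The following SymPy snippet is my own verification, not part of Hotelling’s derivation.

```python
import sympy as sp

r = sp.symbols('r')
G = sp.atanh(r)

# Plug G into the left-hand side of the ODE and simplify
expr = 3*(1 - r**2)*sp.diff(G, r, 2) - 6*r*sp.diff(G, r)
print(sp.simplify(expr))   # 0
```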

I found this interesting because I’ve worked with differential equations and with statistics, but I’ve rarely seen them overlap.

 

Differential equation on a doughnut

Here’s a differential equation from [1] that’s interesting to play with.

\begin{align*} x^\prime &= -my + nxz \\ y^\prime &= \phantom{-} mx + nyz \\ z^\prime &= (n/2)(1 + z^2 - x^2 - y^2) \\ \end{align*}

Even though it’s a nonlinear system, it has a closed-form solution, namely

\begin{align*} x(t) &= \frac{2a\cos(mt) - 2b \sin(mt)}{\Delta - 2c \sin(n t) + (2 - \Delta) \cos(n t)} \\ y(t) &= \frac{2a \sin(mt) + 2b \cos(mt)} {\Delta - 2 c \sin(n t) + (2 - \Delta) \cos(n t)} \\ z(t) &= \frac{2c \cos(nt) + (2 - \Delta) \sin(nt)}{\Delta - 2c \sin(nt) + (2 - \Delta) \cos(n t)} \\ \end{align*}

where (a, b, c) is the solution at t = 0 and Δ = 1 + a² + b² + c².

The solutions lie on a torus (doughnut). If m and n are coprime integers then the solutions form a closed loop. If the ratio m/n is not rational then the solutions are dense on the torus.

Here’s an example with parameters a = 1, b = 1, c = 3, m = 3, and n = 5.

And now with parameters a = 1, b = 1, c = 0.3, m = 4, and n = 5.

And finally with parameters a = 1, b = 1, c = 0.3, m = π, and n = 5.

Note that when m = 2 and n = 3 the trajectory traces out a trefoil knot.
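Here’s a numerical spot check — my own code, not from [1] — that the closed-form expressions really do satisfy the system: evaluate the solution at some time t, differentiate it by central differences, and compare with the right-hand side of the ODE.

```python
import numpy as np

a, b, c, m, n = 1.0, 1.0, 3.0, 3.0, 5.0
D = 1 + a**2 + b**2 + c**2   # Delta

def denom(t):
    return D - 2*c*np.sin(n*t) + (2 - D)*np.cos(n*t)

def sol(t):
    """The closed-form solution (x(t), y(t), z(t))."""
    x = (2*a*np.cos(m*t) - 2*b*np.sin(m*t)) / denom(t)
    y = (2*a*np.sin(m*t) + 2*b*np.cos(m*t)) / denom(t)
    z = (2*c*np.cos(n*t) + (2 - D)*np.sin(n*t)) / denom(t)
    return np.array([x, y, z])

def rhs(u):
    """Right-hand side of the differential equation."""
    x, y, z = u
    return np.array([-m*y + n*x*z,
                      m*x + n*y*z,
                     (n/2)*(1 + z**2 - x**2 - y**2)])

t, h = 0.37, 1e-5
deriv = (sol(t + h) - sol(t - h)) / (2*h)   # central difference
print(np.max(np.abs(deriv - rhs(sol(t)))))  # tiny: the formula checks out
```

At t = 0 the denominator equals 2, so the formula returns (a, b, c) as it should.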

Related posts

[1] Richard Parris. A Three-Dimensional System with Knotted Trajectories. The American Mathematical Monthly, Vol. 84, No. 6 (Jun. – Jul., 1977), pp. 468–469

The biggest math symbol

The biggest math symbol that I can think of is the Riemann P-symbol

w(z)=P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

The symbol is also known as the Papperitz symbol because Erwin Papperitz invented it for expressing solutions to Bernhard Riemann’s differential equation.

Before writing out Riemann’s differential equation, we note that the equation has regular singular points at a, b, and c. In fact, that is its defining feature: it is the most general linear second order differential equation with three regular singular points. The parameters a, b, and c enter the equation as roots of expressions in denominators; that’s as it has to be if these are the singular points.

The way the Greek letter parameters enter Riemann’s equation is more complicated, but there is a good reason for the complication: the notation makes solutions transform as simply as possible under a Möbius (bilinear) transformation. This is important because Möbius transformations are the conformal automorphisms of the Riemann sphere.

To be specific, let

m(z) = \frac{Az + B}{Cz + D}

be a Möbius transformation. Then

P \left\{ \begin{matrix} a & b & c & \; \\ \alpha & \beta & \gamma & z \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\} = P \left\{ \begin{matrix} m(a) & m(b) & m(c) & \; \\ \alpha & \beta & \gamma & m(z) \\ \alpha^\prime & \beta^\prime & \gamma^\prime & \; \end{matrix} \right\}

Since the parameters on the top row of the P-symbol are the locations of singularities, when you transform a solution, moving the singularities, the new parameters have to be the new locations of the singularities. And importantly the rest of the parameters do not change.

Now with the motivation aside, we’ll write out Riemann’s differential equation in all its glory.

\frac{d^2w}{dz^2} + p(z) \frac{dw}{dz} + q(z) \frac{w}{(z-a)(z-b)(z-c)} = 0

where

p(z) = \frac{1-\alpha-\alpha^\prime}{z-a} + \frac{1-\beta-\beta^\prime}{z-b} + \frac{1-\gamma-\gamma^\prime}{z-c}

and

q(z) = \frac{\alpha\alpha^\prime (a-b)(a-c)} {z-a} +\frac{\beta\beta^\prime (b-c)(b-a)} {z-b} +\frac{\gamma\gamma^\prime (c-a)(c-b)} {z-c}

Related posts

Eigenvalues of the Laplacian on a square

What are the solutions to the equation

u_xx + u_yy + λu = 0

on the unit square with the requirement that u(x, y) = 0 on the boundary?

It’s easy to see that the functions

u(x, y) = sin(mπx) sin(nπy)

are solutions with

λ = (m² + n²)π²

for positive integers m and n. (If m or n is zero, u vanishes identically.) It’s not so obvious that these are the only solutions, but we’ll take that on faith.

The previous post looked at Pólya’s bounds on the eigenvalues of the Laplace operator Δ for a general region D, with only the requirement that copies of D can tile the plane without overlapping. Surely squares satisfy this requirement, so the problem in this post is a special case of the problem in the previous post. So how do they compare?

Pólya gives lower bounds on the kth eigenvalue, so how do we order the numbers of the form (m² + n²)π²?

There’s a theorem for counting the number of numbers n up to x where n is the sum of two squares, the Landau-Ramanujan theorem. But it seems to contradict Pólya’s bounds. That’s because Landau-Ramanujan only counts each n once, but eigenvalues need to be counted multiple times if a number can be written as a sum of squares multiple ways.

For example, 50π² should count as an eigenvalue three times, corresponding to (m, n) = (1, 7), (7, 1), and (5, 5).

About how large is the kth eigenvalue? If positive integers m and n satisfy

(m² + n²)π² ≤ x

then (m, n) is inside a circle of radius √x/π. For large x, the number of such pairs is approximately the area of a circle of radius √x/π in the first quadrant, which is x/(4π). So the kth eigenvalue is approximately 4πk, which matches Pólya’s lower bound of 4πk for a region of area 1.
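Here’s a quick empirical check of this estimate — my own code, not from Pólya: list the eigenvalues (m² + n²)π² for m, n ≥ 1 in increasing order and compare the kth one with 4πk.

```python
import numpy as np

M = 60   # large enough that the first 500 eigenvalues are not truncated
eigs = np.array(sorted((m*m + n*n)*np.pi**2
                       for m in range(1, M) for n in range(1, M)))[:500]

k = np.arange(1, len(eigs) + 1)
print(bool(np.all(eigs >= 4*np.pi*k)))   # True: Pólya's bound holds
print(eigs[-1] / (4*np.pi*500))          # ratio a bit above 1
```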

The sound of drums that tile the plane

The vibration of a thin membrane is often modeled by the PDE

Δu + λu = 0

where u is the height of the membrane and Δ is the Laplacian. Solutions only exist for certain values of λ, the eigenvalues of Δ. You could think of u as giving the height of a vibrating drum head and the λs as frequencies of vibration.

The λs depend on the shape of the drum head D and the boundary conditions. If we clamp down the drumhead on the rim, i.e. specify that u equals 0 on the boundary ∂D, then we call this Dirichlet boundary conditions. If the drumhead is free to vibrate, i.e. we do not specify the height on ∂D, but we do specify that the membrane is flat on ∂D, i.e. that the normal derivative ∂u/∂n equals 0, then we call this Neumann boundary conditions.

George Pólya [1] gives lower bounds on the λs under Dirichlet boundary conditions and upper bounds on the λs under Neumann boundary conditions. His theorems require that D be bounded and that it is possible to tile the plane with congruent copies of D. For example, D could be a rectangle. Or it could have curved sides, like a figure in an Escher drawing.

Let A be the area of D. Under Dirichlet boundary conditions the kth eigenvalue is bounded below by

λk ≥ 4πk / A.

Under Neumann boundary conditions, the kth eigenvalue is bounded above by

λk ≤ 4π(k − 1) / A.

Update: See the next post for how the theorem in this post compares to the special case of a square.

Related posts

[1] G. Pólya. On the Eigenvalues of Vibrating Membranes. Proceedings of the London Mathematical Society, Volume s3-11, Issue 1, 1961, Pages 419–433.

My favorite paper: H = W

A paper came out in 1964 with the title “H = W.” The remarkably short title was not cryptic, however. The people for whom the paper was written knew exactly what it meant. There were two families of function spaces, one denoted with H and another denoted with W, that were speculated to be the same, and this paper proved that indeed they were.

This post will explain the significance of the paper. It will be more advanced and less self-contained than most of my posts. If you’re still interested, read on.

Definitions

The derivative of a function might exist in a generalized sense when it doesn’t exist in the classical sense. I give an introduction to this idea in the post How to differentiate a non-differentiable function. The generalized derivative is a linear functional on test functions [1] and may not correspond to a classical function. The delta “function” δ(x) is the classic example.

A regular distribution is a distribution whose action on a test function is equal to multiplying the test function by a locally integrable function and integrating.

Given Ω ⊂ ℝn, the Sobolev space Wk,p(Ω) consists of functions whose partial derivatives of order up to k are regular distributions that lie in the space Lp(Ω).

For example, let I be the interval (−1, 1) and let f(x) = |x|. The function f is not differentiable at 0, but the generalized derivative of f is the sign function sgn(x), which is in Lp(I) for all p. The generalized derivative of sgn(x) is 2δ(x), which is not a regular distribution [2], and so f ∈ W1,p(I) but f ∉ W2,p(I).
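The integration-by-parts identity behind this example can be checked symbolically. The snippet below is my own verification that ∫ |x| φ′ dx = −∫ sgn(x) φ dx on (−1, 1) for a particular φ vanishing at ±1, which is what it means for sgn to be the weak derivative of |x|.

```python
import sympy as sp

x = sp.symbols('x')
phi = (1 - x**2)**2 * (1 + x)   # a test function vanishing at x = ±1

# Integral of |x| * phi'(x) over (-1, 1), split at 0 so the
# integrand is polynomial on each piece
lhs = (sp.integrate(x*sp.diff(phi, x), (x, 0, 1))
       + sp.integrate(-x*sp.diff(phi, x), (x, -1, 0)))

# Minus the integral of sgn(x) * phi(x) over (-1, 1)
rhs = -(sp.integrate(phi, (x, 0, 1)) - sp.integrate(phi, (x, -1, 0)))

print(sp.simplify(lhs - rhs))   # 0
```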

The norm on Wk,p(Ω) is the sum of the Lp norms of the function and each of its partial derivatives up to order k.

The Sobolev space Hk,p(Ω) is the closure of the smooth functions with finite Wk,p(Ω) norm, in the norm of the space Wk,p(Ω).

Theorem

It’s not obvious a priori that these two ways of defining a Sobolev space are equivalent, but Norman George Meyers and James Serrin [3] proved in 1964 that for all domains Ω, all non-negative integers k, and all 1 ≤ p < ∞ we have

Hk,p(Ω) = Wk,p(Ω).

The proof is remarkably brief, less than a page.

Significance

Why does this theorem matter? Sobolev spaces are the machinery of the modern theory of differential equations. I spent a lot of time in my mid 20s working with Sobolev spaces.

The grand strategy of PDE research is to first search for generalized solutions to an equation, solutions belonging to a Sobolev space, then if possible prove that the generalized solution is in fact a classical solution.

This is analogous to first proving an algebraic equation has a complex solution, then proving that the complex solution is real, or proving that an equation has a real number solution, then proving that the real solution is in fact an integer. It’s easier to first find a solution in a larger space, then if possible show that the thing you found belongs to a smaller space.

Related posts

[1] A test function in this context is an infinitely differentiable function of compact support. In other contexts a test function is not required to have compact support but is required to approach zero rapidly, faster than the reciprocal of any polynomial.

[2] The classical derivative of sgn(x) is equal to zero almost everywhere. But the derivative as a distribution is not zero. The pointwise derivative may not equal the generalized derivative.

[3] Meyers, Norman G.; Serrin, James (1964). “H = W”. Proceedings of the National Academy of Sciences. 51 (6): 1055–1056.

Fredholm index

The previous post on kernels and cokernels mentioned that for a linear operator T: V → W, the index of T is defined as the difference between the dimension of its kernel and the dimension of its cokernel:

index T = dim ker T − dim coker T.

The index was first called the Fredholm index, because it came up in Fredholm’s investigation of integral equations. (More on this work in the next post.)

Robustness

The index of a linear operator is robust in the following sense. If V and W are Banach spaces and T: V → W is a continuous linear operator, then there is an open set around T in the space of continuous operators from V to W on which the index is constant. In other words, small changes to T don’t change its index.

Small changes to T may alter the dimension of the kernel or the dimension of the cokernel, but they don’t alter their difference.
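A finite-dimensional toy example illustrates this; the code and the helper `index` are mine. For a rank-deficient matrix T: ℝ⁵ → ℝ³, a small perturbation changes the dimensions of the kernel and cokernel, but their difference stays put.

```python
import numpy as np

rng = np.random.default_rng(0)

def index(A):
    """dim ker A - dim coker A for a matrix A: R^n -> R^m."""
    r = np.linalg.matrix_rank(A)
    return (A.shape[1] - r) - (A.shape[0] - r)

T = np.zeros((3, 5))
T[:2, :2] = np.eye(2)   # rank 2: kernel has dim 3, cokernel has dim 1

print(index(T))                                       # 2
print(index(T + 1e-6*rng.standard_normal(T.shape)))   # still 2
```

The perturbed matrix generically has full rank 3, so the kernel shrinks to dimension 2 and the cokernel to dimension 0 — yet the index stays 3 − 1 = 2 − 0 = 2.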

Relation to Fredholm alternative

The next post discusses the Fredholm alternative theorem. It says that if K is a compact linear operator on a Hilbert space and I is the identity operator, then the Fredholm index of IK is zero. The post will explain how this relates to solving linear (integral) equations.

Analogy to Euler characteristic

We can make an exact sequence with the spaces V and W and the kernel and cokernel of T as follows:

0 → ker T → V → W → coker T → 0

All this means is that the image of one map is the kernel of the next.

We can take the alternating sum of the dimensions of the spaces in this sequence:

dim ker T − dim V + dim W − dim coker T.

If V and W have the same finite dimension, then this alternating sum equals the index of T.

The Euler characteristic is also an alternating sum. For a polyhedron, the Euler characteristic is defined by

V − E + F

where V is the number of vertices, E the number of edges, and F the number of faces. We can extend this to higher dimensions as the number of zero-dimensional objects (vertices), minus the number of one-dimensional objects (edges), plus the number of two-dimensional objects, minus the number of three-dimensional objects, etc.

A more sophisticated definition of Euler characteristic is the alternating sum of the dimensions of cohomology spaces. These also form an exact sequence.

The Atiyah-Singer index theorem says that for elliptic operators on manifolds, two kinds of index are equal: the analytical index and the topological index. The analytical index is essentially the Fredholm index. The topological index is derived from topological information about the manifold.

This is analogous to the Gauss-Bonnet theorem that says you can find the Euler characteristic, a topological invariant, by integrating Gauss curvature, an analytic calculation.

Other posts in this series

This is the middle post in a series of three. The first was on kernels and cokernels, and the next is on the Fredholm alternative.

Linear KdV dispersion

The Korteweg–De Vries (KdV) equation

u_t - 6 u\, u_x + u_{xxx} = 0

is a nonlinear PDE used to model shallow water waves. The linear counterpart omits the nonlinear term in the middle.

u_t + u_{xxx} = 0

This variant is useful in itself, but also for understanding the nonlinear KdV equation.

Solitons

Solutions to the linear KdV equation spread out over time. The nonlinear term in the KdV equation counterbalances the dispersion term u_xxx so that solutions to the nonlinear PDE behave in some ways like linear solutions.

Solutions to a nonlinear equation don’t add, and yet they sorta add. Here’s a description from [1].

At the heart of these observations is the discovery that these nonlinear waves can interact strongly and then continue thereafter almost as if there had been no interaction at all. This persistence of the wave led Zabusky and Kruskal to coin the term ‘soliton’ (after photon, proton, etc.) to emphasize the particle-like character of these waves which retain their identities in a collision.

I added the emphasis on almost because many descriptions leave out this important qualifier.

Solution to linear KdV

There is a compact expression for the solution to the linear KdV equation if u, u_x, and u_xx all go to 0 as |x| → ∞. If u(x, 0) = f(x), then the solution is

u(x, t) = (3t)^{-1/3} \int_{-\infty}^\infty f(y) \text{ Ai}\!\left( \frac{x-y}{(3t)^{1/3}} \right) \,dy

Here Ai(z) is the Airy function. This function has come up several times here. For example, I’ve written about the Airy function in the context of triple factorials and in connection with Bessel functions.
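Here’s a small sketch of evaluating the formula above numerically — my own code, with an assumed Gaussian initial profile f(y) = exp(−y²) — using SciPy’s Airy function and quadrature.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import airy

def u(x, t, f=lambda y: np.exp(-y**2)):
    """Evaluate the Airy-integral solution of u_t + u_xxx = 0."""
    s = (3*t)**(1/3)
    integrand = lambda y: f(y) * airy((x - y)/s)[0]   # [0] is Ai
    val, _ = quad(integrand, -10, 10, limit=400)
    return val/s

# For small t the Airy kernel approximates a delta function,
# so the solution should still be close to the initial profile.
print(u(0.0, 1e-3))   # close to f(0) = 1
```

The truncation of the integral to (−10, 10) is harmless here because the Gaussian initial profile decays so quickly.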

Aside on third order equations

Third order differential equations are uncommon in applications, and third order linear ODEs are quite rare. The third order equations that do come up are usually nonlinear PDEs, like the KdV equation. The linear KdV is an example of a linear third order PDE that arises in applications.

 

[1] P. G. Drazin and R. S. Johnson. Solitons: an introduction. Cambridge University Press. 1989.