Closed-form solutions to nonlinear PDEs

The traditional approach to teaching differential equations is to present a collection of techniques for finding closed-form solutions to ordinary differential equations (ODEs). These techniques seem completely unrelated [1] and have arcane names such as integrating factors, exact equations, variation of parameters, etc.

Students may reasonably come away from an introductory course with the false impression that it is common for ODEs to have closed-form solutions because it is common in the class.

My education reacted against this. We were told from the beginning that differential equations rarely have closed-form solutions and that therefore we wouldn’t waste time learning how to find such solutions. I didn’t learn the classical solution techniques until years later when I taught an ODE class as a postdoc.

I also came away with a false impression, the idea that differential equations almost never have closed-form solutions in practice, especially nonlinear equations, and above all partial differential equations (PDEs). This isn’t far from the truth, but it is an exaggeration.

I specialized in nonlinear PDEs in grad school, and I don’t recall ever seeing a closed-form solution. I heard rumors of a nonlinear PDE with a closed form solution, the KdV equation, but I saw this as the exception that proves the rule. It was the only nonlinear PDE of practical importance with a closed-form solution, or so I thought.

It is unusual for a nonlinear PDE to have a closed-form solution, but it is not unheard of. There are numerous examples of nonlinear PDEs, equations with important physical applications, that have closed-form solutions.

Yesterday I received a review copy of Analytical Methods for Solving Nonlinear Partial Differential Equations by Daniel Arrigo. If I had run across a book by that title as a grad student, it would have sounded as eldritch as a book on the biology of Bigfoot or the geography of Atlantis.

A few pages into the book there are nine exercises asking the reader to verify closed-form solutions to nonlinear PDEs:

  1. a nonlinear diffusion equation
  2. Fisher’s equation
  3. FitzHugh–Nagumo equation
  4. Burgers’ equation
  5. Liouville’s equation
  6. Sine-Gordon equation
  7. Korteweg–De Vries (KdV) equation
  8. modified Korteweg–De Vries (mKdV) equation
  9. Boussinesq’s equation

These are not artificial examples crafted to have closed-form solutions. These are differential equations that were formulated to model physical phenomena such as groundwater flow, nerve impulse transmission, and acoustics.

It remains true that differential equations, and especially nonlinear PDEs, typically must be solved numerically in applications. But the number of nonlinear PDEs with closed-form solutions is not insignificant.


[1] These techniques are not as haphazard as they seem. At a deeper level, they’re all about exploiting various forms of symmetry.


Blow up in finite time

A few years ago I wrote a post about approximating the solution to a differential equation even though the solution did not exist. You can ask a numerical method for a solution at a point past where the solution blows up to infinity, and it will dutifully give you a finite solution. The result is meaningless, but the method will give you one anyway.

The more you can know about the solution to a differential equation before you attempt to solve it numerically the better. At a minimum, you’d like to know whether there even is a solution before you compute it. Unfortunately, a lot of theorems along these lines are local in nature: the theorem assures you that a solution exists in some interval, but doesn’t say how big that interval might be.

Here’s a nice theorem from [1] that tells you that a solution is going to blow up in finite time, and it even tells you what that time is.

The initial value problem

y′ = g(y)

with y(0) = y0 and g(y) > 0 blows up at time T if and only if the integral

\int_{y_0}^\infty \frac{1}{g(t)} \, dt
converges to T.

Note that it is not necessary to first find a solution then see whether the solution blows up.

Note also that an upper (or lower) bound on the integral gives you an upper (or lower) bound on T. So the theorem is still useful if the integral is hard to evaluate.
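As a quick numerical illustration of the theorem, here is a minimal sketch that evaluates the integral with SciPy's quad. The helper name blow_up_time and the two example equations are my own choices, not taken from [1]. I picked y′ = y² with y(0) = 1 and y′ = 1 + y² with y(0) = 0 because their exact solutions, 1/(1 − t) and tan(t), blow up at the known times 1 and π/2.

```python
from math import inf, pi
from scipy.integrate import quad

def blow_up_time(g, y0):
    """Blow-up time of y' = g(y), y(0) = y0, assuming g > 0 on [y0, inf).

    Hypothetical helper: returns the integral of 1/g from y0 to infinity.
    """
    T, _abs_err = quad(lambda s: 1.0 / g(s), y0, inf)
    return T

# y' = y^2, y(0) = 1 has solution 1/(1 - t), which blows up at t = 1
T_square = blow_up_time(lambda y: y**2, 1.0)

# y' = 1 + y^2, y(0) = 0 has solution tan(t), which blows up at t = pi/2
T_tan = blow_up_time(lambda y: 1.0 + y**2, 0.0)

print(T_square, T_tan, pi / 2)
```

Numerical quadrature cannot certify that an integral converges, so a finite number from this sketch should only be read as a blow-up time when convergence is known on other grounds.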

This theorem applies only to autonomous differential equations, i.e. the right hand side of the equation depends only on the solution y and not on the solution’s argument t. The differential equation alluded to at the top of the post is not autonomous, and so the theorem above does not apply. There are non-autonomous extensions of the theorem presented here (see, for example, [2]) but I do not know of a theorem that would cover the differential equation presented here.

[1] Duff Campbell and Jared Williams. Exploring finite-time blow-up. Pi Mu Epsilon Journal, Vol. 11, No. 8 (Spring 2003), pp. 423–428

[2] Jacob Hines. Exploring finite-time blow-up of separable differential equations. Pi Mu Epsilon Journal, Vol. 14, No. 9 (Fall 2018), pp. 565–572

When is a function of two variables separable?

Given a function f(x, y), how can you tell whether f can be factored into the product of a function g(x) of x alone and a function h(y) of y alone? Depending on how an expression for f is written, it may or may not be obvious whether f(x, y) can be separated into g(x) h(y).

There are several situations in which you might want to know whether a function is separable. For example, the ordinary differential equation

y′ = f(x, y)

can be solved easily when f(x, y) = g(x) h(y).

You might want to do something similar for a partial differential equation, using separation of variables, possibly choosing a coordinate system that allows the separation of variables trick to work.

Aside from applications to differential equations, you might want to know whether a polynomial in two variables can be factored into the product of polynomials in each variable separately.

In [1] David Scott gives a simple necessary condition for f to be separable:

f \, f_{xy} = f_x \, f_y

Here the subscripts indicate partial derivatives.

It’s easy to see this condition is necessary. Scott shows the condition is also sufficient under some mild technical assumptions.
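To spell out the necessity argument: if f(x, y) = g(x) h(y) with g and h differentiable, then the partial derivatives factor as well, and the condition follows by multiplying them out.

```latex
\begin{align*}
f_x &= g'(x)\,h(y), \qquad f_y = g(x)\,h'(y), \qquad f_{xy} = g'(x)\,h'(y) \\
f\, f_{xy} &= g(x)\,h(y)\,g'(x)\,h'(y) = f_x\, f_y
\end{align*}
```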

As an example, determine the value of k such that the differential equation

y′ = 6xy² + 3y² − 4x + k

is separable.

Scott’s equation

f \, f_{xy} = f_x \, f_y

becomes

(6xy² + 3y² − 4x + k)(12y) = (6y² − 4)(12xy + 6y)

which holds if and only if k = −2.
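The same calculation can be checked symbolically. The following is a sketch using SymPy, which the post itself does not use; the variable names are mine. It forms the residual f f_xy − f_x f_y, requires every coefficient of the resulting polynomial in x and y to vanish, and solves for k.

```python
import sympy as sp

x, y, k = sp.symbols("x y k")
f = 6*x*y**2 + 3*y**2 - 4*x + k

# Scott's condition: f * f_xy - f_x * f_y must vanish identically in x and y
residual = sp.expand(f * sp.diff(f, x, y) - sp.diff(f, x) * sp.diff(f, y))

# Treat the residual as a polynomial in x and y; each coefficient must be zero
coefficients = sp.Poly(residual, x, y).coeffs()
k_values = sp.solve(coefficients, k, dict=True)
print(k_values)

# With that value of k the right-hand side factors into g(x) * h(y)
factored = sp.factor(f.subs(k, -2))
print(factored)
```

With k = −2 the right-hand side factors as (2x + 1)(3y² − 2), which makes the separation explicit.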


[1] David Scott. When is an Ordinary Differential Equation Separable? The American Mathematical Monthly, Vol. 92, No. 6, pp. 422–423

Applications of Bernoulli differential equations

When a nonlinear first order ordinary differential equation has the form

\frac{dy}{dx} + P(x)\,y = Q(x)\, y^n

with n ≠ 1, the change of variables

u = y^{1-n}

turns the equation into a linear equation in u. The equation is known as Bernoulli’s equation, though Leibniz came up with the same technique. Apparently the history is complicated [1].
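To see why the substitution works, differentiate u = y^{1−n} and multiply Bernoulli’s equation through by (1 − n) y^{−n}. This is a short derivation, assuming y ≠ 0 on the interval of interest.

```latex
\begin{align*}
\frac{du}{dx} &= (1-n)\, y^{-n}\, \frac{dy}{dx} \\
(1-n)\, y^{-n}\, \frac{dy}{dx} + (1-n)\, P(x)\, y^{1-n} &= (1-n)\, Q(x) \\
\frac{du}{dx} + (1-n)\, P(x)\, u &= (1-n)\, Q(x)
\end{align*}
```

The last line is a first-order linear equation in u, which can be solved with an integrating factor.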

It’s nice that Bernoulli’s equation can be solved in closed form, but is it good for anything? Other than doing homework in a differential equations course, is there any reason you’d want to solve Bernoulli’s equation?

Why yes, yes there is. According to [1], Bernoulli’s equation is a generalization of a class of differential equations that came out of geometric problems.

Someone asked about applications of Bernoulli’s equation on Stack Exchange and got a couple of interesting answers.

The first answer said that a Bernoulli equation with n = 3 comes up in modeling frictional forces. See also this post on drag forces.

The second answer links to a paper on Bernoulli memristors.


[1] Adam E. Parker. Who Solved the Bernoulli Differential Equation and How Did They Do It? College Mathematics Journal, vol. 44, no. 2, March 2013.

Convergent subsequence

I was reading a theorem giving conditions for a divergent series to have a convergent subseries and had a sort of flashback.

I studied nonlinear PDEs in grad school, which amounted to applied functional analysis. We were constantly proving or using theorems about sequences having convergent subsequences, often subsequences that converged in a very weak sense.

This seemed strange to me at first. If a sequence diverges, why is it of any interest that a subsequence converges? This seemed like blackout poetry, completely changing the meaning of a text by selecting various words. For example, here is the opening paragraph of Pride and Prejudice, blacked out to appear to be a real estate ad.

good neighborhood, surrounding park

Here’s the big picture I was missing. We’re trying to show that a differential equation has a solution, and we’re doing that by some kind of successive approximation. Maybe our series of approximations doesn’t work in general, but that doesn’t matter. We’re just trying to find something that is a solution. Once you come up with a candidate solution, by whatever means, grasping at whatever straws you can grasp, you then prove that the candidate really is a solution, perhaps a solution in a weak sense. Then you show that this solution, potentially one of many, is unique. Then you show that your weak solution is in fact a solution in a stronger sense.


Period of a nonlinear pendulum

The term “nonlinear pendulum” is analogous to a retronym, a new name for an old thing to distinguish it from a new variation. For example, once upon a time a guitar was just a guitar. Now such a guitar is called an acoustic guitar to distinguish it from an electric guitar. Similarly, analog signal processing is a retronym to distinguish what was once the only kind of signal processing from the new arrival, digital signal processing.

The equation of motion for a pendulum is nonlinear. If the initial angle of displacement is sufficiently small, the linearized form of the equation is adequate for most applications. This linearized approximation is better known than the more accurate original equation, and so the un-linearized equation is known as the nonlinear pendulum equation.

The (nonlinear) equation of motion for a pendulum is the differential equation

\theta'' + \frac{g}{\ell}\sin \theta = 0

where g is the acceleration due to gravity and ℓ is the length of the pendulum. For small initial displacement θ0 the linear approximation

\theta'' + \frac{g}{\ell} \theta = 0

works well. The smaller θ0 is the more accurate the linear approximation is.

Linear and nonlinear period

The period of a pendulum obtained by solving the linearized equation is

T = 2\pi \sqrt{\frac{\ell}{g}}

The solution to the nonlinear pendulum equation is also periodic, though the solution is a combination of Jacobi functions rather than a combination of trig functions. The difference between the two solutions is small when θ0 is small, but becomes more significant as θ0 increases.

The difference in the periods is more evident than the difference in shape for the two waves. The period of the nonlinear solution is longer than that of the linearized solution. Here’s a plot of the solutions to the linear and nonlinear equations, with ℓ = g and θ0 = 1.
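The longer period can be measured directly by integrating the nonlinear equation. Below is a sketch using SciPy's solve_ivp, with ℓ = g and θ0 = 1 as above. The function names and tolerances are my own choices, and this is not the code behind the plot. Released from rest, the pendulum first passes through θ = 0 after a quarter period, so the event time is multiplied by 4.

```python
from math import pi, sin
from scipy.integrate import solve_ivp

theta0 = 1.0  # initial angle in radians, released from rest; units with g/l = 1

def pendulum(t, state):
    """Nonlinear pendulum theta'' + sin(theta) = 0 as a first-order system."""
    theta, omega = state
    return [omega, -sin(theta)]

def crosses_zero(t, state):
    """Event function: zero when theta passes through the vertical."""
    return state[0]

crosses_zero.terminal = True   # stop integrating at the first crossing
crosses_zero.direction = -1    # theta is decreasing from theta0 toward 0

sol = solve_ivp(pendulum, (0.0, 10.0), [theta0, 0.0],
                events=crosses_zero, rtol=1e-10, atol=1e-12)

# The first zero of theta occurs a quarter of the way through one period
T_nonlinear = 4 * sol.t_events[0][0]
T_linear = 2 * pi

print(T_linear, T_nonlinear)
```

For this starting angle the nonlinear period comes out between 6 and 7 percent longer than the linearized value 2π.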

The period for the nonlinear pendulum is given by

T = 2\pi \sqrt{\frac{\ell}{g}}\, f(\theta_0)

where f is an increasing function, equal to 1 at θ = 0.

The exact form of f involves special functions, and so there is naturally a lot of interest in approximations to f. The exact value is given by

f(\theta) = \frac{2}{\pi}K(\sin(\theta/2)) = \frac{1}{\text{AGM}(1, \cos(\theta/2))}

where K is the “complete elliptic integral of the first kind” and AGM is the arithmetic-geometric mean.

The AGM of two numbers is found by taking their ordinary (arithmetic) mean and geometric mean, then repeating the process. This process converges very rapidly, and so doing one step of the iteration gives a good approximation. If that’s not good enough, doing two steps gives an even better approximation, and so on. In fact, a common approximation for f(θ) is to do half a step, taking the geometric mean of 1 and cos(θ/2), i.e.

f(\theta) \approx \frac{1}{\sqrt{\cos(\theta/2)}}
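The AGM iteration is short enough to write out. The sketch below uses only the standard library; the function names are mine. It computes f(θ) by iterating to convergence, compares it with the half-step approximation, and cross-checks against a direct midpoint-rule evaluation of the elliptic integral K(k) = ∫ dφ / √(1 − k² sin² φ) over [0, π/2].

```python
from math import cos, sin, sqrt, pi

def agm(a, b, max_iter=50):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iter):
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = (a + b) / 2, sqrt(a * b)
    return a

def f_exact(theta):
    """Period factor 1 / AGM(1, cos(theta/2))."""
    return 1 / agm(1.0, cos(theta / 2))

def f_half_step(theta):
    """Approximation that replaces the AGM by the geometric mean."""
    return 1 / sqrt(cos(theta / 2))

def f_integral(theta, n=2000):
    """(2/pi) K(sin(theta/2)) with K evaluated by the midpoint rule."""
    k_squared = sin(theta / 2) ** 2
    h = (pi / 2) / n
    K = h * sum(1 / sqrt(1 - k_squared * sin((i + 0.5) * h) ** 2)
                for i in range(n))
    return 2 * K / pi

for theta in (0.5, 1.0, 1.5):
    print(theta, f_exact(theta), f_half_step(theta), f_integral(theta))
```

Because the AGM is never smaller than the geometric mean, the half-step approximation can only overestimate f.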

To see how accurate this approximation is, let’s plot the exact and approximate values of f.

The two curves can hardly be distinguished visually, so let’s look at a plot of their difference.

Here’s the code that produced the two plots above.

from scipy.special import ellipk
from numpy import sin, cos, pi, linspace
import matplotlib.pyplot as plt

def exact(θ):
    return 2*ellipk(sin(θ/2)**2)/pi

def approx(θ):
    return cos(θ/2)**-0.5

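t = linspace(0, 1.5)

# First figure: exact and approximate period factors
plt.plot(t, exact(t))
plt.plot(t, approx(t))
plt.legend(["exact", "approx"])

# Second figure: approximation error, drawn on its own axes
plt.figure()
plt.plot(t, exact(t) - approx(t))
plt.ylabel("approximation error")
plt.show()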

NB: There are two conventions for defining the complete elliptic integral of the first kind. SciPy uses a convention for K that requires us to square the argument.

Driven oscillations

The differences between the linearized and nonlinear equations become more apparent when there is a forcing function, i.e. when the right-hand side of the differential equation is not zero. Here are the solutions when the forcing function is cos(2t).

Now the solutions not only differ in their period, the shapes of the solutions are substantially different. The linear solutions are well-behaved but the nonlinear solutions can be chaotic with sensitive dependence on initial conditions. This remains true if a damping term is added.


Rational solution to Korteweg–De Vries equation

Students seeing differential equations for the first time expect every equation to have a nice closed-form solution, because up to that point in their education nearly every problem they’ve seen has been contrived to have a nice closed-form solution.

Once you resign yourself to the fact that a differential equation will rarely have a closed form solution, it’s a treat when you run across one that does. This is especially true for nonlinear equations.

The Korteweg–De Vries (KdV) equation

u_t - 6 u\, u_x + u_{xxx} = 0

is such a treat. I wrote a few days ago about the sech² solution to the KdV equation.

u(x,t) = -\frac{v}{2} \,\text{sech}^2\left(\frac{\sqrt{v}}{2} (x - vt - a)\right )

There’s also a rational solution:

u(x, t) = 6 x \frac{ \left(x^3-24 t\right)}{\left(x^3 + 12 t\right)^2}

We can verify this is a solution to the KdV equation reusing the Mathematica code from the earlier post.

    u[x_, t_] := 6 x (x^3 - 24 t)/(x^3 + 12 t)^2
    Simplify[ D[u[x, t], {t, 1}] 
            - 6 u[x, t] D[u[x, t], {x, 1}] 
            + D[u[x, t], {x, 3}] ]

This simplifies to 0.
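For readers without Mathematica, here is an equivalent check sketched in SymPy (my translation, not code from the earlier post). Since the solution is a rational function, combining the residual over a common denominator and cancelling is enough to reduce it.

```python
import sympy as sp

x, t = sp.symbols("x t")
u = 6*x*(x**3 - 24*t) / (x**3 + 12*t)**2

# Left-hand side of the KdV equation u_t - 6 u u_x + u_xxx = 0
residual = sp.diff(u, t) - 6*u*sp.diff(u, x) + sp.diff(u, x, 3)

# Put everything over a common denominator and cancel
residual = sp.cancel(sp.together(residual))
print(residual)
```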

Here’s a plot:

The top of the plot looks like a two-lane road on top of a mountain ridge, with a sinkhole in the middle of the road.

The “road” is an artifact of plotting. The solution is singular along the curve x³ + 12t = 0, and Mathematica had to chop the top of the graph off because it can’t plot an infinitely tall function.


Solitons and the KdV equation

Rarely does a nonlinear differential equation, especially a nonlinear partial differential equation, have a closed-form solution. But that is the case for the Korteweg–De Vries equation.

(Technically I should say it’s rare for a naturally-occurring nonlinear differential equation to have a closed-form solution. You can always start with a solution and cook up a contrived differential equation that it satisfies, but that differential equation will not be one that is interesting in applications.)

The Korteweg–De Vries (KdV) equation is

u_t - 6 u\, u_x + u_{xxx} = 0

This equation is used to model shallow water waves.

The KdV equation is a third-order PDE, which is unusual but not unheard of. As I wrote about earlier in the context of ODEs, third order linear equations are virtually nonexistent in applications, but third order nonlinear equations are not so uncommon.

The function

u(x,t) = -\frac{v}{2} \,\text{sech}^2\left(\frac{\sqrt{v}}{2} (x - vt - a)\right )

is a solution to the KdV equation. It is an example of a class of functions known as solitons.

You can increase the amplitude by increasing v, but when you do you also increase the velocity of the wave. You can’t vary amplitude and velocity independently because the KdV equation is nonlinear.

Verifying by hand that this is a solution is tedious, so I’ll show the verification in Mathematica.

    u[x_, t_] := - (v/2) Sech[Sqrt[v] (x - v t - a)/2]^2
    Simplify[  D[u[x, t], {t, 1}]  
               - 6 u[x, t] D[u[x, t], {x, 1}] 
               + D[u[x, t], {x, 3}]  ]

This returns 0. (Without the Simplify[] command it returns a mess, but the mess simplifies to 0.)
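The link between amplitude and velocity can also be probed numerically. The sketch below, in SymPy rather than Mathematica, replaces the amplitude v/2 with a free parameter A (my notation, with a = 0) and evaluates the KdV residual at a single point. The residual vanishes when A = v/2 and not when the amplitude is chosen independently of the speed.

```python
import sympy as sp

x, t = sp.symbols("x t", real=True)
v, A = sp.symbols("v A", positive=True)

# Trial wave: speed v as in the soliton, but with a free amplitude A
u = -A * sp.sech(sp.sqrt(v) / 2 * (x - v*t))**2

# Left-hand side of the KdV equation u_t - 6 u u_x + u_xxx = 0
residual = sp.diff(u, t) - 6*u*sp.diff(u, x) + sp.diff(u, x, 3)

def residual_at(amplitude, speed=1.0, point=(1.0, 0.0)):
    """Evaluate the KdV residual numerically at one (x, t) point."""
    values = {A: amplitude, v: speed, x: point[0], t: point[1]}
    return float(sp.N(residual.subs(values)))

print(residual_at(0.5))  # amplitude v/2: the soliton
print(residual_at(1.0))  # amplitude not tied to the speed
```

A single-point check is not a proof, but a nonzero residual at any point is enough to rule out the mismatched amplitude.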

Here’s a plot of the soliton with a = 0 and v = 1.

Here’s a slice with x = 0.

This looks remarkably like the density function of a Gaussian turned upside down. Here’s a plot with the soliton (in blue) with the density of a normal random variable with variance 5/2, scaled to have the same amplitude at 0.


When a function cannot be extended

The relation between a function and its power series is subtle. In a calculus class you’ll see equations of the form “series = function” which may need some footnotes. Maybe the series only represents the function over part of its domain: the function extends further than the power series representation.

Starting with the power series, we can ask whether the function it represents extends further than the series representation.

This video does a nice job of explaining why a particular function cannot be extended beyond the disk on which the series converges.

Toward the end, the video explains how its main example is a member of a broader class of functions that have no analytic continuation. The technical term, which the video does not use, is lacunary series [1]. When the gaps in a power series grow faster than linearly, the series cannot be extended beyond its radius of convergence.
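A standard textbook example of a lacunary series is the sum of z^(2^n), with exponents doubling at every term. This is my choice of illustration and may not be the example the video uses. The sketch below evaluates partial sums along a ray approaching an eighth root of unity on the unit circle. Once n ≥ 3 the terms along that ray are positive real numbers r^(2^n), so the sum grows without bound as r approaches 1. The same happens at every 2^m-th root of unity, and those points are dense on the circle.

```python
import cmath

def lacunary(z, terms=60):
    """Partial sum of sum_{n>=0} z**(2**n) for |z| < 1."""
    total = 0j
    power = z                    # z**(2**0)
    for _ in range(terms):
        total += power
        power = power * power    # z**(2**n) -> z**(2**(n+1))
    return total

omega = cmath.exp(2j * cmath.pi / 8)   # an eighth root of unity

radii = (0.9, 0.99, 0.999, 0.9999)
magnitudes = [abs(lacunary(r * omega)) for r in radii]
for r, m in zip(radii, magnitudes):
    print(r, m)
```

The growth is slow, roughly logarithmic in 1/(1 − r), but it never levels off.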

Lacunary series make interesting images since the behavior of the function becomes complicated toward the edge of the domain. The video gives some nice examples. The image above comes from this post and the following image comes from this post.

Differential equations

The video mentions Hadamard’s gap theorem. I believe his gap theorem was a spin-off of his work on Laplace’s equation. See this post on Hadamard’s counterexample to the Dirichlet principle for the Laplacian.

The motivation for a LOT of classical math was differential equations. I didn’t realize this as a student. Years later I’d run into something and think “So that is why this person was interested in that problem,” such as why Hadamard would care about whether power series could be extended.

Hadamard wanted to solve a differential equation on a disk with boundary conditions specified on the rim. It’s going to be a problem if the series representation of the solution doesn’t extend to the rim.


[1] Lacuna is the Latin word for a hole or a pit. The word came to be used metaphorically for a gap, such as a gap in a manuscript. Later mathematicians used this term for power series with increasing gaps between non-zero terms.

Test functions

Test functions are how you can make sense of functions that aren’t really functions.

The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended real numbers where a function can take on the value ∞.

You can make things like the delta function rigorous by saying they’re not functions of real numbers, but functions that operate on other functions, i.e. test functions. More on that here. These functions acting on test functions are called generalized functions or distributions. (See this post for how this kind of distribution differs from a probability distribution, but is analogous.)

Analogy with test charges

To say rigorously how these generalized functions behave you show how they act on test functions φ. Test functions are analogous to test charges: you can describe an electric field by saying what force it would exert on a test charge.

A test charge is somewhat idealized. It has to be so small that it tests a field without affecting it. This isn’t really possible, but you can think of it as a limit. You’re looking at the limit of the force per unit charge as the charge goes to zero.

Similarly, a test function is ideal in that it is very well behaved so the generalized functions that act on it can be badly behaved. Test functions are infinitely differentiable, and they either have compact support or have extremely thin tails, depending on context.
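To make this concrete, here is a small sketch in plain Python; all the names are mine. It defines a standard compactly supported test function, the bump exp(−1/(1 − x²)) on (−1, 1), and treats the delta “function” as a rule that maps a test function φ to φ(0). It also integrates ever narrower Gaussians against the bump to show the results approaching the same number. That is the sense in which the delta is a limit of ordinary functions.

```python
from math import exp, pi, sqrt

def bump(x):
    """Smooth test function supported on [-1, 1]."""
    return exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def delta(phi):
    """The Dirac delta as a functional: it sends a test function to phi(0)."""
    return phi(0.0)

def gaussian_pairing(phi, eps, n=20000):
    """Midpoint-rule integral over [-1, 1] of a width-eps Gaussian times phi."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        density = exp(-x * x / (2 * eps * eps)) / (eps * sqrt(2 * pi))
        total += density * phi(x)
    return total * h

print(delta(bump))
for eps in (0.1, 0.01):
    print(eps, gaussian_pairing(bump, eps))
```

Nothing here requires the delta to have values at points. It only needs to say what it does to each test function.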

Analogy with category theory

While writing the previous post I thought about an analogy between distribution theory and category theory. I worked with distribution theory a lot in grad school, and I found it natural that the definition of a distribution depended on how it acted in relation to something else, i.e. how it acted on all test functions.

But I found category definitions that involved extraneous objects puzzling. For example, the product of two objects is a third object such that for any fourth object (!) a certain diagram commutes. Why is this superfluous object injecting itself into the definition? If I’d thought of it as a test object then I would have found the definition more palatable.

As with distribution theory, you’re defining something by how it relates to all elements of some collection. But in distribution theory, your distributions and your test functions are very distinct things. In category theory, your test objects are peers of the thing you’re testing.