Moving between differential and integral equations

My years in graduate school instilled a Pavlovian response to PDEs: multiply by a test function and integrate by parts. This turns a differential equation into an integral equation [1].

I’ve been reading a book [2] on integral equations right now, and it includes several well-known techniques for turning certain kinds of integral equations into differential equations. Then this afternoon I talked to someone who was excited to have discovered a way to turn a more difficult integral equation into a differential equation.

For theoretical purposes, you often want to turn differential equations into integral equations. But for computational purposes, you often want to do the reverse.

Differential and integral equations are huge, overlapping fields, and sweeping generalities have exceptions. The opposite of the statement above may also be true. You may want to turn a differential equation into an integral equation for computational purposes (as in the finite element method) or turn an integral equation into a differential equation for theoretical convenience (as was the case for the person I was taking to).


[1] Sorta. More precisely this moves from the strong form of the PDE to a weak form. This involves integration, at least formally.

[2] Vladimir Ryzhov et al. Modern Methods in Mathematical Physics: Integral Equations in Wolfram Mathematica.

The essence of chaos

logistic bifurcation diagram

Linear systems can show sensitive dependence on initial conditions, but they cannot be chaotic. Only nonlinear systems can be chaotic.

George Datseris and Ulrich Parlitz explain this well in their book Nonlinear Dynamics:

… Sensitive dependence is not sufficient for a definition of chaos. … the state space is first stretched and then folded within itself. This is the defining characteristic of chaotic systems. The stretching part is where sensitive dependence comes from … as state space volumes increase in size … nonlinear dynamics takes over and folds the state space back in on itself. … This is why only nonlinear systems can be chaotic: linear systems with positive Lyapunov exponents always escape to infinity.

Related posts

The Pearson distributions

The previous post was about 12 probability distributions named after Irving Burr. This post is about 12 probability distributions named after Karl Pearson. The Pearson distributions are better known, and include some very well known distributions.

Burr’s distributions are defined by their CDFs; Pearson’s distributions are defined by their PDFs.

Pearson’s differential equation

The densities of Pearson’s distributions all satisfy the same differential equation:

f'(x) = \frac{(x-a) f(x)}{c_0 + c_1x + c_2x^2}

This is a linear differential equation, and so multiples of a solution are also a solution. However, a probability density must integrate to 1, so there is a unique probability density solution given a, c0, c1, and c2.

Well known distributions

Note that f(x) = exp(-x²/2) satisfies the differential equation above if we set a = 0, c0 = 1, and c1 = c2 = 0. This says the normal distribution is a Pearson distribution.

If f(x) = xm exp(-x) then the differential equation is satisfied for am, c0 = −1, and c0 = c2 = 0. This says that the exponential distribution and more generally the gamma distribution are Pearson distributions.

You can also show that the Cauchy distribution and more generally the Student t distribution are also Pearson distributions. So are the beta distributions (with a transformed range).

Table of Pearson distributions

The table below lists all Pearson distributions with their traditional names. The order of the list is a little strange for historical reasons.

The table uses Iverson’s bracket notation: a Boolean expression in brackets represents the function that is 1 when the condition holds and 0 otherwise. This way all densities are defined over the entire real line, though some of them are only positive over an interval.

The densities are presented without normalization constant; the normalization constant are whatever they have to be for the function to integrate to 1. The normalization constants can be complicated functions of the parameters and so they are left out for simplicity.

\begin{align*} \text{I} \hspace{1cm} & (1+x)^{m_1} (1-x)^{m_2} \,\,[-1 \leq x \leq 1] \\ \text{II} \hspace{1cm} & (1 - x^2)^m \,\, [ -1 \leq x \leq 1] \\ \text{III} \hspace{1cm} & x^m \exp(-x) \,\, [0 \leq x] \\ \text{IV} \hspace{1cm} & (1 + x^2)^{-m} \exp(-v \arctan x) \\ \text{V} \hspace{1cm} & x^{-m} \exp(-1/x) \,\, [0 \leq x] \\ \text{VI} \hspace{1cm} & x^{m_2}(1 + x)^{-m_1} \,\,[0 \leq x] \\ \text{VII} \hspace{1cm} & (1 + x^2)^{-m}\\ \text{VIII} \hspace{1cm} & (1 + x)^{-m} \,\, [0 \leq x \leq 1] \\ \text{IX} \hspace{1cm} & (1 + x)^{m} \,\, [0 \leq x \leq 1] \\ \text{X} \hspace{1cm} & \exp(-x) \,\, [0 \leq x] \\ \text{XI} \hspace{1cm} & x^{-m} \,\,[1 \leq x] \\ \text{XII} \hspace{1cm} & \left( (g+x)(g-x)\right)^h \,\, [-g \leq x \leq g] \end{align*}

There is a lot of redundancy in the list. All the distributions are either special cases of or limiting cases of distributions I, IV, and VI.

Note that VII is the Student t distribution after you introduce a scaling factor.


The Pearson distributions are determined by their first few moments, provided these exist, and these moments can be derived from the parameters in Pearson’s differential equation.

This suggests moment matching as a way to fit Pearson distributions to data: solve for the distribution parameters that make the the exact moments match the empirical moments. Sometimes this works very well, though sometimes other approaches are better, depending on your criteria for what constitutes a good match.

Heat equation and the normal distribution

The density function of a normal distribution with mean 0 and standard deviation √(2kt) satisfies the heat equation. That is, the function

u(x, t) = \frac{1}{2\sqrt{\pi kt}} \exp\left(-\frac{x^2}{4kt}\right)

satisfies the partial differential equation

u_t = k\,u_{xx}

You could verify this by hand, or if you’d like, here’s Mathematica code to do it.

      u[x_, t_] := PDF[NormalDistribution[0, Sqrt[2 k t]], x]
      Simplify[ D[u[x, t], {t, 1}] - k D[u[x, t], {x, 2}] ]

This returns 0 as expected.

Solutions to the heat equation are unique if we specify initial conditions. So if we start with a very concentrated heat source at time t = ε, say

u(x, \varepsilon) = \frac{1}{2\sqrt{\pi k\varepsilon}} \exp\left(-\frac{x^2}{4k\varepsilon}\right)

where ε is a tiny but positive number, then the solution at the top of the post hold for all t > 0 and is unique.

Alternatively, we could say that if our initial condition is u(x, 0) = δ where the right hand side is the Dirac delta function, then the solution at the top of the post holds for all t > 0 and is unique. You could justify this rigorously using distribution theory (in the analytic sense, not the probability sense, though they’re related). Or you could justify it heuristically by letting ε to go zero and thinking of δ as the limit.

It is also the case that a constant satisfies the heat equation. So if we add a constant to our solution, the constant representing ambient temperature, and add our initial condition to that constant, the solution is the constant plus the solution above.

The temperature at the center will decay in proportion to 1/√t. A point x away from the center will initially get warmer, reach a maximum temperature at time t = x²/2k, and then cool off roughly in proportion to 1/√t.

Here’s a plot of the temperature at x = 10 with k = 5.

Related posts

Third order ordinary differential equations

Most applied differential equations are second order. This probably has something to do with the fact that Newton’s laws are second order differential equations.

Higher order equations are less common in application, and when they do pop up they usually have even order, such as the 4th order beam equation.

What about 3rd order equations? Third order equations are rare in application, and third order linear equations are even more rare.

This post will focus on ordinary differential equations (ODEs), but similar remarks apply to PDEs.

Third order nonlinear equations

One example of a third order nonlinear ODE is the Blasius boundary layer equation from fluid dynamics

y''' + yy'' = 0

and its generalization, the Falkner-Skan equation

y''' + yy'' + \beta\left(1 - (y')^2\right) = 0.

Third order linear equations

I’ve seen two linear third order ODEs in application, but neither one is very physical.

The first [1] is an equation satisfied by the product of Airy functions:

 y''' - 4xy' - 2y = 0.

Here is a short proof that the product of Airy functions satisfy this equation.

Airy functions come up in applications such as quantum mechanics, optics, and probability. I suppose products of Airy functions may come up in those areas, but the equation above seems like something discovered after the fact. It seems someone thought it would be useful to find a differential equation that products of Air functions satisfy. I doubt someone wrote down the equation because it came up in applications, then discovered that products of Airy functions were the solutions.

The second [2] is a statistical model for analyzing handwriting:

x'''(t) = \beta_1(t) x'(t) + \beta_2(t) x''(t) + f(t).

Here someone decided to try modeling handwriting movements as a function of velocity, acceleration, and “jerk” (third derivative of position). It may be a useful model, but it wasn’t derived in the same physical sense as say the equations of mechanical vibrations. You could also object that since x(t) does not appear in the equation, this is a second order differential equation for y(t) = x′(t).

Related posts

[1] Abramowitz and Stegun, Handbook of Mathematical Functions, equation 10.4.57.

[2] Ransay and Silverman. Applied Functional Data Analysis: Methods and Case Studies. Equation 12.1.

Finding a vector potential whose curl is given

The previous post showed that if a vector field F over a simply connected domain has zero curl, then there is a potential function φ whose gradient is F.

An analogous result says that if the vector field F has zero divergence, again over a simply connected domain, then there is a vector potential Φ whose curl is F.

These are both special cases of Poincaré’s lemma.

This post will outline how to calculate Φ. First of all, Φ is far from unique. Any vector field with zero curl can be added to Φ without changing its curl. So if

Φ = (Φ1, Φ2, Φ3)

then we can assume that one of the components, say Φ3, is zero by adding the right curl-free component. If you find that argument less than convincing, look at it this way: we’re going to solve a harder problem than simply find Φ such that ∇×Φ = F by giving ourselves the additional requirement that the last component of Φ must be zero.

Now if Φ = (Φ1, Φ2, 0) and ∇×Φ = F, then

-∂z Φ2= F1
z Φ1 = F2
x Φ2 – ∂y Φ1 = F3

We can solve the first equation by integrating F1 with respect to z and adding a function of x and y to be determined later. We can solve the second equation similarly, then use the third equation to determine the functions of x and y left over from solving the first two equations.


This post is the third, and probably last, in a series of posts looking at vector calculus from a more advanced perspective.  The first post in the series looked at applying {grad, curl, div} to {grad, curl, div}: seeing which combinations are defined, and which combinations are always 0. To illustrate the general pattern, the post dips into differential forms and the Hodge star operator.

The second post looked at finding the (scalar) potential function for a curl-free vector field, and mentions connections to topology and differential equations.

The present post is a little terse, but it makes more sense if you’ve gone through the previous post. The method of solution here is analogous to the method in the previous post, and that post goes into a little more detail.

Conservative vector fields

The previous post discussed the fact that the curl of a divergence is zero. The converse is also true (given one more requirement): a vector field F is the gradient of some potential φ function if ∇×F = 0. In that case we say F is a conservative vector field.

It’s clear that having zero curl is a necessary condition for a vector field to be the gradient of a potential. It’s not as clear that having zero curl is sufficient. And in fact, it’s not quite sufficient: it’s sufficient over a simply connected domain. This means a domain with no holes: any loop you draw in the domain can be continuously deformed to a point without leaving the domain. If you’re working with vector fields defined everywhere on ℝ³ then you don’t have to worry about this condition because ℝ³ is simply connected.

Aside on topology

For a calculus student, the requirement that a domain be simply connected is a footnote. For a topologist, it’s the main thing.

If a domain is not simply connected, then a vector field might have curl zero, but not be the gradient of a potential. So in general we have two kinds of vector fields with zero curl: those that are gradients and those that are not. We could look at the space of all vector fields that have zero curl and mod out by the vector fields that are gradients. The dimension of this space tells us something about how the domain is connected. This is the start of de Rham cohomology.

Calculating the potential function

If a vector field is conservative, i.e. if it is the gradient of a potential function φ, then you can find φ by integration.

The partial derivative of φ with respect to x is the first component of your vector field, so you can integrate to find φ as a function of x (and y and z). This integral will only be unique up to a constant, and functions of y and z alone are constants as far as partial derivatives with respect to x are concerned.

Now the partial derivative of φ with respect to y has to equal the second component of your vector field. So taking the derivative of what you found above determines your potential, up to a function of z. So then you differentiate again, this time with respect to z, and set this equal to the third component of your vector field, you’ve determined your potential function up to an constant. And that’s as far as it can be determined: any constant term goes away when you take the gradient.

Exact differential equations

A differential equation of the form

M(x, y) + N(x, y) y‘(x) = 0

is said to be exact if the partial derivative of M with respect to y equals the partial of N with respect to x. In that case you can find a function φ such that the partial of φ with respect to x is M and the partial of φ with respect to y is N (assuming you’re working in a simply connected domain). This function φ is a potential, though differential equation texts don’t call it that, and you find it just as you found φ above. The solution to your differential equation is given (implicitly) by

φ(x, y) = c

where c is a constant to be determined by initial conditions.

Poincaré’s lemma

For a vector field over a simply connected domain, having zero curl is necessary and sufficient for the existence of a potential function φ. This is a case of Poincaré’s lemma. The next post will look at another case of Poincaré’s lemma, finding a vector potential.

Solving Laplace’s equation in the upper half plane

In the previous post, I said that solving Laplace’s equation on the unit disk was important because the unit disk is a sort of “hub” of conformal maps: there are references and algorithms for mapping regions to and from a disk conformally.

The upper half plane is a sort of secondary hub. You may want to map two regions to and from each other via a half plane. And as with the disk, there’s an explicit solution to Laplace’s equation on a half plane.

Another reason to be interested in Laplace’s equation on a half plane is the connection to the Hilbert transform and harmonic conjugates.

Given a continuous real-valued function u on the real line, u can be extended to a harmonic function on the upper half plane by taking the convolution of u with the Poisson kernel, a variation on the Poisson kernel from the previous post. That is, for y > 0,

u(x + iy) = \frac{1}{\pi} \int_{-\infty}^\infty \frac{y}{(x-t)^2 + y^2}\, u(t)\, dt

This gives a solution to Laplace’s equation on the upper half plane with boundary values given by u on the real line. The function u is smooth on the upper half plane, and its limiting values as y → 0 is continuous.

Furthermore, u is the real part of an analytic function f = u + iv. The function v is the harmonic conjugate of u, and also equals the Hilbert transform of u.

Solving Laplace’s equation on a disk

Why care about solving Laplace’s equation

\Delta u = 0

on a disk?

Laplace’s equation is important in its own right—for example, it’s important in electrostatics—and understanding Laplace’s equation is a stepping stone to understanding many other PDEs.

Why care specifically about a disk? An obvious reason is that you might need to solve Laplace’s equation on a disk! But there are two less obvious reasons.

First, a disk can be mapped conformally to any simply connected proper open subset of the complex plane. And because conformal equivalence is transitive, two regions conformally equivalent to the disk are conformally equivalent to each other. For example, as I wrote about here, you can map a Mickey Mouse silhouette

Mickey Mouse

to and from the Batman logo

Batman logo

using conformal maps. In practice, you’d probably map Mickey Mouse to a disk, and compose that map with a map from the disk to Batman. The disk is a standard region, and so there are catalogs of conformal maps between the disk and other regions. And there are algorithms for computing maps between a standard region, such as the disk or half plane, and more general regions. You might be able to lookup a mapping from the disk to Mickey, but probably not to Batman.

In short, the disk is sort of the hub in a hub-and-spoke network of cataloged maps and algorithms.

Secondly, Laplace’s equation has an analytical solution on the disk. You can just write down the solution, and we will shortly. If it were easy to write down the solution on a triangle, that might be the hub, but instead its a disk.

Suppose u is a real-valued continuous function on the the boundary of the unit disk. Then u can be extended to a harmonic function, i.e. a solution to Laplace’s equation on the interior of the disk, via the Poisson integral formula:

u(z) = \frac{1}{2\pi} \int_0^{2\pi} u(e^{\i\theta})\, \text{Re}\left( \frac{e^{i\theta} + z}{e^{i\theta} - z}\right)\, d\theta

Or in terms of polar coordinates:

u(re^{i\varphi}) = \frac{1}{2\pi} \int_0^{2\pi} \frac{u(e^{\i\theta}) (1 - r^2)}{1 - 2r\cos(\theta-\varphi) + r^2}\, d\theta

Related posts

Eliminating terms from higher-order differential equations

This post ties together two earlier posts: the previous post on a change of variable to remove a term from a polynomial, and an older post on a change of variable to remove a term from a differential equation. These are different applications of the same idea.

A linear differential equation can be viewed as a polynomial in the differential operator D applied to the function we’re solving for. More on this idea here. So it makes sense that a technique analogous to the technique used for “depressing” a polynomial could work similarly for differential equations.

In the differential equation post mentioned above, we started with the equation

y'' + p(x) y' + q(x) y = 0

and reduced it to

u'' + r(x) u = 0

using the change of variable

u(x) = \exp\left( \frac{1}{2} \int^x p(t)\,dt\right ) y(x)

So where did this change of variables come from? How might we generalize it to higher-order differential equations?

In the post on depressing a polynomial, we started with a polynomial

p(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots

and use the change of variables

x = t - \frac{b}{na}

to eliminate the xn-1 term. Let’s do something analogous for differential equations.

Let P be an nth degree polynomial and consider the differential equation

P(D) y = 0

We can turn this into a differential

Q(D) u = 0

where the polynomial

Q(D) = P\left(D - \frac{p}{n}\right)

has no term involving Dn-1 by solving

\left(D - \frac{p}{n}\right) u = D y

which leads to

u(x) = \exp\left( \frac{1}{n} \int^x p(t)\,dt\right ) y(x)

generalizing the result above for second order ODEs.