Heat equation and the normal distribution

The density function of a normal distribution with mean 0 and standard deviation √(2kt) satisfies the heat equation. That is, the function

u(x, t) = \frac{1}{2\sqrt{\pi kt}} \exp\left(-\frac{x^2}{4kt}\right)

satisfies the partial differential equation

u_t = k\,u_{xx}

You could verify this by hand, or if you’d like, here’s Mathematica code to do it.

      u[x_, t_] := PDF[NormalDistribution[0, Sqrt[2 k t]], x]
      Simplify[ D[u[x, t], {t, 1}] - k D[u[x, t], {x, 2}] ]

This returns 0 as expected.

Solutions to the heat equation are unique if we specify initial conditions. So if we start with a very concentrated heat source at time t = ε, say

u(x, \varepsilon) = \frac{1}{2\sqrt{\pi k\varepsilon}} \exp\left(-\frac{x^2}{4k\varepsilon}\right)

where ε is a tiny but positive number, then the solution at the top of the post hold for all t > 0 and is unique.

Alternatively, we could say that if our initial condition is u(x, 0) = δ where the right hand side is the Dirac delta function, then the solution at the top of the post holds for all t > 0 and is unique. You could justify this rigorously using distribution theory (in the analytic sense, not the probability sense, though they’re related). Or you could justify it heuristically by letting ε to go zero and thinking of δ as the limit.

It is also the case that a constant satisfies the heat equation. So if we add a constant to our solution, the constant representing ambient temperature, and add our initial condition to that constant, the solution is the constant plus the solution above.

The temperature at the center will decay in proportion to 1/√t. A point x away from the center will initially get warmer, reach a maximum temperature at time t = x²/2k, and then cool off roughly in proportion to 1/√t.

Here’s a plot of the temperature at x = 10 with k = 5.

Related posts

Third order ordinary differential equations

Most applied differential equations are second order. This probably has something to do with the fact that Newton’s laws are second order differential equations.

Higher order equations are less common in application, and when they do pop up they usually have even order, such as the 4th order beam equation.

What about 3rd order equations? Third order equations are rare in application, and third order linear equations are even more rare.

This post will focus on ordinary differential equations (ODEs), but similar remarks apply to PDEs.

Third order nonlinear equations

One example of a third order nonlinear ODE is the Blasius boundary layer equation from fluid dynamics

y''' + yy'' = 0

and its generalization, the Falkner-Skan equation

y''' + yy'' + \beta\left(1 - (y')^2\right) = 0.

Third order linear equations

I’ve seen two linear third order ODEs in application, but neither one is very physical.

The first [1] is an equation satisfied by the product of Airy functions:

 y''' - 4xy' - 2y = 0.

Here is a short proof that the product of Airy functions satisfy this equation.

Airy functions come up in applications such as quantum mechanics, optics, and probability. I suppose products of Airy functions may come up in those areas, but the equation above seems like something discovered after the fact. It seems someone thought it would be useful to find a differential equation that products of Air functions satisfy. I doubt someone wrote down the equation because it came up in applications, then discovered that products of Airy functions were the solutions.

The second [2] is a statistical model for analyzing handwriting:

x'''(t) = \beta_1(t) x'(t) + \beta_2(t) x''(t) + f(t).

Here someone decided to try modeling handwriting movements as a function of velocity, acceleration, and “jerk” (third derivative of position). It may be a useful model, but it wasn’t derived in the same physical sense as say the equations of mechanical vibrations. You could also object that since x(t) does not appear in the equation, this is a second order differential equation for y(t) = x′(t).

Related posts

[1] Abramowitz and Stegun, Handbook of Mathematical Functions, equation 10.4.57.

[2] Ransay and Silverman. Applied Functional Data Analysis: Methods and Case Studies. Equation 12.1.

Finding a vector potential whose curl is given

The previous post showed that if a vector field F over a simply connected domain has zero curl, then there is a potential function φ whose gradient is F.

An analogous result says that if the vector field F has zero divergence, again over a simply connected domain, then there is a vector potential Φ whose curl is F.

These are both special cases of Poincaré’s lemma.

This post will outline how to calculate Φ. First of all, Φ is far from unique. Any vector field with zero curl can be added to Φ without changing its curl. So if

Φ = (Φ1, Φ2, Φ3)

then we can assume that one of the components, say Φ3, is zero by adding the right curl-free component. If you find that argument less than convincing, look at it this way: we’re going to solve a harder problem than simply find Φ such that ∇×Φ = F by giving ourselves the additional requirement that the last component of Φ must be zero.

Now if Φ = (Φ1, Φ2, 0) and ∇×Φ = F, then

-∂z Φ2= F1
z Φ1 = F2
x Φ2 – ∂y Φ1 = F3

We can solve the first equation by integrating F1 with respect to z and adding a function of x and y to be determined later. We can solve the second equation similarly, then use the third equation to determine the functions of x and y left over from solving the first two equations.


This post is the third, and probably last, in a series of posts looking at vector calculus from a more advanced perspective.  The first post in the series looked at applying {grad, curl, div} to {grad, curl, div}: seeing which combinations are defined, and which combinations are always 0. To illustrate the general pattern, the post dips into differential forms and the Hodge star operator.

The second post looked at finding the (scalar) potential function for a curl-free vector field, and mentions connections to topology and differential equations.

The present post is a little terse, but it makes more sense if you’ve gone through the previous post. The method of solution here is analogous to the method in the previous post, and that post goes into a little more detail.

Conservative vector fields

The previous post discussed the fact that the curl of a divergence is zero. The converse is also true (given one more requirement): a vector field F is the gradient of some potential φ function if ∇×F = 0. In that case we say F is a conservative vector field.

It’s clear that having zero curl is a necessary condition for a vector field to be the gradient of a potential. It’s not as clear that having zero curl is sufficient. And in fact, it’s not quite sufficient: it’s sufficient over a simply connected domain. This means a domain with no holes: any loop you draw in the domain can be continuously deformed to a point without leaving the domain. If you’re working with vector fields defined everywhere on ℝ³ then you don’t have to worry about this condition because ℝ³ is simply connected.

Aside on topology

For a calculus student, the requirement that a domain be simply connected is a footnote. For a topologist, it’s the main thing.

If a domain is not simply connected, then a vector field might have curl zero, but not be the gradient of a potential. So in general we have two kinds of vector fields with zero curl: those that are gradients and those that are not. We could look at the space of all vector fields that have zero curl and mod out by the vector fields that are gradients. The dimension of this space tells us something about how the domain is connected. This is the start of de Rham cohomology.

Calculating the potential function

If a vector field is conservative, i.e. if it is the gradient of a potential function φ, then you can find φ by integration.

The partial derivative of φ with respect to x is the first component of your vector field, so you can integrate to find φ as a function of x (and y and z). This integral will only be unique up to a constant, and functions of y and z alone are constants as far as partial derivatives with respect to x are concerned.

Now the partial derivative of φ with respect to y has to equal the second component of your vector field. So taking the derivative of what you found above determines your potential, up to a function of z. So then you differentiate again, this time with respect to z, and set this equal to the third component of your vector field, you’ve determined your potential function up to an constant. And that’s as far as it can be determined: any constant term goes away when you take the gradient.

Exact differential equations

A differential equation of the form

M(x, y) + N(x, y) y‘(x) = 0

is said to be exact if the partial derivative of M with respect to y equals the partial of N with respect to x. In that case you can find a function φ such that the partial of φ with respect to x is M and the partial of φ with respect to y is N (assuming you’re working in a simply connected domain). This function φ is a potential, though differential equation texts don’t call it that, and you find it just as you found φ above. The solution to your differential equation is given (implicitly) by

φ(x, y) = c

where c is a constant to be determined by initial conditions.

Poincaré’s lemma

For a vector field over a simply connected domain, having zero curl is necessary and sufficient for the existence of a potential function φ. This is a case of Poincaré’s lemma. The next post will look at another case of Poincaré’s lemma, finding a vector potential.

Solving Laplace’s equation in the upper half plane

In the previous post, I said that solving Laplace’s equation on the unit disk was important because the unit disk is a sort of “hub” of conformal maps: there are references and algorithms for mapping regions to and from a disk conformally.

The upper half plane is a sort of secondary hub. You may want to map two regions to and from each other via a half plane. And as with the disk, there’s an explicit solution to Laplace’s equation on a half plane.

Another reason to be interested in Laplace’s equation on a half plane is the connection to the Hilbert transform and harmonic conjugates.

Given a continuous real-valued function u on the real line, u can be extended to a harmonic function on the upper half plane by taking the convolution of u with the Poisson kernel, a variation on the Poisson kernel from the previous post. That is, for y > 0,

u(x + iy) = \frac{1}{\pi} \int_{-\infty}^\infty \frac{y}{(x-t)^2 + y^2}\, u(t)\, dt

This gives a solution to Laplace’s equation on the upper half plane with boundary values given by u on the real line. The function u is smooth on the upper half plane, and its limiting values as y → 0 is continuous.

Furthermore, u is the real part of an analytic function f = u + iv. The function v is the harmonic conjugate of u, and also equals the Hilbert transform of u.

Solving Laplace’s equation on a disk

Why care about solving Laplace’s equation

\Delta u = 0

on a disk?

Laplace’s equation is important in its own right—for example, it’s important in electrostatics—and understanding Laplace’s equation is a stepping stone to understanding many other PDEs.

Why care specifically about a disk? An obvious reason is that you might need to solve Laplace’s equation on a disk! But there are two less obvious reasons.

First, a disk can be mapped conformally to any simply connected proper open subset of the complex plane. And because conformal equivalence is transitive, two regions conformally equivalent to the disk are conformally equivalent to each other. For example, as I wrote about here, you can map a Mickey Mouse silhouette

Mickey Mouse

to and from the Batman logo

Batman logo

using conformal maps. In practice, you’d probably map Mickey Mouse to a disk, and compose that map with a map from the disk to Batman. The disk is a standard region, and so there are catalogs of conformal maps between the disk and other regions. And there are algorithms for computing maps between a standard region, such as the disk or half plane, and more general regions. You might be able to lookup a mapping from the disk to Mickey, but probably not to Batman.

In short, the disk is sort of the hub in a hub-and-spoke network of cataloged maps and algorithms.

Secondly, Laplace’s equation has an analytical solution on the disk. You can just write down the solution, and we will shortly. If it were easy to write down the solution on a triangle, that might be the hub, but instead its a disk.

Suppose u is a real-valued continuous function on the the boundary of the unit disk. Then u can be extended to a harmonic function, i.e. a solution to Laplace’s equation on the interior of the disk, via the Poisson integral formula:

u(z) = \frac{1}{2\pi} \int_0^{2\pi} u(e^{\i\theta})\, \text{Re}\left( \frac{e^{i\theta} + z}{e^{i\theta} - z}\right)\, d\theta

Or in terms of polar coordinates:

u(re^{i\varphi}) = \frac{1}{2\pi} \int_0^{2\pi} \frac{u(e^{\i\theta}) (1 - r^2)}{1 - 2r\cos(\theta-\varphi) + r^2}\, d\theta

Related posts

Eliminating terms from higher-order differential equations

This post ties together two earlier posts: the previous post on a change of variable to remove a term from a polynomial, and an older post on a change of variable to remove a term from a differential equation. These are different applications of the same idea.

A linear differential equation can be viewed as a polynomial in the differential operator D applied to the function we’re solving for. More on this idea here. So it makes sense that a technique analogous to the technique used for “depressing” a polynomial could work similarly for differential equations.

In the differential equation post mentioned above, we started with the equation

y'' + p(x) y' + q(x) y = 0

and reduced it to

u'' + r(x) u = 0

using the change of variable

u(x) = \exp\left( \frac{1}{2} \int^x p(t)\,dt\right ) y(x)

So where did this change of variables come from? How might we generalize it to higher-order differential equations?

In the post on depressing a polynomial, we started with a polynomial

p(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots

and use the change of variables

x = t - \frac{b}{na}

to eliminate the xn-1 term. Let’s do something analogous for differential equations.

Let P be an nth degree polynomial and consider the differential equation

P(D) y = 0

We can turn this into a differential

Q(D) u = 0

where the polynomial

Q(D) = P\left(D - \frac{p}{n}\right)

has no term involving Dn-1 by solving

\left(D - \frac{p}{n}\right) u = D y

which leads to

u(x) = \exp\left( \frac{1}{n} \int^x p(t)\,dt\right ) y(x)

generalizing the result above for second order ODEs.

Elliptic coordinates and Laplace’s equation

In rectangular coordinates, constant values of x are vertical lines and constant values of y are horizontal lines. In polar coordinates, constant values of r are circles and constant values of θ are lines from the origin.

In elliptic coordinates, the position of a point is specified by two numbers, μ and ν. Constant values of μ are ellipses and constant values of ν are hyperbolas. (Elliptic coordinates could just as easily have been called hyperbolic coordinates, but in applications we’re usually more interested in the ellipses and so the conventional name makes sense.)

As with rectangular and polar coordinates, elliptic coordinates are orthogonal, i.e. holding each coordinate constant separately gives two curves that are perpendicular where they intersect.

Here’s a plot, with curves of constant μ in blue and curves of constant ν in green.

Graph paper for elliptic coordinates

The correspondence between rectangular and elliptic coordinates is given below.

x = a cosh μ cos ν
y = a sinh μ sin ν

The value of a can be chosen for convenience. More on that below.

Notice that elliptic coordinates area approximately polar coordinates when μ is large. This is because cosh(μ) approaches sinh(μ) as μ increases. For small values of μ elliptic coordinates are very different from polar coordinates.

The purpose of various coordinate systems is to adapt your coordinates to the structure of your problem. If you’re working with something that’s radially symmetric, polar coordinates are natural. And if you’re working with an ellipse, say you’re computing the electric field of an ellipse with uniform charge, then elliptic coordinates are natural. Ellipses can be described by one number, the value of μ.

Separation of variables

However, the real reason for specialized coordinate systems is solving partial differential equations. (Differential equations provided the motivation for a great deal of mathematics; this is practically an open secret.)

When you first see differential equations, the first, and possibly only, technique for solving PDEs you see is separation of variables. If you have PDE for a function of two variables u(x, y), you first assume u is the product of a function of x alone and a function of y alone:

u(x, y) = X(x) Y(y).

Then you stick this expression into your PDE and get independent ODEs for X and Y. Then sums of the solutions (i.e. Fourier series) solve your PDE.

Now, here’s the kicker. We could do the same thing in elliptic coordinates, starting from the assumption

u(μ, ν) = M(μ) N(ν)

and stick this into our PDE written in elliptic coordinates. The form of a PDE changes when you change coordinates. For example, see this post on the Laplacian in various coordinate systems. When you leave rectangular coordinates, the Laplacian is no longer simply the sum of the second partials with respect to each variable. However, separation of variables still works for Laplace’s equation using elliptic coordinates.

Separable coordinate systems get their name from the fact that the separation of variables trick works in that coordinate system. Not for all possible PDEs, but for common PDEs like Laplace’s equation. Elliptic coordinates are separable.

If you’re solving Laplace’s equation on an ellipse, say with a constant boundary condition, the form of the solution is going to be complicated in rectangular coordinates, but simpler in elliptic coordinates.

Confocal elliptic coordinates

The coordinate ellipses, curves with μ constant, are ellipses with foci at ±a; you can choose a so that a particular ellipse you’re interested has a particular μ value, such as 1. Since all these ellipses have the same foci, we say they’re confocal. But confocal elliptic coordinates are something slightly different.

Elliptic coordinates and confocal elliptic coordinates share the same coordinate curves, but they label them differently. Confocal elliptic coordinates (σ, τ) are related to (ordinary) elliptic coordinates via

σ = cosh μ
τ = cos ν

Laplace’s equation is still separable under this alternate coordinate system, but it looks different; the expression for the Laplacian changes because we’ve changed coordinates.


Trading generalized derivatives for classical ones

Generalized functions have generalized derivatives. This is how we make sense of things like delta “functions” that are not functions, or functions that are not differentiable satisfying a differential equation. More on that here.

A major theme in the modern approach to partial differential equations is to first look for solutions in a space of generalized functions, then (hopefully) prove that the generalized function that solves your equation is a plain old function with derivatives in the classical sense.

You can differentiate generalized functions as many times as you’d like; the derivatives always exist, but these derivatives may not correspond to ordinary functions, i.e. may not be regular distributions [1]. And if the (generalized) derivatives are regular distributions, the functions they correspond to may not have as many (classical) derivatives as you’d like.

Rather than generalized derivatives existing, we’re primarily interested in generalized derivatives being regular and having finite Lebesgue norms. If generalized functions are “nice” in this sense, then we can “trade” them for classical functions, and the Sobolev embedding theorem sets the “exchange rate” for the trade.

There are a lot of variations on the Sobolev embedding theorem, but one version of the theorem, but one version says that if a function f on ℝn has generalized derivatives up to order k that all correspond to Lp functions, then f is equal (almost everywhere) to a function that has r derivatives, with each of the derivatives being Hölder continuous with exponent α if

n < p


r + α < k – n/p.

Related posts

[1] A distribution is a continuous linear functional on a space of test functions. If the effect of a distribution on a test function φ is equal to the integral of the product of a function f and φ, then the distribution is called regular, and we leave the distinction between the distribution and the function f implicit.

The Dirac delta function δ acts on a test function φ by returning φ(0). This is not a regular distribution because there is no function you can integrate against φ and always get φ(0) out.

A more direct approach to series solutions

In the previous post we found a solution to

y'' + 7y' + 12 = x^2.

using operator calculus, i.e. treating the differential operator D like a number and doing tricks with it. See the earlier post for a justification of why we can get away with unorthodox manipulations.

We can generalize the method of the previous post to say that a solution to any differential equation of the form

p(D) y = f(x)

is given by

y = \frac{1}{p(D)} f(x).

In the previous post we had

 p(D) = D^2 + 7D + 12


f(x) = x^2

but the method works more generally.

We then find a power series for 1/p(D), most likely by partial fraction decomposition, and apply the result to f(x). There may be a fair amount of labor left, but it’s purely calculus; all the differential equation work is done.

Conceptually, this method subsumes other differential equation techniques such as undetermined coefficients and power series solutions.

Let’s use this method to find an approximate solution to

y'' + 7y' + 12 = \zeta(x)

where ζ(x) is the Riemann zeta function.

In the previous post we worked out that

 \frac{1}{D^2 + 7D + 12} = \frac{1}{D+3} - \frac{1}{D+4} = \frac{1}{12} - \frac{7}{144}D + \frac{37}{1728}D^2 + {\cal O}(D^3)

and so an approximate solution to our differential equation near 0 is

 y(x) = \frac{1}{12}\zeta(0) - \frac{7}{144} \zeta'(0) + \frac{37}{1728}\zeta''(0) + {\cal O}(x^3).

Numerically this works out to

y(x) = - 0.0416667 + 0.0446706 x - 0.0429602 x^2.

If you want more terms, carry the series for 1/(D² + 7D + 12) out to more terms.

Related links