How can we extend the idea of derivative so that more functions are differentiable? Why would we want to do so? How can we make sense of a delta “function” that isn’t really a function? We’ll answer these questions in this post.

Suppose *f*(*x*) is a differentiable function of one variable. Suppose φ(*x*) is an infinitely differentiable function that is zero outside of some finite interval. Functions like φ are called test functions. Integration by parts says that

∫ *f*'(*x*) φ(*x*) *dx* = -∫ *f*(*x*) φ'(*x*) *dx*

where the integrals are over the entire real line. (The fact that φ is zero outside a finite interval means the “*uv*” term from integration by parts is zero.) Now suppose *f*(*x*) is not differentiable. Then the left side of the equation above does not make sense, but the right side does. We use the right hand side to develop the definition of the generalized derivative.

We think of the function *f* not as a function of real numbers, but as a **distribution** that operates on test functions. That is, we associate with *f* the linear functional on the space of test functions that maps φ to ∫ *f*(*x*) φ(*x*) *dx*. Then the distributional derivative of this functional is another linear functional, the distribution that maps test functions φ to -∫ *f*(*x*) φ'(*x*) *dx*. In summary,

*f*' : φ ↦ -∫ *f*(*x*) φ'(*x*) *dx*

We can use this procedure to define as many derivatives of *f* as we’d like, as long as *f* is integrable. So *f* could be some horribly ill-behaved function, differentiable nowhere in the classical sense, and we could define its 37th derivative by repeatedly applying this idea. Each derivative moves onto the test function and contributes another factor of -1, so the *k*th derivative of *f* maps φ to (-1)^*k* ∫ *f*(*x*) φ^(*k*)(*x*) *dx*, where φ^(*k*) is the *k*th derivative of φ. (Distributions are also called “generalized functions.” Distributional derivatives are also called “generalized derivatives” or “weak derivatives.”)
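Here is a minimal numerical sketch of the definition, not a definitive implementation. It treats a distribution as a Python callable acting on test functions and checks that the generalized derivative of |*x*|, which has no classical derivative at 0, acts like sign(*x*). The names `bump`, `SmoothFn`, `regular_distribution` and `weak_derivative` are hypothetical, and the midpoint rule and the particular bump function are assumptions chosen for illustration.

```python
import math

def bump(x):
    """A standard test function: exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def dbump(x):
    """Classical derivative of bump, from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def integrate(g, a=-2.0, b=2.0, n=40000):
    """Midpoint rule on [a, b]; [a, b] must contain the support of the test function."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

class SmoothFn:
    """A test function bundled with its derivatives: derivs = [phi, phi', ...]."""
    def __init__(self, derivs):
        self.derivs = list(derivs)
    def __call__(self, x):
        return self.derivs[0](x)
    def derivative(self):
        return SmoothFn(self.derivs[1:])

def regular_distribution(f):
    """The functional phi -> integral of f(x) phi(x) dx."""
    return lambda phi: integrate(lambda x: f(x) * phi(x))

def weak_derivative(T):
    """The functional phi -> -T(phi')."""
    return lambda phi: -T(phi.derivative())

def sign(x):
    return (x > 0) - (x < 0)

# Shift the bump so it is not symmetric about 0; its support is (-0.7, 1.3).
phi = SmoothFn([lambda x: bump(x - 0.3), lambda x: dbump(x - 0.3)])

lhs = weak_derivative(regular_distribution(abs))(phi)  # -∫ |x| φ'(x) dx
rhs = regular_distribution(sign)(phi)                  # ∫ sign(x) φ(x) dx
print(lhs, rhs)  # the two values should agree up to quadrature error
```

The test function carries its own derivatives because the definition only ever differentiates φ, never *f*. Supplying more entries in `derivs` would allow `weak_derivative` to be applied repeatedly.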

By the way, this same procedure is used to make sense of the delta function. The delta function isn’t a function at all. It is the distribution δ that evaluates test functions at zero, i.e. δ maps φ to φ(0). (The delta function is often nonsensically defined to be a function that is infinite at zero and zero everywhere else.)
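A standard example that ties the two ideas together is the step function *H*(*x*), equal to 0 for *x* < 0 and 1 for *x* ≥ 0. Its generalized derivative maps φ to -∫ *H*(*x*) φ'(*x*) *dx*, which is minus the integral of φ' from 0 to ∞, namely φ(0). So the derivative of the step function is δ. The sketch below checks this numerically for one test function. The helper names and the midpoint rule are assumptions made for illustration.

```python
import math

def bump(x):
    """A standard test function: exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def dbump(x):
    """Classical derivative of bump, from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def integrate(g, a=-2.0, b=2.0, n=40000):
    """Midpoint rule on [a, b]; [a, b] must contain the support of the test function."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def delta(phi):
    """The delta distribution: evaluate the test function at zero."""
    return phi(0.0)

def step(x):
    """Step function: 0 for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0

# A test function whose support (-0.7, 1.3) straddles the jump at 0.
def phi(x):
    return bump(x - 0.3)

def dphi(x):
    return dbump(x - 0.3)

# Generalized derivative of the step function applied to phi: -∫ H(x) φ'(x) dx
step_prime_phi = -integrate(lambda x: step(x) * dphi(x))
print(step_prime_phi, delta(phi))  # the two values should agree up to quadrature error
```

Note that `delta` is not built from any function of *x*. It is defined directly by what it does to a test function.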

Why would we want to be able to differentiate more functions? When we can differentiate more functions, we can look in a bigger space for solutions to differential equations. Sometimes this allows us to find solutions to equations that do not have classical solutions. Other times this allows us to find classical solutions more easily. We may first prove that a generalized solution exists, and then prove that the generalized solution is in fact a classical solution.

Here’s an analogy that explains how generalized solutions might lead to classical solutions. Suppose you want to find the minimum value of a function for integer arguments. You might first look for a real number that minimizes the function. This lets you, for example, use derivatives in your search for the minimum. If the real minimum you find happens to also be an integer, then you’ve solved your original problem. Distributions and generalized derivatives work much the same way. You might find a classical solution by first looking in a larger space of possible solutions, a space that allows you to use more powerful techniques in your search for a solution.

**Related post**: Approximating a solution that doesn’t exist

Great post John!

First heard of weak formulations when taught about the Finite Element Method, but didn’t dive into them until a couple of years ago, when I started studying Jean Leray’s work on Navier-Stokes equations.

I seem to recall that it was actually Leray who first came up with this whole “weak solution” and “generalized derivative” thing. Do you have any historical insight?

Jamie,

I believe it was Sergei Sobolev, the eminent Russian mathematician, who developed this field initially. I remember doing this as part of my dissertation for my bachelor’s degree. Fascinating field developed by amazing mathematicians.

Jaime, I’m not certain about the history of distributions, but I believe there were many historical precedents before people like Sobolev formalized the theory. For example, mathematicians studied non-differentiable functions by looking at limits of a sequence of differentiable functions and I imagine that goes back decades before formal distribution theory.

You can also use Fourier transforms to extend the idea of derivatives (see this post) and I imagine that also predates distribution theory by decades.

As for Navier-Stokes, you might find these notes useful.

I got stuck right here:

The fact that φ is zero outside a finite interval means the “uv” term from integration by parts is zero.

Why wouldn’t the uv term be determined by the parts of f and phi on the interval where phi is non-zero?

Sue, here’s some more detail on the integration by parts. Let u = φ and dv = f’ dx, so that v = f. Then ∫ f’ φ = f φ – ∫ f φ’. When the f φ term is evaluated at -∞ and ∞, it drops out since φ is zero at -∞ and ∞.

Ah. So by sticking a shift by t into φ, you can define f(t), f'(t), f”(t) etc. Slick. But you’d have to say that the equivalence is w.r.t. a particular test function, correct?

John: Distributions are functionals on the entire space of test functions. So two distributions are equal if they have the same effect on *all* test functions. In practice, the test functions fade into the background. You never need to look at a particular test function.

Right, this is the basis for finite elements: shape functions that apply only within the element.

The beauty of them is that weighted residual methods like this still apply even for those situations where there isn’t a calculus of variations functional at hand. It made it possible to extend FEA beyond linear conduction heat transfer and small strain elasticity, both of which can be tackled using calculus of variations, into general continuum mechanics.

Typically fantastic stuff, John.

It is a great post, but it would be better with an example. For example, the derivative of the step function.

Can somebody help me find the solution:

Suppose g is integrable and ∫ g(x) dx = 1.

Determine the limit of (1/e^2) g'(x/e) in the sense of distributions (C-infinity) as e goes to 0.

What is a good way to work with generalized functions when programming? What primitive pieces are necessary to construct them?

How do we extend this to multivariate functions?

A multivariate distribution is a linear functional on multivariate test functions. These test functions are infinitely differentiable functions with compact support.

The definition of differentiation is based on what you get from integration by parts. Every variable you take a partial derivative with respect to multiplies the result by -1. So in the first equation, instead of just a minus sign, you’d have (-1)^k if you’re taking a kth order partial derivative.
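Written out, the rule might look like the following. The multi-index notation is an assumption added here for compactness: α = (α_1, …, α_n) records how many derivatives are taken in each of the n variables, and |α| is their total, the k above.

```latex
\[
  \partial^{\alpha} f \;:\; \varphi \;\mapsto\;
  (-1)^{|\alpha|} \int_{\mathbb{R}^n} f(x)\, \partial^{\alpha}\varphi(x)\, dx ,
  \qquad |\alpha| = \alpha_1 + \cdots + \alpha_n = k .
\]
```

For instance, with two variables x and y, the mixed partial of f with respect to x and y maps φ to ∫∫ f φ_xy dx dy with no minus sign, since (-1)^2 = 1.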