How can we extend the idea of derivative so that more functions are differentiable? Why would we want to do so? How can we make sense of a delta “function” that isn’t really a function? We’ll answer these questions in this post.

Suppose *f*(*x*) is a differentiable function of one variable. Suppose φ(*x*) is an infinitely differentiable function that is zero outside of some finite interval. Functions like φ are called test functions. Integration by parts says that

∫ *f*'(*x*) φ(*x*) *dx* = −∫ *f*(*x*) φ'(*x*) *dx*

where the integrals are over the entire real line. (The fact that φ is zero outside a finite interval means the “*uv*” term from integration by parts is zero.) Now suppose *f*(*x*) is not differentiable. Then the left side of the equation above does not make sense, but the right side does. We use the right-hand side to develop the definition of the generalized derivative.
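As a quick sanity check, here is a minimal numerical sketch showing that ∫ *f*' φ *dx* and −∫ *f* φ' *dx* agree for a smooth *f*. The names `bump`, `bump_prime`, and `integrate` and the particular choice of *f* are assumptions made for illustration; the test function is the standard bump exp(−1/(1 − *x*²)) on (−1, 1).

```python
import math

def bump(x):
    """A standard test function: smooth, and zero outside (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(x):
    """Derivative of bump, obtained from the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def integrate(g, a=-1.0, b=1.0, n=20000):
    """Midpoint rule on [a, b]; [-1, 1] suffices because bump vanishes elsewhere."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """An arbitrary differentiable function, chosen only for illustration."""
    return math.sin(3.0 * x) + x * x

def f_prime(x):
    return 3.0 * math.cos(3.0 * x) + 2.0 * x

lhs = integrate(lambda x: f_prime(x) * bump(x))   # integral of f' * phi
rhs = -integrate(lambda x: f(x) * bump_prime(x))  # minus integral of f * phi'
print(lhs, rhs)  # the two values should agree up to numerical error
```

Only the second quantity uses *f* itself rather than *f*', which is why it still makes sense when *f* has no classical derivative.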

We think of the function *f* not as a function of real numbers, but as a **distribution** that operates on test functions. That is, we associate with *f* the linear functional on the space of test functions that maps φ to ∫ *f*(*x*) φ(*x*) *dx*. Then the distributional derivative of this functional is another linear functional, the distribution that maps test functions φ to −∫ *f*(*x*) φ'(*x*) *dx*. In summary,

*f*' : φ ↦ −∫ *f*(*x*) φ'(*x*) *dx*

We can use this procedure to define as many derivatives of *f* as we’d like, as long as *f* is integrable. So *f* could be some horribly ill-behaved function, differentiable nowhere in the classical sense, and we could define its 37th derivative by repeatedly applying this idea. (Distributions are also called “generalized functions.” Distributional derivatives are also called “generalized derivatives” or “weak derivatives.”)
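As a sketch of what repeating the procedure gives: each derivative moved onto the test function contributes one minus sign, so the *k*th distributional derivative can be written as below. The notation $f^{(k)}$ and $\phi^{(k)}$ for *k*th derivatives is an assumption introduced here.

$$
f^{(k)} : \phi \mapsto (-1)^k \int f(x)\, \phi^{(k)}(x)\, dx
$$

For the 37th derivative, *k* = 37 is odd, so the sign is negative.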

By the way, this same procedure is used to make sense of the delta function. The delta function isn’t a function at all. It is the distribution δ that evaluates test functions at zero, i.e. δ maps φ to φ(0). (The delta function is often nonsensically defined to be a function that is infinite at zero and zero everywhere else.)
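For a concrete example, take the step function that is 0 for *x* ≤ 0 and 1 for *x* > 0. Its distributional derivative maps φ to −∫ from 0 to ∞ of φ'(*x*) *dx* = φ(0), which is exactly δ. The sketch below checks this numerically. All names (`phi`, `step`, `delta`, `step_derivative`) and the off-center bump used as the test function are assumptions made for illustration.

```python
import math

def phi(x):
    """Test function: a smooth bump centered at 0.3, zero outside (-0.7, 1.3)."""
    t = x - 0.3
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def phi_prime(x):
    """Derivative of phi, obtained from the chain rule."""
    t = x - 0.3
    if abs(t) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * t) / (1.0 - t * t) ** 2

def step(x):
    """Step function: 1 for x > 0, otherwise 0."""
    return 1.0 if x > 0 else 0.0

def integrate(g, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def delta(test_function):
    """The delta distribution: evaluate the test function at zero."""
    return test_function(0.0)

def step_derivative(test_function_prime, a=-0.7, b=1.3):
    """Distributional derivative of the step: minus the integral of step * phi'."""
    return -integrate(lambda x: step(x) * test_function_prime(x), a, b)

lhs = step_derivative(phi_prime)
rhs = delta(phi)
print(lhs, rhs)  # the two values should agree up to numerical error
```

Note that `delta` never needs a function that is "infinite at zero"; it only needs to know how to act on a test function.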

Why would we want to be able to differentiate more functions? When we can differentiate more functions, we can look in a bigger space for solutions to differential equations. Sometimes this allows us to find solutions to equations that do not have classical solutions. Other times this allows us to find classical solutions more easily. We may first prove that a generalized solution exists, and then prove that the generalized solution is in fact a classical solution.

Here’s an analogy that explains how generalized solutions might lead to classical solutions. Suppose you want to find the minimum value of a function for integer arguments. You might first look for a real number that minimizes the function. This lets you, for example, use derivatives in your search for the minimum. If the real minimum you find happens to also be an integer, then you’ve solved your original problem. Distributions and generalized derivatives work much the same way. You might find a classical solution by first looking in a larger space of possible solutions, a space that allows you to use more powerful techniques in your search for a solution.
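The analogy can be sketched in a few lines for the special case of a quadratic *a x*² + *b x* + *c* with *a* > 0. The function names are hypothetical, and the fallback of comparing the two neighboring integers is an addition that relies on the function being convex; it is not part of the analogy as stated.

```python
import math

def real_minimizer(a, b):
    """Minimizer over the reals of a*x**2 + b*x + c (a > 0), from f'(x) = 2*a*x + b = 0."""
    return -b / (2 * a)

def integer_minimizer(a, b):
    """Minimizer of the same quadratic over the integers."""
    x = real_minimizer(a, b)
    if x.is_integer():
        # The solution found in the larger space already lies in the smaller one.
        return int(x)
    # Otherwise, for a convex function, the best integer is a neighbor of x.
    value = lambda n: a * n * n + b * n  # the constant c does not affect the minimizer
    return min(math.floor(x), math.ceil(x), key=value)

print(integer_minimizer(1, -6))  # real minimizer is 3.0, already an integer
print(integer_minimizer(2, -5))  # real minimizer is 1.25, so neighbors are compared
```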

Great post John!

First heard of weak formulations when taught about the Finite Element Method, but didn’t dive into them until a couple of years ago, when I started studying Jean Leray’s work on Navier-Stokes equations.

I seem to recall that it was actually Leray who first came up with this whole “weak solution” and “generalized derivative” thing. Do you have any historical insight?

Jamie,

I believe it was Sergei Sobolev, the eminent Russian mathematician, who developed this field initially. I remember doing this as part of my dissertation for my bachelor’s degree. Fascinating field developed by amazing mathematicians.

Jaime, I’m not certain about the history of distributions, but I believe there were many historical precedents before people like Sobolev formalized the theory. For example, mathematicians studied non-differentiable functions by looking at limits of a sequence of differentiable functions and I imagine that goes back decades before formal distribution theory.

You can also use Fourier transforms to extend the idea of derivatives (see this post) and I imagine that also predates distribution theory by decades.

As for Navier-Stokes, you might find these notes useful.

I got stuck right here:

The fact that φ is zero outside a finite interval means the “uv” term from integration by parts is zero.

Why wouldn’t the uv term be determined by the parts of f and phi on the interval where phi is non-zero?

Sue, here’s some more detail on the integration by parts. Let u = φ and dv = f' dx, so that v = f. Then ∫ f' φ = f φ − ∫ f φ'. When these terms are evaluated at −∞ and ∞, the f φ term drops out since φ is zero at −∞ and ∞.
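Written out with limits, the step reads as follows. The only thing used about $\phi$ is that it vanishes outside a finite interval, so the boundary term is zero no matter what $f$ does on the interval where $\phi$ is non-zero.

$$
\int_{-\infty}^{\infty} f'(x)\,\phi(x)\,dx
= \Big[ f(x)\,\phi(x) \Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f(x)\,\phi'(x)\,dx
= 0 - \int_{-\infty}^{\infty} f(x)\,\phi'(x)\,dx
$$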

Ah. So by sticking a shift by t into φ, you can define f(t), f'(t), f''(t) etc. Slick. But you’d have to say that the equivalence is w.r.t. a particular test function, correct?

John: Distributions are functionals on the entire space of test functions. So two distributions are equal if they have the same effect on *all* test functions. In practice, the test functions fade into the background. You never need to look at a particular test function.

Right, this is the basis for finite elements: shape functions that apply only within the element.

The beauty of them is that weighted residual methods like this still apply even for those situations where there isn’t a calculus of variations functional at hand. It made it possible to extend FEA beyond linear conduction heat transfer and small strain elasticity, both of which can be tackled using calculus of variations, into general continuum mechanics.

Typically fantastic stuff, John.

It is a great post, but it would be better with an example. For example, the derivative of the step function. ;-)

Can somebody help me find the solution:

Suppose g is integrable with ∫ g(x) dx = 1.

Determine the limit of (1/e^2) g'(x/e) in the sense of distributions (acting on C-infinity test functions) as e goes to 0.

What is a good way to work with generalized functions when programming? What primitive pieces are necessary to construct them?

How do we extend this to multivariate functions?

A multivariate distribution is a linear functional on multivariate test functions. These test functions are infinitely differentiable functions with compact support.

The definition of differentiation is based on what you get from integration by parts. Every variable you take a partial derivative with respect to multiplies the result by -1. So in the first equation, instead of just a minus sign, you’d have (-1)^k if you’re taking a kth order partial derivative.
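In symbols, a sketch of the same rule using multi-index notation, which is an assumption introduced here: for $\alpha = (\alpha_1, \ldots, \alpha_n)$ with $|\alpha| = \alpha_1 + \cdots + \alpha_n = k$, and $\partial^\alpha$ denoting the corresponding mixed partial derivative,

$$
\partial^\alpha f : \phi \mapsto (-1)^{|\alpha|} \int_{\mathbb{R}^n} f(x)\, \partial^\alpha \phi(x)\, dx
$$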

Very good post. Being a mathematician myself and having lived with generalized functions for maybe forty years, they are part of me today. I live with them, and they are very real for me. As you know, physicists discovered them first. Oh, there is Oliver Heaviside, and Dirac himself. These two were great. But it was necessary to put things in order, and that was Sobolev’s great accomplishment. Then Laurent Schwartz made it all so clear.

I generalized the limit to arbitrary (discontinuous) functions. It gives an obvious way to define the derivative of EVERY function (for example, every function from a vector space to a topological vector space, which includes all normed and Banach spaces).

It is done using my “algebraic general topology”, a wide generalization of general topology.

Start reading here: http://www.mathematics21.org/limit-of-discontinuous-function.html

Don’t forget to nominate me for Fields Medal and Abel Prize ;-)

Also, participate in my research: I don’t yet know how to equate the left and right part of a differential equation with my kind of derivative. Isn’t this the most important problem in modern mathematics to define and understand nondifferentiable solutions of differential equations?

I need to define derivatives for a spike train, i.e. a signal that contains deltas at certain points. The spike train is the output of a thresholding function: whenever the input is > 0 the output is 1, otherwise it is 0.

I need to calculate dout/din. Can you please help or guide me on how to approximate it, maybe with some theoretical insights too?

Wouldn’t it also be clarifying to define the weak derivative of a (non-differentiable) f as any g (it turns out they are all equal almost everywhere) such that $\forall \phi \in C_c^{\infty}, \int g \phi = -\int f \phi'$?

This means also that $g = f’$ as distributions.

I thought you were going to bring up fractional differential operators.

Not this time, but I have written a few posts on how to define fractional derivatives:

In terms of Fourier transforms

In terms of binomial series

In terms of fractional integrals

“The delta function isn’t a function at all.”

Paul Dirac called it a function, and that is good enough for me.