How do you take the Fourier transform of a function when the integral that would define its transform doesn’t converge? The answer is similar to how you can differentiate a non-differentiable function: you take a theorem from ordinary functions and make it a definition for generalized functions. I’ll outline how this works below.

Generalized functions are linear functionals on smooth (ordinary) functions. Given an ordinary function *f*, you can create a generalized function that maps a smooth test function φ to the integral of *f*φ.

There are other kinds of generalized functions. For example, the Dirac delta “function” is really the generalized function δ that maps φ to φ(0). This is the formalism behind the hand-wavy nonsense about a function infinitely concentrated at 0 and integrating to 1. “Integrating” the product δφ is really applying the linear functional δ to φ.
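The two kinds of generalized functions above can be sketched numerically. This is only a toy illustration (the grid, the crude Riemann-sum integration, and the function names are my own), but it shows the key point: an ordinary function becomes a functional by integrating against φ, while δ is a functional that is *not* of that form.

```python
import numpy as np

# A generalized function is a linear functional: it maps a smooth test
# function phi to a number. Ordinary functions become functionals by
# integrating against phi; the Dirac delta just evaluates phi at 0.

x = np.linspace(-10, 10, 20001)     # grid for crude numerical integration
dx = x[1] - x[0]

def from_ordinary(f):
    """Lift an ordinary function f to the functional phi -> integral of f*phi."""
    return lambda phi: np.sum(f(x) * phi(x)) * dx

delta = lambda phi: phi(0.0)        # not an integral against any ordinary function

phi = lambda t: np.exp(-t**2)       # a smooth, rapidly decaying test function

g = from_ordinary(lambda t: np.exp(-t**2))
print(g(phi))       # integral of exp(-2 t^2) = sqrt(pi/2) ≈ 1.2533
print(delta(phi))   # phi(0) = 1.0
```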

Now for absolutely integrable functions *f* and *g*, we have

∫ f̂(x) g(x) dx = ∫ f(x) ĝ(x) dx

In words, the integral of the Fourier transform of *f* times *g* equals the integral of *f* times the Fourier transform of *g*. This is the theorem we use as motivation for our definition.
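The identity is easy to check numerically for a pair of Gaussians, whose transforms are known in closed form. A sketch (my own verification, using the convention f̂(ω) = ∫ f(x) e^(−iωx) dx, under which the transform of exp(−ax²) is √(π/a) exp(−ω²/4a)):

```python
import numpy as np

# Check: integral of (transform of f) * g  ==  integral of f * (transform of g)
# for f(x) = exp(-x^2) and g(x) = exp(-2 x^2).

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
integrate = lambda y: y.sum() * dx

f    = np.exp(-x**2)                          # a = 1
fhat = np.sqrt(np.pi) * np.exp(-x**2 / 4)
g    = np.exp(-2 * x**2)                      # a = 2
ghat = np.sqrt(np.pi / 2) * np.exp(-x**2 / 8)

lhs = integrate(fhat * g)
rhs = integrate(f * ghat)
print(lhs, rhs)   # both equal 2*pi/3 ≈ 2.0944
```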

Now suppose *f* is a function that doesn’t have a classical Fourier transform. We make *f* into a generalized function and define its Fourier transform as the linear functional that maps a test function φ to the integral of *f* times the Fourier transform of φ.

More generally, the Fourier transform of a generalized function *f* is the linear functional that maps a test function φ to the action of *f* on the Fourier transform of φ.

This allows us to say, for example, that the Fourier transform of the constant function *f*(*x*) = 1 is 2πδ, an exercise left for the reader.
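As a partial spoiler for that exercise, the definition can be sketched numerically: the constant function 1, viewed as a functional, sends φ̂ to ∫ φ̂(ω) dω, which by Fourier inversion is 2πφ(0). A toy check (my own illustration, sampling test functions on a grid and supplying φ̂ in closed form):

```python
import numpy as np

# The Fourier transform of a generalized function T maps phi to T(phi_hat).
# Convention: phi_hat(w) = integral of phi(x) exp(-i w x) dx.

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
integrate = lambda y: y.sum() * dx

one   = lambda phi: integrate(phi)        # the constant function 1, as a functional
delta = lambda phi: phi[len(phi) // 2]    # evaluate at x = 0 (the grid midpoint)

phi     = np.exp(-x**2)                       # test function
phi_hat = np.sqrt(np.pi) * np.exp(-x**2 / 4)  # its known transform

# (F 1)(phi) = 1(phi_hat) = integral of phi_hat = 2*pi*phi(0) = (2*pi*delta)(phi)
lhs = one(phi_hat)
rhs = 2 * np.pi * delta(phi)
print(lhs, rhs)   # both ≈ 6.2832
```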

The Heisenberg uncertainty principle for ordinary functions says that the flatter a function is, the more concentrated its Fourier transform, and vice versa. Generalized Fourier transforms take this to an extreme. The Fourier transforms of the flattest functions, i.e. constant functions, are multiples of the most concentrated function, the delta (generalized) function.

By far the best textbook I’ve ever found that covers this material is Richards and Youn, “The Theory of Distributions: A Nontechnical Introduction.”

I’m curious why you’d declare Dirac deltas as hand-wavy nonsense. Once you get into convolving delta trains and aliasing, it’s very useful and pretty intuitive. Hand-wavy, sure. But nonsense? Linear functionals provide zero added value in using e.g. z- or Fourier transforms to model feedback systems. Having an intuitive understanding of convolution with a delta, however, does.

The idea of a function that is zero everywhere except at the origin, is infinite at the origin, and integrates to 1 is nonsense. No such function could possibly exist. But I’m not saying that the Dirac delta is nonsense. It has a perfectly rigorous definition, and it is useful in several areas. However, without the formal definition in mind, it’s easy to make errors when manipulating Dirac deltas.

This is a very interesting topic, but probably inaccessible to readers without a background in statistics or pure mathematics. The nonsense definition of the Dirac function is used because explanations like the above do not appeal to intuition. E.g., how would one present the above concepts to a student in engineering?

I’d give engineers the hand-wavy definition followed by the correct definition. The formal definition doesn’t take that long. This blog post is 339 words, most of which are devoted to the Fourier transform, not just the Dirac delta.

The hand-wavy definition is a dead end. For example, how do you take the derivative of δ?
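With the functional definition, the derivative falls out of integration by parts: define T′(φ) = −T(φ′), so δ′(φ) = −φ′(0). A toy sketch (my own illustration, approximating φ′ with a centered difference):

```python
delta = lambda phi: phi(0.0)

def derivative(T, h=1e-5):
    # Distributional derivative: T'(phi) = -T(phi'), motivated by
    # integration by parts. Here phi' is a centered difference.
    return lambda phi: -T(lambda t: (phi(t + h) - phi(t - h)) / (2 * h))

ddelta = derivative(delta)
phi = lambda t: (1 + t) ** 2     # phi'(0) = 2
print(ddelta(phi))               # -phi'(0) ≈ -2
```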

Now I did leave out some details. What are the exact requirements to be a test function? I didn’t say, and might not go over that with engineers.

Stefan — one way of intuitively introducing the Dirac delta is to first work with the Kronecker delta. That is, start in the discrete domain, and show how convolving f with a delta function “samples” f, where “convolving” is just a discrete summation that you can do by hand and totally understand. Then extend from the discrete to the continuous (by letting the sample spacing get really small, etc.)
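The discrete warm-up Jason describes fits in a few lines. A sketch (the sequences are my own examples), showing that convolving with a shifted Kronecker delta just reproduces the sequence at a shifted position:

```python
# Convolving with a shifted Kronecker delta shifts the sequence --
# a discrete summation small enough to check by hand.

f = [3, 1, 4, 1, 5]
delta_at_2 = [0, 0, 1, 0, 0]   # Kronecker delta shifted to index 2

# full convolution: (f * d)[n] = sum over k of f[k] * d[n - k]
n_out = len(f) + len(delta_at_2) - 1
conv = [sum(f[k] * delta_at_2[n - k]
            for k in range(len(f)) if 0 <= n - k < len(delta_at_2))
        for n in range(n_out)]
print(conv)   # [0, 0, 3, 1, 4, 1, 5, 0, 0] -- f shifted by 2
```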

John — I realize this is off-topic, sorry.