Exponential sum for the new year

Exponential sums can make intricate patterns. Last year I made a page that displays a different page each day, using the month, day, and year as parameters in the expression below. The images plot the partial sums of this sum.

\sum_{n=0}^N \exp\left( 2\pi i \left( \frac{n}{m} + \frac{n^2}{d} + \frac{n^3}{y} \right ) \right )

This was yesterday’s image.

Image of the day, New Year's Eve.

Today’s image is surprisingly plain if we use y = 18.

This is in part because the least common multiple of 1, 1, and 18 is 18. The image could have no more than 18 vertices. In fact, it has only 6 vertices because the summand above has period 6.

But if we use y = 2018 we get something much more complex.

The Exponential Sum of the Day page will use y = 18 this year. There will be a few simple images this year but there will also be lots of surprises.

Moment generating functions and connections to other things

This post relates moment generating functions to the Laplace transform and to exponential generating functions. It also brings in connections to the z-transform and the Fourier transform.

Thanks to Brian Borchers who suggested the subject of this post in a comment on a previous post on transforms and convolutions.

Moment generating functions

The moment generating function (MGF) of a random variable X is defined as the expected value of exp(tX). By the so-called rule of the unconscious statistician we have

M_X(t) \equiv \mathrm{E}[e^{tX}] = \int_{-\infty}^\infty e^{tx} f_X(x)\, dx

where fX is the probability density function of the random variable X. The function MX is called the moment generating function of X because it’s nth derivative, evaluated at 0, gives the nth moment of X, i.e. the expected value of Xn.

Laplace transforms

If we flip the sign on t in the integral above, we have the two-sided Laplace transform of fX. That is, the moment generating function of X at t is the two-sided Laplace transform of fX at –t. If the density function is zero for negative values, then the two-sided Laplace transform reduces to the more common (one-sided) Laplace transform.

Exponential generating functions

Since the derivatives of MX at zero are the moments of X, the power series for MX is the exponential generating function for the moments. We have

M_X(t) = m_0 + m_1t + \frac{m_2}{2}t^2 + \frac{m_3}{3!} t^3 + \cdots

where mn is the nth moment of X.

Other generating functions

This terminology needs a little explanation since we’re using “generating function” two or three different ways. The “moment generating function” is the function defined above and only appears in probability. In combinatorics, the (ordinary) generating function of a sequence is the power series whose coefficient of xn is the nth term of the sequence. The exponential generating function is similar, except that each term is divided by n!. This is called the exponential generating series because it looks like the power series for the exponential function. Indeed, the exponential function is the exponential generating function for the sequence of all 1’s.

The equation above shows that MX is the exponential generating function for mn and the ordinary generating function for mn/n!.

If a random variable Y is defined on the integers, then the (ordinary) generating function for the sequence Prob(Yn) is called, naturally enough, the probability generating function for Y.

The z-transform of a sequence, common in electrical engineering, is the (ordinary) generating function of the sequence, but with x replaced with 1/z.

Characteristic functions

The characteristic function of a random variable is a variation on the moment generating function. Rather than use the expected value of tX, it uses the expected value of itX. This means the characteristic function of a random variable is the Fourier transform of its density function.

Characteristic functions are easier to work with than moment generating functions. We haven’t talked about when moment generating functions exist, but it’s clear from the integral above that the right tail of the density has to go to zero faster than ex, which isn’t the case for fat-tailed distributions. That’s not a problem for the characteristic function because the Fourier transform exists for any density function. This is another example of how complex variables simplify problems.

Shannon wavelet

The Shannon wavelet has an interesting plot:

Shannon wavelet

Given the complexity of the plot, the function definition is surprisingly simple:

\frac{1}{\pi t} (\sin 2\pi t - \sin\pi t)

The Fourier transform is even simpler: it’s the indicator function of [-2π, -π] ∪ [π, 2π], i.e. the function that is 1 on the intervals [-2π, -π] and [π, 2π] but zero everywhere else.

The Shannon wavelet is orthogonal to integer translates of itself. This isn’t obvious in the time domain, but it’s easy to prove in the frequency domain using Parseval’s theorem.

Here’s a plot of the original wavelet and the wavelet shifted to the left by 3:

Two Shannon wavelets

And here’s a plot of the product of the two wavelets. It’s plausible that the areas above and below the x-axis cancel out each other, and indeed they do.

Product of two Shannon wavelets

Related post: Sinc and Jinc integrals

Fourier-Bessel series and Gibbs phenomena

Fourier-Bessel series are analogous to Fourier series. And like Fourier series, they converge pointwise near a discontinuity with the same kind of overshoot and undershoot known as the Gibbs phenomenon.

Fourier-Bessel series

Bessel functions come up naturally when working in polar coordinates, just as sines and cosines come up naturally when working in rectangular coordinates. You can think of Bessel functions as a sort of variation on sine waves. Or even more accurately, a variation on sinc functions, where sinc(z) = sin(z)/z. [1]

A Fourier series represents a function as a sum of sines and cosines of different frequencies. To make things a little simpler here, I’ll only consider Fourier sine series so I don’t have to repeatedly say “and cosine.”

f(z) = \sum_{n=1}^\infty c_n \sin(n \pi z)

A Fourier-Bessel function does something similar. It represents a function as a sum of rescaled versions of a particular Bessel function. We’ll use the Bessel J0 here, but you could pick some other Jν.

Fourier series scale the sine and cosine functions by π times integers, i.e. sin(πz), sin(2πz), sin(3πz), etc. Fourier-Bessel series scale by the zeros of the Bessel function: J01z),  J02z),  J03z), etc. where λn are the zeros of J0. This is analogous to scaling sin(πz) by its roots: π, 2π, 3π, etc. So a Fourier-Bessel series for a function f looks like

f(z) = \sum_{n=1}^\infty c_n J_0(\lambda_n z).

The coefficients cn for Fourier-Bessel series can be computed analogously to Fourier coefficients, but with a couple minor complications. First, the basis functions of a Fourier series are orthogonal over [0, 1] without any explicit weight, i.e. with weight 1. And second, the inner product of a basis function doesn’t depend on the frequency. In detail,

\int_0^1 \sin(m \pi z) \, \sin(n \pi z) \, dz = \frac{\delta_{mn}}{2}.

Here δmn equals 1 if m = n and 0 otherwise.

Fourier-Bessel basis functions are orthogonal with a weight z, and the inner product of a basis function with itself depends on the frequency. In detail

\int_0^1 J_0(\lambda_m z) \, J_0(\lambda_n z) \, z\, dz = \frac{\delta_{mn}}{2} J_1(\lambda_n).

So whereas the coefficients for a Fourier sine series are given by

c_n = 2 \int_0^1 f(z)\, \sin(n\pi z) \,dz

the coefficients for a Fourier-Bessel series are given by

c_n = \frac{ 2\int_0^1 f(z)\, J_0(\lambda_n z) \, dz}{ J_1(\lambda_n)^2 }.

Gibbs phenomenon

Fourier and Fourier-Bessel series are examples of orthogonal series, and so by construction they converge in the norm given by their associated inner product. That means that if SN is the Nth partial sum of a Fourier series

\lim_{N\to\infty} \int_0^1 \left( f(x) - S_N(x) \right)^2 \, dz = 0
and the analogous statement for a Fourier-Bessel series is

\lim_{N\to\infty} \int_0^1 \left( f(x) - S_N(x) \right)^2 \, z\, dz = 0.

In short, the series converge in a (weighted) L² norm. But how do the series converge pointwise? A lot of harmonic analysis is devoted to answering this question, what conditions on the function f guarantee what kind of behavior of the partial sums of the series expansion.

If we look at the Fourier series for a step function, the partial sums converge pointwise everywhere except at the step discontinuity. But the way they converge is interesting. You get a sort of “bat ear” phenomena where the partial sums overshoot the step function at the discontinuity. This is called the Gibbs phenomenon after Josiah Willard Gibbs who observed the effect in 1899. (Henry Wilbraham observed the same thing earlier, but Gibbs didn’t know that.)

The Gibbs phenomena is well known for Fourier series. It’s not as well known that the same phenomenon occurs for other orthogonal series, such as Fourier-Bessel series. I’ll give an example of Gibbs phenomenon for Fourier-Bessel series taken from [2] and give Python code to visualize it.

We take our function f(z) to be 1 on [0, 1/2] and 0 on (1/2, 1]. It works out that

c_n = \frac{1}{\lambda_n} \frac{J_1(\lambda_n / 2)}{ J_1(\lambda_n)^2 }.

Python code and plot

Here’s the plot with 100 terms. Notice how the partial sums overshoot the mark to the left of 1/2 and undershoot to the right of 1/2.

Plot showing Gibbs phenomena for Fourier-Bessel series

Here’s the same plot with 1,000 terms.

Gibbs phenomena for 1000 terms of Fourier-Bessel series

Here’s the Python code that produced the plot.

    import matplotlib.pyplot as plt
    from scipy.special import j0, j1, jn_zeros
    from scipy import linspace

    N = 100 # number of terms in series

    roots = jn_zeros(0, N)
    coeff = [j1(r/2) / (r*j1(r)**2) for r in roots]
    z = linspace(0, 1, 200)

    def partial_sum(z):
        return sum([coeff[i]*j0(roots[i]*z) for i in range(N)])

    plt.plot(z, partial_sum(z))
    plt.ylabel("{}th partial sum".format(N))


[1] To be precise, as z goes to infinity

J_nu(z) sim sqrt{frac{2}{pi z}} cosleft(z - frac{1}{2} nu pi - frac{pi}{4} right)

and so the Bessel functions are asymptotically proportional to sin(z – φ)/√z for some phase shift φ.

[2] The Gibbs’ phenomenon for Fourier-Bessel Series. Temple H. Fay and P. Kendrik Kloppers. International Journal of Mathematical Education in Science and Technology. 2003. vol. 323, no. 2, 199-217.

Animated exponential sum

I’m experimenting with making animated versions of the kinds of images I wrote about in my previous post. Here’s an animated version of the exponential sum of the day for 12/4/17.

Why that date? I wanted to start with something with a fairly small period, and that one looked interesting. I’ll have to do something different for the images that have a much longer period.

Update: Now all the images on the exponential sum site have the option of showing animation. Here’s an interesting example.


Exponential sums make pretty pictures

Exponential sums are a specialized area of math that studies series with terms that are complex exponentials. Estimating such sums is delicate work. General estimation techniques are ham-fisted compared to what is possible with techniques specialized for these particular sums. Exponential sums are closely related to Fourier analysis and number theory.

Exponential sums also make pretty pictures. If you make a scatter plot of the sequence of partial sums you can get surprising shapes. This is related to the trickiness of estimating such sums: the partial sums don’t simply monotonically converge to a limit.

The exponential sum page at UNSW suggests playing around with polynomials with dates in the denominator. If we take that suggestion with today’s date, we get the curve below:

f(n) = n/10 + n**2/7 + n**3/17

These are the partial sums of exp(2πi f(n)) where f(n) = n/10 + n²/7 + n³/17.

[Update: You can get an image each day for the current day’s date here.]

Here’s the code that produced the image.

    import matplotlib.pyplot as plt
    from numpy import array, pi, exp, log

    N = 12000
    def f(n):
        return n/10 + n**2/7 + n**3/17 

    z = array( [exp( 2*pi*1j*f(n) ) for n in range(3, N+3)] )
    z = z.cumsum()

    plt.plot(z.real, z.imag, color='#333399')

If we use logarithms, we get interesting spirals. Here f(n) = log(n)4.1.

f(n) = log(n)**4.1

And we can mix polynomials with logarithms. Here f(n) = log(n) + n²/100.

f(n) = log(n) + n**2/100

In this last image, I reduced the number of points from 12,000 to 1200. With a large number of points the spiral nature dominates and you don’t see the swirls along the spiral as clearly.


Defining the Fourier transform on LCA groups

My previous post said that all the familiar variations on Fourier transforms—Fourier series analysis and synthesis, Fourier transforms on the real line, discrete Fourier transforms, etc.—can be unified into a single theory. They’re all instances of a Fourier transform on a locally compact Abelian (LCA) group. The difference between them is the underlying group.

Given an LCA group G, the Fourier transform takes a function on G and returns a function on the dual group of G. We said this much last time, but we didn’t define the dual group; we just stated examples. We also didn’t say just how you define a Fourier transform in this general setting.

Characters and dual groups

Before we can define a dual group, we have to define group homomorphisms. A homomorphism between two groups G and H is a function h between the groups that preserves the group structure. Suppose the group operation is denoted by addition on G and by multiplication on H (as it will be in our application), saying h preserves the group structure means

h(x + y) = h(x) h(y)

for all x and y in G.

Next, let T be the unit circle, i.e. complex numbers with absolute value 1. T is a group with respect to multiplication. (Why T for circle? This is a common notation, anticipating generalization to toruses in all dimensions. A circle is a one-dimensional torus.)

Now a character on G is a continuous homomorphism from G to T. The set of all characters on G is the dual group of G. Call this group Γ. If G is an LCA group, then so is Γ.


The classical Fourier transform is defined by an integral. To define the Fourier transform on a group we have to have a way to do integration on that group. And there’s a theorem that says we can always do that. For every LCA group, there exists a Haar measure μ, and this measure is nice enough to develop our theory. This measure is essentially unique: Any two Haar measures on the same LCA group must be proportional to each other. In other words, the measure is unique up to multiplying by a constant.

On a discrete group—for our purposes, think of the integers and the integers mod m—Haar measure is just counting; the measure of a set is the number of things in the set. And integration with respect to this measure is summation.

Fourier transform defined

Let f be a function in L¹(G), i.e. an absolutely integrable function on G. Then the Fourier transform of f is a function on Γ defined by

\hat{f}(\gamma) = \int_G f(x)\, \gamma(-x) \, d\mu

What does this have to do with the classical Fourier transform? The classical Fourier transform takes a function of time and returns a function of frequency. The correspondence between the classical Fourier transform and the abstract Fourier transform is to associate the frequency ω with the character that takes x to the value exp(iωx).

There are multiple slightly different conventions for the classical Fourier transform cataloged here. These correspond to different constant multiples in the choice of measure on G and Γ, i.e. whether to divide by or multiply by √(2π), and in the correspondence between frequencies and characters, whether ω corresponds to exp(±iωx) or exp(±2πiωx).

Unified theory of Fourier transforms

You can take a periodic function and analyze it into its Fourier coefficients, or use the Fourier coefficients in a sum to synthesize a periodic function. You can take the Fourier transform of a function defined on the whole real line and get another such function. And you can compute the discrete Fourier transform via the FFT algorithm.

Is there a general theory that unifies all these related but different things? Why yes, yes there is.


Everything in the opening paragraph is simply a Fourier transform, each in a different context. And the contexts correspond to groups. Specifically, locally compact Abelian groups.

Some of these groups are easier to see than others. Clearly the real numbers with addition form a group: the sum of two real numbers is a real number, etc. But where are the groups in the other contexts?

You can think of a periodic function as a function on a circle; the function values have to agree at both ends of an interval, so you might as well think of those two points as the same point, i.e. join them to make a circle. Shifting along an interval, wrapping around if necessary, corresponds to a rotation of the circle, and rotations form a group. So analyzing a periodic function in to a set of Fourier coefficients is a Fourier transform on the circle.

You can think of a set of Fourier coefficients as a function on the integers, mapping n to the nth coefficient. Synthesizing a set of Fourier coefficients into a periodic function is a Fourier transform on the group of integers.

What about a discrete Fourier transform (DFT)? If you have a function sampled at m points, you could think of those points as the group of integers mod m. Your sampled points constitute a function on the integers mod m, and the DFT is a Fourier transform on that group.

Note that the DFT is a Fourier transform in its own right. It’s not an approximation per se, though it’s nearly always used as part of an approximation process. You can start with a continuous function, approximate it by a finite set of samples, compute the DFT of these samples, and the result will give you an approximation to the Fourier transform of the original continuous function.

What about functions of several variables? These are functions on groups too. A function of two real variables, for example, is a function on R², which is a group with (vector) addition.

Dual groups

A Fourier transform takes a function defined on a group and returns a function defined on the dual of that group. I go into exactly what a dual group is in my next post, but for now, just note that the Fourier transform takes a function defined on one group and returns a function defined on another group.

The dual of the circle is the integers, and vice versa. That’s why the Fourier transform of a function on the circle is an infinite set of Fourier coefficients, which we think of as a function on the integers. The Fourier transform of the function on the integers, i.e. a set of Fourier coefficients, is a function on the circle, i.e. a periodic function.

The dual group of the real numbers is the real numbers again. That’s why the Fourier transform of a function on the real line is another function on the real line.

The integers mod m is also its own dual group. So the DFT takes a set of m numbers and returns a set of m numbers.

Locally compact Abelian (LCA) groups

What do locally compact and Abelian mean? And why do we make these assumptions?

Let’s start with Abelian. This just means that the group operation is commutative. When we’re adding real numbers, or composing rotations of a circle, these operations are commutative.

Locally compactness is a more technical requirement. The circle is compact, and so are the integers mod m. But if we restricted out attention to compact groups, that would leave out the integers and the real numbers. These spaces are not compact, but they’re locally compact, and that’s enough for the theory to go through.

It turns out that LCA groups are a sort of theoretical sweet spot. Assuming groups are LCA is general enough to include the examples we care about the most, but it’s not so general that the theory becomes harder and the results less powerful.

More connections

This post relates Fourier series (analysis and synthesis) to Fourier transforms (on the real line) by saying they’re both special cases of Fourier analysis on LCA groups. There are a couple other ways to connect Fourier series and Fourier transforms.

You can take the Fourier transform (not Fourier series) of a periodic function two ways: by restricting it to one period and defining it to be zero everywhere else, or by letting it repeat forever across the real line and taking the Fourier transform in the sense of generalized functions. You can read more about these two approaches in this post.

Fourier transform of a function on a graph

What is a Fourier transform at its core? An expansion of function in terms of eigenfunctions of the Laplacian. For a function on the real line, the Laplacian is simply the second derivative. The functions mapped to multiples of themselves by taking second derivatives are sines and cosines of various frequencies. A Fourier series is a change of basis, using as basis vectors those functions who behave the simplest under the second derivative.

The Fourier transform of a function on a graph is also a change of basis, expanding a discrete function in terms of eigenvalues of the Laplacian, in this case the graph Laplacian.

The Fourier transform of a function f, evaluated at a frequency ω, is the inner product of f with the eigenfunction exp(2πiωt).

\hat{f}(\omega) = \langle f, \exp(2\pi i \omega t) \rangle = \int_{-\infty}^\infty f(t) \exp(-2\pi i \omega t) \, dx

The inner product of two complex functions f and g is the integral of the product of f and the conjugate of g. Conjugation is why exp(2πiωt) became exp(-2πiωt).

The Fourier transform of a discrete function f on a graph, evaluated at an eigenvalue λi, is the inner product of f (i.e. the vector of values of f at each node) with the eigenvector associated with λi.

\hat{f}(\lambda_i) = \langle f, v^*_i \rangle = \sum_{j=1}^N f(j) v_i^*(j)

Here the inner product is a discrete sum rather than an integral. As before, we take the complex conjugate of the second item in the product.

The eigenvectors associated with the smallest eigenvalues of the graph Laplacian are analogous to low frequency sines and cosines. The eigenvalue components corresponding to nearly vertices in a graph should be close together. This analogy explains why spectral coordinates work so well.

Click to learn more about consulting for complex networks



Discrete Laplace transform

The relationship between the discrete Laplace transform and discrete Fourier transform is not quite the same as that between their continuous counterparts.

Continuous Fourier and Laplace transforms

The continuous versions of the Fourier and Laplace transforms are given as follows.

Fourier transform:

{\cal F}(f)(\omega) = \int_{-\infty}^\infty \exp(-i\omega x) f(x)\, dx

Laplace transform:
{\cal L}(f)(s) = \int_0^\infty \exp(-s x) f(x)\, dx

The Fourier transform is defined several ways, and I actually prefer the convention that puts a factor of 2π in the exponential, but the convention above makes the analogy with Laplace transform simpler. There are two differences between the Fourier and Laplace transforms. The Laplace transform integrates over only half the real line, compared to the entire real line for Fourier. But a variation on the Laplace transform, the Bilateral Laplace transform integrates over the entire real line. The Bilateral Laplace transform at s is simply the Fourier transform at xis. And of course the same is true for the (one-sided) Laplace transform if the function f is only non-zero for positive values.

I’ve encountered the Fourier transform more in application, and the Laplace transform more in teaching. This is not to say the Laplace transform isn’t used in practice; it certainly is used in applications. But the two transforms serve similar purposes, and the Laplace transform is easier to teach. Because the factor exp(-sx) decays rapidly, the integral defining the Laplace transform converges for functions where the integral defining the Fourier transform would not. Such functions may still have Fourier transforms, but the transforms require distribution theory whereas the Laplace transforms can be computed using basic calculus.

Discrete Fourier and Laplace Transforms

There’s more difference between the discrete versions of the Fourier and Laplace transforms than between the continuous versions.

The discrete Fourier transform (DFT) approximates the integral defining the (continuous) Fourier transform with a finite sum. It discretizes the integral and truncates its domain. The discrete Laplace transform is an infinite sum. It discretizes the integral defining the Laplace transform, but it does not truncate the domain. Given a step size η > 0, the discrete Laplace transform of f is

{\cal L}_\eta(f)(s) = \eta \sum_{n=0}^\infty \exp(-sn\eta) f(n\eta)

The discrete Laplace transform isn’t “as discrete” as the discrete Fourier transform. The latter takes a finite sequence and returns a finite sequence. The former evaluates a function at an infinite number of points and produces a continuous function.

The discrete Laplace transform is used in applications such as signal processing, as well as in the theory of analytic functions.

Connection with the z-transform and generating functions

If η = 1 and z = exp(-s), the discrete Laplace transform becomes the z-transform of the values of f at non-negative integers. And if we replace z with 1/z, or equivalently set z = exp(s) instead of z = exp(-s), we get the generating function of the values of f at non-negative integers.

z-transforms are common in digital signal processing, while generating functions are common in combinatorics. They are essentially the same thing.

Click to learn more about consulting help with signal processing


Visualizing the DFT matrix

The discrete Fourier transform (DFT) of length N multiplies a vector by a matrix whose (j, k) entry is ωjk where ω = exp(-2πi/N), with j and k running from 0 to – 1. Each element of the matrix is a rotation, so if N = 12, we can represent each element by an hour on a clock. The angle between the hour hand and minute hand corresponds to the phase of the matrix entry. We could also view each element as a color around a color wheel. The image below does both.

The matrix representing the inverse of the DFT is the conjugate of the DFT matrix (divided by Nf, but we’re only looking at phase here, so we can ignore this rescaling.) The image below displays the DFT matrix on the left and it’s inverse on the right.

Taking the conjugate amounts to making all the clocks run backward.

The DFT is often called the FFT. Strictly speaking, the FFT is an algorithm for computing the DFT. Nobody computes a DFT by multiplying by the DFT matrix, because the FFT is faster. The DFT matrix has a lot of special structure, which the FFT takes advantage of to compute the product faster than using ordinary matrix multiplication.

By the way, there are Unicode characters for clock times on the hour, U+1F550 through U+1F55B. I created the image above by writing a script that put the right characters in a table. The colors have HSL values where H is proportional to the angle and S = L =0.8.

Click to learn more about consulting help with signal processing


Relating Fourier series and Fourier transforms

Fourier series and Fourier transforms may seem more different than they are because of the way they’re typically taught. Fourier series are presented more as a representation of a function, not a transformation. Here’s a function on an interval. We can write it as a sum of sines and cosines, just as we can write a function as a sum of powers in a power series. There’s not much emphasis on the coefficients per se. They appear inside a sum, but don’t get much attention on their own.

Fourier transforms, on the other hand, are presented as genuine transforms. Here’s a function, and here’s its transform, another function. One’s a function of time, the other a function of frequency. Or maybe both are presented as representations of the same function in two different domains, the time domain and the frequency domain.

You could think of the Fourier series as a kind of transform, taking a periodic function and mapping it to an infinite sequence, the Fourier series coefficients. And you could think of the Fourier transform as being a kind of continuous set of coefficients for representing a function, if you interpret the inversion theorem the right way.

Here are a couple connections between Fourier series and Fourier transforms. Start with a function f on an interval and compute its Fourier series. The Fourier series is periodic, so we could think of f as periodic, even though we only care about f on the interval. Instead, let’s think of extending f to be 0 everywhere outside the interval. Now we take the Fourier transform of f. The Fourier series coefficients are the Fourier transform of f evaluated at integer arguments.

Now let’s go back to thinking of f as a periodic function. What would it’s Fourier transform look like? In classical analysis, you can’t do that. Periodic functions have Fourier series but they don’t have Fourier transforms because the integral defining the latter does not converge. But by the magic of tempered distributions, we can indeed take the Fourier transform of a periodic function. The result is a weighted sum of delta distributions at each integer, and the coefficient of the delta distribution at n is the nth Fourier series coefficient.

The proof of the claim in the previous paragraph is simple once you understand the sha function Ш. Start with a function f defined on a unit interval and extended to be zero outside that interval. Convolving f with Ш make a periodic function f*Ш extending f. The Fourier transform of a convolution is the product of the convolutions. The Fourier transform of f is simply its classical Fourier transform F. The Ш function is it’s own Fourier transform, so the transform of f*Ш is FШ. Multiplying a function by Ш samples that function, and the samples of F are the Fourier coefficients of the Fourier series of f*Ш, the periodic extension of f.

Click to learn more about consulting help with signal processing


An example of coming full circle

Here’s an interesting line from Brad Osgood:

Isn’t it a little embarrassing that multibillion dollar industries seem to depend on integrals that don’t converge?

In context, he’s not saying that huge companies are blithely using bad math. Some are, but that’s not what he’s getting at here. His discussion is an example of coming full circle, where experts and novices come to the same conclusion for different reasons.

The divergent integrals Osgood refers to are Fourier transforms of certain functions. A beginner might not notice that said integrals don’t converge. An expert knows that the calculations are justified by a more sophisticated theory. Someone in-between would have objections. Experts can be casual, not because they’re ignorant of technical difficulties but because they’ve mastered these difficulties. [1]

The expert in Fourier analysis has all the technicalities in the back of his or her mind. Often these don’t need to be explicitly exercised. You can blithely go about using formal calculations that aren’t justified by the classical theory.

But the expert doesn’t entirely come full circle, not in the sense of walking in circles in the woods. It’s more like winding around a parking garage, coming back to the same (x, y) location but one level up. Sometimes the expert needs to pull out the technical machinery to avoid an error the beginner could fall into. The theory of tempered distributions, for example, doesn’t justify every calculation a novice might try.

Click to learn more about consulting help with signal processing


[1] In a nutshell, here’s the theory that justifies apparently sloppy calculations with Fourier transforms. The key is to view the function you want to transform not as a function on the real line but as a tempered distribution, a linear functional on the space of smooth, rapidly decaying test functions. A function acts on a test function by forming their product and integrating. Then use Parseval’s theorem from the classical theory as the definition in this new context, moving the transform operation from the original function to the test function. Simple, right?

The Dirac comb or Sha function

shaThe sha function, also known as the Dirac comb, is denoted with the Cyrillic letter sha (Ш, U+0428). This letter was chosen because it looks like how people visualize the function, a long series of vertical spikes. The function is called the Dirac comb for the same reason. This function is very important in Fourier analysis because it relates Fourier series and Fourier transforms. It relates sampling and periodization.  It’s its own Fourier transform, and with a few qualifiers discussed later, the only such function.

The Ш function, really the Ш distribution, is defined as

sha(x) = \sum_{n=-\infty}^\infty \delta(x-n)

Here δ(xn) is the Dirac delta distribution centered at n. The action of δ(xn) on a test function is to evaluate that function at n. You can envision Ш as an infinite sequence of spikes, one at each integer. The action of Ш on a test function is to add up its values at every integer.


The product of Ш with a function f is a new distribution whose action on a test function φ is the sum of f φ over all integers. Or you could think of the distribution as a sort of clothesline on which to hang the sampled values of f, much the way a generating function works.


Next let’s look at a function f that lives on [0, 1], i.e. is zero everywhere outside the unit interval. The convolution of f with δ(xn) is f(xn), i.e. a copy of f shifted over to live on the interval [n, n+1]. So by taking the convolution with Ш, we create copies of f all over the real line. We’ve made f into a periodic function. So instead of saying “the function f extended to create a periodic function” you can simply say f*Ш.

Fourier transform

Now let’s think about the Fourier transform of Ш. The Fourier transform of δ(x) is 1, i.e. the function equal to 1 everywhere [1].  (The more concentrated a function is, the more spread out its Fourier transform. So if you have an infinitely concentrated function δ, its Fourier transform is perfectly flat, 1. You can calculate the transform rigorously, this this is the intuition.) If you shift a function by n, you rotate its Fourier transform by exp(-2πinω). So we can compute the transform of Ш:

Fourier transform of sha = \sum_{n=-\infty}^\infty \exp(-2\pi i n \omega)

This equation only makes sense in terms of distributions. The right hand side does not converge in the classical sense; the individual terms don’t even go to zero, since each term has magnitude 1. So what kind of distribution is this thing on the right side? It is in fact the Ш function again, though this is not obvious.

To see that the exponential sum is actually the Ш function, i.e. that Ш is its own Fourier transform, we need to back up a little bit and define Fourier transform of a distribution. As usual with distributions, we take a classical theorem and turn it into a definition in a broader context.

For absolutely integrable functions, we have

\int_{-\infty}^ \infty \hat{f}(x) \, \varphi(x) \, dx = \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

where the hat on top of a function indicates its Fourier transform. Inspired by the theorem above, we define the Fourier transform of a distribution f to be the functional whose action on a test function φ is given below.

 \hat{f} : \varphi \mapsto \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

As we noted in a previous post, the integral above can be taken literally if f is a distribution associated with an ordinary function, but in general it means the application of the linear functional to the test function.

As a distribution, exp(-2πinω) acts on a test function φ by integrating against it. From the definition of a (classical) Fourier transform, this gives the Fourier transform of φ evaluated at n. So the Fourier transform of Ш acts on φ by summing the values of φ’s Fourier transform over all integers. By the Poisson summation formula, this is the same as summing the values of φ itself over all integers. Which is the same as applying Ш. So the Fourier transform of Ш has the same effect on test functions as Ш. In other words, Ш is its own Fourier transform.


We haven’t been explicit about where our test functions come from. We require that xn φ(x) goes to zero as x goes to ±∞ for any positive integer n. These are called functions of rapid decay. And the distributions we define as linear functionals on such test functions are called tempered distributions.

The Ш distribution is essentially unique. Any tempered distribution with period 1 that equals its own Fourier transform must be a multiple of Ш.

Click to learn more about consulting help with signal processing


[1] All Fourier transform calculations here use the convention I call (-1, τ, 1) in these notes on various definitions. This may be the most common definition, though there are several minor variations in common use.