Relating Fourier series and Fourier transforms

Fourier series and Fourier transforms may seem more different than they are because of the way they’re typically taught. Fourier series are presented more as a representation of a function, not a transformation. Here’s a function on an interval. We can write it as a sum of sines and cosines, just as we can write a function as a sum of powers in a power series. There’s not much emphasis on the coefficients per se. They appear inside a sum, but don’t get much attention on their own.

Fourier transforms, on the other hand, are presented as genuine transforms. Here’s a function, and here’s it’s transform, another function. One’s a function of time, the other a function of frequency. Or maybe both are presented as representations of the same function in two different domains, the time domain and the frequency domain.

You could think of the Fourier series as a kind of transform, taking a periodic function and mapping it to an infinite sequence, the Fourier series coefficients. And you could think of the Fourier transform as being a kind of continuous set of coefficients for representing a function, if you interpret the inversion theorem the right way.

Here are a couple connections between Fourier series and Fourier transforms. Start with a function f on an interval and compute its Fourier series. The Fourier series is periodic, so we could think of f as periodic, even though we only care about f on the interval. Instead, let’s think of extending f to be 0 everywhere outside the interval. Now we take the Fourier transform of f. The Fourier series coefficients are the Fourier transform of f evaluated at integer arguments.

Now let’s go back to thinking of f as a periodic function. What would it’s Fourier transform look like? In classical analysis, you can’t do that. Periodic functions have Fourier series but they don’t have Fourier transforms because the integral defining the latter does not converge. But by the magic of tempered distributions, we can indeed take the Fourier transform of a periodic function. The result is a weighted sum of delta distributions at each integer, and the coefficient of the delta distribution at n is the nth Fourier series coefficient.

The proof of the claim in the previous paragraph is simple once you understand the sha function Ш. Start with a function f defined on a unit interval and extended to be zero outside that interval. Convolving f with Ш make a periodic function f*Ш extending f. The Fourier transform of a convolution is the product of the convolutions. The Fourier transform of f is simply its classical Fourier transform F. The Ш function is it’s own Fourier transform, so the transform of f*Ш is FШ. Multiplying a function by Ш samples that function, and the samples of F are the Fourier coefficients of the Fourier series of f*Ш, the periodic extension of f.

Related: DSP and time series analysis

An example of coming full circle

Here’s an interesting line from Brad Osgood:

Isn’t it a little embarrassing that multibillion dollar industries seem to depend on integrals that don’t converge?

In context, he’s not saying that huge companies are blithely using bad math. Some are, but that’s not what he’s getting at here. His discussion is an example of coming full circle, where experts and novices come to the same conclusion for different reasons.

The divergent integrals Osgood refers to are Fourier transforms of certain functions. A beginner might not notice that said integrals don’t converge. An expert knows that the calculations are justified by a more sophisticated theory. Someone in-between would have objections. Experts can be casual, not because they’re ignorant of technical difficulties but because they’ve mastered these difficulties. [1]

The expert in Fourier analysis has all the technicalities in the back of his or her mind. Often these don’t need to be explicitly exercised. You can blithely go about using formal calculations that aren’t justified by the classical theory.

But the expert doesn’t entirely come full circle, not in the sense of walking in circles in the woods. It’s more like winding around a parking garage, coming back to the same (x, y) location but one level up. Sometimes the expert needs to pull out the technical machinery to avoid an error the beginner could fall into. The theory of tempered distributions, for example, doesn’t justify every calculation a novice might try.

* * *

[1] In a nutshell, here’s the theory that justifies apparently sloppy calculations with Fourier transforms. The key is to view the function you want to transform not as a function on the real line but as a tempered distribution, a linear functional on the space of smooth, rapidly decaying test functions. A function acts on a test function by forming their product and integrating. Then use Parseval’s theorem from the classical theory as the definition in this new context, moving the transform operation from the original function to the test function. Simple, right?

DSP and time series analysis

Finding 2016 in pi

2016 appears in π starting at the 7173rd decimal place:

You can confirm this with Mathematica or Wolfram Alpha:

        Mod[ Floor[10^7177 pi] , 10000]

I found it using the following Python code:

        >>> from sympy import pi
        >>> digits = str(pi.evalf(10000))[2:]
        >>> digits.find('2016')

By the way, it’s also true that 2016 = 1 + 2 + 3 + … + 63.

Get rid of something every Thursday

I heard of someone who had a commitment to get rid of something every Thursday. I don’t know anything about how they carried that out. It could mean throwing out or donating to charity a physical object each Thursday. Or maybe it could be handing over a responsibility or letting go of an ambition. It could be a combination, such as getting rid of an object that is a reminder of something intangible that you want to let go of.

This may mean reducing your total inventory of objects or obligations, or it could be simply turnover, making room for new things.

Getting rid of an obligation is not necessarily irresponsible, nor is letting go of an ambition necessarily lazy. Letting go of one obligation to take on another could be very responsible. Letting go of one ambition to pursue another could be a lot of work.

Related posts:

Automate to save mental energy, not time

Automation doesn’t always save as much time or effort as we expect.

The xkcd cartoon above is looking at automation as an investment. Does the work I put in now eventually save more work than I put into it? Automation may be well worth it even if the answer is “no.”

Automation can be like a battery as well as an investment. Putting energy into batteries is a bad investment; you’ll never get out as much energy as you put in. But that’s not why you put energy into batteries. You put energy in while you can so you can use some of that energy later when you need it.

Write automation scripts when you have the time, energy, and motivation to do so and when nothing else is more important. (Or nothing is more interesting, if you’re looking for a way to procrastinate without feeling too guilty. This is called “moral compensation.”) You may indeed save more work than you put into writing the scripts. But you also may save mental energy just when you need it.

Suppose it takes you an hour to write a script that only saves you two minutes later. If that two minutes would have derailed your concentration at a critical moment, but it didn’t because you had the script, writing the script may have paid for itself, even though you invested 60 minutes to save 2 minutes.

If your goal is to save mental energy, not time, you have a different strategy for automation. If a script executes faster than a manual process, but it takes a long time to remember where to find the script and how to run it, it may be a net loss. The less often you run a script, the chattier the interface should be.

The same considerations apply to learning third party software. I suspect the time I’ve put into learning some features of Emacs, for example, will not pay for itself in terms of time invested versus time saved. But I’ve invested leisure time to save time when I’m working hard, not to save keystrokes but to save mental energy for the project at hand.

Related post: How much does typing speed matter?

The Dirac comb or Sha function

shaThe sha function, also known as the Dirac comb, is denoted with the Cyrillic letter sha (Ш, U+0428). This letter was chosen because it looks like how people visualize the function, a long series of vertical spikes. The function is called the Dirac comb for the same reason. This function is very important in Fourier analysis because it relates Fourier series and Fourier transforms. It relates sampling and periodization.  It’s its own Fourier transform, and with a few qualifiers discussed later, the only such function.

The Ш function, really the Ш distribution, is defined as

sha(x) = \sum_{n=-\infty}^\infty \delta(x-n)

Here δ(xn) is the Dirac delta distribution centered at n. The action of δ(xn) on a test function is to evaluate that function at n. You can envision Ш as an infinite sequence of spikes, one at each integer. The action of Ш on a test function is to add up its values at every integer.


The product of Ш with a function f is a new distribution whose action on a test function φ is the sum of f φ over all integers. Or you could think of the distribution as a sort of clothesline on which to hang the sampled values of f, much the way a generating function works.


Next let’s look at a function f that lives on [0, 1], i.e. is zero everywhere outside the unit interval. The convolution of f with δ(xn) is f(xn), i.e. a copy of f shifted over to live on the interval [n, n+1]. So by taking the convolution with Ш, we create copies of f all over the real line. We’ve made f into a periodic function. So instead of saying “the function f extended to create a periodic function” you can simply say f*Ш.

Fourier transform

Now let’s think about the Fourier transform of Ш. The Fourier transform of δ(x) is 1, i.e. the function equal to 1 everywhere [1].  (The more concentrated a function is, the more spread out its Fourier transform. So if you have an infinitely concentrated function δ, its Fourier transform is perfectly flat, 1. You can calculate the transform rigorously, this this is the intuition.) If you shift a function by n, you rotate its Fourier transform by exp(-2πinω). So we can compute the transform of Ш:

Fourier transform of sha = \sum_{n=-\infty}^\infty \exp(-2\pi i n \omega)

This equation only makes sense in terms of distributions. The right hand side does not converge in the classical sense; the individual terms don’t even go to zero, since each term has magnitude 1. So what kind of distribution is this thing on the right side? It is in fact the Ш function again, though this is not obvious.

To see that the exponential sum is actually the Ш function, i.e. that Ш is its own Fourier transform, we need to back up a little bit and define Fourier transform of a distribution. As usual with distributions, we take a classical theorem and turn it into a definition in a broader context.

For absolutely integrable functions, we have

\int_{-\infty}^ \infty \hat{f}(x) \, \varphi(x) \, dx = \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

where the hat on top of a function indicates its Fourier transform. Inspired by the theorem above, we define the Fourier transform of a distribution f to be the functional whose action on a test function φ is given below.

 \hat{f} : \varphi \mapsto \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

As we noted in a previous post, the integral above can be taken literally if f is a distribution associated with an ordinary function, but in general it means the application of the linear functional to the test function.

As a distribution, exp(-2πinω) acts on a test function φ by integrating against it. From the definition of a (classical) Fourier transform, this gives the Fourier transform of φ evaluated at n. So the Fourier transform of Ш acts on φ by summing the values of φ’s Fourier transform over all integers. By the Poisson summation formula, this is the same as summing the values of φ itself over all integers. Which is the same as applying Ш. So the Fourier transform of Ш has the same effect on test functions as Ш. In other words, Ш is its own Fourier transform.


We haven’t been explicit about where our test functions come from. We require that xn φ(x) goes to zero as x goes to ±∞ for any positive integer n. These are called functions of rapid decay. And the distributions we define as linear functionals on such test functions are called tempered distributions.

The Ш distribution is essentially unique. Any tempered distribution with period 1 that equals its own Fourier transform must be a multiple of Ш.

* * *

[1] All Fourier transform calculations here use the convention I call (-1, τ, 1) in these notes on various definitions. This may be the most common definition, though there are several minor variations in common use.

Consulting help with Fourier analysis

The longer it has taken, the longer it will take

Suppose project completion time follows a Pareto (power law) distribution with parameter α. That is, for t > 1, the probability that completion time is bigger than t is t. (We start out time at t = 1 because that makes the calculations a little simpler.)

Now suppose we know that a project has lasted until t0 so far. Then the expected finish time is αt0/(α-1) and so the expected additional time is t0/(α-1). Note that both are proportional to t0. So the longer it has taken, the longer it will take. If the project is running late, you can expect the time remaining to be even more than the expected time before the project started. The finish line is moving away from you!

For example, suppose α = 2 (in applications of power laws, α is often between 1 and 3) and you’re measuring time in years. When the project starts at t = 1, it is expected to take one year, until t = 2. Now suppose you’re starting the second year and the project isn’t done. Now it’s expected to finish at t = 4, two more years. When you started, the project was supposed to take a year. One year later, it has taken a year, and should be expected to take two more years. I said “should be expected” rather than “is expected” because no one would believe such an estimate. (Ever heard of the Big Dig? Or other megaprojects?)

Note that we have computed the conditional probability given only the time it has taken so far, and no other information. If you know more, for example maybe you know that some specific pieces have been completed, then you should use that information.

This is related to the Lindy effect. The longer a cultural artifact has been around, the longer it is expected to last into the future.

Two meanings of distribution

There are a couple common uses of the term distribution in math. The most familiar is probability distribution, such as a beta distribution, a Poisson distribution, etc. Less familiar but still common is distributions in the sense of generalized functions, like the Dirac delta distribution. Anybody with much exposure to math will have heard of a probability distribution. Generalized functions are common knowledge in some areas of math such as differential equations or harmonic analysis, but mathematicians in other areas, say graph theory, may not have heard of them.

This post briefly answers two questions:

  1. What is a distribution as in a generalized function?
  2. What does it have to do with a probability distribution?

Most of this post will deal with the first question, but we’ll circle back to the second question by the end.

You may have heard that a Dirac delta function δ(x) is an “infinitely concentrated” function or a point mass. Or you may have heard some of the rules for working with it, such as that it is infinite at the origin, zero everywhere else, and integrates to 1. But no function can actually do what the delta function is said to do. Measure theory will let functions take on actual infinite values, but the value of a function at a single point, even if that value is infinite, cannot matter to its integral. Even putting that aside, if you say δ(x) is infinite at 0 and integrates to 1, then how do you make sense of expressions like 2 δ? Is it twice as infinite at 0, whatever that means? Is it twice as zero everywhere else? And what on earth could it mean to take a derivative or Fourier transform of δ(x)?

Generalized functions are a way to define things like the δ distribution rigorously. They let you preserve some of the intuitive/magical properties you want while also giving rules to keep you from getting into trouble. Regarding the paragraph above, the theory will let you integrate, differentiate, and take the Fourier transform of δ(x) but it won’t let you do things like say that 2δ = δ since 2×0 = 0 and 2×∞ = ∞.

Generalized functions are just functions, but not functions of real numbers. They are linear functions that take other functions [1] and return real numbers. The functions they act on are typically called test functions. To reduce the confusion of having different kinds of functions under discussion, linear functions that act on other functions are usually called functionals. A functional is just a function, a linear function from test functions to real numbers, but it helps to give it a different name.

You can write the action of a distribution f on a test function φ as if it were an integral:

f : \varphi \mapsto \int f(x)\, \varphi(x) \, dx

If f is a function, you can take the integral literally. Distributions generalize functions by associating a function with the linear operator that acts on a test function by multiplying by it and integrating.

But distributions include other kinds of linear functionals, in which case the integral expression is not literal. The δ distribution, for example, acts on a test function φ by returning φ(0). And here’s the connection to the intuitive idea of a function infinitely concentrated at 0. If a function integrates to 1, and is very concentrated near 0, then its integral when multiplied by φ is approximately φ(0).  You could make this rigorous and define generalized functions as limits of functions, but that approach is something of a dead end. The theory is much simpler using the linear functional definition.

How does this let you differentiate things like the Dirac delta? In a nutshell, you take what is a theorem for ordinary functions and turn it into a definition for generalized functions. I explain this in more detail here.

So the theory of distributions lets you use your intuition regarding “infinitely concentrated” functions and such. It also lets you carry out formal calculations, such as differentiating or taking the Fourier transform of distributions. But it also keeps you out of trouble. Back to our example above, what does 2δ mean? It’s simply the linear functional that takes a test function φ and returns 2 φ(0).

Now what does all this have to do with probability distributions? You can think of a probability density function as something that exists to be integrated. You find the probability of some event (set) by integrating a probability density over it. You find the expected value of some function by multiplying that function by a probability distribution and integrating. Likewise you could think of distributions in the sense of generalized functions as things that exist to be integrated. They act on test functions by being integrated against them, or by doing things analogous to integration that are more general.

People sometimes get confused because they look at probability densities outside of integrals and try to think of them is probabilities. They’re not. They are things you integrate to get probabilities. A probability density can, for example, be larger than 1, but a probability cannot. Likewise people sometimes get confused when they think of generalized functions on their own. If you give the generalized function something to act on, you’re more likely to be guided into doing the right thing. Distributions, whether probability distributions or generalized functions, act on other things.

* * *

[1] The space of test functions can vary. The most common choice is infinitely differentiable functions with compact support. But for Fourier analysis, the natural space of test functions consists of infinitely differentiable functions of rapid decay, i.e. functions φ such that xn φ(x) goes to zero as x goes to ±∞ for any positive integer n.

The reason is that the Fourier transform of such a function is another function of the same kind. Test functions of compact support aren’t suited for Fourier analysis because a function with compact support cannot have a Fourier transform with compact support. It’s related to the Heisenberg uncertainty principle: the more concentrated something is in the time domain, the less concentrated it is in the frequency domain. A signal can’t be time-limited and bandlimited.

Restarting @DSP_fact, ending @PerlRegex

I’m making a couple changes to my Twitter accounts.

First, I’m winding down @PerlRegex. I’ll stop tweeting there when my scheduled tweets run out. I suggest that everyone who has been following @PerlRegex start following @RegexTip instead. The latter is more general, but is mostly compatible with Perl.

Second, I’m reviving my @DSP_Fact. I stopped tweeting there a couple years ago, but I’d like to start posting there again. This time it’s going to be a little broader. I intend to include some material on acoustics, Fourier analysis (continuous and discrete), and maybe some other related material.

DSP_fact logo


I was listening to a classic music station yesterday, and I heard the story of a professional pianist whose hand was injured in an accident. He then started learning trumpet and two years later he was a professional trumpeter. I didn’t catch the musician’s name.

I was not surprised that a professional in one instrument could become a professional in another, but I was surprised that he did it in only two years. It probably helped that he was no longer able to play piano; I imagine if he wanted to learn trumpet in addition to piano he would not have become so proficient so quickly.

If you go by the rule of thumb that it takes about 10 years to master anything, this professional pianist was 80% of the way to becoming a professional trumpeter before he touched a trumpet. Or to put it another way, 80% of being a professional musician who plays trumpet is becoming a professional musician.

Transferable skills are more difficult to acquire and more valuable than the context in which they’re exercised.

Related posts:

Sinc and Jinc sums

In the previous post, we looked at an elegant equation involving integrals of the sinc function and computed the corresponding integrals for the jinc function.

\int_{-\infty}^\infty \mbox{sinc}(x) \, dx = \int_{-\infty}^\infty \mbox{sinc}^2(x) \, dx = \pi

It turns out the analogous equation holds for sums as well:

\sum_{n=-\infty}^\infty \mbox{sinc}(n) = \sum_{n=-\infty}^\infty \mbox{sinc}^2(n) = \pi

As before, we’d like to compute these two sums and see whether we can compute the corresponding sums for the jinc function.

The Poisson summation formula says that a function and its Fourier transform produce the same sums over the integers:

\sum_{n=-\infty}^\infty f(n) = \sum_{n=-\infty}^\infty \hat{f}(n)

Recall from the previous post that the Fourier transform of sinc is the function π box(π x) where the box function is 1 on [-1/2, 1/2] and zero elsewhere. The only integer n with πn inside [-1/2, 1/2] is 0, so the sum of sinc(n) over the integers equals π. A very similar argument shows that the sum of jinc(n) over the integers equals its Fourier transform at 0, which equals 2.

Let tri(x) be the triangle function, defined to be 1 – |x| for -1 < x < 1 and 0 otherwise. Then the Fourier transform of tri(x) is sinc2(π ω) and so π tri(π x) and sinc2 are Fourier transform pairs. The Poisson summation formula says the sum of sinc2 over the integers is the sum of π tri(π x) over the integers, which is π.

I don’t know the Fourier transform of jinc2 and doubt it’s easy to compute. Maybe the sum could be computed more easily without Fourier transforms, e.g. using contour integration.

DSP and time series analysis

Sinc and Jinc integrals

The sinc function is defined by sinc(x) = sin(x)/x. Philip Woodward introduced the name of the function in 1952, saying it “occurs so often in Fourier analysis and its applications that it does seem to merit some notation of its own.”

Here’s an elegant equation involving the integrals of the sinc function:

\int_{-\infty}^\infty \mbox{sinc}(x) \, dx = \int_{-\infty}^\infty \mbox{sinc}^2(x) \, dx = \pi

When I ran across this recently I wondered two things: How hard is it to compute these two integrals? What are the corresponding results for the jinc function? The jinc function is analogous to sinc, but using a Bessel function in place of sine: jinc(x) = J1(x)/x.

The Fourier transform of the box function, the function box(x) that is 1 on the interval [-1/2, 1/2] and zero everywhere else, is sinc(π ω). (That’s one of the reasons sinc comes up so often in Fourier analysis, as Woodward observed.) So the Fourier transform of sinc(x) is π box(π x). The integral of a function is the value of its Fourier transform at zero, so sinc integrates to π. [1]

By Plancherel’s theorem, the integral of sinc2(x) is the integral of it’s Fourier transform squared, which equals π.

[There are several conventions for defining the Fourier transform. Here I’m using what I call the (-1, τ, 1) definition in these notes. See that page for other conventions and how to convert between them.]

Now for the jinc function. It also has a simple Fourier transform: f(ω) = 2 √(1 – (2πω)) for |x| < 1/2π and zero otherwise. As above, we can compute the integral of jinc over the real line by evaluating its Fourier transform at 0, which equals 2.

Also as above, the integral of jinc2 is the integral of its Fourier transform squared, which works out to 8/3π.

Update: See the next post for the analogous relations for sums.

Related posts:

* * *

[1] You may have a couple objections to this calculation. I found the Fourier transform of the box function was sinc, then concluded that the transform of sinc is the box function. But applying the Fourier transform twice doesn’t give you the original function back, right? When you transform f(x) twice you get  f(-x), but the functions involved here are even, so  f(-x) =  f(x).

OK, but you may still have another objection: the sinc function does not have bounded L1 norm, so you can’t just take it’s Fourier transform. True, but you can justify the transform in terms of L2 theory or distribution theory.

Digital signal processing and time series