# Fourier, Gauss, and Heisenberg

Several weeks ago I wrote about the Fourier uncertainty principle which gives a lower bound on the product of the variance of a function f and the variance of its Fourier transform. This post expands on the earlier post by quoting some results from a recent paper .

## Gaussian density

The earlier post said that the inequality in the Fourier uncertainty principle is exact when f is proportional to a Gaussian probability density. G. H. Hardy proved this result in 1933 in the form of the following theorem.

Let f be a square-integrable function on the real line and assume f and its Fourier transform satisfy the following bounds for some constant C. Then if ab > 1/4, then f = 0. And if ab = 1/4, f(x) = c exp(-ax²) for some constant c.

Let’s translate this into probability terms by setting Now Hardy’s theorem says that if f is bounded by a multiple of a Gaussian density with variance σ² and its Fourier transform is bounded by a multiple of a Gaussian density with variance τ², then the product of the two variances is no greater than 1. And if the product of the variances equals 1, then f is a multiple of a Gaussian density with variance σ².

## Heisenberg uncertainty

Theorem 3 in  says that if u(tx) is a solution to the free Schrödinger’s equation then u at different points in time satisfies a theorem similar to Hardy’s theorem. In fact, the authors show that this theorem is equivalent to Hardy’s theorem.

Specifically, if u is a sufficiently smooth solution and then αβ > (4T)-2 implies u(t, x) = 0, and αβ = (4T)-2 implies ## Related posts

 Aingeru Fernández-Bertolin and Eugenia Malinnikova. Dynamical versions of Hardy’s uncertainty principle: A survey. Bulletin of the American Mathematical Society. DOI: https://doi.org/10.1090/bull/1729

# Computing Fourier series coefficients with the FFT

The Discrete Fourier Transform (DFT) is a mathematical function, and the Fast Fourier Transform (FFT) is an algorithm for computing that function. Since the DFT is almost always computed via the FFT, the distinction between the two is sometimes lost.

It is often not necessary to distinguish between the two. In my previous post, I used the two terms interchangeably because the distinction didn’t matter in that context.

Here I will make a distinction between the DFT and the FFT; I’ll use DFT to refer to the DFT as it is defined in , and FFT for the DFT as computed in NumPy‘s FFT routine. The differences are due to varying conventions; this is often an issue.

Suppose we have a function f on an interval [-A/2, A/2] and we sample f at N points where N is an even integer.

Define for n running from –N/2 + 1 to N/2.

DFT of the sequence {fn} is defined in  as the sequence {Fk} where Now suppose the function f that we sampled has a Fourier series The Fourier coefficients ck relate to the DFT output Fk according to the Discrete Poisson Summation Formula : This means that the ck simply equal the Fk if f has no frequency components higher than N/2 Hz because all the terms in the infinite sum above are zero. That is, if f is band-limited and we have sampled at a frequency higher than the Nyquist frequency, then we can simply read the Fourier coefficients off the DFT.

However, when f is not band-limited, or when it is band-limited but we have not sampled it finely enough, we will have aliasing. Our Fourier coefficients will differ from our DFT output by an error term involving high-frequency Fourier series coefficients.

In application it’s usually too much to hope that your function is exactly band-limited, but it may be approximately band-limited. The Fourier coefficients of smooth functions eventually decay rapidly (see details here) and so the error in approximating Fourier series coefficients by DFT terms is small.

## Computing a DFT with the FFT

We defined the DFT of the sequence {fn} above to be the sequence {Fk} where and k runs from –N/2 + 1 to N/2.

NumPy, on the other hand, defines the DFT of the sequence {an} to be the sequence {Ak} where and k runs from 0 to N-1.

Relative to the definition in the previous post, the NumPy definition difference in three ways:

1. The normalization factor 1/N is missing.
2. The input indices are numbered differently.
3. The output is arranged differently.

The first difference is trivial to overcome: simply divide the FFT output by N.

The second difference is easy to deal with if we think of the inputs to the FFT being samples from a periodic function, which they usually are. The fk come from sampling a periodic function f over an interval [-A/2, A/2]. If we sample the same function over [0, A] we get the an. We have If we extend the fs and the as periodically past their original ranges of definition, then they all agree. But since we start our sampling in difference places, our samples would be listed in different orders if we stored them by increasing index.

Something similar occurs with the output.

For 0 ≤ kN/2,

Fk = Ak.

But for N < k < N,

Fk = ANk.

## Example

We’ll use a band-limited function in our example so that we find the Fourier coefficients exactly.

f(x) = 7 cos(6πx) – 5 sin(4πx)

We compute the FFT as follows.

    import numpy as np

def f(x):
return 7*np.sin(3*2*np.pi*x) - 5*np.cos(2*2*np.pi*x)

N = 8
delta = 1/N
x = [f(n*delta) for n in range(N)]
print( np.fft.fft(x)/N )


The output is

[0, 0, -2.5, -3.5i, 0, 3.5i, -2.5, 0]

This says F2 and F-2 = 5/2, and so the coefficient of cos(2·2πx) is 5.

Also F3 = -7/2 and F-3 = 7/2, so the coefficient of cos(3·2πx) is 7. These results follow from Euler’s equation

exp(iθ) = cos(θ) + i sin(θ)

 William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.

# Support of a signal and its FFT

The previous post looked at the Fourier uncertainty principle. This post looks at an analogous result for the discrete Fourier transform.

The uncertainty principle for the (continuous) Fourier transform says a signal cannot be localized in both the time domain and the frequency domain. The more bunched up a function is, the more spread out its transform is.

The discrete analog of the Fourier uncertainty principle given in  says something similar, that a signal and its discrete Fourier transform cannot both be concentrated. The difference is in how concentration is measured.

In the previous post, concentration was measured in terms of variance, very similar to how variance is defined for random variables. But in the discrete case, we do something simpler: we count how many terms are non-zero.

The support of a function is the set of points where the function is not zero. A highly concentrated function is one with small support. The discrete Fourier uncertainty principle says that a non-zero signal and its Fourier transform cannot both have small support.

Specifically, let f be a sequence of length N and let F be its discrete Fourier transform. Then

| support of f | × | support of F | ≥ N.

where | S | is the number of elements in a set S.

As with the continuous case, we have a lower bound on the product of the uncertainty of something and its Fourier transform. The discrete form is simpler in that the lower bound does not depend on one’s convention for defining the Fourier transform. As with the continuous case, there are many conventions for defining the DFT. But these variations in convention don’t effect counting zeros.

## Examples

As pointed out in , you can construct examples where the inequality above holds exactly. These sequences are evenly spaced constants. If N = pq then the FFT of a sequence of 1s every p places will have non-zero elements every q places.

For example,

    >>> import numpy as np
>>> np.fft.fft([1,0,0,1,0,0])
array([2, 0, 2, 0, 2, 0])


In this instance, the original sequence has support of size 2, its transform has support of size 3, and the product of the support sizes is their length 6.

Note that multiplying a Python list (not a NumPy array) by an integer concatenates the list to itself that many times. The example above could be read as saying the FFT of

    [1, 0, 0]*2

is

    [2, 0]*3

.

As another example, we can show that the FFT of

    [1, 0, 0, 0, 0]*3

is

    [3, 0, 0]*5

.

The code below is complicated a bit by NumPy bookkeeping, but essentially it shows that the FFT of [1,0,0,0,0]*3 is [3,0,0]*5.

    >>> x = np.fft.fft([1, 0, 0, 0, 0]*3)
>>> y = 3*np.array([1, 0, 0]*5, dtype=complex)
>>> np.all(x == y)
True


The lower bound on the product of support sizes is tight for multiples and rotations of sequences like those above.

For example,

    >>> np.fft.fft([0, 7, 0, 0, 7, 0])
array([14. +0.j        ,  0. +0.j        , -7.-12.12435565j,
0. +0.j        , -7.+12.12435565j,  0. +0.j        ])


Here we took the first example, changed the 1s to 7s, and rotated everything one position. The FFT is a little more complicated in this case, but it’s still the case that the input has support of size 2 and the output has support of size 3.

The only sequences for which the discrete uncertainty principle lower bound is tight are those like we’ve seen above: multiples and rotations of evenly spaced 1s. For any other sequences the inequality above is strict. For example, here’s what happens if our 1’s are not evenly spaced.

    >>> np.fft.fft([1, 0, 0, 0, 1, 0])
array([2. +0.00000000e+00j, 0.5+8.66025404e-01j, 0.5-8.66025404e-01j,
2. -5.55111512e-17j, 0.5+8.66025404e-01j, 0.5-8.66025404e-01j])


Now the input has support 2, the output has support 6.

## Related posts

 David L. Donoho and Philip B. Stark. Uncertainty Principles and Signal Recovery. SIAM Journal on Applied Mathematics, June 1989, Vol. 49, No. 3, pp. 906-931

# Fourier uncertainty principle

Heisenberg’s uncertainty principle says there is a limit to how well you can know both the position and momentum of anything at the same time.  The product of the uncertainties in the two quantities has a lower bound.

There is a closely related principle in Fourier analysis that says a function and its Fourier transform cannot both be localized. The more concentrated a signal is in the time domain, the more spread out it is in the frequency domain.

There are many ways to quantify how localized or spread out a function is, and corresponding uncertainty theorems. This post will look at the form closest to the physical uncertainty principle of Heisenberg, measuring the uncertainty of a function in terms of its variance. The Fourier uncertainty principle gives a lower bound on the product of the variance of a function and the variance of its Fourier transform.

## Variance

When you read “variance” above you might immediately thing of variance as in the variance of a random variable. Variance in Fourier analysis is related to variance in probability, but there’s a twist.

If f is a real-valued function of a real variable, its variance is defined to be This is the variance of a random variable with mean 0 and probability density |f(x)|². The twist alluded to above is that f is not a probability density, but |f(x)|² is.

Since we said f is a real-valued function, we could leave out the absolute value signs and speak of f(x)² being a probability density. In quantum mechanics applications, however, f is complex-valued and |f(x)|² is a probability density. In other words, we multiply f by its complex conjugate, not by itself.

The Fourier variance defined above applies to any f for which the integral converges. It is not limited to the case of |f(x)|² being a probability density, i.e. when |f(x)|² integrates to 1.

## Uncertainty principle

The Fourier uncertainty principle is the inequality where the constant C depends on your convention for defining the Fourier transform . Here ||f||2² is the integral of f², the square of the L² norm.

Perhaps a better way to write the inequality is for non-zero f. Rather than look at the variances per se, we look at the variances relative to the size of f. This form is scale invariant: if we multiply f by a constant, the numerators and denominators are multiplied by that constant squared.

The inequality is exact when f is proportional to a Gaussian probability density. And in this case the uncertainty is easy to interpret. If f is proportional to the density of a normal distribution with standard deviation σ, then its Fourier transform is proportional to the density of a normal distribution with standard deviation 1/σ, if you use the radian convention described in .

## Example

We will evaluate both sides of the Fourier uncertainty principle with

    h[x_] := 1/(x^4 + 1)

and its Fourier transform

    g[w_] := FourierTransform[h[x], x, w]

We compute the variances and the squared norms with

    v0 = Integrate[x^2 h[x]^2, {x, -Infinity, Infinity}]
v1 = Integrate[w^2 g[w]^2, {w, -Infinity, Infinity}]
n0 = Integrate[    h[x]^2, {x, -Infinity, Infinity}]
n1 = Integrate[    g[w]^2, {w, -Infinity, Infinity}]


The results in mathematical notation are From here we can calculate that the ratio of the left side to the right side of the uncertainty principle is 5/18, which is larger than the lower bound C = 1/4.

By the way, it’s not a coincidence that h and its Fourier transform have the same norm. That’s always the case. Or rather, that is always the case with the Fourier convention we are using here. In general, the L² norm of the Fourier transform is proportional to the L² norm of the function, where the proportionality constant depends on your definition of Fourier transform but not on the function.

Here is a page that lists the basic theorems of Fourier transforms under a variety of conventions.

## Related posts

 In the notation of the previous post C = 1/(4b²). That is, C = 1/4 if your definition has a term exp(± i ω t) and C = 1/16π² if your definition has a exp(±2 π i ω t) term.

Said another way, if you express frequency in radians, as is common in pure math, C = 1/4. But if you express frequency in Hertz, as is common in signal processing, C = 1/16π².

Fourier transform definitions also have varying constants outside the integral, e.g. possibly dividing by √2π, but this factor effects both sides of the Fourier uncertainty principle equally.

# Fourier transforms in Mathematica

Unfortunately there are many slightly different ways to define the Fourier transform. So the first two questions when using Mathematica (or any other software) to compute Fourier transforms are what definition of Fourier transform does it use, and what to do if you want to use a different definition.

The answer to the first question is that Mathematica defines the Fourier transform of f as The answer to the second question is that Mathematica defines a parameterized Fourier transform by where a defaults to 0 and b defaults to 1.

## Examples

For example, if φ(x) = exp(-x²/2), then we can compute Mathematica’s default Fourier transform with

    φ[x_] := Exp[-x^2/2]
FourierTransform[φ[x], x, ω]


This computes But if we set (a, b) to (0, 2π) with

    FourierTransform[φ[x], x, ω, FourierParameters -> {0, 2 Pi}]

we compute ## Theorems under various conventions

Several years ago I got frustrated enough with the multitude of Fourier transform conventions that I made a sort of Rosetta stone, giving several of the most basic theorems under each of eight definitions. I used the parameterization where m is either 1 or 2π, σ is either 1 or -1, and q is either 1 or 2π.

My σ and q are the sign and absolute value of Mathematica’s b. The relation between my m and Mathematica’s a is slightly more complicated and given in the table below. # Zeros of trigonometric polynomials

A periodic function has at least as many real zeros as its lowest frequency Fourier component. In more detail, the Sturm-Hurwitz theorem says that has at least 2n zeros in the interval [0, 2π) if an and bn are not both zero. You could take N to be infinity if you’d like.

Note that the lowest frequency term can be written as for some amplitude c and phase φ as explained here. This function clearly has 2n zeros in each period. The remarkable thing about the Sturm-Hurwitz theorem is that adding higher frequency components can increase the number of zeros, but it cannot decrease the number of zeros.

To illustrate this theorem, we’ll look at a couple random trigonometric polynomials with n = 5 and N = 9 and see how many zeros they have. Theory says they should have at least 10 zeros.

The first has 16 zeros: And the second has 12 zeros: (It’s difficult to see just how many zeros there are in the plots above, but if we zoom in by limiting the vertical axis we can see the zeros more easily. For example, we can see that the second plot does not have a zero between 4 and 5; it almost reaches up to the x-axis but doesn’t quite make it.)

Here’s the code that made these plots.

    import matplotlib.pyplot as plt
import numpy as np

n = 5
N = 9
np.random.seed(20210114)

for p in range(2):
a = np.random.random(size=N+1)
b = np.random.random(size=N+1)

x = np.linspace(0, 2*np.pi, 200)
y = np.zeros_like(x)
for k in range(n, N+1):
y += a[k]*np.sin(k*x) + b[k]*np.cos(k*x)
plt.plot(x, y)
plt.grid()
plt.savefig(f"sturm{p}.png")
plt.ylim([-0.1, 0.1])
plt.savefig(f"sturm{p}zoom.png")
plt.close()


# Cesàro summation

There’s more than one way to sum an infinite series. Cesàro summation lets you compute the sum of series that don’t have a sum in the classical sense.

Suppose we have an infinite series The nth partial sum of the series is given by The classical sum of the series, if it exists, is defined to be the limit of its partial sums. That is, Cesàro summation takes a different approach. Instead of taking the limit of the partial sums, it takes the limit of the averages of the partial sums. To be specific, define and define the Cesàro summation to be the limit of the Cn as n goes to infinity. If a series has a sum in the classical sense, it also has a sum in the Cesàro sense, and the limits are the same. But some series have a Cesàro sum that do not have a classical sum. Or maybe both limits exist but the intermediate steps of Cesàro summation are better behaved, as we’ll see in an example below.

If you express the Cn in terms of the original an terms you get In other words, the nth Cesàro partial sum is a reweighting of the classical partial sums, with the weights changing as a function of n. Note that for fixed i, the fraction multiplying ai goes to 1 as n increases.

## Fejér summation and Gibbs phenomenon

Fejér summation
is Cesàro summation applied to Fourier series. The (ordinary) partial sums of a Fourier series give the best approximation to a function as measured by least squares norm. But the Cesàro partial sums may be qualitatively more like the function being approximated. We demonstrate this below with a square wave. The 30th ordinary partial sum shows the beginnings of Gibbs phenomenon, the “bat ears” at the top of the square wave and their mirror image at the bottom. The 30th Cesàro partial sum is smoother and eliminates Gibbs phenomena near the discontinuity in the square wave.

# Gibbs phenomenon

I realized recently that I’ve written about generalized Gibbs phenomenon, but I haven’t written about its original context of Fourier series. This post will rectify that.

The image below comes from a previous post illustrating Gibbs phenomenon for a Chebyshev approximation to a step function. Although Gibbs phenomena comes up in many different kinds of approximation, it was first observed in Fourier series, and not by Gibbs . This post will concentrate on Fourier series, and will give an example to correct some wrong conclusions one might draw about Gibbs phenomenon from the most commonly given examples.

The uniform limit of continuous function is continuous, and so the Fourier series of a function cannot converge uniformly where the function is discontinuous. But what does the Fourier series do near a discontinuity?

It’s easier to say what the Fourier series does exactly at a discontinuity. If a function is piecewise continuous, then the Fourier series at a jump discontinuity converges to the average of the limits from the left and from the right at that point.

What the Fourier series does on either side of the discontinuity is more interesting. You can see high-frequency oscillations on either side. The series will overshoot on the high side of the jump and undershoot on the low side of the jump.

The amount of overshoot and undershoot is proportional to the size of the gap, about 9% of the gap. The exact proportion, in the limit, is given by the Wilbraham-Gibbs constant Gibbs phenomenon is usually demonstrated with examples that have a single discontinuity at the end of their period, such as a square wave or a saw tooth wave. But Gibbs phenomenon occurs at every discontinuity, wherever located, no matter how many there are.

The following example illustrates everything we’ve talked about above. We start with the function f plotted below on [-π, π] and imagine it extended periodically. 1. It is continuous at the point where it repeats since it equals 0 at -π and π.
2. It has two discontinuities inside [-π, π].
3. One of the discontinuities is larger than the other.

The following plot shows the sum of the first 100 terms in the Fourier series for f plotted over [-2π, 2π]. 1. There nothing remarkable about the series at -π and π.
2. You can see Gibbs phenomenon at the discontinuities of f.
3. The overshoot and undershoot are larger at the larger discontinuity.

Related to the first point above, note that the derivative of f is discontinuous at the period boundary. A discontinuity in the derivative does not cause Gibbs phenomena.

Here’s a close-up plot that shows the wiggling near the discontinuities. ## Gibbs phenomena for other series

 Henry Wilbraham first described what Josiah Gibbs discovered independently 50 years later, what we now call Gibbs phenomenon. This is an example of Stigler’s law of eponymy.

# Square wave, triangle wave, and rate of convergence

There are only a few examples of Fourier series that are relatively easy to compute by hand, and so these examples are used repeatedly in introductions to Fourier series. Any introduction is likely to include a square wave or a triangle wave .

By square wave we mean the function that is 1 on [0, 1/2] and -1 on [1/2, 1], extended to be periodic. Its nth Fourier (sine) coefficient is 4/nπ for odd n and 0 otherwise.

By triangle wave we mean 1 – |2x – 1|, also extended to be periodic. Its nth Fourier (cosine) coefficient is -4/(nπ)² for odd n and 0 otherwise.

This implies the Fourier coefficients for the square wave are O(1/n) and the Fourier coefficients for the triangle wave are O(1/n²). (If you’re unfamiliar with big-O notation, see these notes.)

In general, the smoother a function is, the faster its Fourier coefficients approach zero. For example, we have the following two theorems.

1. If a function f has k continuous derivatives, then the Fourier coefficients are O(1/nk).
2. If the Fourier coefficients of f are O(1/nk+1+ε) for some ε > 0 then f has k continuous derivatives.

There is a gap between the two theorems so they’re not converses.

If a function is continuous but not differentiable, the Fourier coefficients cannot decay any faster than 1/n², so the Fourier coefficients for the triangle wave decay as fast as they can for a non-differentiable function.

## More post on rate of convergence

 The third canonical example is the saw tooth wave, the function f(x) = x copied repeatedly to make a periodic function. The function rises then falls off a cliff to start over. This discontinuity makes it like the square wave: its Fourier coefficients are O(1/n).

# Clipped sine waves

One source of distortion in electronic music is clipping. The highest and lowest portions of a wave form are truncated due to limitations of equipment. As the gain is increased, the sound doesn’t simply get louder but also becomes more distorted as more of the signal is clipped off.

## Clipping 0.2

For example, here is what a sine wave looks like when clipped 20%, i.e. cut off to be between -0.8 and 0.8. A simple sine wave has only one Fourier component, itself. But when we clip the sine wave, we move energy into higher frequency components. We can see that in the Fourier components below. You can show by symmetry that the even-numbered coefficients are exactly zero.

## Clipping 0.6

Here are the corresponding plots for 60% clipping, i.e. the absolute value of the signal is cut off to be 0.4. First the signal and then its Fourier components. Here are the first five sine waves with the amplitudes given by the Fourier coefficients. And here we see how the of the sines above do a pretty good job of reconstructing the original clipped sine. We’d need an infinite number of Fourier components to exactly reconstruct the original signal, but the first five components do most of the work. ## Continuous range of clipping

Next let’s look at the ratio of the energy in the 3rd component to that of the 1st component as we continuously vary the amount of clipping. Now for the 5th harmonic. This one is interesting because it’s not strictly increasing but rather has a little bump before it starts increasing. Finally, here’s the ratio of the energy in all higher frequencies to the energy in the fundamental. 