Circulant matrices, eigenvectors, and the FFT

A circulant matrix is a square matrix in which each row is a rotation of the previous row. This post will illustrate a connection between circulant matrices and the FFT (Fast Fourier Transform).

Circulant matrices

Color in the first row however you want. Then move the last element to the front to make the next row. Repeat this process until the matrix is full.

The NumPy function roll will do this rotation for us. Its first argument is the row to rotate, and the second argument is the number of rotations to do. So the following Python code generates a random circulant matrix of size N.

    import numpy as np

    np.random.seed(20230512)
    N = 5
    row = np.random.random(N)
    M = np.array([np.roll(row, i) for i in range(N)])

Here’s the matrix M with entries truncated to 3 decimal places to make it easier to read.

    [[0.556 0.440 0.083 0.460 0.909]
     [0.909 0.556 0.440 0.083 0.460]
     [0.460 0.909 0.556 0.440 0.083]
     [0.083 0.460 0.909 0.556 0.440]
     [0.440 0.083 0.460 0.909 0.556]]    

Fast Fourier Transform

The Fast Fourier Transform is a linear transformation and so it can be represented by a matrix [1]. This is the N by N matrix whose (j, k) entry is ω^(jk) where ω = exp(-2πi/N), with j and k running from 0 to N – 1. [2]

Each element of the FFT matrix corresponds to a rotation, so you could visualize the matrix using clocks in each entry or by a cycle of colors. A few years ago I created a visualization using both clock faces and colors.

Eigenvectors and Python code

Here’s a surprising property of circulant matrices: the eigenvectors of a circulant matrix depend only on the size of the matrix, not on the elements of the matrix. Furthermore, these eigenvectors are the columns of the FFT matrix. The eigenvalues depend on the matrix entries, but the eigenvectors do not.

Said another way, when you multiply a circulant matrix by a column of the FFT matrix of the same size, this column will be stretched but not rotated. The amount of stretching depends on the particular circulant matrix.

Here is Python code to illustrate this.

    ω = np.exp(-2j*np.pi/N)
    for i in range(N):
        # the ith column of the FFT matrix
        col1 = np.array([ω**(i*j) for j in range(N)])
        # its image under the circulant matrix M
        col2 = np.matmul(M, col1)
        # constant vector: the eigenvalue associated with col1
        print(col2/col1)

In this code col1 is a column of the FFT matrix, and col2 is the image of that column when multiplied by the circulant matrix M. The print statement shows that the ratio col2/col1 is the same in every position. This common ratio is the eigenvalue associated with each eigenvector. If you were to generate a new random circulant matrix, the ratios would change, but the input and output vectors would still be proportional.
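In fact, for the matrix M constructed above, the eigenvalues work out to be the FFT of the first row: the ith component of np.fft.fft(row) is the stretching factor for the ith FFT column. Here’s a quick numerical check of that claim.

    import numpy as np

    np.random.seed(20230512)
    N = 5
    row = np.random.random(N)
    M = np.array([np.roll(row, i) for i in range(N)])

    # The eigenvalues of M, one per FFT column, are the FFT of the first row.
    ω = np.exp(-2j*np.pi/N)
    evals = np.fft.fft(row)
    for i in range(N):
        col = np.array([ω**(i*j) for j in range(N)])
        print(np.allclose(np.matmul(M, col), evals[i]*col))   # True for every i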

Notes

[1] Technically this is the discrete Fourier transform (DFT). The FFT is an algorithm for computing the DFT. Because the DFT is always computed using the FFT, the transformation itself is usually referred to as the FFT.

[2] Conventions vary, so you may see the FFT matrix written differently.

Fixed points of the Fourier transform

This previous post looked at the hyperbolic secant distribution. This distribution has density

f_H(x) = \frac{1}{2} \text{sech} \left(\frac{\pi x}{2} \right)

and characteristic function sech(t). It’s curious that the density and characteristic function are so similar.

The characteristic function is essentially the Fourier transform of the density function, so this says that the hyperbolic secant function, properly scaled, is a fixed point of the Fourier transform. I’ve long known that the normal density is its own Fourier transform, but only recently learned that the same is true of the hyperbolic secant.

Hermite functions

The Hermite functions are also fixed points of the Fourier transform, or rather eigenfunctions of the Fourier transform. The eigenvalues are 1, i, -1, and -i. When the eigenvalue is 1, we have fixed points.

There are two conventions for defining the Hermite functions, and multiple conventions for defining the Fourier transform, so the truth of the preceding paragraph depends on the conventions used.

For this post, we will define the Fourier transform of a function f to be

\hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \exp(-i \omega x)\, f(x)\, dx

Then the Fourier transform of exp(-x²/2) is the same function. Since the Fourier transform is linear, this means the same holds for the density of the standard normal distribution.

We will define the Hermite polynomials by

H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}e^{-x^2}

using the so-called physics convention. Hn is an nth degree polynomial.

The Hermite functions ψn(x) are the Hermite polynomials multiplied by exp(-x²/2). That is,

\psi_n(x) = H_n(x) \exp(-x^2/2)

With these definitions, the Fourier transform of ψn(x) equals (-i)^n ψn(x). So when n is a multiple of 4, the Fourier transform of ψn(x) is ψn(x).

[The definition of the Hermite functions above omits a complicated normalization constant that depends on n but not on x. So our Hermite functions are proportional to the standard Hermite functions. But proportionality constants don’t matter when you’re looking for eigenfunctions or fixed points.]

Hyperbolic secant

Using the definition of Fourier transform above, the function sech(√(π/2) x) is its own Fourier transform.

This is surprising because the Hermite functions form a basis for L²(ℝ), and all have tails on the order of exp(-x²), but the hyperbolic secant has tails like exp(-x). Each Hermite function eventually decays like exp(-x²), but this happens later as n increases, so an infinite sum of Hermite functions can have thicker tails than any particular Hermite function.
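Here’s a quick numerical check of the claim that sech(√(π/2) x) is its own Fourier transform under the convention above. The integral is approximated by a simple Riemann sum; the integration limits and the test points below are arbitrary choices.

    import numpy as np

    def ft(f, w, L=40.0, n=200001):
        # approximate (1/sqrt(2 pi)) * integral of exp(-i w x) f(x) dx by a Riemann sum on [-L, L]
        x = np.linspace(-L, L, n)
        dx = x[1] - x[0]
        return np.sum(np.exp(-1j*w*x)*f(x))*dx / np.sqrt(2*np.pi)

    c = np.sqrt(np.pi/2)
    sech = lambda x: 1/np.cosh(c*x)   # sech(sqrt(pi/2) x)

    for w in [0.0, 0.7, 1.5, 3.0]:
        print(w, ft(sech, w).real, sech(w))   # the last two columns agree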


Hilbert transform and Fourier series

A few days ago I wrote about the Hilbert transform and gave as an example that the Hilbert transform of sine is cosine. We’ll bootstrap that example to find the Hilbert transform of any periodic function from its Fourier series.

The Hilbert transform of a function f(t) is a function fH(x) defined by

f_H(x) = \frac{1}{\pi} \int_{-\infty}^\infty \frac{f(t)}{t - x}\, dt

where the integral is interpreted in the sense of the Cauchy principal value, the limit as the singularity is approached symmetrically from both sides.

The Hilbert transform shifts and scales conveniently. Shifting a function by any amount h shifts its transform by the same amount. And scaling a function by any amount k > 0 scales its transform the same way. That is, we have the following transform pairs.

\begin{align*} f(t) &\leftrightarrow f_H(x) \\ f(t - h) &\leftrightarrow f_H(x - h) \\ f(kt) &\leftrightarrow f_H(kx) \\ \end{align*}

Now since the Hilbert transform of sin(t) is cos(x), the Hilbert transform of sin(t + π/2) must be cos(x + π/2). But sin(t + π/2) is cos(t), and cos(x + π/2) is sin(x), so the Hilbert transform of cos(t) is -sin(x). In this case, the Hilbert transform has the same pattern as differentiation.

Now if ω > 0 the scaling rule tells us the Hilbert transform of sin(ωt) is cos(ωx) and the Hilbert transform of cos(ωt) is -sin(ωx). Here the analogy with differentiation breaks down because differentiation would bring out a factor of ω from the chain rule [1].

Putting these facts together, if we have a function f written in terms of a Fourier series

f(t) = \sum_{n=1}^\infty \left\{ a_n \sin(nt) + b_n\cos(nt) \right\}

then its Hilbert transform is

f_H(x) = \sum_{n=1}^\infty \left\{ -b_n \sin(nx) + a_n\cos(nx) \right\}

In other words, we replace the b‘s with a‘s and the a‘s with –b‘s. [2]

Notice that there’s no b0 term above. In signal processing terminology, there’s no DC offset. In general a Fourier series has a constant term, and the Hilbert transform of a constant is 0. So again like differentiation, constants go away.

If there is no DC offset, then applying the Hilbert transform to f twice gives –f. If there is a DC offset, applying the Hilbert transform to f twice gives –f with the DC offset removed.
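Here is a minimal sketch of the coefficient swap, representing a truncated Fourier series by arrays of a and b coefficients. Applying the swap twice negates both arrays, illustrating the claim that transforming twice gives –f.

    import numpy as np

    def hilbert_coeffs(a, b):
        # For f(t) = sum of a_n sin(nt) + b_n cos(nt), n = 1, 2, ...,
        # the Hilbert transform replaces a_n -> -b_n and b_n -> a_n.
        return -np.asarray(b), np.asarray(a)

    a = np.array([1.0, 0.0, 0.5])    # sine coefficients a_1, a_2, a_3
    b = np.array([0.0, 2.0, 0.0])    # cosine coefficients b_1, b_2, b_3

    aH, bH = hilbert_coeffs(a, b)
    print(aH, bH)                     # coefficients of the Hilbert transform
    print(hilbert_coeffs(aH, bH))     # (-a, -b): applying the transform twice gives -f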

Opposite sign convention

Unfortunately there are two definitions of the Hilbert transform in common use: the one at the top of this post and its negative. What changes if we use the other convention?

We noted above that applying the Hilbert transform to f twice gives –f. This means that the inverse transform is the negative transform [3]. In symbols, if H is the Hilbert transform operator, then H²f = –f and so H⁻¹ = –H. So the disagreement over whether to include a negative sign in the definition of the Hilbert transform amounts to a disagreement over which to call the forward transform and which to call the inverse transform.

The shifting and scaling rules apply to both definitions of the transform. But with the opposite sign convention, the Hilbert transform of sine is negative cosine and the Hilbert transform of cosine is sine. So our bottom line becomes “replace the a‘s with b‘s and the b‘s with –a‘s” in the Fourier series.

Footnotes

[1] Incidentally, the Hilbert transform commutes with differentiation. That is, the transform of the derivative is the derivative of the transform.

[2] This is an example of parallel replacement. We replace an with –bn and bn with an at the same time.

[3] For signals with no DC offset. Otherwise the Hilbert transform is not invertible.

Reverse engineering Fourier conventions

The most annoying thing about Fourier analysis is that there are many slightly different definitions of the Fourier transform. One time I got sufficiently annoyed that I created a sort of Rosetta Stone of Fourier theorems under eight different conventions. Later I discovered that Mathematica supports these same eight definitions, but with slightly different notation.

When I created my Rosetta Stone I wanted to have a set of notes that answered the question “What are the basic Fourier theorems under this convention?” Recently I was reading a reference and wanted to answer the opposite question “Given the theorems this book is stating, what convention must they be using?”

The eight definitions correspond to

\hat{f}(\omega) = \frac{1}{\sqrt{m}} \int_{-\infty}^\infty \exp(\sigma q i \omega x) f(x) \,dx

where m is either 1 or 2π, σ is +1 or -1, and q is 2π or 1.

I’m posting these notes for my future reference and for anyone else who may need to do the same sleuthing.

Notation

For the rest of the post, let F and G be the Fourier transforms of f and g respectively. We write

f(x) \leftrightarrow F(\omega)

for the pair of a function and its Fourier transform.

Define the inner product of f and g as

\langle f, g \rangle = \int_{-\infty}^\infty f(x)\, g(x) \, dx

if f and g are real-valued. If the functions are complex-valued, replace g with the complex conjugate of g.

We will sometimes denote the Fourier transform of a function by putting a hat on top of it.

Convolution theorem

The convolution theorem gives a quick way to determine the parameter m. Suppose convolution is defined by

(f*g)(x) = \int_{-\infty}^\infty f(x-y)\, g(y) \, dy

Then

f(x)*g(x) \leftrightarrow \sqrt{m}\, F(\omega) G(\omega)

and so you can find m immediately. If the transform of f*g is simply the product FG with no extra factor out front, m = 1. Otherwise if there’s a factor of √(2π) out front, then m = 2π. If there’s any other factor, you’ve got an arcane definition of Fourier transform that isn’t one of the eight considered here.

Some authors, like Walter Rudin, include a scaling factor in the definition of convolution, in which case the argument of this section doesn’t hold.

Parseval and Plancherel

Parseval’s theorem says that the inner product of f and g is proportional to the inner product of F and G. The proportionality constant depends on the definition of Fourier transform, specifically on m and q, and so you can determine m or q from the form of Parseval’s theorem.

\langle f, g \rangle = k \langle F, G \rangle

If k = 1, then either q = 2π and m = 1 or q = 1 and m = 2π. If you know m, say from the statement of the convolution theorem, then Parseval’s theorem tells you q.

Plancherel’s theorem is the special case of Parseval’s theorem with f = g. It can be used the same way to solve for m or q.

If k = 2π, then q = m = 2π. If k = 1/2π, then q = m = 1.

Double transform

Theorems about taking the Fourier transform twice carry the same information as Parseval’s and Plancherel’s theorems, i.e. they also let you determine m or q.

We have

\hat{\hat{f}}(x) = k f(-x)

where this k is the reciprocal of the k in Parseval’s theorem. The possibilities are:

  • If k = 1, then either q = 2π and m = 1 or q = 1 and m = 2π.
  • If k = 2π, then q = m = 1.
  • If k = 1/2π, then q = m = 2π.

Shift and differentiation

So far none of our theorems have allowed us to reverse engineer σ. Either the shift theorem or the differentiation theorem will let us find σq.

The shift theorem says

f(x-h) \leftrightarrow \exp(ikh\omega) F(\omega)

where k = σq. Since σ = ±1 and q = 1 or 2π, the product σq determines both σ and q.

Similarly, the differentiation theorem says the derivative of f transforms as

f'(x) \leftrightarrow ik\omega F(\omega)

but this time k = -σq. Since σ = ±1 and q is 1 or 2π, knowing k again determines both σ and q.
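To put all the sleuthing in one place, here is a sketch that tabulates the constant appearing in each theorem for the eight conventions, writing the Parseval and double-transform constants in closed form as mq/2π and 2π/(mq), consistent with the cases listed above. Matching the constants in a book against this table recovers m, σ, and q.

    import numpy as np
    from itertools import product

    for m, sigma, q in product([1, 2*np.pi], [1, -1], [1, 2*np.pi]):
        print(f"m = {m:5.3f}, sigma = {sigma:+d}, q = {q:5.3f} : "
              f"convolution {np.sqrt(m):.3f}, "
              f"Parseval {m*q/(2*np.pi):.3f}, "
              f"double transform {2*np.pi/(m*q):.3f}, "
              f"shift {sigma*q:+.3f}, "
              f"derivative {-sigma*q:+.3f}")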

Chaos in the frequency domain

Solutions to the non-linear differential equation

x″ + 0.25x′ + x(x² – 1) = 0.3 cos t

are chaotic. It’s more common to see plots of chaotic systems in the time domain, so I wanted to write a post looking at the power spectrum in the frequency domain.

The following plot was created by solving the equation above over the time interval [0, 256] at 1024 points, i.e. sampling the solution at 4 Hz. I then took the FFT and multiplied it by its conjugate to get the power spectrum. Then I took the log base 10 and multiplied by 10 to convert to decibels.
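Here is a sketch of that computation with SciPy. The initial conditions below, x(0) = 0 and x′(0) = 0, are my own arbitrary choice since they aren’t specified above; other choices change the details of the spectrum but not its broadband character.

    import numpy as np
    from scipy.integrate import solve_ivp
    import matplotlib.pyplot as plt

    def rhs(t, y):
        x, v = y
        return [v, 0.3*np.cos(t) - 0.25*v - x*(x**2 - 1)]

    N = 1024
    t = np.linspace(0, 256, N, endpoint=False)   # 4 Hz sampling
    sol = solve_ivp(rhs, [0, 256], [0.0, 0.0], t_eval=t, rtol=1e-8)

    X = np.fft.rfft(sol.y[0])
    power_db = 10*np.log10(np.abs(X)**2)         # |FFT|^2 converted to decibels
    freq = np.fft.rfftfreq(N, d=0.25)

    plt.plot(freq, power_db)
    plt.xlabel("frequency (Hz)")
    plt.ylabel("power (dB)")
    plt.show()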

By contrast, if we look at the linear equation

x″ + 0.25x′ + x = 0.3 cos t

and compute the power spectrum, we get

As is often the case, a small change to the form of a differential equation made a huge change in its behavior.

There’s a spike at 1/(2π) Hz because the steady-state solution is

x(t) = 1.2 sin(t).

The power spectrum is more than just a spike because there is also an exponentially decaying transient component to the solution.

For more on steady-state and transient components of the solution, see Damped, driven oscillations.


Fourier, Gauss, and Heisenberg

Several weeks ago I wrote about the Fourier uncertainty principle which gives a lower bound on the product of the variance of a function f and the variance of its Fourier transform. This post expands on the earlier post by quoting some results from a recent paper [1].

Gaussian density

The earlier post said that the inequality in the Fourier uncertainty principle is exact when f is proportional to a Gaussian probability density. G. H. Hardy proved this result in 1933 in the form of the following theorem.

Let f be a square-integrable function on the real line and assume f and its Fourier transform satisfy the following bounds

\begin{align*} |f(x)| \leq& \,C \exp(-a|x|^2) \\ |\hat{f}(\xi)| \leq& \,C \exp(-b|\xi|^2\,) \\ \end{align*}

for some constant C. If ab > 1/4, then f = 0. And if ab = 1/4, then f(x) = c exp(-ax²) for some constant c.

Let’s translate this into probability terms by setting

\begin{align*} a =& \,\frac{1}{2\sigma^2} \\ b =& \,\frac{1}{2\tau^2} \end{align*}

Now Hardy’s theorem says that if f is bounded by a multiple of a Gaussian density with variance σ² and its Fourier transform is bounded by a multiple of a Gaussian density with variance τ², then the product of the two variances is at least 1: with the substitution above, ab = 1/(4σ²τ²), and Hardy’s theorem says ab can be at most 1/4 when f is not identically zero. And if the product of the variances equals 1, then f is a multiple of a Gaussian density with variance σ².

Heisenberg uncertainty

Theorem 3 in [1] says that if u(t, x) is a solution to the free Schrödinger equation

\partial_t u = i \Delta u

then u at different points in time satisfies a theorem similar to Hardy’s theorem. In fact, the authors show that this theorem is equivalent to Hardy’s theorem.

Specifically, if u is a sufficiently smooth solution and

\begin{align*} |u(0,x)| \leq& \,C \exp(-\alpha|x|^2) \\ |u(T,x)| \leq& \,C \exp(-\beta|x|^2) \\ \end{align*}

then αβ > (4T)⁻² implies u(t, x) = 0, and αβ = (4T)⁻² implies

u(t,x) = c \exp(-(\alpha + i/(4T))|x|^2)


[1] Aingeru Fernández-Bertolin and Eugenia Malinnikova. Dynamical versions of Hardy’s uncertainty principle: A survey. Bulletin of the American Mathematical Society. DOI: https://doi.org/10.1090/bull/1729

Computing Fourier series coefficients with the FFT

The Discrete Fourier Transform (DFT) is a mathematical function, and the Fast Fourier Transform (FFT) is an algorithm for computing that function. Since the DFT is almost always computed via the FFT, the distinction between the two is sometimes lost.

It is often not necessary to distinguish between the two. In my previous post, I used the two terms interchangeably because the distinction didn’t matter in that context.

Here I will make a distinction between the DFT and the FFT; I’ll use DFT to refer to the DFT as it is defined in [1], and FFT for the DFT as computed in NumPy‘s FFT routine. The differences are due to varying conventions; this is often an issue.

Suppose we have a function f on an interval [-A/2, A/2] and we sample f at N points where N is an even integer.

Define

 \begin{align*} \Delta x &= A/N \\ x_n &= n \Delta x \\ f_n &= f(x_n) \end{align*}

for n running from –N/2 + 1 to N/2.

The DFT of the sequence {fn} is defined in [1] as the sequence {Fk} where

F_k = \frac{1}{N}\sum_{n = -N/2 + 1}^{N/2} f_n \exp(-2 \pi i n k/N)

Now suppose the function f that we sampled has a Fourier series

f(x) = \sum_{k = -\infty}^\infty c_k \exp(2 \pi i k x/A)

The Fourier coefficients ck relate to the DFT output Fk according to the Discrete Poisson Summation Formula [1]:

F_k = c_k + \sum_{j=1}^\infty \left(c_{k+jN} + c_{k-jN}\right)

This means that the ck simply equal the Fk if f has no frequency components above the Nyquist frequency of our sampling, because then all the terms in the infinite sum above are zero. That is, if f is band-limited and we have sampled above the Nyquist rate, then we can simply read the Fourier coefficients off the DFT.

However, when f is not band-limited, or when it is band-limited but we have not sampled it finely enough, we will have aliasing. Our Fourier coefficients will differ from our DFT output by an error term involving high-frequency Fourier series coefficients.

In application it’s usually too much to hope that your function is exactly band-limited, but it may be approximately band-limited. The Fourier coefficients of smooth functions eventually decay rapidly (see details here) and so the error in approximating Fourier series coefficients by DFT terms is small.

Computing a DFT with the FFT

We defined the DFT of the sequence {fn} above to be the sequence {Fk} where

F_k = \frac{1}{N}\sum_{n = -N/2 + 1}^{N/2} f_n \exp(-2 \pi i n k/N)

and k runs from –N/2 + 1 to N/2.

NumPy, on the other hand, defines the DFT of the sequence {an} to be the sequence {Ak} where

A_k = \sum_{n = 0}^{N-1} a_n \exp(-2 \pi i n k/N)

and k runs from 0 to N-1.

Relative to the definition above, the NumPy definition differs in three ways:

  1. The normalization factor 1/N is missing.
  2. The input indices are numbered differently.
  3. The output is arranged differently.

The first difference is trivial to overcome: simply divide the FFT output by N.

The second difference is easy to deal with if we think of the inputs to the FFT as samples from a periodic function, which they usually are. The fn come from sampling a periodic function f over an interval [-A/2, A/2]. If we sample the same function over [0, A] we get the an. We have

\begin{align*} f(0) &= f_0 = a_0 \\ f(A/N) &= f_1 = a_1 \\ f(2A/N) &= f_2 = a_2 \\ \cdots \\ f(A/2) &= f_{N/2} = a_{N/2} \end{align*}

If we extend the fs and the as periodically past their original ranges of definition, then they all agree. But since we start our sampling in different places, our samples would be listed in different orders if we stored them by increasing index.

Something similar occurs with the output.

For 0 ≤ k ≤ N/2,

F_k = A_k/N.

But for –N/2 < k < 0,

F_k = A_{N+k}/N.
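Here is a minimal helper that packages up this bookkeeping: it samples a function over [0, A), calls NumPy’s FFT, and returns the F_k indexed from –N/2 + 1 to N/2. The function name and interface are just for illustration. Applied to the function in the example below, it returns the coefficients that are read off by hand there.

    import numpy as np

    def dft_coefficients(f, A, N):
        # Sample f at N points over [0, A) and return (k, F_k) for
        # k = -N/2 + 1, ..., N/2, using the 1/N normalization defined above.
        samples = [f(n*A/N) for n in range(N)]
        A_k = np.fft.fft(samples)/N
        ks = np.arange(-N//2 + 1, N//2 + 1)
        return ks, A_k[ks % N]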

Example

We’ll use a band-limited function in our example so that we find the Fourier coefficients exactly.

f(x) = 7 sin(6πx) – 5 cos(4πx)

We compute the FFT as follows.

    import numpy as np

    def f(x):
        return 7*np.sin(3*2*np.pi*x) - 5*np.cos(2*2*np.pi*x)

    N = 8
    delta = 1/N
    x = [f(n*delta) for n in range(N)]
    print( np.fft.fft(x)/N )

The output is

[0, 0, -2.5, -3.5i, 0, 3.5i, -2.5, 0]

This says F2 = F-2 = -5/2, and so the coefficient of cos(2·2πx) is -5.

Also F3 = -7i/2 and F-3 = 7i/2, so the coefficient of sin(3·2πx) is 7. These results follow from Euler’s equation

exp(iθ) = cos(θ) + i sin(θ)
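The same bookkeeping can be done in code. The snippet below repeats the setup above and reads off the cosine and sine coefficients using F_k + F_{-k} and i(F_k - F_{-k}), which is just Euler’s equation rearranged.

    import numpy as np

    def f(x):
        return 7*np.sin(3*2*np.pi*x) - 5*np.cos(2*2*np.pi*x)

    N = 8
    x = [f(n/N) for n in range(N)]
    F = np.fft.fft(x)/N

    for k in [2, 3]:
        cos_coeff = (F[k] + F[N-k]).real        # coefficient of cos(k 2 pi x)
        sin_coeff = (1j*(F[k] - F[N-k])).real   # coefficient of sin(k 2 pi x)
        print(k, cos_coeff, sin_coeff)          # k = 2: cosine -5; k = 3: sine 7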

[1] William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.

Support of a signal and its FFT

The previous post looked at the Fourier uncertainty principle. This post looks at an analogous result for the discrete Fourier transform.

The uncertainty principle for the (continuous) Fourier transform says a signal cannot be localized in both the time domain and the frequency domain. The more bunched up a function is, the more spread out its transform is.

The discrete analog of the Fourier uncertainty principle given in [1] says something similar, that a signal and its discrete Fourier transform cannot both be concentrated. The difference is in how concentration is measured.

In the previous post, concentration was measured in terms of variance, very similar to how variance is defined for random variables. But in the discrete case, we do something simpler: we count how many terms are non-zero.

The support of a function is the set of points where the function is not zero. A highly concentrated function is one with small support. The discrete Fourier uncertainty principle says that a non-zero signal and its Fourier transform cannot both have small support.

Specifically, let f be a sequence of length N and let F be its discrete Fourier transform. Then

| support of f | × | support of F | ≥ N.

where | S | is the number of elements in a set S.

As with the continuous case, we have a lower bound on the product of the uncertainty of something and its Fourier transform. The discrete form is simpler in that the lower bound does not depend on one’s convention for defining the Fourier transform. As with the continuous case, there are many conventions for defining the DFT. But these variations in convention don’t affect counting zeros.

Examples

As pointed out in [1], you can construct examples where the inequality above holds exactly. These are sequences with a constant value repeated at evenly spaced positions and zeros elsewhere. If N = pq then the FFT of a sequence of 1s every p places will have non-zero elements every q places.

For example,

    >>> import numpy as np
    >>> np.fft.fft([1,0,0,1,0,0])
    array([2, 0, 2, 0, 2, 0])

In this instance, the original sequence has support of size 2, its transform has support of size 3, and the product of the support sizes is 6, the length of the sequences.

Note that multiplying a Python list (not a NumPy array) by an integer concatenates the list to itself that many times. The example above could be read as saying the FFT of

    [1, 0, 0]*2

is

    [2, 0]*3


As another example, we can show that the FFT of

    [1, 0, 0, 0, 0]*3

is

    [3, 0, 0]*5


The code below is complicated a bit by NumPy bookkeeping, but essentially it shows that the FFT of [1,0,0,0,0]*3 is [3,0,0]*5.

    >>> x = np.fft.fft([1, 0, 0, 0, 0]*3)
    >>> y = 3*np.array([1, 0, 0]*5, dtype=complex)
    >>> np.all(x == y)
    True

The lower bound on the product of support sizes is tight for multiples and rotations of sequences like those above.

For example,

    >>> np.fft.fft([0, 7, 0, 0, 7, 0])
    array([14. +0.j        ,  0. +0.j        , -7.-12.12435565j,
            0. +0.j        , -7.+12.12435565j,  0. +0.j        ])

Here we took the first example, changed the 1s to 7s, and rotated everything one position. The FFT is a little more complicated in this case, but it’s still the case that the input has support of size 2 and the output has support of size 3.

The only sequences for which the discrete uncertainty principle lower bound is tight are those like we’ve seen above: multiples and rotations of evenly spaced 1s. For any other sequences the inequality above is strict. For example, here’s what happens if our 1’s are not evenly spaced.

    >>> np.fft.fft([1, 0, 0, 0, 1, 0])
    array([2. +0.00000000e+00j, 0.5+8.66025404e-01j, 0.5-8.66025404e-01j,
           2. -5.55111512e-17j, 0.5+8.66025404e-01j, 0.5-8.66025404e-01j])

Now the input has support 2, the output has support 6.
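If you want to experiment with this, note that counting the support of an FFT in floating point needs a tolerance, since computed entries are rarely exactly zero. A minimal sketch:

    import numpy as np

    def support_size(x, tol=1e-9):
        # number of entries that are not numerically zero
        return int(np.sum(np.abs(np.asarray(x)) > tol))

    x = [1, 0, 0, 0, 1, 0]
    F = np.fft.fft(x)
    print(support_size(x), support_size(F))            # 2 and 6
    print(support_size(x)*support_size(F) >= len(x))   # the lower bound holds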


[1] David L. Donoho and Philip B. Stark. Uncertainty Principles and Signal Recovery. SIAM Journal on Applied Mathematics, June 1989, Vol. 49, No. 3, pp. 906-931

Fourier uncertainty principle

Heisenberg’s uncertainty principle says there is a limit to how well you can know both the position and momentum of anything at the same time.  The product of the uncertainties in the two quantities has a lower bound.

There is a closely related principle in Fourier analysis that says a function and its Fourier transform cannot both be localized. The more concentrated a signal is in the time domain, the more spread out it is in the frequency domain.

There are many ways to quantify how localized or spread out a function is, and corresponding uncertainty theorems. This post will look at the form closest to the physical uncertainty principle of Heisenberg, measuring the uncertainty of a function in terms of its variance. The Fourier uncertainty principle gives a lower bound on the product of the variance of a function and the variance of its Fourier transform.

Variance

When you read “variance” above you might immediately think of variance as in the variance of a random variable. Variance in Fourier analysis is related to variance in probability, but there’s a twist.

If f is a real-valued function of a real variable, its variance is defined to be

V(f) = \int_{-\infty}^\infty x^2 \, f(x)^2 \, dx

This is the variance of a random variable with mean 0 and probability density |f(x)|². The twist alluded to above is that f is not a probability density, but |f(x)|² is.

Since we said f is a real-valued function, we could leave out the absolute value signs and speak of f(x)² being a probability density. In quantum mechanics applications, however, f is complex-valued and |f(x)|² is a probability density. In other words, we multiply f by its complex conjugate, not by itself.

The Fourier variance defined above applies to any f for which the integral converges. It is not limited to the case of |f(x)|² being a probability density, i.e. when |f(x)|² integrates to 1.

Uncertainty principle

The Fourier uncertainty principle is the inequality

V(f) \, \, V(\hat{f}) \geq C\, ||f||_2^2 \,\,||\hat{f}||_2^2

where the constant C depends on your convention for defining the Fourier transform [1]. Here ||f||2² is the integral of f², the square of the L² norm.

Perhaps a better way to write the inequality is

\frac{V(f)}{||\phantom{\hat{f}}\!\!f\,||_2^2} \; \frac{V(\hat{f})}{||\phantom{\hat{.}}\hat{f}\,\,||_2^2} \geq C

for non-zero f. Rather than look at the variances per se, we look at the variances relative to the size of f. This form is scale invariant: if we multiply f by a constant, the numerators and denominators are multiplied by that constant squared.

The inequality is exact when f is proportional to a Gaussian probability density. And in this case the uncertainty is easy to interpret. If f is proportional to the density of a normal distribution with standard deviation σ, then its Fourier transform is proportional to the density of a normal distribution with standard deviation 1/σ, if you use the radian convention described in [1].

Example

We will evaluate both sides of the Fourier uncertainty principle with

    h[x_] := 1/(x^4 + 1)

and its Fourier transform

    g[w_] := FourierTransform[h[x], x, w]

We compute the variances and the squared norms with

    v0 = Integrate[x^2 h[x]^2, {x, -Infinity, Infinity}] 
    v1 = Integrate[w^2 g[w]^2, {w, -Infinity, Infinity}]
    n0 = Integrate[    h[x]^2, {x, -Infinity, Infinity}]
    n1 = Integrate[    g[w]^2, {w, -Infinity, Infinity}]

The results in mathematical notation are

\begin{align*} V(h) &= \frac{\pi}{4 \sqrt{2}} \\ V(\hat{h}) &= \frac{5\pi}{8\sqrt{2}} \\ || h ||_2^2 &= \frac{3\pi}{4 \sqrt{2}} \\ ||\hat{h}||_2^2 &= \frac{3\pi}{4 \sqrt{2}} \end{align*}

From here we can calculate that the ratio of the left side to the right side of the uncertainty principle is 5/18, which is larger than the lower bound C = 1/4.

By the way, it’s not a coincidence that h and its Fourier transform have the same norm. That’s always the case. Or rather, that is always the case with the Fourier convention we are using here. In general, the L² norm of the Fourier transform is proportional to the L² norm of the function, where the proportionality constant depends on your definition of Fourier transform but not on the function.

Here is a page that lists the basic theorems of Fourier transforms under a variety of conventions.


[1] In the notation of the previous post C = 1/(4b²). That is, C = 1/4 if your definition has an exp(± i ω t) term and C = 1/(16π²) if your definition has an exp(±2 π i ω t) term.

Said another way, if you express frequency in radians, as is common in pure math, C = 1/4. But if you express frequency in Hertz, as is common in signal processing, C = 1/(16π²).

Fourier transform definitions also have varying constants outside the integral, e.g. possibly dividing by √(2π), but this factor affects both sides of the Fourier uncertainty principle equally.

Fourier transforms in Mathematica

Unfortunately there are many slightly different ways to define the Fourier transform. So the first two questions when using Mathematica (or any other software) to compute Fourier transforms are what definition of Fourier transform does it use, and what to do if you want to use a different definition.

The answer to the first question is that Mathematica defines the Fourier transform of f as

\hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \exp(i\omega x)\, f(x) \, dx

The answer to the second question is that Mathematica defines a parameterized Fourier transform by

\hat{f}(\omega) = \sqrt{\frac{|b|}{(2\pi)^{1-a}}} \, \int_{-\infty}^\infty \exp(ib\omega x)\, f(x) \, dx

where a defaults to 0 and b defaults to 1.

Examples

For example, if φ(x) = exp(-x²/2), then we can compute Mathematica’s default Fourier transform with

    φ[x_] := Exp[-x^2/2]
    FourierTransform[φ[x], x, ω]

This computes

 \hat{\varphi}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \exp(i\omega x)\, \varphi(x) \, dx = \exp(-\omega^2/2)

But if we set (a, b) to (0, 2π) with

    FourierTransform[φ[x], x, ω, FourierParameters -> {0, 2 Pi}]

we compute

\hat{\varphi}(\omega) = \int_{-\infty}^\infty \exp(2\pi i\omega x)\, \varphi(x) \, dx = \sqrt{2\pi} \exp(-2\pi^2\omega^2)

Theorems under various conventions

Several years ago I got frustrated enough with the multitude of Fourier transform conventions that I made a sort of Rosetta stone, giving several of the most basic theorems under each of eight definitions. I used the parameterization

\hat{f}(\omega) = \frac{1}{\sqrt{m}} \int_{-\infty}^\infty \exp(\sigma q i\omega x)\, f(x) \, dx

where m is either 1 or 2π, σ is either 1 or -1, and q is either 1 or 2π.

My σ and q are the sign and absolute value of Mathematica’s b. The relation between my m and Mathematica’s a is slightly more complicated and given in the table below.

\begin{tabular}{lllll} \hline \(\sigma\) & m & q & a & b\\ \hline \textpm{} 1 & 1 & 1 & 1 & \textpm{} 1\\ \textpm{} 1 & 1 & 2\(\pi\) & 0 & \textpm{} 2\(\pi\)\\ \textpm{} 1 & 2\(\pi\) & 1 & 0 & \textpm{} 1\\ \textpm{} 1 & 2\(\pi\) & 2\(\pi\) & -1 & \textpm{} 2\(\pi\)\\ \hline \end{tabular}
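Equating the prefactors of the two parameterizations gives m = (2π)^(1-a)/q, that is, a = 1 - log(mq)/log(2π), and b = σq. Here is a small conversion routine based on that relation; the function name is my own.

    import numpy as np

    def fourier_parameters(sigma, m, q):
        # Convert (sigma, m, q) as above to Mathematica's FourierParameters -> {a, b},
        # assuming m and q are each either 1 or 2 pi.
        a = 1 - np.log(m*q)/np.log(2*np.pi)
        return int(round(a)), sigma*q

    print(fourier_parameters(-1, 2*np.pi, 1))   # (0, -1)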