The Clausen function

I ran across the Clausen function the other day, and when I saw a plot of the function my first thought was that it looks sorta like a sawtooth wave.

Plot of Clausen function Cl_2

I wondered whether it also sounds like a sawtooth wave, and indeed it does. More on that shortly.

The Clausen function can be defined in terms of its Fourier series:

\text{Cl}_2(x) = \sum_{k=1}^\infty \frac{\sin(kx)}{k^2}
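Here is a minimal NumPy sketch that evaluates Cl₂ by truncating this series. The function name and the number of terms are arbitrary choices. As a sanity check, at x = π/2 the series reduces to 1 − 1/3² + 1/5² − ⋯, which is Catalan's constant G ≈ 0.9159655942.

```python
import numpy as np

CATALAN = 0.915965594177219  # Catalan's constant G = 1 - 1/3^2 + 1/5^2 - ...

def clausen2(x, terms=100_000):
    """Approximate Cl_2(x) by the partial sum of sin(k x) / k^2 for k = 1..terms."""
    k = np.arange(1, terms + 1)
    return np.sum(np.sin(k * x) / k**2)

print(clausen2(np.pi / 2), CATALAN)
```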

The function commonly known as the Clausen function is one of a family of functions, hence the subscript 2. The other Clausen functions Clₙ, for positive integers n, are defined similarly. For even n, replace 2 with n on both sides of the defining equation. For odd n, also replace sine with cosine.

The Fourier coefficients decay quadratically, as do those of a triangle wave. (The coefficients of a sawtooth wave decay more slowly, like 1/k.) Quadratic decay is a hint that Cl₂(x) may not have a continuous derivative. In fact, the derivative of Cl₂(x) is infinite at 0. This follows quickly from the integral representation of the function.

\text{Cl}_2(x)=-\int_0^x\log \left|2\sin\frac{t}{2} \right|\, dt

The fundamental theorem of calculus shows that the derivative

\text{Cl}'_2(x)=-\log \left|2\sin\frac{x}{2} \right|

blows up at 0.
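The integral representation is also easy to check numerically. This rough sketch applies the midpoint rule to the integral and compares the result with a partial sum of the series. The logarithmic singularity at t = 0 is integrable, and the midpoints never land on it. The grid size and number of terms are arbitrary choices. The agreement is limited mainly by the midpoint rule near the singularity.

```python
import numpy as np

def clausen2_series(x, terms=100_000):
    # Partial sum of sin(k x) / k^2 for k = 1..terms
    k = np.arange(1, terms + 1)
    return np.sum(np.sin(k * x) / k**2)

def clausen2_integral(x, n=200_000):
    # Midpoint rule for -int_0^x log|2 sin(t/2)| dt; the midpoints avoid t = 0
    h = x / n
    t = (np.arange(n) + 0.5) * h
    return -np.sum(np.log(np.abs(2 * np.sin(t / 2)))) * h

for x in (0.5, 1.0, 2.0, 3.0):
    print(x, clausen2_series(x), clausen2_integral(x))
```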

Now suppose we create an audio clip of Cl₂(2π · 440t), with t in seconds. Since Cl₂ has period 2π, this creates a sound with pitch A 440. Rather than a sine wave, though, it has an unpleasant buzzing sound, much like a sawtooth wave.
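Here is one way to create such a clip, as a sketch using NumPy and Python's standard wave module. The sample rate, duration, amplitude, and file name are assumptions, not taken from the post. The argument is 2π·440·t, which gives 440 cycles per second because Cl₂ has period 2π. The series is truncated at the highest harmonic below the Nyquist frequency so that higher harmonics don't alias.

```python
import wave
import numpy as np

def clausen2_bandlimited(x, terms):
    # Partial sum of sin(k x) / k^2 for k = 1..terms
    total = np.zeros_like(x)
    for k in range(1, terms + 1):
        total += np.sin(k * x) / k**2
    return total

def write_clausen_wav(filename, freq=440.0, rate=44100, seconds=2.0):
    t = np.arange(int(rate * seconds)) / rate
    terms = int((rate / 2) // freq)          # keep harmonics k*freq below Nyquist
    signal = clausen2_bandlimited(2 * np.pi * freq * t, terms)
    # Scale to 80% of 16-bit full scale; WAV samples are little-endian
    pcm = (0.8 * 32767 * signal / np.max(np.abs(signal))).astype("<i2")
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    return pcm

write_clausen_wav("clausen2_sketch.wav")  # hypothetical output file name
```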


The harshness of the sound is due to the slow decay of the Fourier coefficients; the Fourier coefficients of more pleasant musical sounds decay much faster than quadratically.


Connecting the FFT and quadratic reciprocity

Some readers will look at the title of this post and think “Ah yes, the FFT. I use it all the time. But what is this quadratic reciprocity?”

Others will look at the same title and think “Gauss called the quadratic reciprocity theorem the jewel in the crown of mathematics. But what is this FFT thing? I think I remember an engineer saying something about that.”

Gauss proved a theorem that relates quadratic reciprocity and the FFT. For distinct odd primes p and q, the following equation holds.

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = \frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q}

I’ll spend the rest of this post unpacking this equation.

Legendre symbols

The expressions on the left are not fractions but rather Legendre symbols. The parentheses are not for grouping but are part of the symbol.

The Legendre symbol

\left(\frac{a}{r}\right)

is defined, for an odd prime r, to be 0 if a is a multiple of r, 1 if a has a square root mod r, and −1 otherwise.
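The definition translates directly into code. This sketch assumes r is an odd prime and computes the symbol in two ways. The first is brute force: list the nonzero squares mod r. The second is Euler's criterion, a standard shortcut not mentioned in the post, which says that a^((r−1)/2) is congruent to the Legendre symbol mod r. The function names are made up for illustration.

```python
def legendre_brute(a, r):
    # 0 if r divides a, 1 if a is a nonzero square mod r, -1 otherwise
    if a % r == 0:
        return 0
    squares = {(k * k) % r for k in range(1, r)}
    return 1 if a % r in squares else -1

def legendre_euler(a, r):
    # Euler's criterion: a^((r-1)/2) mod r is 0, 1, or r - 1 (meaning -1)
    t = pow(a, (r - 1) // 2, r)
    return -1 if t == r - 1 else t

p, q = 59, 17
print(legendre_euler(p, q), legendre_euler(q, p))
```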

Fourier transforms

The Discrete Fourier Transform (DFT) of a vector of length n multiplies the vector by the n by n Fourier matrix Fₙ whose (j, k) entry is exp(2πi jk/n). The Fast Fourier Transform (FFT) is a way to compute the DFT more quickly than directly multiplying by the Fourier matrix. Since the DFT is nearly always computed using the FFT algorithm, the DFT is commonly referred to as the FFT.

Matrix trace

The trace of a matrix is the sum of the elements along the main diagonal. So the trace of the Fourier matrix of size n is

\text{Tr} {\cal F}_n = \sum_{j=1}^n \exp(2\pi ij^2/n)
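Here is a quick NumPy check that the trace of the explicit Fourier matrix agrees with the sum above, as a sketch. The function names are made up for illustration. The matrix uses indices 0 through n − 1 while the sum runs from 1 to n. The j = 0 and j = n terms are both 1, so the two agree. Reducing j² mod n before exponentiating is an optional precaution that keeps the arguments of exp small.

```python
import numpy as np

def fourier_matrix(n):
    # (j, k) entry exp(2 pi i j k / n) for j, k = 0, ..., n - 1
    j = np.arange(n)
    return np.exp(2j * np.pi * np.outer(j, j) / n)

def trace_sum(n):
    # sum_{j=1}^{n} exp(2 pi i j^2 / n), reducing j^2 mod n first
    j = np.arange(1, n + 1)
    return np.sum(np.exp(2j * np.pi * ((j * j) % n) / n))

for n in (5, 6, 7, 8):
    print(n, np.trace(fourier_matrix(n)), trace_sum(n))
```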

Numerical illustration

The quadratic reciprocity theorem, also due to Gauss, is usually stated as

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}

We can illustrate the theorem at the top of the page numerically with the following Python code, using the quadratic reciprocity theorem to evaluate the product of the Legendre symbols.

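    from numpy import exp, pi

    # Trace of the n by n Fourier matrix: sum of exp(2 pi i k^2 / n) for k = 1..n
    tr = lambda n: sum(exp(2j*pi*k**2/n) for k in range(1, n+1))
    p, q = 59, 17
    print( tr(p*q)/(tr(p)*tr(q)) )
    print( (-1)**((p-1)*(q-1)//4) )   # right side of quadratic reciprocity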

The first print statement produces (0.9999999999998136-1.4048176871018313e-13j) rather than exactly 1 because of floating point rounding error. This is essentially 1, which is what the second print statement produces.

If we change q to 19, both statements print −1 (after rounding the first result).

Quadratic Gauss sum

We can quickly prove

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = \frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q}

if we assume the quadratic reciprocity theorem and the following equation for the trace of the Fourier matrix.

\text{Tr} {\cal F}_n = \sum_{j=1}^n \exp(2\pi ij^2/n) =
\left\{
	\begin{array}{ll}
		\sqrt{n}  & \mbox{if } n \equiv 1 \bmod{4} \\
		0 & \mbox{if } n \equiv 2 \bmod{4} \\
                i\sqrt{n} & \mbox{if } n \equiv 3 \bmod{4} \\
                (1+i)\sqrt{n} & \mbox{if } n \equiv 0 \bmod{4} \\
	\end{array}
\right.
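Here is the quick proof spelled out. For odd n, the trace formula above says Tr Fₙ = εₙ√n, where εₙ = 1 if n ≡ 1 mod 4 and εₙ = i if n ≡ 3 mod 4. Since p, q, and pq are odd and √(pq) = √p √q, the square roots cancel:

\frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q} = \frac{\varepsilon_{pq}}{\varepsilon_p\, \varepsilon_q}

Here are the possible cases, with all congruences mod 4. The case p ≡ 3, q ≡ 1 is symmetric to the second line.

\begin{array}{lll}
p \equiv q \equiv 1: & pq \equiv 1, & \varepsilon_{pq}/(\varepsilon_p \varepsilon_q) = 1/(1 \cdot 1) = 1 \\
p \equiv 1,\ q \equiv 3: & pq \equiv 3, & \varepsilon_{pq}/(\varepsilon_p \varepsilon_q) = i/(1 \cdot i) = 1 \\
p \equiv q \equiv 3: & pq \equiv 1, & \varepsilon_{pq}/(\varepsilon_p \varepsilon_q) = 1/(i \cdot i) = -1
\end{array}

The ratio is −1 exactly when (p − 1)/2 and (q − 1)/2 are both odd. So it equals (−1)^((p−1)/2 · (q−1)/2), which quadratic reciprocity says is the product of the two Legendre symbols.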

This proof is historically backward. It assumes quadratic reciprocity, but Gauss proved quadratic reciprocity by first proving the equation we’re trying to prove. He then obtained the expression on the right hand side of the quadratic reciprocity theorem using the equation above for the trace of the Fourier matrix.

The trace of the Fourier matrix is now called a quadratic Gauss sum. It’s a special case of more general sums that Gauss studied, motivated by his exploration of quadratic reciprocity.

Incidentally, Gauss gave many proofs of the quadratic reciprocity theorem. I don’t know where the proof outlined here fits into the sequence of proofs he developed.


Eigenvectors of the DFT matrix

When is the discrete Fourier transform of a vector proportional to the original vector? And when that happens, what is the proportionality constant?

In more formal language, what can we say about the eigenvectors and eigenvalues of the DFT matrix?

Setup

I mentioned in the previous post that Mathematica’s default convention for defining the DFT has mathematical advantages. One of these is that it makes the DFT an isometry, that is, taking the DFT of a vector does not change its norm. We will use Mathematica’s convention here because that will simplify the discussion. Under this convention, the DFT matrix of size N is the square matrix whose (j, k) entry is

\omega^{jk} / \sqrt{N}

where ω = exp(-2π i/N) and the indices j and k run from 0 to N − 1.

Eigenvalues

Using the definition above, if you take the discrete Fourier transform of a vector four times, you end up back where you started. With other conventions, taking the DFT four times takes you to a vector that is proportional to the original vector, but not the same.
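This is easy to check with NumPy. Its fft function accepts norm="ortho" to apply the 1/√N scaling, and it uses the same sign as the ω above. With that option, four transforms return the original vector. With NumPy's default unscaled transform, four applications multiply the vector by N². The vector length and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 8
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

y = x
for _ in range(4):
    y = np.fft.fft(y, norm="ortho")    # DFT with the 1/sqrt(N) scaling
print(np.allclose(y, x))

z = x
for _ in range(4):
    z = np.fft.fft(z)                  # NumPy default: no scaling
print(np.allclose(z, N**2 * x))
```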

It’s easy to see what the eigenvalues of the DFT are. If transforming a vector multiplies it by λ, then λ⁴ = 1. So λ = ±1 or ±i. This answers the second question at the top of the post: if the DFT of a vector is proportional to the original vector, the proportionality constant must be a fourth root of 1.

Eigenvectors

The eigenvectors of the DFT, however, are not nearly so simple.

Suppose N = 4k for some k > 1 (which it nearly always is in practice). I would expect by symmetry that the eigenspaces of 1, −1, i and −i would each have dimension k, but that’s not quite right.

In [1] the authors proved that the eigenspaces associated with 1, −1, i and −i have dimension k+1, k, k−1, and k respectively.
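These dimensions are easy to check numerically. This sketch builds the unitary DFT matrix defined above and computes its eigenvalues with NumPy. Because floating point eigenvalues won't be exactly ±1 or ±i, it snaps each one to the nearest fourth root of unity before counting. The function names are made up for illustration.

```python
import numpy as np

def unitary_dft_matrix(N):
    # (j, k) entry omega^(j k) / sqrt(N) with omega = exp(-2 pi i / N)
    j = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(j, j) / N) / np.sqrt(N)

def eigenvalue_counts(N):
    roots = {"1": 1, "-1": -1, "i": 1j, "-i": -1j}
    counts = dict.fromkeys(roots, 0)
    for lam in np.linalg.eigvals(unitary_dft_matrix(N)):
        # snap the numerical eigenvalue to the nearest fourth root of unity
        nearest = min(roots, key=lambda name: abs(lam - roots[name]))
        counts[nearest] += 1
    return counts

print(eigenvalue_counts(8))
```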

This seems strange to me in two ways. First, I’d expect all the spaces to have the same dimension. Second, if the spaces did not have the same dimension, I’d expect 1 and −1 to differ, not i and −i. Usually when you see i and −i together like this, they’re symmetric. But the span of the eigenvectors associated with i has dimension one less than the dimension of the span of the eigenvectors associated with −i. I don’t see why this should be. I’ve downloaded [1] but haven’t read it yet.

[1] J. H. McClellan; T. W. Parks (1972). “Eigenvalues and eigenvectors of the discrete Fourier transformation”. IEEE Transactions on Audio and Electroacoustics. 20 (1): 66–74.

DFT conventions: NumPy vs Mathematica

Just as there are multiple conventions for defining the Fourier transform, there are multiple conventions for defining the discrete Fourier transform (DFT), better known as the fast Fourier transform (FFT). [1]

This post will look at two DFT conventions, one used in Python’s NumPy library, and one used in Mathematica. There are more conventions in use, but this post will just look at these two.

In some sense the differences between conventions are trivial, but trivial doesn’t mean unimportant [2]. If you don’t know that there are multiple conventions, you could be quite puzzled when the output of an FFT doesn’t match your expectations.

NumPy definition

NumPy’s fft and related functions define the discrete Fourier transform of a sequence a_0, a_1, …, a_{N−1} to be the sequence A_0, A_1, …, A_{N−1} given by

A_k = \sum_{m=0}^{N-1} a_m \exp(-2\pi i mk/N)
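To confirm that np.fft.fft computes exactly this sum, here is a direct O(N²) implementation of the formula, compared against NumPy on an arbitrary random input. The function name is made up for illustration.

```python
import numpy as np

def dft_numpy_convention(a):
    # A_k = sum_{m=0}^{N-1} a_m exp(-2 pi i m k / N), with no scaling factor
    a = np.asarray(a, dtype=complex)
    N = len(a)
    m = np.arange(N)
    return np.array([np.sum(a * np.exp(-2j * np.pi * m * k / N)) for k in range(N)])

rng = np.random.default_rng(0)
a = rng.standard_normal(16)
print(np.allclose(dft_numpy_convention(a), np.fft.fft(a)))
```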

Mathematica definition

Mathematica’s Fourier function defines the discrete Fourier transform of a sequence u_1, u_2, …, u_N to be the sequence v_1, v_2, …, v_N given by

v_k = \frac{1}{\sqrt{N}} \sum_{m=1}^{N} u_m \exp\big(-2\pi i (m-1)(k-1)/N\big)

This is the default definition in Mathematica, but not the only possibility. More on that below in the discussion of compatibility.

Motivation

Python arrays are indexed from 0 while Mathematica arrays are indexed starting from 1. This is why the inputs and outputs are numbered as they are.

Subtracting 1 from the m and k indices makes the two definitions visually less similar, but the terms in the two summations are the same. The only difference between the two implementations is the scaling factor in front of the sum.

Why does Mathematica divide the sum by √N while NumPy does not? As is often the case when there are differing conventions for defining the same thing, the differences are a result of which theorems you want to simplify. Mathematica complicates the definition of the DFT slightly, but in exchange makes the DFT and its inverse more symmetric.
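NumPy can compute the scaled version too: passing norm="ortho" to np.fft.fft applies the 1/√N factor. The two definitions as written differ only in that factor, so this is the same as dividing NumPy's default output by √N. The sketch below checks that. It also illustrates the symmetry the scaling buys: with 1/√N on both the forward and inverse transforms, the DFT preserves the length of the vector. The size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)

scaled = np.fft.fft(u, norm="ortho")            # includes the 1/sqrt(N) factor
print(np.allclose(scaled, np.fft.fft(u) / np.sqrt(N)))

# With 1/sqrt(N) on both sides, forward and inverse have the same scale,
# and the transform preserves the Euclidean norm.
print(np.isclose(np.linalg.norm(scaled), np.linalg.norm(u)))
print(np.allclose(np.fft.ifft(scaled, norm="ortho"), u))

# NumPy's default instead puts the whole 1/N on the inverse.
print(np.allclose(np.fft.ifft(np.fft.fft(u)), u))
```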

The choice of scaling factor is consistent with the user bases of the two languages. Python skews more toward engineering and applied math, while Mathematica skews more toward pure math. In light of this, the choices made by Python and Mathematica seem inevitable.

Compatibility

Like Mathematica’s continuous Fourier transform function FourierTransform, its discrete Fourier transform function Fourier takes an optional FourierParameters argument for compatibility with other conventions. Setting the a parameter to 1 eliminates the 1/√N factor and produces a result consistent with NumPy.

There are more variations in DFT definitions. For example, some definitions of the DFT do not have a negative sign inside the exponential. Mathematica can accommodate this by setting b to −1 in the FourierParameters argument. There are other possibilities too. In some implementations, for example, the zero-frequency (DC) term is in the middle rather than at the beginning.
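NumPy accommodates the last variation with helper functions rather than a different definition. np.fft.fftshift reorders the output so the zero-frequency term sits in the middle, and np.fft.fftfreq gives the matching frequencies. Here is a small illustration using a constant signal, whose energy is all in the DC term.

```python
import numpy as np

N = 8
a = np.ones(N)                       # constant signal: all energy in the DC term
A = np.fft.fft(a)
freqs = np.fft.fftfreq(N)            # 0, 1/8, ..., then the negative frequencies

print(freqs)                         # DC term first
print(np.fft.fftshift(freqs))        # DC term moved to the middle
print(np.fft.fftshift(A).real)
```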

[1] The FFT is an algorithm for computing the DFT, but the transform itself is often called the FFT.

[2] In classical education, the trivium consisted of grammar, logic, and rhetoric. The original meaning of “trivial” is closer to “foundational” than to “insignificant.”

DFT mandalas

Math books often use some illustration from the book contents as cover art. When they do, there’s often some mystery to the cover art, and a sense of accomplishment when you get far enough into the book to understand the significance of the cover.

William L. Briggs and Van Emden Henson wrote such a book, The DFT: An Owner’s Manual for the Discrete Fourier Transform. The cover features the following image.

At first glance, this image might look like a complete graph, one with an edge from every node along the circle to every other node. But that’s not it. For one thing, there’s only one line that goes through the center of the circle. And when you look closer it’s not as symmetric as it may have seemed at first.

Here’s the same image plotted in blue, except this time I reduced the alpha channel of the lines. Making the lines less opaque makes it possible to see that some lines are drawn more often than others, the darker lines being the ones that have been traced more than once.

What does this drawing represent? The explanation is given in the last exercise at the end of the first chapter.

The cover and frontmatter of this book display several polygonal, mandala-shaped figures which were generated using the DFT.

The exercise goes into some detail and invites the reader to reproduce versions of the cover figure with N = 4 or N = 8 nodes around the circle. For n from 1 to N, take the DFT (discrete Fourier transform) of the nth standard basis vector eₙ and draw lines connecting the components of the DFT. These components are

F_k = \exp(-2\pi i n k/N) / N

for k = 1 to N.
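Here's a sketch of the computation for general N. The exercise's exact drawing rule isn't quoted above, so this assumes that for each n the points F₁, …, F_N are joined in order and F_N is joined back to F₁. Under that assumption, n = N/2 produces the only segments through the center, which is consistent with the description above. The function name is made up for illustration.

```python
import numpy as np

def mandala_segments(N):
    """Line segments for the mandala figure.

    Assumption: for each n, the points F_1, ..., F_N are joined in order and
    F_N is joined back to F_1.
    """
    k = np.arange(1, N + 1)
    segments = []
    for n in range(1, N + 1):
        F = np.exp(-2j * np.pi * n * k / N) / N     # DFT components of e_n
        for start, end in zip(F, np.roll(F, -1)):
            segments.append((start, end))
    return segments

segs = mandala_segments(8)
# A segment passes through the center when its endpoints are opposite points.
through_center = [s for s in segs if abs(s[0] + s[1]) < 1e-12]
print(len(segs), len(through_center))
```

One option for drawing the figure is matplotlib. Plot each segment's real and imaginary parts with a small alpha, so that segments traced more than once come out darker.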

The cover also has smaller figures which correspond to the same sort of image for other values of N. For example, here is the figure for N = 8.


Circulant matrices, eigenvectors, and the FFT

A circulant matrix is a square matrix in which each row is a rotation of the previous row. This post will illustrate a connection between circulant matrices and the FFT (Fast Fourier Transform).

Circulant matrices

Fill in the first row however you want. Then move the last element to the front to make the next row. Repeat this process until the matrix is full.

The NumPy function roll will do this rotation for us. Its first argument is the row to rotate, and the second argument is the number of rotations to do. So the following Python code generates a random circulant matrix of size N.

    import numpy as np

    np.random.seed(20230512)
    N = 5
    row = np.random.random(N)
    M = np.array([np.roll(row, i) for i in range(N)])

Here’s the matrix M with entries truncated to 3 decimal places to make it easier to read.

    [[0.556 0.440 0.083 0.460 0.909]
     [0.909 0.556 0.440 0.083 0.460]
     [0.460 0.909 0.556 0.440 0.083]
     [0.083 0.460 0.909 0.556 0.440]
     [0.440 0.083 0.460 0.909 0.556]]    

Fast Fourier Transform

The Fast Fourier Transform is a linear transformation and so it can be represented by a matrix [1]. This is the N by N matrix whose (j, k) entry is ω^(jk) where ω = exp(-2πi/N), with j and k running from 0 to N – 1. [2]

Each element of the FFT matrix corresponds to a rotation, so you could visualize the matrix using clocks in each entry or by a cycle of colors. A few years ago I created a visualization using both clock faces and colors:

Eigenvectors and Python code

Here’s a surprising property of circulant matrices: the eigenvectors of a circulant matrix depend only on the size of the matrix, not on the elements of the matrix. Furthermore, these eigenvectors are the columns of the FFT matrix. The eigenvalues depend on the matrix entries, but the eigenvectors do not.

Said another way, when you multiply a circulant matrix by a column of the FFT matrix of the same size, this column will be stretched but not rotated. The amount of stretching depends on the particular circulant matrix.

Here is Python code to illustrate this.

    ω = np.exp(-2j*np.pi/N)
    for i in range(N):
        col1 = np.array([ω**(i*j) for j in range(N)])  # ith column of the FFT matrix
        col2 = np.matmul(M, col1)                      # image of that column under M
        print(col2/col1)                               # same ratio in every position

In this code col1 is a column of the FFT matrix, and col2 is the image of the column when multiplied by the circulant matrix M. The print statement shows that the ratio col2/col1 is the same in every position. This ratio is the eigenvalue associated with that eigenvector. If you were to generate a new random circulant matrix, the ratios would change, but the input and output vectors would still be proportional.
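A related fact can be checked the same way. For a matrix built with np.roll as above, row r is the first row shifted right by r places. Substituting that into the product of M with the ith FFT column factors out the sum over m of row[m] ω^(im), which is the ith entry of np.fft.fft(row). So the eigenvalues are the FFT of the first row. The sketch below regenerates M so it runs on its own.

```python
import numpy as np

np.random.seed(20230512)
N = 5
row = np.random.random(N)
M = np.array([np.roll(row, i) for i in range(N)])

omega = np.exp(-2j * np.pi / N)
eigenvalues = np.fft.fft(row)          # one eigenvalue per FFT column
for i in range(N):
    col = omega ** (i * np.arange(N))  # ith column of the FFT matrix
    print(np.allclose(M @ col, eigenvalues[i] * col))
```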

Notes

[1] Technically this is the discrete Fourier transform (DFT). The FFT is an algorithm for computing the DFT. Because the DFT is nearly always computed using the FFT, the transformation itself is usually referred to as the FFT.

[2] Conventions vary, so you may see the FFT matrix written differently.

Fixed points of the Fourier transform

This previous post looked at the hyperbolic secant distribution. This distribution has density

f_H(x) = \frac{1}{2} \text{sech} \left(\frac{\pi x}{2} \right)

and characteristic function sech(t). It’s curious that the density and characteristic function are so similar.

The characteristic function is essentially the Fourier transform of the density function, so this says that the hyperbolic secant function, properly scaled, is a fixed point of the Fourier transform. I’ve long known that the normal density is its own Fourier transform, but only recently learned that the same is true of the hyperbolic secant.

Hermite functions

The Hermite functions are also fixed points of the Fourier transform, or rather eigenfunctions of the Fourier transform. The eigenvalues are 1, −i, −1, and i. When the eigenvalue is 1, we have fixed points.

There are two conventions for defining the Hermite functions, and multiple conventions for defining the Fourier transform, so the truth of the preceding paragraph depends on the conventions used.

For this post, we will define the Fourier transform of a function f to be

\hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \exp(-i \omega x)\, f(x)\, dx

Then the Fourier transform of exp(-x²/2) is the same function. Since the Fourier transform is linear, this means the same holds for the density of the standard normal distribution.

We will define the Hermite polynomials by

H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}e^{-x^2}

using the so-called physics convention. Hₙ is an nth degree polynomial.

The Hermite functions ψₙ(x) are the Hermite polynomials multiplied by exp(-x²/2). That is,

\psi_n(x) = H_n(x) \exp(-x^2/2)

With these definitions, the Fourier transform of ψₙ(x) equals (−i)ⁿ ψₙ(x). So when n is a multiple of 4, the Fourier transform of ψₙ(x) is ψₙ(x).

[The definition of the Hermite functions above omits a complicated normalizing constant that depends on n but not on x. So our Hermite functions are proportional to the standard Hermite functions. But proportionality constants don’t matter when you’re looking for eigenfunctions or fixed points.]
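Here's a numerical check of the eigenvalue claim, as a sketch. It uses numpy.polynomial.hermite.hermval, which evaluates the physicists' Hermite polynomials used here. The Fourier integral is approximated by a Riemann sum over [−20, 20]. The interval, grid size, and test frequency ω = 0.7 are arbitrary choices that work because ψₙ has Gaussian tails.

```python
import numpy as np
from numpy.polynomial.hermite import hermval   # physicists' Hermite polynomials

def psi(n, x):
    # Unnormalized Hermite function H_n(x) exp(-x^2/2), as defined above
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermval(x, coeffs) * np.exp(-x**2 / 2)

def fourier(func, omega, L=20.0, num=4001):
    # (1/sqrt(2 pi)) * integral of exp(-i omega x) func(x) dx, approximated
    # by a Riemann sum on [-L, L]
    x = np.linspace(-L, L, num)
    dx = x[1] - x[0]
    return np.sum(np.exp(-1j * omega * x) * func(x)) * dx / np.sqrt(2 * np.pi)

omega = 0.7
for n in range(6):
    lhs = fourier(lambda x: psi(n, x), omega)
    rhs = (-1j)**n * psi(n, omega)
    print(n, abs(lhs - rhs))
```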

Hyperbolic secant

Using the definition of Fourier transform above, the function sech(√(π/2) x) is its own Fourier transform.
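The same kind of numerical sketch can check this claim. The hyperbolic secant decays only exponentially, so the Riemann sum here runs over a wider interval, [−40, 40]. The interval and grid size are assumptions chosen so that the neglected tails are negligible.

```python
import numpy as np

a = np.sqrt(np.pi / 2)
f = lambda x: 1 / np.cosh(a * x)      # sech(sqrt(pi/2) x)

def fourier(func, omega, L=40.0, num=8001):
    # (1/sqrt(2 pi)) * integral of exp(-i omega x) func(x) dx, Riemann sum on [-L, L]
    x = np.linspace(-L, L, num)
    dx = x[1] - x[0]
    return np.sum(np.exp(-1j * omega * x) * func(x)) * dx / np.sqrt(2 * np.pi)

for omega in (0.0, 0.5, 1.0, 2.0):
    print(omega, fourier(f, omega).real, f(omega))
```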

This is surprising because the Hermite functions form a basis for L²(ℝ) and all have tails on the order of exp(−x²/2). The hyperbolic secant has much thicker tails, decaying only like exp(−√(π/2)|x|). Each Hermite function eventually decays like exp(−x²/2), but this happens later as n increases, so an infinite sum of Hermite functions can have thicker tails than any particular Hermite function.


Hilbert transform and Fourier series

A few days ago I wrote about the Hilbert transform and gave as an example that the Hilbert transform of sine is cosine. We’ll bootstrap that example to find the Hilbert transform of any periodic function from its Fourier series.

The Hilbert transform of a function f(t) is a function f_H(x) defined by

f_H(x) = \frac{1}{\pi} \int_{-\infty}^\infty \frac{f(t)}{t - x}\, dt

where the integral is interpreted in the sense of the Cauchy principal value, the limit as the singularity is approached symmetrically from both sides.

The Hilbert transform shifts and scales conveniently. Shifting a function by any amount h shifts its transform by the same amount. And scaling a function by any amount k > 0 scales its transform the same way. That is, we have the following transform pairs.

\begin{align*} f(t) &\leftrightarrow f_H(x) \\ f(t - h) &\leftrightarrow f_H(x - h) \\ f(kt) &\leftrightarrow f_H(kx) \\ \end{align*}

Now since the Hilbert transform of sin(t) is cos(x), the Hilbert transform of sin(t + π/2) must be cos(x + π/2). But sin(t + π/2) is cos(t), and cos(x + π/2) is −sin(x), so the Hilbert transform of cos(t) is −sin(x). In this case, the Hilbert transform has the same pattern as differentiation.

Now if ω > 0 the scaling rule tells us the Hilbert transform of sin(ωt) is cos(ωx) and the Hilbert transform of cos(ωt) is −sin(ωx). Here the analogy with differentiation breaks down because differentiation would bring out a factor of ω from the chain rule [1].

Putting these facts together, if we have a function f written in terms of a Fourier series

f(t) = \sum_{n=1}^\infty \left\{ a_n \sin(nt) + b_n\cos(nt) \right\}

then its Hilbert transform is

f_H(x) = \sum_{n=1}^\infty \left\{ -b_n \sin(nx) + a_n\cos(nx) \right\}

In other words, we replace the b‘s with a‘s and the a‘s with −b‘s. [2]
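For a periodic function sampled at equally spaced points, this rule can be carried out with the FFT. Under this post's sign convention, sin → cos and cos → −sin amount to multiplying the Fourier coefficient at integer frequency k by i·sign(k). That also sends the constant term to 0. Here's a sketch that compares this FFT approach with the series formula above, using random coefficients for n = 1 through 5. The function name, grid size, and seed are arbitrary.

```python
import numpy as np

def hilbert_periodic(samples):
    # Hilbert transform of one period of a periodic signal sampled at equally
    # spaced points, with this post's sign convention (sin -> cos, cos -> -sin):
    # multiply the coefficient at integer frequency k by i*sign(k); DC -> 0.
    N = len(samples)
    k = np.fft.fftfreq(N, d=1.0 / N)          # integer frequencies 0, 1, ..., -1
    return np.fft.ifft(1j * np.sign(k) * np.fft.fft(samples)).real

rng = np.random.default_rng(0)
a, b = rng.standard_normal(5), rng.standard_normal(5)   # a_n, b_n for n = 1..5
n = np.arange(1, 6)[:, None]
t = np.linspace(0, 2 * np.pi, 256, endpoint=False)

f = (a[:, None] * np.sin(n * t) + b[:, None] * np.cos(n * t)).sum(axis=0)
fH = (-b[:, None] * np.sin(n * t) + a[:, None] * np.cos(n * t)).sum(axis=0)
print(np.max(np.abs(hilbert_periodic(f) - fH)))
```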

Notice that there’s no b0 term above. In signal processing terminology, there’s no DC offset. In general a Fourier series has a constant term, and the Hilbert transform of a constant is 0. So again like differentiation, constants go away.

If there is no DC offset, then applying the Hilbert transform to f twice gives −f. If there is a DC offset, applying the Hilbert transform to f twice gives −f with the DC offset removed.

Opposite sign convention

Unfortunately there are two definitions of the Hilbert transform in common use: the one at the top of this post and its negative. What changes if we use the other convention?

We noted above that applying the Hilbert transform to f twice gives −f. This means that the inverse transform is the negative transform [3]. In symbols, if H is the Hilbert transform operator, then H²f = −f, so −H(Hf) = f and H⁻¹ = −H. So the disagreement over whether to include a negative sign in the definition of the Hilbert transform amounts to a disagreement over which to call the forward transform and which to call the inverse transform.

The shifting and scaling rules apply to both definitions of the transform. But with the opposite sign convention, the Hilbert transform of sine is negative cosine and the Hilbert transform of cosine is sine. So our bottom line becomes “replace the a‘s with b‘s and the b‘s with −a‘s” in the Fourier series.

Footnotes

[1] Incidentally, the Hilbert transform commutes with differentiation. That is, the transform of the derivative is the derivative of the transform.

[2] This is an example of parallel replacement. We replace aₙ with −bₙ and bₙ with aₙ at the same time.

[3] For signals with no DC offset. Otherwise the Hilbert transform is not invertible.

Reverse engineering Fourier conventions

The most annoying thing about Fourier analysis is that there are many slightly different definitions of the Fourier transform. One time I got sufficiently annoyed that I created a sort of Rosetta Stone of Fourier theorems under eight different conventions. Later I discovered that Mathematica supports these same eight definitions, but with slightly different notation.

When I created my Rosetta Stone I wanted to have a set of notes that answered the question “What are the basic Fourier theorems under this convention?” Recently I was reading a reference and wanted to answer the opposite question “Given the theorems this book is stating, what convention must they be using?”

The eight definitions correspond to

\hat{f}(\omega) = \frac{1}{\sqrt{m}} \int_{-\infty}^\infty \exp(\sigma q i \omega x) f(x) \,dx

where m is either 1 or 2π, σ is +1 or −1, and q is 2π or 1.

I’m posting these notes for my future reference and for anyone else who may need to do the same sleuthing.

Notation

For the rest of the post, let F and G be the Fourier transforms of f and g respectively. We write

f(x) \leftrightarrow F(\omega)

for the pair of a function and its Fourier transform.

Define the inner product of f and g as

\langle f, g \rangle = \int_{-\infty}^\infty f(x)\, g(x) \, dx

if f and g are real-valued. If the functions are complex-valued, replace g with its complex conjugate.

We will sometimes denote the Fourier transform of a function by putting a hat on top of it.

Convolution theorem

The convolution theorem gives a quick way to determine the parameter m. Suppose convolution is defined by

(f*g)(x) = \int_{-\infty}^\infty f(x-y)\, g(y) \, dy

Then

f(x)*g(x) \leftrightarrow \sqrt{m}\, F(\omega)\, G(\omega)

and so you can find m immediately. If the transform of f*g is FG with no extra factor out front, m = 1. Otherwise if there’s a factor of √(2π) out front, then m = 2π. If there’s any other factor, you’ve got an arcane definition of Fourier transform that isn’t one of the eight considered here.
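The factor √m comes from a short calculation. Start from the definition of the transform above and substitute x = u + y, assuming f and g are well behaved enough to change the order of integration. The exponential then splits into a product:

\begin{align*}
\widehat{f*g}(\omega) &= \frac{1}{\sqrt{m}} \int_{-\infty}^\infty \int_{-\infty}^\infty \exp(\sigma q i \omega x)\, f(x-y)\, g(y) \, dy\, dx \\
&= \frac{1}{\sqrt{m}} \int_{-\infty}^\infty \exp(\sigma q i \omega u)\, f(u)\, du \int_{-\infty}^\infty \exp(\sigma q i \omega y)\, g(y)\, dy \\
&= \frac{1}{\sqrt{m}} \left(\sqrt{m}\, F(\omega)\right) \left(\sqrt{m}\, G(\omega)\right) = \sqrt{m}\, F(\omega)\, G(\omega)
\end{align*}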

Some authors, like Walter Rudin, include a scaling factor in the definition of convolution, in which case the argument of this section doesn’t hold.

Parseval and Plancherel

Parseval’s theorem says that the inner product of f and g is proportional to the inner product of F and G. The proportionality constant depends on the definition of Fourier transform, specifically on m and q, and so you can determine m or q from the form of Parseval’s theorem.

\langle f, g \rangle = k \langle F, G \rangle

If k = 1, then either q = 2π and m = 1 or q = 1 and m = 2π. If you know m, say from the statement of the convolution theorem, then Parseval’s theorem tells you q.

Plancherel’s theorem is the special case of Parseval’s theorem with f = g. It can be used the same way to solve for m or q.

In general k = mq/(2π). So if k = 2π, then q = m = 2π, and if k = 1/(2π), then q = m = 1.
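These constants can be checked numerically. The sketch below uses NumPy and Riemann sums, assuming a grid wide and fine enough that a shifted Gaussian and its transforms are negligible at the ends; that grid is a choice made here, not something from the post. It computes the Parseval constant k for each of the four (m, q) combinations. It also computes the constant in the double-transform identity f̂̂(x) = k f(−x), covered in the next section.

```python
import numpy as np

x = np.linspace(-12, 12, 961)   # same grid used for x and for omega
h = x[1] - x[0]

def transform(values, m, q, sigma=1):
    # (1/sqrt(m)) * integral exp(sigma q i omega x) f(x) dx at each omega on the grid
    kernel = np.exp(sigma * q * 1j * np.outer(x, x))
    return kernel @ values * h / np.sqrt(m)

f = np.exp(-(x - 1)**2 / 2)            # shifted Gaussian, so f(-x) differs from f(x)
i_half = np.argmin(np.abs(x - 0.5))    # grid index of x = 0.5

for m in (1, 2 * np.pi):
    for q in (1, 2 * np.pi):
        F = transform(f, m, q)
        k_parseval = np.sum(f * f) / np.sum(np.abs(F)**2)        # <f,f> = k <F,F>
        k_double = transform(F, m, q)[i_half] / f[::-1][i_half]  # f^^(x) = k f(-x)
        print(f"m={m:.4f} q={q:.4f} Parseval k={k_parseval:.6f} "
              f"double-transform k={k_double.real:.6f} mq/(2pi)={m*q/(2*np.pi):.6f}")
```

If the grid is adequate, the Parseval constant should come out as mq/(2π) and the double-transform constant as its reciprocal.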

Double transform

Theorems about taking the Fourier transform twice carry the same information as Parseval’s and Plancherel’s theorems, i.e. they also let you determine m or q.

We have

\hat{\hat{f}}(x) = k f(-x)

but here k = 2π/(mq), whereas the Parseval constant is mq/(2π). So the k = 1 case reads the same, while the other two cases are swapped:

  • If k = 1, then either q = 2π and m = 1 or q = 1 and m = 2π.
  • If k = 2π, then q = m = 1.
  • If k = 1/(2π), then q = m = 2π.

Shift and differentiation

So far none of our theorems have allowed us to reverse engineer σ. Either the shift or the differentiation theorem will let us find σq.

The shift theorem says

f(x-h) \leftrightarrow \exp(ikh\omega) F(\omega)

where k = σq. Since σ = ±1 and q = 1 or 2π, the product σq determines both σ and q.

Similarly, the differentiation theorem says the derivative of f transforms as

f'(x) \leftrightarrow -ik\omega F(\omega)

and again k = σq.
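Here is a quick numerical check of the shift and differentiation rules. Note the minus sign in the derivative rule: integrating by parts gives f′(x) ↔ −ikωF(ω) with k = σq. This sketch uses a Gaussian test function, Riemann-sum quadrature, and two of the eight conventions: (m, q, σ) = (1, 1, −1) and (2π, 2π, +1). The shift 0.7 and frequency ω = 0.3 are arbitrary.

```python
import numpy as np

x = np.linspace(-12, 12, 961)
dx = x[1] - x[0]

def transform(fx, omega, m, q, sigma):
    # (1/sqrt(m)) * integral of exp(sigma q i omega x) f(x) dx, Riemann sum
    return np.sum(np.exp(sigma * q * 1j * omega * x) * fx) * dx / np.sqrt(m)

f = lambda t: np.exp(-t**2 / 2)
df = lambda t: -t * np.exp(-t**2 / 2)          # derivative of f

omega, shift = 0.3, 0.7
for m, q, sigma in [(1, 1, -1), (2 * np.pi, 2 * np.pi, 1)]:
    k = sigma * q
    F = transform(f(x), omega, m, q, sigma)
    shift_error = transform(f(x - shift), omega, m, q, sigma) - np.exp(1j * k * shift * omega) * F
    deriv_error = transform(df(x), omega, m, q, sigma) - (-1j * k * omega) * F
    print(abs(shift_error), abs(deriv_error))
```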

Chaos in the frequency domain

Solutions to the non-linear differential equation

x'' + 0.25x' + x(x^2 - 1) = 0.3\cos t

are chaotic. It’s more common to see plots of chaotic systems in the time domain, so I wanted to write a post looking at the power spectrum in the frequency domain.

The following plot was created by solving the equation above over the time interval [0, 256] at 1024 points, i.e. sampling the solution at 4 Hz. I then took the FFT and multiplied it by its conjugate to get the power spectrum. Then I took the log base 10 and multiplied by 10 to convert to decibels.
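Here is a rough way to reproduce this kind of computation using only NumPy. The post doesn't say which solver or initial condition it used, so both are assumptions here: a fixed-step fourth-order Runge-Kutta integrator and x(0) = 1, x′(0) = 0. Samples are taken every 0.25 time units (4 Hz) starting at t = 0. Because the system is chaotic, the details of the nonlinear spectrum will depend on those choices.

```python
import numpy as np

def rhs(t, y, nonlinear):
    # y = (x, x'); the post's equation if nonlinear, else the linear comparison
    x, v = y
    restoring = x * (x**2 - 1) if nonlinear else x
    return np.array([v, 0.3 * np.cos(t) - 0.25 * v - restoring])

def solve(nonlinear, n_samples=1024, dt=0.25, substeps=10, y0=(1.0, 0.0)):
    # Fixed-step RK4 (assumed solver) from an assumed initial condition y0,
    # recording x every dt time units, i.e. at 1/dt = 4 Hz.
    h = dt / substeps
    y = np.array(y0, dtype=float)
    t = 0.0
    times = np.arange(n_samples) * dt
    samples = np.empty(n_samples)
    for i in range(n_samples):
        samples[i] = y[0]
        for _ in range(substeps):
            k1 = rhs(t, y, nonlinear)
            k2 = rhs(t + h / 2, y + h / 2 * k1, nonlinear)
            k3 = rhs(t + h / 2, y + h / 2 * k2, nonlinear)
            k4 = rhs(t + h, y + h * k3, nonlinear)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
    return times, samples

def power_spectrum_db(samples):
    X = np.fft.fft(samples)
    power = (X * np.conj(X)).real      # FFT times its conjugate
    return 10 * np.log10(power)        # log base 10, times 10: decibels

times, xs = solve(nonlinear=True)
freqs = np.fft.fftfreq(len(xs), d=0.25)    # frequencies in Hz
spectrum = power_spectrum_db(xs)
```

Calling solve(nonlinear=False) gives the linear equation discussed next. Plotting spectrum against the positive entries of freqs gives a decibel plot like the one described.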

By contrast, if we look at the linear equation

x'' + 0.25x' + x = 0.3\cos t

and compute the power spectrum, we get

As is often the case, a small change to the form of a differential equation made a huge change in its behavior.

There’s a spike at 1/(2π) Hz because the steady-state solution is

x(t) = 1.2 sin(t).
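To see where the amplitude and phase come from, try x(t) = A cos t + B sin t in the linear equation. Then x″ = −x, so x″ + x vanishes and only the damping term has to match the forcing:

0.25\,x'(t) = 0.25\,(-A \sin t + B \cos t) = 0.3 \cos t

So A = 0 and B = 0.3/0.25 = 1.2. The forcing has angular frequency 1 radian per unit time, i.e. 1/(2π) ≈ 0.159 Hz, which is where the spike appears.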

The power spectrum is more than just a spike because there is also an exponentially decaying transient component to the solution.

For more on steady-state and transient components of the solution, see Damped, driven oscillations.
