Numerically finding roots of a cubic

The analog of the quadratic formula for cubic equations is cumbersome. A lot of people naturally say “Forget all that. If I need to find the roots of a cubic, I’ll just use a numerical method like Newton’s method.” Sounds good.

Where to start?

But how do you know where to look for the roots? You have to have some idea where a root is before you can use a numerical method.

For the bisection method, you have to know an interval where the polynomial changes sign. For Newton’s method, if you don’t start close enough, you diverge. In fact, you can make complex fractals by plotting the points where Newton’s method converges and diverges.

Cornelius Lanczos gives an elegant explanation of how to solve cubic equations at the beginning of his book Applied Analysis. His book was published in 1956, before the days of ubiquitous computers, and so he couldn’t say “Plot the function and scroll around until you can see roughly where the zeros are.”

Lanczos does a sequence of change of variables that reduce the problem to finding a unique root in the unit interval.



f(x) = ax³ + bx² + cx + d

where the coefficients are all real numbers. We want to find all real values of x where f(x) = 0.

The main task is to find a root of f(x). Once we’ve found a root r, we can divide by (xr) to reduce the problem to a quadratic.

Reduce to standard form

By a change of variables we can reduce the problem to finding a root of

x³ + bx² + cx  – 1 = 0.

How? First of all, you can always make the leading coefficient 1 by dividing by a if necessary. Second, you can make the constant term negative by replacing x by a new variable equal to –x. Third, you can make the constant term -1 by replacing x with d-1/3x.

We’ve divided by a and d above. How do we know they’re not zero? If they were zero, that would be good news because it would make things easier.

We assume a is not zero because otherwise we don’t have a cubic equation. And we assume d is not zero because otherwise we can factor out an x and we’re left with a quadratic equation.

Bracketing a root

After our changes of variable,

f(x) = x³ + bx² + cx  – 1.

So f(0) = -1 and f(x) goes to infinity as x goes to infinity, and this means f(x) must have at least one positive root.

Evaluate f(1). If f(1) = 0 then you’re very lucky because you’ve found your root.

If f(1) > 0 then there is a root in (0, 1) since the polynomial changes signs over the interval.

If f(1) < 0 then the function has a root in (1,∞) . In that case, replace x with 1/x and then the new polynomial has a root somewhere in (0, 1).


If f(x) equals zero somewhere in the interval (0, 1), could it be zero at two or three places in the interval?

It cannot have just two roots in the interval because the sign is different at the two ends of the interval.

It also cannot have three roots in the interval. We know from Vieta’s formula that the product of the roots of f equals 1. It cannot have three roots in the interval (0, 1) because the product of three positive numbers less than 1 is less than 1.

If f(x) had a zero somewhere in (1,∞) before replacing x with 1/x, an analogous argument applies. It can’t have two zeros because it changes signs over the interval, and it can’t have three zeros in the interval because if three numbers are each greater than 1, their product is greater than 1.

In either case, possibly after replacing x with 1/x, we know f has a unique root in the interval (0, 1), and now we can apply a numerical method.

More general functions

This post looked at a very special function, a cubic polynomial with real coefficients, and showed that after a change of variables it has a unique root in an interval. The next post shows how to determine how many zeros a more general, complex-valued function has in a region of the complex plane.

Related posts

Numerical footnote

Yesterday’s post said that that you could construct a chain of linear relationships between the hypergeometric function

F(a, b; c; z)


F(a+i, b+j; c+k; z)

for integers i, j, and k. Toward the end of the post I said that this could be used to speed up programs by computing function values from previous values rather than from scratch. This is true, but you need to check whether there are numerical issues.

You have to be careful when using recurrence relations numerically. It’s common for a recurrence to be numerically stable in one direction but unstable in the opposite direction. I wrote about this here and gave examples.

The stability problems for recurrence relations are the discrete analog of the more familiar problem of sensitive dependence on initial conditions for differential equations described here.


Quasi-Monte Carlo integration: periodic and nonperiodic

Monte Carlo integration, or “integration by darts,” is a general method for evaluating high-dimensional integrals. Unfortunately it’s slow. This motivated the search for methods that are like Monte Carlo integration but that converge faster.

Quasi-Monte Carlo (QMC) methods use low discrepancy sequences that explore a space more efficiently than random samples. These sequences are deterministic but in some ways behave like random sequences, hence the name.

On the right problem, QMC integration can be very efficient. On other problems QMC can be mediocre. This post will give an example of each.

Computing ζ(3)

The first example comes from computing the sum

\sum_{n=1}^\infty \frac{1}{n^3}

which equals ζ(3), the Riemann zeta function evaluated at 3.

A few days ago I wrote about ways to compute ζ(3). The most direct way of computing the sum is slow. There are sophisticated ways of computing the sum that are much faster.

There is an integral representation of ζ(3), and so one way to numerically compute ζ(3) would be to numerically compute an integral.

\int_0^1\!\int_0^1\!\int_0^1 \frac{1}{1 + xyz}\, dx\, dy\, dz = \dfrac{3}{4}\, \zeta(3)

Let’s see how well quasi-Monte Carlo integration works on this integral. We will use a lattice rule, evaluating our integrand at the points of a quasi-Monte Carlo sequence

\left( \{ n\sqrt{2}\}, \{ n\sqrt{3}\}, \{ n\sqrt{5}\} \right )

and averaging the results. (Lattice sequences are a special case of quasi-Monte Carlo sequences.)

The notation {x} denotes the fractional part of x. For example, {π} = 0.14159….

Here’s some Python code to compute the integral.

    import numpy as np

    def f(x): return 1/(1 + x[0]*x[1]*x[2])

    v = np.sqrt(np.array([2,3,5])) 
    def g(N): return sum( f(n*v % 1) for n in range(1,N+1) ) / N

And here’s a plot that shows how well this does as a function of N.

The results are mediocre. The averages are converging to the right value as N increases, but the accuracy isn’t great. When N = 100 the error is 0.001, larger than the error from summing 100 terms of the sum defining ζ(3).

I want to make two points about this example before moving on to the next example.

First, this doesn’t mean that evaluating ζ(3) by computing an integral is necessarily a bad idea; a more efficient integration method might compute ζ(3) fairly efficiently.

Second, this doesn’t mean that QMC is a poor integration method, only that there are better ways to compute this particular integral. QMC is used a lot in practice, especially in financial applications.

Periodic integrands

The lattice rule above works better on periodic integrands. Of course in practice a lot of integrands are not periodic. But if you have an efficient way to evaluate periodic integrands, you can do a change of variables to turn your integration problem into one involving a periodic integrand.

We’ll use the same QMC sequence as above with the integrand

f(x, y, z) = x (1-x) sin πy

which has integral 1/3π over the unit cube. Here’s a plot of the accuracy as the number of integration points N increases.

This is much better than before. The error is 0.0007 when N = 100. As N increases, the error goes down more rapidly than with the example above. When computing ζ(3), the error is about 0.001, whether N = 100 or 1,000. In this example, when N = 1,000 the error drops to 0.00003.

This example is periodic in x, y, and z, but the periodic extension is only continuous. The extension has a cusp when x or y take on integer values, and so the derivative doesn’t exist at those points.

If we square our integrand, setting

f(x, y, z) = x² (1-x)² sin² πy

then the periodic extension is differentiable, and the integration method converges more quickly. The exact integral over the unit cube is 1/60.

Now our approximation error is about 7.3 ×10-5 when N = 100, 7.8 ×10-6 when N = 1,000, and 9.5 ×10-8 when N = 10,000.

Lattice rules work best on smooth, periodic integrands. If the integrand is smooth but not periodic and you want to use a lattice rule for integration, you’ll get better results if you first do a change of variables to create a smooth, periodic integrand.

Related posts

Error estimates for splines with more boundary conditions

Yesterday I wrote about rates of convergence for natural cubic splines. This brief post reports similar results for more boundary conditions.

As explained in the earlier post, a cubic spline that interpolates a function f at n+1 points satisfies 4n -2 equations in 4n variables. Two more equations are necessary to uniquely determine the interpolating cubic spline.

A natural cubic spline S over [a, b] satisfies

S”(a) = 0
S”(b) = 0.

There are other ways of coming up with two more equations. You might match first derivatives at the end points

S‘(a) = f‘(a)
S‘(b) = f‘(b)

or second derivatives

S”(a) = f”(a)
S”(b) = f”(b).

Or if f is periodic you could require first and second derivatives of the spline to match at a and b.

S‘(a) = S‘(b)
S”(a) = S”(b).

In each case, you get the same order of convergence as we reported earlier for the natural cubic spline, assuming f is 4 times continuously differentiable over [a, b]. The error goes down like O(1/n4).

Source: D. Kershaw. A Note on the Convergence of Interpolatory Cubic Splines. SIAM Journal on Numerical Analysis, Mar., 1971, Vol. 8, No. 1 pp. 67-74.


When we use natural cubic splines to fit exp(x) over [0, 2π] in the earlier post, the error was fairly large. We get better results when we specify the derivatives at the two ends of the interval. Here’s what we get for 11 points:

The error is about 7 times smaller than with natural cubic splines.

And here’s what we get for 21 points:

In this case the error is about 8x smaller than with natural cubic splines.

When we go from 11 to 21 points, specifying the derivatives on each end, the error drops by about a factor of 15, close value of 16 we’d expect in the limit.

Incidentally, to specify the boundary conditions of the spline in Python we need to add an option parameter bc_type in the call to CubicSpline. Otherwise the code is the same as before. In our example

    bc_type=((1, 1), (1, np.exp(2*np.pi)))

We set bc_type equal to a pair of pairs. The first pair gives a boundary condition on the left end and the second gives a boundary condition on the right. The first argument of the inner pair in both cases is 1, because we’re specifying 1st derivatives. Use a 2 to specify 2nd derivatives. The second argument to the pair the value of the derivative: exp(0) = 1 on the left, and exp(2π) on the right.

Rate of natural cubic spline convergence

Suppose you want to approximate a function with a polynomial by interpolating it at evenly spaced points. You might reasonably expect that the more points you use, the better the approximation will be. That might be true, but it might not. As explained here, for some functions the maximum approximation error actually increases as the number of points increases. This is called the Runge phenomenon.

What about cubic splines? Instead of fitting one high-degree polynomial, you fit a different cubic polynomial on each sub-interval, with the constraint that the pieces fit together smoothly. Specifically, the spline must be twice differentiable. This means that the first and second derivatives from both sides match at the end of each interval.

Say our function f is defined over an interval [a, b] and we break the interval into n sub-intervals. This means we interpolate f at n+1 evenly spaced points. We have 4n degrees of freedom: four polynomial coefficients over each interval. Interpolation adds 2n constraints: the value of the polynomial is specified at the ends of each sub-interval. Matching first and second derivatives at each of the n-1 interior nodes gives 2(n-1) constrains. This is a total of 4n – 2 equations for 4n unknowns.

We get the two more equations we need by specifying something about the derivatives at a and b. A natural cubic spline sets the second derivatives equal to zero at a and b.

So if we interpolate f at n+1 evenly spaced points using a natural cubic spline, does the splines converge uniformly to f as we increase n? Indeed they do. Nothing like Runge phenomenon can happen. If f is four times continuously differentiable over [ab], then convergence is uniform on every compact subinterval of (ab) and the rate of convergence is O(1/n4). If the second derivative of f is zero at a and b, then the rate of convergence is O(1/n4) over the whole interval [a, b].

Source: Kendall E. Atkinson. On the Order of Convergence of Natural Cubic Spline Interpolation. SIAM Journal on Numerical Analysis, Vol. 5, No. 1 (Mar., 1968), pp. 89-101.

Update: See this post for results for other kinds of cubic splines, i.e. cubic splines with different boundary conditions.


We will look at sin(x) and exp(x) on the interval [0, 2π]. Note that the second derivatives of former at zero at both ends of the interval, but this is not true for the latter.

We will look at the error when interpolating our functions at 11 points and 21 points (i.e. 10 sub-intervals and 20 sub-intervals) with natural cubic splines. For the sine function we expect the error to go down by about a factor of 16 everywhere. For the exponential function, we expect the error to go down by about a factor of 16 in the interior of [0, 2π]. We’ll see what happens when we zoom in on the interval [2, 4].

We’re basing our expectations on a theorem about what happens as the number of nodes n goes to infinity, and yet we’re using fairly small values of n. So we shouldn’t expect too close of an agreement with theory, but we’ll see how close our prediction gets.


The following Python code will plot the error in the interpolation by natural cubic splines.

    import numpy as np
    from scipy.interpolate import CubicSpline
    import matplotlib.pyplot as plt

    for n in [11, 21]:
        for f in [np.sin, np.exp]:
            knots = np.linspace(0, 2*np.pi, n)
            cs = CubicSpline(knots, f(knots))
            x = np.linspace(0, 2*np.pi, 200)
            plt.plot(x, f(x) - cs(x))
            plt.title(f"Interpolating {f.__name__} at {n} points")

Notice an interesting feature of NumPy: we call the __name__ method on sin and exp to get their names as strings to use in the plot title and in the graphic file.

Here’s what we get for the sine function.

First for 10 intervals:

interpolation error for 11 points

Then for 20 intervals:

interpolation error for 11 points

When we doubled the number of intervals, the maximum error went down by about 30x, better than the theoretical upper bound on error.

Here’s what we get for the exponential function.

First for 10 intervals:

interpolation error for 11 points

Then for 20 intervals:

interpolation error for 11 points

The error in approximating the exponential function by splines is much greater than the error in approximating the sine function. That’s because the former acts less like a polynomial than the latter. Polynomial approximation, and piecewise polynomial approximation, works better on things that behave like polynomials.

When we doubled the number of intervals, the maximum error went down by a factor of 12. We shouldn’t be surprised that the error to go down by a factor of less than 16 since the hypotheses for the theorem that gives a factor of 16 aren’t satisfied.

Here’s what we get when we zoom in on the interval [2, 4].

For 10 subintervals:

interpolation error for 11 points on subinterval

And for 20 subintervals:

interpolation error for 21 points on subinterval

The error is much smaller over the interior interval. And when we double the number of interpolation points, the error over [2, 4] goes down by about a factor of 3 just as it did for sine.

Note that in these last two plots, we’re still interpolating over the entire interval [0, 2π], but we’re looking at the error over [2, 4]. We’re zooming in on part of the error plot, not allocating our interpolation points specifically to the smaller interval.

Related posts

Monotonic interpolation

Accuracy isn’t everything. Sometimes when you’re approximating a function you care more about matching a function’s qualitative behavior than matching its numerical values.

One example is interpolating monotonic data. Say your data show that increasing input always increases output. It’s likely you want a function that interpolates your data to have the same property and that you might sacrifice some accuracy to assure monotonicity. In a visual application, you might get a distracting wiggle in what should be a smoothly bending curve. In software used to control a physical device, it could lead to unexpected behavior, or even disaster, if an increase in input led to a decrease in output, even if it was a small decrease.

Linear interpolation preserves monotonicity: if in your data increasing x leads to increasing y, the same is true of a linear interpolation to the data. But for higher-order interpolation this isn’t the case. A simple example would be the three points (-1, 1), (2, 4), and (3, 9). The unique quadratic polynomial interpolating these three points is x², which is not monotonic in any interval containing 0. We’d like to have a smoother and more accurate [1] way to interpolate data while still preserving monotonicity.

It’s unlikely that a polynomial interpolating monotone data will be monotone. I suspect that given a set of monotonic data, it’s more likely that a polynomial spline interpolant will be monotone than a polynomial interpolant because the former is more flexible. Formulating this suspicion into a conjecture that could be settled theoretically, or even tested empirically, would be a non-trivial but interesting exercise.

A cubic spline interpolating monotone data is not necessarily monotone. However, there is a way to determine derivative conditions so that a Hermite spline fitting the data will be monotone. Fritsch and Carleson published an algorithm for this in 1980.

Related posts

[1] If the function sampled by your data is smooth, then in general a smooth approximation to the function will be more accurate than a piecewise linear approximation.

A gentle introduction to QMC and discrepancy

The previous post looked at the sequence

kn mod 1

where k = log2(3). This value of k arose out of exploring perfect fifths and octaves.

We saw that the sequence above fills in the unit interval fairly well, though not exactly regularly. The first 12 elements of the sequence nearly divide the unit interval into 12 nearly equal parts. And the first 53 elements of the sequence divide the interval into 53 nearly equal parts.

There’s something interesting going on here. We could set k = 1/12 and divide the unit interval into exactly 12 equal parts. But if we did, then the same sequence couldn’t also divide the interval in 53 equal parts. Our sequence fills in the unit interval fairly uniformly, more uniformly than a random sequence, but less evenly than a sequence designed to obtain a particular spacing.

Our sequence is delicately positioned somewhere between being completely regular and being completely irregular (random). The sequence is dense in the interval, meaning that it comes arbitrarily close to any point in the interval eventually. It could not do that if it were more regular. And yet it is more regular than a random sequence in a way that we’ll now make precise.


Let’s pick an interval and see how often the sequence lands in that interval and compare the results to a random sequence. For my example I chose [0.1, 0.2], but I’d get similar results with any other interval.

The horizontal axis is the length of the sequence and the vertical axis is the proportion of sequence elements that fall into the test interval.

Our sequence spends about 1/10 of its time in this interval of length 1/10. The same is true of a random sequence, eventually, and with high probability. Our sequence achieves the limiting behavior of a random sequence, but it achieves it more quickly, and it does so with certainty, not just with high probability.

Here’s another example just to show that our initial choice of interval didn’t matter. We use the interval

[π – 3, e – 2] = [0.141…, 0.718…]

and we get analogous results.

The interval has width 0.57, and our sequence spends 57% of its time in the interval.

Quasi-Monte Carlo and low discrepancy

Our sequence is an example of a Quasi-Monte Carlo (QMC) sequence. The name implies that the sequence is something like a random sequence, but better in some sense. QMC sequences are said to have low discrepancy, meaning that the difference (discrepancy) between the number of points that land in an interval and the expected number is low, lower than what you’d expect from a random (Monte Carlo) sequence.

To define discrepancy rigorously we get rid of our arbitrary choice of interval and take the supremum over all subintervals. Or we could define the star-discrepancy by taking the sumpremum over subintervals that must start with 0 on the left end.

Even lower discrepancy

It turns out we could do better with a different choice of k. For any irrational k the sequence

kn mod 1

has low discrepancy, i.e. lower than you’d expect from a random sequence, but the discrepancy is as small as possible when k is the golden ratio.


We’ve been looking at a QMC sequence in an interval, but QMC sequences in higher dimension are defined analogously. In any volume of space, the visits the space more uniformly than a random sequence.

QMC sequences are usually applied in higher dimensions, and the primary application is numerical integration. Quasi-Monte Carlo integration is more efficient than Monte Carlo integration, though the error is harder to estimate; QMC integration often does better in practice than pessimistic theoretical bounds predict. A hybrid of QMC and MC can retain the efficiency of QMC and a some of the error estimation of MC.

See more on QMC sequences are used in numerical integration.

Mathematical stability vs numerical stability

Is 0 a stable fixed point of f(x) = 4x (1-x)?

If you set x = 0 and compute f(x) you will get exactly 0. Apply f a thousand times and you’ll never get anything but zero.

But this does not mean 0 is a stable attractor, and in fact it is not stable.

It’s easy to get mathematical stability and numerical stability confused, because the latter often illustrates the former. In this post I point out that the points on a particular unstable orbit cannot be represented exactly in floating point numbers, so iterations that start at the floating  point representations of these points will drift away.

In the example here, 0 can be represented exactly, and so we do have a computationally stable fixed point. But 0 is not a stable fixed point in the mathematical sense. Any starting point close to 0 but not exactly 0 will eventually go all over the place.

If we start at some very small ε > 0, then f(ε) ≈ 4ε. Every iteration multiplies the result by approximately 4, and eventually the result is large enough that the approximation no longer holds.

For example, suppose we start with x = 10-100. After 100 iterations we have

x = 4100 10-100 = 1.6 × 10-40.

If we were to plot this, we wouldn’t see any movement. But by the time we iterate our function 200 times, the result is chaotic. I get the following:

Calculating square roots

Here’s a simple way to estimate the square root of a number x. Take a guess g at the root and compute the average of g and x/g.

If you want to compute square roots mentally or with pencil and paper, how accurate can you get with this method? Could you, for example, get within 1%?

Initial guess

Obviously the answer depends on your guess. One way to form an initial guess is to round x up to the nearest square and take the root of that as your guess. In symbols, we take ⌈√x⌉ as our guess.

For example, if x = 38, you could take 7 as your guess since 49 is the next square after 38. In that case you’d get

(7 + 38/7)/2 = 6.214

The correct value to three places is 6.164, so our error is about 0.8%.

But we could do better. 38 is much closer to 36 than to 49, so we could take 6 as our initial guess. That is, ⌊√38⌋. Then we have

(6 + 38/6) = 6.167

and our relative error is less than 0.04%.

We’ll look at three strategies:

  1. ⌊√x
  2. ⌈√x
  3. ⌊√x⌋ or ⌈√x

In strategy 3, we choose the guess whose square is closer to x.

Initial accuracy

Here’s what we get when we use the method above to estimate the square roots of the numbers from 1 to 100.

Guess 1 has maximum error 15.5%, where as guess 2 and guess 3 have maximum error 6%.

Guess 2, taking the ceiling, is generally better than guess 1, taking the floor. But guess 3, taking the better of the two, is even better.

More important than which of the three methods used to find the initial guess is being closer to the far end of the range. We could assume x is between 25 and 100 if we first multiply x by a square if necessary to get it into that range. To find the square root of 13, for example, we multiply by 4 and calculate the square root of 13 as half the square root of 52.

If we assume x is between 25 and 100, the maximum errors for the three initial guess methods are 1.4%, 1.3%, and 0.4%.

So to answer our initial question, yes, you can get relative error less than 1%, but you have to do a couple things. You have to multiply by a square to get your number into the range [25, 100] and you have to use the third method of guessing a starting approximation.

Initial and final error

You don’t have to move your number into the range [25, 100] if you put a little more effort into your initial guess. In the example of x = 13 mentioned above, multiplying by 4 before taking the square root is essentially the same as taking 7/2 as your initial guess for √13 instead of using 3 or 4 as an initial guess.

In short, our discussion of the range on x was really a discussion of initial error in disguise. The size of x determines the quality of our initial guesses, but the size of x itself doesn’t matter to the relative error.

Here’s a plot of the final error as a function of the error in the initial guess.

Notice two things. First, the final error is roughly proportional to the square of the error in the initial guess. Second, it’s a little better to guess high than to guess low, which explains why the second method above for guessing starting points is a little better than the first.


Another way to get more accuracy is to repeat our process. That is, after we find our estimate, we use it as a new guess and apply our method again. For example, suppose we want to compute the square root of 30. We would find

(5 + 30/5)/2 = 5.5
(5.5 + 30/5.5)/2 = 5.47727

which is correct to four decimal places

We can find square roots to any accuracy we wish by applying this method enough times. In fact, what we’re doing is applying a special case Newton’s root-finding method.

We showed above that the error after each iteration is on the order of the square of the previous error. So roughly speaking, each iteration doubles the number of correct decimal places.

Higher-order roots

See the next post for an analogous method for computing cube roots and nth roots in general.

Related posts

Computing Fourier series coefficients with the FFT

The Discrete Fourier Transform (DFT) is a mathematical function, and the Fast Fourier Transform (FFT) is an algorithm for computing that function. Since the DFT is almost always computed via the FFT, the distinction between the two is sometimes lost.

It is often not necessary to distinguish between the two. In my previous post, I used the two terms interchangeably because the distinction didn’t matter in that context.

Here I will make a distinction between the DFT and the FFT; I’ll use DFT to refer to the DFT as it is defined in [1], and FFT for the DFT as computed in NumPy‘s FFT routine. The differences are due to varying conventions; this is often an issue.

Suppose we have a function f on an interval [-A/2, A/2] and we sample f at N points where N is an even integer.


 \begin{align*} \Delta x &= A/N \\ x_n &= n \Delta x \\ f_n &= f(x_n) \end{align*}

for n running from –N/2 + 1 to N/2.

DFT of the sequence {fn} is defined in [1] as the sequence {Fk} where

F_k = \frac{1}{N}\sum_{n = -N/2 + 1}^{N/2} f_n \exp(-2 \pi i n k/N)

Now suppose the function f that we sampled has a Fourier series

f(x) = \sum_{k = -\infty}^\infty c_k \exp(2 \pi i k x/A)

The Fourier coefficients ck relate to the DFT output Fk according to the Discrete Poisson Summation Formula [1]:

F_k = c_k + \sum_{j=-\infty}^\infty \left(c_{k+jN} - c_{k-jN}\right)

This means that the ck simply equal the Fk if f has no frequency components higher than N/2 Hz because all the terms in the infinite sum above are zero. That is, if f is band-limited and we have sampled at a frequency higher than the Nyquist frequency, then we can simply read the Fourier coefficients off the DFT.

However, when f is not band-limited, or when it is band-limited but we have not sampled it finely enough, we will have aliasing. Our Fourier coefficients will differ from our DFT output by an error term involving high-frequency Fourier series coefficients.

In application it’s usually too much to hope that your function is exactly band-limited, but it may be approximately band-limited. The Fourier coefficients of smooth functions eventually decay rapidly (see details here) and so the error in approximating Fourier series coefficients by DFT terms is small.

Computing a DFT with the FFT

We defined the DFT of the sequence {fn} above to be the sequence {Fk} where

F_k = \frac{1}{N}\sum_{n = -N/2 + 1}^{N/2} f_n \exp(-2 \pi i n k/N)

and k runs from –N/2 + 1 to N/2.

NumPy, on the other hand, defines the DFT of the sequence {an} to be the sequence {Ak} where

A_k = \sum_{n = 0}^{N-1} a_n \exp(-2 \pi i n k/N)

and k runs from 0 to N-1.

Relative to the definition in the previous post, the NumPy definition difference in three ways:

  1. The normalization factor 1/N is missing.
  2. The input indices are numbered differently.
  3. The output is arranged differently.

The first difference is trivial to overcome: simply divide the FFT output by N.

The second difference is easy to deal with if we think of the inputs to the FFT being samples from a periodic function, which they usually are. The fk come from sampling a periodic function f over an interval [-A/2, A/2]. If we sample the same function over [0, A] we get the an. We have

\begin{align*} f(0) &= f_0 = a_0 \\ f(A/N) &= f_1 = a_1 \\ f(2A/N) &= f_2 = a_2 \\ \cdots \\ f(A) &= f_{N/2} = a_{N/2} \end{align*}

If we extend the fs and the as periodically past their original ranges of definition, then they all agree. But since we start our sampling in difference places, our samples would be listed in different orders if we stored them by increasing index.

Something similar occurs with the output.

For 0 ≤ kN/2,

Fk = Ak.

But for N < k < N,

Fk = ANk.


We’ll use a band-limited function in our example so that we find the Fourier coefficients exactly.

f(x) = 7 cos(6πx) – 5 sin(4πx)

We compute the FFT as follows.

    import numpy as np

    def f(x):
        return 7*np.sin(3*2*np.pi*x) - 5*np.cos(2*2*np.pi*x)

    N = 8
    delta = 1/N
    x = [f(n*delta) for n in range(N)]
    print( np.fft.fft(x)/N )

The output is

[0, 0, -2.5, -3.5i, 0, 3.5i, -2.5, 0]

This says F2 and F-2 = 5/2, and so the coefficient of cos(2·2πx) is 5.

Also F3 = -7/2 and F-3 = 7/2, so the coefficient of cos(3·2πx) is 7. These results follow from Euler’s equation

exp(iθ) = cos(θ) + i sin(θ)

[1] William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.