Mathematics of Deep Note

THX deepnote logo score

I just finished listening to the latest episode of Twenty Thousand Hertz, the story behind “Deep Note,” the THX logo sound.

There are a couple mathematical details of the sound that I’d like to explore here: random number generation, and especially Pythagorean tuning.

Random number generation

First is that part of the construction of the sound depended on a random number generator. The voices start in a random configuration and slowly reach the target D major chord at the end.

Apparently the random number generator was not seeded in a reproducible way. This was only mentioned toward the end of the show, and a teaser implies that they’ll go more into this in the next episode.

Pythagorean tuning

The other thing to mention is that the final chord is based on Pythagorean tuning, not the more familiar equal temperament.

The lowest note in the final chord is D1. (Here’s an explanation of musical pitch notation.) The other notes are D2, A2, D3, A3, D4, A4, D5, A5, D6, and F#6.


Octave frequencies are a ratio of 2:1, so if D1 is tuned to 36 Hz, then D2 is 72 Hz, D3 is 144 Hz, D4 is 288 Hz, D5 is 576 Hz, and D6 is 1152 Hz.


In Pythagorean tuning, fifths are in a ratio of 3:2. In equal temperament, a fifth is a ratio of 27/12 or 1.4983 [1], a little less than 3/2. So Pythagorean fifths are slightly bigger than equal temperament fifths. (I explain all this here.)

If D2 is 72 Hz, then A2 is 108 Hz. It follows that A3 would be 216 Hz, A4 would be 432 Hz (flatter than the famous A 440), and A5 would be 864 Hz.

Major thirds

The F#6 on top is the most interesting note. Pythagorean tuning is based on fifths being a ratio of 3:2, so how do you get the major third interval for the highest note? By going up by fifths 4 times from D4, i.e. D4 -> A4 -> E5 -> B5 -> F#6.

The frequency of F#6 would be 81/16 of the frequency of D4, or 1458 Hz.

The F#6 on top has a frequency 81/64 that of the D# below it. A Pythagorean major third is a ratio of 81/64 = 1.2656, whereas an equal temperament major third is f 24/12 or 1.2599 [2]. Pythagorean tuning makes more of a difference to thirds than it does to fifths.

A Pythagorean major third is sharper than a major third in equal temperament. Some describe Pythagorean major chords as brighter or sweeter than equal temperament chords. That the effect the composer was going for and why he chose Pythagorean tuning.


Then after specifying the exact pitches for each note, the composer actually changed the pitches of the highest voices a little to make the chord sound fuller. This makes the three voices on each of the highest notes sound like three voices, not just one voice. Also, the chord shimmers a little bit because the random effects from the beginning of Deep Note never completely stop, they are just diminished over time.

Related posts


[1] The exponent is 7/12 because a half step is 1/12 of an octave, and a fifth is 7 half steps.

[2] The exponent is 4/12 because a major third is 4 half steps.

Musical score above via THX Ltd on Twitter.

Stirling numbers, including negative arguments

Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there.

Stirling numbers of the second kind

The Stirling number of the second kind

S_2(n,k) = \left\{ {n \atop k} \right\}

counts the number of ways to partition a set of n elements into k non-empty subsets. The notation on the left is easier to use inline, and the subscript reminds us that we’re talking about Stirling numbers of the second kind. The notation on the right suggests that we’re dealing with something analogous to binomial coefficients, and the curly braces suggest this thing might have something to do with counting sets.

Since the nth Bell number counts the total number of ways to partition a set of n elements into any number of non-empty subsets, we have

B_n = \sum_{k=1}^n \left\{ {n \atop k}\right\}

Another connection to Bell is that S2(n, k) is the sum of the coefficients in the partial exponential Bell polynomial Bn, k.

Stirling numbers of the first kind

The Stirling numbers of the first kind

S_1(n,k) = \left[ {n \atop k} \right]

count how many ways to partition a set into cycles rather than subsets. A cycle is a sort of ordered subset. The order of elements matters, but only a circular way. A cycle of size k is a way to place k items evenly around a circle, where two cycles are considered the same if you can rotate one into the other. So, for example, [1, 2, 3] and [2, 3, 1] represent the same cycle, but [1, 2, 3] and [1, 3, 2] represent different cycles.

Since a set with at least three elements can be arranged into multiple cycles, Stirling numbers of the first kind are greater than or equal to Stirling numbers of the second kind, given the same arguments.

Addition identities

We started out by saying Stirling numbers were like binomial coefficients, and here we show that they satisfy addition identities similar to binomial coefficients.

For binomial coefficients we have

{n \choose k} = {n - 1 \choose k} + {n-1 \choose k-1}.

To see this, imagine we start with the set of the numbers 1 through n. How many ways can we select a subset of k items? We have selections that exclude the number 1 and selections that include the number 1. These correspond to the two terms on the right side of the identity.

The analogous identities for Stirling numbers are

\left\{{n \atop k} \right\}= k \left\{ {n-1 \atop k} \right\} + \left\{ {n-1 \atop k-1} \right\}

\left[ {n \atop k} \right]= (n-1) \left[ {n-1 \atop k} \right] + \left[ {n-1 \atop k-1} \right]
The combinatorial proofs of these identities are similar to the argument above for binomial coefficients. If you want to partition the numbers 1 through n into k subsets (or cycles), either 1 is in a subset (cycle) by itself or not.

More general arguments

Everything above has implicitly assumed n and k were positive, or at least non-negative, numbers. Let’s look first at how we might handle zero arguments, then negative arguments.

It works out best if we define S1(0, k) and S2(0, k) to be 1 if k is 0 and zero otherwise. Also we define S1(n, 0) and S2(n, 0) to be 1 if n is 0 and zero otherwise. (See Concrete Mathematics.)

When either n or k is zero the combinatoric interpretation is strained. If we let either be negative, there is no combinatorial interpretation, though it can still be useful to consider negative arguments, much like one does with binomial coefficients.

We can take the addition theorems above, which are theorems for non-negative arguments, and treat them as definitions for negative arguments. When we do, we get the following beautiful dual relationship between Stirling numbers of the first and second kinds:

\left\{{n \atop k} \right\}= \left[ {-k \atop -n} \right]

Related posts

Tetrahedral numbers

Start with the sequence of positive integers:

1, 2, 3, 4, …

Now take partial sums, the nth term of the new series being the sum of the first n terms of the previous series. This gives us the triangular numbers, so called because they count the number of coins at each level of a triangular arrangement:

1, 3, 6, 10, …

If we repeat this again we get the tetrahedral numbers, the number of balls on each level of a tetrahedral stack of balls.

1, 4, 10, 20, …

We can repeat this process and general define Tn, d, the nth tetrahedral number of dimension d, recursively. We define Tn, 1 = n and for d > 1,

T(n, d) = \sum_{i=1}^n T(i, d-1)

This is just a formalization of the discussion above.

It turns out there’s a simple expression for tetrahedral number of all dimensions:

T(n, d) = \binom{n+d-1}{d}

Here’s a quick proof by induction. The theorem clearly holds when n = 1 or d = 1. Now assume it hold for all n < m and d < m.

\begin{eqnarray*} T(n, m) &=& \sum_{i=1}^n T(n, m-1) \\ &=& T(n, m-1) + \sum_{i=1}^{n-1} T(n, m-1) \\ &=& T(n, m-1) + T(n-1, m) \\ &=& \binom{n+m-2}{m-1} + \binom{n+m-2}{m} \\ &=& \binom{n+m-1}{m} \end{eqnarray*}

The last line follows from the binomial addition identity

\binom{a}{b} = \binom{a-1}{b} + \binom{a-1}{b-1}

It also turns out that Tn, d is the number of ways to select d things from a set of n with replacement.

Related posts:

Bell numbers

The nth Bell number is the number of ways to partition a set of n labeled items. It’s also equal to the following sum.

B_n = \frac{1}{e}\sum_{k=0}^\infty \frac{k^n}{k!}

You may have to look at that sum twice to see it correctly. It looks a lot like the sum for en except the roles of k and n are reversed in the numerator.

The sum reminded me of the equation

\frac{d}{de}e^x = x e^{x-1}

that I blogged about a few weeks ago. It’s correct, but you may have to stare at it a little while to see why.

Incidentally, the nth Bell number is the nth moment of a Poisson random variable with mean 1.

There’s also a connection with Bell polynomials. The nth Bell number is the sum of the coefficients of the nth complete Bell polynomial. The nth Bell polynomial is sum over k of the partial exponential Bell polynomials Bn,k. The partial (exponential) Bell polynomials are defined here.

Computing Bell numbers

You can compute Bell numbers in Python with sympy.bell and in Mathematical with BellB. You can compute Bell numbers by hand, or write your own function in a language that doesn’t provide one, with the recursion relation B0 = 1 and

B_n = \sum_{k=0}^{n-1}{ n-1 \choose k}B_k

for n > 0. Bell numbers grow very quickly, faster than exponential, so you’ll need extended precision integers if you want to compute very many Bell numbers.

For theoretical purposes, it is sometimes helpful to work with the exponential generating function of the Bell numbers. The nth Bell number is n! times the coefficient of xn in exp(exp(x) – 1).

\sum_{n=0}^\infty \frac{B_n}{n!} x^n = \exp(\exp(x) -1)

An impractical but amusing way to compute Bell numbers would be via simulation, finding the expected value of nth powers of random draws from a Poisson distribution with mean 1.

Central limit theorem and Runge phenomena

I was playing around with something this afternoon and stumbled on something like Gibbs phenomena or Runge phenomena for the Central Limit Theorem.

The first place most people encounter Gibbs phenomena is in Fourier series for a step function. The Fourier series develops “bat ears” near the discontinuity. Here’s an example I blogged about before not with Fourier series but with analogous Chebyshev series.

Gibbs phenomena for Chebyshev interpolation

The series converges rapidly in the middle of the flat parts, but under-shoots and over-shoots near the jumps in the step function.

Runge phenomena is similar, where interpolating functions under- and over-shoot the function they’re approximating.

Runge example

Both plots above come from this post.

Here’s the example I ran across with the central limit theorem. The distribution of the average of a set of exponential random variables converges to the distribution of a normal random variable. The nice thing about the exponential distribution is that the averages have a familiar distribution: a gamma distribution. If each exponential has mean 1, the average has a gamma distribution with shape N and scale 1/N. The central limit theorem says this is converging in distribution to a normal distribution with the same mean and variance.

The plot below shows the difference between the density function of the average of N exponential random variables and the density function for its normal approximation, for N = 10 and for N = 400.

Notice that the orange line, corresponding to N = 400, is very flat, most of the time. That is, the normal approximation fits very well. But there’s this spike in the middle, something reminiscent of Gibbs phenomena or Runge phenomena. Going from 10 to 400 samples the average error decreases quite a bit, but the maximum error doesn’t go down much at all.

If you go back and look at the Central Limit Theorem, or its more quantitative counterpart the Berry-Esseen theorem, you’ll notice that it applies to the distribution function, not the density, or in other words, the CDF, not the PDF. I think the density functions do converge in this case because the exponential function has a smooth density, but the rate of convergence depends on the norm. It looks like the convergence is fast in square (L²) norm, but slow in sup norm. A little experiment shows that this is indeed the case.

Maybe the max norm doesn’t converge at all, i.e. the densities don’t converge pointwise. It looks like the max norm may be headed toward a horizontal asymptote, just like Gibbs phenomena.

Update: It seems we do not have uniform convergence. If we let N = 1,000,000, the sup norm of the error 0.1836. It appears the sup norm of the error is approaching a lower bound of approximately this value.

Related posts

Calendars and continued fractions

Calendars are based on three frequencies: the rotation of the Earth on its axis, the rotation of the moon around the Earth, and the rotation of the Earth around the sun. Calendars are complicated because none of these periods is a simple multiple of any other. The ratios are certainly not integers, but they’re not even fractions with a small denominator.

As I wrote about the other day in the context of rational approximations for π, the most economical rational approximations of an irrational number come from convergents of continued fractions. The book Calendrical Calculations applies continued fractions to various kinds of calendars.

Ratio of years to days

The continued fraction for the number of days in a year is as follows.

365.2424177 = 365 + \cfrac{1}{4 + \cfrac{1}{7+ \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{5 + \ldots }}}}}}

This means that the best approximations for the length of a year are 365 days plus a fraction of 1/4, or 7/29, or 8/33, or 23/95, etc.

You could have one leap day every four years. More accurate would be 7 leap days every 29 years, etc. The Gregorian calendar has 97 leap days every 400 years.

Ratio of years to months

If we express the ratio of the length of the year to the length of the month, we get

\frac{365.24244}{29.53059} = 12 +\cfrac{1}{2 + \cfrac{1}{1+ \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{18 + \cfrac{1}{3 + \ldots }}}}}}}

By taking various convergents we get 37/3, 99/8, 136/11, etc. So if you want to design successively more accurate lunisolar calendars, you’d have 37 lunar months every 3 years, 99 lunar months every 8 years, etc.

Related posts

Combinatorics, just beyond the basics

Most basic combinatorial problems can be solved in terms of multiplication, permutations, and combinations. The next step beyond the basics, in my experience, is counting selections with replacement. Often when I run into a problem that is not quite transparent, it boils down to this.

Examples of selection with replacement

Here are three problems that reduce to counting selections with replacement.

Looking ahead in an experiment

For example, suppose you’re running an experiment in which you randomize to n different treatments and you want to know how many ways the next k subjects can be assigned to treatments. So if you had treatments A, B, and C, and five subjects, you could assign all five to A, or four to A and one to B, etc. for a total of 21 possibilities. Your choosing 5 things from a set of 3, with replacement because you can (and will) assign the same treatment more than once.

Partial derivatives

For another example, if you’re taking the kth order partial derivatives of a function of n variables, you’re choosing k things (variables to differentiate with respect to) from a set of n (the variables). Equality of mixed partials for smooth functions says that all that matters is how many times you differentiate with respect to each variable. Order doesn’t matter: differentiating with respect to x and then with respect to y gives the same result as taking the derivatives in the opposite order, as long as the function you’re differentiating has enough derivatives. I wrote about this example here.

Sums over even terms

I recently had an expression come up that required summing over n distinct variables, each with a non-negative even value, and summing to 2k. How many ways can that be done? As many as dividing all the variable values in half and asking that they sum to k. Here the thing being chosen the variable, and since the indexes sum to k, I have a total of k choices to make, with replacement since I can chose a variable more than once. So again I’m choosing k things with replacement from a set of size n.


I wrote up a set of notes on sampling with replacement that I won’t duplicate here, but in a nutshell:

\left( {n \choose k} \right) = {n + k - 1 \choose k}

The symbol on the left is Richard Stanley’s suggested notation for the number of ways to select k things with replacement from a set of size n. It turns out that this equals the expression on the right side. The derivation isn’t too long, but it’s not completely obvious either. See the aforementioned notes for details.

I started by saying basic combinatorial problems can be reduced to multiplication, permutations, and combinations. Sampling with replacement can be reduced to a combination, the right side of the equation above, but with non-obvious arguments. Hence the need to introduce a new symbol, the one on the right, that maps directly to problem statements.

Enumerating possibilities

Sometimes you just need to count how many ways one can select k things with replacement from a set of size n. But often you need to actually enumerate the possibilities, say to loop over them in software.

In conducting a clinical trial, you may want to enumerate all the ways pending data could turn out. If you’re going to act the same way, no matter how the pending data work out, there’s no need to wait for the missing data before proceeding. This is something I did many times when I worked at MD Anderson Cancer Center.

When you evaluate a multivariate Taylor series, you need to carry out a sum that has a term for corresponding to each partial derivative. You could naively sum over all possible derivatives, not taking into account equality of mixed partials, but that could be very inefficient, as described here.

The example above summing over even partitions of an even number also comes out of a problem with multivariate Taylor series, one in which odd terms drop out by symmetry.

To find algorithms for enumerating selections with replacement, see Knuth’s TAOCP, volume 4, Fascicle 3.

Related posts

10 best rational approximations for pi

It’s easy to create rational approximations for π. Every time you write down π to a few decimal places, that’s a rational approximation. For example, 3.14 = 314/100. But that’s not the best approximation.

Think of the denominator of your fraction as something you have to buy. If you have enough budget to buy a three-digit denominator, then you’re better off buying 355/113 rather than 314/100 because the former is more accurate. The approximation 355/113 is accurate to 6 decimal places, whereas 314/100 is only accurate to 2 decimal places.

There’s a way to find the most economical rational approximations to π, or any other irrational number, using continued fractions. If you’re interested in details, see the links below.

Here are the 10 best rational approximations to π, found via continued fractions.

    | Fraction       | Decimals |
    | 3              |      0.8 |
    | 22/7           |      2.9 |
    | 333/106        |      4.1 |
    | 355/113        |      6.6 |
    | 103993/33102   |      9.2 |
    | 104348/33215   |      9.5 |
    | 208341/66317   |      9.9 |
    | 312689/99532   |     10.5 |
    | 833719/265381  |     11.1 |
    | 1146408/364913 |     11.8 |

If you only want to know the number of correct decimal places, ignore the fractional parts of the numbers in the Decimals column above. For example, 22/7 gives two correct decimal places. But it almost gives three. (Technically, the Decimals column gives the -log10 of the absolute error.)

In case you’re curious, here’s a plot of the absolute errors on a log scale.

Errors in rational approximations to pi

Related posts

Fixed points of logistic function

Here’s an interesting problem that came out of a logistic regression application. The input variable was between 0 and 1, and someone asked when and where the logistic transformation

f(x) = 1/(1 + exp(abx))

has a fixed point, i.e. f(x) = x.

So given logistic regression parameters a and b, when does the logistic curve given by yf(x) cross the line yx? Do they always cross? Can they cross twice?

There’s always at least one solution. Because f(x) is strictly between 0 and 1, the function

g(x) = f(x) – x

is positive at 0 and negative at 1, and by the intermediate value theorem g(x) must be zero for some x between 0 and 1.

Sometimes f has only one fixed point. It may have two or three fixed points, as demonstrated in the graph below. The case of two fixed points is unstable: the logistic curve is tangent to the line yx at one point, and a tiny change would turn this tangent point into either no crossing or two crossings.

Logistic curves crossing identity line

If |b| < 1, then you can show that the function f is a contraction map on [0, 1]. In that case there is a unique solution to f(x) = x, and you can find it by starting with an arbitrary value for x and repeatedly applying f to it. For example, if a= 1 and b = 0.8 and we start with x= 0, after applying f ten times we get xf(x) = 0.233790157.

There are a couple questions left to finish this up. How can you tell from a and b how many fixed points there will be? The condition |b| < 1 is sufficient for f to be a contraction map on [0, 1]. Can you find necessary and sufficient conditions?

Related post: Sensitivity of logistic regression prediction

Line art

A new video from 3Blue1Brown is about visualizing derivatives as stretching and shrinking factors. Along the way they consider the function f(x) = 1 + 1/x.

Iterations of f converge on the golden ratio, no matter where you start (with one exception). The video creates a graph where they connect values of x on one line to values of f(x) on another line. Curiously, there’s an oval that emerges where no lines cross.

Here’s a little Python I wrote to play with this:

    import matplotlib.pyplot as plt
    from numpy import linspace

    N = 70
    x = linspace(-3, 3, N)
    y = 1 + 1/x

    for i in range(N):
        plt.plot([x[i], y[i]], [0, 1])
    plt.xlim(-5, 5)

And here’s the output:

In the plot above I just used matplotlib’s default color sequence for each line. In the plot below, I used fewer lines (N = 50) and specified the color for each line. Also, I made a couple changes to the plot command, specifying the color for each line and putting the x line above the y line.

        plt.plot([x[i], y[i]], [0, 1], c="#243f6a")

If you play around with the Python code, you probably want to keep N even. This prevents the x array from containing zero.

Update: Here’s a variation that extends the lines connecting (x, 0) and (y, 1). And I make a few other changes while I’m at it.

    N = 200
    x = linspace(-10, 10, N)
    y = 1 + 1/x
    z = 2*y-x

    for i in range(N):
        plt.plot([x[i], z[i]], [0, 2], c="#243f6a")
    plt.xlim(-10, 10)

Making a career out of the chain rule

When I was a teenager, my uncle gave me a calculus book and told me that mastering calculus was the most important thing I could do for starting out in math. So I learned the basics of calculus from that book. Later I read Michael Spivak’s two calculus books. I took courses that built on calculus, and studied generalizations of calculus such as calculus with complex variables, calculus in Banach spaces, etc. I taught calculus. After a while, I started to feel maybe I’d mastered calculus.

Last year I started digging into automatic differentiation and questioned whether I really had mastered calculus. At a high level, automatic differentiation is “just” an application of the chain rule in many variables. But you can make career out of exploring automatic differentiation (AD), and many people do. The basics of AD are not that complicated, but you can go down a deep rabbit hole exploring optimal ways to implement AD in various contexts.

You can make a career out of things that seem even simpler than AD. Thousands of people have made a career out of solving the equation Ax = b where A is a large matrix and the vector b is given. In high school you solve two equations in two unknowns, then three equations in three unknowns, and in principle you could keep going. But you don’t solve a sparse system of millions of equations the same way. When you consider efficiency, accuracy, limitations of computer arithmetic, parallel computing, etc. the humble equation Ax = b is not so simple to solve.

As Richard Feynman said, nearly everything is really interesting if you go into it deeply enough.

Related posts:

Ellipsoid geometry and Haumea

To first approximation, Earth is a sphere. A more accurate description is that the earth is an oblate spheroid, the polar axis being a little shorter than the equatorial diameter. See details here. Other planets are also oblate spheroids as well. Jupiter is further from spherical than the earth is more oblate.

The general equation for the surface of an ellipsoid is

frac{x^2}{a^2} + frac{y^2}{b^2} + frac{z^2}{c^2} = 1

An ellipsoid has three semi-axes: ab, and c. For a sphere, all three are equal to the radius. For a spheroid, two are equal. In an oblate spheroid, like Earth and Jupiter, abc. For a prolate spheroid like Saturn’s moon Enceladus, abc.

Artists conception of Haumea

The dwarf planet Haumea is far from spherical. It’s not even a spheroid. It’s a general (tri-axial) ellipsoid, with a, b, and c being quite distinct. It’s three semi-axes are approximately 1000, 750, and 500 kilometers. The dimensions are not known very accurately. The image above is an artists conception of Haumea and its ring, via Astronomy Picture of the Day.

This post explains how to compute the volume and surface area of an ellipsoid. See that post for details. Here we’ll just copy the Python code from that post.

    from math import sin, cos, acos, sqrt, pi
    from scipy.special import ellipkinc, ellipeinc
    def volume(a, b, c):
        return 4*pi*a*b*c/3
    def area(a, b, c):
        phi = acos(c/a)
        k = a*sqrt(b**2 - c**2)/(b*sqrt(a**2 - c**2))
        E = ellipeinc(phi, k)
        F = ellipkinc(phi, k)
        elliptic = E*sin(phi)**2 + F*cos(phi)**2
        return 2.0*pi*c**2 + 2*pi*a*b*elliptic/sin(phi)
    a, b, c = 1000, 750, 500
    print(volume(a, b, c))
    print(area(a, b, c))

Related post: Planets evenly spaced on a log scale

Optimal low-rank matrix approximation

Matrix compression

Suppose you have an m by n matrix A, where m and n are very large, that you’d like to compress. That is, you’d like to come up with an approximation of A that takes less data to describe.

For example, consider a high resolution photo that as a matrix of gray scale values. An approximation to the matrix could be a lower resolution representation that takes less space.

I heard a rumor many years ago that space probes would compress images by interpreting an image as a matrix and sending back a few eigenvalues and eigenvectors. That sounded fascinating, but what about images that aren’t square? If a matrix M is not square, then you can’t have Mx = λx for a scalar λ because the left and right sides of the equation would have different dimensions.

Truncated SVD

Approximate a rectangular matrix requires using something more general than eigenvalues and eigenvectors, and that is singular values and singular vectors.

Suppose our matrix A has the singular value decomposition

A = U \Sigma V^*

This can be written as

A = \sum_{i=1}^p \sigma_i u_i v_i^*

where the σi are the singular values and p is the number of singular values that are non-zero. The ui and vi are the ith columns of U and V respectively. Singular values are nonnegative, and listed in decreasing order.

We can form an approximation to A by truncating the sum above. Instead of taking all the singular values, and their corresponding left and right singular vectors, we only take the k largest singular values and their corresponding vectors.

A_k = \sum_{i=1}^k \sigma_i u_i v_i^*

This is called the truncated SVD.

We started by assuming A had dimensions m by n and that both were large. Storing A requires mn numbers. Storing Ak requires only k(1 + mn) numbers. This is a big savings if k is much smaller than m and n.

Eckart-Young theorem

So how good an approximation is Ak ? Turns out it is optimal, in the least squares sense. This is the content of the Eckart-Young theorem. It says that the best least squares (2-norm) approximation of A by a rank k matrix is given by Ak.

Not only that, the theorem says the 2-norm error is given by the first singular value that we didn’t use, i.e.

|| A - A_k ||_2 = \sigma_{k+1}

Related posts

The quadratic formula and low-precision arithmetic

What could be interesting about the lowly quadratic formula? It’s a formula after all. You just stick numbers into it.

Well, there’s an interesting wrinkle. When the linear coefficient b is large relative to the other coefficients, the quadratic formula can give wrong results when implemented in floating point arithmetic.

Quadratic formula and loss of precision

The quadratic formula says that the roots of

ax^2 + bx + c = 0

are given by

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

That’s true, but let’s see what happens when we have ac = 1 and b = 108.

    from math import sqrt

    def quadratic(a, b, c):
        r = sqrt(b**2 - 4*a*c)
        return ((-b + r)/(2*a), (-b -r)/(2*a))

    print( quadratic(1, 1e8, 1) )

This returns

    (-7.450580596923828e-09, -100000000.0)

The first root is wrong by about 25%, though the second one is correct.

What happened? The quadratic equation violated the cardinal rule of numerical analysis: avoid subtracting nearly equal numbers. The more similar two numbers are, the more precision you can lose from subtracting them. In this case √(b² – 4ac) is very nearly equal to b.

If we ask Python to evaluate

    1e8 - sqrt(1e16-4)

we get 1.49e-8 when the correct answer would be 2.0e-8.

Improving precision for quadratic formula

The way to fix the problem is to rationalize the numerator of the quadratic formula by multiplying by 1 in the form

\frac{-b \mp \sqrt{b^2 - 4ac}}{-b \mp \sqrt{b^2 - 4ac}}

(The symbol ∓ is much less common than ±. It must means that if you take the the + sign in the quadratic formula, take the – sign above, and vice versa.)

When we multiply by the expression above and simplify we get

\frac{2c}{-b \mp \sqrt{b^2 - 4ac}}

Let’s code this up in Python and try it out.

    def quadratic2(a, b, c):
        r = sqrt(b**2 - 4*a*c)
        return (2*c/(-b - r), 2*c/(-b+r))

    print( quadratic2(1, 1e8, 1) )

This returns

    (-1e-08, -134217728.0)

So is our new quadratic equation better? It gives the right answer for the first root, exact to within machine precision. But now the second root is wrong by 34%. Why is the second root wrong? Same reason as before: we subtracted two nearly equal numbers!

The familiar version of the quadratic formula computes the larger root correctly, and the new version computes the smaller root correctly. Neither version is better overall. We’d be no better off or worse off always using the new quadratic formula than the old one. Each one is better when it avoids subtracting nearly equal numbers.

The solution is to use both quadratic formulas, using the appropriate one for the root you’re trying to calculate.

Low precision arithmetic

Is this a practical concern? Yes, and here’s why: Everything old is new again.

The possible inaccuracy in the quadratic formula was serious before double precision (64-bit floating point) arithmetic became common. And back in the day, engineers were more likely to be familiar with the alternate form of the quadratic formula. You can still run into quadratic equations that give you trouble even in double precision arithmetic, like the example above, but it’s less likely to be a problem when you have more bits at your disposal.

Now we’re interested in low-precision arithmetic again. CPUs have gotten much faster, but moving bits around in memory has not. Relative to CPU speed, memory manipulation has gotten slower. That means we need to be more concerned with memory management and less concerned about floating point speed.

Not only is memory juggling slower relative to CPU, it also takes more energy. According to Gustafson, reading 64 bits from DRAM requires 65 times as much energy as doing a floating point combined multiply-add because it takes place off-chip. The table below, from page 6 of Gustafson’s book, gives the details. Using lower precision floating point saves energy because more numbers can be read in from memory in the same number of bits. (Here pJ = picojoules.)

Operation Energy consumedWhere
Perform a 64-bit floating point multiply-add64 pJon-chip
Load or store 64 bits of register data6 pJon-chip
Read 64 bits from DRAM4200 pJoff-chip

So we might be interested in low-precision arithmetic to save energy in a battery powered mobile device, or to save clock cycles in a server application manipulating a lot of data. This means that the numerical tricks that most people have forgotten about are relevant again.

Related posts

Off by one character

There was a discussion on Twitter today about a mistake calculus students make:

\frac{d}{dx}e^x = x e^{x-1}

I pointed out that it’s only off by one character:

\frac{d}{de}e^x = x e^{x-1}

The first equation is simply wrong. The second is correct, but a gross violation of convention, using x as a constant and e as a variable.