Names for extremely large numbers

Names for extremely large numbers are kinda pointless. The purpose of a name is to communicate something, and if the meaning of the name is not widely known, the name doesn’t serve that purpose. It’s even counterproductive if two people have incompatible ideas what the word means. It’s much safer to simply use scientific notation for extremely large numbers.

Obscure or confusing

Everyone knows what a million is. Any larger than that and you may run into trouble. The next large number name, billion, means 109 to some people and 1012 to others.

When writing for those who take one billion to mean 109, your audience may or may not know that one trillion is 1012 according to that convention. You cannot count on people knowing the names of larger numbers: quadrillion, quintillion, sextillion, etc.

To support this last claim, let’s look at the frequency of large number names according to Google’s Ngram viewer. Presumably the less often a word is used, the less likely people are to recognize it.

Here’s a bar chart for the data from 2005, plotting on a logarithmic scale. The chart has a roughly linear slope, which means the frequency of the words drops exponentially. Million is used more than 24,000 times as often as septillion.

Frequency of large number names on log scale

Here’s the raw data.

    | Name        |    Frequency |
    | Million     | 0.0096086564 |
    | Billion     | 0.0030243298 |
    | Trillion    | 0.0002595139 |
    | Quadrillion | 0.0000074383 |
    | Quintillion | 0.0000016745 |
    | Sextillion  | 0.0000006676 |
    | Septillion  | 0.0000003970 |

Latin prefixes for large numbers

There is a consistency to these names, though they are not well known. Using the convention that these names refer to powers of 1000, the pattern is

[Latin prefix for n] + llion = 1000n+1

So for example, billion is 10002+1 because bi- is the Latin prefix for 2. A trillion is 10003+1 because tri- corresponds to 3, and so forth. A duodecillion is 100013 because the Latin prefix duodec- corresponds to 12.

    | Name              | Meaning |
    | Billion           |   10^09 |
    | Trillion          |   10^12 |
    | Quadrillion       |   10^15 |
    | Quintillion       |   10^18 |
    | Sextillion        |   10^21 |
    | Septillion        |   10^24 |
    | Octillion         |   10^27 |
    | Nonillion         |   10^30 |
    | Decillion         |   10^33 |
    | Undecillion       |   10^36 |
    | Duodecillion      |   10^39 |
    | Tredecillion      |   10^42 |
    | Quattuordecillion |   10^45 |
    | Quindecillion     |   10^48 |
    | Sexdecillion      |   10^51 |
    | Septendecillion   |   10^54 |
    | Octodecillion     |   10^57 |
    | Novemdecillion    |   10^60 |
    | Vigintillion      |   10^63 |


Here’s the Mathematica code that was used to create the plot above.

        {0.0096086564, 0.0030243298, 0.0002595139, 0.0000074383, 
         0.0000016745, 0.0000006676, 0.0000003970}, 
         ChartLabels -> {"million", "billion", "trillion", "quadrillion", 
                         "quintillion", "sextillion", "septillion"}, 
         ScalingFunctions -> "Log", 
         ChartStyle -> "DarkRainbow"

Fraktur symbols in mathematics

When mathematicians run out of symbols, they turn to other alphabets. Most math symbols are Latin or Greek letters, but occasionally you’ll run into Russian or Hebrew letters.

Sometimes math uses a new font rather than a new alphabet, such as Fraktur. This is common in Lie groups when you want to associate related symbols to a Lie group and its Lie algebra. By convention a Lie group is denoted by an ordinary Latin letter and its associated Lie algebra is denoted by the same letter in Fraktur font.

lower case alphabet in Fraktur


To produce Fraktur letters in LaTeX, load the amssymb package and use the command \mathfrak{}.

Symbols such as \mathfrak{A} are math symbols and can only be used in math mode. They are not intended to be a substitute for setting text in Fraktur font. This is consistent with the semantic distinction in Unicode described below.


The Unicode standard tries to distinguish the appearance of a symbol from its semantics, though there are compromises. For example, the Greek letter Ω has Unicode code point U+03A9 but the symbol Ω for electrical resistance in Ohms is U+2621 even though they are rendered the same [1].

The letters a through z, rendered in Fraktur font and used as mathematical symbols, have Unicode values U+1D51E through U+1D537. These values are in the “Supplementary Multilingual Plane” and do not commonly have font support [2].

The corresponding letters A through Z are encoded as U+1D504 through U+1D51C, though interestingly a few letters are missing. The code point U+1D506, which you’d expect to be Fraktur C, is reserved. The spots corresponding to H, I, and R are also reserved. Presumably these are reserved because they are not commonly used as mathematical symbols. However, the corresponding bold versions U+1D56C through U+ID585 have no such gaps [3].


[1] At least they usually are. A font designer could choose provide different glyphs for the two symbols. I used the same character for both because some I thought some readers might not see the Ohm symbol properly rendered.

[2] If you have the necessary fonts installed you should see the alphabet in Fraktur below:
𝔞 𝔟 𝔠 𝔡 𝔢 𝔣 𝔤 𝔥 𝔦 𝔧 𝔨 𝔩 𝔪 𝔫 𝔬 𝔭 𝔮 𝔯 𝔰 𝔱 𝔲 𝔳 𝔴 𝔵 𝔶 𝔷

I can see these symbols from my desktop and from my iPhone, but not from my Android tablet. Same with the symbols below.

[3] Here are the bold upper case and lower case Fraktur letters in Unicode:
𝖆 𝖇 𝖈 𝖉 𝖊 𝖋 𝖌 𝖍 𝖎 𝖏 𝖐 𝖑 𝖒 𝖓 𝖔 𝖕 𝖖 𝖗 𝖘 𝖙 𝖚 𝖛 𝖜 𝖝 𝖞 𝖟

King of high dimensional space

rook, king, knight

Which has more mobility on an empty chessboard: a rook, a king or a knight?

On an ordinary 8 by 8 chessboard, a rook can move to any one of 14 squares, no matter where it starts. A knight and a king both have 8 possible moves from the middle of the board, fewer moves from an edge.

If we make the board bigger or higher dimensional, what happens to the relative mobility of each piece? Assume we’re playing chess on a hypercube lattice, n points along each edge, in d dimensions. So standard chess corresponds to n = 8 and d = 2.

A rook can move to any square in a coordinate direction, so it can choose one of n-1 squares in each of d dimensions, for a total of (n-1)d possibilities.

A king can move a distance of 0, 1, or -1 in each coordinate. In d dimensions, this gives him 3d – 1 options. (Minus one for the position he’s initially on: moving 0 in every coordinate isn’t a move!)

What about a knight? There are C(d, 2) ways to choose two coordinate directions. For each pair of directions, there are 8 possibilities: two ways to choose which direction to move two steps in, and then whether to move + or – in each direction. So there are 4d(d – 1) possibilities.

In short:

  • A king’s mobility increases exponentially with dimension and is independent of the size of the board.
  • A rook’s mobility increases linearly with dimension and linearly with the size of the board.
  • A knight’s mobility increases quadratically with dimension and independent of the size of the board.

The rules above are not the only way to generalize chess rules to higher dimensions. Here’s an interesting alternative way to define a knight’s moves: a knight can move to any lattice point a distance √5 away. In dimensions up to 4, this corresponds to the definition above. But in dimension 5 and higher, there’s a new possibility: moving one step in each of five dimensions. In that case, the number of possible knight moves increases with dimension like d5.

Related post: A knight’s random walk in 3 or 4 dimensions


3D chess knight moves

I’ve written before about a knight’s random walk on an ordinary chess board. In this post I’d like to look at the generalization to three dimensions (or more).

three dimensional lattice

So what do we mean by 3D chess? For this post, we’ll have a three dimensional lattice of possible positions, of size 8 by 8 by 8. You could think of this as a set of 8 ordinary chess boards stacked vertically. To generalize a knight’s move to this new situation, we’ll say that a knight move consists of moving 2 steps in one direction and 1 step in an orthogonal direction. For example, he might move up two levels and over one position horizontally.

Suppose our knight walks randomly through the chess lattice. At each point, he evaluates all possible moves and chooses one randomly with all possible moves having equal probability. How long on average will it take our knight to return to where he started?

As described in the post about the two dimensional problem, we can find the average return time using a theorem about Markov chains.

The solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the edges connect legal knight moves. The general solution for the time to first return is simply 2N/k where N is the number of edges in the graph, and k is the number of edges meeting at the starting point.

The problem reduces to counting N and k. This is tedious in two dimensions, and gets harder in higher dimensions. Rather than go through a combinatorial argument, I’ll show how to compute the result with a little Python code.

To count the number of edges N, we’ll add up the number of edges at each node in our graph, and then divide by 2 since this process will count each edge twice. We will iterate over our lattice, generate all potential moves, and discard those that would move outside the lattice.

from numpy import all
from itertools import product, combinations

def legal(v):
    return all([ 0 <= t < 8 for t in v])

count = 0
for position in product(range(8), range(8), range(8)):
    # Choose two directions
    for d in combinations(range(3), 2):
        # Move 1 step in one direction
        # and 2 steps in the other.
        for step in [1, 2]:
            for sign0 in [-1, 1]:
                for sign1 in [-1, 1]:
                    move = list(position)
                    move[d[0]] += sign0*step
                    move[d[1]] += sign1*(3-step)
                    if legal(move):
                        count += 1
print(count // 2)

This tells us that there are N = 4,032 nodes in our graph of possible moves. The number of starting moves k depends on our starting point. For example, if we start at a corner, then we have 6 possibilities. In this case we should expect our knight to return to his starting point in an average of 2*4032/6 = 1,344 moves.

We can easily modify the code above. To look at different size lattices, we could change all the 8’s above. The function legal would need more work if the lattice was not the same size in each dimensions.

We could also look at four dimensional chess by adding one more range to the position loop and changing the combinations to come from range(4) rather than range(3). In case you’re curious, in four dimensions, the graph capturing legal knight moves in an 8 by 8 by 8 by 8 lattice would have 64,512 edges. If our knight started in a corner, he’d have 12 possible starting moves, so we’d expect him to return to his starting position on average after 5,275 moves.

Fibonacci meets Pythagoras

The sum of the squares of consecutive Fibonacci numbers is another Fibonacci number. Specifically we have

F_n^2 + F_{n+1}^2 = F_{2n+1}

and so we have the following right triangle.

The hypotenuse will always be irrational because the only Fibonacci numbers that are squares are 1 and 144, and 144 is the 12th Fibonacci number.

Bounds on the central binomial coefficient

It’s hard to find bounds on binomial coefficients that are both simple and accurate, but in 1990, E. T. H. Wang upper and lower bounds on the central coefficient that are both, valid for n at least 4.

\frac{2^{2n-1}}{\sqrt{n}} < \binom{2n}{n} < \frac{2^{2n-1/2}}{\sqrt{n}}

Here are a few numerical results to give an idea of the accuracy of the bounds on a log scale. The first column is the argument n, The second is the log of the CBC (central binomial coefficient) minus the log of the lower bound. The third column is the log of the upper bound minus the log of the CBC.

    | n |  lower |  upper |
    | 1 | 0.0000 | 0.3465 |
    | 2 | 0.0588 | 0.2876 |
    | 3 | 0.0793 | 0.2672 |
    | 4 | 0.0896 | 0.2569 |
    | 5 | 0.0958 | 0.2507 |
    | 6 | 0.0999 | 0.2466 |
    | 7 | 0.1029 | 0.2436 |
    | 8 | 0.1051 | 0.2414 |
    | 9 | 0.1069 | 0.2396 |

And here’s a plot of the same data taking n out further.

Wang's upper and lower bounds on CBC

So the ratio of the upper bound to the CBC and the ratio of the CBC to the lower bound both quickly approach an asymptote, and the lower bound is better.

Refactored code for division algebras

An earlier post included code for multiplying quaternions, octonions, and sedenions. The code was a little clunky, so I refactor it here.

    def conj(x):
        xstar = -x
        xstar[0] *= -1
        return xstar 

    def CayleyDickson(x, y):
        n = len(x)

        if n == 1:
            return x*y

        m = n // 2
        a, b = x[:m], x[m:]
        c, d = y[:m], y[m:]
        z = np.zeros(n)
        z[:m] = CayleyDickson(a, c) - CayleyDickson(conj(d), b)
        z[m:] = CayleyDickson(d, a) + CayleyDickson(b, conj(c))
        return z

The CayleyDickson function implements the Cayley-Dickson construction and can be used to multiply real, complex, quaternion, and octonion numbers. In fact, it can be used to implement multiplication in any real vector space of dimension 2n. The numeric types listed above correspond to n = 0, 1, 2, and 3. These are the only normed division algebras over the reals.

When n = 4 we get the sedenions, which are not a division algebra because they contain zero divisors, and the code can be used for any larger value of n as well. As noted before, the algebraic properties degrade as n increases, though I don’t think they get any worse after n = 4.

If you wanted to make the code more robust, you could add code to verify that the arguments x and y have the same length, and that their common length is a power of 2. (You could just check that the length is either 1 or even; if it’s not a power of 2 the recursion will eventually produce an odd argument.)

Related posts

How close is octonion multiplication to being associative?

Quaternion multiplication is associative but not commutative. An earlier post looked at the average size of the commutator xy – yx as a measure of how far quaternion multiplication is from being commutative.

This post looks at an analogous question for octonions. Octonion mulitplication is neither commutative nor associative. So in this post we look at the associator of three octonions (xy)z – x(yz) as a measure of how far octonion multiplication is from being associative.

(A post from yesterday looked at how close octonion multiplication comes to being associative in an algebraic sense, looking at weak forms of associativity. This post looks at how close multiplication comes to being associative in an analytical sense, looking at norm distances rather than algebraic identities.)

We will use a simulation to look at the average norm of the octonion associator over the octionions of unit length, analogous to the earlier post that looked at the commutator of the quaternions. We developed code for octonion multiplication in the previous post and will reuse that code here. We also developed code for generating random unit-length octonions in the same post.

Here’s code to find the average norm of the associator and plot a histogram of its values.

    import matplotlib.pyplot as plt

    # omult is octonion multiplication. 
    # See previous post.
    def associator(x, y, z):
        return omult(omult(x, y), z) - 
               omult(x, omult(y, z))

    N = 100000
    s = 0
    h = np.zeros(N)
    for n in range(N):
        x = random_unit_octonion()
        y = random_unit_octonion()
        z = random_unit_octonion()    
        t = norm(associator(x, y, z))
        s += t
        h[n] = t

    plt.hist(h, bins=50)

This gives an average of 1.095 and the histogram below.

histogram of octonion associator norm values

Update: Greg Egan calculated the exact mean value for the norm of the associator to be 147456/(42875 π) ≈ 1.0947336. Here are the details.

Python code for octonion and sedenion multiplication

The previous post discussed octonions. This post will implement octonion multiplication in Python, and then sedenion multiplication.

Cayley-Dickson construction

There’s a way to bootstrap quaternion multiplication into octonion multiplication, so we’ll reuse the quaternion multiplication code from an earlier post. It’s called the Cayley-Dickson construction. For more on this construction , see John Baez’s treatise on octonions.

If you represent an octonion as a pair of quaternions, then multiplication can be defined by

(a, b) (c, d) = (acd*b, da + bc*)

where a star superscript on a variable means (quaternion) conjugate.

Note that this isn’t the only way to define multiplication of octonions. There are many (480?) isomorphic ways to define octonion multiplication.

(You can take the Cayley-Dickson construction one step further, creating sedenions as pairs of octonions. We’ll also provide code for sedenion multiplication below.)

Code for octonion multiplication

    import numpy as np

    # quaternion multiplication
    def qmult(x, y):
        return np.array([
            x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]

    # quaternion conjugate
    def qstar(x):
        return x*np.array([1, -1, -1, -1])

    # octonion multiplication
    def omult(x, y):
        # Split octonions into pairs of quaternions
        a, b = x[:4], x[4:]
        c, d = y[:4], y[4:]
        z = np.zeros(8)
        z[:4] = qmult(a, c) - qmult(qstar(d), b)
        z[4:] = qmult(d, a) + qmult(b, qstar(c))
        return z

Update: See this post for refactored code. Handles quaternions, octonions, sedenions, etc. all with one function.

Unit tests

Here are some unit tests to verify that our multiplication has the expected properties. We begin by generating three octonions with norm 1.

    from numpy.linalg import norm

    def random_unit_octonion():
        x = np.random.normal(size=8)
        return x / norm(x)

    x = random_unit_octonion()
    y = random_unit_octonion()
    z = random_unit_octonion()

We said in the previous post that octonions satisfy the “alternative” condition, a weak form of associativity. Here we verify this property as a unit test.

    eps = 1e-15

    # alternative identities

    a = omult(omult(x, x), y)
    b = omult(x, omult(x, y))
    assert( norm(a - b) < eps )

    a = omult(omult(x, y), y)
    b = omult(x, omult(y, y))
    assert( norm(a - b) < eps )

We also said in the previous post that the octonions satisfy the “Moufang” conditions.

    # Moufang identities
    a = omult(z, omult(x, omult(z, y)))
    b = omult(omult(omult(z, x), z), y)
    assert( norm(a - b) < eps )

    a = omult(x, omult(z, omult(y, z)))
    b = omult(omult(omult(x, z), y), z)
    assert( norm(a - b) < eps )

    a = omult(omult(z,x), omult(y, z))
    b = omult(omult(z, omult(x, y)), z)
    assert( norm(a  - b) < eps )

    a = omult(omult(z,x), omult(y, z))
    b = omult(z, omult(omult(x, y), z))
    assert( norm(a - b) < eps )

And finally, we verify that the product of unit length octonions has unit length.

    # norm condition
    n = norm(omult(omult(x, y), z))
    assert( abs(n - 1) < eps )

The next post uses the octionion multiplication code above to look at the distribution of the associator (xy)z – x(yz) to see how far multiplication is from being associative.

Sedenion multiplication

The only normed division algebras over the reals are the real numbers, complex numbers, quaternions, and octonions. These are real vector spaces of dimension 1, 2, 4, and 8.

You can proceed analogously and define a real algebra of dimension 16 called the sedenions. When we go from complex numbers to quaternions we lose commutativity. When we go from quaternions to octonions we lose full associativity, but retain a weak version of associativity. Even weak associativity is lost when we move from octonions to sedenions.  Non-zero octonions form an alternative loop with respect to multiplication, but sedenions do not.

Sedenions have multiplicative inverses, but there are also some zero divisors, i.e. non-zero vectors whose product is zero. So senenions do not form a division algebra. If you continue the Cayley-Dickson construction past the sedenions, you keep getting zero divisors, so no more division algebras.

Here’s Python code for sedenion multiplication, building on the code above.

    def ostar(x):
        mask = -np.ones(8)
        mask[0] = 1
        return x*mask
    # sedenion multiplicaton
    def smult(x, y):
        # Split sedenions into pairs of octonions
        a, b = x[:8], x[8:]
        c, d = y[:8], y[8:]
        z = np.zeros(16)
        z[:8] = omult(a, c) - omult(ostar(d), b)
        z[8:] = omult(d, a) + omult(b, ostar(c))
        return z

As noted above, see this post for more concise code that also generalizes further.

Weakening the requirements of a group

A group is a set with a binary operation that

  1. closed
  2. associative,
  3. has an identity, and
  4. has inverses.

This post will look at each of these properties and what happens if you modify or remove it.


Closed means that applying the binary operation to any two elements of the set yields another elements of the set. (I saw a book one time that had at the top of some chapter a quote something like “The product of two real numbers is a real number; we should be surprised if it were a head of cabbage.”) For example, the positive integers are closed under multiplication but not under division.

A set with a closed binary operation is called a magma.


Associative means that you can parenthesize an expression any way you want and always get the same result. It may be practically important how you parenthesize an expression. For example, matrix multiplication can far fewer operations if you carry out the multiplications in the right order. In programming, left folds and right folds (such as foldl and foldr in Haskell) may have very different efficiency, but they’re supposed to return the same result if you’re folding an associative operation.

A magma with an associative operation is called a semigroup.

Differential equation theory is concerned with analytic semigroups. If you advance the solution of an evolution equation by a positive time increment t and then by an increment s, you end up at the same place as if you’d advanced the solution by ts.

Notice that the time increment is required to be positive. That’s because the initial conditions for an evolution equation might reside in a different function space than the solutions at any future time. That’s the case, for example, with the heat equation, where initial conditions can be rough but the solution smooths out immediately as you move forward in time.


If you add the requirement that a semigroup has an identity, you get a monoid. Partial differential equation theory is more concerned with analytic semigroups than analytic monoids for the reason explored above: you might not have an identity.

Positive integers with multiplication form a monoid but not a group because they have an identity, 1, but positive integers other than 1 do not have a multiplicative inverse that is an integer.

A monoid in which every element has an inverse is a group.


So far we’ve built up a progression of stronger and stronger requirements leading up to groups, starting with requiring that our operation is associative. What if we don’t have associativity, but we do have other properties?

Non-associative operations are less common. Perhaps the best known operation that is not associative is octonion multiplication.

There are four division algebras over the reals: the reals themselves, complex numbers, quaternions, and octonions. Real and complex numbers are both fields. Quaternions are not a field because multiplication is not commutative. Octonions are even further from being a field because multiplication is not even associative.

But division is possible for octonions, except for 0, and so non-zero octonions form a quasigroup, a magma with division.


A quasigroup that has an identity element is called a loop. Note that for quasigroups we said “division is possible,” not that elements have inverses. If elements do have inverses, then you can divide by an element by multiplying by its inverse. But you can define division by the ability to solve equations. We say you can divide b by a if axb and yab have unique solutions.

A loop with associativity is a group.

Alternative loops

Loops are not required to have an associative property, but they may satisfy a weak form of associativity.

An alternative loop satisfies x(xy) = (xx)y and x(yy) = (xy)y. Non-zero octonions form an alternative loop with respect to multiplication.

Moufang loops

A Moufang loop is a loop that satisfies these requirements:

  1. z(x(zy)) = ((zx)z)y
  2. x(z(yz)) = ((xz)y)z
  3. (zx)(yz) = (z(xy))z
  4. (zx)(yz) = z((xy)z)

These conditions imply the alternative loop conditions, so Moufang loops are alternative loops.

Moufang loop are still not required to be associative, but they’re getting closer. And as you might have anticipated, non-zero octonions with respect to multiplication form a Moufang loop.

Related posts

How far is xy from yx on average for quaternions?

Given two quaternions x and y, the product xy might equal the product yx, but in general the two results are different.

How different are xy and yx on average? That is, if you selected quaternions x and y at random, how big would you expect the difference xy – yx to be? Since this difference would increase proportionately if you increased the length of x or y, we can just consider quaternions of norm 1. In other words, we’re looking at the size of xy – yx relative to the size of xy.

Here’s simulation code to explore our question.

    import numpy as np
    def random_unit_quaternion():
        x = np.random.normal(size=4)
        return x / np.linalg.norm(x)
    def mult(x, y):
        return np.array([
            x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]
    N = 10000
    s = 0
    for _ in range(N):
        x = random_unit_quaternion()
        y = random_unit_quaternion()
        s += np.linalg.norm(mult(x, y) - mult(y, x))

In this code x and y have unit length, and so xy and yx also have unit length. Geometrically, x, y, xy, and yx are points on the unit sphere in four dimensions.

When I ran the simulation above, I got a result of 1.13, meaning that on average xy and yx are further from each other than they are from the origin.

To see more than the average, here’s a histogram of ||xyyx|| with N above increased to 100,000.


I imagine you could work out the distribution exactly, though it was quicker and easier to write a simulation. We know the distribution lives on the interval [0, 2] because xy and yx are points on the unit sphere. Looks like the distribution is skewed toward its maximum value, and so xy and yz are more likely to be nearly antipodal than nearly equal.

Update: Greg Egan worked out the exact mean and distribution.

Objectives and constraints

Objectives and constraints are symmetrical in a mathematical sense but are asymmetrical in a psychological sense. By taking dual formulations, you can reverse the mathematical role of objectives and constraints, but in application objectives are more obvious than constraints.

In the question “What is the minimum value of x² over the interval [1, 5]?” the function f(x) = x² is the objective function and 1 ≤ x ≤ 5 is the constraint. If someone says the minimum is 0, they’ve minimized the objective function but ignored the constraint. This is clear in a such a simple problem, but failure to consider constraints can be much more subtle.

Objectives tend to be easily quantifiable—maximize profit, minimize energy consumption, etc.— but constraints tend to be less quantifiable—the solution has to be testable and maintainable, has to be legal, has to be something people will buy or vote for, etc.

When children ask “Why don’t you just …” it’s because they see a way to improve some objective, but the “just” part shows that they are either completely unaware of a relevant constraint or are unaware of how difficult it would be to overcome the constraint. As you mature, you become aware of more constraints. You realize that things that seem grossly subopitmal are actually close to optimal when you consider the necessary constraints. There may be room for improvement, but not as much as you imagined and at a higher cost.

Big opportunities open up when constraints change. Maybe an idea was abandoned because it would require more calculation than anyone could carry out by hand, and now’s the time to revisit it. Or maybe an idea was never developed because it would require instantaneous communication between people at multiple points on the globe. No problem now.

In both the examples above, a constraint was relaxed: computation and communication have gotten far less expensive. Increased constraints create opportunities as well. When the price of something goes up, its alternatives become more economical by comparison. Whether an oil field is worth developing, for example, depends on the current price of oil.

If I ask “Why hasn’t someone done this before?” I’m skeptical if the answer is “Because I’m smarter than everyone else who has tried.” But if the answer is “Because constraints have changed” then I’m much more receptive.

Related post: Boundary conditions are the hard part

Mathematics of Deep Note

THX deepnote logo score

I just finished listening to the latest episode of Twenty Thousand Hertz, the story behind “Deep Note,” the THX logo sound.

There are a couple mathematical details of the sound that I’d like to explore here: random number generation, and especially Pythagorean tuning.

Random number generation

First is that part of the construction of the sound depended on a random number generator. The voices start in a random configuration and slowly reach the target D major chord at the end.

Apparently the random number generator was not seeded in a reproducible way. This was only mentioned toward the end of the show, and a teaser implies that they’ll go more into this in the next episode.

Pythagorean tuning

The other thing to mention is that the final chord is based on Pythagorean tuning, not the more familiar equal temperament.

The lowest note in the final chord is D1. (Here’s an explanation of musical pitch notation.) The other notes are D2, A2, D3, A3, D4, A4, D5, A5, D6, and F#6.


Octave frequencies are a ratio of 2:1, so if D1 is tuned to 36 Hz, then D2 is 72 Hz, D3 is 144 Hz, D4 is 288 Hz, D5 is 576 Hz, and D6 is 1152 Hz.


In Pythagorean tuning, fifths are in a ratio of 3:2. In equal temperament, a fifth is a ratio of 27/12 or 1.4983 [1], a little less than 3/2. So Pythagorean fifths are slightly bigger than equal temperament fifths. (I explain all this here.)

If D2 is 72 Hz, then A2 is 108 Hz. It follows that A3 would be 216 Hz, A4 would be 432 Hz (flatter than the famous A 440), and A5 would be 864 Hz.

Major thirds

The F#6 on top is the most interesting note. Pythagorean tuning is based on fifths being a ratio of 3:2, so how do you get the major third interval for the highest note? By going up by fifths 4 times from D4, i.e. D4 -> A4 -> E5 -> B5 -> F#6.

The frequency of F#6 would be 81/16 of the frequency of D4, or 1458 Hz.

The F#6 on top has a frequency 81/64 that of the D# below it. A Pythagorean major third is a ratio of 81/64 = 1.2656, whereas an equal temperament major third is f 24/12 or 1.2599 [2]. Pythagorean tuning makes more of a difference to thirds than it does to fifths.

A Pythagorean major third is sharper than a major third in equal temperament. Some describe Pythagorean major chords as brighter or sweeter than equal temperament chords. That the effect the composer was going for and why he chose Pythagorean tuning.


Then after specifying the exact pitches for each note, the composer actually changed the pitches of the highest voices a little to make the chord sound fuller. This makes the three voices on each of the highest notes sound like three voices, not just one voice. Also, the chord shimmers a little bit because the random effects from the beginning of Deep Note never completely stop, they are just diminished over time.

Related posts


[1] The exponent is 7/12 because a half step is 1/12 of an octave, and a fifth is 7 half steps.

[2] The exponent is 4/12 because a major third is 4 half steps.

Musical score above via THX Ltd on Twitter.

Stirling numbers, including negative arguments

Stirling numbers are something like binomial coefficients. They come in two varieties, imaginatively called the first kind and second kind. Unfortunately it is the second kind that are simpler to describe and that come up more often in applications, so we’ll start there.

Stirling numbers of the second kind

The Stirling number of the second kind

S_2(n,k) = \left\{ {n \atop k} \right\}

counts the number of ways to partition a set of n elements into k non-empty subsets. The notation on the left is easier to use inline, and the subscript reminds us that we’re talking about Stirling numbers of the second kind. The notation on the right suggests that we’re dealing with something analogous to binomial coefficients, and the curly braces suggest this thing might have something to do with counting sets.

Since the nth Bell number counts the total number of ways to partition a set of n elements into any number of non-empty subsets, we have

B_n = \sum_{k=1}^n \left\{ {n \atop k}\right\}

Another connection to Bell is that S2(n, k) is the sum of the coefficients in the partial exponential Bell polynomial Bn, k.

Stirling numbers of the first kind

The Stirling numbers of the first kind

S_1(n,k) = \left[ {n \atop k} \right]

count how many ways to partition a set into cycles rather than subsets. A cycle is a sort of ordered subset. The order of elements matters, but only a circular way. A cycle of size k is a way to place k items evenly around a circle, where two cycles are considered the same if you can rotate one into the other. So, for example, [1, 2, 3] and [2, 3, 1] represent the same cycle, but [1, 2, 3] and [1, 3, 2] represent different cycles.

Since a set with at least three elements can be arranged into multiple cycles, Stirling numbers of the first kind are greater than or equal to Stirling numbers of the second kind, given the same arguments.

Addition identities

We started out by saying Stirling numbers were like binomial coefficients, and here we show that they satisfy addition identities similar to binomial coefficients.

For binomial coefficients we have

{n \choose k} = {n - 1 \choose k} + {n-1 \choose k-1}.

To see this, imagine we start with the set of the numbers 1 through n. How many ways can we select a subset of k items? We have selections that exclude the number 1 and selections that include the number 1. These correspond to the two terms on the right side of the identity.

The analogous identities for Stirling numbers are

\left\{{n \atop k} \right\}= k \left\{ {n-1 \atop k} \right\} + \left\{ {n-1 \atop k-1} \right\}

\left[ {n \atop k} \right]= (n-1) \left[ {n-1 \atop k} \right] + \left[ {n-1 \atop k-1} \right]
The combinatorial proofs of these identities are similar to the argument above for binomial coefficients. If you want to partition the numbers 1 through n into k subsets (or cycles), either 1 is in a subset (cycle) by itself or not.

More general arguments

Everything above has implicitly assumed n and k were positive, or at least non-negative, numbers. Let’s look first at how we might handle zero arguments, then negative arguments.

It works out best if we define S1(0, k) and S2(0, k) to be 1 if k is 0 and zero otherwise. Also we define S1(n, 0) and S2(n, 0) to be 1 if n is 0 and zero otherwise. (See Concrete Mathematics.)

When either n or k is zero the combinatoric interpretation is strained. If we let either be negative, there is no combinatorial interpretation, though it can still be useful to consider negative arguments, much like one does with binomial coefficients.

We can take the addition theorems above, which are theorems for non-negative arguments, and treat them as definitions for negative arguments. When we do, we get the following beautiful dual relationship between Stirling numbers of the first and second kinds:

\left\{{n \atop k} \right\}= \left[ {-k \atop -n} \right]

Related posts

Tetrahedral numbers

Start with the sequence of positive integers:

1, 2, 3, 4, …

Now take partial sums, the nth term of the new series being the sum of the first n terms of the previous series. This gives us the triangular numbers, so called because they count the number of coins at each level of a triangular arrangement:

1, 3, 6, 10, …

If we repeat this again we get the tetrahedral numbers, the number of balls on each level of a tetrahedral stack of balls.

1, 4, 10, 20, …

We can repeat this process and general define Tn, d, the nth tetrahedral number of dimension d, recursively. We define Tn, 1 = n and for d > 1,

T(n, d) = \sum_{i=1}^n T(i, d-1)

This is just a formalization of the discussion above.

It turns out there’s a simple expression for tetrahedral number of all dimensions:

T(n, d) = \binom{n+d-1}{d}

Here’s a quick proof by induction. The theorem clearly holds when n = 1 or d = 1. Now assume it hold for all n < m and d < m.

\begin{eqnarray*} T(n, m) &=& \sum_{i=1}^n T(n, m-1) \\ &=& T(n, m-1) + \sum_{i=1}^{n-1} T(n, m-1) \\ &=& T(n, m-1) + T(n-1, m) \\ &=& \binom{n+m-2}{m-1} + \binom{n+m-2}{m} \\ &=& \binom{n+m-1}{m} \end{eqnarray*}

The last line follows from the binomial addition identity

\binom{a}{b} = \binom{a-1}{b} + \binom{a-1}{b-1}

It also turns out that Tn, d is the number of ways to select d things from a set of n with replacement.

Related posts: