A Menger sponge is created by starting with a cube a recursively removing chunks of it. Draw a 3×3 grid on one face of the cube, then remove the middle square, all the way through the cube. Then do the same for each of the eight remaining squares. Repeat this process over and over, and do it for each face.

The holes are all rectangular, so it’s surprising that the geometry is so varied when you slice open a Menger sponge. For example, when you cut it on the diagonal, you can see stars! (I wrote about this here.)

I mentioned this blog post to a friend at Go 3D Now, a company that does 3D scanning and animation, and he created the video below. The video starts out by taking you through the sponge, then at about the half way part the sponge splits apart.

Harmonic numbers are sort of a discrete analog of logarithms since

As n goes to infinity, the difference between H_{n} and log n is Euler’s constant γ = 0.57721… [1]

How would you compute H_{n}? For small n, simply use the definition. But if n is very large, there’s a way to approximate H_{n} without having to do a large sum.

Since in the limit H_{n} – log n goes to γ, a crude approximation would be

But we could do much better by adding a couple terms to the approximation above. [2] That is,

The error in the approximation above is between 0 and 1/120n^{4}.

So if you used this to compute the 1000th harmonic number, the error would be less than one part in 120,000,000,000,000. Said another way, for n = 1000 the approximation differs from the exact value in the 15th significant digit, approximately the resolution of floating point numbers (i.e. IEEE 754 double precision).

And the formula is even more accurate for larger n. If we wanted to compute the millionth harmonic number, the error in our approximation would be somewhere around the 26th decimal place.

* * *

[1] See Julian Havil’s excellent Gamma: Exploring Euler’s Constant. It’s popular-level book, but more sophisticated than most such books.

[2] There’s a sequence of increasingly accurate approximations that keep adding reciprocals of even powers of n, based on truncating an asymptotic series. See Concrete Mathematics for details.

A few days ago I wrote about how powers of the golden ratio are nearly integers but powers of π are not. This post is similar but takes a little different perspective. Instead of looking at how close powers are to the nearest integers, we’ll look at how close they are to their floor, the largest integer below. Put another way, we’ll throw away the integer parts and look at the decimal parts.

First a theorem:

For almost all x > 1, the sequence (x^{n}) for n = 1, 2, 3, … is u.d. mod 1. [1]

Here “almost all” is a technical term meaning that the set of x‘s for which the statement above does not hold has Lebesgue measure zero. The abbreviation “u.d.” stands for “uniformly distributed.” A sequence uniformly distributed mod 1 if the fractional parts of the sequence are distributed like uniform random variables.

Even though the statement holds for almost all x, it’s hard to prove for particular values of x. And it’s easy to find particular values of x for which the theorem does not hold.

From [1]:

… it is interesting to note that one does not know whether sequences such as (e^{n}), (π^{n}), or even ((3/2)^{n}) are u.d. mod 1 or not.

Obviously powers of integers are not u.d. mod 1 because their fractional parts are all 0. And we’ve shown before that powers of the golden ratio and powers of the plastic constant are near integers, i.e. their fractional parts cluster near 0 and 1.

The curious part about the quote above is that it’s not clear whether powers of 3/2 are uniformly distributed mod 1. I wouldn’t expect powers of any rational number to be u.d. mod 1. Either my intuition was wrong, or it’s right but hasn’t been proved, at least not when [1] was written.

The next post will look at powers of 3/2 mod 1 and whether they appear to be uniformly distributed.

The ladder of abstractions in category theory starts with categories, then functors, then natural transformations. Unfortunately, natural transformations don’t seem very natural when you first see the definition. This is ironic since the original motivation for developing category theory was to formalize the intuitive notion of a transformation being “natural.” Historically, functors were defined in order to define natural transformations, and categories were defined in order to define functors, just the opposite of the order in which they are introduced now.

A category is a collection of objects and arrows between objects. Usually these “arrows” are functions, but in general they don’t have to be.

A functor maps a category to another category. Since a category consists of objects and arrows, a functor maps objects to objects and arrows to arrows.

A natural transformation maps functors to functors. Sounds reasonable, but what does that mean?

You can think of a functor as a way to create a picture of one category inside another. Suppose you have some category and pick out two objects in that category, A and B, and suppose there is an arrow f between A and B. Then a functor F would take A and B and give you objects FA and FB in another category, and an arrow Ff between FA and FB. You could do the same with another functor G. So the objects A and B and the arrow between them in the first category have counterparts under the functors F and G in the new category as in the two diagrams below.

A natural transformation α between F and G is something that connects these two diagrams into one diagram that commutes.

The natural transformation α is a collection of arrows in the new category, one for every object in the original category. So we have an arrow α_{A} for the object A and another arrow α_{B} for the object B. These arrows are called the components of α at A and B respectively.

Note that the components of α depend on the objects A and B but not on the arrow f. If f represents any other arrow from A to B in the original category, the same arrows α_{A} and α_{B} fill in the diagram.

Natural transformations are meant to capture the idea that a transformation is “natural” in the sense of not depending on any arbitrary choices. If a transformation does depend on arbitrary choices, the arrows α_{A} and α_{B} would not be reusable but would have to change when f changes.

The next post will discuss the canonical examples of natural and unnatural transformations.

I wondered what a collage of these images would look like, so I used the ImageQuilts software by Edward Tufte and Adam Schwartz to create the image below.

In my previous post, I discussed how number theory and topology relate to other areas of math. Part of that was to show a couple diagrams from Jean Dieudonné’s book Panorama of Pure Mathematics, as seen by N. Bourbaki. That book has only small star-shaped diagrams considering one area of math at a time. I’ve created a diagram that pastes these local views into one grand diagram. Along the way I’ve done a little editing because the original diagrams were not entirely consistent.

Here’s a condensed view of the graph. You can find the full image here.

The graph is so dense that it’s hard to tell which areas have the most or least connections. Here are some tables to clarify that. First, counting how many areas an particular area contributes to, i.e. number of outgoing arrows.

|-------------------------------------+---------------|
| Area | Contributions |
|-------------------------------------+---------------|
| Homological algebra | 12 |
| Lie groups | 11 |
| Algebraic and differential topology | 10 |
| Categories and sheaves | 9 |
| Commutative algebra | 9 |
| Commutative harmonic analysis | 9 |
| Algebraic geometry | 8 |
| Differential geometry and manifolds | 8 |
| Integration | 8 |
| Partial differential equations | 8 |
| General algebra | 7 |
| Noncommutative harmonic analysis | 6 |
| Ordinary differential equations | 6 |
| Spectral theory of operators | 6 |
| Analytic geometry | 5 |
| Automorphic and modular forms | 5 |
| Classical analysis | 5 |
| Mathematical logic | 5 |
| Abstract groups | 4 |
| Ergodic theory | 4 |
| Probability theory | 4 |
| Topological vector spaces | 4 |
| General topology | 3 |
| Number theory | 3 |
| Von Neumann algebras | 2 |
| Set theory | 1 |
|-------------------------------------+---------------|

Next, counting the sources each area draws on, i.e. counting incoming arrows.

|-------------------------------------+---------|
| Area | Sources |
|-------------------------------------+---------|
| Algebraic geometry | 13 |
| Number theory | 12 |
| Lie groups | 11 |
| Noncommutative harmonic analysis | 11 |
| Algebraic and differential topology | 10 |
| Analytic geometry | 10 |
| Automorphic and modular forms | 10 |
| Ordinary differential equations | 10 |
| Ergodic theory | 9 |
| Partial differential equations | 9 |
| Abstract groups | 8 |
| Differential geometry and manifolds | 8 |
| Commutative algebra | 6 |
| Commutative harmonic analysis | 6 |
| Probability theory | 5 |
| Categories and sheaves | 4 |
| Homological algebra | 4 |
| Spectral theory of operators | 4 |
| Von Neumann algebras | 4 |
| General algebra | 2 |
| Mathematical logic | 1 |
| Set theory | 1 |
| Classical analysis | 0 |
| General topology | 0 |
| Integration | 0 |
| Topological vector spaces | 0 |
|-------------------------------------+---------|

Finally, connectedness, counting incoming and outgoing arrows.

|-------------------------------------+-------------|
| Area | Connections |
|-------------------------------------+-------------|
| Lie groups | 22 |
| Algebraic geometry | 21 |
| Algebraic and differential topology | 20 |
| Noncommutative harmonic analysis | 17 |
| Partial differential equations | 17 |
| Differential geometry and manifolds | 16 |
| Homological algebra | 16 |
| Ordinary differential equations | 16 |
| Analytic geometry | 15 |
| Automorphic and modular forms | 15 |
| Commutative algebra | 15 |
| Commutative harmonic analysis | 15 |
| Number theory | 15 |
| Categories and sheaves | 13 |
| Ergodic theory | 13 |
| Abstract groups | 12 |
| General algebra | 10 |
| Spectral theory of operators | 10 |
| Probability theory | 9 |
| Integration | 8 |
| Mathematical logic | 6 |
| Von Neumann algebras | 6 |
| Classical analysis | 5 |
| Topological vector spaces | 4 |
| General topology | 3 |
| Set theory | 2 |
|-------------------------------------+-------------|

There are some real quirks here. The most foundational areas get short shrift. Set theory contributes to only one area of math?! Topological vector spaces don’t depend on anything, not even topology?!

I suspect Dieudonné had in mind fairly high-level contributions. Topological vector spaces, for example, obviously depend on topology, but not deeply. You could do research in the area while seldom drawing on more than an introductory topology course. Elementary logic and set theory are used everywhere, but most mathematicians have no need for advanced logic or set theory.

Areas of math all draw on and contribute to each other. But there’s a sort of trade imbalance between areas. Some, like analytic number theory, are net importers. Others, like topology, are net exporters.

Analytic number theory uses the tools of analysis, especially complex analysis, to prove theorems about integers. The first time you see this it’s quite a surprise. But then you might expect that since analysis contributes to number theory, then number theory must contribute to analysis. But it doesn’t much.

Topology imports ideas from algebra. But it exports more than in imports, to algebra and to other areas. Topology started as a generalization of geometry. Along the way it developed ideas that extend far beyond geometry. For example, computer science, which ostensibly has nothing to do with whether you can continuously deform one shape into another, uses ideas from category theory that were developed initially for topology.

Here’s how Jean Dieudonné saw things. The following are my reconstructions of a couple diagrams from his book Panorama of Pure Mathematics, as seen by N. Bourbaki. An arrow from A to B means that A contributes to B, or B uses A.

For number theory, some of Dieudonné’s arrows go both ways, some only go into number theory. No arrows go only outward from number theory.

With topology, however, there’s a net flux out arrows going outward.

These diagrams are highly subjective. There’s plenty of room for disagreement. Dieudonné wrote his book 35 years ago, so you might argue that they were accurate at the time but need to be updated. In any case, the diagrams are interesting.

Update: See the next post of a larger diagram, sewing together little diagrams like the ones above.

I’d never given any thought to the order of polar coordinates until yesterday. They’re written (r, θ). That’s just how it is. But a friend pointed out two reasons why this bothers him.

First, r is typically a function of θ, just as y is typically a function of x. But in rectangular coordinates, the independent variable is the first element of a pair while in polar coordinates it is the second element.

Second, if you’re going to walk a mile northwest, how do you proceed? You first face northwest, then you walk for a mile. You don’t walk a mile to the east and then walk 135° counterclockwise along an arc centered where you started. That is, you set your θ first, then increase your r.

The (r, θ) convention isn’t going to change. Maybe the only take-away from this discussion is to appreciate someone’s confusion who sees polar coordinates for the first time and wants to reverse their order.

(I don’t use j for imaginary unit, except when writing Python code. The i notation is too firmly engraved in my brain. But I understand why j has advantages.)

You can associate a matrix with a graph and find the eigenvalues of that matrix. This gives you a spectrum associated with the graph. This spectrum tells you something about the graph analogous to the way the spectrum of a star’s light tells you something about the chemical composition of the start.

So maybe it would be useful to visualize graph spectra the way you visualize chemical spectra. How could you do this?

First, scale the spectrum of the graph to the visual spectrum. There are different kinds of matrices you can associate with a graph, and these have different natural ranges. But if we look at the spectrum of the Laplacian matrix, the eigenvalues are between 0 and n for a graph with n nodes.

Eigenvalues typically correspond to frequencies, not wavelengths, but chemical spectra are often displayed in terms of wave length. We can’t be consistent with both, so we’ll think of our eigenvalues as wavelengths. We could easily redo everything in terms of frequency.

This means we map 0 to violet (400 nm) and n to red (700 nm). So an eigenvalue of λ will be mapped to 400 + 300 λ/n. That tells us where to graph a spectral line, but what color should it be? StackOverflow gives an algorithm for converting wavelength to RGB values. Here’s a little online calculator based on the same code.

Here are some examples. First we start with a random graph.

In 1838, Rev. Henry Moseley discovered that a large number of mollusk shells and other shells can be described using three parameters: k, T, and D.

First imagine a thin wire running through the coil of the shell. In cylindrical coordinates, this wire follows the parameterization

r = e^{kθ} z = Tt

If T = 0 this is a logarithmic spiral in the (r, θ) plane. For positive T, the spiral is stretched so that its vertical position is proportional to its radius.

Next we build a shell by putting a tube around this imaginary wire. The radius R of the tube at each point is proportional to the r coordinate: R = Dr.

The image above was created using k = 0.1, T = 2.791, and D = 0.8845 using Øyvind Hammer’s seashell generating software. You can download Hammer’s software for Windows and experiment with your own shell simulations by adjusting the parameters.

In his new book The Perfect Shape, Øyvind Hammer shows how to create a graph something like the poster for Alfred Hitchcock’s movie Vertigo.

Hammer’s code uses a statistical language called Past that I’d never heard of. Here’s my interpretation of his code using Python.

import matplotlib.pyplot as plt
from numpy import arange, sin, cos, exp
i = arange(5000)
x1 = 1.0*cos(i/10.0)*exp(-i/2500.0)
y1 = 1.4*sin(i/10.0)*exp(-i/2500.0)
d = 450.0
vx = cos(i/d)*x1 - sin(i/d)*y1
vy = sin(i/d)*x1 + cos(i/d)*y1
plt.plot(vx, vy, "k")
h = max(vy) - min(vy)
w = max(vx) - min(vx)
plt.axes().set_aspect(w/h)
plt.show()

This code produces what’s called a harmonograph, related to the motion of a pendulum free to move in x and y directions:

It’s not exactly the same as the movie poster, but it’s definitely similar. If you find a way to modify the code to make it closer to the poster, leave a comment below.

The other day I ran across a blog post by Brian Hayes that linked to an article by David Cantrell on how to compute the inverse of the gamma function. Cantrell gives an approximation in terms of the Lambert W function.

In this post we’ll write a little Python code to kick the tires on Cantrell’s approximation. The post also illustrates how to do some common tasks using SciPy and matplotlib.

Here are the imports we’ll need.

import matplotlib.pyplot as plt
from scipy import pi, e, sqrt, log, linspace
from scipy.special import lambertw, gamma, psi
from scipy.optimize import root

First of all, the gamma function has a local minimum k somewhere between 1 and 2, and so it only makes sense to speak of its inverse to the left or right of this point. Gamma is strictly increasing for real values larger than k.

To find k we look for where the derivative of gamma is zero. It’s more common to work with the derivative of the logarithm of the gamma function than the derivative of the gamma function itself. That works just as well because gamma has a minimum where its log has a minimum. The derivative of the log of the gamma function is called ψ and is implemented in SciPy as scipy.special.psi. We use the function scipy.optimize.root to find where ψ is zero.

The root function returns more information than just the root we’re after. The root(s) are returned in the arrayx, and in our case there’s only one root, so we take the first element of the array:

How well goes this algorithm work? For starters, we’ll see how well it does when we do a round trip, following the exact gamma with the approximate inverse.

x = linspace(5, 30, 100)
plt.plot(x, AIG(gamma(x)))
plt.show()

This produces the following plot:

We get a straight line, as we should, so next we do a more demanding test. We’ll look at the absolute error in the approximate inverse. We’ll use a log scale on the x-axis since gamma values get large quickly.

y = gamma(x)
plt.plot(y, x- AIG(y))
plt.xscale("log")
plt.show()

This shows the approximation error is small, and gets smaller as its argument increases.

Cantrell’s algorithm is based on an asymptotic approximation, so it’s not surprising that it improves for large arguments.

The following quote gives a good description of the value of abstract mathematics. The quote speaks specifically of “universal algebra,” but consistent with the spirit of the quote you could generalize it to other areas of mathematics, especially areas such as category theory.

Universal algebra is the study of features common to familiar algebraic systems … [It] places the algebraic notions in their proper setting; it often reveals connexions between seemingly different concepts and helps to systemize one’s thoughts. … [T]his approach does not usually solve the whole problem for us, but only tidies up a mass of rather trivial detail, allowing us to concentrate our powers on the hard core of the problem.

Suppose you fill the components of a matrix with random values. How are the eigenvalues distributed?

We limit our attention to large, symmetric matrices. We fill the entries of the matrix on the diagonal and above the diagonal with random elements, then fill in the elements below the diagonal by symmetry.

If we choose our random values from a thin-tailed distribution, then Wigner’s semicircle law tells us what to expect from our distribution. If our matrices are large enough, then we expect the probability density of eigenvalues to be semicircular. To illustrate this, we’ll fill a matrix with samples from a standard normal distribution and compute its eigenvalues with the following Python code.

import numpy as np
import matplotlib.pyplot as plt
N = 5000
A = np.random.normal(0, 1, (N, N))
B = (A + A.T)/np.sqrt(2)
eigenvalues = np.linalg.eigvalsh(B)
print(max(eigenvalues), min(eigenvalues))
plt.hist(eigenvalues, bins=70)
plt.show()

We first create an N by N non-symmetric matrix, then make it symmetric by adding it to its transpose. (That’s easier than only creating the upper-triangular elements.) We divide by the square root of 2 to return the variance of each component to its original value, in this case 1. The eigenvalues in this particular experiment ran from -141.095 to 141.257 and their histogram shows the expected semi-circular shape.

Wigner’s semicircle law does not require the samples to come from a normal distribution. Any distribution with finite variance will do. We illustrate this by replacing the normal distribution with a Laplace distribution with the same variance and rerunning the code. That is, we change the definition of the matrix A to

A = np.random.laplace(0, np.sqrt(0.5), (N, N))

and get very similar results. The eigenvalues ran from -140.886 to 141.514 and again we see a semicircular distribution.

But what happens when we draw samples from a heavy-tailed distribution, i.e. one without finite variance? We’ll use a Student-t distribution with ν = 1.8 degrees of freedom. When ν > 2 the t-distribution has variance ν/(ν – 2), but for smaller values of ν it has no finite variance. We change the definition of the matrix A to the following:

A = np.random.standard_t(1.8, (N, N))

and now the distribution is quite different.

In this case the minimum eigenvalue was -9631.558 and the largest was 9633.853. When the matrix components are selected from a heavy-tailed distribution, the eigenvalues also have a heavier-tailed distribution. In this case, nearly all the eigenvalues are between -1000 and 1000, but the largest and smallest were 10 times larger. The eigenvalues are more variable, and their distribution looks more like a normal distribution and less like a semicircle.

The distributions for all thin-tailed symmetric matrices are the same. They have a universal property. But each heavy-tailed distribution gives rise to a different distribution on eigenvalues. With apologies to Tolstoy, thin-tailed matrices are all alike; every thick-tailed matrix is thick-tailed in its own way.

Update: As the first comment below rightfully points out, the diagonal entries should be divided by 2, not sqrt(2). Each of the off-diagonal elements of A + A^{T} are the sum of two independent random variables, but the diagonal elements are twice what they were in A. To put it another way, the diagonal elements are the sum of perfectly correlated random variables, not independent random variables.

I reran the simulations with the corrected code and it made no noticeable difference, either to the plots or to the range of the eigenvalues.