Connecting the FFT and quadratic reciprocity

Some readers will look at the title of this post and think “Ah yes, the FFT. I use it all the time. But what is this quadratic reciprocity?”

Others will look at the same title and think “Gauss called the quadratic reciprocity theorem the jewel in the crown of mathematics. But what is this FFT thing? I think I remember an engineer saying something about that.”

Gauss proved a theorem that relates quadratic reciprocity and the FFT. For distinct odd primes p and q, the following equation holds.

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = \frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q}

I’ll spend the rest of this post unpacking this equation.

Legendre symbols

The expressions on the left are not fractions but rather Legendre symbols. The parentheses are not for grouping but are part of the symbol.

The Legendre symbol

\left(\frac{a}{r}\right)

is defined to be 0 if a is a multiple of r, 1 if a has a square root mod r, and −1 otherwise.
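For an odd prime r, the Legendre symbol can be computed with Euler’s criterion: (a/r) ≡ a^((r−1)/2) (mod r). Here is a minimal sketch in Python (the function name is my own):

```python
def legendre(a, r):
    # Euler's criterion: a^((r-1)/2) mod r equals 0, 1, or r-1 (i.e. -1) for odd prime r
    t = pow(a, (r - 1) // 2, r)
    return t - r if t == r - 1 else t

print(legendre(2, 7))   # 1: 2 is a square mod 7, since 3*3 = 2 mod 7
print(legendre(3, 7))   # -1: 3 has no square root mod 7
```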

Fourier transforms

The Discrete Fourier Transform (DFT) of a vector of length n multiplies the vector by the n × n Fourier matrix Fn whose j, k entry is equal to exp(2πi jk / n). The Fast Fourier Transform (FFT) is a way to compute the DFT more quickly than directly multiplying by the Fourier matrix. Since the DFT is nearly always computed using the FFT algorithm, the DFT is commonly referred to as the FFT.
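To make this concrete, here is a sketch comparing multiplication by a Fourier matrix with NumPy’s FFT. Note that np.fft.fft uses the opposite sign convention, exp(−2πi jk/n), so the matrix below is the conjugate of the one defined above.

```python
import numpy as np

n = 8
J, K = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(-2j * np.pi * J * K / n)  # conjugate of the matrix in the text

x = np.arange(n, dtype=float)
print(np.allclose(F @ x, np.fft.fft(x)))  # True
```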

Matrix trace

The trace of a matrix is the sum of the elements along the main diagonal. So the trace of the Fourier matrix of size n is

\text{Tr} {\cal F}_n = \sum_{j=1}^n \exp(2\pi ij^2/n)

Numerical illustration

The quadratic reciprocity theorem, also due to Gauss, is usually stated as

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}

We can illustrate the theorem at the top of the page numerically with the following Python code, using the quadratic reciprocity theorem to evaluate the product of the Legendre symbols.

from numpy import exp, pi

tr = lambda p: sum(exp(2j*pi*k**2/p) for k in range(1, p+1))  # trace of the Fourier matrix
p, q = 59, 17
print( tr(p*q)/(tr(p)*tr(q)) )
print( (-1)**((p-1)*(q-1)//4) )  # integer division keeps the exponent an integer

The first print statement produces (0.9999999999998136-1.4048176871018313e-13j) due to loss of precision in the floating point calculations, but this is essentially 1, which is what the second print statement produces.

If we change q to 19, both statements print −1 (after rounding the first result).

Quadratic Gauss sum

We can quickly prove

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = \frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q}

if we assume the quadratic reciprocity theorem and the following equation for the trace of the Fourier matrix.

\text{Tr} {\cal F}_n = \sum_{j=1}^n \exp(2\pi ij^2/n) =
\left\{
	\begin{array}{ll}
		\sqrt{n} & \mbox{if } n \equiv 1 \bmod{4} \\
		0 & \mbox{if } n \equiv 2 \bmod{4} \\
		i\sqrt{n} & \mbox{if } n \equiv 3 \bmod{4} \\
		(1+i)\sqrt{n} & \mbox{if } n \equiv 0 \bmod{4}
	\end{array}
\right.
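We can sanity-check this formula numerically, using the same sum as in the code above, with one n from each residue class mod 4:

```python
from numpy import exp, pi, isclose, sqrt

tr = lambda n: sum(exp(2j*pi*j**2/n) for j in range(1, n+1))

print(isclose(tr(5), sqrt(5)))         # True: n = 1 mod 4
print(isclose(tr(6), 0))               # True: n = 2 mod 4
print(isclose(tr(7), 1j*sqrt(7)))      # True: n = 3 mod 4
print(isclose(tr(8), (1+1j)*sqrt(8)))  # True: n = 0 mod 4
```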

This proof is historically backward. It assumes quadratic reciprocity, but Gauss proved quadratic reciprocity by first proving the equation we’re trying to prove. He then obtained the expression on the right hand side of the quadratic reciprocity theorem using the equation above for the trace of the Fourier matrix.

The trace of the Fourier matrix is now called a quadratic Gauss sum. It’s a special case of more general sums that Gauss studied, motivated by his exploration of quadratic reciprocity.

Incidentally, Gauss gave many proofs of the quadratic reciprocity theorem. I don’t know where the proof outlined here fits into the sequence of proofs he developed.


Two-digit zip codes

It’s common to truncate US zip codes to the first three digits for privacy reasons. Truncating to the first two digits is less common, but occurs in some data sets.

HIPAA Safe Harbor requires sparsely populated 3-digit zip codes to be suppressed, because even when rolled up to three digits some regions are still sparsely populated.

How sparse can a two-digit zip code region be? Empty. The population of all zip codes starting with 09 is zero. That’s because all zip codes of the form 09XXX are overseas military bases and not anyone’s permanent residence.

Aside from that, the least populated 2-digit zip code is 69 with a population of about 175,000.

The most populated 2-digit zip code is 33 with a population of about 10,500,000.

So the ratio of the largest population to the smallest non-zero population is about 60.

What if we roll up all the way to 1-digit zip codes? Now things are more evenly distributed. The smallest region is 5 with a population of about 17.5M and the largest is 9 with a population of about 54M. A ratio of about 3.


Bessel zero spacing

Bessel functions are to polar coordinates what sines and cosines are to rectangular coordinates. This is why Bessel functions often arise in applications with radial symmetry.

The locations of the zeros of Bessel functions are important in applications, and so you can find software for computing these zeros in mathematical libraries. In days gone by you could find them in printed tables, such as Table 9.5 in Abramowitz and Stegun (A&S).

Bessel functions are solutions to Bessel’s differential equation,

x^2 y'' + x y' + (x^2 - \nu^2) y = 0

For each ν the functions Jν and Yν, known as the Bessel functions of the first and second kind respectively, form a basis for the solutions to Bessel’s equation. These functions are analogous to cosine and sine.

As x → ∞, Bessel functions asymptotically behave like damped sinusoidal waves. Specifically,

\begin{align*} J_\nu(x) &\sim \sqrt{\frac{2}{\pi x}} \cos(x - \pi\nu/2 - \pi/4) \\ Y_\nu(x) &\sim \sqrt{\frac{2}{\pi x}} \sin(x - \pi\nu/2 - \pi/4) \end{align*}

So since for large x Bessel functions behave something like sin(x), you’d expect the spacing between consecutive zeros to approach π, and this is indeed the case.

We can say more. If ν² > ¼ then the spacing between zeros decreases toward π, and if ν² < ¼ the spacing between zeros increases toward π. This is not just true of Jν and Yν but also of their linear combinations, i.e. of any solution of Bessel’s equation with parameter ν.

If you look carefully, you can see this in the plots of J0 and J1 below. The solid blue curve, the plot of J0, crosses the x-axis at points closer together than π, and the dashed orange curve, the plot of J1, crosses the x-axis at points further apart than π.
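The claim is easy to check numerically. The sketch below finds the first several zeros of J0 by bisection, using the integral representation J0(x) = (1/π) ∫₀^π cos(x sin t) dt so that only NumPy is needed; scipy.special.jn_zeros would do this directly.

```python
import numpy as np

def j0(x):
    # Midpoint rule for J0(x) = (1/pi) * integral of cos(x sin t) over [0, pi]
    t = (np.arange(2000) + 0.5) * np.pi / 2000
    return np.mean(np.cos(x * np.sin(t)))

# Scan for sign changes of J0, then refine each zero by bisection
xs = np.linspace(0.5, 35, 2000)
vals = np.array([j0(x) for x in xs])
zeros = []
for i in np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]:
    lo, hi = xs[i], xs[i + 1]
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if j0(lo) * j0(mid) <= 0:
            hi = mid
        else:
            lo = mid
    zeros.append(0.5 * (lo + hi))

print(np.diff(zeros))  # spacings are below pi and increase toward pi
```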

For more on the spacing of Bessel zeros see [1].


[1] F. T. Metcalf and Milos Zlamal. On the Zeros of Solutions to Bessel’s Equation. The American Mathematical Monthly, Vol. 73, No. 7, pp. 746–749

Coloring the queen’s graph

Suppose we have an n × n chessboard. The case n = 8 is of course most common, but we consider all positive integer values of n.

The graph of a chess piece has an edge between two squares if and only if the piece can legally move between the two squares.

Now suppose we have a rule that if a piece can move between two squares, the squares must be colored differently. The minimum number of colors necessary to color the chessboard this way is the chromatic number of that piece’s graph.

The chromatic number of a rook is clearly at least n: since a rook can move between any two squares in a row, every square in the row must be a different color. And in fact the chromatic number for a rook’s graph is exactly n.

Bishops are a little more complicated. If n is even, then the chromatic number for a bishop is n. If n is odd, the number is n for a white bishop but n − 1 for a black bishop. Thinking of a 3 × 3 board shows why the two bishops are different.

Now a queen is sorta like a rook and a bishop combined, but determining the chromatic number for a queen’s graph is more difficult. The chromatic number of a queen can be no less than that of a rook since the former can make all the moves of the latter. So the queen’s chromatic number is at least n, but is it equal to n?

The results stated above can be found in [1].

Let k(n) be the chromatic number of a queen’s graph on an n × n chessboard. The results in [1] regarding k(n) are summarized as follows.

  • k(1) = 1
  • k(2) = 4
  • k(3) = k(4) = 5
  • k(6) = 7
  • k(n) = n if n is not divisible by 2 or 3.
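The last bullet can be checked constructively. When n is coprime to 6, the coloring c(i, j) = (i + 2j) mod n uses distinct colors along every row, column, and diagonal. (This particular construction is my own illustration; the paper’s argument may differ.) A brute-force verification:

```python
from math import gcd

def queen_coloring(n):
    # c(i, j) = (i + 2j) mod n; proper when gcd(n, 6) = 1
    assert gcd(n, 6) == 1
    return [[(i + 2 * j) % n for j in range(n)] for i in range(n)]

def is_proper(color):
    # every row, column, diagonal, and anti-diagonal must use distinct colors
    n = len(color)
    lines  = [[(i, j) for j in range(n)] for i in range(n)]
    lines += [[(i, j) for i in range(n)] for j in range(n)]
    lines += [[(i, j) for i in range(n) for j in range(n) if i + j == s]
              for s in range(2 * n - 1)]
    lines += [[(i, j) for i in range(n) for j in range(n) if i - j == d]
              for d in range(-n + 1, n)]
    return all(len({color[i][j] for i, j in line}) == len(line) for line in lines)

print(is_proper(queen_coloring(5)))  # True
print(is_proper(queen_coloring(7)))  # True
```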

The authors stated that they had found an example proving that k(8) ≤ 9 and conjectured that k(8) = 9. You can find a C++ program that confirms this conjecture here.

The authors conjectured that k(n) ≤ n + 1 for all n > 3. I don’t know whether this has been proven. My impression following a brief search is that the problem of finding k(n) for all n is still not fully resolved.

Update: According to a link in the comments, n = 27 is the smallest n such that k(n) is unknown.


[1] M. R. Iyer and V. V. Menon. On Coloring the n × n Chessboard. The American Mathematical Monthly, 1966, Vol. 73, pp. 721–725.

Regex to match SWIFT-BIC codes

A SWIFT-BIC number identifies a bank, not a particular bank account. The BIC part stands for Bank Identifier Code.

I had to look up the structure of SWIFT-BIC codes recently, and here it is:

  • Four letters to identify the bank
  • Two letters to identify the country
  • Two letters or digits to identify the location
  • Optionally, three letters or digits to identify a branch

Further details are given in the ISO 9362 standard.

We can use this as an example to illustrate several regular expression features, and how regular expressions are used in practice.

Regular expressions

If your regular expression flavor supports listing a number of repetitions in braces, you could write the above format as

    [A-Z]{6}[A-Z0-9]{2,5}

This would work, for example, with egrep but not with grep. YMMV.

That’s concise, but a little too permissive. It allows anywhere from 2 to 5 alphanumeric characters on the end. But the standard says 2 or 5 alphanumeric characters after the country code, not between 2 and 5. For example, 3 characters after the country code would not be valid. So we could reduce our false positive rate a little by changing the regex to

    [A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?$

Without the dollar sign on the end, ABCDEF12X would still match because the part of the regex up to the optional ([A-Z0-9]{3})? at the end would match at the beginning of the string. The dollar sign marks the end of the string, so it says the code has to end either after 8 or 11 characters and stop.
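A quick check with Python’s re module illustrates the difference (re.match anchors at the start of the string):

```python
import re

bad  = r"[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?"
good = bad + "$"

print(bool(re.match(bad,  "ABCDEF12X")))  # True: matches the first 8 characters
print(bool(re.match(good, "ABCDEF12X")))  # False: string doesn't end after 8 or 11
```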

If your regex flavor does not support counts in braces, you could spell everything out:

    [A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z0-9][A-Z0-9]([A-Z0-9]{3})?$

Convenience versus accuracy

If you want to match only valid SWIFT-BIC codes, you can get perfect accuracy by checking against an exhaustive list of SWIFT-BIC codes. You could even write a regular expression that matches codes on this list and only codes on the list, but what would the point be? Regular expressions usually trade accuracy for convenience.

I don’t have a list of all valid SWIFT-BIC codes. If I did, it might be out of date by the time I download it. But if I’m trying to pull bank codes out of a text file, the regex

    [A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?$

is likely to do a pretty good job. Regular expressions are usually used in a context where there’s some tolerance for error. Maybe you use a regular expression to do a first pass, then weed out the mismatches with a manual review.

Capturing parts

Maybe you want to do more than just find SWIFT codes. Maybe you want to look at their pieces.

For example, the fifth and sixth characters of a SWIFT code are the ISO 3166 two-letter abbreviation for the country the bank is in. (With one exception: XK represents Kosovo, which does not have an ISO 3166 code.)

You could replace

    [A-Z]{6}

at the front of the regular expression with

    [A-Z]{4}([A-Z]{2})

which will not change which strings match, but it will store the fifth and sixth characters as the first captured group. How you access captured groups varies between regular expression implementations.
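In Python, for example, captured groups are available through the match object. (DEUTDEFF is a commonly cited example code, for Deutsche Bank in Germany.)

```python
import re

swift = re.compile(r"[A-Z]{4}([A-Z]{2})[A-Z0-9]{2}([A-Z0-9]{3})?$")

m = swift.match("DEUTDEFF500")
print(m.group(1))  # DE: the country code
print(m.group(2))  # 500: the optional branch code
```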

Legibility

The first proposed regular expression

    [A-Z]{6}[A-Z0-9]{2,5}

is easy to read, at least in my opinion. It has grown over the course of this post to

    [A-Z]{4}([A-Z]{2})[A-Z0-9]{2}([A-Z0-9]{3})?$

which is not as easy to read. This is typical: you start with a quick-and-dirty regular expression, then refine it until it meets your needs. Regular expressions tend to get uglier as they become more precise.

There are ways to make regular expressions more readable by using something like the /x modifier in Perl, which lets you insert white space and comments inside a regular expression.

That’s nice, but it’s also a little odd. If you’re going to use a complicated regular expression in production code, then you should format it nicely and add comments. But then you have to ask why you’re using a complicated regular expression in production code. I’m not saying this is never appropriate, but it’s not the most common use case.

I could imagine using a simple regular expression when you want quick and dirty, and using an exhaustive list of SWIFT codes in production. A complex, well-commented regular expression seems to fall into a sort of no man’s land in between.

Bad takes on chaos theory

I just finished reading The Three Body Problem. At the end of the book is a preview of Cixin Liu’s book Supernova Era. A bit of dialog in that preview stood out to me because it touches on themes I’ve written about before.

“I’ve heard about that. When a butterfly flaps its wings, there’s a hurricane on the other side of the world.”

“That’s right,” Specs said, nodding. “A chaotic system.”

Huahua said, “I want to be that butterfly.”

Specs shook his head again. “You don’t understand at all. We’re all butterflies, just like every butterfly. Every grain of sand and every drop of rain is a butterfly. That’s why the world is unpredictable.”

Most popular interpretations of chaos theory are misguided. Two such misguided interpretations are illustrated in the passage above.

When hearing of chaos theory, many jump to the same conclusion as Huahua in the excerpt above, who wants to be the butterfly that starts a hurricane. They think chaos theory implies that butterfly effects can be engineered. This is the optimistic fallacy.

Sometimes a small deliberate effort can lead to a large intended effect. But chaos theory would not predict this. In fact, reasoning by analogy from chaos theory would suggest this is impossible. More on that here and here.

Another misguided interpretation of chaos theory is the pessimistic fallacy that “the world is unpredictable,” as Specs says above. But we know that’s not true. Some aspects of the world are very predictable. As Orphan Annie says, the sun will come out tomorrow.

Even people who say the world is unpredictable don’t live as if the world were unpredictable. Deep down they know full well that in important ways the world is predictable.


It’s true that not everything is as predictable as we may have imagined, weather being a famous example. Chaos theory was born out of the surprising observation that weather simulations are very sensitive to changes in initial conditions.

We do not live in a world in which we can tickle a particular butterfly in order to deliberately direct the course of the future. But neither do we live in a world without discernible causes and effects.

New Ways To Make Code Run Faster


The news from Meta last week is a vivid reminder of the importance of making code run faster and more power-efficiently. Meta intends to purchase 350,000 Nvidia H100 GPUs this year [1]. Assuming 350W TDP [2] and $0.1621 per kWh [3] average US energy cost, one expects a figure of $174 million per year in electricity expenses just to power the GPUs (a rough estimate of the actual cost). For this and many other datacenter-scale and real-time critical applications, every bit of increased performance can be impactful.

Many approaches can be taken to making code run faster. The excellent book Hacker’s Delight contains many tricks for speeding up low-level code kernels [4]. Also, superoptimization techniques find the very fastest performing implementations of short, loop-free code kernels by exhaustive search [5].
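As a tiny example of the genre, here is one classic Hacker’s Delight-style identity: x & (x − 1) clears the lowest set bit of x, which gives a branch-light power-of-two test (shown in Python for illustration).

```python
def is_power_of_two(x):
    # x & (x - 1) clears the lowest set bit; powers of two have exactly one set bit
    return x > 0 and (x & (x - 1)) == 0

print([x for x in range(1, 33) if is_power_of_two(x)])  # [1, 2, 4, 8, 16, 32]
```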

Since exhaustive search scales exponentially with code size, it’s no surprise that other methods have been tried. Recent work with reinforcement learning reduces the number of scalar multiples needed for matrix products, discovering new Strassen-like algorithms [6]. Other work focuses more on hardware design; for example, PrefixRL optimizes chip design for targeted problems using reinforcement learning [7].

Finding the absolute fastest code for a given task is in general NP-complete [8]. This problem is out of reach for tractable solution by such methods. However, phenomenal progress in SAT/SMT solver efficiency for NP-complete problems has been made over the last 20 years (someone has even quipped, “NP is the new P” [9]). And indeed, these methods have been applied to superoptimization problems [10, 11]. Perhaps methodologies based on efficient SAT and SMT solvers will afford further opportunities for advancement.

The need for speed and power efficiency has become so acute that radically different non-von Neumann processors are being built [12], in some cases orders of magnitude more power efficient but restrictive in the problems they can solve. In the coming years we can expect to see a huge amount of activity and development on this important problem.

[1] “Meta will have 350,000 of Nvidia’s fastest AI GPUs by end of year, buying AMD’s MI300, too,” Tom’s Hardware, Jan. 19, 2024, https://www.tomshardware.com/tech-industry/meta-will-have-350000-of-nvidias-fastest-ai-gpus-by-end-of-year-buying-amds-mi300-too

[2] “NVIDIA H100 Tensor Core GPU Datasheet,” https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet

[3] “Electricity Rates for Every State in The U.S.,” https://www.energybot.com/electricity-rates/

[4] Henry Warren, “Hacker’s Delight,” 2nd Edition, 2012, https://web.archive.org/web/20190915025154/http://www.hackersdelight.org/

[5] “Superoptimization”, https://en.wikipedia.org/wiki/Superoptimization

[6] Fawzi, A., Balog, M., Huang, A. et al. “Discovering faster matrix multiplication algorithms with reinforcement learning,” Nature 610, 47–53 (2022). https://doi.org/10.1038/s41586-022-05172-4

[7] Rajarshi Roy et al., “PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning,” https://arxiv.org/abs/2205.07000

[8] Hennessy, John L., and Thomas Gross. “Postpass code optimization of pipeline constraints.” ACM Transactions on Programming Languages and Systems (TOPLAS) 5, no. 3 (1983): 422-448.

[9] Arie Gurfinkel, “Algorithms for SAT,” https://ece.uwaterloo.ca/~agurfink/ece750t29/assets/pdf/02_SAT.pdf

[10] Abhinav Jangda and Greta Yorsh. 2017. “Unbounded superoptimization,” in Proceedings of the 2017 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! 2017). Association for Computing Machinery, New York, NY, USA, 78–88. https://doi.org/10.1145/3133850.3133856

[11] Phitchaya Mangpo Phothilimthana, Aditya Thakur, Rastislav Bodik, and Dinakar Dhurjati. 2016. “GreenThumb: superoptimizer construction framework,” in Proceedings of the 25th International Conference on Compiler Construction (CC 2016). Association for Computing Machinery, New York, NY, USA, 261–262. https://doi.org/10.1145/2892208.2892233

[12] “OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman,” Wired, Dec. 3, 2023, https://www.wired.com/story/openai-buy-ai-chips-startup-sam-altman/

 

Brute force cryptanalysis

A naive view of simple substitution ciphers is that they are secure because there are 26! ways to permute the English alphabet, and so an attacker would have to try 26! ≈ 4 × 10²⁶ permutations. However, such brute force is not required. In practice, simple substitution ciphers are breakable by hand in a few minutes, and you can find software that automates the process.

However, for modern encryption, apparently brute force is required. If you encrypt a message using AES with a 128-bit key, for example, you can’t do much better than try 2¹²⁸ keys. You might be able to do a little better, but as far as is openly known, you can’t do orders of magnitude better.

Even for obsolete encryption methods such as DES it still takes a lot more effort to break encryption than to apply encryption. The basic problem with DES is that it used 56-bit keys, and trying 2⁵⁶ keys is feasible [1]. You won’t be able to do it on your laptop, but it can be done using many processors in parallel [2]. Still, you’d need more than a passing curiosity about a DES encrypted message before you’d go to the time and expense of breaking it.

If breaking a simple substitution cipher really did require brute force, it would offer 88-bit security. That is, 26! roughly equals 2⁸⁸. So any cipher offering b-bit security for b > 88 is more secure in practice than breaking simple substitution ciphers would be in naive theory. This would include AES, as well as many of its competitors that weren’t chosen for the standard, such as Twofish.
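The 88-bit figure is simply the base-2 log of 26!, which is easy to check:

```python
from math import factorial, log2

# bits of security if breaking simple substitution required brute force
print(log2(factorial(26)))  # about 88.4
```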

For all the block ciphers mentioned here, the number of bits of security they offer is equal to the size of the key in bits. This isn’t always the case. For example, the security level of an RSA key is much less than the size of the key, and the relation between key size and security level is nonlinear.

A 1024-bit RSA modulus is believed to offer on the order of 87 bits of security, which incidentally is comparable to 26! as mentioned above. NIST FIPS 186-5 recommends 2048 bits as the minimum RSA modulus size. This gives about 117 bits of security.

The security of RSA depends on the difficulty of factoring the product of large primes [3], and so you can compute the security level of a key based on the efficiency of the best known factoring algorithm, which is currently the General Number Field Sieve. More on this here.
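The usual way to turn this into numbers is the heuristic GNFS running time exp((64/9)^{1/3} (ln N)^{1/3} (ln ln N)^{2/3}). Taking the base-2 log of that work factor reproduces the figures above, roughly; this is a back-of-the-envelope sketch, not a precise security estimate.

```python
from math import log

def rsa_security_bits(modulus_bits):
    # Heuristic GNFS work factor exp((64/9)^(1/3) (ln N)^(1/3) (ln ln N)^(2/3)),
    # expressed in bits (log base 2)
    lnN = modulus_bits * log(2)
    work = (64 / 9) ** (1 / 3) * lnN ** (1 / 3) * log(lnN) ** (2 / 3)
    return work / log(2)

print(round(rsa_security_bits(1024)))  # 87
print(round(rsa_security_bits(2048)))  # 117
```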


[1] There are ways to do better than brute force against DES, if you have an enormous number of messages all encrypted with the same key.

[2] In 1998, the EFF built a machine called Deep Crack with 1,856 custom processors that could crack a DES-encoded message in at most nine days, four and a half days on average.

[3] Nobody has proved that breaking RSA requires factoring. There is a variation on RSA that is provably as hard as factoring but as far as I know it has never been widely used.

Straddling checkerboard encryption

Introduction

Computers fundamentally changed cryptography, opening up new possibilities for making and breaking codes. At first it may not have been clear which side benefited most, but now it’s clear that computers gave more power to code makers than code breakers.

We now have cryptographic primitives that cannot be attacked more efficiently than by brute force, as far as we know. The weak link is how these primitives are implemented and combined, not the primitives themselves.

Before computers there was more of a cat and mouse game between encryption and cryptanalysis. Encryption schemes that were convenient to carry out by hand could usually be broken by hand eventually. But if you only needed secrecy briefly, a simple scheme might provide that secrecy for long enough. This post will look at one such scheme, the straddling checkerboard.

Checkerboards

Perhaps the most obvious way to conveniently turn letters into numbers is to arrange the letters into a 5 × 5 grid. This has to leave out one letter, and in practice this meant combining I and J. Or if you needed digits, you could use a 6 × 6 grid and put J back in. You’d scramble the alphabet in the grid according to some key, then encrypt each letter by its coordinates.

       12345
      +-----
     1|EBISP
     2|XWLVN
     3|AOYZQ
     4|MDCKH
     5|RTUFG

This is no better than a simple substitution cipher because someone intercepting a message encrypted this way would easily guess that pairs of digits represent letters. However, if you then permuted the digits with a transposition cipher, you’d have something more formidable. This is essentially what the ADFGVX cipher did, which stumped cryptanalysts for a while.

The straddling checkerboard is a variation on the method above. Letters would be arranged in a 3 × 10 grid rather than 5 × 5. Some letters would be encrypted as a single digit and some as a pair of digits.

       1234567890
      +----------
      |  EBISPXWL
     1|VNAOYZQMDC
     2|KHRTUFGJ./

In the example above, E would be encrypted as 3, N would be encrypted as 12, and so on. This is an instance of a prefix code. In order to be able to decode the digits unambiguously, no letter could be encoded as 1 or 2; these digits always signaled the beginning of a pair.

Prefix codes are often used in non-secret codes, such as country codes for telephone numbers. More examples of prefix codes in this post.

Because 1 and 2 could not be used to encode single letters, there were 28 slots to fill. These could be filled with other symbols, and in practice period and slash were added [1].
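Here is a sketch of encryption and decryption with the checkerboard above. Decryption can proceed greedily because 1 and 2 always begin a pair, which is exactly what makes this a prefix code.

```python
# The checkerboard from the post: blanks in the top row force the prefix property
top  = "  EBISPXWL"   # row with no digit prefix; columns are 1234567890
row1 = "VNAOYZQMDC"   # row prefixed by 1
row2 = "KHRTUFGJ./"   # row prefixed by 2
cols = "1234567890"

enc = {}
for c, ch in zip(cols, top):
    if ch != " ":
        enc[ch] = c
for c, ch in zip(cols, row1):
    enc[ch] = "1" + c
for c, ch in zip(cols, row2):
    enc[ch] = "2" + c
dec = {v: k for k, v in enc.items()}

def encrypt(msg):
    return "".join(enc[ch] for ch in msg)

def decrypt(digits):
    out, i = [], 0
    while i < len(digits):
        n = 2 if digits[i] in "12" else 1  # 1 and 2 always begin a pair
        out.append(dec[digits[i:i + n]])
        i += n
    return "".join(out)

print(encrypt("SEND"))           # 631219
print(decrypt(encrypt("SEND")))  # SEND
```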

Efficiency

The straddling checkerboard gives a more efficient encoding than does the checkerboard since typically fewer digits will be required. If efficiency were the only concern, we’d put the eight most frequent letters on the top row, something like the following [2].

       1234567890
      +----------
      |  ETAOINSR
     1|BCDFGHJKLM
     2|PQUVWXYZ./

This would be more efficient but less secure since the arrangement of the letters would be more predictable.

Security

The straddling checkerboard presents a bit of a challenge to the cryptanalyst since it’s not known a priori whether a digit is part of a pair (if the vertical coordinates are not always 1 and 2).

The straddling checkerboard didn’t offer much security even in its day. It would have been better if there had been some further processing done on the digits, such as how the ADFGVX cipher permuted its coordinates.

The message, interpreted as a number N, could have been further encrypted as aN + b where a and b were randomly chosen numbers that were part of the key. As far as I know, nothing like this was ever done. This would have provided more security but would also require more effort and increase the chance of introducing errors.
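A toy version of that idea (entirely my own illustration, not a historical scheme): treat the digit string as an integer N and transmit aN + b. In practice leading zeros would need separate handling.

```python
# Toy affine step: the key pair (a, b) is hypothetical, chosen arbitrarily here
def affine_encrypt(digits, a, b):
    return str(a * int(digits) + b)

def affine_decrypt(s, a, b):
    n = int(s) - b
    assert n % a == 0, "wrong key"
    return str(n // a)

c = affine_encrypt("631219", a=7919, b=1234)
print(affine_decrypt(c, a=7919, b=1234))  # 631219
```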


[1] David Kahn. The Codebreakers. Chapter 18.

[2] You may have expected the last letter on the first row to be H, going by the printer’s order ETAOIN SHRDLU. Peter Norvig discovered a slightly different order of letter frequencies based on the Google corpus.

Email subscription changes

I will soon be discontinuing the email subscription option for this blog. I recommend that email subscribers switch over to subscribing to the RSS feed for the blog. If you’re unfamiliar with RSS, here is an article on how to get started.

(I recommend RSS in general, and not just for subscribing to this blog. RSS lets you decide what sources you want to receive rather than leave the choice up to someone else’s algorithm.)

Update: I have replaced my email subscription service with a Substack account.

You can also find me on Twitter and on Mastodon at johndcook@mathstodon.xyz.