
Blogging pace

When I started this blog, almost 17 years ago, I posted nearly every day. The first time I went a couple days without posting I got a message from someone asking if everything was OK.

I’ve slowed down since then, and even more lately. Last week I was busy with professional work, and this week I’ve been busy with personal work.

I’ve never had a schedule for this blog: I write when I feel like writing, which usually means several times a week. I’ve averaged 3 posts every 4 days since I started writing here. Presumably I’ll pick back up next week. We’ll see.

If you’d like to be notified when I write something here, you have several options. You could subscribe via RSS to hear of every post as soon as it is published. If you’d rather get a notification of a few posts at a time, along with a little introduction to each, you could subscribe to my newsletter.

You can also follow me on social media, primarily X but also Mastodon and Bluesky.

Food and Grace

RFK Jr eating fast food

I stumbled on a post on X this morning, a commentary on the photo of RFK Jr eating food from McDonald’s that has been making the rounds.

This photo divides Puritans from Southerners.

Puritans think because RFK Jr is on the side of health food he can never commit such a “sin.”

Southerners think a rare treat is fine & it’s more important to be gracious in a gathering than to be a stickler.

I can’t read minds, and so I don’t know what RFK’s motives were, but I do believe the post is right about Southern graciousness. Of course this disposition is not limited to the South, nor does everyone in the South live this way.

A puritanical mindset, more typical of metaphorical Puritans than literal Puritans, is flat: all virtues are equally important. A gracious mindset is hierarchical: some virtues take precedence over others. And in a traditional Southern mindset, not offending hosts or guests has high priority.

Bluesky account

Bluesky logo

I’ve had a Bluesky account for over a year, but never posted much on it. Recently I noticed I’d gotten more followers on Bluesky and thought I might try posting there more often.

I am not moving to Bluesky. I have orders of magnitude more followers on X than on Bluesky and so I will focus my effort on X. But I expect to post on Bluesky occasionally.

I learned this morning that there is a bridge that will automatically post your Mastodon content to Bluesky. You should be able to follow

johndcook.mathstodon.xyz.ap.brid.gy

on Bluesky to see the content that I post on my Mastodon account at

johndcook.mathstodon.xyz

Note the server is mathstodon, not mastodon.

It may take some time before the bridge works. I suppose the information needs to propagate, analogous to how DNS records propagate. At the time of writing, I can see the bridge account by going to the Bluesky page for the bridge, but I’ve not yet been able to pull up the account in the Bluesky app.

Update: The bridge account is working from the Bluesky app. I’ve posted two articles to Mastodon and verified that they appear in the bridge account.

Just to be clear, we’re talking about two separate Bluesky accounts. My Bluesky account is

johndcook.bsky.social

but my Mastodon posts will be automatically posted to a separate account,

johndcook.mathstodon.xyz.ap.brid.gy

Linear combination of sine and cosine as phase shift

Here’s a simple calculation that I’ve done often enough that I’d like to save the result for my future reference and for the benefit of anyone searching on this.

A linear combination of sines and cosines

a sin(x) + b cos(x)

can be written as a sine with a phase shift

A sin(x + φ).

Going between {a, b} and {A, φ} is the calculation I’d like to save. For completeness I also include the case

A cos(x + ψ).

Derivation

Define

f(x) = a sin(x) + b cos(x)

and

g(x) = A sin(x + φ).

Both functions satisfy the differential equation

y″ + y = 0

and so f = g if and only if f(0) = g(0) and f′(0) = g′(0).

Setting the values at 0 equal implies

b = A sin(φ)

and setting the derivatives at 0 equal implies

a = A cos(φ).

Taking the ratio of these two equations shows

b/a = tan(φ)

and adding the squares of both equations shows

a² + b² = A².

Equations

First we consider the case

a sin(x) + b cos(x) = A sin(x + φ).

Sine with phase shift

If a and b are given,

A = √(a² + b²)

and

φ = tan⁻¹(b / a).

If A and φ are given,

a = A cos(φ)

and

b = A sin(φ)

from the previous section.

Cosine with phase shift

Now suppose we want

a sin(x) + b cos(x) = A cos(x + ψ)

If a and b are given, then

A = √(a² + b²)

as before and

ψ = − tan⁻¹(a / b).

If A and ψ are given then

a = − A sin(ψ)

and

b = A cos(ψ).
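These conversions are easy to check numerically. Here is a sketch in Python (my own code, not from any library for this purpose); note that using atan2 rather than a plain arctangent picks the correct quadrant for the phase even when a or b is negative.

```python
import numpy as np

# Convert a*sin(x) + b*cos(x) to A*sin(x + phi) and to A*cos(x + psi).
# np.arctan2 selects the correct quadrant, which tan⁻¹(b/a) alone does not.
def sine_form(a, b):
    return np.hypot(a, b), np.arctan2(b, a)

def cosine_form(a, b):
    return np.hypot(a, b), -np.arctan2(a, b)

a, b = 3.0, 4.0
A, phi = sine_form(a, b)
_, psi = cosine_form(a, b)

x = np.linspace(0, 2*np.pi, 200)
assert np.allclose(a*np.sin(x) + b*np.cos(x), A*np.sin(x + phi))
assert np.allclose(a*np.sin(x) + b*np.cos(x), A*np.cos(x + psi))
```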


The Postage Stamp Problem

I recently stumbled upon the Postage Stamp Problem. Given two relatively prime positive integers a and b, show that for any sufficiently large number N there exist nonnegative integers x and y such that

ax + by = N.

I initially missed the constraint that x and y must be nonnegative, in which case the result is well known (Bézout’s lemma) and there’s no requirement for N to be large. The nonnegativity constraint makes things more interesting.

5 cent and 21 cent stamps

The problem is called the Postage Stamp Problem because it says that given any two stamps whose values are relatively prime, say a 5¢ stamp and a 21¢ stamp, you can make any sufficiently large amount of postage using just those two stamps.

A natural question is how large is “sufficiently large,” and the answer turns out to be all integers larger than

ab − a − b.

So in our example, you cannot make 79¢ postage out of 5¢ and 21¢ stamps, but you can make 80¢ or any higher amount.
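A brute-force check in Python, written for this post with the 5¢ and 21¢ example hard-coded, confirms the claim.

```python
def representable(n, a, b):
    # Can n be written as a*x + b*y with nonnegative integers x, y?
    return any((n - b*y) % a == 0 for y in range(n // b + 1))

a, b = 5, 21
frobenius = a*b - a - b  # 79 for this example

assert not representable(frobenius, a, b)
assert all(representable(n, a, b) for n in range(frobenius + 1, frobenius + 200))
```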

If you’ve been reading this blog for a while, you may recognize this as a special case of the Chicken McNugget problem, which you can think of as the Postage Stamp problem with possibly more than two stamps.


Impersonating an Edwardian math professor

I’ve read some math publications from around a century or so ago, and I wondered if I could pull off being a math professor if a time machine dropped me into a math department from the time. I think I’d come across as something of an autistic savant, ignorant of what contemporaries would think of as basic math but fluent in what they’d consider more advanced.

There are two things in particular that were common knowledge at the time that I would be conspicuously ignorant of: interpolation tricks and geometry.

People from previous eras knew interpolation at a deeper level than citing the Lagrange interpolation theorem, out of necessity. They learned time-saving tricks that have since been forgotten.

The biggest gap in my knowledge would be geometry. Mathematicians a century ago had a far deeper knowledge of geometry, particularly synthetic geometry, i.e. geometry in the style of Euclid rather than in the style of Descartes.

Sometimes older math books use notation or terminology that has since changed. I imagine I’d make a few gaffes, not immediately understanding a basic term or using a term that wasn’t coined until later.

If I had to teach a class, I’d choose something like real and complex analysis. Whittaker & Watson’s book on the subject was first published in 1902 and remains a common reference today. The only thing I find jarring about that book is that “show” is spelled “shew.” Makes me think of Ed Sullivan. But I think I’d have a harder time teaching a less advanced class.


A strange take on the harmonic series

It is well known that the harmonic series

1 + ½ + ⅓ + ¼ + …

diverges. But if you take the denominators as numbers in base 11 or higher, the series converges [1].

I wonder what inspired this observation. Maybe Brewster was bored, teaching yet another cohort of students that the harmonic series diverges, and let his mind wander.

Proof

Let f(n) be the function that takes a positive integer n, writes it in base 10, then reinterprets the result as a number in base b where b > 10. Brewster is saying that the sum of the series 1/f(n) converges.

To see this, note that the first 10 terms are each at most 1. The 90 two-digit terms are each less than 1/b, the 900 three-digit terms are each less than 1/b², and so on. This means the series is bounded by the geometric series whose mth term is 10 (10/b)ᵐ, which converges because b > 10.

Python

Incidentally, despite being an unusual function, f is very easy to implement in Python:

   def f(n, b): return int(str(n), b)
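As a quick sanity check, the partial sums for b = 11 do stay within the bounds Brewster gives, though convergence is far too slow to see the limit this way:

```python
def f(n, b): return int(str(n), b)

# Partial sum of 1/f(n, 11) over the first 10^5 terms.
# Brewster says the full sum lies between 2.828 and 26.29.
s = sum(1/f(n, 11) for n in range(1, 100_000))
assert 2.828 < s < 26.29
```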

Citation

Brewster’s note was so brief that I will quote it here in full.

The [harmonic series] is divergent. But if the denominators of the terms are read as numbers in scale 11 or any higher scale, the series is convergent, and the sum is greater than 2.828 and less than 26.29. The convergence is rather slow. I estimate that, to find the last number by direct addition, one would have to work out 10⁹⁰ terms, to about 93 places of decimals.

[1] G. W. Brewster. An Old Result in a New Dress. The Mathematical Gazette, Vol. 37, No. 322 (Dec., 1953), pp. 269–270.

Chebyshev polynomials as distorted cosines

Forman Acton’s book Numerical Methods that Work describes Chebyshev polynomials as

cosine curves with a somewhat disturbed horizontal scale, but the vertical scale has not been touched.

The relation between Chebyshev polynomials and cosines is

Tn(cos θ) = cos(nθ).

Some sources take this as the definition of Chebyshev polynomials. Other sources define the polynomials differently and prove this equation as a theorem.

It follows that if we let x = cos θ then

Tn(x) = cos(n arccos x).

Now sin x = cos(π/2 − x) and for small x, sin x ≈ x. This means

arccos(x) ≈ π/2 − x

for x near 0, and so we should expect the approximation

Tn(x) ≈ cos(n(π/2 − x)).

to be accurate near the middle of the interval [−1, 1] though not at the ends. A couple plots show that this is the case.
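A numerical check, evaluating Tₙ via the cosine identity, shows the same thing the plots do: for n = 5 the approximation is good in the middle of the interval but poor near an endpoint.

```python
import numpy as np

def T(n, x):
    # Chebyshev polynomial via the identity T_n(x) = cos(n arccos x)
    return np.cos(n * np.arccos(x))

n = 5
def approx(x):
    return np.cos(n * (np.pi/2 - x))

assert abs(T(n, 0.1) - approx(0.1)) < 0.01  # accurate mid-interval
assert abs(T(n, 0.9) - approx(0.9)) > 0.1   # breaks down near the endpoint
```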

More Chebyshev posts

Up and down the abstraction ladder

It’s easier to go up a ladder than to come down, literally and metaphorically.

Gian-Carlo Rota made a profound observation on the application of theory.

One frequently notices, however, a wide gap between the bare statement of a principle and the skill required in recognizing that it applies to a particular problem.

This isn’t quite what he said. I made two small edits to generalize his actual statement. He referred specifically to the application of the inclusion-exclusion principle to problems in combinatorics. Here is his actual quote from [1].

One frequently notices, however, a wide gap between the bare statement of the principle and the skill required in recognizing that it applies to a particular combinatorial problem.

This post will expand a little on Rota’s original statement and a couple applications of the more general version of his statement.

The inclusion-exclusion principle

I don’t think Rota would have disagreed with my edited version of his statement, but it’s interesting to think of his original context. The inclusion-exclusion principle seems like a simple problem solving strategy: you may count a set of things by first over-counting, then correcting for the over-counting.

For example, a few days ago I wrote about a graph created by turning left at the nth step if n is divisible by 7 or contains a digit 7. Suppose you wanted to count how many times you turn in the first 100 steps. You could count the number of positive integers up to 100 that are divisible by 7, then the number that contain the digit 7, then subtract the number that both are divisible by 7 and contain a 7.
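That count can be sketched in a few lines of Python, assuming the steps are numbered 1 through 100:

```python
N = 100
div7 = sum(1 for n in range(1, N+1) if n % 7 == 0)                    # over-count, part 1
has7 = sum(1 for n in range(1, N+1) if '7' in str(n))                 # over-count, part 2
both = sum(1 for n in range(1, N+1) if n % 7 == 0 and '7' in str(n))  # correction
turns = div7 + has7 - both

# The direct count agrees with the inclusion-exclusion count.
assert turns == sum(1 for n in range(1, N+1) if n % 7 == 0 or '7' in str(n))
```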

You can carry this a step further by over-counting, then over-correcting for your over-counting, then correcting for your over-correction. This is the essence of the following probability theorem.

\begin{align*} P(A \cup B \cup C) &= P(A) + P(B) + P(C) \\ &- P(A\cap B) - P(B \cap C) - P(A \cap C) \\ &+ P(A \cap B \cap C) \end{align*}

The inclusion-exclusion principle is a clever idea, but not that clever. And yet Rota discusses how this idea was developed over decades into the Möbius inversion principle, which has diverse applications, including Euler characteristic and the calculus of finite differences.

Bayes’ theorem

Bayesian statistics is a direct application of Bayes’ theorem. Bayes’ theorem is a fairly simple idea, and yet people make careers out of applying it.

When I started working in the Biostatistics Department at MD Anderson, a bastion of Bayesian statistics, I was surprised how subtle Bayesian statistics is. I probably first saw Bayes’ theorem as a teenager, and yet it was not easy to wrap my head around Bayesian statistics. I would think “This is simple. Why is this hard?” The core principle was simple, but the application was not trivial.

Newtonian mechanics

When I took introductory physics, we would get stuck on homework problems and ask our professor for help. He would almost always begin by saying “Well, F = ma.”

This was infuriating. Yes, we know F = ma. But how does that let us solve this problem?!

There’s more to Newtonian mechanics than Newton’s laws, a lot more. And most of it is never made explicit. You pick it up by osmosis after you’ve worked hundreds of exercises.


[1] G. C. Rota. On the Foundations of Combinatorial Theory I. Theory of Möbius Functions. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 2 (1964) 340–368.

Making documents with makefiles

I learned to use the command line utility make in the context of building C programs. The program make reads an input file to tell it how to make things. To make a C program, you compile the source files into object files, then link the object files together.

You can tell make what depends on what, so it will not do any unnecessary work. If you tell it that a file foo.o depends on a file foo.c, then it will rebuild foo.o if that file is older than the file foo.c that it depends on. Looking at file timestamps allows make to save time by not doing unnecessary work. It also makes it easier to dictate what order things need to be done in.

There is nothing about make that is limited to compiling programs. It simply orchestrates commands in a declarative way. Because you state what the dependencies are, but let it figure out if and when each command needs to be run, a makefile is more like a Prolog program than a Python script.

I have a client that needs a dozen customized reports, each of which requires a few steps to create, and the order of the steps is important. I created the reports by hand. Then, of course, something changed and all the reports needed to be remade. So then I wrote a Python script to manage the next build. (And the next, and the next, …). But I should have written a makefile. I’m about to have to revise the reports, and this time I’ll probably use a makefile.

Here’s a very simple example of using a makefile to create documents. For more extensive examples of how to use makefiles for a variety of tasks, see How I stopped worrying and loved Makefiles. [1]

Suppose you need to create a PDF and a Microsoft Word version of a report from a LaTeX file [2].

all: foo.pdf foo.docx

foo.pdf: foo.tex
	pdflatex foo.tex

foo.docx: foo.tex
	pandoc foo.tex --from latex --to docx > foo.docx

clean:
	rm foo.aux foo.log

If the file foo.pdf does not exist, or exists but is older than foo.tex, the command make foo.pdf will create the PDF file by running pdflatex. If foo.pdf is newer than foo.tex then running make foo.pdf will return a message

make: 'foo.pdf' is up to date.

If you run make with no argument, it will build both foo.pdf and foo.docx as needed.

The command make clean deletes auxiliary files created by pdflatex. This shows that a make action does not have to “make” anything; it can simply run a command.


[1] The blog post title is an allusion to the 1964 movie Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.

[2] I use LaTeX for everything I can. This saves time in the long run, if not in the moment, because of consistency. I used to use Word for proposals and such, and LaTeX for documents requiring equations. My workflow became more efficient when I switched to using LaTeX for everything.