Solving hard problems

We help companies solve hard problems in mathematics, statistics, and computing. Let’s explore how we might work together.

Embeddings, Projections, and Inverses

Posted on 14 May 2025 by John

I just revised a post from a week ago about rotations. The revision makes explicit the process of embedding a 3D vector into the quaternions, then pulling it back out.

The 3D vector is embedded in the quaternions by making it the vector part of a quaternion with zero real part:

(p₁, p₂, p₃) → (0, p₁, p₂, p₃)

and the quaternion is returned to 3D by cutting off the real part:

(p₀, p₁, p₂, p₃) → (p₁, p₂, p₃).

To give names to the process of moving to and from quaternions, we have an embedding E from ℝ³ to ℝ⁴ and a projection P from ℝ⁴ to ℝ³.

We can represent E as a 4 × 3 matrix

$E = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}$

and P by a 3 × 4 matrix

$P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

We’d like to say E and P are inverses. Surely P undoes what E does, so they’re inverses in that sense. But E cannot undo P because you lose information projecting from ℝ⁴ to ℝ³ and E cannot recover information that was lost.

The rest of this post will look at three generalizations of inverses and how E and P relate to each.

Left and right inverse

Neither matrix is invertible, but PE equals the identity matrix on ℝ³, and so P is a left inverse of E and E is a right inverse of P.

$PE = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}$

On the other hand, EP is not an identity matrix, and so E is not a left inverse of P, and neither is P a right inverse of E.

$EP = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$
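The two products are easy to check mechanically. Here is a minimal sketch in plain Python, with matrices stored as lists of rows; the helper names `matmul` and `identity` are just illustrative.

```python
# Check that P is a left inverse of E but not a right inverse.
E = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]       # 4 x 3 embedding
P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]    # 3 x 4 projection

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

PE = matmul(P, E)
EP = matmul(E, P)
print(PE == identity(3))   # True: P undoes E
print(EP == identity(4))   # False: the first row of EP is all zeros
```

The zero row in EP is where the real part of the quaternion is thrown away.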

Adjoint

P is the transpose of E, which means it is the adjoint of E. Adjoints are another way to generalize the idea of an inverse. More on that here.
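To spell out what being the adjoint means here, take x = (x₁, x₂, x₃) in ℝ³ and y = (y₀, y₁, y₂, y₃) in ℝ⁴, with the angle brackets denoting the ordinary dot product. The labels x and y are only for this illustration. Moving E from one side of the inner product to the other turns it into P:

$\langle Ex, y \rangle = 0 \cdot y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3 = \langle x, Py \rangle$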

Pseudo-inverse

The Moore-Penrose pseudo-inverse acts a lot like an inverse, which is somewhat uncanny because all matrices have a pseudo-inverse, even rectangular matrices.

Pseudo-inverses are symmetric, i.e. if A⁺ is the pseudo-inverse of A, then A is the pseudo-inverse of A⁺.

Given an m by n matrix A, the Moore-Penrose pseudo-inverse A⁺ is the unique n by m matrix satisfying four conditions:

A A⁺ A = A
A⁺ A A⁺ = A⁺
(A A⁺)* = A A⁺
(A⁺ A)* = A⁺ A

To show that E⁺ = P we have to establish

EPE = E
PEP = P
(EP)* = EP
(PE)* = PE

We calculated EP and PE above, and both are real and symmetric, so properties 3 and 4 hold.

We can also compute

$EPE = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = E$

and

$PEP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} =P$

showing that properties 1 and 2 hold as well.
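The four conditions can also be checked in a few lines of code. This is a sketch for real integer matrices only, where the conjugate transpose is just the transpose; the function name `is_pseudo_inverse` is made up for the example.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_pseudo_inverse(A, B):
    """Check the four Moore-Penrose conditions for real matrices A and B."""
    AB, BA = matmul(A, B), matmul(B, A)
    return (matmul(AB, A) == A and      # A B A = A
            matmul(BA, B) == B and      # B A B = B
            transpose(AB) == AB and     # (A B)* = A B
            transpose(BA) == BA)        # (B A)* = B A

E = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(is_pseudo_inverse(E, P))   # True
print(is_pseudo_inverse(P, E))   # True, since the relation is symmetric
```

Exact integer arithmetic keeps the equality tests safe here. With floating point entries you would compare within a tolerance instead.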

Alternative exp and log notation

Posted on 13 May 2025 by John

The other day I stumbled on an article [1] that advocated writing a^b as a↑b and log_a(b) as a↓b.

$\begin{align*} a &\uparrow b \equiv a^b \\ a &\downarrow b \equiv \log_a b \end{align*}$

This is a special case of Knuth’s up arrow and down arrow notation. Knuth introduces his arrows with the intention of repeating them to represent hyperexponentiation and iterated logarithms. But the emphasis in [1] is more on the pedagogical advantages of using a single up or down arrow.

Advantages

One advantage is that the notation is more symmetric. Exponents and logs are inverses of each other, and up and down arrows are visual inverses of each other.

Another advantage is that the down arrow notation makes the base of the logarithm more prominent, which is sometimes useful.

Finally, the up and down arrow notation is more typographically linear: a↑b and a↓b stay within a line, whereas a^b and log_a(b) extend above and below the line. LaTeX handles subscripts and superscripts well, but HTML doesn’t. That’s one reason I usually write exp(x) rather than e^x here.

Comparison

Here are the basic properties of logs and exponents using conventional notation.

$\begin{align} a^b = c &\iff \log_a c = b \\ \log_b 1 &= 0 \\ \log_b b &= 1 \\ \log_b(b^x) &= x \\ b^{\log_b x} &= x \\ \log_b xy &= \log_b x + \log_b y \\ \log_b \frac{x}{y} &= \log_b x - \log_by \\ a^{\log_b c} &= c^{\log_b a} \\ \log_a b^c &= c (\log_a b) \\ (\log_b a) (\log_a x) &= \log_b x \end{align}$

Here are the same properties using up and down arrow notation.

$\begin{align} a \uparrow b = c &\iff a \downarrow c = b \\ b \downarrow 1 &= 0 \\ b \downarrow b &= 1 \\ b \downarrow (b \uparrow x) &= x \\ b \uparrow (b \downarrow x) &= x \\ b \downarrow xy &= b \downarrow x + b \downarrow y \\ b \downarrow \frac{x}{y} &= b \downarrow x - b \downarrow y \\ a \uparrow (b \downarrow c) &= c \uparrow (b \downarrow a ) \\ a \downarrow (b \uparrow c) &= c (a \downarrow b) \\ (b \downarrow a) (a \downarrow x) &= b \downarrow x \end{align}$
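The arrow notation translates directly into two-argument functions, which makes it easy to spot-check the identities numerically. In this sketch the names `up` and `down` are my own stand-ins for ↑ and ↓, and the test values are arbitrary positive numbers.

```python
import math

def up(a, b):
    """a ↑ b, i.e. a raised to the power b."""
    return a ** b

def down(a, b):
    """a ↓ b, i.e. the logarithm of b in base a."""
    return math.log(b) / math.log(a)

a, b, c, x, y = 2.0, 3.0, 5.0, 7.0, 11.0

# Each entry pairs the left and right sides of one identity.
checks = {
    "b ↓ 1 = 0": (down(b, 1), 0.0),
    "b ↓ b = 1": (down(b, b), 1.0),
    "b ↓ (b ↑ x) = x": (down(b, up(b, x)), x),
    "b ↑ (b ↓ x) = x": (up(b, down(b, x)), x),
    "b ↓ xy = b ↓ x + b ↓ y": (down(b, x * y), down(b, x) + down(b, y)),
    "b ↓ x/y = b ↓ x - b ↓ y": (down(b, x / y), down(b, x) - down(b, y)),
    "a ↑ (b ↓ c) = c ↑ (b ↓ a)": (up(a, down(b, c)), up(c, down(b, a))),
    "a ↓ (b ↑ c) = c (a ↓ b)": (down(a, up(b, c)), c * down(a, b)),
    "(b ↓ a)(a ↓ x) = b ↓ x": (down(b, a) * down(a, x), down(b, x)),
}

for name, (lhs, rhs) in checks.items():
    print(name, math.isclose(lhs, rhs))
```

A numerical check at one set of values is not a proof, of course, but it catches transcription errors quickly.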

[1] Margaret Brown. Some Thoughts on the Use of Computer Symbols in Mathematics. The Mathematical Gazette, Vol. 58, No. 404 (Jun., 1974), pp. 78-79

Decimal Separator and Internationalization

Posted on 13 May 2025 by John

This morning I ran across the following tip from Joost Helberg on Mastodon:

TIL don’t report numbers with three digits after the decimal point. People may interpret the decimal point as a thousands separator. Using 2 or 4 digits, although wrong, avoids off by a factor thousand errors.

I usually report four decimal places, but I hadn’t thought about that in relation to the decimal separator problem.

In software development, it’s best to let a library handle numeric input and output, using local conventions. Failure to do so can lead to problems, as I’ll never forget.

Embarrassed in Bordeaux

In 2006, Peter Thall and I gave a week-long course on Bayesian clinical trial design in Bordeaux, France. Part of the course was presenting software that my team at MD Anderson Cancer Center had developed.

A few minutes into my first presentation I realized the software wasn’t working for the course attendees. The problem had to do with the US and France using opposite conventions for decimal separator and thousands separator. I had tested our software on a French version of Windows, but I had entered integers in my testing and decimals in my presentation.

I apologized and asked my French audience to enter decimals in the American style, such as 3.14 rather than 3,14. But that didn’t work either!

We were using a Windows API for parsing input, which correctly handles input and output per local conventions. But we had written custom validation code. We checked that the input fields contained only valid numeric characters, i.e. digits and periods. Oops!

Users were between a rock and a hard place. The input validation would not accept French notation, and the parsing code would not accept American notation.

The solution was for the attendees to set their operating system locale to the US. They were gracious about having to apply the hack and said that it was a common problem. It was a humiliating way to start the course, but the rest of the week went well.
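The underlying mistake was that validation and parsing used different conventions. Here is a minimal sketch of the alternative, in which both steps read the separators from the same place. The `NumberFormat` class and the two formats are hypothetical simplifications for illustration (real French formatting uses a no-break space for grouping, for instance), and in production code a locale library should supply them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NumberFormat:
    decimal: str   # decimal separator
    group: str     # thousands separator

US = NumberFormat(decimal=".", group=",")
FR = NumberFormat(decimal=",", group=" ")   # simplified for the example

def is_valid(text, fmt):
    """Validate using the format's own separators, not hard-coded ones."""
    allowed = set("0123456789") | {fmt.decimal, fmt.group}
    body = text[1:] if text[:1] in "+-" else text
    return (body != "" and set(body) <= allowed
            and body.count(fmt.decimal) <= 1
            and any(ch.isdigit() for ch in body))

def parse(text, fmt):
    """Parse text written in the given convention into a float."""
    if not is_valid(text, fmt):
        raise ValueError(f"not a number in this format: {text!r}")
    return float(text.replace(fmt.group, "").replace(fmt.decimal, "."))

print(parse("3,14", FR))      # 3.14
print(is_valid("3.14", FR))   # False: a period is not a French separator
print(parse("1,234", US))     # 1234.0
print(parse("1,234", FR))     # 1.234
```

The last two lines are the off-by-a-factor-of-a-thousand problem from the tip above: the same string with three digits after the separator is a valid number under both conventions.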

A crowded little chess puzzle

Posted on 13 May 2025 by John

Here’s a puzzle by Martin Gardner [1].

Can a queen, king, rook, bishop, and knight be placed on a 4 × 4 board so no piece attacks another?

There are two solutions, plus symmetries.

Note that in all non-attacking chess puzzles, the colors of the pieces are irrelevant. In the solutions I chose the piece colors to be the opposite of the square colors strictly for aesthetic reasons.
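The board is small enough to search exhaustively. The sketch below tries every way to put the five pieces on distinct squares and keeps the placements with no attacks. It ignores blocking, which is safe in this puzzle: if a sliding piece is lined up with any other piece, it attacks the nearest piece on that line. The function names are illustrative.

```python
from itertools import permutations

N = 4
SQUARES = [(r, c) for r in range(N) for c in range(N)]
PIECES = "QKRBN"

def attacks(piece, src, dst):
    """True if a piece on src is lined up to attack dst (blocking ignored)."""
    dr, dc = abs(src[0] - dst[0]), abs(src[1] - dst[1])
    straight = dr == 0 or dc == 0
    diagonal = dr == dc
    if piece == "Q":
        return straight or diagonal
    if piece == "R":
        return straight
    if piece == "B":
        return diagonal
    if piece == "K":
        return max(dr, dc) == 1
    return {dr, dc} == {1, 2}   # knight

def peaceful(placement):
    """placement maps each piece letter to its (row, column) square."""
    return not any(attacks(p, placement[p], placement[q])
                   for p in PIECES for q in PIECES if p != q)

def symmetries(placement):
    """The eight rotations and reflections of a placement."""
    images = []
    current = dict(placement)
    for _ in range(4):
        current = {p: (c, N - 1 - r) for p, (r, c) in current.items()}       # rotate
        images.append(current)
        images.append({p: (r, N - 1 - c) for p, (r, c) in current.items()})  # mirror
    return images

def canonical(placement):
    """A representative shared by all symmetric images of a placement."""
    return min(tuple(sorted(img.items())) for img in symmetries(placement))

solutions = []
for squares in permutations(SQUARES, len(PIECES)):
    placement = dict(zip(PIECES, squares))
    if peaceful(placement):
        solutions.append(placement)

fundamental = {canonical(s) for s in solutions}
print(len(solutions), "placements,", len(fundamental), "up to symmetry")
# prints: 16 placements, 2 up to symmetry
```

The search visits about half a million arrangements, so it finishes in a few seconds.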

Formulating eight queens as a SAT problem

Posted on 11 May 2025 by John

The Boolean satisfiability problem is to determine whether there is a way to assign values to variables in a set of Boolean formulas to make the formulas hold [1]. If there is a solution, the next task would be to enumerate the solutions.

You can solve the famous eight queens problem, or its generalization to n-queens, by formulating the problem as a Boolean formula then using a SAT solver to find the solutions.

It’s pretty obvious how to start. A chessboard is an 8 by 8 grid, so you have a variable for each square on the board, representing whether or not that square holds a queen. Call the variables b_ij where i and j run from 1 to 8.

The requirement that every row contains exactly one queen can be turned into two subrequirements:

Each row contains at least one queen.
Each row contains at most one queen.

The first requirement is easy. For each row i, we have a clause

b_i1 ∨ b_i2 ∨ b_i3 ∨ … ∨ b_i8

The second requirement is harder. How do you express in terms of our Boolean variables that there is no more than one queen in each row? This is the key difficulty. If we can solve this problem, then we’re essentially done. We just need to do the analogous thing for columns and diagonals. (We don’t require a queen on every diagonal, but we require that there be at most one queen on every diagonal.)

First approach

There are two ways to encode the requirement that every row contain at most one queen. The first is to use implication. If there’s a queen in the first column, then there is not a queen in any of the remaining columns. If there’s a queen in the second column, then there is not a queen in any column but the second, etc. We have an implication for each column in each row. Let’s just look at the first row and first column.

b₁₁ ⇒ ¬ (b₁₂ ∨ b₁₃ ∨ … ∨ b₁₈)

We can turn an implication of the form a ⇒ b into the clause ¬a ∨ b.
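The result of that rewriting is not yet a single clause, because the right side of the implication is a negated disjunction. Applying De Morgan’s law and then distributing gives seven two-literal clauses for this one implication. A worked version, using only the standard identities:

$\begin{align*} b_{11} \Rightarrow \neg (b_{12} \vee \cdots \vee b_{18}) &\equiv \neg b_{11} \vee (\neg b_{12} \wedge \cdots \wedge \neg b_{18}) \\ &\equiv (\neg b_{11} \vee \neg b_{12}) \wedge \cdots \wedge (\neg b_{11} \vee \neg b_{18}) \end{align*}$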

Second approach

The second way to encode the requirement that every row contain at most one queen is to say that for every pair of squares in a row (a, b) either a has no queen or b has no queen. So for the first row we would have ₈C₂ = 28 clauses because there are 28 ways to choose pairs from a set of 8 things.

(¬b₁₁ ∨ ¬b₁₂) ∧ (¬b₁₁ ∨ ¬b₁₃) ∧ … ∧ (¬b₁₇ ∨ ¬b₁₈)

An advantage of this approach is that it directly puts the problem into conjunctive normal form (CNF). That is, our formula is a conjunction of terms that contain only disjunctions, an AND of ORs.
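Here is a sketch of how the pairwise encoding might be generated. Clauses are lists of signed integers in the style of the DIMACS format that SAT solvers commonly read; the variable numbering and function names are my own choices. In place of a real SAT solver, the example counts satisfying assignments by brute force, which is only feasible for a tiny board.

```python
from itertools import combinations, product

def var(i, j, n):
    """Variable number for square (i, j), with i and j running from 1 to n."""
    return (i - 1) * n + j

def at_most_one(variables):
    """Pairwise encoding: in every pair, at least one variable is false."""
    return [[-a, -b] for a, b in combinations(variables, 2)]

def queens_cnf(n):
    """Clauses for the n-queens problem in conjunctive normal form."""
    clauses = []
    idx = range(1, n + 1)
    for i in idx:
        row = [var(i, j, n) for j in idx]
        col = [var(j, i, n) for j in idx]
        clauses.append(row)              # at least one queen in row i
        clauses += at_most_one(row)
        clauses += at_most_one(col)
    # Squares on a common diagonal share i - j; on an anti-diagonal, i + j.
    for key in (lambda i, j: i - j, lambda i, j: i + j):
        lines = {}
        for i in idx:
            for j in idx:
                lines.setdefault(key(i, j), []).append(var(i, j, n))
        for line in lines.values():
            clauses += at_most_one(line)
    return clauses

def satisfies(assignment, clauses):
    """assignment[v] is the truth value of variable v (index 0 is unused)."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def count_models(n):
    """Count satisfying assignments by brute force; tiny n only."""
    clauses = queens_cnf(n)
    return sum(satisfies((False,) + bits, clauses)
               for bits in product((False, True), repeat=n * n))

print(len(at_most_one([var(1, j, 8) for j in range(1, 9)])))   # 28
print(count_models(4))                                        # 2
```

The first number matches the 28 clauses counted above for one row. The second is the number of solutions to the four queens problem; for the full 8 by 8 board you would hand `queens_cnf(8)` to an actual solver rather than enumerate 2⁶⁴ assignments.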

[1] You’ll see the SAT problem described as finding the solution to a Boolean formula. If you have multiple formulas, then the first holds, and the second, etc. So you can AND them all together to make one formula.

Special solutions to the eight queens problem

Posted on 11 May 2025 by John

There are 92 ways to place eight queens on a chessboard so that no queen is attacking any other. These fall into 12 equivalence classes. The 92 solutions are all rotations and reflections of these 12 basic solutions.

If you think about the previous numbers a minute, you might wonder why the total number of solutions is not a multiple of 12. There is one particular solution that is more symmetric than the others. So the total number of solutions 92 breaks down into

8 × 11 + 4 × 1.
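This breakdown can be reproduced with a short search. The sketch below enumerates all solutions by backtracking, then groups them into classes under rotation and reflection. The representation (a tuple giving the queen’s column in each row) and the function names are assumptions of the example.

```python
from collections import Counter

def solve(n):
    """All solutions as tuples q, where q[r] is the column of the queen in row r."""
    out = []
    def place(q):
        r = len(q)
        if r == n:
            out.append(tuple(q))
            return
        for c in range(n):
            # A new queen must avoid earlier columns and diagonals.
            if all(c != qc and abs(c - qc) != r - qr for qr, qc in enumerate(q)):
                place(q + [c])
    place([])
    return out

def orbit(q):
    """The set of boards obtained from q by rotations and reflections."""
    n = len(q)
    squares = {(r, c) for r, c in enumerate(q)}
    images = set()
    for _ in range(4):
        squares = {(c, n - 1 - r) for r, c in squares}     # rotate 90 degrees
        mirrored = {(r, n - 1 - c) for r, c in squares}    # reflect
        for s in (squares, mirrored):
            images.add(tuple(c for _, c in sorted(s)))
    return images

solutions = solve(8)
orbits = {frozenset(orbit(q)) for q in solutions}
sizes = Counter(len(o) for o in orbits)
print(len(solutions), "solutions in", len(orbits), "classes")
print(dict(sizes))   # class size -> number of classes of that size
```

The size counts should show eleven classes of eight solutions and one class of four.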

To illustrate this, let’s look at two fundamental solutions: one that is particularly ordered and one that is particularly disordered in a sense that we’ll get to later on.

Most basic solutions, like the one below, are part of an equivalence class of eight solutions.

You can rotate the board 90° or reflect it about the middle[1], or some combination of both [2]. This amounts to eight possibilities.

This solution, however, is more symmetric.

You can rotate or flip this solution, but two 90° rotations bring it back to where it started, so only four of the eight possibilities are distinct.

It’s curious that there is only one highly symmetric solution to the eight queens problem. When I first saw the problem as a child I expected all the solutions to be highly symmetric. That may be why I wasn’t able to find a solution.

Among the 11 basic solutions that are less ordered, the one shown above is uniquely disordered in the following sense: no three queens lie on a straight line.

The eight queens problem is a problem about restricted straight lines. It says no two queens lie on the same rank, file, or diagonal. But if we look at all straight lines, then of course there is a line through any two queens. In the most orderly solution, every queen is on a straight line with two others. In the least orderly solution, no queen is on a straight line with two others.

In 1900 Henry Dudeney introduced the no-three-in-line problem, looking at ways to place points on a lattice such that no line goes through three points, with no restriction on the slope of the lines. So one family of solutions to the eight queens problem is also a solution to the no-three-in-line problem.
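Testing a solution for three queens in a line takes only a cross product. This sketch enumerates the eight queens solutions and filters out those with any collinear triple; the helper names are illustrative, and the solver is a plain backtracking search.

```python
from itertools import combinations

def solve(n, q=()):
    """Yield n-queens solutions as tuples q, with q[r] the column of the queen in row r."""
    r = len(q)
    if r == n:
        yield q
        return
    for c in range(n):
        if all(c != qc and abs(c - qc) != r - qr for qr, qc in enumerate(q)):
            yield from solve(n, q + (c,))

def collinear(p, q, r):
    """True if three lattice points lie on one straight line (zero cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def lines_of_three(q):
    """Triples of queens that lie on a common straight line of any slope."""
    points = list(enumerate(q))
    return [t for t in combinations(points, 3) if collinear(*t)]

solutions = list(solve(8))
no_three = [q for q in solutions if not lines_of_three(q)]
print(len(no_three), "of", len(solutions), "solutions have no three queens in line")
```

Because rotations and reflections preserve straight lines, the solutions that pass the filter come in whole equivalence classes.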

[1] It doesn’t matter whether you flip about the horizontal or vertical axis.

[2] In fancy terminology, the action of the dihedral group D₈ applied to a solution yields another solution.

The non-attacking bishops problem

Posted on 10 May 2025 by John

How many bishops can you place on a chessboard so that no bishop is attacking any other bishop?

For a standard 8 × 8 chessboard the answer is 14. In general, for an n × n chessboard the answer is 2n − 2.

Here’s one way to place the maximum number of non-attacking bishops.

To see that the bishops cannot attack each other, I think it’s helpful to imagine extending the chessboard so that each bishop attacks the same number of squares. Then we can see that they miss each other.
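Both halves of the claim can be checked by computer for small boards. The sketch below builds one arrangement of 2n − 2 bishops (the whole top row plus the bottom row without its corners, which is my choice of construction and not necessarily the one pictured) and verifies that no two share a diagonal. It also finds the true maximum by exhaustive search, which is only practical for small n.

```python
def attack(a, b):
    """True if bishops on distinct squares a and b share a diagonal."""
    return abs(a[0] - b[0]) == abs(a[1] - b[1])

def placement(n):
    """One arrangement of 2n - 2 bishops, assuming n >= 2:
    the whole top row plus the bottom row without its corners."""
    top = [(0, c) for c in range(n)]
    bottom = [(n - 1, c) for c in range(1, n - 1)]
    return top + bottom

def non_attacking(bishops):
    return not any(attack(a, b)
                   for i, a in enumerate(bishops) for b in bishops[i + 1:])

def max_bishops(n):
    """Largest non-attacking set, found by exhaustive search; small n only."""
    squares = [(r, c) for r in range(n) for c in range(n)]
    best = 0
    def extend(start, diags, antis, count):
        nonlocal best
        best = max(best, count)
        for k in range(start, len(squares)):
            r, c = squares[k]
            # Each diagonal is identified by r - c, each anti-diagonal by r + c.
            if r - c not in diags and r + c not in antis:
                extend(k + 1, diags | {r - c}, antis | {r + c}, count + 1)
    extend(0, frozenset(), frozenset(), 0)
    return best

board = placement(8)
print(len(board), non_attacking(board))        # 14 True
print([max_bishops(n) for n in (2, 3, 4)])     # [2, 4, 6]
```

The construction works because a bishop on the top row and one on the bottom row share a diagonal only when their columns differ by n − 1, which requires a bottom corner.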

Sorting Roman numerals

Posted on 9 May 2025 by John

This morning I wrote about the frequencies of names for popes and kings. This involved sorting strings with Roman numerals since it’s common for popes and kings to have Roman numerals after their names.

Something that surprised me was that sorting Roman numerals alphabetically roughly sorts them in numerical order, especially for small numbers. It’s not perfect. For example, IX comes before V in alphabetical order.

Everyone who has done much work with data will have run into the problem of a column of numbers being sorted alphabetically rather than numerically. For example, “10” comes between “1” and “2” even though 10 comes after 1 and 2.

So you can’t sort numerals, Roman or Arabic, as strings and expect them to appear in numerical order. But Roman numerals come close when you’re sorting small numbers, such as I through XXIII for popes named John or I through VIII for kings of England named Henry.

To illustrate this, I plotted how well string sort order correlates with numeric order for Roman and Arabic numerals, for the sequence 1 … n for increasing values of n. I measured correlation using Spearman’s rank-order correlation. I tried Kendall’s tau as well and got similar results.

Alphabetical order and numerical order for Roman numerals agree pretty well up to XXXVIII, with just a few numbers out of place, namely IX, XIX, and XXIX. But alphabetical order and numerical order diverge quite a bit for Arabic numerals when all the numbers between 10 and 19 come before 2.

As you go further out, alphabetical order and numerical order diverge for both writing systems, but especially for Roman numerals.
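The comparison is easy to reproduce. This is a minimal sketch, not the code behind the plot: it converts integers to Roman numerals, sorts the strings, and computes Spearman’s correlation with the textbook formula for ranks without ties. The function names are my own.

```python
def roman(n):
    """Roman numeral for an integer 1 <= n < 4000."""
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in table:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

def spearman(labels):
    """Spearman correlation between list position (numeric order) and rank
    under string sorting. Assumes at least two distinct labels, no ties."""
    n = len(labels)
    rank = {s: i for i, s in enumerate(sorted(labels))}
    d2 = sum((i - rank[s]) ** 2 for i, s in enumerate(labels))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(sorted(roman(k) for k in range(1, 11)))
for n in (8, 23, 38, 100):
    r = spearman([roman(k) for k in range(1, n + 1)])
    a = spearman([str(k) for k in range(1, n + 1)])
    print(n, round(r, 3), round(a, 3))   # n, Roman, Arabic
```

For n = 8, the case of kings named Henry, the Roman correlation is exactly 1 because I through VIII sort alphabetically into numerical order.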

Frequency of names of English monarchs

Posted on 9 May 2025 by John

After I wrote the code to make the bar graph of papal names for the previous post, I decided to reuse the code to make a similar graph for monarchs of England. Just as there is some complication in counting papal names, there are even more complications in counting names of English monarchs.

Who was the first king of England? I went with Æthelstan (924–927). Was Lady Jane Grey queen of England? Not for my chart. Note that Edward the Elder and Edward the Martyr came before Edward I.

Incidentally, John is the most common name for a pope and the least common for a king of England. Several monarch names are unique, but John’s name is conspicuously not reused since he was an odious king. I remember my world history teacher saying there would never be another English king named John, something I found disappointing at the time.

Frequency of papal names

Posted on 9 May 2025 by John

The new pope chose the name Leo XIV. That made me curious about the distribution of names of popes and so I made the graph below. (I’m Protestant, so this wasn’t familiar to me.)

Looks like Leo is tied with Clement for fourth place, the top three names being John, Benedict, and Gregory.

There are a few oddities in counting the names due to the time in the Middle Ages when there was disagreement over who was pope. For this reason some popes are listed twice, sorta like how Grover Cleveland and Donald Trump each appear twice in the list of US presidents. And although the last pope named John was John XXIII, 21 popes have been named John: there was no John XX due to a clerical error, and John XVI was declared an antipope.

I also made a higher resolution PDF.
