
Multiplying by quaternions on the left and right

The map that takes a quaternion x to the quaternion qx is linear, so it can be represented as multiplication by a matrix. The same is true of the map that takes x to xq, but the two matrices are not the same because quaternion multiplication does not commute.

Let q = a + bi + cj + dk and let qM be the matrix that represents multiplication on the left by q. Then

{}_qM = \begin{bmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \\ \end{bmatrix}

Now let Mq be the matrix that represents multiplication on the right by q. Then

M_q = \begin{bmatrix} a & -b & -c & -d \\ b & a & d & -c \\ c & -d & a & b \\ d & c & -b & a \\ \end{bmatrix}

You can prove both matrix representations are correct by showing that they do the right thing when q = 1, i, j, and k. The rest follows by linearity.
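Here’s a quick numeric check of both representations, a minimal NumPy sketch (the helper names quat_mult, left_mult_matrix, and right_mult_matrix are mine): build each matrix for a random q and compare with direct quaternion multiplication.

import numpy as np

def quat_mult(p, q):
    # Hamilton product of p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2])

def left_mult_matrix(q):   # matrix for x -> qx
    a, b, c, d = q
    return np.array([
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a]])

def right_mult_matrix(q):  # matrix for x -> xq
    a, b, c, d = q
    return np.array([
        [a, -b, -c, -d],
        [b,  a,  d, -c],
        [c, -d,  a,  b],
        [d,  c, -b,  a]])

q, x = np.random.rand(4), np.random.rand(4)
print(np.allclose(left_mult_matrix(q) @ x, quat_mult(q, x)))   # True
print(np.allclose(right_mult_matrix(q) @ x, quat_mult(x, q)))  # True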

You might speculate that the matrix representation for multiplying on the right by q might be the transpose of the matrix representation for multiplying on the left by q. You can look at the matrices above and see that’s not the case.

In this post I talk about how to represent rotations with quaternions, and in this post I give an equation for the equivalent rotation matrix for a rotation described by a quaternion. You can prove that the matrix representation is correct by multiplying out qM and Mq*. Keep in mind that q in that case is a unit quaternion, so the squares of its components sum to 1.

Related posts

Alternative exp and log notation

The other day I stumbled on an article [1] that advocated writing a^b as a ↑ b and log_a(b) as a ↓ b.

\begin{align*} a &\uparrow b \equiv a^b  \\ a &\downarrow b \equiv \log_a b \end{align*}

This is a special case of Knuth’s up arrow and down arrow notation. Knuth introduces his arrows with the intention of repeating them to represent hyperexponentiation and iterated logarithms. But the emphasis in [1] is more on the pedagogical advantages of using a single up or down arrow.

Advantages

One advantage is that the notation is more symmetric. Exponents and logs are inverses of each other, and up and down arrows are visual inverses of each other.

Another advantage is that the down arrow notation makes the base of the logarithm more prominent, which is sometimes useful.

Finally, the up and down arrow notation is more typographically linear: a ↑ b and a ↓ b stay within a line, whereas a^b and log_a(b) extend above and below the line. LaTeX handles subscripts and superscripts well, but HTML doesn’t. That’s one reason I usually write exp(x) rather than e^x here.

Comparison

Here are the basic properties of logs and exponents using conventional notation.

\begin{align} a^b = c &\iff \log_a c = b \\ \log_b 1 &= 0 \\ \log_b b &= 1 \\ \log_b(b^x) &= x \\ b^{\log_b x} &= x \\ \log_b xy &= \log_b x + \log_b y \\ \log_b \frac{x}{y} &= \log_b x - \log_by \\ a^{\log_b c} &= c^{\log_b a} \\ \log_a b^c &= c (\log_a b) \\ (\log_b a) (\log_a x) &= \log_b x \end{align}

Here are the same properties using up and down arrow notation.

\begin{align} a \uparrow b = c &\iff a \downarrow c = b \\ b \downarrow 1 &= 0 \\ b \downarrow b &= 1 \\ b \downarrow (b \uparrow x) &= x \\ b \uparrow (b \downarrow x) &= x \\ b \downarrow xy &= b \downarrow x + b \downarrow y \\ b \downarrow \frac{x}{y} &= b \downarrow x - b \downarrow y \\ a \uparrow (b \downarrow c) &= c \uparrow (b \downarrow a ) \\ a \downarrow (b \uparrow c) &= c (a \downarrow b) \\ (b \downarrow a) (a \downarrow x) &= b \downarrow x \end{align}
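Here’s a small Python sketch spot-checking a few of the less obvious identities numerically; the function names up and down are mine.

from math import log

def up(a, b):   return a ** b      # a ↑ b
def down(a, b): return log(b, a)   # a ↓ b, i.e. log base a of b

a, b, c, x = 2.0, 3.0, 5.0, 7.0
print(up(a, down(b, c)), up(c, down(b, a)))  # a ↑ (b ↓ c) = c ↑ (b ↓ a)
print(down(a, up(b, c)), c * down(a, b))     # a ↓ (b ↑ c) = c (a ↓ b)
print(down(b, a) * down(a, x), down(b, x))   # (b ↓ a)(a ↓ x) = b ↓ x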

Related posts

[1] Margaret Brown. Some Thoughts on the Use of Computer Symbols in Mathematics. The Mathematical Gazette, Vol. 58, No. 404 (Jun., 1974), pp. 78-79

A crowded little chess puzzle

Here’s a puzzle by Martin Gardner [1].

Can a queen, king, rook, bishop, and knight be placed on a 4 × 4 board so no piece attacks another?

There are two solutions, plus symmetries.
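If you’d like to find them yourself, here’s a brute-force sketch in Python, assuming ordinary chess attack rules with sliding pieces blocked by intervening pieces. It counts rotated and reflected variants separately.

from itertools import permutations

N = 4
PIECES = "QKRBN"  # queen, king, rook, bishop, knight

def blocked(a, b, occupied):
    # Is any occupied square strictly between a and b on their shared line?
    (r1, c1), (r2, c2) = a, b
    dr = (r2 > r1) - (r2 < r1)
    dc = (c2 > c1) - (c2 < c1)
    r, c = r1 + dr, c1 + dc
    while (r, c) != (r2, c2):
        if (r, c) in occupied:
            return True
        r, c = r + dr, c + dc
    return False

def attacks(piece, a, b, occupied):
    (r1, c1), (r2, c2) = a, b
    dr, dc = abs(r1 - r2), abs(c1 - c2)
    if piece == "K":
        return max(dr, dc) == 1
    if piece == "N":
        return {dr, dc} == {1, 2}
    line, diag = dr == 0 or dc == 0, dr == dc
    if piece == "R":
        return line and not blocked(a, b, occupied)
    if piece == "B":
        return diag and not blocked(a, b, occupied)
    return (line or diag) and not blocked(a, b, occupied)  # queen

squares = [(r, c) for r in range(N) for c in range(N)]
count = 0
for placement in permutations(squares, len(PIECES)):
    occupied = set(placement)
    if not any(attacks(p, a, b, occupied)
               for p, a in zip(PIECES, placement)
               for b in placement if b != a):
        count += 1
print(count)  # placements, counting symmetric variants separately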

Note that in all non-attacking chess puzzles, the colors of the pieces are irrelevant. In the solutions I chose the piece colors to be the opposite of the square colors strictly for aesthetic reasons.

More chess posts

More Martin Gardner posts

[1] Martin Gardner. Some New Results on Nonattacking Chess Tasks. Math Horizons. February 2001, pp 10–12.

The non-attacking bishops problem

How many bishops can you place on a chessboard so that no bishop is attacking any other bishop?

For a standard 8 × 8 chessboard the answer is 14. In general, for an n × n chessboard the answer is 2n − 2.

Here’s one way to place the maximum number of non-attacking bishops.

To see that the bishops cannot attack each other, I think it’s helpful to imagine extending the chessboard so that each bishop attacks the same number of squares. Then we can see that they miss each other.
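Here’s a short Python sketch of one standard construction, not necessarily the arrangement pictured: fill one edge row, then fill the opposite edge row except its corners. Two bishops share a diagonal exactly when the sums or the differences of their coordinates coincide, which makes the check easy.

def max_bishops(n):
    # One full edge row plus the opposite edge row minus its corners
    return [(0, c) for c in range(n)] + [(n - 1, c) for c in range(1, n - 1)]

def non_attacking(bishops):
    # No two bishops share a rising or falling diagonal
    sums  = [r + c for r, c in bishops]
    diffs = [r - c for r, c in bishops]
    return len(set(sums)) == len(sums) and len(set(diffs)) == len(diffs)

for n in range(2, 9):
    b = max_bishops(n)
    assert len(b) == 2*n - 2 and non_attacking(b)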

Related posts

Frequency of names of English monarchs

After I wrote the code to make the bar graph of papal names for the previous post, I decided to reuse the code to make a similar graph for monarchs of England. Just as there is some complication in counting papal names, there are even more complications in counting names of English monarchs.

Who was the first king of England? I went with Æthelstan (924–927). Was Lady Jane Grey queen of England? Not for my chart. Note that Edward the Elder and Edward the Martyr came before Henry I.

Incidentally, John is the most common name for a pope and the least common for a king of England. Several monarch names are unique, but John’s name is conspicuously not reused since he was an odious king. I remember my world history teacher saying there would never be another English king named John, something I found disappointing at the time.

Frequency of papal names

The new pope chose the name Leo XIV. That made me curious about the distribution of names of popes and so I made the graph below. (I’m Protestant, so this wasn’t familiar to me.)

Looks like Leo is tied with Clement for fourth place, the top three names being John, Benedict, and Gregory.

There are a few oddities in counting the names due to the time in the Middle Ages when there was disagreement over who was pope. For this reason some popes are listed twice, sorta like how Grover Cleveland and Donald Trump each appear twice in the list of US presidents. And although the last pope named John was John XXIII, 21 popes have been named John: there was no John XX due to a clerical error, and John XVI was declared an antipope.

I also made a higher resolution PDF.

Converting between quaternions and rotation matrices

In the previous post I wrote about representing rotations with quaternions. This representation has several advantages, such as making it clear how rotations compose. Rotations are often represented as matrices, and so it’s useful to be able to go between the two representations.

A unit-length quaternion (q0, q1, q2, q3) represents a rotation by an angle θ around an axis in the direction of (q1, q2, q3) where cos(θ/2) = q0. The corresponding rotation matrix is given below.

R = \begin{pmatrix} 2(q_0^2 + q_1^2) - 1 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & 2(q_0^2 + q_2^2) - 1 & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & 2(q_0^2 + q_3^2) - 1 \end{pmatrix}

Going the other way around, inferring a quaternion representation from a rotation matrix, is harder. Here is a mathematically correct but numerically suboptimal approach, known [1] as the Chiaverini-Siciliano method.

\begin{align*} q_0 &= \frac{1}{2} \sqrt{1 + r_{11} + r_{22} + r_{33}} \\ q_1 &= \frac{1}{2} \sqrt{1 + r_{11} - r_{22} - r_{33}} \, \text{sgn}(r_{32} - r_{23}) \\ q_2 &= \frac{1}{2} \sqrt{1 - r_{11} + r_{22} - r_{33}} \, \text{sgn}(r_{13} - r_{31}) \\ q_3 &= \frac{1}{2} \sqrt{1 - r_{11} - r_{22} + r_{33}} \, \text{sgn}(r_{21} - r_{12}) \end{align*}

Here sgn is the sign function; sgn(x) equals 1 if x is positive and −1 if x is negative. Note that the components only depend on the diagonal of the rotation matrix, aside from the sign terms. Better numerical algorithms make more use of the off-diagonal elements.

Accounting for degrees of freedom

Something seems a little suspicious here. Quaternions contain four real numbers, and 3 by 3 matrices contain nine. How can four numbers determine nine numbers? And going the other way, out of the nine, we essentially choose three that determine the four components of a quaternion.

Quaternions have four degrees of freedom, but we’re using unit quaternions, so there are basically three degrees of freedom. Likewise orthogonal matrices have three degrees of freedom. An axis of rotation is a point on a sphere, so that has two degrees of freedom, and the degree of rotation is the third degree of freedom.

In topological terms, the unit quaternions and the set of 3 by 3 rotation matrices are both three dimensional manifolds, and the former is a double cover of the latter. It is a double cover because a unit quaternion q corresponds to the same rotation as −q.

Python code

Implementing the equations above is straightforward.

import numpy as np

def quaternion_to_rotation_matrix(q):
    q0, q1, q2, q3 = q
    return np.array([
        [2*(q0**2 + q1**2) - 1, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), 2*(q0**2 + q2**2) - 1, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), 2*(q0**2 + q3**2) - 1]
    ]) 

def rotation_matrix_to_quaternion(R):
    r11, r12, r13 = R[0, 0], R[0, 1], R[0, 2]
    r21, r22, r23 = R[1, 0], R[1, 1], R[1, 2]
    r31, r32, r33 = R[2, 0], R[2, 1], R[2, 2]
    
    # Calculate quaternion components
    q0 = 0.5 * np.sqrt(1 + r11 + r22 + r33)
    q1 = 0.5 * np.sqrt(1 + r11 - r22 - r33) * np.sign(r32 - r23)
    q2 = 0.5 * np.sqrt(1 - r11 + r22 - r33) * np.sign(r13 - r31)
    q3 = 0.5 * np.sqrt(1 - r11 - r22 + r33) * np.sign(r21 - r12)
    
    return np.array([q0, q1, q2, q3])
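As a spot check, a rotation of 90° about the z-axis corresponds to the quaternion (cos 45°, 0, 0, sin 45°):

q = np.array([np.sqrt(2)/2, 0, 0, np.sqrt(2)/2])  # 90° about the z-axis
print(quaternion_to_rotation_matrix(q).round(6))
# [[ 0. -1.  0.]
#  [ 1.  0.  0.]
#  [ 0.  0.  1.]]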

Random testing

We’d like to test the code above by generating random quaternions, converting the quaternions to rotation matrices, then back to quaternions to verify that the round trip puts us back essentially where we started. Then we’d like to go the other way around, starting with randomly generated rotation matrices.

To generate a random unit quaternion, we generate a vector of four independent normal random values, then normalize by dividing by its length. (See this recent post.)

To generate a random rotation matrix, we use a generator that is part of SciPy.

Here’s the test code:

from scipy.stats import norm, special_ortho_group

def randomq():
    q = norm.rvs(size=4)
    return q/np.linalg.norm(q)

def randomR():
    return special_ortho_group.rvs(dim=3)

np.random.seed(20250507)
N = 10

for _ in range(N):
    q = randomq()
    R = quaternion_to_rotation_matrix(q)
    t = rotation_matrix_to_quaternion(R)
    print(np.linalg.norm(q - t))
    
for _ in range(N):
    R = randomR()
    q = rotation_matrix_to_quaternion(R)
    T = quaternion_to_rotation_matrix(q)
    print(np.linalg.norm(R - T))

The first test utterly fails, returning six 2s, i.e. the round trip vector is as far as possible from the vector we started with. How could that happen? It must be returning the negative of the original vector. Now go back to the discussion above about double covers: q and −q correspond to the same rotation.

If we go back and add the line

    q *= np.sign(q[0])

then we standardize our random vectors to have a positive first component, just like the vectors returned by rotation_matrix_to_quaternion.
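That is, randomq becomes

def randomq():
    q = norm.rvs(size=4)
    q = q/np.linalg.norm(q)
    q *= np.sign(q[0])  # make the first component positive
    return q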

Now our tests all return norms on the order of 10⁻¹⁶ to 10⁻¹⁴. There’s a little room to improve the accuracy, but the results are good.

Update: I did some more random testing, and found errors on the order of 10⁻¹⁰. Then I was able to create a test case where rotation_matrix_to_quaternion threw an exception because one of the square roots had a negative argument. In [1] the authors get around this problem by evaluating two theoretically equivalent expressions for each of the square root arguments. The expressions are complementary in the sense that both should not lead to numerical difficulties at the same time.

[1] See “Accurate Computation of Quaternions from Rotation Matrices” by Soheil Sarabandi and Federico Thomas for a better numerical algorithm. See also the article “A Survey on the Computation of Quaternions From Rotation Matrices” by the same authors.

5,000th post

This is the 5,000th post on this blog. I started blogging in 2008, and Wayne Joubert started contributing posts last year. We’ve written an average of between five and six posts a week for the last 17 years.

I thought about listing some of the most popular posts over the years, but that’s not as simple as it sounds. Popularity varies over time, and posts are popular with different people for different reasons. I don’t have a way of quantifying what posts have been popular with regular readers, but I’m sure such a list would be very different from the lists below.

Recent favorites

Here are posts that have been popular over the last year.

I knew that the ASCII post was popular, but before looking at stats I had no idea anyone was reading the other two posts. I imagine regular readers are more interested in things like my recent series on the logistic map.

Hacker News

The first post to go viral was Why programmers are not paid in proportion to their productivity. Hacker News sent the site more traffic than it could handle at the time.

Many of the posts that have seen a lot of traffic have been posted on Hacker News. I very much appreciate everyone who posts my work there. Because Hacker News readers tend to be programmers, my most popular posts have tended to be programming-related. The posts most popular with regular readers are not as tilted toward programming.

Here are a few more posts that have been popular because of Hacker News.

Code snippets

I didn’t realize until Tim Hopper pointed it out that a lot of projects on GitHub mention this blog, either to cite an article as a reference or to use code I’ve posted. That’s fine, by the way: feel free to use whatever you find useful. Here is Tim’s list of mentions.

Here’s an index of stand-alone code. Everything these code snippets do can be found in standard software libraries. However, these code samples remain popular because sometimes you cannot import a library or do not want to. I mentioned an example of this in the previous post. I’ve had several consulting projects where there was something new about their project that meant they had to develop basic mathematical software from scratch.

Calculators

This site started as a set of hand-written HTML pages, and there are still a few such pages, especially calculators. Some of these have been surprisingly popular. (“Surprising” and “popular” seem to always go together. I can kinda predict when something will be moderately popular, but the most popular content is always a surprise to me.)

A note to new readers

If you’re new to this site, the links above may give a wrong impression. I mostly write about math and statistics, and occasionally about other topics such as data privacy or music. None of the posts above are typical.

If you’d like to be notified of posts as they come out, you can subscribe via RSS or follow one of my X accounts. I also have a newsletter where I introduce posts two or three at a time.

Thanks for reading.

Morse code and the limits of human perception

Musician Adam Neely made a video asking What is the fastest music humanly possible? He emphasizes that he means the fastest music possible to hear, not the fastest to perform.

[Screenshot of Bolton’s paper]

The video cites a psychology article [1] from 1894 that found that most people can reliably distinguish an inter-onset interval (time between notes) of 100 ms [2]. It also gives examples of music faster than this, such as a performance of Vivaldi with an inter-onset interval of 83 ms [3]. The limit seems to be greater than 50 ms because a pulse train with an inter-onset interval of 50 ms starts to sound like a 20 Hz pitch.

People are able to receive Morse code faster than this implies is possible. We will explain how this works, but first we need to convert words per minute to inter-onset interval length.

Morse code timing

Morse code is made of dots and dashes, but it is also made of spaces of three different lengths: the space between the dots and dashes representing a single letter, the space between letters, and the space between words.

According to an International Telecommunication Union standard:

  • A dash is equal to three dots.
  • The space between the signals forming the same letter is equal to one dot.
  • The space between two letters is equal to three dots.
  • The space between two words is equal to seven dots.

The same timing is referred to as standard in a US Army manual from 1957.

Notice that all the numbers above are odd. Since a dot or dash is always followed by a space, the duration of a dot or dash and its trailing space is always an even multiple of the duration of a dot.

If we think of a dot as a sixteenth note, Morse code is made of notes that are either sixteenth notes or three sixteenth notes tied together. Rests are equal to one, three, or seven sixteenths, and notes and rests must alternate. All notes start on an eighth note boundary, i.e. either on a down beat or an up beat but not in between.

Words per minute

Morse code speed is measured in words per minute. But what exactly is a “word”? Words have a variable number of letters, and even words with the same number of letters can have very different durations in Morse code.

The most common standard is to use PARIS as the prototypical word. Ten words per minute, for example, means that dots and dashes are coming at you as fast as if someone were sending the word PARIS ten times per minute. Here’s a visualization of the code for PARIS with each square representing the duration of a dot.

■□■■■□■■■□■□□□■□■■■□□□■□■■■□■□□□■□■□□□■□■□■□□□□□□□□

This has a duration of 50 dots.

How does this relate to inter-onset interval? If each duration of a dot is an interval, then n words per minute corresponds to 50n intervals per minute, or 60/(50n) = 1.2/n seconds per interval.
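Here’s a quick Python sketch that derives the 50-dot figure from the ITU timings above and converts words per minute to dot duration; the helper names are mine.

# Morse code for the letters of PARIS
CODE = {"P": ".--.", "A": ".-", "R": ".-.", "I": "..", "S": "..."}

def word_units(word):
    # Duration in dot units: dash = 3 dots, intra-letter space = 1,
    # inter-letter space = 3, trailing word space = 7
    total = 0
    for letter in word:
        symbols = CODE[letter]
        total += sum(1 if s == "." else 3 for s in symbols)
        total += len(symbols) - 1  # spaces within the letter
        total += 3                 # space after the letter
    return total - 3 + 7           # last space is a word space, not a letter space

print(word_units("PARIS"))  # 50

def dot_ms(wpm):
    # n wpm = 50n dots per minute, so a dot lasts 1200/n ms
    return 1200 / wpm

for wpm in [12, 20, 30, 75.2]:
    print(wpm, round(dot_ms(wpm), 1), "ms")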

This would imply that 12 wpm would correspond to an inter-onset interval of around 100 ms, pushing the limit of perception. But 12 wpm is a beginner speed. It’s not uncommon for people to receive Morse code at 30 wpm. The world record, set by Theodore Roosevelt McElroy in 1939, is 75.2 wpm.

What’s going on?

In the preceding section I assumed a dot is an interval when calculating inter-onset interval length. In musical terms, this is saying a sixteenth note is an interval. But maybe we should count eighth notes as intervals. As noted before, every “note” (dot or dash) starts on a down beat or up beat. Still, that would say 20 wpm is pushing the limit of perception, and we know people can listen faster than that.

You don’t have to hear with the precision of the diagram above in order to recognize letters. And you have context clues. Maybe you can’t hear the difference between “E E E” and “O”, but in ordinary prose the latter is far more likely.

E E E: ■□□□■□□□■
O: ■■■□■■■□■■■

At some skill level people quit hearing individual letters and start hearing words, much like experienced readers see words rather than letters. I’ve heard that this transition happens somewhere between 20 wpm and 30 wpm. That would be consistent with the estimate above that 20 wpm is pushing the limit of perception letter by letter. But 30 words per minute is doable. It’s impressive, but not unheard of.

What I find hard to believe is that there were intelligence officers, such as Terry Jackson, who could copy encrypted text at 50 wpm. Here a “word” is a five-letter code group. There are millions of possible code groups [4], all equally likely, and so it would be impossible to become familiar with particular code groups the way one can become familiar with common words. Maybe they learned to hear pairs or triples of letters.

Related posts

[1] Thaddeus L. Bolton. Rhythm. The American Journal of Psychology. Vol. VI. No. 2. January, 1894. Available here.

[2] Interonset is not commonly hyphenated, but I’ve hyphenated it here for clarity.

[3] The movement Summer from Vivaldi’s The Four Seasons performed at 180 bpm, which corresponds to 720 sixteenth notes per minute, each 83 ms long.

[4] If a code group consisted entirely of English letters, there are 26⁵ = 11,881,376 possible groups. If a code group can contain digits as well, there would be 36⁵ = 60,466,176 possible groups.