Matrices of Matrices

It’s often useful to partition a matrix, thinking of it as a matrix whose entries are matrices. For example, you could look at the 6 × 6 matrix

M = \begin{bmatrix} 3 & 1 & 4 & 0 & 0 & 0 \\ 2 & 7 & 1 & 0 & 0 & 0 \\ 5 & 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 4 & 8 \\ 0 & 0 & 0 & 1 & 7 & 6\\ 0 & 0 & 0 & 3 & 5 & 1 \\ \end{bmatrix}

as a 2 × 2 matrix whose entries are 3 × 3 matrices.

\begin{bmatrix} \begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 1 \\ 5 & 0 & 6 \\ \end{bmatrix} & \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} \\ & \\ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} & \begin{bmatrix} 2 & 4 & 8 \\ 1 & 7 & 6\\ 3 & 5 & 1 \\ \end{bmatrix} \end{bmatrix}

M is not a diagonal matrix of real numbers, but its block structure is a diagonal matrix of blocks. It might be possible in some application to take advantage of the fact that the off-diagonal blocks are zero matrices.

You can multiply two partitioned matrices by multiplying the blocks as if they were numbers. The only requirement is that the blocks be of compatible sizes for multiplication: the block widths on the left have to match the corresponding block heights on the right.

For example, if you have two partitioned matrices

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
and
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}

then their product is the partitioned matrix

C = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}

The only requirement is that the matrix products above make sense. The number of columns in A₁₁ must equal the number of rows in B₁₁, and the number of columns in A₁₂ must equal the number of rows in B₂₁. (The rest of the compatibility conditions follow because the blocks in a column of a partitioned matrix have the same number of columns, and the blocks in a row of a partitioned matrix have the same number of rows.)

You can partition matrices into any number of rows and columns of blocks; the examples above have been 2 × 2 arrays of blocks only for convenience.
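As a quick numerical check, here is a sketch using NumPy (the block sizes are my own arbitrary choices) showing that the blockwise formula agrees with ordinary matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(2)

# 2 x 2 arrays of blocks; block widths on the left match block heights on the right
A11, A12, A21, A22 = (rng.standard_normal((2, 3)) for _ in range(4))
B11, B21 = (rng.standard_normal((3, 2)) for _ in range(2))
B12, B22 = (rng.standard_normal((3, 4)) for _ in range(2))

A = np.block([[A11, A12], [A21, A22]])
B = np.block([[B11, B12], [B21, B22]])

# blockwise product, multiplying blocks as if they were numbers
C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
              [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])

print(np.allclose(A @ B, C))  # True
```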

It’s natural to oscillate a bit between thinking “Of course you can do that” and “Really? That’s incredible.” If you haven’t gone through at least one cycle perhaps you haven’t thought about it long enough.

What could go wrong?

Manipulation of partitioned matrices works so smoothly that you have to wonder where things could go wrong.

For the most part, if a calculation makes logical sense then it’s likely to be correct. If you add two blocks, they have to be compatible for addition. If you “divide” by a block, i.e., multiply by its inverse, then the block must have an inverse. Etc.

One subtle pitfall is that matrix multiplication does not commute in general, and so you have to watch out for implicitly assuming that AB = BA where A and B are blocks. For example, theorems about the determinant of a partitioned matrix are not as simple as you might hope. The determinant of the partitioned matrix

\begin{vmatrix} A & B \\ C & D \end{vmatrix}

is not simply det(AD − BC) in general, though it is true if all the blocks are square matrices of the same size and the blocks in one row or one column commute, i.e. if AB = BA, CD = DC, AC = CA, or BD = DB.
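Here is a small numerical illustration (the block matrices are my own arbitrary choices): the naive formula fails for generic blocks, but holds when the blocks in one row commute, here because the identity matrix commutes with everything.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
C = np.array([[2., 0.], [0., 3.]])
D = np.array([[1., 1.], [0., 1.]])

# generic blocks: the naive formula fails
M = np.block([[A, B], [C, D]])
print(np.linalg.det(M))              # 8, up to rounding
print(np.linalg.det(A @ D - B @ C))  # 7, not equal

# blocks in the bottom row commute (D = I commutes with C): the formula holds
D = np.eye(2)
M = np.block([[A, B], [C, D]])
print(np.linalg.det(M))              # 5
print(np.linalg.det(A @ D - B @ C))  # 5
```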

More linear algebra posts

Trinomial Coefficients and Kings

The term trinomial coefficient is used a couple of different ways. The most natural use of the term is a generalization of binomial coefficients to three variables, i.e. numbers of the form

\frac{n!}{i! \, j! \, k!}

where i + j + k = n. These numbers are the coefficients of x^i y^j z^k in the expansion of (x + y + z)^n.

But there’s another use of the term trinomial coefficient, the one that we are interested in for this post, and that is the coefficients of x^k in the expansion of (1 + x + 1/x)^n. These numbers can be visualized in a triangle analogous to Pascal’s triangle.

In Pascal’s triangle, each entry is the sum of the two entries above it. In the trinomial triangle, each entry is the sum of the three entries above it.

If you’ve been reading this blog regularly you know I’ve been interested in chess puzzles lately. The thing that makes me interested in trinomial coefficients is that they relate to the moves of a king on a chessboard.

If you count the number of paths a king can take to a square using the minimal number of moves, you get the trinomial triangle. The reason is simple: the number of ways to get to a square equals the number of ways to get there via the square up and to the left, plus the number of ways to get there via the square above, plus the number of ways to get there via the square up and to the right.
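A short sketch of this count (the function name is mine): tabulating minimal-move king paths by the three-term recurrence above reproduces the rows of the trinomial triangle.

```python
from functools import lru_cache

def king_paths(rows):
    # number of minimal-move king paths from a starting square to each
    # square `rows` ranks away; the three terms correspond to arriving
    # from up-left, straight up, and up-right
    @lru_cache(maxsize=None)
    def paths(r, c):
        if r == 0:
            return 1 if c == 0 else 0
        return paths(r - 1, c - 1) + paths(r - 1, c) + paths(r - 1, c + 1)
    return [paths(rows, c) for c in range(-rows, rows + 1)]

print(king_paths(3))  # [1, 3, 6, 7, 6, 3, 1], a row of the trinomial triangle
```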

Related posts

Riff on an integration bee integral

I saw an integral posted online that came from this year’s MIT Integration Bee.

\int_0^1 x^{2024} (1 - x^{2025})^{2025}\, dx

My thoughts on seeing this were, in order:

  1. It looks like a beta function.
  2. The answer is a small number.
  3. You can evaluate the integral using the substitution u = 1 − x^2025.

I imagine most students’ reactions would be roughly the opposite. They’d first see the substitution. Some might think of the value being small, and none would think of beta functions.
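To sketch the substitution: with u = 1 − x^2025 we have du = −2025 x^2024 dx, so the integral equals (1/2025) ∫₀¹ u^2025 du = 1/(2025 · 2026). The same pattern can be checked numerically with a smaller exponent (a sketch using SciPy’s quad; the exponent n is my own choice):

```python
from scipy.integrate import quad

# With u = 1 - x**(n+1), du = -(n+1) x**n dx, the integral of
# x**n (1 - x**(n+1))**(n+1) over [0, 1] is 1/((n+1)*(n+2)).
# The Integration Bee problem is the case n = 2024, giving 1/(2025*2026).
n = 3
numeric, _ = quad(lambda x: x**n * (1 - x**(n+1))**(n+1), 0, 1)
exact = 1/((n + 1)*(n + 2))
print(numeric, exact)  # both 0.05
```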

Size estimate

The value of the integral is small because you have numbers between 0 and 1 being raised to large powers. In real applications, being able to estimate the size of an integral may be more valuable than being able to compute the integral.

But on a quiz, like the Integration Bee, knowing that the result is small would be worthless, except possibly to check a result. You know from the context of the problem being on a timed quiz that there must be a trick [1].

Incidentally, if you changed the problem slightly, say by replacing one of the instances of 2025 with 2025.03, the integration problem becomes much harder, but you could still estimate that the value is small.

Beta functions

The first time I saw someone evaluate an integral by recognizing a beta function, it seemed like he was pulling a rabbit out of a hat. I was completely surprised that something he considered common knowledge was something I’d never seen.

Years later I went to work in biostatistics at MDACC and worked with the beta function and beta random variables all the time. If you look through my publications, you’ll see the word “beta” appears several times.

The beta function is defined as follows. You could take the first equality as the definition and the second as a theorem.

B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1} \,dt = \frac{\Gamma(x) \, \Gamma(y)}{\Gamma(x+y)}

The integrand in the definition of the beta function is proportional to the probability density function for a beta(x, y) random variable, and B(x, y) is the normalizing constant for a beta(x, y) probability distribution.
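Both equalities in the definition are easy to check numerically. A sketch using SciPy’s `beta` and `gamma` and numerical quadrature (the values of x and y are my own arbitrary choices):

```python
from scipy.integrate import quad
from scipy.special import beta, gamma

x, y = 3, 5
integral, _ = quad(lambda t: t**(x - 1) * (1 - t)**(y - 1), 0, 1)

print(integral)                            # 1/105, up to quadrature error
print(beta(x, y))                          # same value
print(gamma(x) * gamma(y) / gamma(x + y))  # same value
```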

The integral in the Integration Bee question is not a beta function. I thought maybe it could be turned into a beta function, but the u substitution is simpler.

If you change the integrand above to

\int_0^1 x^{2024} (1 - x)^{2025}\, dx

then you do have a beta function. The value of this integral is B(2025, 2026), which equals 2024! 2025! / 4050!, an extremely small number.

Numerical estimation

The integral in the previous section is roughly the same as that of x^2024 (1 − x)^2024. This function has its maximum at ½, and so the integrand is less than 2^−4048 ≈ 10^−1219.

If we want to evaluate 2024! 2025! / 4050! numerically we’ll need to be indirect about it because 2024! would overflow and the final result would underflow. We could calculate the log base 10 of the value using Python as follows.

>>> from scipy.special import gammaln
>>> from math import log
>>> (gammaln(2025) + gammaln(2026) - gammaln(4051))/log(10)
-1220.576093208249

So our upper bound got us to within an order of magnitude or two of the result.

 

[1] When you see the current year used in a math problem, the quiz writer is winking at you. The year is often meant to suggest that a number is large but somewhat arbitrary. Here you suspect that the key to the question would be the same if 2024 and 2025 were replaced with 1997 and 1998. Also, the year is often a hint that you don’t want to use brute force. In this case, it’s a warning not to expand the integrand using the binomial theorem.

Embeddings, Projections, and Inverses

I just revised a post from a week ago about rotations. The revision makes explicit the process of embedding a 3D vector into the quaternions, then pulling it back out.

The 3D vector is embedded in the quaternions by making it the vector part of a quaternion with zero real part:

(p₁, p₂, p₃) → (0, p₁, p₂, p₃)

and the quaternion is returned to 3D by cutting off the real part:

(p₀, p₁, p₂, p₃) → (p₁, p₂, p₃).

To give names to the process of moving to and from quaternions, we have an embedding E from ℝ³ to ℝ⁴ and a projection P from ℝ⁴ to ℝ³.

We can represent E as a 4 × 3 matrix

E = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

and P by a 3 × 4 matrix

P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

We’d like to say E and P are inverses. Surely P undoes what E does, so they’re inverses in that sense. But E cannot undo P because you lose information projecting from ℝ4 to ℝ3 and E cannot recover information that was lost.

The rest of this post will look at three generalizations of inverses and how E and P relate to each.

Left and right inverse

Neither matrix is invertible, but PE equals the identity matrix on ℝ³, and so P is a left inverse of E and E is a right inverse of P.

PE = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

On the other hand, EP is not an identity matrix, and so E is not a left inverse of P, and neither is P a right inverse of E.

EP = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Adjoint

P is the transpose of E, which means it is the adjoint of E. Adjoints are another way to generalize the idea of an inverse. More on that here.

Pseudo-inverse

The Moore-Penrose pseudo-inverse acts a lot like an inverse, which is somewhat uncanny because all matrices have a pseudo-inverse, even rectangular matrices.

Pseudo-inverses are symmetric, i.e. if A⁺ is the pseudo-inverse of A, then A is the pseudo-inverse of A⁺.

Given an m by n matrix A, the Moore-Penrose pseudo-inverse A⁺ is the unique n by m matrix satisfying four conditions:

  1. A A⁺ A = A
  2. A⁺ A A⁺ = A⁺
  3. (A A⁺)* = A A⁺
  4. (A⁺ A)* = A⁺ A

To show that E⁺ = P we have to establish

  1. EPE = E
  2. PEP = P
  3. (EP)* = EP
  4. (PE)* = PE

We calculated EP and PE above, and both are real and symmetric, so properties 3 and 4 hold.

We can also compute

 EPE = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = E

and

PEP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} =P

showing that properties 1 and 2 hold as well.
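These properties are also easy to confirm numerically. A sketch with NumPy, whose `pinv` computes the Moore-Penrose pseudo-inverse:

```python
import numpy as np

E = np.vstack([np.zeros(3), np.eye(3)])  # embedding matrix, 4 x 3
P = E.T                                  # projection matrix, 3 x 4

print(np.allclose(P @ E, np.eye(3)))      # True: P is a left inverse of E
print(np.allclose(E @ P, np.eye(4)))      # False: E is not a left inverse of P
print(np.allclose(np.linalg.pinv(E), P))  # True: P is the pseudo-inverse of E
```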

Related posts

Special solutions to the eight queens problem

There are 92 ways to place eight queens on a chessboard so that no queen is attacking any other. These fall into 12 equivalence classes. The 92 solutions are all rotations and reflections of these 12 basic solutions.

If you think about the previous numbers a minute, you might wonder why the total number of solutions is not a multiple of 12. The reason is that one particular solution is more symmetric than the others, so the total of 92 solutions breaks down into

8 × 11 + 4 × 1.
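These counts are easy to verify by brute force. Here’s a sketch (function names are mine) that represents a solution as a permutation of columns and groups solutions by rotations and reflections:

```python
from itertools import permutations

def queens_solutions(n=8):
    # p[i] is the column of the queen in row i; a permutation already
    # rules out shared rows and columns, so only diagonals need checking
    return [p for p in permutations(range(n))
            if len({p[i] + i for i in range(n)}) == n
            and len({p[i] - i for i in range(n)}) == n]

def orbit(p, n=8):
    # the images of a solution under all rotations and reflections
    board = frozenset((i, p[i]) for i in range(n))
    images = set()
    for _ in range(4):
        board = frozenset((j, n - 1 - i) for (i, j) in board)  # rotate 90°
        images.add(board)
        images.add(frozenset((i, n - 1 - j) for (i, j) in board))  # reflect
    return frozenset(images)

sols = queens_solutions()
classes = {orbit(p) for p in sols}
print(len(sols), len(classes))          # 92 12
print(sorted(len(c) for c in classes))  # one orbit of size 4, eleven of size 8
```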

To illustrate this, let’s look at two fundamental solutions: one that is particularly ordered and one that is particularly disordered in a sense that we’ll get to later on.

Most basic solutions, like the one below, are part of an equivalence class of eight solutions.

You can rotate the board 90° or reflect it about the middle [1], or some combination of both [2]. This amounts to eight possibilities.

This solution, however, is more symmetric.

You can rotate the basic solution, but flipping it over does not create a new solution: a flip produces the same result as two 90° rotations.

It’s curious that there is only one highly symmetric solution to the eight queens problem. When I first saw the problem as a child I expected all the solutions to be highly symmetric. That may be why I wasn’t able to find a solution.

Among the 11 basic solutions that are less ordered, the one shown above is uniquely disordered in the following sense: no three queens lie on a straight line.

The eight queens problem is a problem about restricted straight lines. It says no two queens lie on the same rank, file, or diagonal. But if we look at all straight lines, then of course there is a line through any two queens. In the most orderly solution, every queen is on a straight line with two others. In the least orderly solution, no queen is on a straight line with two others.

In 1900 Henry Dudeney introduced the no-three-in-line problem, looking at ways to place points on a lattice such that no line goes through three points, with no restriction on the slope of the lines. So one family of solutions to the eight queens problem is also a solution to the no-three-in-line problem.

Related posts

[1] It doesn’t matter whether you flip about the horizontal or vertical axis.

[2] In fancy terminology, the action of the dihedral group D8 applied to a solution yields another solution.

Sorting Roman numerals

This morning I wrote about the frequencies of names for popes and kings. This involved sorting strings with Roman numerals since it’s common for popes and kings to have Roman numerals after their names.

Something that surprised me was that sorting Roman numerals alphabetically roughly sorts them in numerical order, especially for small numbers. It’s not perfect. For example, IX comes before V in alphabetical order.

Everyone who has done much work with data will have run into the problem of a column of numbers being sorted alphabetically rather than numerically. For example, “10” comes between “1” and “2” even though 10 comes after 1 and 2.

So you can’t sort numerals, Roman or Arabic, as strings and expect them to appear in numerical order. But Roman numerals come close when you’re sorting small numbers, such as I through XXIII for popes named John or I through VIII for kings of England named Henry.

To illustrate this, I plotted how well string sort order correlates with numeric order for Roman and Arabic numbers, for the sequence 1 … n for increasing values of n. I measured correlation using Spearman’s rank-order correlation. I tried Kendall’s tau as well and got similar results.
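Here’s a sketch of the calculation (the conversion function and names are mine), using SciPy’s `spearmanr`:

```python
from scipy.stats import spearmanr

def roman(n):
    # convert a positive integer to a Roman numeral
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in pairs:
        while n >= value:
            out += symbol
            n -= value
    return out

def alpha_vs_numeric(n):
    # Spearman correlation between string sort order and numeric order
    numerals = [roman(k) for k in range(1, n + 1)]
    in_alpha = sorted(numerals)
    alpha_rank = [in_alpha.index(s) for s in numerals]
    return spearmanr(alpha_rank, range(n)).correlation

print(alpha_vs_numeric(23))  # close to 1 for popes named John
```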

Alphabetical order and numerical order for Roman numerals agree pretty well up to XXXVIII, with just a few numbers out of place, namely IX, XIX, and XXIX. But alphabetical order and numerical order diverge quite a bit for Arabic numerals when all the numbers between 10 and 19 come before 2.

As you go further out, alphabetical order and numerical order diverge for both writing systems, but especially for Roman numerals.

 

Composing rotations using quaternions

Every rotation in 3D fixes an axis [1]. This is Euler’s rotation theorem from 1775. Another way to state the theorem is that no matter how you rotate a sphere about its center, two points stay fixed.

The composition of two rotations is a rotation. So the first rotation fixes some axis, the second rotation fixes some other axis, and the composition fixes some third axis. It’s easy to see what these axes are if we work with quaternions. (Quaternions were discovered decades after Euler proved his rotation theorem.)

A rotation by θ about the axis given by a unit vector [2] v = (v₁, v₂, v₃) corresponds to the quaternion

q = (cos(θ/2), sin(θ/2)v₁, sin(θ/2)v₂, sin(θ/2)v₃).

To rotate a point p = (p₁, p₂, p₃) by an angle θ about the axis v, first embed p as a quaternion by setting its first coordinate to 0:

p → (0, p₁, p₂, p₃)

and multiply the quaternion p on the left by q and on the right by the conjugate of q, written q*. That is, the rotation takes p to

p′ = qpq*.

This gives us a quaternion p′, not a 3D vector. We recover the vector by undoing the embedding, i.e. chopping off the first coordinate.
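Here is a minimal sketch of the whole recipe in Python (the function names are mine):

```python
import numpy as np

def qmult(q, r):
    # Hamilton product of quaternions stored as (real, i, j, k)
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotate(p, v, theta):
    # rotate 3D point p by angle theta about unit axis v via p' = q p q*
    q  = np.concatenate(([np.cos(theta/2)], np.sin(theta/2)*np.asarray(v)))
    qs = q*np.array([1, -1, -1, -1])  # conjugate, the inverse since |q| = 1
    # embed p, conjugate by q, then chop off the real part
    return qmult(qmult(q, np.concatenate(([0.0], p))), qs)[1:]

print(rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], np.pi/2))  # ≈ [0, 1, 0]
```

Rotating (1, 0, 0) a quarter turn about the z-axis gives (0, 1, 0), as expected.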

Making things clearer

Since q has unit length, the conjugate of q is also its inverse: q* = q⁻¹. Usually rotations are described as above: multiply on the left by q and on the right by q*. In my opinion it’s clearer to say

p′ = qpq⁻¹.

Presumably sources say q* instead of q⁻¹ because it’s obvious how to compute q* from q; it’s not quite as obvious that this also gives the inverse of q.

Another thing about the presentation above that, while standard, could be made clearer is the role of Euclidean space and quaternions. It’s common to speak of the real and quaternion representations of a vector, but we could make this more explicit by framing this as an embedding E from ℝ³ to the quaternions ℍ and a projection P from ℍ back to ℝ³ [3].

\[\begin{tikzcd} {\mathbb{H}} \arrow[rr, "q(\,\cdot\,)q^{-1}"] && {\mathbb{H}} \arrow[dd, "P"] \\ \\ {\mathbb{R}^3} \arrow[uu, "E"] \arrow[rr, "\text{rotation}"'] && {\mathbb{R}^3} \end{tikzcd}\]

The commutative diagram says we end up with the same result regardless of which of two paths we take: we can do the rotation directly from ℝ³ to ℝ³, or we could project into ℍ, multiply by q on the left and divide by q on the right, and project back down to ℝ³.

Composition

Composing rotations represented by quaternions is simple. Rotating by a quaternion q and then by a quaternion r is the same as rotating by rq. Proof:

r(qpq⁻¹)r⁻¹ = (rq)p(q⁻¹r⁻¹) = (rq)p(rq)⁻¹.

See the next post for how to convert between quaternion and matrix representations of a rotation.

Related posts

[1] Note that this is not the case in two dimensions, nor is it true in higher even dimensions.

[2] We assumed v had unit length as a vector in ℝ³. So

||q||² = cos²(θ/2) + sin²(θ/2) = 1.

[3] Why ℍ for quaternions? ℚ for rationals was already taken, so we use ℍ for William Rowan Hamilton.

Recamán’s sequence

I recently ran into Recamán’s sequence. N. J. A. Sloane, the founder of the Online Encyclopedia of Integer Sequences, calls Recamán’s sequence one of his favorites. It’s sequence A005132 in the OEIS.

This sequence starts at 0 and the nth number in the sequence is the result of moving forward or backward n steps from the previous number. You are allowed to move backward if the result is positive and a number you haven’t already visited. Otherwise you move forward.

Here’s Python code to generate the first N elements of the sequence.

    def recaman(N):
        a = [0]*N
        seen = {0}  # values visited so far
        for n in range(1, N):
            proposal = a[n-1] - n
            if proposal > 0 and proposal not in seen:
                a[n] = proposal
            else:
                a[n] = a[n-1] + n
            seen.add(a[n])
        return a

For example, recaman(10) returns

    [0, 1, 3, 6, 2, 7, 13, 20, 12, 21]

There’s a Numberphile video that does two interesting things with the sequence: it visualizes the sequence and plays a representation of the sequence as music.

The rules for the visualization are that you connect consecutive elements of the sequence with circular arcs, alternating arcs above and below the numberline.

Here’s code to reproduce an image from the video.

    import numpy as np
    import matplotlib.pyplot as plt
    
    def draw_arc(a, b, n):
        c = (a + b)/2
        r = max(a, b) - c
        t = np.linspace(0, np.pi) * (-1)**(n+1)
        plt.plot(c + r*np.cos(t), r*np.sin(t), 'b')
           
    N = 62
    a = recaman(N)
    for n in range(N-1):
        draw_arc(a[n], a[n+1], n)
    plt.gca().set_aspect("equal")
    plt.show()

Here’s the output:

To create a music sequence, associate the number n with the nth note of the chromatic scale. You can listen to the music in the video; here’s sheet music made with Lilypond.

Update

Here is another Recamán visualization going further out in the sequence. I made a few aesthetic changes at the same time.

I’ve also made higher resolution versions of the images in this post. Here are SVG and PDF versions of the first visualization.

Here is a PDF of the sheet music.

And here are SVG and PDF versions of the new visualization.

Cycle order in period doubling region

The last four posts have looked at the bifurcation diagram for the logistic map.

There was one post on the leftmost region, where there is no branching, then two posts on the region in the middle where there is period doubling, then a post about the chaotic region at the end.

This post returns to the middle region. Here’s an image zooming in on the first three forks.

The nth region in the period doubling region has period 2^n.

I believe, based on empirical results, that in each branching region, the branches are visited in the same order. For example, in the region containing r = 3.5 there are four branches. If we number these from bottom to top, starting with 0 for the lowest branch, I believe a point on branch 0 will go to a point on branch 2, then branch 1, then branch 3. In summary, the cycle order is [0, 2, 1, 3].

Here are the cycle orders for the next three regions.

[0, 4, 6, 2, 1, 5, 3, 7]

[0, 8, 12, 4, 6, 14, 10, 2, 1, 9, 13, 5, 3, 11, 7, 15]

[0, 16, 24, 8, 12, 28, 20, 4, 6, 22, 30, 14, 10, 26, 18, 2, 1, 17, 25, 9, 13, 29, 21, 5, 3, 19, 27, 11, 7, 23, 15, 31]

[0, 32, 48, 16, 24, 56, 40, 8, 12, 44, 60, 28, 20, 52, 36, 4, 6, 38, 54, 22, 30, 62, 46, 14, 10, 42, 58, 26, 18, 50, 34, 2, 1, 33, 49, 17, 25, 57, 41, 9, 13, 45, 61, 29, 21, 53, 37, 5, 3, 35, 51, 19, 27, 59, 43, 11, 7, 39, 55, 23, 15, 47, 31, 63]
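These orders can be reproduced with a short script. A sketch (the function name and the sample value r = 3.5 in the period 4 region are mine):

```python
import numpy as np

def cycle_order(r, period, warmup=100_000):
    # iterate the logistic map past its transient, record one full cycle,
    # and report the order in which the branches (numbered bottom to top,
    # starting with 0 for the lowest) are visited
    x = 0.5
    for _ in range(warmup):
        x = r*x*(1 - x)
    cycle = []
    for _ in range(period):
        cycle.append(x)
        x = r*x*(1 - x)
    ranks = np.argsort(np.argsort(cycle))  # rank of each point, bottom to top
    start = list(ranks).index(0)           # start the cycle at the lowest branch
    return [int(k) for k in np.roll(ranks, -start)]

print(cycle_order(3.5, 4))  # [0, 2, 1, 3]
```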

Two questions:

  1. Is the cycle order constant within a region of a constant period?
  2. If so, what can be said about the order in general?

I did a little searching into these questions, and one source said there are no known results along these lines. That may or may not be true, but it suggests these are not common questions.