Gaussian elimination

Carl Friedrich Gauss

When you solve systems of linear equations, you probably use Gaussian elimination, even if you don’t call it that. You may learn Gaussian elimination before you see it formalized in terms of matrices.

So if you’ve had a course in linear algebra, and you sign up for a course in numerical linear algebra, it’s natural to expect that Gaussian elimination would be one of the first things you talk about. That was my experience, and it was painful. I had this odd mixture of feeling like I already knew what the professor was talking about, while at the same time feeling lost.

Trefethen and Bau do not start their numerical linear algebra textbook with Gaussian elimination, and they explain why in the preface:

We have departed from the customary practice by not starting with Gaussian elimination. That algorithm is atypical of numerical linear algebra, exceptionally difficult to analyze, yet at the same time tediously familiar to every student in a course like this. Instead, we begin with the QR factorization, which is more important, less complicated, and a fresher idea to most students.

It’s reassuring to hear experts in numerical linear algebra admit that Gaussian elimination is “exceptionally difficult to analyze.” The algorithm is not exceptionally difficult to perform; children learn to manually execute the algorithm. But it is surprisingly difficult, and tedious, to analyze.

Numerical analysts like Trefethen and Bau have much more sophisticated questions in mind than how you would naively carry out the algorithm by hand. Numerical analysts are concerned, for example, with stability.

If your coefficients are known exactly and you carry out your arithmetic exactly, then you get an exact result. But can a small change to your inputs, say due to measurement uncertainty, or the inevitable loss of precision from floating point arithmetic, make a large change in your results? Absolutely, unless you use pivoting. But with pivoting, Gaussian elimination is stable [1].
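To see what pivoting buys you, here is a minimal sketch (a standard textbook-style 2×2 example, not from the post): naive elimination on a tiny pivot loses all accuracy, while a solver that pivots does not.

    import numpy as np

    # Naive Gaussian elimination, no pivoting, on a matrix with a tiny pivot.
    A = np.array([[1e-20, 1.0],
                  [1.0,   1.0]])
    b = np.array([1.0, 2.0])    # exact solution is very nearly (1, 1)

    m = A[1, 0] / A[0, 0]       # huge multiplier: 1e20
    U, c = A.copy(), b.copy()
    U[1] -= m * U[0]            # 1 - 1e20 rounds to -1e20; information is lost
    c[1] -= m * c[0]
    x2 = c[1] / U[1, 1]
    x1 = (c[0] - U[0, 1] * x2) / U[0, 0]
    print(x1, x2)               # prints 0.0 1.0 -- the first component is badly wrong

    # NumPy calls LAPACK, which uses partial pivoting and gets both components right.
    print(np.linalg.solve(A, b))  # prints approximately [1. 1.]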

After taking an introductory linear algebra course, you know how to solve any linear system of equations in principle. But when you actually want to compute the solution accurately, efficiently, at scale, in a real computer, with real data, things can get much more interesting. Many people have devoted their careers to doing in practice what high school students know how to do in principle.


[1] There are matrices for which Gaussian elimination is unstable, but they are rare. It’s been shown that they are rare in the sense of having low probability, but more importantly they never arise naturally in applied problems.

Self-orthogonal vectors and coding

One of the surprising things about linear algebra over a finite field is that a non-zero vector can be orthogonal to itself.

When you take the inner product of a real vector with itself, you get a sum of squares of real numbers. If any element in the sum is positive, the whole sum is positive.

In a finite field, things don’t work that way. A list of non-zero squares can add up to zero.

In the previous post, we looked at the ternary Golay code, an error correcting code that multiplies a vector of data by a rectangular matrix mod 3. That post included the example that

[1, 0, 1, 2, 2]

was encoded as

[1 0 1 2 2 0 1 0 2 0 0].

The output is orthogonal to itself:

1² + 0² + 1² + 2² + 2² + 0² + 1² + 0² + 2² + 0² + 0² ≡ 0 (mod 3).

In fact, every output of the ternary Golay code will be self-orthogonal. To see this, let v be a 1 by 5 row vector. Then its encoding is vG where G is the 5 by 11 matrix in the previous post. The inner product of vG with itself is

vG(vG)ᵀ = vGGᵀvᵀ.

We can easily show that GGᵀ is a zero matrix using Python:

    >>> (G@G.T)%3
    [[0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]]

And so

vGGᵀvᵀ = vOvᵀ = 0.

If a vector is not self-orthogonal, there has been an error. For example, if one of the 0s in our output turned into a 1 or a 2, the inner product of the corrupted vector with itself would be equal to 1 (mod 3) rather than 0.
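Here is that check in NumPy (a quick sketch; w is the encoded vector from above):

    import numpy as np

    w = np.array([1, 0, 1, 2, 2, 0, 1, 0, 2, 0, 0])
    print((w @ w) % 3)       # 0: the valid code word is self-orthogonal

    bad = w.copy()
    bad[5] = 1               # corrupt a single position (a 0 becomes a 1)
    print((bad @ bad) % 3)   # 1: self-orthogonality fails, so the error is detected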

Self-orthogonality is necessary but not sufficient for an encoded vector to be error-free. Correctly encoded vectors form a 5-dimensional subspace of GF(3)¹¹, namely the row space of G. The set of self-orthogonal vectors in GF(3)¹¹ is a larger set.

A vector w that is the encoding of some data vector v must also satisfy the condition that

Gwᵀ

be the zero vector (carrying out all calculations mod 3). This follows from GGᵀ being the zero matrix, and it gives a second check for detecting corruption.

The ternary Golay code is capable of detecting and correcting corruption in up to two spots in a vector. For example, suppose we take our vector

[1 0 1 2 2 0 1 0 2 0 0]

above and corrupt it by turning its first two 2’s into 1’s:

[1 0 1 1 1 0 1 0 2 0 0]

We’d still get a self-orthogonal vector, but the product Gwᵀ would not be zero.

    >>> v = np.array( [1, 0, 1, 2, 2] )
    >>> w = (v@G)%3
    >>> w[3] = w[4] = 1
    >>> print((G@w)%3)

This prints

    [0 0 0 2 2]

which lets us know that the modified version of w is not a valid encoding, i.e. not in the row space of G.

If we know (or are willing to assume) that

[1 0 1 1 1 0 1 0 2 0 0]

is the result of no more than two changes applied to a valid code word vG, then we can recover vG by looking for the closest vector in the row space of G. Here closeness is measured in Hamming distance: the number of positions in which vectors differ. Every vector in GF(3)¹¹ is within a distance of 2 from a unique code word, and so the code can correct any two errors.
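Since the row space of G has only 3⁵ = 243 elements, a brute-force decoder is easy to sketch in Python (assuming G as in the code above; a real decoder would use syndrome decoding instead):

    import numpy as np
    from itertools import product

    def nearest_code_word(w, G):
        # Try all 3^5 data vectors and keep the encoding closest to w
        # in Hamming distance.
        best, best_dist = None, None
        for data in product(range(3), repeat=5):
            c = (np.array(data) @ G) % 3
            dist = np.count_nonzero(c != w)   # Hamming distance
            if best_dist is None or dist < best_dist:
                best, best_dist = c, dist
        return best, best_dist

Applied to the corrupted vector above, this search should return the original code word [1 0 1 2 2 0 1 0 2 0 0] at a distance of 2.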


Ternary Golay code in Python

Marcel Golay discovered two “perfect” error-correcting codes: one binary and one ternary. These two codes stick out in the classification of perfect codes [1].

The ternary code is a linear code over GF(3), the field with three elements. You can encode a list of 5 base-three digits by multiplying the list as a row vector by the following generator matrix on the right:

\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 2 & 1 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 & 2 & 1 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 & 0 & 1 & 2 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 2 & 2 & 1 & 1 \\ \end{pmatrix}

Here the arithmetic is carried out in GF(3): every multiplication and addition is carried out mod 3.

Suppose we want to write this up in Python. How can you tell NumPy that you want to do arithmetic mod 3? I don’t believe you can directly. However, you could just multiply your row vector by the matrix using ordinary arithmetic and reducing the result mod 3 at the end. This gives the same result as if all intermediate operations had been carried out mod 3.

For example, suppose we wanted to encode the list [1, 0, 1, 2, 2].

    import numpy as np

    G = np.array([
      [1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0], 
      [0, 1, 0, 0, 0, 1, 1, 2, 1, 0, 2], 
      [0, 0, 1, 0, 0, 1, 2, 1, 0, 1, 2], 
      [0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 1], 
      [0, 0, 0, 0, 1, 1, 0, 2, 2, 1, 1], 
    ])

    v = np.array( [1, 0, 1, 2, 2] )
    print ((v@G)%3)

The vector-matrix product v@G contains integers that are larger than 3:

    [1 0 1 2 2 6 7 6 8 9 6]

But when we reduce everything mod 3 we get

    [1 0 1 2 2 0 1 0 2 0 0]

This doesn’t mean that linear algebra over a finite field is trivial, though it was trivial in this case.

The same trick would work if we were to work modulo any prime. So the analogous trick would work mod 7, for example.

However, the trick would not work for the field with 8 elements, for example, because arithmetic in this field is not simply arithmetic mod 8. (You can read why here.) So our trick works for GF(p) for any prime p, but not in GF(pᵏ) for any k > 1.


[1] From “Sphere Packings, Lattices and Groups” by J. H. Conway and N. J. A. Sloane:

Perfect codes were essentially classified by Tietäväinen and van Lint. The list is as follows.

(i) Certain trivial codes …
(ii) Hamming codes …
(iii) Nonlinear codes with the same parameters as Hamming codes …
(iv) The [23, 12, 7] binary Golay code C₂₃ and the [11, 6, 5] ternary Golay code C₁₁.

Non-associative multiplication

There are five ways to parenthesize a product of four things:

  • ((ab)c)d
  • (ab)(cd)
  • a(b(cd))
  • (a(bc))d
  • a((bc)d)

In a context where multiplication is not associative, the five products above are not necessarily the same. Maybe all five are different.

This post will give two examples where the products above are all different: octonions and matrix multiplication.

If you’re thinking “But wait: matrix multiplication is associative!” then read further and see in what sense I’m saying it’s not.

Octonions

Octonion multiplication is not associative. I wrote a blog post a while back asking how close the octonions come to being associative. That is, if we randomly generate unit-length octonions a, b, and c, we can calculate the norm of

(ab)c − a(bc)

and ask about its expected value. Sometimes for a triple of octonions this value is zero, but on average this expression has norm greater than 1. I estimated the average value via simulation, and later Greg Egan worked out the exact value. His write-up went down with the Google+ ship, but recently Greg posted a new version of his notes.

In this post I gave Python code for multiplying octonions using a function called CayleyDickson named after the originators of the algorithm. Let’s rename it m to have something shorter to work with and compute all five ways of associating a product of four octonions.

    import numpy as np

    # CayleyDickson is the octonion multiplication function from the post linked above.
    def m(x, y):
        return CayleyDickson(x, y)

    def take_five(a, b, c, d):
        return [
            m(a, m(b, m(c, d))),
            m(m(a, b), m(c, d)),
            m(m(m(a, b), c), d),
            m(a, m(m(b, c), d)),
            m(m(a, m(b, c)), d),
        ]
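The post linked above has the full definition of CayleyDickson. For readers who don’t want to chase the link, here is a minimal sketch of one standard Cayley–Dickson recursion on NumPy arrays whose length is a power of 2 (length 8 gives the octonions). It is an independent reconstruction, not necessarily the same sign convention as the code in that post, so individual products may differ, though the resulting algebra is the same up to isomorphism.

    def conj(x):
        # Conjugation: negate every component except the first (real) one.
        y = -x.copy()
        y[0] = x[0]
        return y

    def CayleyDickson(x, y):
        n = len(x)
        if n == 1:
            return x * y
        a, b = x[:n//2], x[n//2:]
        c, d = y[:n//2], y[n//2:]
        # One standard convention: (a, b)(c, d) = (ac - d*b, da + bc*),
        # where * denotes conjugation in the half-sized algebra.
        return np.concatenate([
            CayleyDickson(a, c) - CayleyDickson(conj(d), b),
            CayleyDickson(d, a) + CayleyDickson(b, conj(c)),
        ])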

I first tried products of basis elements, and I only got two different products out of the five ways of associating multiplication, but I only tried a few examples. However, when I tried using four random octonions, I got five different products.

    np.random.seed(20220201)

    a = np.random.rand(8)
    b = np.random.rand(8)
    c = np.random.rand(8)
    d = np.random.rand(8)

    for t in take_five(a, b, c, d):
        print(t)

This gave a very interesting result: the five products were all different, differing in their last seven components, but they had only two distinct real (first) components.

    [ 2.5180856  -2.61618184  ...]
    [ 2.5180856   0.32031027  ...]
    [ 2.5180856   1.13177500  ...]
    [ 3.0280984  -0.30169446  ...]
    [ 3.0280984  -2.36523580  ...]

I repeated this experiment a few times, and the first three results always had the same real component, and the last two results had another real component.

I suspect there’s a theorem that says

Re( ((ab)c)d ) = Re( (ab)(cd) ) = Re( a(b(cd)) )

and

Re( (a(bc))d ) = Re( a((bc)d) )

but I haven’t tried to prove it. If you come up with a proof, or a counterexample, please post a link in the comments.

Matrix multiplication

Matrix multiplication is indeed associative, but the efficiency of matrix multiplication is not. That is, any two ways of parenthesizing a matrix product will give the same final matrix, but the cost of the various products are not the same. I first wrote about this here.

This is a very important result in practice. Changing the parentheses in a matrix product can make the difference between a computation being practical or impractical. This comes up, for example, in automatic differentiation and in backpropagation for neural networks.

Suppose A is an m by n matrix and B is an n by p matrix. Then AB is an m by p matrix, and forming the product AB requires mnp scalar multiplications. If C is a p by q matrix, then (AB)C takes

mnp + mpq = mp(n + q)

scalar multiplications, but computing A(BC) takes

npq + mnq = nq(m + p)

scalar multiplications, and in general these are not equal.

Let’s rewrite our multiplication function m and our take_five function to compute the cost of multiplying four matrices of random size.

We’ve got an interesting programming problem in that our multiplication function needs to do two different things. First of all, we need to know the size of the resulting matrix. But we also want to keep track of the number of scalar multiplications the product would require. We have a sort of main channel and a side channel. Having our multiplication function return both the dimension and the cost would make composition awkward.

This is a kind of fork in the road. There are two ways to solve this problem, one high-status and one low-status. The high-status approach would be to use a monad. The low-status approach would be to use a global variable. I’m going to take the low road and use a global variable. What’s one little global variable among friends?

    mults = 0

    def M(x, y):
        # x and y are matrix dimensions as (rows, columns) pairs.
        # An x[0]-by-x[1] matrix times an x[1]-by-y[1] matrix costs
        # x[0]*x[1]*y[1] scalar multiplications.
        global mults
        mults += x[0]*x[1]*y[1]
        return (x[0], y[1])

    def take_five2(a, b, c, d):

        global mults
        costs = []

        mults = 0; M(a, (M(b, M(c, d)))); costs.append(mults)
        mults = 0; M(M(a, b), M(c, d));   costs.append(mults)
        mults = 0; M(M(M(a, b), c), d);   costs.append(mults)
        mults = 0; M(a, M(M(b, c), d));   costs.append(mults)
        mults = 0; M(M(a, M(b, c)), d);   costs.append(mults)

        return costs 

Next, I’ll generate five random integers, and group them in pairs as sizes of matrices conformable for multiplication.

    dims = np.random.randint(1, 11, size=5) # dimensions between 1 and 10 inclusive
    dim_pairs = zip(dims[:4], dims[1:])
    c = take_five2(*[p for p in dim_pairs])

When I ran this dims was set to

[3 9 7 10 6]

and so my matrices were of size 3×9, 9×7, 7×10, and 10×6.

The number of scalar multiplications required by each way of multiplying the four matrices, computed by take_five2 was

[960, 735, 579, 1332, 1080]

So each took a different number of operations. The slowest approach would take more than twice as long as the fastest approach. In applications, matrices can be very long and skinny, or very wide and thin, in which case one approach may take orders of magnitude more operations than another.
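For instance, here is a hypothetical extreme case using the M function above, treating column vectors as n × 1 matrices: parenthesized one way, the product forms an n × n outer product along the way; parenthesized the other way, it never forms anything bigger than a scalar.

    n = 10_000
    u, vT, w = (n, 1), (1, n), (n, 1)

    mults = 0; M(M(u, vT), w); print(mults)  # (u v^T) w: 2 n^2 = 200,000,000 multiplications
    mults = 0; M(u, M(vT, w)); print(mults)  # u (v^T w): 2 n  =      20,000 multiplications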


Vector spaces and subspaces over finite fields

A surprising amount of linear algebra doesn’t depend on the field you’re working over. You can implicitly assume you’re working over the real numbers R and prove all the basic theorems—say all the theorems that come before getting into eigenvalues in a typical course—and all or nearly all of the theorems remain valid if you swap out the real numbers for the complex numbers. In fact, they hold for any field, even a finite field.

Subspaces over infinite fields

Given an n-dimensional vector space V over a field F we can ask how many subspaces of V there are with dimension k. If our field F is the reals and 0 < k < n then there are infinitely many such subspaces. For example, if n = 2, every line through the origin is a subspace of dimension 1.

Now if we’re still working over R but we pick a basis

e1, e2, e3, …, en

and ask how many subspaces of dimension k in V are spanned by some subset of these basis elements, now we have a finite number. In fact the number is the binomial coefficient

\binom{n}{k}

because there are as many subspaces as there are ways to select k basis elements from a set of n.

Subspaces over finite fields

Let F be a finite field with q elements; necessarily q is a prime power. Let V be an n-dimensional vector space over F. We might need to know how many subspaces of V there are of various dimensions when developing algebraic codes, error detection and correction codes based on vector spaces over finite fields.

Then the number of subspaces of V with dimension k equals the q-binomial coefficient

\binom{n}{k}_q \equiv \frac{[n]_q!}{[k]_q! [n-k]_q!}

mentioned in the previous post. Here [n]q! is the q-factorial of n, defined by

[n]_q! \equiv [1]_q [2]_q [3]_q \cdots [n]_q

and [n]q is the q-analog of n, defined by

[n]_q \equiv \frac{1 - q^n}{1-q} = 1 + q + q^2 + \cdots + q^{n-1}

The fraction definition holds for all n, integer or not, when q ≠ 1. The fraction equals the polynomial in q when n is a non-negative integer.

You can derive the expression for the number of subspaces directly from a combinatorial argument, not using any of the q-analog notation, but this notation makes things much more succinct.

Python code

We can easily code up the definitions above in Python.

    def analog(n, q):
        return (1 - q**n)//(1 - q)

    def qfactorial(n, q):
        p = 1
        for k in range(1, n+1):
            p *= analog(k, q)
        return p

    def qbinomial(n, k, q):
        # The result is always an integer, so use integer division.
        d = qfactorial(k, q)*qfactorial(n-k, q)
        return qfactorial(n, q)//d

Example

To keep things small, let’s look at the field with q = 3 elements. Here addition and multiplication are carried out mod 3.

Let n = 3 and k = 1. So we’re looking for one-dimensional subspaces of F³ where F is the field of integers mod 3.

A one-dimensional subspace of a vector space consists of all scalar multiples of a vector. We can only multiply a vector by 0, 1, or 2. Multiplying by 0 gives the zero vector, multiplying by 1 leaves the vector the same, and multiplying by 2 turns 1s into 2s and vice versa. So, for example, the vector (1, 2, 0) is a basis for the subspace consisting of

(0, 0, 0), (1, 2, 0), (2, 1, 0).

We can find all the subspaces by finding a basis for each subspace. With a little brute-force effort, here’s a list.

  • (1, 0, 0)
  • (0, 1, 0)
  • (0, 0, 1)
  • (0, 1, 1)
  • (1, 0, 1)
  • (1, 1, 0)
  • (1, 2, 0)
  • (0, 1, 2)
  • (2, 0, 1)
  • (1, 1, 1)
  • (2, 1, 1)
  • (1, 2, 1)
  • (1, 1, 2)

It’s easy to check that none of these vectors is a multiple mod 3 of another vector on the list. The theory above says we should expect to find 13 subspaces, and we have, so we must have found them all.
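Here is a brute-force check of both the formula and the list, assuming the qbinomial function defined above. It keeps one representative per subspace, namely the basis vector whose first nonzero component is 1; some representatives differ from vectors in the list above by a factor of 2, but they span the same 13 subspaces.

    from itertools import product

    reps = []
    for v in product(range(3), repeat=3):
        if any(v) and next(x for x in v if x != 0) == 1:
            reps.append(v)

    print(len(reps))            # 13 one-dimensional subspaces found by enumeration
    print(qbinomial(3, 1, 3))   # 13 predicted by the q-binomial coefficient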


Cyclic permutations and trace

The trace of a square matrix is the sum of the elements on its main diagonal.

The order in which you multiply matrices matters: in general, matrix multiplication is not commutative. But the trace of a product of matrices may or may not depend on the order of multiplication.

Specifically, the trace doesn’t change if you take the last matrix in a product and move it to the front. You can apply this repeatedly to get all cyclic permutations.

If you just have two matrices, A and B, then order doesn’t matter, because the only way to rearrange two factors is to swap them, and a swap is a cyclic permutation. That is, trace(AB) = trace(BA).
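Writing out the sums makes the two-matrix case transparent:

\operatorname{trace}(AB) = \sum_{i}\sum_{j} A_{ij}B_{ji} = \sum_{j}\sum_{i} B_{ji}A_{ij} = \operatorname{trace}(BA)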

But in general only cyclic permutations leave the trace unchanged. The following Python code illustrates this.

    import numpy as np
    np.random.seed(1776)

    N = 4

    A = np.random.rand(N, N)
    B = np.random.rand(N, N)
    C = np.random.rand(N, N)

    print( (A@B@C).trace() )
    print( (B@C@A).trace() )
    print( (C@A@B).trace() )
    print( (C@B@A).trace() )
    print( (B@A@C).trace() )
    print( (A@C@B).trace() )

This shows that ABC, BCA, and CAB all have the same trace (5.793 in this example) and CBA, BAC, and ACB all have the same trace (4.851).

In this case there were only two trace values: one for the forward sequence and its rotations, and one for the reversed sequence and its rotations. With more matrices there are more possibilities.

    D = np.random.rand(N, N)

    print( (A@B@C@D).trace() )
    print( (C@D@A@B).trace() )
    print( (C@D@B@A).trace() )
    print( (D@C@B@A).trace() )

Now we see that ABCD and CDAB have the same trace (12.632), because they are rotations of each other, but the other two products have different traces (13.843 and 12.564).

If we’re working with symmetric matrices, then reversing the order doesn’t change the trace. Here’s a quick proof for the product of three symmetric matrices:

trace(ABC) = trace((ABC)ᵀ) = trace(CᵀBᵀAᵀ) = trace(CBA)

So for three symmetric matrices, the trace is the same for all permutations.

The following code shows that the order of multiplication can change the trace, even for symmetric matrices, if you have four matrices.

    A += A.T
    B += B.T
    C += C.T
    D += D.T

    print( (C@D@A@B).trace() )
    print( (C@D@B@A).trace() )

The first statement prints 202.085 and the second 222.211.


Illustrating Gershgorin disks with NumPy

Gershgorin’s theorem gives bounds on the locations of eigenvalues for an arbitrary square complex matrix.

The eigenvalues are contained in disks, known as Gershgorin disks, centered on the diagonal elements of the matrix. The radius of the disk centered on the kth diagonal element is the sum of the absolute values of the elements in the kth row, excluding the diagonal element itself.

To illustrate this theorem, we create a 5 by 5 diagonal matrix and add some random noise to it. The diagonal elements are

0, 3 + i, 4 + i, 1 + 5i, 9 + 2i.

The eigenvalues of the diagonal matrix would simply be the diagonal elements. The additional random values pull the eigenvalues away from the diagonal values, but by an amount no more than Gershgorin predicts.

Gershgorin disks and eigenvalues

Note that in this example, two of the eigenvalues land in the smallest disk. It’s possible that a disk may not contain any eigenvalues; what Gershgorin guarantees is that the union of all the disks contains all the eigenvalues.

Here’s the Python code that created the graph above.

    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(202103014)

    N = 5 # dimension of our square matrix

    D = np.diag([0, 3 + 1j, 4 + 1j, 1 + 5j, 9 + 2j])
    M = np.random.rand(N, N) + D

    R = np.zeros(N) # disk radii
    for i in range(N):
        R[i] = sum(abs(M[i,:])) - abs(M[i,i])

    eigenvalues = np.linalg.eigvals(M)

    # Plotting code
    fig, ax = plt.subplots()
    for k in range(N):
        x, y = M[k,k].real, M[k,k].imag
        ax.add_artist( plt.Circle((x, y), R[k], alpha=0.5) )
        plt.plot(eigenvalues[k].real, eigenvalues[k].imag, 'k+')

    ax.axis([-4, 12.5, -4, 9])
    ax.set_aspect(1)
    plt.xlabel("$x$")
    plt.ylabel("$y$")
    plt.title("Gershgorin disks and eigenvalues $x + iy$")
    plt.show()
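As a quick numerical sanity check, we can append a couple of lines to the script above to verify that every eigenvalue falls inside at least one disk:

    # Every eigenvalue should lie in the union of the Gershgorin disks.
    for ev in eigenvalues:
        assert any(abs(ev - M[k, k]) <= R[k] for k in range(N))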

Moving between vectors and diagonal matrices

This is the first of two posts on moving between vectors and diagonal matrices. The next post is Broadcasting and functors.

Motivation

When I first saw the product of two vectors in R, I was confused. If x and y are vectors, what does x*y mean? An R programmer would say “You multiply components together, of course. What else could it mean?!”

But my first thought was “What kind of product do they mean? Inner product? Outer product? You can’t take the matrix product of two vectors.”

That was a long time ago. What brought this to mind today was a statistical paper I was reading that used the same symbol for a vector and a matrix. In context it wasn’t hard to figure out what the authors meant, but there was something implicit going on there that I will make explicit here.

Vectors and matrices

You can map a vector to a matrix by putting its contents along the diagonal of a diagonal matrix. That is, given a vector v of dimension n, create an n by n matrix V that is filled with zeros, except that

Vᵢᵢ = vᵢ

for i = 1, 2, …, n. Let’s call this function Δ. In NumPy, Δ is implemented as the function diag. For example,

\Delta: \begin{pmatrix} 4 \\ i \\ \pi \end{pmatrix} \mapsto \begin{pmatrix} 4 & 0 & 0\\ 0 & i & 0 \\ 0 & 0 & \pi \end{pmatrix}

The statistics paper that I mentioned above used Δ implicitly, i.e. it denoted v and Δv with the same notation.

The function Δ has some nice properties.

The componentwise product of two vectors is called the Hadamard product, denoted by a circle. You can first take the Hadamard product of two column vectors (of the same dimension) and then embed the result in a matrix using Δ, or you can first embed the two vectors into the world of matrices and take the product there. Either way you end up at the same place.

In other words, the following diagram commutes:

\begin{diagram} C & \rTo^{\Delta} & M \\ \uTo^{\circ} & & \uTo_{\cdot} \\ C \times C & \rTo^{\Delta} & M \times M \end{diagram}

Here C is the set of column vectors, and by C × C we mean pairs of column vectors of the same dimension. Similarly, M is the set of square matrices, and by M × M we mean pairs of square matrices of the same dimension. The circle on the left is Hadamard product. The dot on the right is ordinary matrix product [1].

The Δ on top is the Δ we’ve defined above. The Δ on bottom takes a pair of column vectors to a pair of matrices by operating on each element of the pair separately.

Although it would be a little strange to do so, you could define the Hadamard product this way. To take the Hadamard product of two vectors, take them over to the world of matrices, multiply the two matrices, and take that matrix back to the world of column vectors.
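Here is a quick NumPy check of the commuting square, with np.diag playing the role of Δ:

    import numpy as np

    x = np.array([2.0, 3.0, 5.0])
    y = np.array([7.0, 11.0, 13.0])

    # Hadamard product first, then embed in a diagonal matrix ...
    lhs = np.diag(x * y)
    # ... versus embedding each vector first and multiplying the matrices.
    rhs = np.diag(x) @ np.diag(y)

    print(np.allclose(lhs, rhs))   # True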


[1] Alternately, you could think of the product on the right as Hadamard product of matrices. For diagonal matrices, ordinary matrix product and Hadamard product coincide, though of course in general they are very different.

The debauch of indices

This morning I was working on a linear algebra problem for a client that I first solved by doing calculations with indices. As I was writing things up I thought of the phrase “the debauch of indices” that mathematicians sometimes use to describe tensor calculations. The idea is that calculations with lots of indices are inelegant and that more abstract arguments are better.

The term “debauch of indices” is pejorative, but I’ve usually heard it used tongue-in-cheek. Although some people can be purists, going to great lengths to avoid index manipulation, pragmatic folk move up and down levels of abstraction as necessary to get their work done.

I searched on the term “debauch of indices” to find out who first said it, and found an answer on Stack Exchange that traces it back to Élie Cartan. Cartan said that although “le Calcul différentiel absolu du Ricci et Levi-Civita” (tensor calculus) is useful, “les débauches d’indices” could hide things that are easier to see geometrically.

After solving my problem using indices, I went back and came up with a more abstract solution. Both approaches were useful. The former cut through a complicated problem formulation and made things more tangible. The latter revealed some implicit pieces of the puzzle that needed to be made explicit.


Kronecker sum

I’m working on a project these days that uses four different kinds of matrix product, which made me wonder if there’s another kind of product out there that I could find some use for.

In the process of looking around for other matrix products, I ran across the Kronecker sum. I’ve seen Kronecker products many times, but I’d never heard of Kronecker sums.

The Kronecker sum is defined in terms of the Kronecker product, so if you’re not familiar with the latter, you can find a definition and examples here. Essentially, you multiply each scalar element of the first matrix by the second matrix as a block matrix.

The Kronecker product of an m × n matrix A and a p × q matrix B is an mp × nq matrix K = A ⊗ B. You could think of K as an m × n matrix whose entries are p × q blocks.

So, what is the Kronecker sum? It is defined for two square matrices, an n × n matrix A and an m × m matrix B. The sizes of the two matrices need not match, but the matrices do need to be square.  The Kronecker sum of A and B is

A ⊕ B = A ⊗ Iₘ + Iₙ ⊗ B

where Iₘ and Iₙ are identity matrices of size m and n respectively.

Does this make sense dimensionally? The left side of the (ordinary) matrix addition is nm × nm, and so is the right side, so the addition makes sense.

However, the Kronecker sum is not commutative, and usually things called “sums” are commutative. Products are not always commutative, but it goes against convention to call a non-commutative operation a sum. Still, the Kronecker sum is kinda like a sum, so it’s not a bad name.

I don’t have any application in mind (yet) for the Kronecker sum, but presumably it was defined for a good reason, and maybe I’ll run across an application, maybe even on the project alluded to at the beginning.

There are several identities involving Kronecker sums, and here’s one I found interesting:

exp(A) ⊗ exp(B) = exp(A ⊕ B).

If you haven’t seen the exponential of a matrix before, basically you stick your matrix into the power series for the exponential function.

Examples

First, let’s define a couple matrices A and B.

\begin{align*} A &= \left( \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right) \\ B &= \left( \begin{array}{ccc} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 2 & 0 & 3 \\ \end{array} \right) \end{align*}

We can compute the Kronecker sums

S = A ⊕ B

and

T = B ⊕ A

with Mathematica to show they are different.

    A = {{1, 2}, {3, 4}}
    B = {{1, 0, 1}, {1, 2, 0}, {2, 0, 3}}
    S = KroneckerProduct[A, IdentityMatrix[3]] + 
        KroneckerProduct[IdentityMatrix[2], B]
    T = KroneckerProduct[B, IdentityMatrix[2]] + 
        KroneckerProduct[IdentityMatrix[3], A]

This shows

\begin{align*} A \oplus B &= \left( \begin{array}{cccccc} 2 & 0 & 1 & 2 & 0 & 0 \\ 1 & 3 & 0 & 0 & 2 & 0 \\ 2 & 0 & 4 & 0 & 0 & 2 \\ 3 & 0 & 0 & 5 & 0 & 1 \\ 0 & 3 & 0 & 1 & 6 & 0 \\ 0 & 0 & 3 & 2 & 0 & 7 \\ \end{array} \right) \\ B \oplus A &= \left( \begin{array}{cccccc} 2 & 2 & 0 & 0 & 1 & 0 \\ 3 & 5 & 0 & 0 & 0 & 1 \\ 1 & 0 & 3 & 2 & 0 & 0 \\ 0 & 1 & 3 & 6 & 0 & 0 \\ 2 & 0 & 0 & 0 & 4 & 2 \\ 0 & 2 & 0 & 0 & 3 & 7 \\ \end{array} \right) \end{align*}

and so the two matrices are not equal.

We can compute the matrix exponentials of A and B with the Mathematica function MatrixExp to see that

\begin{align*} \exp(A) &= \left( \begin{array}{cc} 2.71828 & 7.38906 \\ 20.0855 & 54.5982 \\ \end{array} \right) \\ \exp(B) &= \left( \begin{array}{ccc} 2.71828 & 1. & 2.71828 \\ 2.71828 & 7.38906 & 1. \\ 7.38906 & 1. & 20.0855 \\ \end{array} \right) \end{align*}

(I actually used MatrixExp[N[A]] and similarly for B so Mathematica would compute the exponentials numerically rather than symbolically. The latter takes forever and it’s hard to read the result.)

Now we have

\begin{align*} \exp(A) \otimes \exp(B) &= \left( \begin{array}{cccccc} 512.255 & 0. & 606.948 & 736.673 & 0. & 872.852 \\ 361.881 & 384.002 & 245.067 & 520.421 & 552.233 & 352.431 \\ 1213.9 & 0. & 1726.15 & 1745.7 & 0. & 2482.38 \\ 1105.01 & 0. & 1309.28 & 1617.26 & 0. & 1916.22 \\ 780.631 & 828.349 & 528.646 & 1142.51 & 1212.35 & 773.713 \\ 2618.55 & 0. & 3723.56 & 3832.45 & 0. & 5449.71 \\ \end{array} \right) \\ \exp(A \oplus B) &= \left( \begin{array}{cccccc} 512.255 & 0. & 606.948 & 736.673 & 0. & 872.852 \\ 361.881 & 384.002 & 245.067 & 520.421 & 552.233 & 352.431 \\ 1213.9 & 0. & 1726.15 & 1745.7 & 0. & 2482.38 \\ 1105.01 & 0. & 1309.28 & 1617.26 & 0. & 1916.22 \\ 780.631 & 828.349 & 528.646 & 1142.51 & 1212.35 & 773.713 \\ 2618.55 & 0. & 3723.56 & 3832.45 & 0. & 5449.71 \\ \end{array} \right) \end{align*}

and so the two matrices are equal.

Even though the identity

exp(A) ⊗ exp(B) = exp(A ⊕ B)

may look symmetrical, it’s not. The matrices on the left do not commute in general. And not only are A ⊕ B and B ⊕ A different in general, their exponentials are also different. For example

\exp(B\oplus A) = \left( \begin{array}{cccccc} 512.255 & 736.673 & 0. & 0. & 606.948 & 872.852 \\ 1105.01 & 1617.26 & 0. & 0. & 1309.28 & 1916.22 \\ 361.881 & 520.421 & 384.002 & 552.233 & 245.067 & 352.431 \\ 780.631 & 1142.51 & 828.349 & 1212.35 & 528.646 & 773.713 \\ 1213.9 & 1745.7 & 0. & 0. & 1726.15 & 2482.38 \\ 2618.55 & 3832.45 & 0. & 0. & 3723.56 & 5449.71 \\ \end{array} \right)
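The same checks can be replicated in Python. Here is a sketch using NumPy and SciPy, where scipy.linalg.expm computes the matrix exponential.

    import numpy as np
    from scipy.linalg import expm

    def kronecker_sum(A, B):
        # A is n x n and B is m x m; the result is nm x nm.
        n, m = A.shape[0], B.shape[0]
        return np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[1.0, 0.0, 1.0],
                  [1.0, 2.0, 0.0],
                  [2.0, 0.0, 3.0]])

    S = kronecker_sum(A, B)
    T = kronecker_sum(B, A)
    print(np.allclose(S, T))   # False: the Kronecker sum is not commutative

    # exp(A) ⊗ exp(B) = exp(A ⊕ B)
    print(np.allclose(np.kron(expm(A), expm(B)), expm(S)))   # True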
