Representing octonions as matrices, sorta

It’s possible to represent a complex number z = a + bi as a pair of real numbers or as a 2 × 2 matrix with real entries.

z \leftrightarrow (a, b) \leftrightarrow \begin{bmatrix}a & -b \\ b & a \end{bmatrix}

And it’s possible to represent quaternions as pairs of complex numbers or 2 × 2 matrices with complex entries

q \leftrightarrow (z_0, z_1) \leftrightarrow \begin{bmatrix} z_0 & z_1 \\ -z_1^* & z_0^* \end{bmatrix}

where z* is the complex conjugate of z.

And it’s also possible to represent octonions as pairs of quaternions or 2 × 2 matrices with quaternion entries, with a twist.

o \leftrightarrow (q_0, q_1) \leftrightarrow \begin{bmatrix} q_0 & q_1 \\ -q_1^* & q_0^* \end{bmatrix}

where q* is the quaternion conjugate of q.

Matrix multiplication is associative, but octonion multiplication is not, so something has to give. We have to change the definition of matrix multiplication slightly.

\begin{bmatrix} \alpha_0 & \alpha_1 \\ \alpha_2 & \alpha_3 \end{bmatrix}\circ\begin{bmatrix} \beta_0 & \beta_1 \\ \beta_2 & \beta_3 \end{bmatrix}=\begin{bmatrix} \alpha_0\beta_0+\beta_2\alpha_1 & \beta_1\alpha_0+\alpha_1\beta_3\\ \beta_0\alpha_2+\alpha_3\beta_2 & \alpha_2\beta_1+\alpha_3\beta_3 \end{bmatrix}

In half the products, the beta term comes before the alpha term. This wouldn’t matter if the alpha and beta terms commuted; for example, if they were complex numbers, this would be ordinary matrix multiplication. But the alphas and betas are quaternions, where order does matter, and so the matrix product defined above is not the standard matrix product.
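Here is a short Python sketch of the twisted product, a minimal illustration rather than a polished implementation. The quaternion helpers and function names are my own; quaternions are stored as arrays (a, b, c, d) = a + bi + cj + dk, and the demo only checks the point above: with quaternion entries and the modified product, associativity fails.

```python
import numpy as np

def qmult(p, q):
    # quaternion product, components ordered (a, b, c, d) = a + bi + cj + dk
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([a1*a2 - b1*b2 - c1*c2 - d1*d2,
                     a1*b2 + b1*a2 + c1*d2 - d1*c2,
                     a1*c2 - b1*d2 + c1*a2 + d1*b2,
                     a1*d2 + b1*c2 - c1*b2 + d1*a2])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def octonion_matrix(q0, q1):
    # the 2 x 2 matrix [[q0, q1], [-q1*, q0*]] stored as a nested list of quaternions
    return [[q0, q1], [-qconj(q1), qconj(q0)]]

def twisted_product(A, B):
    # the modified matrix product defined above
    a0, a1, a2, a3 = A[0][0], A[0][1], A[1][0], A[1][1]
    b0, b1, b2, b3 = B[0][0], B[0][1], B[1][0], B[1][1]
    return [[qmult(a0, b0) + qmult(b2, a1), qmult(b1, a0) + qmult(a1, b3)],
            [qmult(b0, a2) + qmult(a3, b2), qmult(a2, b1) + qmult(a3, b3)]]

rng = np.random.default_rng(0)
X, Y, Z = (octonion_matrix(rng.random(4), rng.random(4)) for _ in range(3))

lhs = twisted_product(twisted_product(X, Y), Z)
rhs = twisted_product(X, twisted_product(Y, Z))
print(np.allclose(lhs[0][0], rhs[0][0]))  # False for generic inputs: not associative
```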

Going back to the idea of matrices of matrices that I wrote about a few days ago, we could represent the octonions as 2 × 2 matrices whose entries are 2 × 2 matrices of complex numbers, etc.

If you look closely at the matrix representations above, you’ll notice that the matrix representations of quaternions and octonions don’t quite match the pattern of the complex numbers. There should be a minus sign in the top right corner and not in the bottom left corner. You could do it that way, but there’s a sort of clash of conventions going on here.

Drazin pseudoinverse

The most well-known generalization of the inverse of a matrix is the Moore-Penrose pseudoinverse. But there is another notion of generalized inverse, the Drazin pseudoinverse, for square matrices.

If a matrix A has an inverse A−1 then it also has a Moore-Penrose pseudoinverse A+ and a Drazin pseudoinverse AD and

A−1 = A+ = AD.

When the matrix A is not invertible the two notions of pseudoinverse might not agree.

One way to think about the two generalized inverses is that both are based on a canonical form. The Moore-Penrose pseudoinverse is based on the singular value decomposition. The Drazin inverse is based on the Jordan canonical form.

Jordan canonical form

Every square matrix A can be put into Jordan canonical form

P−1 A P = J

where J is the Jordan canonical form and P is an invertible matrix whose columns are the generalized eigenvectors of A.

If the matrix A is diagonalizable, the matrix J is diagonal. Otherwise the matrix J is block diagonal. Each block has a single eigenvalue repeated along its diagonal, 1s on the diagonal just above the main diagonal, and zeros everywhere else. The sizes of the blocks associated with an eigenvalue add up to its algebraic multiplicity.

If all the eigenvalues of A are non-zero, the matrix is invertible and it is easy to compute the inverse from the Jordan form. To invert J, simply invert each block of J. This is clear from the way you multiply partitioned matrices.

To find the inverse of A, multiply on the left by P and on the right by P−1. That is,

A−1 = P J−1 P−1.

Proof:

A (P J−1 P−1) = (P J P−1)(P J−1 P−1) = I

Drazin pseudoinverse

One way to compute the Drazin inverse, at least in theory, is from the Jordan canonical form. In practice, you might want to compute the Drazin inverse a different way.

If all the eigenvalues of A are non-zero, A is invertible and its Drazin pseudoinverse is its inverse.

If zero is an eigenvalue, the block(s) in J corresponding to 0 cannot be inverted. If a block is 1 by 1 then it just has a zero in it. If a block is larger, it will have 1’s above the main diagonal. Replace the 1s by 0s. This gives us the Drazin pseudoinverse of J. To obtain the Drazin pseudoinverse of A, multiply on the left by P and on the right by P−1 just as above.
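Here is a sketch of that recipe in SymPy, assuming exact (rational) entries so the Jordan form can be computed reliably. The function and the example matrix are my own, not a built-in Drazin routine: it scans J for Jordan blocks, inverts the blocks with nonzero eigenvalue, zeroes out the rest, and conjugates back by P.

```python
from sympy import Matrix, zeros

def drazin_via_jordan(A):
    P, J = A.jordan_form()              # A = P * J * P**-1
    n = J.rows

    # locate the Jordan blocks by scanning the superdiagonal of J
    starts = [0]
    for i in range(n - 1):
        if J[i, i + 1] == 0:
            starts.append(i + 1)
    starts.append(n)

    JD = zeros(n, n)
    for s, e in zip(starts, starts[1:]):
        block = J[s:e, s:e]
        if block[0, 0] != 0:
            JD[s:e, s:e] = block.inv()  # nonzero eigenvalue: invert the block
        # zero eigenvalue: leave the block as zeros (the 1s above the diagonal are dropped)

    return P * JD * P.inv()

A = Matrix([[1, 1, 0],
            [0, 1, 0],
            [0, 0, 0]])                 # singular, so no ordinary inverse
AD = drazin_via_jordan(A)

# check two of the defining properties of the Drazin pseudoinverse
assert AD * A * AD == AD
assert A * AD == AD * A
```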

Characterization

The Drazin pseudoinverse can be characterized by a few properties, just as the Moore-Penrose pseudoinverse can. Let’s compare the properties.

The defining properties of the Moore-Penrose pseudoinverse A+ of a matrix A are

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

The defining properties of the Drazin pseudoinverse are

  1. A AD = AD A
  2. AD A AD = AD
  3. Ak+1 AD = Ak

where k is the smallest non-negative integer such that rank(Ak+1) = rank(Ak).

The second property of each pseudoinverse is analogous. The first property of the Drazin inverse is sorta like the 3rd and 4th properties of the Moore-Penrose pseudoinverse.

The analog of A A+ A = A does not hold; in general A AD A does not equal A.

The Drazin property Ak+1 AD = Ak is nothing like any property of the Moore-Penrose pseudoinverse. It suggests that the Drazin inverse behaves nicely with respect to powers of A, which is true.

You can see this exponent condition in the Jordan normal form. The blocks in J corresponding to the eigenvalue 0 are nilpotent, i.e. they become zero when you raise them to a high enough power. When you raise J to the power k, the same k as above, the nilpotent blocks go away.

Note that although the Moore-Penrose pseudoinverse is symmetric, the Drazin inverse is not. That is, (A+)+ = A, but (AD)D ≠ A in general.

Related posts

Effective graph resistance

I’ve mentioned the Moore-Penrose pseudoinverse of a matrix a few times, most recently last week. This post will give an application of the pseudoinverse: computing effective graph resistance.

Given a graph G, imagine replacing each edge with a one-ohm resistor. The effective resistance between two nodes in G is the electrical resistance between those two nodes.

Calculating graph resistance requires inverting the graph Laplacian, but the graph Laplacian isn’t invertible. We’ll resolve this paradox shortly, but in the meantime this sounds like a job for the pseudoinverse, and indeed it is.

The graph Laplacian of a graph G is the matrix L = D − A where D is the diagonal matrix whose entries are the degrees of each node and A is the adjacency matrix. The vector of all 1’s

1 = (1, 1, 1, …, 1)

is in the null space of L, but if G is connected then the null space is one-dimensional. More generally, the dimension of the null space equals the number of connected components of G.

Calculating graph resistance requires solving the linear system

Lw = v

where v and w are orthogonal to 1. If the graph G is connected then the system has a unique solution. The graph Laplacian L is not invertible over the entire space, but it is invertible in the orthogonal complement to the null space.

The Moore-Penrose pseudoinverse L+ gives the graph resistance matrix. The graph resistance between two nodes a and b is given by

Rab = (ea − eb)T L+ (ea − eb)

where ei is the vector that has all zero entries except for a 1 in the ith slot.
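Here is a minimal NumPy sketch. The graph is a hypothetical example, a triangle of one-ohm resistors, so the resistance between adjacent nodes should be 1 ohm in parallel with 2 ohms, i.e. 2/3.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)  # adjacency matrix of a triangle
D = np.diag(A.sum(axis=1))              # degree matrix
L = D - A                               # graph Laplacian
Lplus = np.linalg.pinv(L)               # Moore-Penrose pseudoinverse

def resistance(a, b):
    e = np.zeros(len(A))
    e[a], e[b] = 1, -1                  # e_a - e_b
    return e @ Lplus @ e

print(resistance(0, 1))                 # 0.666..., i.e. 2/3 ohm
```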

For more on graph resistance see Effective graph resistance by Wendy Ellens et al.

More spectral graph theory posts

Multiplying a matrix by its transpose

An earlier post claimed that there are practical advantages to partitioning a matrix, thinking of the matrix as a matrix of matrices. This post will give an example.

Let M be a square matrix and suppose we need to multiply M by its transpose MT. We can compute this product faster than multiplying two arbitrary matrices of the same size by exploiting the fact that MMT will be a symmetric matrix.

We start by partitioning M into four blocks

M = \begin{bmatrix}A & B \\ C & D \end{bmatrix}

Then

M^\intercal = \begin{bmatrix}A^\intercal & C^\intercal \\ B^\intercal & D^\intercal \end{bmatrix}

and

MM^\intercal = \begin{bmatrix} AA^\intercal  + BB^\intercal & AC^\intercal  + BD^\intercal \\CA^\intercal + DB^\intercal & CC^\intercal + DD^\intercal \end{bmatrix}

Now for the first clever part: We don’t have to compute both of the off-diagonal blocks because each is the transpose of the other. So we can reduce our calculation by 25% by not calculating one of the blocks, say the lower left block.

And now for the second clever part: apply the same procedure recursively. The diagonal blocks in MMT involve a matrix times its transpose. That is, we can partition A and use the same idea to compute AAT and do the same for BBT, CCT, and DDT. The off diagonal blocks require general matrix multiplications.

The net result is that we can compute MMT in about 2/3 the time it would take to multiply two arbitrary matrices of the same size.
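Here is a one-level sketch of the idea in NumPy, assuming the dimension splits evenly and without the recursion on the diagonal blocks; it computes one off-diagonal block and reuses its transpose for the other.

```python
import numpy as np

def mmt_blocked(M):
    n = M.shape[0] // 2
    A, B = M[:n, :n], M[:n, n:]
    C, D = M[n:, :n], M[n:, n:]

    top_left     = A @ A.T + B @ B.T
    top_right    = A @ C.T + B @ D.T
    bottom_right = C @ C.T + D @ D.T
    bottom_left  = top_right.T          # reused, not recomputed

    return np.block([[top_left,    top_right],
                     [bottom_left, bottom_right]])

M = np.random.rand(6, 6)
assert np.allclose(mmt_blocked(M), M @ M.T)
```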

Recently a group of researchers found a way to take this idea even further, partitioning a matrix into a 4 by 4 matrix of 16 blocks and doing some clever tricks. The RXTX algorithm can compute MMT in about 26/41 the time required to multiply arbitrary matrices, a savings of about 5% over the 2/3 approach above. A 5% improvement may be significant if it appears in the inner loop of a heavy computation. According to the authors, “The algorithm was discovered by combining Machine Learning-based search methods with Combinatorial Optimization.”

Related posts

Matrices of Matrices

It’s often useful to partition a matrix, thinking of it as a matrix whose entries are matrices. For example, you could look at the 6 × 6 matrix

M = \begin{bmatrix} 3 & 1 & 4 & 0 & 0 & 0 \\ 2 & 7 & 1 & 0 & 0 & 0 \\ 5 & 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 4 & 8 \\ 0 & 0 & 0 & 1 & 7 & 6\\ 0 & 0 & 0 & 3 & 5 & 1 \\ \end{bmatrix}

as a 2 × 2 matrix whose entries are 3 × 3 matrices.

\begin{bmatrix} \begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 1 \\ 5 & 0 & 6 \\ \end{bmatrix} & \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} \\ & \\ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} & \begin{bmatrix} 2 & 4 & 8 \\ 1 & 7 & 6\\ 3 & 5 & 1 \\ \end{bmatrix} \end{bmatrix}

M is not a diagonal matrix of real numbers, but viewed as a matrix of blocks it is diagonal. It might be possible in some applications to take advantage of the fact that the off-diagonal blocks are zero matrices.

You can multiply two partitioned matrices by multiplying the blocks as if they were numbers. The only requirement is that the blocks be of compatible sizes for multiplication: the block widths on the left have to match the corresponding block heights on the right.

For example, if you have two partitioned matrices

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
and
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}

then their product is the partitioned matrix

C = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}

The only requirement is that the matrix products above make sense. The number of columns in A11 must equal the number of rows in B11, and the number of columns in A12 must equal the number of rows in B21. (The rest of the compatibility conditions follow because the blocks in a column of a partitioned matrix have the same number of columns, and the blocks in a row of a partitioned matrix have the same number of rows.)
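Here is a quick NumPy check of the block rule, with block sizes chosen arbitrarily but compatibly.

```python
import numpy as np

rng = np.random.default_rng(0)
A11, A12 = rng.random((2, 3)), rng.random((2, 4))
A21, A22 = rng.random((5, 3)), rng.random((5, 4))
B11, B12 = rng.random((3, 6)), rng.random((3, 2))
B21, B22 = rng.random((4, 6)), rng.random((4, 2))

A = np.block([[A11, A12], [A21, A22]])
B = np.block([[B11, B12], [B21, B22]])

C_blocks = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                     [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])

assert np.allclose(A @ B, C_blocks)   # blockwise product equals the full product
```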

You can partition matrices into any number of rows and columns of blocks; the examples above have been 2 × 2 arrays of blocks only for convenience.

It’s natural to oscillate a bit between thinking “Of course you can do that” and “Really? That’s incredible.” If you haven’t gone through at least one cycle perhaps you haven’t thought about it long enough.

What could go wrong?

Manipulation of partitioned matrices works so smoothly that you have to wonder where things could go wrong.

For the most part, if a calculation makes logical sense then it’s likely to be correct. If you add two blocks, they have to be compatible for addition. If you “divide” by a block, i.e., multiply by its inverse, then the block must have an inverse. Etc.

One subtle pitfall is that matrix multiplication does not commute in general, and so you have to watch out for implicitly assuming that AB = BA where A and B are blocks. For example, theorems about the determinant of a partitioned matrix are not as simple as you might hope. The determinant of the partitioned matrix

\begin{vmatrix} A & B \\ C & D \end{vmatrix}

is not simply det(AD − BC) in general, though it is true if all the blocks are square matrices of the same size and the blocks in one row or one column commute, i.e. if AB = BA, CD = DC, AC = CA, or BD = DB.

More linear algebra posts

Multiplying by quaternions on the left and right

The map that takes a quaternion x to the quaternion qx is linear, so it can be represented as multiplication by a matrix. The same is true of the map that takes x to xq, but the two matrices are not the same because quaternion multiplication does not commute.

Let q = a + bi + cj + dk and let qM be the matrix that represents multiplication on the left by q. Then

_qM = \begin{bmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \\ \end{bmatrix}

Now let Mq be the matrix that represents multiplication on the right by q. Then

M_q = \begin{bmatrix} a & -b & -c & -d \\ b & a & d & -c \\ c & -d & a & b \\ d & c & -b & a \\ \end{bmatrix}

You can prove both matrix representations are correct by showing that they do the right thing when q = 1, i, j, and k. The rest follows by linearity.

You might speculate that the matrix representation for multiplying on the right by q might be the transpose of the matrix representation for multiplying on the left by q. You can look at the matrices above and see that’s not the case.
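Here is a numerical check of both representations in NumPy; the quaternion product helper is ad hoc, with quaternions stored as arrays (a, b, c, d).

```python
import numpy as np

def quat_mult(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([a1*a2 - b1*b2 - c1*c2 - d1*d2,
                     a1*b2 + b1*a2 + c1*d2 - d1*c2,
                     a1*c2 - b1*d2 + c1*a2 + d1*b2,
                     a1*d2 + b1*c2 - c1*b2 + d1*a2])

def left_mult_matrix(q):                # represents x -> qx
    a, b, c, d = q
    return np.array([[a, -b, -c, -d],
                     [b,  a, -d,  c],
                     [c,  d,  a, -b],
                     [d, -c,  b,  a]])

def right_mult_matrix(q):               # represents x -> xq
    a, b, c, d = q
    return np.array([[a, -b, -c, -d],
                     [b,  a,  d, -c],
                     [c, -d,  a,  b],
                     [d,  c, -b,  a]])

rng = np.random.default_rng(1)
q, x = rng.random(4), rng.random(4)
assert np.allclose(left_mult_matrix(q) @ x, quat_mult(q, x))
assert np.allclose(right_mult_matrix(q) @ x, quat_mult(x, q))
```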

In this post I talk about how to represent rotations with quaternions, and in this post I give an equation for the equivalent rotation matrix for a rotation described by a quaternion. You can prove that the matrix representation is correct by multiplying out qM and Mq*. Keep in mind that q in that case is a unit quaternion, so the squares of its components sum to 1.

Related posts

Embeddings, Projections, and Inverses

I just revised a post from a week ago about rotations. The revision makes explicit the process of embedding a 3D vector into the quaternions, then pulling it back out.

The 3D vector is embedded in the quaternions by making it the vector part of a quaternion with zero real part:

(p1, p2, p3) → (0, p1, p2, p3)

and the quaternion is returned to 3D by cutting off the real part:

(p0, p1, p2, p3) → (p1, p2, p3).

To give names to the process of moving to and from quaternions, we have an embedding E from ℝ3 to ℝ4 and a projection P from ℝ4 to ℝ3.

We can represent E as a 4 × 3 matrix

E = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

and P by a 3 × 4 matrix

P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

We’d like to say E and P are inverses. Surely P undoes what E does, so they’re inverses in that sense. But E cannot undo P because you lose information projecting from ℝ4 to ℝ3 and E cannot recover information that was lost.

The rest of this post will look at three generalizations of inverses and how E and P relate to each.

Left and right inverse

Neither matrix is invertible, but PE equals the identity matrix on ℝ3 , and so P is a left inverse of E and E is a right inverse of P.

PE = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

On the other hand, EP is not an identity matrix, and so E is not a left inverse of P, nor is P a right inverse of E.

EP = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Adjoint

P is the transpose of E, which means it is the adjoint of E. Adjoints are another way to generalize the idea of an inverse. More on that here.

Pseudo-inverse

The Moore-Penrose pseudo-inverse acts a lot like an inverse, which is somewhat uncanny because all matrices have a pseudo-inverse, even rectangular matrices.

Pseudo-inverses are symmetric, i.e. if A+ is the pseudo-inverse of A, then A is the pseudo-inverse of A+.

Given an m by n matrix A, the Moore-Penrose pseudoinverse A+ is the unique n by m matrix satisfying four conditions:

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

To show that E+ = P we have to establish

  1. EPE = E
  2. PEP = P
  3. (EP)* = EP
  4. (PE)* = PE

We calculated EP and PE above, and both are real and symmetric, so properties 3 and 4 hold.

We can also compute

 EPE = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = E

and

PEP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} =P

showing that properties 1 and 2 hold as well.
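Here is a quick NumPy confirmation of the claims above: P is a left inverse of E, and each matrix is the Moore-Penrose pseudoinverse of the other.

```python
import numpy as np

E = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
P = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

assert np.allclose(P @ E, np.eye(3))        # P is a left inverse of E
assert np.allclose(np.linalg.pinv(E), P)    # E+ = P
assert np.allclose(np.linalg.pinv(P), E)    # and P+ = E
```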

Related posts

Transpose and Adjoint

The transpose of a matrix turns the matrix sideways. Suppose A is an m × n matrix with real number entries. Then the transpose AT is an n × m matrix, and the (i, j) element of AT is the (j, i) element of A. Very concrete.

The adjoint of a linear operator is a more abstract concept, though it’s closely related. The matrix AT is sometimes called the adjoint of A. That may be fine, or it may cause confusion. This post will define the adjoint in a more general context, then come back to the context of matrices.

This post and the next will be more abstract than usual. After indulging in a little pure math, I’ll return soon to more tangible topics such as Morse code and barbershop quartets.

Dual spaces

Before we can define adjoints, we need to define dual spaces.

Let V be a vector space over a field F. You can think of F as ℝ or ℂ. Then V* is the dual space of V, the space of linear functionals on V, i.e. the vector space of linear functions from V to F.

The distinction between a vector space and its dual seems artificial when the vector space is ℝn. The dual space of ℝn is isomorphic to ℝn, and so the distinction between them can seem pedantic. It’s easier to appreciate the distinction between V and V* when the two spaces are not isomorphic.

For example, let V be L3(ℝ), the space of functions f such that |f|3 has a finite Lebesgue integral. Then the dual space is L3/2(ℝ). The difference between these spaces is not simply a matter of designation. There are functions f such that the integral of |f|3 is finite but the integral of |f|3/2 is not, and vice versa.

Adjoints

The adjoint of a linear operator T : V → W is a linear operator T*: W* → V* where V* and W* are the dual spaces of V and W respectively. So T* takes a linear function from W to the field F, and returns a function from V to F. How does T* do this?

Given an element w* of W*, T*w* takes a vector v in V and maps it to F by

(T*w*)(v) = w*(Tv).

In other words, T* takes a functional w* on W and turns it into a function on V by mapping elements of V over to W and then letting w* act on them.

Note what this definition does not contain. There is no mention of inner products or bases or matrices.

The definition is valid over vector spaces that might not have an inner product or a basis. And this is not just a matter of perspective. It’s not as if our space has an inner product but we choose to ignore it; we might be working over spaces where it is not possible to define an inner product or a basis, such as ℓ∞, the space of bounded sequences.

Since a matrix represents a linear operator with respect to some basis, you can’t speak of a matrix representation of an operator on a space with no basis.

Bracket notation

For a vector space V over a field F, denote a function ⟨ ·, · ⟩ that takes an element from V and an element from V* and returns an element of F by applying the latter to the former. That is, ⟨ v, v* ⟩ is defined to be the action of v* on v. This is not an inner product, but the notation is meant to suggest a connection to inner products.

With this notation, we have

⟨ Tv, w* ⟩W = ⟨ v, T*w* ⟩V

for all v in V and all w* in W* by definition. This is the definition of T* in different notation. The subscripts on the brackets are meant to remind us that the left side of the equation is an element of F obtained by applying an element of W* to an element of W, while the right side is an element of F obtained by applying an element of V* to an element of V.

Inner products

The development of the adjoint above emphasized that there is not necessarily an inner product in sight. But if there are inner products on V and W, then we can turn an element v of V into an element of V* by associating v with ⟨ ·, v ⟩ where now the brackets do denote an inner product.

Now we can write the definition of adjoint as

⟨ Tv, w ⟩W = ⟨ v, T*w ⟩V

for all v in V and all w in W. This definition is legitimate, but it’s not natural in the technical sense that it depends on our choices of inner products and not just on the operator T and the spaces V and W. If we chose different inner products on V and W then the definition of T* changes as well.

Back to matrices

We have defined the adjoint of a linear operator in a very general setting where there may be no matrices in sight. But now let’s look at the case of T : V → W where V and W are finite dimensional vector spaces, either over ℝ or ℂ. (The difference between ℝ and ℂ will matter.) And let’s define inner products on V and W. This is always possible because they are finite dimensional.

How does a matrix representation of T* correspond to a matrix representation of T?

Real vector spaces

Suppose V and W are real vector spaces and A is a matrix representation of T : V → W with respect to some choice of basis on each space. Suppose also that the bases for V* and W* are given by the duals of the bases for V and W. Then the matrix representation of T* is the transpose of A. You can verify this by showing that

⟨ Av, w ⟩W = ⟨ v, ATw ⟩V

for all v in V and all w in W.

The adjoint of A is simply the transpose of A, subject to our choice of bases and inner products.

Complex vector spaces

Now consider the case where V and W are vector spaces over the complex numbers. Everything works as above, with one wrinkle. If A is the representation of T : V → W with respect to a given basis, and we choose bases for V* and W* as above, then the matrix representation of T* is the conjugate of the transpose of A. The adjoint of A is A*, the conjugate transpose of A. As before, you can verify this by showing that

⟨ Av, w ⟩W = ⟨ v, A*w ⟩V

We have to take the conjugate of A because the inner product in the complex case requires taking a conjugate of one side.
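Here is a small numerical check of the complex case, using the standard inner product on ℂn; the random matrix is just an example.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((3, 4)) + 1j * rng.random((3, 4))   # A maps C^4 to C^3
v = rng.random(4) + 1j * rng.random(4)
w = rng.random(3) + 1j * rng.random(3)

inner = lambda x, y: np.vdot(y, x)                 # <x, y> = sum_i x_i conj(y_i)
lhs = inner(A @ v, w)                              # <Av, w> in C^3
rhs = inner(v, A.conj().T @ w)                     # <v, A*w> in C^4
assert np.isclose(lhs, rhs)
```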

Related posts

Fredholm Alternative

The Fredholm alternative is so called because it is a theorem by the Swedish mathematician Erik Ivar Fredholm that has two alternative conclusions: either this is true or that is true. This post will state a couple forms of the Fredholm alternative.

Mr. Fredholm was interested in the solutions to linear integral equations, but his results can be framed more generally as statements about solutions to linear equations.

This is the third in a series of posts, starting with a post on kernels and cokernels, followed by a post on the Fredholm index.

Fredholm alternative warmup

Given an m×n real matrix A and a column vector b, either

Ax = b

has a solution or

ATy = 0 has a solution y with yTb ≠ 0.

This is essentially what I said in an earlier post on kernels and cokernels. From that post:

Suppose you have a linear transformation T: V → W and you want to solve the equation Tx = b. … If c is an element of W that is not in the image of T, then Tx = c has no solution, by definition. In order for Tx = b to have a solution, the vector b must not have any components in the subspace of W that is complementary to the image of T. This complementary space is the cokernel. The vector b must not have any component in the cokernel if Tx = b is to have a solution.

In this context you could say that the Fredholm alternative boils down to saying either b is in the image of A or it isn’t. If b isn’t in the image of A, then it has some component in the complement of the image of A, i.e. it has a component in the cokernel, the kernel of AT.
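Here is a toy NumPy illustration of the matrix form of the alternative. The matrix A is a made-up example whose image is the x-axis, so its cokernel is spanned by (0, 1).

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])            # image of A is the x-axis
y = np.array([0.0, 1.0])              # spans the kernel of A^T, i.e. the cokernel
assert np.allclose(A.T @ y, 0)

b = np.array([1.0, 2.0])              # has a component in the cokernel
print(y @ b)                          # 2.0, nonzero, so Ax = b has no solution

b2 = np.array([1.0, 0.0])             # orthogonal to the cokernel
x, *_ = np.linalg.lstsq(A, b2, rcond=None)
assert np.allclose(A @ x, b2)         # Ax = b2 does have a solution
```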

The Fredholm alternative

I’ve seen the Fredholm alternative stated several ways, and the following from [1] is the clearest. The “alternative” nature of the theorem is a corollary rather than being explicit in the theorem.

As stated above, Fredholm’s interest was in integral equations. These equations can be cast as operators on Hilbert spaces.

Let K be a compact linear operator on a Hilbert space H. Let I be the identity operator and A = IK. Let A* denote the adjoint of A.

  1. The null space of A is finite dimensional.
  2. The image of A is closed.
  3. The image of A is the orthogonal complement of the kernel of A*.
  4. The null space of A is 0 iff the image of A is H.
  5. The dimension of the kernel of A equals the dimension of the kernel of A*.

The last point says that the kernel and cokernel have the same dimension, and the first point says these dimensions are finite. In other words, the Fredholm index of A is 0.

Where is the “alternative” in this theorem?

The theorem says that there are two possibilities regarding the inhomogeneous equation

Ax = f.

One possibility is that the homogeneous equation

Ax = 0

has only the solution x = 0, in which case the inhomogeneous equation has a unique solution for all f in H.

The other possibility is that the homogeneous equation has non-zero solutions, and the inhomogeneous equation has a solution if and only if f is orthogonal to the kernel of A*, i.e. if f is orthogonal to the cokernel.

Freedom and constraint

We said in the post on kernels and cokernels that kernels represent degrees of freedom and cokernels represent constraints. We can add elements of the kernel to a solution and still have a solution. Requiring f to be orthogonal to the cokernel is a set of constraints.

If the kernel of A has dimension n, then the Fredholm alternative says the cokernel of A also has dimension n.

If solutions x to Ax = f have n degrees of freedom, then right-hand sides f must satisfy n constraints. Each degree of freedom for x corresponds to a basis element for the kernel of A. Each constraint on f corresponds to a basis element for the cokernel that f must be orthogonal to.

[1] Lawrence C. Evans. Partial Differential Equations, 2nd edition

Kernel and Cokernel

The kernel of a linear transformation is the set of vectors mapped to 0. That’s a simple idea, and one you’ll find in every linear algebra textbook.

The cokernel is the dual of the kernel, but it’s much less commonly mentioned in textbooks. Sometimes the idea of a cokernel is there, but it’s not given that name.

Degrees of Freedom and Constraints

One way of thinking about kernel and cokernel is that the kernel represents degrees of freedom and the cokernel represents constraints.

Suppose you have a linear transformation T: V → W and you want to solve the equation Tx = b. If x is a solution and Tk = 0, then x + k is also a solution. You are free to add elements of the kernel of T to a solution.

If c is an element of W that is not in the image of T, then Tx = c has no solution, by definition. In order for Tx = b to have a solution, the vector b must not have any components in the subspace of W that is complementary to the image of T. This complementary space is the cokernel. The vector b must not have any component in the cokernel if Tx = b is to have a solution.

If W is an inner product space, we can define the cokernel as the orthogonal complement to the image of T.

Another way to think of the kernel and cokernel is that in the linear equation Ax = b, the kernel consists of degrees of freedom that can be added to x, and the cokernel consists of degrees of freedom that must be removed from b in order for there to be a solution.
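Here is a concrete sketch using SciPy. The matrix is a made-up rank-one example; its null space gives the degrees of freedom in x, and the null space of AT gives the constraint that b must satisfy.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])       # rank 1 map from R^3 to R^2

ker = null_space(A)                   # shape (3, 2): two degrees of freedom in x
coker = null_space(A.T)               # shape (2, 1): one constraint on b
print(ker.shape, coker.shape)

b_good = np.array([1.0, 2.0])         # in the image of A
b_bad  = np.array([1.0, 0.0])         # has a component in the cokernel
print(np.allclose(coker.T @ b_good, 0))   # True:  Ax = b_good is solvable
print(np.allclose(coker.T @ b_bad, 0))    # False: Ax = b_bad is not
```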

Cokernel definitions

You may also see the cokernel defined as the quotient space W / image(T). This is not the same space as the complement of the image of T. The former is a new space constructed from W and the latter is a subspace of W. However, these two spaces are isomorphic. This is a little bit of foreshadowing: the most general idea of a cokernel will only hold up to isomorphism.

You may also see the cokernel defined as the kernel of the adjoint of T. This suggests where the name “cokernel” comes from: the dual of the kernel of an operator is the kernel of the dual of the operator.

Kernel and cokernel dimensions

I mentioned above that there are multiple ways to define cokernel. They don’t all define the same space, but they define isomorphic spaces. And since isomorphic spaces have the same dimension, all definitions of cokernel give spaces with the same dimension.

There are several big theorems related to the dimensions of the kernel and cokernel. These are typically included in linear algebra textbooks, even those that don’t use the term “cokernel.” For example, the rank-nullity theorem can be stated without explicitly mentioning the cokernel, but it is equivalent to the following.

For a linear operator T: V → W,

dim V − dim W = dim ker T − dim coker T.

When V or W are finite dimensional, both sides are well defined. Otherwise, the right side may be defined when the left side is not. For example, let V and W both be the space of functions analytic in the entire complex plane and let T be the operator that takes the second derivative. Then the left side is ∞ − ∞ but the right side is 2: the kernel is all functions of the form az + b and the cokernel is 0 because every analytic function has an antiderivative.

The right hand side of the equation above is the definition of the index of a linear operator. This is the subject of the next post.

Full generality

Up to this point the post has discussed kernels and cokernels in the context of linear algebra. But we could define the kernel and cokernel in contexts with less structure, such as groups, or more structure, such as Sobolev spaces. The most general definition is in terms of category theory.

Category theory makes the “co” in cokernel obvious. The cokernel is the dual of the kernel in the same way that everything in category theory is related to its co-thing: simply turn all the arrows around. This makes the definition of cokernel easier, but it makes the definition of kernel harder.

We can’t simply define the kernel as “the stuff that gets mapped to 0” because category theory has no way to look inside objects. We can only speak of objects in terms of how they interact with other objects in their category. There’s no direct way to define 0, much less things that map to 0. But we can define something that acts like 0 (initial objects), and things that act like maps to 0 (zero morphisms), if the category we’re working in contains such things; not all categories do.

For all the tedious details, see the nLab articles on kernel and cokernel.

Related posts