Matrices of Matrices

It’s often useful to partition a matrix, thinking of it as a matrix whose entries are matrices. For example, you could look at the matrix 6 × 6

M = \begin{bmatrix} 3 & 1 & 4 & 0 & 0 & 0 \\ 2 & 7 & 1 & 0 & 0 & 0 \\ 5 & 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 4 & 8 \\ 0 & 0 & 0 & 1 & 7 & 6\\ 0 & 0 & 0 & 3 & 5 & 1 \\ \end{bmatrix}

as a 2 × 2 matrix whose entries are 3 × 3 matrices.

\begin{bmatrix} \begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 1 \\ 5 & 0 & 6 \\ \end{bmatrix} & \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} \\ & \\ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} & \begin{bmatrix} 2 & 4 & 8 \\ 1 & 7 & 6\\ 3 & 5 & 1 \\ \end{bmatrix} \end{bmatrix}

M is not a diagonal matrix of real numbers, but its block structure is a diagonal matrix of blocks. It might be possible in some application to take advantage of the fact that the off-diagonal blocks are zero matrices.

You can multiply two partitioned matrices by multiplying the blocks as if they were numbers. The only requirement is that the blocks be of compatible sizes for multiplication: the block widths on the left have to match the corresponding block heights on the right.

For example, if you have two partitioned matrices

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
and
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}

then their product is the partitioned matrix

C = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}

The only requirement is that the matrix products above make sense. The number of columns in A11 must equal the number of rows in B11, and the number of columns in A12 must equal the number of rows in B21. (The rest of the compatibility conditions follow because the blocks in a column of a partitioned matrix have the same number of columns, and the blocks in a row of a partitioned matrix have the same number of rows.)

You can partition matrices into any number of rows and columns of blocks; the examples above have been 2 × 2 arrays of blocks only for convenience.

It’s natural to oscillate a bit between thinking “Of course you can do that” and “Really? That’s incredible.” If you haven’t gone through at least one cycle perhaps you haven’t thought about it long enough.

What could go wrong?

Manipulation of partitioned matrices works so smoothly that you have to wonder where things could go wrong.

For the most part, if a calculation makes logical sense then it’s likely to be correct. If you add two blocks, they have to be compatible for addition. If you “divide” by a block, i.e, multiply by its inverse, then the block must have an inverse. Etc.

One subtle pitfall is that matrix multiplication does not commute in general, and so you have to watch out for implicitly assuming that ABBA where A and B are blocks. For example, theorems about the determinant of a partitioned matrix are not as simple as you might hope. The determinant of the partitioned matrix

\begin{vmatrix} A & B \\ C & D \end{vmatrix}

is not simply det(ADBC) in general, though it is true if all the blocks are square matrices of the same size and the blocks in one row or one column commute, i.e. if AB = BA, CD = DC, AC = CA, or BD = DB.

More linear algebra posts

Multiplying by quaternions on the left and right

The map that takes a quaternion x to the quaternion qx is linear, so it can be represented as multiplication by a matrix. The same is true of the map that takes x to xq, but the two matrices are not the same because quaternion multiplication does not commute.

Let qa + bi + cjdk and let qM be the matrix that represents multiplication on the left by q. Then

_qM = \begin{bmatrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \\ \end{bmatrix}

Now let Mq be the matrix that represents multiplication on the right by q. Then

M_q = \begin{bmatrix} a & -b & -c & -d \\ b & a & d & -c \\ c & -d & a & b \\ d & c & -b & a \\ \end{bmatrix}

Can prove both matrix representations are correct by showing that they do the right thing when q = 1, ij, and k. The rest follows by linearity.

You might speculate that the matrix representation for multiplying on the right by q might be the transpose of the matrix representation for multiplying on the left by q. You can look at the matrices above and see that’s not the case.

In this post I talk about how to represent rotations with quaternions, and in this post I give an equation for the equivalent rotation matrix for a rotation described by a quaternion. You can prove that the matrix representation is correct by multiplying out qM and Mq* . Keep in mind that q in that case is a unit quaterion, so the sum of the squares of its components add to 1.

Related posts

Embeddings, Projections, and Inverses

I just revised a post from a week ago about rotations. The revision makes explicit the process of embedding a 3D vector into the quaternions, then pulling it back out.

The 3D vector is embedded in the quaternions by making it the vector part of a quaternion with zero real part:

(p1p2p3)  → (0, p1p2p3)

and the quaternion is returned to 3D by cutting off the real part:

(p0, p1p2p3)  → (p1p2p3).

To give names to the the process of moving to and from quaternions, we have an embedding E : ℝ3 to ℝ4 and a projection P from ℝ4 to ℝ3.

We can represent E as a 4 × 3 matrix

E = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

and P by a 3 × 4 matrix

P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

We’d like to say E and P are inverses. Surely P undoes what E does, so they’re inverses in that sense. But E cannot undo P because you lose information projecting from ℝ4 to ℝ3 and E cannot recover information that was lost.

The rest of this post will look at three generalizations of inverses and how E and P relate to each.

Left and right inverse

Neither matrix is invertible, but PE equals the identity matrix on ℝ3 , and so P is a left inverse of E and E is a right inverse of P.

PE = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

On the other hand, EP is not an identity matrix, and so E is not a left inverse of E, and neither is P a right inverse of E.

EP = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Adjoint

P is the transpose of E, which means it is the adjoint of E. Adjoints are another way to generalize the idea of an inverse. More on that here.

Pseudo-inverse

The Moore-Penrose pseudo-inverse acts a lot like an inverse, which is somewhat uncanny because all matrices have a pseudo-inverse, even rectangular matrices.

Pseudo-inverses are symmetric, i.e. if A+ is the pseudo-inverse of A, then A is the pseudo-inverse of A+.

Given an m by n matrix A, the Moore-Penrose pseudoinverse A+ is the unique n by m matrix satisfying four conditions:

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

To show that A+ = P we have to establish

  1. EPE = E
  2. PEP = A+
  3. (EP)* = EP
  4. (PE)* = PE

We calculated EP and PE above, and both are real and symmetric, so properties 3 and 4 hold.

We can also compute

 EPE = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = E

and

PEP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} =P

showing that properties 1 and 2 hold as well.

Related posts

Transpose and Adjoint

The transpose of a matrix turns the matrix sideways. Suppose A is an m × n matrix with real number entries. Then the transpose A is an n × m matrix, and the (ij) element of A is the (ji) element of A. Very concrete.

The adjoint of a linear operator is a more abstract concept, though it’s closely related. The matrix A is sometimes called the adjoint of A. That may be fine, or it may cause confusion. This post will define the adjoint in a more general context, then come back to the context of matrices.

This post, and the next will be more abstract than usual. After indulging in a little pure math, I’ll return soon to more tangible topics such as Morse code and barbershop quartets.

Dual spaces

Before we can define adjoints, we need to define dual spaces.

Let V be a vector space over a field F. You can think of F as ℝ or ℂ. Then V* is the dual space of V, the space of linear functionals on V, i.e. the vector space of functions from V to F.

The distinction between a vector space and its dual seems artificial when the vector space is ℝn. The dual space of ℝn is isomorphic to ℝn, and so the distinction between them can seem pedantic. It’s easier to appreciate the distinction between V and V* when the two spaces are not isomorphic.

For example, let V be L3(ℝ), the space of functions f such that |f|3 has a finite Lebesgue integral. Then the dual space is L3/2(ℝ). The difference between these spaces is not simply a matter of designation. There are functions f such that the integral of |f|3 is finite but the integral of |f|3/2 is not, and vice versa.

Adjoints

The adjoint of a linear operator TVW is a linear operator T*: W* → V* where V* and W* are the dual spaces of V and W respectively. So T* takes a linear function from W to the field F, and returns a function from V to F. How does T* do this?

Given an element w* of W*, T*w* takes a vector v in V and maps it to F by

(T*w*)(v) = w*(Tv).

In other words, T* takes a functional w* on W and turns it into a function on V by mapping elements of V over to W and then letting w* act on them.

Note what this definition does not contain. There is no mention of inner products or bases or matrices.

The definition is valid over vector spaces that might not have an inner product or a basis. And this is not just a matter of perspective. It’s not as if our space has a inner product but we choose to ignore it; we might be working over spaces where it is not possible to define an inner product or a basis, such as ℓ, the space of bounded sequences.

Since a matrix represents a linear operator with respect to some basis, you can’t speak of a matrix representation of an operator on a space with no basis.

Bracket notation

For a vector space V over a field F, denote a function ⟨ ·, · ⟩ that takes an element from V and an element from V* and returns an element of F by applying the latter to the former. That is, ⟨ v, v* ⟩ is defined to be the action of v* on v. This is not an inner product, but the notation is meant to suggest a connection to inner products.

With this notation, we have

Tv, w* ⟩W = ⟨ v, T*w* ⟩V

for all v in V and for all in W by definition. This is the definition of T* in different notation. The subscripts on the brackets are meant to remind us that the left side of the equation is an element of F obtained by applying an element of W* to an element of W, while the right side is an element of F obtained by applying an element of V* to an element of V.

Inner products

The development of adjoint above emphasized that there is not necessarily an inner product in sight. But if there are inner products on V and W, then we can define turn an element of v into an element of V* by associating v with ⟨ ·, v ⟩ where now the brackets do denote an inner product.

Now we can write the definition of adjoint as

Tv, w ⟩W = ⟨ v, T*w V

for all v in V and for all in W. This definition is legitimate, but it’s not natural in the technical sense that it depends on our choices of inner products and not just on the operator T and the spaces V and W. If we chose different inner products on V and W then the definition of T* changes as well.

Back to matrices

We have defined the adjoint of a linear operator in a very general setting where there may be no matrices in sight. But now let’s look at the case of TVW where V and W are finite dimensional vector spaces, either over ℝ or ℂ. (The difference between ℝ and ℂ will matter.) And lets definite inner products on V and W. This is always possible because they are finite dimensional.

How does a matrix representation of T* correspond to a matrix representation of T?

Real vector spaces

Suppose V and W are real vector spaces and A is a matrix representation of TVW with respect to some choice of basis on each space. Suppose also that the bases for V* and W* are given by the duals of the bases for V and W. Then the matrix representation of T* is the transpose of A. You can verify this by showing that

Av, w ⟩W = ⟨ v, Aw V

for all v in V and for all in W.

The adjoint of A is simply the transpose of A, subject to our choice of bases and inner products.

Complex vector spaces

Now consider the case where V and W are vector spaces over the complex numbers. Everything works as above, with one wrinkle. If A is the representation of TVW with respect to a given basis, and we choose bases for V* and W* as above, then the conjugate of A is the matrix representation of T*. The adjoint of A is A*, the conjugate transpose of A. As before, you can verify this by showing that

Av, w ⟩W = ⟨ v, A*w V

We have to take the conjugate of A because the inner product in the complex case requires taking a conjugate of one side.

Related posts

Fredholm Alternative

The Fredholm alternative is so called because it is a theorem by the Swedish mathematician Erik Ivar Fredholm that has two alternative conclusions: either this is true or that is true. This post will state a couple forms of the Fredholm alternative.

Mr. Fredholm was interested in the solutions to linear integral equations, but his results can be framed more generally as statements about solutions to linear equations.

This is the third in a series of posts, starting with a post on kernels and cokernels, followed by a post on the Fredholm index.

Fredholm alternative warmup

Given an m×n real matrix A and a column vector b, either

Axb

has a solution or

AT y = 0 has a solution yTb ≠ 0.

This is essentially what I said in an earlier post on kernels and cokernels. From that post:

Suppose you have a linear transformation TV → W and you want to solve the equation Tx = b. … If c is an element of W that is not in the image of T, then Tx = c has no solution, by definition. In order for Tx = b to have a solution, the vector b must not have any components in the subspace of W that is complementary to the image of T. This complementary space is the cokernel. The vector b must not have any component in the cokernel if Tx = b is to have a solution.

In this context you could say that the Fredholm alternative boils down to saying either b is in the image of A or it isn’t. If b isn’t in. the image of A, then it has some component in the complement of the image of A, i.e. it has a component in the cokernel, the kernel of AT.

The Fredholm alternative

I’ve seen the Fredholm alternative stated several ways, and the following from [1] is the clearest. The “alternative” nature of the theorem is a corollary rather than being explicit in the theorem.

As stated above, Fredholm’s interest was in integral equations. These equations can be cast as operators on Hilbert spaces.

Let K be a compact linear operator on a Hilbert space H. Let I be the identity operator and A = IK. Let A* denote the adjoint of A.

  1. The null space of A is finite dimensional,
  2. The image of A is closed.
  3. The image of A is the orthogonal complement of the kernel of A*.
  4. The null space of A is 0 iff the image of A is H.
  5. The dimension of the kernel of A equals the dimension of the kernel of A*.

The last point says that the kernel and cokernel have the same dimension, and the first point says these dimensions are finite. In other words, the Fredholm index of A is 0.

Where is the “alternative” in this theorem?

The theorem says that there are two possibilities regarding the inhomogeneous equation

Ax = f.

One possibility is that the homogeneous equation

Ax = 0

has only the solution x = 0, in which case the inhomogeneous equation has a unique solution for all f in H.

The other possibility is that homogeneous equation has non-zero solutions, and the inhomogeneous has solutions has a solution if and only if f is orthogonal to the kernel of A*, i.e. if f is orthogonal to the cokernel.

Freedom and constraint

We said in the post on kernels and cokernels that kernels represent degrees of freedom and cokernels represent constraints. We can add elements of the kernel to a solution and still have a solution. Requiring f to be orthogonal to the cokernel is a set of constraints.

If the kernel of A has dimension n, then the Fredholm alternative says the cokernel of A also has dimension n.

If solutions x to Axf have n degrees of freedom, then right-hand sides f must satisfy n constraints. Each degree of freedom for x corresponds to a basis element for the kernel of A. Each constraint on f corresponds to a basis element for the cokernel that f must be orthogonal to.

[1] Lawrence C. Evans. Partial Differential Equations, 2nd edition

Kernel and Cokernel

The kernel of a linear transformation is the set of vectors mapped to 0. That’s a simple idea, and one you’ll find in every linear algebra textbook.

The cokernel is the dual of the kernel, but it’s much less commonly mentioned in textbooks. Sometimes the idea of a cokernel is there, but it’s not given that name.

Degrees of Freedom and Constraints

One way of thinking about kernel and cokernel is that the kernel represents degrees of freedom and the cokernel represents constraints.

Suppose you have a linear transformation T: VW and you want to solve the equation Txb. If x is a solution and Tk = 0, then xk is also a solution. You are free to add elements of the kernel of T to a solution.

If c is an element of W that is not in the image of T, then Txc has no solution, by definition. In order for Txb to have a solution, the vector b must not have any components in the subspace of W that is complementary to the image of T. This complementary space is the cokernel. The vector b must not have any component in the cokernel if Txb is to have a solution.

If W is an inner product space, we can define the cokernel as the orthogonal complement to the image of T.

Another way to think of the kernel and cokernel is that in the linear equation Axb, the kernel consists of degrees of freedom that can be added to x, and the cokernel consists of degrees of freedom that must be removed from b in order for there to be a solution.

Cokernel definitions

You may also see the cokernel defined as the quotient space W / image(T). This is not the same space as the complement of the image of T. The former is a subset of W and the latter is a new space. However, these two spaces are isomorphic. This is a little bit of foreshadowing: the most general idea of a cokernel will only hold up to isomorphism.

You may also see the cokernel defined as the kernel of the adjoint of T. This suggests where the name “cokernel” comes from: the dual of the kernel of an operator is the kernel of the dual of the operator.

Kernel and cokernel dimensions

I mentioned above that there are multiple ways to define cokernel. They don’t all define the same space, but they define isomorphic spaces. And since isomorphic spaces have the same dimension, all definitions of cokernel give spaces with the same dimension.

There are several big theorems related to the dimensions of the kernel and cokernel. These are typically included in linear algebra textbooks, even those that don’t use the term “cokernel.” For example, the rank-nullity theorem can be stated without explicitly mentioning the cokernel, but it is equivalent to the following.

For a linear operator T: VW ,

dim V − dim W = dim ker T − dim coker T.

When V or W are finite dimensional, both sides are well defined. Otherwise, the right side may be defined when the left side is not. For example, let V and W both be the space of functions analytic in the entire complex plane and let T be the operator that takes the second derivative. Then the left side is ∞ − ∞ but the right side is 2: the kernel is all functions of the form azb and the cokernel is 0 because every analytic function has an antiderivative.

The right hand side of the equation above is the definition of the index of a linear operator. This is the subject of the next post.

Full generality

Up to this point the post has discussed kernels and cokernels in the context of linear algebra. But we could define the kernel and cokernel in contexts with less structure, such as groups, or more structure, such as Sobolev spaces. The most general definition is in terms of category theory.

Category theory makes the “co” in cokernel obvious. The cokernel will is the dual of the kernel in the same way that every thing in category is related to its co- thing: simply turn all the arrows around. This makes the definition of cokernel easier, but it makes the definition of kernel harder.

We can’t simply define the kernel as “the stuff that gets mapped to 0” because category has no way to look inside objects. We can only speak of objects in terms of how they interact with other objects in their category. There’s no direct way to define 0, much less things that map to 0. But we can define something that acts like 0 (initial objects), and things that act like maps to 0 (zero morphisms), if the category we’re working in contains such things; not all categories do.

For all the tedious details, see the nLab articles on kernel and cokernel.

Related posts

 

What is partial pivoting?

Gaussian elimination is pretty simple if all goes well, and in introductory courses all does go well.

You have an n × n matrix A, an n × 1 column vector b, and you want to find an n × 1 column vector x such that Ax = b.

You subtract multiples of the first rows of A and b to zero out all the elements below a11. Then you subtract multiples of the second rows of A and b to zero out all the elements below a22. You keep doing this until all the elements below the diagonal are zero. Then you can solve for x using back substitution.

The process above is called naive Gaussian elimination. This post is for people who are familiar with naive Gaussian elimination but who do not realize it is naive.

Creating zeros

Suppose the first two rows of our matrix A are as follows:

2 * * * …
3 * * * …

Then you subtract 3/2 times the new first row from the second:

2 * * * …
0 * * * …

You would continue to subtract multiples of the first row from the rest of the rows.

What could go wrong?

What if the first two rows looked like this?

0 * * * …
3 * * * …

In this case naive Gaussian elimination fails on the first step. As far as I recall, I never saw an example like that when I took linear algebra: naive Gaussian elimination always worked. I don’t think I ever ran into this possibility until I took a course in numerical analysis.

The simplest way around the problem would be to swap the first two rows:

3 * * * …
0 * * * …

I imagine this strategy was commonly done in hand calculations long before anyone gave the process a name. I said that I don’t remember encountering this problem in a linear algebra class. Maybe it came up in a homework problem and I swapped rows without thinking about it. But if you’re going to write software to implement Gaussian elimination, you have to think about these things systematically.

You might have a program literally swap rows, or if you’re more performance-conscious you might conceptually swap the rows, keeping track of the swaps with an index but without moving data in memory.

Pivots and pivoting

The first element in the first row is called a pivot. More generally, the akk element at the kth step of Gaussian elimination is a pivot. When you run into a zero pivot, you have a logical problem: naive Gaussian elimination cannot proceed without some help, i.e. swapping rows. Less obvious but still true is that if you run into a small pivot, you have a numerical problem. When carrying out calculations on a computer with floating point arithmetic, small but non-zero pivots can lead to loss of accuracy, possibly catastrophic loss.

Swapping rows to avoid small pivots is called partial pivoting. Partial pivoting is usually [1] adequate in practice. Complete pivoting allows the possibility of swapping columns as well. You can go down a deep rabbit hole exploring the nuances of various pivoting strategies.

Gaussian elimination is part of the high school curriculum, but there are a lot of practical issues to work out when applying this apparently simple algorithm. As numerical analysis expert Nick Trefethen put it, “The closer one looks, the more subtle and remarkable Gaussian elimination appears.”

Related posts

[1] “Despite [artificially constructed examples] Gaussian elimination with partial pivoting is utterly stable in practice. … In fifty years of computing, no matrix problems that create explosive instability are known to have arisen under natural circumstances.” — Nick Trefethen and David Bau. Numerical Linear Algebra. 2022.

Power method and centrality

A few days ago I wrote about eigenvector centrality, a way of computing which nodes in a network are most important. Rather than simply looking for the most highly connected nodes, it looks for nodes that are highly connected to nodes that are highly connected. It’s the idea behind Google’s PageRank algorithm.

Adjacency matrices

One way to capture the structure of a graph is its adjacency matrix A. Each element aij of this matrix equals 1 if there is an edge between the ith and jth node and a 0 otherwise. If you square the matrix A, the (i, j) entry of the result tells you how many paths of length 2 are between the ith and jth nodes. In general, the (i, j) entry of An tells you how many paths of length n there are between the corresponding nodes.

Power method

Calculating eigenvector centrality requires finding an eigenvector associated with the largest eigenvalue of A. One way to find such an eigenvector is the power method. You start with a random initial vector and repeatedly multiply it by A. This produces a sequence of vectors that converges to the eigenvector we’re after.

Conceptually this is the same as computing An first and multiplying it by the random initial vector. So not only does the matrix An count paths of length n, for large n it helps us compute eigenvector centrality.

Theoretical details

Now for a little fine print. The power method will converge for any starting vector that has some component in the direction of the eigenvector you’re trying to find. This is almost certainly the case if you start with a vector chosen at random. The power method also requires that the matrix has a single eigenvector of largest magnitude and that its associated eigenspace have dimension 1. The post on eigenvector centrality stated that these conditions hold, provided the network is connected.

Computational practicalities

In principle, you could use calculate eigenvector centrality by computing An for some large n. In practice, you’d never do that. For a square matrix of size N, a matrix-vector product takes O(N²) operations, whereas squaring A requires O(N³) operations. So you would repeatedly apply A to a vector rather than computing powers of A.

Also, you wouldn’t use the power method unless A is sparse, making it relatively efficient to multiply by A. If most of the entries of A are zeros, and there is an exploitable pattern to where the non-zero elements are located, you can multiply A by a vector with far less than N² operations.

Eigenvector centrality

A basic question to ask about a network is which nodes are most important. A simple way of answering this question would be to say the most well-connected nodes, i.e. the nodes with the most edges. This approach is known as degree centrality. It’s not a bad place to start. It’s easy to understand and easy to compute. It may be adequate for some purposes, but it’s often helpful to take a more sophisticated approach known as eigenvector centrality.

Examples

Let’s look at a few motivating examples before we get into the details.

Social network marketing

If you wanted to advertise something, say a book you’ve written, and you’re hoping people will promote it on Twitter. Would you rather get a shout out from someone with more followers or someone with less followers? All else being equal, more followers is better. But even better would be a shout out from someone whose followers have a lot of followers.

Deans and flight attendants

Suppose you’re at a graduation ceremony. Your mind starts to wander after the first few hundred people walk across the stage, and you start to think about how a cold might spread through the crowd. The dean handing out diplomas could be a superspreader because he’s shaking hands with everyone as they receive their diplomas. But an inconspicuous parent in the audience may also be a superspreader because she’s a flight attendant and will be in a different city every day for the next few days. And not only is she a traveler, she’s in contact with travelers.

Search engines

Ranking web pages according to the number of inbound links they have was a good idea because this takes advantage of revealed preferences: instead of asking people what pages they recommend, you can observe what pages they implicitly recommend by linking to them. An even better idea was Page Rank, weighing inbound links by how many links the linking pages have.

Eigenvector centrality

The idea of eigenvector centrality is to give each node a rank proportional to the sum of the ranks of the adjacent nodes. This may seem circular, and it kinda is.

To know the rank of a node, you have to know the ranks of the nodes linking to it. But to know their ranks, you have to know the ranks of the nodes linking to them, etc. There is no sequential solution to the problem because you’d end up in an infinite regress, even for a finite network. But there is a simultaneous solution, considering all pages at once.

You want to assign to the ith node a rank xi proportional to the sum of the ranks of all adjacent nodes. A more convenient way to express this to compute the weighted sum of the ranks of all nodes, with weight 1 for adjacent nodes and weight 0 for non-adjacent nodes. That is, you want to compute

x_i = \frac{1}{\lambda} \sum a_{ij} x_j

where the a‘s are the components of the adjacency matrix A. Here aij equals 1 if there is an edge between nodes i and j and it equals 0 otherwise.

This is equivalent to looking for solutions to the system of equations

Ax = \lambda x

where A is the adjacency matrix and x is the vector of node ranks. If there are N nodes, then A is an N × N matrix and x is a column vector of length N.

In linear algebra terminology, x is an eigenvector of the adjacency matrix A, hence the name eigenvector centrality.

Mathematical questions

There are several mathematical details to be concerned about. Does the eigenvalue problem defining x have a solution? Is the solution unique (up to scalar multiples)? What about the eigenvalue λ? Does the adjacency matrix have multiple eigenvalues, which would mean there are multiple eigenvectors?

If A is the adjacency matrix for a connected graph, then there is a unique eigenvalue λ of largest magnitude, there is a unique corresponding eigenvector x, and all the components of x are non-negative. It is often said that this is a result of the Perron–Frobenius theorem, but there’s a little more to it than that because you need the hypothesis that the graph is connected.

The matrix A is non-negative, but not necessarily positive: some entries may be zero. But if the graph is connected, there’s a path between any node i and another node j, which means for some power of A, the ij entry of A is not zero. So although A is not necessarily positive, some power of A is positive. This puts us in the positive case of the Perron–Frobenius theorem, which is nicer than the non-negative case.

This section has discussed graphs, but Page Rank applies to directed graphs because links have a direction. But similar theorems hold for directed graphs.

Related posts

Determinant of an infinite matrix

What does the infinite determinant

D = \left| \begin{array}{lllll} 1 & a_1 & 0 & 0 & \cdots \\ b_1 & 1 & a_2 & 0 & \cdots \\ 0 & b_2 & 1 & a_3 & \\ 0 & 0 & b_3 & 1 & \ddots \\ \vdots & \vdots & & \ddots & \ddots \\ \end{array} \right|

mean and when does it converge?

The determinant D above is the limit of the determinants Dn defined by

D_n = \left| \begin{array}{lllll} 1 & a_1 & & & \\ b_1 & 1 & a_2 & & \\ & b_2 & 1 & \ddots & \\ & & \ddots & \ddots & a_{n-1} \\ & & & b_{n-1} & 1 \\ \end{array} \right|

If all the a‘s are 1 and all the b‘s are −1 then this post shows that Dn = Fn, the nth Fibonacci number. The Fibonacci numbers obviously don’t converge, so in this case the determinant of the infinite matrix does not converge.

In 1895, Helge von Koch said in a letter to Poincaré that the infinite determinant is absolutely convergent if and only if the sum

\sum_{i=1}^\infty a_ib_i

is absolutely convergent. A proof is given in [1].

The proof shows that the Dn are bounded by

\prod_{i=1}^n\left(1 + |a_ib_i| \right)

and so the infinite determinant converges if the corresponding infinite product converges. And a theorem on infinite products says

\prod_{i=1}^\infty\left(1 + |a_ib_i| \right)

converges absolute if the sum in Koch’s theorem converges. In fact,

\prod_{i=1}^\infty\left(1 + |a_ib_i| \right) \leq \exp\left(\sum_{i=1}^\infty |a_ib_i| \right )

and so we have an upper bound on the infinite determinant.

Related post: Triadiagonal systems, determinants, and cubic splines

[1] A. A. Shaw. H. von Koch’s First Lemma and Its Generalization. The American Mathematical Monthly, April 1931, Vol. 38, No. 4, pp. 188–194