Bibliography histogram

I recently noticed something in a book I’ve had for five years: the bibliography section ends with a histogram of publication dates for references. I’ve used the book over the last few years, but maybe I haven’t needed to look at the bibliography before.

[Figure: histogram of publication dates in the bibliography]

This is taken from Bernstein’s Matrix Mathematics. I wrote a review of it here.

An up-to-date bibliography isn’t always necessary. One of the references I use the most is nearly 60 years old, and another book I pull down occasionally is 100 years old. But sometimes you do need a current reference, and a histogram of publication dates is a good way for the author to demonstrate that the references are up to date.

By the way, notice the phantom diagonal lines across the histogram? That’s a moiré.[1]

***

[1] Pun on this and this.

Cofactors, determinants, and adjugates

Let A be an n × n matrix over a field F. The cofactor of an element Aij is the matrix formed by removing the ith row and jth column, denoted A[i, j].

This terminology is less than ideal. The matrix just described is called the cofactor of Aij, but it would more accurately be called the cofactor of (i, j) because it does not depend on the matrix entry Aij per se but rather its position in the matrix A. For example, if someone asked you what the cofactor of 7 is, you’d have to say “That depends. Where is the 7 located?”. But we will stick to standard terminology.

(I wrote a post about a similar phenomenon in statistical notation, but I can’t find it this morning. You can’t answer questions like “What is p(0.5)?” in statistics without first asking “If you were to write 0.5 using a variable, would you write it as x, y, or θ?”)

You can compute the determinant of a matrix by taking the alternating sum of elements of a row (or column) multiplied by the determinants of their cofactors.

\det A = \sum_{j=1}^n (-1)^{i+j} A_{i,j} \det A_{[i,j]}

This is the Laplace expansion, also known as cofactor expansion. It suggests a recursive procedure for computing determinants which is theoretically useful, but only computationally useful for very small matrices.
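
Here’s a minimal sketch of that recursive procedure in Python, expanding along the first row; the example matrix is arbitrary. It works fine for tiny matrices but is hopeless for large ones.

    import numpy as np

    def det(A):
        # Laplace (cofactor) expansion along the first row
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        return sum((-1)**j * A[0, j] *
                   det(np.delete(np.delete(A, 0, axis=0), j, axis=1))
                   for j in range(n))

    A = np.array([[2, 1, 0],
                  [1, 3, 1],
                  [0, 1, 4]])
    print(det(A), round(np.linalg.det(A)))  # both print 18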

The adjugate of A, written adj(A), is the transpose of the matrix formed by replacing the (i, j) element of A with an alternating sign times its cofactor [1].

\text{adj}(A)_{ij} = (-1)^{i+j} \det A_{[j,i]}

As you might guess, the Mathematica command to compute the adjugate is Adjugate.

Note that “adjugate” is not a typo for “adjoint.” The adjoint of A (now) means something different. What we now call the adjugate was sometimes in the past called the adjoint. What we now call the adjoint is the (conjugate) transpose.

Here’s an application of the adjugate. Let A be a 3 × 3 matrix. Then the characteristic polynomial of A is

x^3 - (\text{tr}\, A) x^2 + \text{tr}\,\left(\text{adj}(A)\right)x - \det A

where tr denotes the trace, the sum of the diagonal elements of a matrix.

Since the trace only depends on the diagonal, you only need to compute the diagonal of the adjugate to use the equation above for the characteristic polynomial. And the alternating signs that define the adjugate are all positive on the diagonal. So the trace of the adjugate is the sum of the determinants of the cofactors of the diagonal elements: det A[1, 1] + det A[2, 2] + det A[3, 3].
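
Here’s a quick numerical check of the formula above, building the adjugate directly from cofactors; the 3 × 3 matrix is just a made-up example.

    import numpy as np

    def adjugate(A):
        # Transpose of the matrix of signed cofactor determinants
        n = A.shape[0]
        C = np.array([[(-1)**(i + j) *
                       np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))
                       for j in range(n)] for i in range(n)])
        return C.T

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])

    # Coefficients of x^3 - tr(A) x^2 + tr(adj(A)) x - det(A) ...
    print([1, -np.trace(A), np.trace(adjugate(A)), -np.linalg.det(A)])
    # ... agree with NumPy's characteristic polynomial, up to rounding
    print(np.poly(A))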

Circulant matrices commute

A few days ago I wrote that circulant matrices all have the same eigenvectors. This post will show that it follows that circulant matrices commute with each other.

Recall that a circulant matrix is a square matrix in which the rows are cyclic permutations of each other. If we number the rows from 0, then the kth row takes the last k entries of the top row and moves them to the front.

Numerical example

We can generate two random circulant matrices and confirm that they commute by modifying the Python code from the earlier post.

    import numpy as np

    np.random.seed(42)
    N = 6
    row = np.random.random(N)
    A = np.array([np.roll(row, i) for i in range(N)])
    row = np.random.random(N)
    B = np.array([np.roll(row, i) for i in range(N)])
    AB = np.matmul(A, B)
    BA = np.matmul(B, A)
    print(np.absolute(AB - BA).max())

This computes two random 6 by 6 circulant matrices A and B and shows that the difference between AB and BA is on the order of the limit of floating point precision. Specifically, this prints 4.44 × 10⁻¹⁶.
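
Here’s one more check along the same lines, reusing A, B, and N from the code above: the matrix F whose columns are the columns of the FFT matrix, i.e. the shared eigenvectors, diagonalizes both A and B.

    # Assumes A, B, and N from the example above are still in scope
    F = np.array([[np.exp(-2j*np.pi*j*k/N) for k in range(N)]
                  for j in range(N)])
    Finv = np.linalg.inv(F)

    for C in (A, B):
        D = Finv @ C @ F
        off_diagonal = D - np.diag(np.diag(D))
        print(np.absolute(off_diagonal).max())  # near zero: F diagonalizes C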

Proof

Now for a proof. Let A and B be two matrices of size n that have the same set of independent eigenvectors e1 through en. In the case of circulant matrices we can take ek to be the kth column of the FFT matrix, but the proof here applies to any matrices with common eigenvectors.

Let αk be the eigenvalue for ek as an eigenvector of A and let βk be the eigenvalue for ek as an eigenvector of B. Then

 (AB)e_k = A(Be_k) = A(\beta_k e_k) = \beta_k(Ae_k) = \beta_k\alpha_k e_k

and

(BA)e_k = B(Ae_k) = B(\alpha_k e_k) = \alpha_k(Be_k) = \alpha_k\beta_k e_k

and so multiplying by AB or BA has the same effect on ek since the complex numbers αk and βk commute. Since the eigenvectors form a basis, and multiplication by AB and BA has the same effect on each basis vector, AB = BA.

In short, matrices that share eigenvectors commute because eigenvalues commute.

Circulant matrices, eigenvectors, and the FFT

A circulant matrix is a square matrix in which each row is a rotation of the previous row. This post will illustrate a connection between circulant matrices and the FFT (Fast Fourier Transform).

Circulant matrices

Color in the first row however you want. Then move the last element to the front to make the next row. Repeat this process until the matrix is full.

The NumPy function roll will do this rotation for us. Its first argument is the row to rotate, and the second argument is the number of rotations to do. So the following Python code generates a random circulant matrix of size N.

    import numpy as np

    np.random.seed(20230512)
    N = 5
    row = np.random.random(N)
    M = np.array([np.roll(row, i) for i in range(N)])

Here’s the matrix M with entries truncated to 3 decimal places to make it easier to read.

    [[0.556 0.440 0.083 0.460 0.909]
     [0.909 0.556 0.440 0.083 0.460]
     [0.460 0.909 0.556 0.440 0.083]
     [0.083 0.460 0.909 0.556 0.440]
     [0.440 0.083 0.460 0.909 0.556]]    

Fast Fourier Transform

The Fast Fourier Transform is a linear transformation and so it can be represented by a matrix [1]. This is the N by N matrix whose (j, k) entry is ω^(jk) where ω = exp(-2πi/N), with j and k running from 0 to N – 1. [2]

Each element of the FFT matrix corresponds to a rotation, so you could visualize the matrix using clocks in each entry or by a cycle of colors. A few years ago I created a visualization using both clock faces and colors.

Eigenvectors and Python code

Here’s a surprising property of circulant matrices: the eigenvectors of a circulant matrix depend only on the size of the matrix, not on the elements of the matrix. Furthermore, these eigenvectors are the columns of the FFT matrix. The eigenvalues depend on the matrix entries, but the eigenvectors do not.

Said another way, when you multiply a circulant matrix by a column of the FFT matrix of the same size, this column will be stretched but not rotated. The amount of stretching depends on the particular circulant matrix.

Here is Python code to illustrate this.

    for i in range(N):
        ω = np.exp(-2j*np.pi/N)
        col1 = np.array([ω**(i*j) for j in range(N)])  # ith column of the FFT matrix
        col2 = np.matmul(M, col1)                      # image under the circulant matrix
        print(col2/col1)

In this code col1 is a column of the FFT matrix, and col2 is the image of that column when multiplied by the circulant matrix M. The print statement shows that the ratio of corresponding elements is the same in every position. This common ratio is the eigenvalue associated with that eigenvector. If you were to generate a new random circulant matrix, the ratios would change, but the input and output vectors would still be proportional.
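
Since NumPy’s FFT uses the same sign convention as ω above, you can also read the eigenvalues off directly from the FFT of the first row; this is just a quick cross-check of the code above.

    # The ith eigenvalue printed in the loop above equals np.fft.fft(row)[i]
    print(np.fft.fft(row))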

Notes

[1] Technically this is the discrete Fourier transform (DFT). The FFT is an algorithm for computing the DFT. Because the DFT is always computed using the FFT, the transformation itself is usually referred to as the FFT.

[2] Conventions vary, so you may see the FFT matrix written differently.

What does rotating a matrix do to its determinant?

This post will look at rotating a matrix 90° and what that does to the determinant. This post was motivated by the previous post. There I quoted a paper that had a determinant with 1s in the right column. I debated rotating the matrix so that the 1s would be along the top because that seems more natural, but in the end I kept the original notation. If I had rotated the matrix, I’d need to adjust the value of the determinant.

Rotating a matrix is an unusual operation. When someone speaks of turning a matrix over, they usually mean taking the transpose; taking the transpose is a reflection, not a rotation. And when you see the words “rotation” and “matrix” in close proximity, the topic is a matrix that represents a rotation. That’s not what this post is about: the matrix isn’t performing a rotation; a rotation is being performed on it.

Here’s an example:

\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \rightarrow \begin{pmatrix} c & f & i \\ b & e & h \\ a & d & g \end{pmatrix}

Using subscripts makes the example above a little harder to read but much easier to formalize.

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \rightarrow \begin{pmatrix} a_{13} & a_{23} & a_{33} \\ a_{12} & a_{22} & a_{32} \\ a_{11} & a_{21} & a_{31} \end{pmatrix}

If we denote a matrix by A and its image after rotation as A′, then the components a′ of A′ are related to the components a of A by

a'_{ij} = a_{j, n-i+1}

To form the rotation of A, we take the transpose of A, then reverse the order of the rows. In matrix notation,

A' = JA^T

where J is the matrix that has 1’s on the secondary diagonal, the diagonal running southwest to northeast, and 0’s everywhere else. The matrix J is the rotation of the identity matrix I.

The determinant of A′ is the determinant of J times the determinant of A^T, and the determinant of A^T is the same as the determinant of A.

\det A' = \det J \det A^T = \det J \det A

So the last thing we need to do to compute the determinant of A′ is to compute the determinant of J.

We can turn I into J, and vice versa, by reversing the order of the rows. Let’s do this one pair of rows at a time. We swap the first and last rows. Then we swap the second row with the second to last row, then swap the third row from the top with the third row from the bottom, etc.

Suppose I is an n × n matrix, and first assume n is even. Then we need to do n/2 swaps, swapping each of the top n/2 rows with one of the bottom n/2 rows. Each swap reverses the sign of the determinant, and so the determinant of J is (-1)^(n/2). Now if n is odd, the middle row doesn’t move. So we swap the top (n – 1)/2 rows with the bottom (n – 1)/2 rows. So in that case the determinant of J is (-1)^((n-1)/2). We can combine the even and the odd case using floor notation:

\det A' = (-1)^{\lfloor n/2\rfloor} \det A
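
As a quick sanity check, NumPy’s rot90 function performs exactly this kind of 90° rotation, so we can verify the sign factor numerically for a few sizes.

    import numpy as np

    for n in range(2, 8):
        A = np.random.random((n, n))
        sign = (-1)**(n // 2)
        print(n, np.isclose(np.linalg.det(np.rot90(A)),
                            sign*np.linalg.det(A)))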

Gaussian elimination


When you solve systems of linear equations, you probably use Gaussian elimination, even if you don’t call it that. You may learn Gaussian elimination before you see it formalized in terms of matrices.

So if you’ve had a course in linear algebra, and you sign up for a course in numerical linear algebra, it’s natural to expect that Gaussian elimination would be one of the first things you talk about. That was my experience, and it was painful. I had this odd mixture of feeling like I already knew what the professor was talking about, while at the same time feeling lost.

Trefethen and Bau do not start their numerical linear algebra textbook with Gaussian elimination, and they explain why in the preface:

We have departed from the customary practice by not starting with Gaussian elimination. That algorithm is atypical of numerical linear algebra, exceptionally difficult to analyze, yet at the same time tediously familiar to every student in a course like this. Instead, we begin with the QR factorization, which is more important, less complicated, and a fresher idea to most students.

It’s reassuring to hear experts in numerical linear algebra admit that Gaussian elimination is “exceptionally difficult to analyze.” The algorithm is not exceptionally difficult to perform; children learn to manually execute the algorithm. But it is surprisingly difficult, and tedious, to analyze.

Numerical analysts like Trefethen and Bau have much more sophisticated questions in mind than how you would naively carry out the algorithm by hand. Numerical analysts are concerned, for example, with stability.

If your coefficients are known exactly and you carry out your arithmetic exactly, then you get an exact result. But can a small change to your inputs, say due to measurement uncertainty, or the inevitable loss of precision from floating point arithmetic, make a large change in your results? Absolutely, unless you use pivoting. But with pivoting, Gaussian elimination is stable [1].
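
To make the stability issue concrete, here is a sketch of the classic small example: eliminating with a tiny pivot and no row swaps wipes out the answer, while NumPy’s solver, which uses partial pivoting, is fine.

    import numpy as np

    A = np.array([[1e-20, 1.0],
                  [1.0,   1.0]])
    b = np.array([1.0, 2.0])

    # Naive elimination with the tiny upper-left entry as the pivot
    m = A[1, 0] / A[0, 0]         # multiplier of size 1e20
    u = A[1, 1] - m * A[0, 1]     # the 1 is completely swamped here
    y = b[1] - m * b[0]
    x2 = y / u
    x1 = (b[0] - A[0, 1] * x2) / A[0, 0]
    print(x1, x2)                 # prints 0.0 1.0, but the true solution is near (1, 1)

    print(np.linalg.solve(A, b))  # partial pivoting gives approximately [1. 1.]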

After taking an introductory linear algebra course, you know how to solve any linear system of equations in principle. But when you actually want to compute the solution accurately, efficiently, at scale, in a real computer, with real data, things can get much more interesting. Many people have devoted their careers to doing in practice what high school students know how to do in principle.


[1] There are matrices for which Gaussian elimination is unstable, but they are rare. It’s been shown that they are rare in the sense of having low probability, but more importantly they never arise naturally in applied problems.

Self-orthogonal vectors and coding

One of the surprising things about linear algebra over a finite field is that a non-zero vector can be orthogonal to itself.

When you take the inner product of a real vector with itself, you get a sum of squares of real numbers. If any element in the sum is positive, the whole sum is positive.

In a finite field, things don’t work that way. A list of non-zero squares can add up to zero.

In the previous post, we looked at the ternary Golay code, an error correcting code that multiplies a vector of data by a rectangular matrix mod 3. That post included the example that

[1, 0, 1, 2, 2]

was encoded as

[1 0 1 2 2 0 1 0 2 0 0].

The output is orthogonal to itself:

1² + 0² + 1² + 2² + 2² + 0² + 1² + 0² + 2² + 0² + 0² ≡ 0 (mod 3).

In fact, every output of the ternary Golay code will be self-orthogonal. To see this, let v be a 1 by 5 row vector. Then its encoding is vG where G is the 5 by 11 matrix in the previous post. The inner product of vG with itself is

vG(vG)^T = vGG^Tv^T.

We can easily show that GG^T is a zero matrix using Python:

    >>> (G@G.T)%3
    [[0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]
     [0 0 0 0 0]]

And so

vGG^Tv^T = vOv^T = 0.

If a vector is not self-orthogonal, there has been an error. For example, if one of the 0s in our output turned into a 1 or a 2, the inner product of the corrupted vector with itself would be equal to 1 (mod 3) rather than 0.

Self-orthogonality is necessary but not sufficient for an encoded vector to be error-free. Correctly encoded vectors form a 5-dimensional subspace of GF(3)^11, namely the row space of G. The set of self-orthogonal vectors in GF(3)^11 is a larger set.

A necessary condition for a row vector w to be the encoding of a data vector v is for

Gw^T

to be the zero vector (carrying out all calculations mod 3).

The ternary Golay code is capable of detecting and correcting corruption in up to two spots in a vector. For example, suppose we take our vector

[1 0 1 2 2 0 1 0 2 0 0]

above and corrupt it by turning the first couple 2’s to 1’s.

[1 0 1 1 1 0 1 0 2 0 0]

We’d still get a self-orthogonal vector, but the product Gw^T would not be zero.

    >>> v = np.array( [1, 0, 1, 2, 2] )
    >>> w = (v@G)%3
    >>> w[3] = w[4] = 1
    >>> print((G@w)%3)

This prints

    [0 0 0 2 2]

which lets us know that the modified version of w is not a valid encoding; it is not in the row space of G.

If we know (or are willing to assume) that

[1 0 1 1 1 0 1 0 2 0 0]

is the result of no more than two changes applied to a valid code word vG, then we can recover vG by looking for the closest vector in the row space of G. Here closeness is measured in Hamming distance: the number of positions in which vectors differ. Every vector in GF(3)^11 is within a distance of 2 from a unique code word, and so the code can correct any two errors.
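
Since there are only 3^5 = 243 code words, a brute-force search for the nearest one is a perfectly good decoder for a toy example. Here’s a sketch, with G copied from the ternary Golay code post.

    from itertools import product
    import numpy as np

    # The 5 by 11 generator matrix from the ternary Golay code post
    G = np.array([
      [1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0],
      [0, 1, 0, 0, 0, 1, 1, 2, 1, 0, 2],
      [0, 0, 1, 0, 0, 1, 2, 1, 0, 1, 2],
      [0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 1],
      [0, 0, 0, 0, 1, 1, 0, 2, 2, 1, 1],
    ])

    def decode(w):
        # Search all 243 code words for the one nearest in Hamming distance
        best = None
        for v in product(range(3), repeat=5):
            c = (np.array(v) @ G) % 3
            d = np.count_nonzero(c != w)
            if best is None or d < best[0]:
                best = (d, v, c)
        return best

    w = np.array([1, 0, 1, 1, 1, 0, 1, 0, 2, 0, 0])
    print(decode(w))  # recovers v = (1, 0, 1, 2, 2) at Hamming distance 2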


Ternary Golay code in Python

Marcel Golay discovered two “perfect” error-correcting codes: one binary and one ternary. These two codes stick out in the classification of perfect codes [1].

The ternary code is a linear code over GF(3), the field with three elements. You can encode a list of 5 base-three digits by multiplying the list as a row vector by the following generator matrix on the right:

\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 2 & 1 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 & 2 & 1 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 & 0 & 1 & 2 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 2 & 2 & 1 & 1 \\ \end{pmatrix}

Here the arithmetic is carried out in GF(3): every multiplication and addition is carried out mod 3.

Suppose we want to write this up in Python. How can you tell NumPy that you want to do arithmetic mod 3? I don’t believe you can directly. However, you could just multiply your row vector by the matrix using ordinary arithmetic and reduce the result mod 3 at the end. This gives the same result as if all intermediate operations had been carried out mod 3.

For example, suppose we wanted to encode the list [1, 0, 1, 2, 2].

    import numpy as np

    G = np.array([
      [1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0], 
      [0, 1, 0, 0, 0, 1, 1, 2, 1, 0, 2], 
      [0, 0, 1, 0, 0, 1, 2, 1, 0, 1, 2], 
      [0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 1], 
      [0, 0, 0, 0, 1, 1, 0, 2, 2, 1, 1], 
    ])

    v = np.array( [1, 0, 1, 2, 2] )
    print((v@G)%3)

The vector-matrix product v@G contains integers that are larger than 3:

    [1 0 1 2 2 6 7 6 8 9 6]

But when we reduce everything mod 3 we get

    [1 0 1 2 2 0 1 0 2 0 0]

This doesn’t mean that linear algebra over a finite field is trivial, though it was trivial in this case.

The same trick would work if we were to work modulo any prime. So the analogous trick would work mod 7, for example.

However, the trick would not work for the field with 8 elements, for example, because arithmetic in this field is not simply arithmetic mod 8. (You can read why here.) So our trick works for GF(p) for any prime p, but not for GF(p^k) for any k > 1.


[1] From “Sphere Packings, Lattices and Groups” by J. H. Conway and N. J. A. Sloane:

Perfect codes were essentially classified by Tietäväinen and van Lint. The list is as follows.

(i) Certain trivial codes …
(ii) Hamming codes …
(iii) Nonlinear codes with the same parameters as Hamming codes …
(iv) The [23, 12, 7] binary Golay code C23 and the [11, 6, 5] ternary Golay code C11.

Non-associative multiplication

There are five ways to parenthesize a product of four things:

  • ((ab)c)d
  • (ab)(cd)
  • a(b(cd))
  • (a(bc))d
  • a((bc)d)

In a context where multiplication is not associative, the five products above are not necessarily the same. Maybe all five are different.

This post will give two examples where the products above are all different: octonions and matrix multiplication.

If you’re thinking “But wait: matrix multiplication is associative!” then read further and see in what sense I’m saying it’s not.

Octonions

Octonion multiplication is not associative. I wrote a blog post a while back asking how close the octonions come to being associative. That is, if we randomly generate unit-length octonions a, b, and c, we can calculate the norm of

(ab)c - a(bc)

and ask about its expected value. Sometimes for a triple of octonions this value is zero, but on average this expression has norm greater than 1. I estimated the average value via simulation, and later Greg Egan worked out the exact value. His write-up had gone down with the Google+ ship, but recently Greg posted a new version of his notes.

In this post I gave Python code for multiplying octonions using a function called CayleyDickson named after the originators of the algorithm. Let’s rename it m to have something shorter to work with and compute all five ways of associating a product of four octonions.
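
The CayleyDickson function itself lives in that earlier post. For a self-contained sketch, here is one way to implement Cayley-Dickson multiplication recursively; it uses one common sign convention, so its basis may be labeled differently than the original code, but it is just as non-associative.

    import numpy as np

    def conj(x):
        # Conjugation negates every component except the first
        y = -np.asarray(x, dtype=float)
        y[0] = x[0]
        return y

    def CayleyDickson(x, y):
        # (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c)), applied recursively
        n = len(x)
        if n == 1:
            return np.asarray(x, dtype=float) * y
        a, b = x[:n//2], x[n//2:]
        c, d = y[:n//2], y[n//2:]
        return np.concatenate([
            CayleyDickson(a, c) - CayleyDickson(conj(d), b),
            CayleyDickson(d, a) + CayleyDickson(b, conj(c)),
        ])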

    import numpy as np

    def m(x, y):
        return CayleyDickson(x, y)

    def take_five(a, b, c, d):
        return [
            m(a, (m(b, m(c, d)))),
            m(m(a, b), m(c, d)),
            m(m(m(a, b), c), d),
            m(a, m(m(b, c), d)),
            m(m(a, m(b, c)), d)
        ]

I first tried products of basis elements, and I only got two different products out of the five ways of associating multiplication, but I only tried a few examples. However, when I tried using four random octonions, I got five different products.

    np.random.seed(20220201)

    a = np.random.rand(8)
    b = np.random.rand(8)
    c = np.random.rand(8)
    d = np.random.rand(8)

    for t in take_five(a, b, c, d):
        print(t)

This gave a very interesting result: I got five different products, but only two distinct real (first) components. The five vectors all differed in their last seven components.

    [ 2.5180856  -2.61618184  ...]
    [ 2.5180856   0.32031027  ...]
    [ 2.5180856   1.13177500  ...]
    [ 3.0280984  -0.30169446  ...]
    [ 3.0280984  -2.36523580  ...]

I repeated this experiment a few times, and the first three results always had the same real component, and the last two results had another real component.

I suspect there’s a theorem that says

Re( ((ab)c)d ) = Re( (ab)(cd) ) = Re( a(b(cd)) )

and

Re( (a(bc))d ) = Re( a((bc)d) )

but I haven’t tried to prove it. If you come up with a proof, or a counterexample, please post a link in the comments.

Matrix multiplication

Matrix multiplication is indeed associative, but the efficiency of matrix multiplication is not. That is, any two ways of parenthesizing a matrix product will give the same final matrix, but the costs of the various products are not the same. I first wrote about this here.

This is a very important result in practice. Changing the parentheses in a matrix product can make the difference between a computation being practical or impractical. This comes up, for example, in automatic differentiation and in backpropagation for neural networks.

Suppose A is an m by n matrix and B is an n by p matrix. Then AB is an m by p matrix, and forming the product AB requires mnp scalar multiplications. If C is a p by q matrix, then (AB)C takes

mnp + mpq = mp(n + q)

scalar multiplications, but computing A(BC) takes

npq + mnq = nq(m + p)

scalar multiplications, and in general these are not equal.

Let’s rewrite our multiplication function m and our take_five function to compute the cost of multiplying four matrices of random size.

We’ve got an interesting programming problem in that our multiplication function needs to do two different things. First of all, we need to know the size of the resulting matrix. But we also want to keep track of the number of scalar multiplications the product would require. We have a sort of main channel and a side channel. Having our multiplication function return both the dimension and the cost would make composition awkward.

This is kind of a fork in the road. There are two ways to solve this problem, one high-status and one low-status. The high-status approach would be to use a monad. The low-status approach would be to use a global variable. I’m going to take the low road and use a global variable. What’s one little global variable among friends?

    mults = 0

    def M(x, y):
        # x and y are (rows, columns) pairs; multiplying an m by n matrix
        # by an n by p matrix takes mnp scalar multiplications
        global mults
        mults += x[0]*x[1]*y[1]
        return (x[0], y[1])

    def take_five2(a, b, c, d):

        global mults
        costs = []

        mults = 0; M(a, (M(b, M(c, d)))); costs.append(mults)
        mults = 0; M(M(a, b), M(c, d));   costs.append(mults)
        mults = 0; M(M(M(a, b), c), d);   costs.append(mults)
        mults = 0; M(a, M(M(b, c), d));   costs.append(mults)
        mults = 0; M(M(a, M(b, c)), d);   costs.append(mults)

        return costs 

Next, I’ll generate five random integers, and group them in pairs as sizes of matrices conformable for multiplication.

    dims = np.random.randint(1, 11, size=5)  # five random dimensions between 1 and 10
    dim_pairs = zip(dims[:4], dims[1:])
    c = take_five2(*[p for p in dim_pairs])

When I ran this, dims was set to

[3 9 7 10 6]

and so my matrices were of size 3×9, 9×7, 7×10, and 10×6.

The number of scalar multiplications required by each way of multiplying the four matrices, computed by take_five2 was

[960, 735, 579, 1332, 1080]

So each took a different number of operations. The slowest approach would take more than twice as long as the fastest approach. In applications, matrices can be very long and skinny, or very wide and thin, in which case one approach may take orders of magnitude more operations than another.
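
To see how extreme the difference can get, here’s a quick experiment reusing the functions above with made-up sizes: two short, wide matrices alternating with two tall, skinny ones.

    # a is 1 x 1000, b is 1000 x 1, c is 1 x 1000, d is 1000 x 1
    print(take_five2((1, 1000), (1000, 1), (1, 1000), (1000, 1)))

The cheapest orderings need a few thousand scalar multiplications, while the worst need about two million, because they form a 1000 × 1000 intermediate product.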


Vector spaces and subspaces over finite fields

A surprising amount of linear algebra doesn’t depend on the field you’re working over. You can implicitly assume you’re working over the real numbers R and prove all the basic theorems—say all the theorems that come before getting into eigenvalues in a typical course—and all or nearly all of the theorems remain valid if you swap out the real numbers for the complex numbers. In fact, they hold for any field, even a finite field.

Subspaces over infinite fields

Given an n-dimensional vector space V over a field F we can ask how many subspaces of V there are with dimension k. If our field F is the reals and 0 < k < n then there are infinitely many such subspaces. For example, if n = 2, every line through the origin is a subspace of dimension 1.

Now if we’re still working over R but we pick a basis

e1, e2, e3, …, en

and ask how many subspaces of dimension k in V are spanned by a subset of these basis elements, now we have a finite number. In fact the number is the binomial coefficient

\binom{n}{k}

because there are as many subspaces as there are ways to select k basis elements from a set of n.

Subspaces over finite fields

Let F be a finite field with q elements; necessarily q is a prime power. Let V be an n-dimensional vector space over F. We might need to know how many subspaces of V there are of various dimensions when developing algebraic codes, error detection and correction codes based on vector spaces over finite fields.

Then the number of subspaces of V with dimension k equals the q-binomial coefficient

\binom{n}{k}_q \equiv \frac{[n]_q!}{[k]_q! [n-k]_q!}

mentioned in the previous post. Here [n]q! is the q-factorial of n, defined by

[n]_q! \equiv [1]_q [2]_q [3]_q \cdots [n]_q

and [n]q is the q-analog of n, defined by

[n]_q \equiv \frac{1 - q^n}{1-q} = 1 + q + q^2 + \cdots + q^{n-1}

The fraction definition holds for all n, integer or not, when q ≠ 1. The fraction equals the polynomial in q when n is a non-negative integer.

You can derive the expression for the number of subspaces directly from a combinatorial argument, not using any of the q-analog notation, but this notation makes things much more succinct.

Python code

We can easily code up the definitions above in Python.

    def analog(n, q):
        return (1 - q**n)//(1 - q)

    def qfactorial(n, q):
        p = 1
        for k in range(1, n+1):
            p *= analog(k, q)
        return p

    def qbinomial(n, k, q):
        # The result is always an integer, so integer division is exact
        d = qfactorial(k, q)*qfactorial(n-k, q)
        return qfactorial(n, q)//d

Example

To keep things small, let’s look at the field with q = 3 elements. Here addition and multiplication are carried out mod 3.

Let n = 3 and k = 1. So we’re looking for one-dimensional subspaces of F³ where F is the field of integers mod 3.

A one-dimensional subspace of a vector space consists of all scalar multiples of a vector. We can only multiply a vector by 0, 1, or 2. Multiplying by 0 gives the zero vector, multiplying by 1 leaves the vector the same, and multiplying by 2 turns 1s into 2s and vice versa. So, for example, the vector (1, 2, 0) is a basis for the subspace consisting of

(0, 0, 0), (1, 2, 0), (2, 1, 0).

We can find all the subspaces by finding a basis for each subspace. And with a little brute force effort, here’s a list.

  • (1, 0, 0)
  • (0, 1, 0)
  • (0, 0, 1)
  • (0, 1, 1)
  • (1, 0, 1)
  • (1, 1, 0)
  • (1, 2, 0)
  • (0, 1, 2)
  • (2, 0, 1)
  • (1, 1, 1)
  • (2, 1, 1)
  • (1, 2, 1)
  • (1, 1, 2)

It’s easy to check that none of these vectors is a multiple mod 3 of another vector on the list. The theory above says we should expect to find 13 subspaces, and we have, so we must have found them all.
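
As a cross-check, we can count the one-dimensional subspaces of GF(3)³ by brute force and compare with the formula, reusing the qbinomial function above.

    from itertools import product

    vectors = [v for v in product(range(3), repeat=3) if any(v)]
    subspaces = set()
    for v in vectors:
        # The span of v consists of 0*v, 1*v, and 2*v, reduced mod 3
        span = frozenset(tuple(s*x % 3 for x in v) for s in range(3))
        subspaces.add(span)

    print(len(subspaces), qbinomial(3, 1, 3))  # both are 13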
