Hilbert transform as an infinite matrix

The previous post linked to a post I wrote a few years ago about the Hilbert transform and Fourier series. That post says that if the Fourier series of a function is

f(t) = \sum_{n=1}^\infty \left\{ a_n \sin(nt) + b_n\cos(nt) \right\}

then the Fourier series of its Hilbert transform is

f_H(x) = \sum_{n=1}^\infty \left\{ -b_n \sin(nx) + a_n\cos(nx) \right\}

When I looked back at that post I thought about how if you thought of the Fourier coefficients as elements of an infinite vector then the Hilbert transform can be represented as multiplying by an infinite block matrix.

\left[ \begin{array}{cc|cc|cc|c} 0 & -1 & 0 & 0 & 0 & 0 & \cdots \\ 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\ \hline 0 & 0 & 0 & -1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & 0 & 0 & \cdots \\ \hline 0 & 0 & 0 & 0 & 0 & -1 & \cdots \\ 0 & 0 & 0 & 0 & 1 & 0 & \cdots \\ \hline \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{array} \right] \left[ \begin{array}{c} a_1 \\ b_1 \\ \hline a_2 \\ b_2 \\ \hline a_3 \\ b_3 \\ \hline \vdots \end{array} \right]

I rarely see infinite matrices except in older math books. Apparently they were more fashionable a few decades ago than they are now. I suppose the notation falls between two stools, too concrete for some tastes and not concrete enough for others. The former folks would prefer something like H and the latter would prefer the sum above.

Inverse shift

What is the inverse of shifting a sequence to the right? Shifting it to the left, obviously.

But wait a minute. Suppose you have a sequence of eight bits

abcdefgh

and you shift it to the right. You get

0abcdefg.

If you shift this sequence to the left you get

abcdefg0

You can’t recover the last element h because the right-shift destroyed information about h.

A left-shift doesn’t fully recover a right-shift, and yet surely left shift and right shift are in some sense inverses.

Yesterday I wrote a post about representing bit manipulations, including shifts, as matrix operators. The matrix corresponding to shifting right by k bits has 1s on the kth diagonal above the main diagonal and 0s everywhere else. For example, here is the matrix for shifting an 8-bit number right two bits. A black square represents a 1 and a white square represents a 0.

This matrix isn’t invertible. When you’d like to take the inverse of a non-invertible matrix, your knee jerk response should be to compute the pseudoinverse. (Technically the Moore-Penrose pseudoinverse. There are other pseudoinverses, but Moore-Penrose is the most common.)

As you might hope/expect, the pseudoinverse of a right-shift matrix is a left-shift matrix. In this case the pseudoinverse is simply the transpose, though of course that isn’t always the case.

If you’d like to prove that the pseudoinverse of a matrix that shifts right by k places is a matrix that shifts left by k places, you don’t have to compute the pseudo inverse per se: you can verify your guess. This post gives four requirements for a pseudoinverse. You can prove that left shift is the inverse of right shift by showing that it satisfies the four equations.

Probability that a random binary matrix is invertible

The two latest posts have involved invertible matrices with 0 and 1 entries. If you fill an n × n matrix with 0s and 1s randomly, how likely is it to be invertible?

What kind of inverse?

There are a couple ways to find the probability that a binary matrix is invertible, depending on what you mean by the inverse.

Suppose you have a matrix M filled with 0s and 1s and you’re looking for a matrix N such that MN is the identity matrix. Do you want the entries of N to also be 0s and 1s? And when you multiply the matrices, are you doing ordinary integer arithmetic or are you working mod 2?

In the previous posts we were working over GF(2), the field with two elements, 0 and 1. All the elements of a matrix are either 0 or 1, and arithmetic is carried out mod 2. In that context there’s a nice expression for the probability a square matrix is invertible.

If you’re working over the real numbers, the probability of binary matrix being invertible is higher. One way to see this is that the inverse of a binary matrix is allowed to be binary but it isn’t required to be.

Another way to see this is to look at determinants. If you think of a matrix M as a real matrix whose entries happen to only be 0 or 1, M is invertible if and only its determinant is non-zero. But if you think of M as a matrix over GF(2), the entries are either 0 or 1 out of necessity, and M is invertible if and only if its determinant, computed in GF(2), is non-zero. If the determinant of M as a real matrix is a non-zero even number, then M is invertible as a real matrix but not as a matrix over GF(2).

Probability of invertibility in GF(2)

Working over GF(2), what is the probability that a random matrix is invertible? Turns out it’s just as easy to answer a more general question: what is the probability that a random n × n matrix over GF(q), a finite field with q elements, is invertible? This is

\prod_{i=1}^n\left( 1 - \frac{1}{q^i} \right)

When q = 2 and n = 8 this probability is 0.289919. The probability is roughly the same for all larger values of n, converging to approximately 0.288788 as n → ∞.

Probability of invertibility in ℝ

What is the probability that an 8 × 8 matrix with random 0 and 1 entries is invertible as a real matrix? We can estimate this by simulation.

import numpy as np

def simulate_prob_invertible_real(n, numreps=1000):
    s = 0
    for _ in range(numreps):
        M = np.random.randint(0, 2, size=(n, n))
        det = np.linalg.det(M)
        if abs(det) > 1e-9:
            s += 1
    return s/numreps

When n = 8, I got 0.5477 when running the code with 10,000 reps.

When n = 32, I got a probability of 1. Obviously it is possible for a 32 × 32 binary matrix to be singular, but it’s very unlikely: it didn’t happen in 10,000 random draws.

The linear algebra of bit twiddling

The previous post looked at the tempering step of the Mersenne Twister, formulating a sequence of bit operations as multiplication by a matrix mod 2. This post will look at the components more closely.

The theorems of linear algebra generally hold independent of the field of scalars. Typically the field is ℝ or ℂ, but most of basic linear algebra works the same over every field [1]. In particular, we can do linear algebra over a finite field, and we’re interested in the most finite of finite fields GF(2), the field with just two elements, 0 and 1.

In GF(2), addition corresponds to XOR. We will denote this by ⊕ to remind us that although it’s addition, it’s not the usual addition, i.e. 1 ⊕ 1 = 0. Similarly, multiplication corresponds to AND. We’ll work with 8-bit numbers to make the visuals easier to see.

Shifting a number left one bit corresponds to multiplication by a matrix with 1’s below the diagonal main. Shifting left by k bits is the same as shifting left by 1 bit k times, so the the matrix representation for x << k is the kth power of the matrix representation of shifting left once. This matrix has 1s on the kth diagonal below the main diagonal. Below is the matrix for shifting left two bits, x << k.

Right shifts are the mirror image of left shifts. Here’s the matrix for shifting right two bits, x >> k.

Shifts are not fully invertible because bits either fall off the left or the right end. The steps in the Mersenne Twister are invertible because shifts are always XOR’d with the original argument. For example, although the function that takes x to x >> 2 is not invertible, the function that takes x to x ⊕ (x >> 2) is invertible. This operation corresponds to the matrix below.

This is an upper triangular matrix, so its determinant is the product of the diagonal elements. These are all 1s, so the determinant is 1, and the matrix is invertible.

Bitwise AND multiplies each bit of the input by the corresponding bit in another number known as the mask. The bits aligned with a 1 are kept and the bits aligned with a 0 are cleared. This corresponds to multiplying by a diagonal matrix whose diagonal elements correspond to the bits in the mask. For example, here is the matrix that corresponds to taking the bitwise AND with 10100100.

Each of the steps in the Mersenne Twister tempering process are invertible because they all correspond to triangular matrices with all 1’s on the diagonal. For example, the line

y ^= (y <<  7) & 0x9d2c5680 

says to shift the bits of y left 7 places, then zero out the elements corresponding to 0s in the mask, then XOR the result with y. In matrix terms, we multiply by a lower triangular matrix with zeros on the main diagonal, then multiply by a diagonal matrix that zeros out some of the terms, then add the identity matrix. So the matrix corresponding to the line of code above is lower triangular, with all 1s on the diagonal, so it is invertible.

[1] Until you get to eigenvalues. Then it matters whether the field is algebraically complete, which no finite field is.

Reverse engineering Mersenne Twister with Linear Algebra

The Mersenne Twister (MT) is a random number generator with good statistical properties but bad cryptographic properties. In buzzwords, it’s a PRNG but not a CSPRNG.

This post will show how the internal state of a MT generator can be recovered from its output. We’ll do this using linear algebra. The bit twiddling approach is more common and more efficient, but the linear algebra approach is conceptually simpler.

How MT works

There are numerous variations on the Mersenne Twister. We’ll focus on the original version that had internal state consists of vector x of length 640 filled with 32-bit numbers. The ideas in this post would apply equally to all MT versions.

The first call to MT returns a “tempered” version of x[0]. The next call returned a tempered version of x[1], and so on. After every 640 calls, the state is scrambled. This scrambling is where the “twist” in the name Mersenne Twister comes from. (The Mersenne part comes from the fact that the period of an MT generator is a Mersenne prime.)

Tempering

Here is Python code for performing the tempering step.

def temper(y):
    y ^= (y >> 11) 
    y ^= (y <<  7) & 0x9d2c5680 
    y ^= (y << 15) & 0xefc60000 
    y ^= (y >> 18)  
    return y

Each step is reversible, and so the temper function is reversible.

Because the tempering step is reversible, the first output can be used to infer the first element of the internal state, the second output to infer the second element, and so on. After 640 calls one can know the entire internal state and predict the rest of the generator’s output from then on.

Linear algebra

The bitwise operations above all correspond to linear operations over GF(2), the field with just two elements, 0 and 1. Addition in this field is XOR and multiplication is AND.

Each step corresponds to multiplying a vector of 32 bits on the left by a 32 × 32 matrix with entries that are 0’s and 1’s, with the understanding that the sum of two bits is their XOR and the product of two bits is their AND. Equivalently, arithmetic is carried out mod 2. So you can compute the matrix-vector product as ordinary integers if you then reduce every integer mod 2.

We will find the matrix M that corresponds to the temper operation and prove that it is invertible by finding its inverse. This proves that tempering is invertible, and one could compute the inverse of tempering by multiplying by M−1, though it would be more efficient to undo tempering by bit twiddling.

One way to recover a matrix is to multiplying by unit vectors ei where the ith component of ei is 1 and the rest of the components are zero. Then

M ei

is the ith column of M.

So we can find the nth column of M by tempering 1 << n = 2n.

M = np.zeros((32, 32), dtype=int)
for i in range(32):
    t = temper(1 << (31-i))
    s = f"{t:032b}"
    for j in range(32):
        M[j, i] = int(s[j])

Let’s generate a random number and compute its tempered form two ways: directly and matrix multiplication.

x = random.getrandbits(32)
y = temper(x)
print(f"{y:032b}")

vx = np.array([int(b) for b in f"{x:032b}"]) # vector form of x
vy = M @ vx % 2 # vector form of y
print("".join(str(b) for b in vy))

Both produce the same bits:

10100101100101101100110101000110
10100101100101101100110101000110

We can find the matrix representation of the untemper function by inverting the matrix M. However, we need to invert it over the field GF(2), not over the integers or reals.

import galois
GF2 = galois.GF(2)
Minv = np.linalg.inv(GF2(M))

Here are visualizations of M and its inverse using a black square for a 1 and a white square for a 0.

M:

M−1:

The next post will back up and look at the linear algebra of the components that comprise the Mersenne Twister.

Aligning one matrix with another

Suppose you have two n × n matrices, A and B, and you would like to find a rotation matrix Ω that lines up B with A. That is, you’d like to find Ω such that

A = ΩB.

This is asking too much, except in the trivial case of A and B being 1 × 1 matrices. You could view the matrix equation above as a set of n² equations in real numbers, but the space of orthogonal matrices only has n(n − 1) degrees of freedom [1].

When an equation doesn’t have an exact solution, the next best thing is to get as close as possible to a solution, typically in a least squares sense. The orthogonal Procrustes problem is to find an orthogonal matrix Ω minimizing the distance between A and ΩB That is, we want to minimize

|| A − ΩB ||

subject to the constraint that Ω is orthogonal. The matrix norm used in this problem is the Frobenius norm, the sum of the squares of the matrix entries. The Frobenius norm is the 2-norm if we straighten the matrices into vectors of dimension n².

Peter Schönemann found a solution to the orthogonal Procrustes problem in 1964. His solution involves singular value decomposition (SVD). This shouldn’t be surprising since SVD solves the problem of finding the closest thing to an inverse of an non-invertible matrix. (More on that here.)

Schönemann’s solution is to set MABT and find its singular value decomposition

M = UΣVT.

Then

Ω = UVT.

Python code

The following code illustrates solving the orthogonal Procrustes problem for random matrices.

import numpy as np

n = 3

# Generate random n x n matrices A and B
rng = np.random.default_rng(seed=20260211) 
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Compute M = A * B^T
M = A @ B.T

# SVD: M = U * Sigma * V^T
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# R = U * V^T
R = U @ Vt

# Verify that R * R^T is very nearly the identity matrix
print("||R^T R - I||_F =", np.linalg.norm(R.T @ R - np.eye(n), ord="fro"))

In this example the Frobenius norm between RRT and I is 4 × 10−16, so essentially RRT = I to machine precision.

Related posts

[1] Every column of an orthogonal matrix Ω must have length 1, so that gives n constraints. Furthermore, each pair of columns must be orthogonal, which gives n choose 2 more constraints. We start with Ω having n² degrees of freedom, but then remove n and then n(n − 1)/2 degrees of freedom.

n² − nn(n − 1)/2 = n(n − 1)/2

Representing octonions as matrices, sorta

It’s possible to represent complex numbers as a pair of real numbers or 2 × 2 matrices with real entries.

z \leftrightarrow (a, b) \leftrightarrow \begin{bmatrix}a & -b \\ b & a \end{bmatrix}

And it’s possible to represent quaternions as pairs of complex numbers or 2 × 2 matrices with complex entries

q \leftrightarrow (z_0, z_1) \leftrightarrow \begin{bmatrix} z_0 & z_1 \\ -z_1^* & z_0^* \end{bmatrix}

were z* is the complex conjugate of z.

And it’s also possible to represent octonions as pairs of quaternions or 2 × 2 matrices with quaternion entries, with a twist.

o \leftrightarrow (q_0, q_1) \leftrightarrow \begin{bmatrix} q_0 & q_1 \\ -q_1^* & q_0^* \end{bmatrix}

where q* is the quaternion conjugate of q.

Matrix multiplication is associative, but octonion multiplication is not, so something has to give. We have to change the definition of matrix multiplication slightly.

\begin{bmatrix} \alpha_0 & \alpha_1 \\ \alpha_2 & \alpha_3 \end{bmatrix}\circ\begin{bmatrix} \beta_0 & \beta_1 \\ \beta_2 & \beta_3 \end{bmatrix}=\begin{bmatrix} \alpha_0\beta_0+\beta_2\alpha_1 & \beta_1\alpha_0+\alpha_1\beta_3\\ \beta_0\alpha_2+\alpha_3\beta_2 & \alpha_2\beta_1+\alpha_3\beta_3 \end{bmatrix}

In half the products, the beta term comes before the alpha term. This wouldn’t matter if the alpha and beta terms commuted, e.g. if they were complex numbers this would be ordinary matrix multiplication. But the alphas and betas are quaternions, and so order matters, and the matrix product defined above is not the standard matrix product.

Going back to the idea of matrices of matrices that I wrote about a few days ago, we could represent the octonions as 2 × 2 matrices whose entries are 2 × 2 matrices of complex numbers, etc.

If you look closely at the matrix representations above, you’ll notice that the matrix representations of quaternions and octonions doesn’t quite match the pattern of the complex numbers. There should be a minus sign in the top right corner and not in the bottom left corner. You could do it that way, but there’s a sort of clash of conventions going on here.

Drazin pseudoinverse

The most well-known generalization of the inverse of a matrix is the Moore-Penrose pseudoinverse. But there is another notion of generalized inverse, the Drazin pseudoinverse, for square matrices.

If a matrix A has an inverse A−1 then it also has a Moore-Penrose pseudoinverse A+ and a Drazin pseudoinverse AD and

A−1 = A+ = AD.

When the matrix A is not invertible the two notions of pseudoinverse might not agree.

One way to think about the two generalized inverses is that both are based on a canonical form. The Moore-Penrose pseudoinverse is based on the singular value decomposition. The Drazin inverse is based on the Jordan canonical form.

Jordan canonical form

Every square matrix A can be put into Jordan canonical form

P−1 A PJ

where J is the Jordan canonical form and P is an invertible matrix whose columns are the generalized eigenvectors of A.

If the matrix A is diagonalizable, the matrix J is diagonal. Otherwise the matrix J is block diagonal. Each block has an eigenvalue on the diagonal; the size of the block equals the multiplicity of the eigenvalue. There are 1s on the diagonal above the main diagonal and zeros everywhere else.

If all the eigenvalues of A are non-zero, the matrix is invertible and it is easy to compute the inverse from the Jordan form. To invert J, simply invert each block of J. This is clear from the way you multiply partitioned matrices.

To find the inverse of A, multiply on the left by P and on the right by P−1. That is,

A−1 = P J−1 P−1.

Proof:

A (P J−1 P−1) = (P J P−1)(P J−1 P−1) = I

Drazin pseudoinverse

One way to compute the Drazin inverse, at least in theory, is from the Jordan canonical form. In practice, you might want to compute the Drazin inverse a different way.

If all the eigenvalues of A are non-zero, A is invertible and its Drazin pseudoinverse is its inverse.

If zero is an eigenvalue, the block(s) in J corresponding to 0 cannot be inverted. If a block is 1 by 1 then it just has a zero in it. If a block is larger, it will have 1’s above the main diagonal. Replace the 1s by 0s. This gives us the Drazin pseudoinverse of J. To obtain the pseudoinverse of J, multiply on the left by P and on the right by P−1 just as above.

Categorization

The Drazin pseudoinverse can be categorized by a few properties just as the Moore-Penrose pseudoinverse. Let’s compare the properties.

The defining properties of the Moore-Penrose pseudoinverse A+ of a matrix A are

  1. A A+ A = A
  2. A+ A A+ = A+
  3. (A A+)* = A A+
  4. (A+ A)* = A+ A

The defining properties of the Drazin pseudoinverse are

  1. A ADAD A
  2. AD A AD = AD
  3. Ak+1 ADA Ak

where k is the smallest non-negative integer such that rank(Ak + 1) = rank(Ak).

The second properties of each pseudoinverse are analogous. The first property of Drazin inverse is sorta like the 3rd and 4th properties of the Moore-Penrose pseudoinverse.

The analog of A A+ A = A does not hold; in general A AD A does not equal A.

The Drazin property Ak+1 ADA Ak is nothing like any property for Moore-Penrose. It suggests that Drazin inverses behave nicely when you raise them to exponents, which is true.

You can see this exponent condition in the Jordan normal form. The blocks in J corresponding to a 0 eigenvector are nilpotent, i.e. they become zero when you raise them to a high enough power. When you raise J to some power k, the same k as above, the nilpotent blocks go away.

Note that although the Moore-Penrose pseudoinverse is symmetric, the Drazin inverse is not. That is (A+)+ = A but (AD)DA in general.

Related posts

Effective graph resistance

I’ve mentioned the Moore-Penrose pseudoinverse of a matrix a few times, most recently last week. This post will give an application of the pseudoinverse: computing effective graph resistance.

Given a graph G, imagine replacing each edge with a one Ohm resistor. The effective resistance between two nodes in G is the electrical resistance between those the two nodes.

Calculating graph resistance requires inverting the graph Laplacian, but the graph Laplacian isn’t invertible. We’ll resolve this paradox shortly, but in the mean time this sounds like a job for the pseudoinverse, and indeed it is.

The graph Laplacian of a graph G is the matrix L = D − A where D is the diagonal matrix whose entries are the degrees of each node and A is the adjacency matrix. The vector of all 1’s

1 = (1, 1, 1, …, 1)

is in the null space of L but if G is connected then the null space is one dimensional. More generally, the dimension of the null space equals the number of components in G.

Calculating graph resistance requires solving the linear system

Lw = v

where v and w are orthogonal to 1. If the graph G is connected then the system has a unique solution. The graph Laplacian L is not invertible over the entire space, but it is invertible in the orthogonal complement to the null space.

The Moore-Penrose pseudoinverse L+ gives the graph resistance matrix. The graph resistance between two nodes a and b is given by

Rab = (eaeb)T L+ (eaeb)

where ei is the vector that has all zero entries except for a 1 in the ith slot.

For more on graph resistance see Effective graph resistance by Wendy Ellens et al.

More spectral graph theory posts

Multiplying a matrix by its transpose

An earlier post claimed that there practical advantages to partitioning a matrix, thinking of the matrix as a matrix of matrices. This post will give an example.

Let M be a square matrix and suppose we need to multiply M by its transpose MT. We can compute this product faster than multiplying two arbitrary matrices of the same size by exploiting the fact that MMT will be a symmetric matrix.

We start by partitioning M into four blocks

M = \begin{bmatrix}A & B \\ C & D \end{bmatrix}

Then

M^\intercal = \begin{bmatrix}A^\intercal & C^\intercal \\ B^\intercal & D^\intercal \end{bmatrix}

and

MM^\intercal = \begin{bmatrix} AA^\intercal  + BB^\intercal & AC^\intercal  + BD^\intercal \\CA^\intercal + DB^\intercal & CC^\intercal + DD^\intercal \end{bmatrix}

Now for the first clever part: We don’t have to compute both of the off-diagonal blocks because each is the transpose of the other. So we can reduce our calculation by 25% by not calculating one of the blocks, say the lower left block.

And now for the second clever part: apply the same procedure recursively. The diagonal blocks in MMT involve a matrix times its transpose. That is, we can partition A and use the same idea to compute AAT and do the same for BBT, CCT, and DDT. The off diagonal blocks require general matrix multiplications.

The net result is that we can compute MMT in about 2/3 the time it would take to multiply two arbitrary matrices of the same size.

Recently a group of researchers found a way to take this idea even further, partitioning a matrix into a 4 by 4 matrix of 16 blocks and doing some clever tricks. The RXTX algorithm can compute MMT in about 26/41 the time required to multiply arbitrary matrices, a savings of about 5%. A 5% improvement may be significant if it appears in the inner loop of a heavy computation. According to the authors, “The algorithm was discovered by combining Machine Learning-based search methods with Combinatorial Optimization.”

Related posts