Probability that a random binary matrix is invertible

The two latest posts have involved invertible matrices with 0 and 1 entries. If you fill an n × n matrix with 0s and 1s randomly, how likely is it to be invertible?

What kind of inverse?

There are a couple ways to find the probability that a binary matrix is invertible, depending on what you mean by the inverse.

Suppose you have a matrix M filled with 0s and 1s and you’re looking for a matrix N such that MN is the identity matrix. Do you want the entries of N to also be 0s and 1s? And when you multiply the matrices, are you doing ordinary integer arithmetic or are you working mod 2?

In the previous posts we were working over GF(2), the field with two elements, 0 and 1. All the elements of a matrix are either 0 or 1, and arithmetic is carried out mod 2. In that context there’s a nice expression for the probability a square matrix is invertible.

If you’re working over the real numbers, the probability of binary matrix being invertible is higher. One way to see this is that the inverse of a binary matrix is allowed to be binary but it isn’t required to be.

Another way to see this is to look at determinants. If you think of a matrix M as a real matrix whose entries happen to only be 0 or 1, M is invertible if and only its determinant is non-zero. But if you think of M as a matrix over GF(2), the entries are either 0 or 1 out of necessity, and M is invertible if and only if its determinant, computed in GF(2), is non-zero. If the determinant of M as a real matrix is a non-zero even number, then M is invertible as a real matrix but not as a matrix over GF(2).

Probability of invertibility in GF(2)

Working over GF(2), what is the probability that a random matrix is invertible? Turns out it’s just as easy to answer a more general question: what is the probability that a random n × n matrix over GF(q), a finite field with q elements, is invertible? This is

\prod_{i=1}^n\left( 1 - \frac{1}{q^i} \right)

When q = 2 and n = 8 this probability is 0.289919. The probability is roughly the same for all larger values of n, converging to approximately 0.288788 as n → ∞.

Probability of invertibility in ℝ

What is the probability that an 8 × 8 matrix with random 0 and 1 entries is invertible as a real matrix? We can estimate this by simulation.

import numpy as np

def simulate_prob_invertible_real(n, numreps=1000):
    s = 0
    for _ in range(numreps):
        M = np.random.randint(0, 2, size=(n, n))
        det = np.linalg.det(M)
        if abs(det) > 1e-9:
            s += 1
    return s/numreps

When n = 8, I got 0.5477 when running the code with 10,000 reps.

When n = 32, I got a probability of 1. Obviously it is possible for a 32 × 32 binary matrix to be singular, but it’s very unlikely: it didn’t happen in 10,000 random draws.

The linear algebra of bit twiddling

The previous post looked at the tempering step of the Mersenne Twister, formulating a sequence of bit operations as multiplication by a matrix mod 2. This post will look at the components more closely.

The theorems of linear algebra generally hold independent of the field of scalars. Typically the field is ℝ or ℂ, but most of basic linear algebra works the same over every field [1]. In particular, we can do linear algebra over a finite field, and we’re interested in the most finite of finite fields GF(2), the field with just two elements, 0 and 1.

In GF(2), addition corresponds to XOR. We will denote this by ⊕ to remind us that although it’s addition, it’s not the usual addition, i.e. 1 ⊕ 1 = 0. Similarly, multiplication corresponds to AND. We’ll work with 8-bit numbers to make the visuals easier to see.

Shifting a number left one bit corresponds to multiplication by a matrix with 1’s below the diagonal main. Shifting left by k bits is the same as shifting left by 1 bit k times, so the the matrix representation for x << k is the kth power of the matrix representation of shifting left once. This matrix has 1s on the kth diagonal below the main diagonal. Below is the matrix for shifting left two bits, x << k.

Right shifts are the mirror image of left shifts. Here’s the matrix for shifting right two bits, x >> k.

Shifts are not fully invertible because bits either fall off the left or the right end. The steps in the Mersenne Twister are invertible because shifts are always XOR’d with the original argument. For example, although the function that takes x to x >> 2 is not invertible, the function that takes x to x ⊕ (x >> 2) is invertible. This operation corresponds to the matrix below.

This is an upper triangular matrix, so its determinant is the product of the diagonal elements. These are all 1s, so the determinant is 1, and the matrix is invertible.

Bitwise AND multiplies each bit of the input by the corresponding bit in another number known as the mask. The bits aligned with a 1 are kept and the bits aligned with a 0 are cleared. This corresponds to multiplying by a diagonal matrix whose diagonal elements correspond to the bits in the mask. For example, here is the matrix that corresponds to taking the bitwise AND with 10100100.

Each of the steps in the Mersenne Twister tempering process are invertible because they all correspond to triangular matrices with all 1’s on the diagonal. For example, the line

y ^= (y <<  7) & 0x9d2c5680 

says to shift the bits of y left 7 places, then zero out the elements corresponding to 0s in the mask, then XOR the result with y. In matrix terms, we multiply by a lower triangular matrix with zeros on the main diagonal, then multiply by a diagonal matrix that zeros out some of the terms, then add the identity matrix. So the matrix corresponding to the line of code above is lower triangular, with all 1s on the diagonal, so it is invertible.

[1] Until you get to eigenvalues. Then it matters whether the field is algebraically complete, which no finite field is.

Reverse engineering Mersenne Twister with Linear Algebra

The Mersenne Twister (MT) is a random number generator with good statistical properties but bad cryptographic properties. In buzzwords, it’s a PRNG but not a CSPRNG.

This post will show how the internal state of a MT generator can be recovered from its output. We’ll do this using linear algebra. The bit twiddling approach is more common and more efficient, but the linear algebra approach is conceptually simpler.

How MT works

There are numerous variations on the Mersenne Twister. We’ll focus on the original version that had internal state consists of vector x of length 640 filled with 32-bit numbers. The ideas in this post would apply equally to all MT versions.

The first call to MT returns a “tempered” version of x[0]. The next call returned a tempered version of x[1], and so on. After every 640 calls, the state is scrambled. This scrambling is where the “twist” in the name Mersenne Twister comes from. (The Mersenne part comes from the fact that the period of an MT generator is a Mersenne prime.)

Tempering

Here is Python code for performing the tempering step.

def temper(y):
    y ^= (y >> 11) 
    y ^= (y <<  7) & 0x9d2c5680 
    y ^= (y << 15) & 0xefc60000 
    y ^= (y >> 18)  
    return y

Each step is reversible, and so the temper function is reversible.

Because the tempering step is reversible, the first output can be used to infer the first element of the internal state, the second output to infer the second element, and so on. After 640 calls one can know the entire internal state and predict the rest of the generator’s output from then on.

Linear algebra

The bitwise operations above all correspond to linear operations over GF(2), the field with just two elements, 0 and 1. Addition in this field is XOR and multiplication is AND.

Each step corresponds to multiplying a vector of 32 bits on the left by a 32 × 32 matrix with entries that are 0’s and 1’s, with the understanding that the sum of two bits is their XOR and the product of two bits is their AND. Equivalently, arithmetic is carried out mod 2. So you can compute the matrix-vector product as ordinary integers if you then reduce every integer mod 2.

We will find the matrix M that corresponds to the temper operation and prove that it is invertible by finding its inverse. This proves that tempering is invertible, and one could compute the inverse of tempering by multiplying by M−1, though it would be more efficient to undo temporing by bit twiddling.

One way to recover a matrix is to multiplying by unit vectors ei where the ith component of ei is 1 and the rest of the components are zero. Then

M ei

is the ith column of M.

So we can find the nth column of M by tempering 1 << n = 2n.

M = np.zeros((32, 32), dtype=int)
for i in range(32):
    t = temper(1 << (31-i))
    s = f"{t:032b}"
    for j in range(32):
        M[j, i] = int(s[j])

Let’s generate a random number and compute its tempered form two ways: directly and matrix multiplication.

x = random.getrandbits(32)
y = temper(x)
print(f"{y:032b}")

vx = np.array([int(b) for b in f"{x:032b}"]) # vector form of x
vy = M @ vx % 2 # vector form of y
print("".join(str(b) for b in vy))

Both produce the same bits:

10100101100101101100110101000110
10100101100101101100110101000110

We can find the matrix representation of the untemper function by inverting the matrix M. However, we need to invert it over the field GF(2), not over the integers or reals.

import galois
GF2 = galois.GF(2)
Minv = np.linalg.inv(GF2(M))

Here are visualizations of M and its inverse using a black square for a 1 and a white square for a 0.

M:

M−1:

The next post will back up and look at the linear algebra of the components that comprise the Mersenne Twister.

Calculating curvature

Curvature is conceptually simple but usually difficult to calculate. For a level set curve f(xy) = c, such as in the previous couple posts, the equation for curvature is

\kappa = \frac{\left|f_y^2f_{xx}-2f_xf_yf_{xy}+f_x^2f_{yy}\right|}{\bigl(f_x^2+f_y^2\bigr)^{3/2}}

Even when f has a fairly simple expression, the expression for κ can be complicated.

If we define

f(x, y) = y^3 - 3 x^2 - 3 y^2 - 3 x^2 y

then the level set of f(xy) = c is an equilateral triangle when c = −4. The level sets are smoothed triangles for −4 < c < 0.

The curvature of these level sets at any point is given by

\frac{\left|2 (y+1) \left((y-2)^2-3 x^2\right) \left(x^2+y^2\right)\right|}{\left(x^4+2 x^2 (y (y+6)+2)+(y-2)^2 y^2\right)^{3/2}}

Simplification

But there is one instance in which curvature is easy to calculate. For the graph of a function g(x) = y, the curvature is approximately the absolute value of the second derivative of g, provided the first derivative is small.

 \kappa = \frac{|g^{\prime\prime}(x)|}{(1 + g^\prime(x))^{3/2}} \approx |g^{\prime\prime}(x)|

At a local maximum or local minimum of g(x) the approximation is exact because the first derivative is zero.

Max and min curvature of smoothed triangles

This means that in the example above, we can calculate the maximum and minimum curvature of the level sets. The maximum curvature occurs in the corners and the minimum occurs in the middle of the sides. By symmetry there are three maxima and three minima, but we can take the ones corresponding to x = 0 for convenience. There we find the curvature is simply

|6 + 6y|

When x = 0, we have

f(x,y) = c = y^3 - 3y^2

and so the maximum and minimum curvature are the two roots of the cubic equation for y that lie in the interval [−1, 2]. (There’s another root greater than 2.)

For example, when c = −3, the roots are 0.8794, 1.3473, and 2.5321. The first root corresponds to the minimum curvature, the second to the maximum, and the third is outside our region of interest. It follows that the minimum curvature is 0.7237 and the maximum is 14.0838.

When c = −1 the minimum and maximum curvature are 2.80747 and 9.91622 respectively,

Since c = −4 corresponds to the triangle, the minimum curvature is 0 and the maximum is infinite. As c increases, the minimum and maximum curvature come together because the level set is becoming more round.

Smoothed polygons

The previous post constructed a triangular analog of the squircle, the unit circle in the p-norm where p is typically around 4. The case p = 2 is a Euclidean circle and the limit as p → ∞ is a Euclidean square.

The previous post introduced three functions Li(xy) such that the level set of each function

\{(x,y) \mid L_i(x,y) = 1\}

forms a side of a triangle. Then it introduced a soft penalty for each L being away 1, and the level sets of that penalty function formed the rounded triangles we were looking for.

Another approach would be to change the L‘s slightly so that the sides are the levels sets Li(xy) = 0. The advantage to this formulation is that the product of three numbers is 0 if and only if one of the numbers is zero. That means if we define

f(x, y) = L_1(x, y) L_2(x, y) L_3(x, y)
then the set of points

\{(x, y) \mid f(x,y) = c\}

corresponds to the three lines when c = 0 and the level sets for small c > 0 are nearly triangles. The level sets will be smooth if the gradient is non-zero, i.e. c is not zero.

This approach is not unique to triangles. You could create smooth approximations any polygon by multiplying linear functions that define the sides. Or you could do something similar with curved arcs.

If we define our L’s by

\begin{align*} L_1(x,y) &= y\\ L_2(x,y) &= y - 2x - 2 \\ L_3(x,y) &= 3y + x -3\\ \end{align*}

then our curves will be the level sets of

f(x, y) = y(y - 2 x - 2)  (3 y + x - 3)

A few level sets are shown below. The level set for c = 0 is the straight lines.

Note the level sets are not connected. If you’re interested in approximate triangles, you want the components that are inside the triangle.

 

Triangular analog of the squircle

TimF left a comment on my guitar pick post saying the image was a “squircle-ish analog for an isosceles triangle.” That made me wonder what a more direct analog of the squircle might be for a triangle.

A squircle is not exactly a square with rounded corners. The sides are continuously curved, but curved most at the corners. See, for example, this post.

Suppose the sides of our triangle are given by L1(x, y) = 1 for i = 1, 2, 3. For example,

\begin{align*}
L_1(x, y) &= y \\
L_2(x, y) &= \phantom{-}\frac{\sqrt{3}}{2}x - \frac{1}{2}y \\
L_3(x, y) &= -\frac{\sqrt{3}}{2}x - \frac{1}{2}y \\
\end{align*}

We design a function f(x, y) as a soft penalty for points not being on one of the sides and look at the set of points f(x, y) = 1.

f(x, y) = \left( \sum_{i=1}^3 \exp(p (L_i(x, y) - 1)) \right)^{1/p}

You might recognize this as a Lebesgue norm, analogous to the one used to define a squircle.

The larger p is, the heavier the penalty for being far from a side and the closer the level set f(x, y) = 1 comes to being a triangle.

Unified config files

I try to maintain a consistent work environment across computers that I use. The computers differ for important reasons, but I’d rather they not differ for unimportant reasons.

Unified keys

One thing I do is remap keys so that the same key does the same thing everywhere, to the extent that’s practical. This requires remapping keys. In particular, I want the key functionality, not the key name, to be the same across operating systems. For example, the Command key on a Mac does what the Control key does on Windows and Linux. I have my machines set up so the key in the lower left corner, whatever you call it, does things like copy, paste, and cut.

Unified config files

I use bash everywhere even though zshell is the default on Mac OS. But Linux and MacOS are sufficiently different that I have two .bashrc files, one for each OS. However, both .bashrc files source a common file .bash_aliases for aliases. You can set that up by putting the following code in your .bashrc file.

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

Sometimes you need some kind of branching logic even for two machines running the same OS. For example, if you have two computers, named Castor and Pollux, you might have code like the following in your .bashrc.

HOSTNAME_SHORT=$(hostname -s)
if [ "$HOSTNAME_SHORT" = "Castor" ]; then
...
elif [ "$HOSTNAME_SHORT" = "Pollux" ]; then
...

One problem with using hostname is that the OS can change the name on you. At least MacOS will do this; I don’t know whether other operating systems will. If you give your machine an uncommon name then this is unlikely to happen.

I have one init.el file for Emacs. It contains some branching logic for OS:

(cond
    ((string-equal system-type "windows-nt") ; Microsoft Windows
     ...)
    ((string-equal system-type "gnu/linux") ; linux
     ...)
    ((string-equal system-type "darwin") ; Mac
     ...)
)

You can also branch on hostname.

(if (string-equal (system-name) "Castor")
...)
(if (string-equal (system-name) "Pollux")
...)

This isn’t difficult, but in the short run it’s easier to just make one ad hoc edit to a config file at a time, letting the files drift apart. Along those lines, see bicycle skills.

The mythology of category theory

Yesterday a friend and I had a conversation about category theory, how it can be a useful pattern description language, but also about how people have unrealistic expectations for it, believing category theory can deliver something for nothing.

Later I ran across the following post from Qiaochu Yuan. It felt as if he had overheard my conversation and summarized it in a tweet:

category theory is just some straightforwardly useful stuff for some purposes in some fields! you can elegantly simplify and streamline some proofs. then there is the mythology of category theory, which is some other thing entirely, mostly wishful thinking and projection afaict

His phrase “the mythology of category theory” gives a name to this idea that category theory can deliver specific outputs without specific inputs. It helps to distinguish CT as a scrapbook of patterns from CT as sorcery.

Changing one character in a PDF

I saw a post on X saying

Changing a hyphen to an en-dash increases your PDF file size by ~10 bytes.

My first thought was that it had something to do with hyphen being an ASCII character and an en-dash not. Changing a hyphen to an en-dash would make a UTF-8 encoded text file a couple bytes longer. (See why here.) Maybe adding one non-ASCII character could cause the file to include a glyph it didn’t before.

I did a couple experiments. I made a minimal LaTeX file with only the text

    See pages 9-10.

and another with

    See pages 9--10.

(In LaTeX, a hyphen compiles to a hyphen and two hyphens compile to an en-dash.)

I compiled both files using pdflatex. The PDF with the hyphen was 13172 bytes and the one with the en-dash was 13099 bytes. Replacing the hyphen with the en-dash made the file 73 bytes smaller.

I repeated the experiment with a Libre Office ODT document. Changing the hyphen to an en-dash reduced the file size from 13548 bytes to 13514 bytes.

Then I went back to LaTeX and pasted a paragraph of lorem ipsum text into both files. Now the file with the hyphen produced a 18131 byte PDF and the file with the en-dash produced a 18203 byte PDF. So in this instance changing the hyphen to an en-dash increased the size of the file by 72 bytes.

After adding the lorem ipsum text to the ODT files, the PDF resulting from the file with the hyphen was 23 bytes larger, 17846 bytes versus 17823 bytes.

PDFs are complicated. Changing one character can make the file bigger or smaller, by an unpredictable amount. It depends, among other things, on what software was used to create the PDF. Even changing one letter in an all-ASCII text can change the size of the PDF, I suppose due to some internal text compression and aesthetic control characters. I don’t pretend to understand what’s going on inside a PDF.

Related posts

The shape of a guitar pick

I saw a post on X that plotted the function

(log x)² + (log y)² = 1.

Of course the plot of

x² + y² = 1

is a circle, but I never thought what taking logs would do to the shape.

Here’s what the contours look like setting the right hand side equal to 1, 2, 3, …, 10.

ContourPlot[Log[x]^2 + Log[y]^2, {x, 0, 10}, {y, 0, 10}, 
    Contours -> Range[10]]

The dark blue contour near the origin reminded me of a guitar pick, so I decided to take a stab at creating an equation for the shape of a guitar pick.

I wanted to rotate the image so the axis of symmetry for the pick is vertical, so I replaced x and y with xy and x − y.

The aspect ratio was too wide, so I experimented with

log(ykx)² + log(y − kx)² = r²

where increasing k increases the height-to-width ratio. After a little experimentation I settled on k = 1.5 and r = 1.

This has an aspect ratio of roughly 5:4, which is about what I measured from a photo of a guitar pick.

Updating: refining the fit

After posting this article on X, Paul Graham replied with a photo of a Fender guitar pick with the image above overlaid. The fit was fairly good, but the aspect ratio wasn’t quite right.

So then I did a little research. The shape referred to in this post is known as the “351,” but even for the 351 shape the aspect ratio varies slightly between picks.

Setting k = 1.6 gives a better fit to Paul Graham’s pick.

The blue line represents my fit using k = 1.5 and the red line represents my fit using k = 1.6.