Cyclic permutations and trace

The trace of a square matrix is the sum of the elements on its main diagonal.

The order in which you multiply matrices matters: in general, matrix multiplication is not commutative. But the trace of a product of matrices may or may not depend on the order of multiplication.

Specifically, the trace doesn’t change if you take the last matrix and move it to the front. You can apply this repeatedly to get all cyclic permutations.

If you just have two matrices, A and B, then order doesn’t matter, because the only way to rearrange two factors is a cyclic permutation. That is, trace(AB) = trace(BA).

But in general only cyclic permutations leave the trace unchanged. The following Python code illustrates this.

    import numpy as np

    N = 4

    A = np.random.rand(N, N)
    B = np.random.rand(N, N)
    C = np.random.rand(N, N)

    print( (A@B@C).trace() )
    print( (B@C@A).trace() )
    print( (C@A@B).trace() )
    print( (C@B@A).trace() )
    print( (B@A@C).trace() )
    print( (A@C@B).trace() )

This shows that ABC, BCA, and CAB all have the same trace (5.793 in this example) and CBA, BAC, and ACB all have the same trace (4.851).

In this case there were only two trace values: one for the forward sequence and its rotations, and one for the reversed sequence and its rotations. With more matrices there are more possibilities.

    D = np.random.rand(N, N)

    print( (A@B@C@D).trace() )
    print( (C@D@A@B).trace() )
    print( (C@D@B@A).trace() )
    print( (D@C@B@A).trace() )

Now we see that ABCD and CDAB have the same trace (12.632), because they are rotations of each other, but the other two permutations have different traces (13.843 and 12.564).

If we’re working with symmetric matrices, then reversing the order doesn’t change the trace. Here’s a quick proof for the product of three symmetric matrices:

trace(ABC) = trace((ABC)^T) = trace(C^T B^T A^T) = trace(CBA)

So for three symmetric matrices, the trace is the same for all permutations.
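As a quick numerical check (my own sketch, not code from the post), we can symmetrize three random matrices and verify that all six orderings give the same trace:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
# M + M.T is symmetric for any square matrix M
A, B, C = (M + M.T for M in rng.random((3, N, N)))

t = (A @ B @ C).trace()
# all six orderings of three symmetric matrices share one trace
orderings = [(A, B, C), (B, C, A), (C, A, B),
             (C, B, A), (B, A, C), (A, C, B)]
agree = all(np.isclose((P @ Q @ R).trace(), t) for P, Q, R in orderings)
print(agree)  # True
```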

The following code shows that the order of multiplication can change the trace, even for symmetric matrices, if you have four matrices.

    # symmetrize the matrices: M + M.T is symmetric
    A += A.T
    B += B.T
    C += C.T
    D += D.T

    print( (C@D@A@B).trace() )
    print( (C@D@B@A).trace() )

The first statement prints 202.085 and the second 222.211.


Trick for 2×2 eigenvalues

3Blue1Brown has a nice new video on how to calculate the eigenvalues of 2×2 matrices.

The most common way to find the eigenvalues of a 2×2 matrix A is to work straight from the definition, solving det(A – λI) = 0.

This is fine when you’re learning what eigenvalues are. But if you’ve already learned all the theory and just want to calculate the eigenvalues, there’s an easier way.

As shown in the video, the eigenvalues are

m ± √(m² – p)

where m is the mean of the elements on the main diagonal and p is the determinant.
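Here’s a quick sketch comparing the shortcut to NumPy’s eigenvalue routine, using a matrix I chose for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

m = A.trace() / 2          # mean of the main diagonal elements
p = np.linalg.det(A)       # determinant
d = np.sqrt(m * m - p)
shortcut = sorted([m - d, m + d])

print(shortcut)
print(sorted(np.linalg.eigvals(A).real))
```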

What is a polynomial?

When you leave the comfort of the real numbers, you might be mistaken about what a polynomial is.

Suppose you’re looking at polynomials over some finite field. Why would you do that? Numerous practical applications, but that’s a topic for another post.

You look in some reference that’s talking about polynomials and you see things like

x³ + x

referred to as a polynomial over the field with two elements, 0 and 1. “Yeah, I get it. It’s just like middle school, except the coefficients are zeros and ones.”

No, it’s not like middle school.

What happens when you stick in x = 0? You get 0. What happens when you stick in 1? You get 0, because 1 + 1 = 0 mod 2.

The function that takes a bit and returns the cube of the bit plus the bit itself always evaluates to 0 when working mod 2.

We have a non-zero polynomial that when interpreted as a function equals the zero function.

The resolution to this tension is that in general, a polynomial is not a function. You can associate a function with a polynomial by evaluating the polynomial for some value of x, but two different polynomials may correspond to the same function.
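The example above takes two lines to check in Python: x³ + x has nonzero coefficients, but as a function on the field with two elements it is identically zero.

```python
def f(x):
    # evaluate the polynomial x^3 + x in the field with two elements
    return (x**3 + x) % 2

print([f(x) for x in (0, 1)])  # [0, 0]
```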

I have a vague memory of some professor correcting me for conflating a polynomial with a function. That seemed like pure pedantry at the time. It would have been helpful if said professor had explained why the distinction mattered. It never mattered in any example I could think of.

When you go back and look at how polynomials are defined over an arbitrary field, there are two possibilities. A reference may use powers of x and coefficients from the field, but say somewhere that the x is a “formal symbol” or something like that. The other possibility is to define polynomials as infinite sequences of field elements for which only a finite number of elements are nonzero, along with addition and multiplication rules that look just like what you’d expect for polynomials.

This may seem like unnecessary formalism. And for polynomials over real numbers it is. If two real polynomials differ as polynomials, i.e. they differ in one of their coefficients, then there is some place where they differ as functions. But that’s not true in general.

Multicolor reproducing cellular automata

The previous post looked at a cellular automaton introduced by Edward Fredkin. It has only two states: a cell is on or off. At each step, each cell is set to the sum of the states of the neighboring cells mod 2. So a cell is turned on if it had an odd number of neighbors turned on, and turned off if it had an even number of neighbors turned on.

You can look at cellular automata with more states, often represented as colors. If the number of states per cell is prime, then the extension of Fredkin’s automaton to multiple states also reproduces its initial pattern. Instead of taking the sum of the neighboring states mod 2, more generally you take the sum of the neighboring states mod p.

This theorem is due to Terry Winograd, cited in the same source as in the previous post.

Here’s an example with 11 states. The purple background represents 0. As before, I start with an ‘E’ pattern, but I make it multi-colored.

Initial state

Before turning on the automaton, we need to clarify what we mean by “neighborhood.” In this example, I mean cells to the north, south, east, and west. You could include the diagonals, but that would be a different example, though still self-reproducing.

After the first step, the pattern blurs.

step 1

After 10 steps the original pattern is nowhere to be seen.

step 10

And then suddenly on the next step the pattern reappears.

step 11

I don’t know whether there are theorems on how long it takes the pattern to reappear, or how the time depends on the initial pattern, but in my experiments, the pattern reappears after 4 steps with 2 states, 9 steps with 3 states, 5 steps with 5 states, 7 steps with 7 states, and 11 steps with 11 states.
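Here’s a minimal NumPy sketch of the mod-p rule (my own illustration, not the code behind the images), using the north/south/east/west neighborhood on a toroidal grid. With 11 states, a small seed reappears after 11 steps as four copies displaced 11 cells in each of the four directions:

```python
import numpy as np

p = 11  # number of states; must be prime

def step(grid):
    # each cell becomes the sum of its N, S, E, W neighbors mod p
    return (np.roll(grid, 1, 0) + np.roll(grid, -1, 0)
          + np.roll(grid, 1, 1) + np.roll(grid, -1, 1)) % p

grid = np.zeros((32, 32), dtype=int)
seed = np.array([[1, 2, 3],
                 [4, 0, 0],
                 [5, 6, 7]])   # a small multicolored seed
grid[14:17, 14:17] = seed

g = grid
for _ in range(p):
    g = step(g)

# after p steps: four copies of the seed, shifted p cells N, S, E, W
print(np.array_equal(g[3:6, 14:17], seed))  # True
```

The reason this works is the “freshman’s dream”: raising the sum of the four shift operators to the p-th power mod p kills all the cross terms.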

Here’s an animation of the 11-color version, going through 22 steps.

Animated gif of CA

Self-reproducing cellular automata

Edward Fredkin is best known these days for the Fredkin gate, a universal reversible circuit.

I recently found out that Fredkin was one of the pioneers in cellular automata. His student Edwin Banks describes a cellular automaton introduced by Fredkin [1].

Edward Fredkin of MIT has described the following interesting cellular space. If the states are 0 and 1 and if the result state is obtained by taking the sum of the neighbors, modulo 2, then every initial configuration will reproduce copies of itself about the initial configuration.

In other words, start with some bit pattern in an infinite checkerboard of bits. Then at each step, replace each bit by the parity of the sum of the neighboring bits. Eventually you’ll see copies of that original pattern.

There’s some ambiguity regarding what “neighbors” means. Banks’ dissertation considers the neighbors of a bit to be the bits to the north, south, east, and west, and this implies that Banks has these neighbors in mind when he describes Fredkin’s automaton. Other sources say Fredkin also considered diagonally adjacent bits as part of the neighborhood, i.e. northwest, northeast, southwest, and southeast.
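Using the first definition of neighborhood (north, south, east, and west only), the rule can be sketched in a few lines of NumPy. This is my own sketch, on a toroidal grid rather than an infinite one; after 2^k steps a small seed reappears as four copies displaced 2^k cells in the four cardinal directions:

```python
import numpy as np

def step(grid):
    # each bit becomes the parity of the sum of its N, S, E, W neighbors
    return (np.roll(grid, 1, 0) + np.roll(grid, -1, 0)
          + np.roll(grid, 1, 1) + np.roll(grid, -1, 1)) % 2

grid = np.zeros((32, 32), dtype=int)
seed = np.array([[1, 1, 1],
                 [1, 0, 0],
                 [1, 1, 1]])   # a small 'E'-like pattern
grid[14:17, 14:17] = seed

g = grid
for _ in range(4):             # 4 = 2**2 steps
    g = step(g)

# after 2**k steps: four copies of the seed, shifted 2**k cells N, S, E, W
print(np.array_equal(g[10:13, 14:17], seed))  # True
```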

Banks goes on to say

Terry Winograd (1970) generalized this result showing that any neighborhood, not just the four nearest neighbors, and any number of dimensions still yields the same results.

I take it that by “the four nearest neighbors” Banks means the neighbors to the north, south, east, and west. I’m not sure what “any neighborhood” means in Winograd’s theorem; I imagine there must be some implied restriction.

In any case, I’ll show by example that both definitions of “neighborhood” discussed above lead to reproducing the original pattern, though they do not reproduce it in the same way.

I’ll start with an ‘E’ in the middle of a grid. The black squares represent 0s and the white squares represent 1s.

initial state

Here are the first eight steps in evolving this pattern, using the rule that “neighbor” only includes north, south, east, and west neighbors.

If we consider each point to have eight neighbors, i.e. we include diagonals, here’s what the first eight steps look like.

Update: Here’s an animated version of the first automaton

and the second

See the next post for a generalization using more than two states per cell, using colors rather than black and white cells. If the number of colors is prime, and you take the sum of the neighboring states mod p rather than mod 2, you get analogous results.

[1] Edwin R. Banks, Information Processing and Transmission in Cellular Automata. MIT dissertation. January 1971.

Aliquot ratio distribution

The previous post looked at repeatedly applying the function s(n) which is the sum of the divisors of n less than n. It is an open question whether the sequence

s( s( s( … s(n) … ) ) )

always converges or enters a loop. In fact, it’s an open question whether the sequence starting with n = 276 converges.

A heuristic argument for why the sequence ought to converge, at least much of the time, is that s(n) is usually less than n. This can be made precise as explained here: in the limit as N goes to infinity, the proportion of numbers less than N for which s(n) < n is at least 0.752.

Even though applying s usually leads to a smaller result, conceivably it could lead to a much larger result when it increases, as with the hailstone problem.

I made some histograms of the ratio s(n) / n to get a feel for how much s increases or decreases its argument. (I imagine this has all been done before, but I haven’t seen it.)
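The computation behind such a histogram can be sketched as follows (my own code, using a simple trial-division divisor sum rather than sympy; the plotting step is omitted):

```python
import math

def s(n):
    # sum of the proper divisors of n
    total = 1 if n > 1 else 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d * d != n:
                total += n // d
        d += 1
    return total

# log of the ratio s(n)/n for n up to 1000
log_ratios = [math.log(s(n) / n) for n in range(2, 1001)]
frac_below = sum(r < 0 for r in log_ratios) / len(log_ratios)
print(f"fraction of n <= 1000 with s(n) < n: {frac_below:.3f}")
```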

Here’s my first attempt, for n up to 1000.

Histogram of aliquot ratios

So apparently the ratio is often less than 1.

I thought it would be clearer to work on a log scale and center the graph at 0 so I next made the following graph.

Histogram of log aliquot ratios

This shows that the aliquot function s never increases its argument by much, and it often reduces its argument quite a bit, at least for numbers less than 1000.

Next I made a histogram with numbers less than 1,000,000 to see whether the pattern continued. Here’s what I got.

Histogram of log aliquot ratios for numbers less than a million

The most obvious feature is the huge spike. If you can ignore that, mentally cropping the image to just the bottom part, it looks basically like the previous graph.

What’s up with that spike? It’s in the bin containing log(0.5). In other words, the ratio s(n)/n is often near 0.5.

So I thought maybe there’s some number theoretic reason the ratio is often exactly 1/2. That’s not the case. It’s only exactly 1/2 once, when n = 2, but it’s often near 1/2.

In any case, applying s very often returns a smaller number than you started with, so it’s plausible that the iterated aliquot sequence would often converge. But empirically it looks like some sequences, such as the one starting at n = 276, diverge. They somehow weave through all the numbers for which the sequence would come tumbling down.

The iterated aliquot problem

Let s(n) be the sum of the proper divisors of n, the divisors less than n itself. A number n is called excessive, deficient, or perfect depending on whether s(n) – n is positive, negative, or 0 respectively, as I wrote about a few days ago.

The number s(n) is called the aliquot sum of n. The iterated aliquot problem asks whether repeated iterations of s always converge to a perfect number or to 0. This problem has something of the flavor of the Collatz conjecture, a.k.a. the hailstone problem.

As mentioned here, about 1/4 of all integers are excessive and about 3/4 are deficient. That is, s(n) is smaller than n for about 3 out of 4 integers. Based on just this, it seems plausible that iterated aliquots would often converge. (See the next post for a visualization of how much the function s increases or decreases its argument.)

We can investigate the iterated aliquot problem with the following Python code.

    from sympy import divisors

    def s(n):
        # sum of the proper divisors of n
        if n == 0:
            return 0
        return sum(divisors(n)) - n

    def perfect(n):
        return n == s(n)

    def iterate(n):
        # count how many applications of s it takes
        # to reach 0 or a perfect number
        i = 0
        while True:
            if perfect(n) or n == 0:
                return i
            n = s(n)
            i += 1

The function iterate counts how many iterations it takes for repeated applications of s to converge.

For n = 1 through 24, the sequence of iterates converges to 0, the only exception being the perfect number 6.

When n = 25 the sequence converges, but it converges to 6 rather than 0. That is, 25 is the smallest number which is not perfect but for which the sequence terminates in a perfect number.

When n = 138, the sequence converges, but it takes 178 iterations to do so. For smaller n the sequence converges much sooner.

When n = 220, we see something new. s(220) = 284, and s(284) = 220. That is, our code is stuck in an infinite loop.

So we need to go back and refine our question: does the sequence of iterations always converge to 0, converge to a perfect number, or enter a loop?

We have to modify the Python code above to look for loops:

    def iterate(n):
        i = 0
        seen = set()
        while True:
            if perfect(n) or n == 0 or n in seen:
                return i
            seen.add(n)  # remember n so we can detect a loop
            n = s(n)
            i += 1

At least for n < 276 the sequence converges.

When I tried 276, the iterations got much larger and slower. I let it run for 400 iterations and the last iteration was


This took 150 seconds. I switched over to Mathematica and it calculated the same result in 0.6 seconds.

Then I asked Mathematica to run 600 iterations and that took 64 seconds. I went up to 700 iterations and that took 6133 seconds. So the run time is growing rapidly, by an order of magnitude or more for each additional hundred iterations.

The last result I got was


Maybe the iterations turn around eventually, but so far it looks like they’ve reached escape velocity. The growth isn’t monotonic. It goes up and down, but more often goes up.

The problem of whether the sequence always converges is unsolved. I don’t know whether it’s known whether the particular sequence starting at 276 converges.

Update: See the first comment. It’s an open problem whether the sequence starting at 276 converges.

Cologarithms and Entropy

The term “cologarithm” was once commonly used but has now faded from memory. Here’s a plot of the frequency of the terms cologarithm and colog from Google’s Ngram Viewer.

Frequency of colog and cologarithm

The cologarithm base b is the logarithm base 1/b, or equivalently, the negative of the logarithm base b.

\text{colog}_b \, x = \log_{1/b} \, x = -\log_b \, x

I suppose people spoke of cologarithms more often when they did calculations with logarithm tables.


There’s one place where I would be tempted to use the colog notation, and that’s when speaking of Shannon entropy.

The Shannon entropy of a random variable with N possible values, each with probability p_i, is defined to be

-\sum_{i=1}^N p_i \log_b \, p_i

At first glance this looks wrong, as if entropy is negative. But you’re taking the logs of numbers less than 1, so the logs are negative, and the negative sign outside the sum makes everything positive.

If we write the same definition in terms of cologarithms, we have

 \sum_{i=1}^N p_i \,\text{colog}_b \, p_i

which looks better, except it contains the unfamiliar colog.
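As a small illustration (my own sketch, not from the post), here is the entropy definition written with an explicit colog helper:

```python
import math

def colog(x, b=2):
    # colog_b(x) = log_{1/b}(x) = -log_b(x)
    return -math.log(x, b)

def entropy(probs, b=2):
    # Shannon entropy: sum of p_i * colog_b(p_i), skipping zero terms
    return sum(p * colog(p, b) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
```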

Bits, nats, and dits

These days entropy is almost always measured in units of bits, i.e. using b = 2 in the definition above. This wasn’t always the case.

When logs are taken base e, the result is in units of nats. And when the logs are taken base 10, the result is in units of dits.

So binary logs give bits, natural logs give nats, and decimal logs give dits.

Bits are sometimes called “shannons” and dits were sometimes called “bans” or “hartleys.” The codebreakers at Bletchley Park during WWII used bans.

The unit “hartley” is named after Ralph Hartley. Hartley published a paper in 1928 defining what we now call Shannon entropy using logarithms base 10. Shannon cites Hartley in the opening paragraph of his famous paper A Mathematical Theory of Communication.


Corollary of a well-known fact

When students learn about decimals, they’re told that every fraction either has a terminating decimal expansion or a repeating decimal expansion. The previous post gives a constructive proof of the converse: given a repeating decimal (in any base) it shows how to find the rational number it corresponds to.

Maybe you learned this so long ago that it seems obvious. But here’s a corollary that I don’t believe is so obvious.

For any positive integer n and for any integer b > 1, with n and b relatively prime, there is some number of the form b^k – 1 which is divisible by n.

So if I take some number, say 37, I can find a hexadecimal number consisting entirely of Fs which is divisible by 37. In fact FFFFFFFFF will do the trick. This is just the statement above with b = 16.

How does this follow from every rational number having a repeating decimal expansion?

Write 1/n as a repeating decimal in base b. (The assumption that n and b are relatively prime implies that the decimal repeats.) If the period of this decimal is k, then the previous post shows that 1/n is a multiple of 1/(b^k – 1). It follows that n is a divisor of b^k – 1.
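Here’s a quick sketch verifying the 37 example: find the smallest k with b^k ≡ 1 (mod n), which is the period of 1/n in base b.

```python
def period(n, b):
    # smallest k with b**k == 1 (mod n); requires gcd(n, b) == 1
    k, power = 1, b % n
    while power != 1:
        power = (power * b) % n
        k += 1
    return k

k = period(37, 16)
print(k, (16**k - 1) % 37)  # 9 0 : FFFFFFFFF (nine F's) is divisible by 37
```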

Related post: A bevy of ones

Repeating decimals in any base

My previous post ended with a discussion of repeating binary decimals such as

0.00110011…₂ = 1/5.

For this post I’ll explain how calculations like that are done, how to convert a repeating decimal in any base to a fraction.

First of all, we only need to consider repeating decimals of the form

0.111…_b, 0.010101…_b, 0.001001001…_b, etc.

because every repeating decimal is an integer times an expression like the ones above. For example,

0.424242… = 42 × 0.010101…

You can think of that example as base 10, but it’s equally true in any base that has a 4, i.e. any base greater than 4.

Now suppose we have an expression

0.\underbrace{000\ldots1}_{n \text{ terms}}\underbrace{000\ldots1}_{n \text{ terms}}\cdots

in base b.

We can see that this expression equals

\sum_{k=1}^\infty \left(b^{-n} \right )^k = \frac{1}{b^n - 1}

by summing a geometric series.

So going back to our example above,

0.010101\ldots_b = \frac{1}{b^2 -1}

If we’re working in base 10, this equals 1/99. If we’re working in hexadecimal, this is 1/FF₁₆ = 1/255.

I’ll finish with a duodecimal example. Suppose we have

0.7\text{AB}7\text{AB}\ldots_{\text{twelve}}
and want to convert it to a fraction. We have

0.7\text{AB}7\text{AB}\ldots_{\text{twelve}} = 7\text{AB}_{\text{twelve}} \times 0.001001\ldots_{\text{twelve}} = \frac{7\text{AB}_{\text{twelve}}}{BBB_{\text{twelve}}}

Or 1139/1727 in base 10.
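That last conversion is easy to check with the standard library:

```python
from fractions import Fraction

num = int("7AB", 12)        # 7AB twelve = 1139
den = 12**3 - 1             # BBB twelve = 1727
print(Fraction(num, den))   # 1139/1727
```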