Groups of order 2023

How many groups are there with 2023 elements?

There’s obviously at least one: Z2023, the integers mod 2023.

Now 2023 = 7 × 289 = 7 × 17 × 17 and so we could also look at

Z7 + Z17 + Z17

where + denotes direct sum. An element of this group has the form (a, b, c) and the sum

(a, b, c) + (a′, b′, c′)

is defined by

((a + a)′ mod 7, (b + b′) mod 17, (c + c)′ mod 17).

Is this a different group than Z2023? Are there any other groups of order 2023?

Let’s first restrict our attention to Abelian groups. The classification theorem for finite Abelian groups tells us that there are two Abelian groups of order 2023:

Z7 + Z289


Z7 + Z17 + Z17

But what about Z2023? There’s a theorem [1] that says

ZmnZm + Zn

if and only if m and n are relatively prime. Since 7 and 289 are relatively prime, t

Z2023Z7 + Z289.

The theorem also says that Z17 + Z17 is not isomorphic to Z289 and it follows that their direct sums with Z7 are not isomorphic.

So we’ve demonstrated two non-isomorphic Abelian groups of order 2023, and a classification theorem says these are the only Abelian groups. There are no non-Abelian groups of order 2023, though that’s harder to show, and so we’ve found all the Abelian groups with 2023 elements.

More group theory posts

[1] Sketch of proof. Let d be the greatest common divisor of m and n. If d > 1 then every element of Zm + Zn has order mn/d < mn and so Zm + Zn if cannot be isomorphic to Zmn. On the other hand, if d = 1, then Zm + Zn has an element of order mn and so is cyclic.

The center may not hold

“… Things fall apart; the centre cannot hold …” — Yeats, The Second Coming


Center of a group

The center of a group is the set of elements that commute with everything else in the group. For example, matrix multiplication is not commutative in general. You can’t count on AB being equal to BA, though of course sometimes it does, and in fact there are some matrices that commute with all other matrices.

The identity matrix I, for example, commutes with everything. So do multiples of the identity matrix. In fact, those are the only matrices that commute with everything, so the center of the group of invertible n × n matrices consists of multiples of the identity matrix [1].


Now suppose we have a homomorphism f between two groups, G and H. That is, f is a function from G to H such that for any two elements g1 and g2 in G,

f(g1 g2) = f(g1) f(g2).

The function f gives a picture of G inside H that preserves multiplication in the sense above. Does f map the center of G to the center of H? In terms of a diagram, does the following diagram commute?

Here the center of a group is denoted with Z(). Why? The convention goes back to the German word zentrum for center. The hooks on the arrows indicate inclusion. If we restrict f to Z(G), is its image contained in Z(H)? Maybe.

Positive example: Quaternions

A homomorphism may or may not take the center to the center. If a group is Abelian, then the center of the group is the entire group because everything commutes with everything. So then if G and H are Abelian, f takes the center of G (i.e. all of G) to the center of H (i.e. H).

For a less trivial example, let H× be the group of non-zero quaternions under multiplication. Quaternion multiplication is not commutative, so the center Z(H×) is not trivial, and in fact the center is R×, the non-zero real numbers. Let q be a quaternion of unit length and let q* be its conjugate. Define f to be a rotation

f: pqpq*

Then f is a homomorphism and it takes the center, i.e. non-zero real numbers, to itself because if r is real then

qrq* = rqq* = r.

Now for an example where the center does not hold.

Negative example: Heisenberg matrices

Consider the group of matrices of the form

\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}

I’ve gotten a little ahead of myself by calling the set of such matrices a group, but in fact it is a group, and it’s known as the Heisenberg group.

The center of the Heisenberg group is the set of matrices of the form above where a = c = 0, matrices with 1s on the diagonal, a possibly non-zero element in the top right corner, and zeros everywhere else.

Let f be the inclusion map from the Heisenberg group to the group of all invertible 3 × 3 matrices. The image of the center is itself, so it contains matrices with non-zero elements in the upper right corner. But as we said at the top of the post, the center of the group of invertible matrices is the set of diagonal matrices (with all diagonal elements equal) and so it doesn’t contain matrices with non-zero elements off the main diagonal.

Related posts

[1] To be precise, let K be a field and n a positive integer. The center of the general linear group GLn(K) is the set of elements of the form kI for k in K.

Why is the word problem hard?

This post is about the word problem. When mathematicians talk about “the word problem” we’re not talking about elementary math problems expressed in prose, such as “If Johnny has three apples, ….”

The word problem in algebra is to decide whether two strings of symbols are equivalent given a set of algebraic rules. I go into a longer description here.

I know that the word problem is hard, but I hadn’t given much thought to why it is hard. It has always seemed more like a warning sign than a topic to look into.

“Why don’t we just …?”

“That won’t work because … is equivalent to the word problem.”

In logic terminology, the word problem is undecidable. There can be no algorithm to solve the word problem in general, though there are algorithms for special cases.

So why is the word problem undecidable? As with most (all?) things that have been shown to be undecidable, the word problem is undecidable because the halting problem is undecidable. If there were an algorithm that could solve the word problem in general, you could use it to solve the halting problem. The halting problem is to write a program that can determine whether another arbitrary program will always terminate, and it can be shown that no such program can exist.

The impulse for writing this post came from stumbling across Keith Conrad’s expository article Why word problems are hard. His article goes into some of the algebraic background of the problem—free groups, (in)finitely generated groups, etc.—and gives some flavor for why the word problem is harder than it looks. The bottom line is that the word problem is hard because the halting problem is hard, but Conrad’s article gives you a little more of an idea what’s going on that just citing a logic theorem.

I still don’t have a cocktail-party answer for why the word problem is hard. Suppose a bright high school student who had heard of the word problem were at a cocktail party (drinking a Coke, of course) and asked why the word problem is hard. Suppose also that this student had not heard of the halting problem. Would the simplest explanation be to start by explaining the halting problem?

Suppose we change the setting a bit. You’re teaching a group theory course and the students know about groups and generators, but not about the halting problem, how might you give them a feel for why the word problem is hard? You might ask them to read Keith Conrad’s paper and point out that it shows that simpler problems than the word problem are harder than they seem at first.

Related posts

How small can a multiplicative group be?

The previous post looked at the multiplicative group of integers modulo a number of the form n = pq where p and q are prime. This post looks at general n.

The multiplicative group mod n consists of the integers from 1 to n-1 that are relative prime to n. So the size of this group is φ(n) where φ is Euler’s “totient” function.

If n is a large number, how large might the multiplicative group of integers mod n be? Or equivalently, what can we say about the size of φ(n)?

Upper bounds are easy. If n is prime, then φ(n) is n-1, and so for large n, φ(n) can be essentially as large as n. The more interesting question is how small φ(n) can be.

The following lower bound comes from [1]. Theorem 15 from that reference says that for n > 2, we have [2]

n/\varphi(n) \leq e^\gamma \log \log n + 5/(2 \log \log n)

Taking the reciprocal of both sides, we have a lower bound on the ratio of φ(n) to n.

We can use this to show that if n is a k-bit number, with k somewhere in the thousands, then φ(n) is at least a k-4 bit number.

Related posts

[1] J. Barkley Rosser, Lowell Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois J. Math. 6(1): 64-94 (March 1962). DOI: 10.1215/ijm/1255631807

[2] Except when

n = 223092870 = 2×3×5×7×11×13×17×19×23,

the product of the first nine primes. With this value of n, φ(n)/n = 0.1636.

Encryption in groups of unknown order

One way of looking at RSA encryption, a way that generalizes to new methods, is that the method is based on group operations inside a group of unknown order, i.e. unknown to most people. Another way of putting it is that RSA encryption takes place in a group where everybody knows how to multiply but not everyone knows how to divide. This will be explained below.

RSA encryption starts by finding two large primes p and q. You compute the product

n = pq

and make it public, but keep the factors p and q secret. The RSA method lets anyone send you encrypted messages by doing certain operations in the multiplicative group of the integers mod n. (Details here.) Let’s call this group G.

The elements of G are the integers from 1 to n-1 that are relative prime to n. The group operation is multiplication mod n, i.e. to multiply two elements of G, multiply them first as ordinary integers, then divide the result by n and keep the remainder.

The order of G, the number of elements in G, is

φ(n) = (p – 1)(q – 1).

You know p and q, and so you know φ(n), but the public does not. The security of RSA depends on the assumption that the public cannot compute φ(n). If someone could factor n, they could compute φ(n), but it is assumed that for large enough p and q it is not computationally feasible to factor n.

The public knows how to multiply in G but not how to divide. That is, anyone can carry out multiplication, but they cannot compute multiplicative inverses. Only you know how to divide in G, because you know φ(n) and Euler’s theorem.

In some sense the public knows everything there is to know about G. It’s the multiplicative group of integers mod n, and you tell them what n is. And that does tell them all they need to know to send you messages, but in practice it doesn’t tell them enough to decrypt messages anyone else sends you.

When you’re using RSA for public key cryptography, you’re telling the world “Here’s an n. To communicate securely with me, carry out certain algorithms with the integers relatively prime to n.”

Someone might object “But how do we know whether an integer is relatively prime to n? You haven’t told us its factors.”

You could reply “Don’t worry about it. It’s extremely unlikely that you’d run into a number that isn’t relatively prime to n. In fact, if you did, you’d break my system wide open. But if you’re worried about it, you can efficiently confirm that your numbers are relatively prime to n.”

Let’s unpack that last statement. We’ve already said that the number of positive integers less and n and relatively prime to n is (p – 1)(q – 1). So the number that are not relatively prime is

pq – (p – 1)(q – 1) = p + q – 1

and the probability of accidentally running into a one of these numbers is

(p + q – 1)/pq

Now if p and q are 300 digit numbers, for example, then this probability is on the order of one chance in 10300.

The Euclidean algorithm lets you find the greatest common factor of enormous numbers quickly. If you have a number k and you want to test whether it’s relatively prime to n, you can compute gcd(k, n). If k was chosen at random, the gcd is extremely likely to be 1, i.e. relatively prime to n. But if the answer is not 1, then it’s either p or q, in which case you’ve broken the encryption.


If you can factor large numbers, you can break RSA encryption. But it’s conceivable that you may be able to break RSA without being able to factor large numbers. That is in fact the case when RSA is implemented poorly. But aside from implementation flaws, nobody knows whether breaking RSA is as hard as factoring. Michael Rabin came up with a variation on RSA that is provably as hard as factoring, though I don’t know whether it has ever been used in practice.

More cryptography posts here.

Probability of commuting

A couple years ago I wrote a blog post looking at how close the quaternions come to commuting. That is, the post looked at the average norm of xyyx.

A related question would be to ask how often quaternions do commute, i.e. the probability that xyyx = 0 for randomly chosen x and y.

There’s a general theorem for this [1]. For a discrete non-abelian group, the probability that two elements commute, chosen uniformly at random, is never more than 5/8 for any group.

To put it another way, in a finite group either all pairs of elements commute with each other or no more than 5/8 of all pairs commute, with no possibilities in between. You can’t have a group, for example, in which exactly 3 out of 4 pairs commute.

What if we have an infinite group like the quaternions?

Before we can answer that, we’ve got to say how we’d compute probabilities. With a finite group, the natural thing to do is make every point have equal probability. For a (locally compact) infinite group the natural choice is Haar measure.

Subject to some technical conditions, Haar measure is the only measure that interacts as expected with the group structure. It’s unique up to a constant multiple, and so it’s unique when we specify that the measure of the whole group has to be 1.

For compact non-abelian groups with Haar measure, we again get the result that no more than 5/8 of pairs commute.

[1] W. H. Gustafson. What is the Probability that Two Group Elements Commute? The American Mathematical Monthly, Nov., 1973, Vol. 80, No. 9, pp. 1031-1034.

The word problem

Most people have heard of word problems, but not as many have heard of the word problem. If you’re imagining that the word problem is some superlatively awful word problem, I can assure you it’s not. It’s both simpler and weirder than that.

The word problem is essentially about whether you can always apply algebraic rules in an automated way. The reason it is called the word problem is that you start by a description of your algebraic system in terms of symbols (“letters”) and concatenations of symbols (“words”) subject to certain rules, also called relations.

The word problem for groups

For example, you can describe a group by saying it contains a and b, and it satisfies the relations

a² = b²


a-1ba = b-1.

A couple things are implicit here. We’ve said this a group, and since every element in a group has an inverse, we’ve implied that a-1 and b-1 are in the group as well. Also from the definition of a group comes the assumption that multiplication is associative, that there’s an identity element, and that inverses work like they’re supposed to.

In the example above, you could derive everything about the group from the information given. In particular, someone could give you two words—strings made up of a, b, a-1, and b-1—and you could determine whether they are equal by applying the rules. But in general, this is not possible for groups.


The bad news is that in general this isn’t possible. In computer science terminology, the word problem is undecidable. There is no algorithm that can tell whether two words are equal given a list of relations, at least not in general. There are special cases where the word problem is solvable, but a general algorithm is not possible.

The word problem for semigroups

I presented the word problem above in the context of groups, but you could look at the word problem in more general contexts, such as semigroups. A semigroup is closed under some associative binary operation, and that’s it. There need not be any inverses or even an identity element.

Here’s a concrete example of a semigroup whose word problem has been proven to be undecidable. As before we have two symbols, a and b. And because we are in a semigroup, not a group, there are no inverses. Our semigroup consists of all finite sequences make out of a‘s and b‘s, subject to these five relations.

aba2b2 = b2a2ba

a2bab2a = b2a3ba

aba3b2 = ab2aba2

b3a2b2a2ba = b3a2b2a4

a4b2a2ba = b2a4

Source: Term Rewriting and All That


When I first saw groups presented this as symbols and relations, I got my hopes up that a large swath of group theory could be automated. A few minutes later my naive hopes were dashed. So in my mind I thought “Well, then this is hopeless.”

But that is not true. Sometimes the word problem is solvable. It’s like many other impossibility theorems. There’s no fifth degree analog of the quadratic equation in general, but there are fifth degree polynomials whose roots can be found in closed form. There’s no program that can tell whether any arbitrary program will halt, but that doesn’t mean you can’t tell whether some programs halt.

It didn’t occur to me at the time that it would be worthwhile to explore the boundaries, learning which word problems can or cannot be solved. It also didn’t occur to me that I would run into things like the word problem in practical applications, such as simplifying symbolic expressions and optimizing their evaluation. Undecidable problems lurk everywhere, but you can often step around them.

Nonlinear mod 5

This post is a follow-on to the previous post on perfectly nonlinear functions. In that post we defined a way to measure the degree of nonlinearity of a function between two Abelian groups. We looked at functions that take sequences of four bits to a single bit. In formal terms, our groups were GF(24) and GF(2).

In this post we’ll start out by looking at integers mod 5 to show how the content of the previous post takes a different form in different groups. Sometimes the simplicity of working in binary makes things harder to see. For example, we noted in the previous post that the distinction between addition and subtraction didn’t matter. It matters in general!

Let B be the group of integers mod 5 and A the group of pairs of integers mod 5. That is, A is the direct product B × B. We will compute the uniformity of several functions from A to B. (Recall that less uniformity means more nonlinearity.) Then we’ll conclude by looking what happens if we work modulo other integers.

Python code

Here’s the code we’ll use to compute uniformity.

    from itertools import product
    from numpy import array

    m = 5
    r = range(m)

    def deriv(f, x, a):
        return (f(x + a) - f(x))%m

    def delta(f, a, b):
        return {x for x in product(r, r) if all(deriv(f, array(x), a) == b)}

    def uniformity(f):
        u = 0
        for a in product(r, r):
            if a == (0, 0):
            for b in product(r, r):
                t = len(delta(f, array(a), array(b)))
                if t > u:
                    u = t
        return u

We didn’t call attention to it last time, but the function f(x + a) – f(x) is called a derivative of f, and that’s why we named the function above deriv.

Here are a few functions whose uniformity we’d like to compute.

    # (u, v) -> u^2 + v^2 
    def F2(x): return sum(x**2)%m

    # (u, b) -> u^3 + v^3 
    def F3(x): return sum(x**3)%m

    # (u, b) -> u^2 + v
    def G(x):  return (x[0]**2 + x[1])%m

    # (u, v) -> uv
    def H(x):  return (x[0]*x[1])%m

Now let’s look at their uniformity values.

    print( uniformity(F2) )
    print( uniformity(F3) )
    print( uniformity(G) )
    print( uniformity(H) )

Results mod 5

This prints out 5, 10, 25, and 5. This says that the functions F2 and H are the most nonlinear, in fact “perfectly nonlinear.” The function G, although nonlinear, has the same uniformity as a linear function.

The function F3 may be the most interesting, having intermediate uniformity. In some sense the sum of cubes is closer to being a linear function than the sum of squares is.

Other moduli

We can easily look at other groups by simply changing the value of m at the top of the code.

If we set the modulus to 7, we get analogous results. The uniformities of the four functions are 7, 14, 49, and 7. They’re ranked exactly as they were when the modulus was 5.

But that’s not always the case for other moduli as you can see in the table below. Working mod 8, for example, the sum of squares is more uniform than the sum of cubes, the opposite of the case mod 5 or mod 7.

    |  m |  F2 |  F3 |   G |  H |
    |  2 |   4 |   4 |   4 |  2 |
    |  3 |   3 |   9 |   9 |  3 |
    |  4 |  16 |   8 |  16 |  8 |
    |  5 |   5 |  10 |  25 |  5 |
    |  6 |  36 |  36 |  36 | 18 |
    |  7 |   7 |  14 |  49 |  7 |
    |  8 |  64 |  32 |  64 | 32 |
    |  9 |  27 |  81 |  81 | 27 |
    | 10 | 100 | 100 | 100 | 50 |
    | 11 |  11 |  22 | 121 | 11 |
    | 12 | 144 | 144 | 144 | 72 |

In every case, G is the most uniform function and H is the least. Also, G is strictly more uniform than H. But there are many different ways F2 and F3 can fit between G and H.

How big is the monster?

Voyager 1 photo of Jupiter. Jan. 6, 1979

Symmetries are captured by mathematical groups. And just as you can combine kinds symmetry to form new symmetries, you can combine groups to form new groups.

So-called simple groups are the building blocks of groups as prime numbers are the building blocks of integers [1].

Finite simple groups have been fully classified, and they fall into several families, with 26 exceptions that fall outside any of these families. The largest of these exceptional groups is called the monster.

The monster is very large, containing approximately 8 × 1053 elements. I saw a video by Richard Boucherds where he mentioned in passing that the number of elements in the group is a few orders of magnitude larger than the number of atoms in earth.

I tried to find a comparison that is closer to 8 × 1053 and settled on the number of atoms in Jupiter.

The mass of Jupiter is about 2 × 1027 kg. The planet is roughly 3/4 hydrogen and 1/4 helium by mass, and from that you can calculate that Jupiter contains about 1054 atoms.

Before doing my calculation with Jupiter, I first looked at lengths such as the diameter of the Milky Way in angstroms. But even the diameter of the observable universe in angstroms is far too small, only about 9 × 1036.

More on finite simple groups

[1] The way you build up groups from simple groups is more complicated than the way you build integers out of primes, but there’s still an analogy there.

The photo at the top of the post was taken by Voyager 1 on January 6, 1979.

Group symmetry of copula operations

You don’t often see references to group theory in a statistics book. Not that there aren’t symmetries in statistics that could be described in terms of groups, but this isn’t often pointed out.

Here’s an example from An Introduction to Copulas by Roger Nelsen.

Show that under composition the set of operations of forming the survival copula, the dual of a copula, and the co-copula of a given copula, along with the identity (i.e., ^, ~, *, and i) yields the dihedral group.

Nelsen gives the following multiplication table for copula operations.

    o | i ^ ~ *
    i | i ^ ~ *
    ^ | ^ i * ~
    ~ | ~ * i ^
    * | * ~ ^ i

The rest of this post explains what a copula is and what the operations above are.

What is a copula?

At a high level, a copula is a mathematical device for modeling the dependence between random variables. Sklar’s theorem says you can express the joint distribution of a set of random variables in terms of their marginal distributions and a copula. If the distribution functions are continuous, the copula is unique.

The precise definition of a copula is technical. We’ll limit ourselves to copulas in two dimensions to make things a little simpler.

Let I be the unit interval [0, 1]. Then a (two-dimensional) copula is a function from I × I to I  that satisfies

\begin{align*} C(0, v) &= 0\\ C(u, 0) &= 0\\ C(u, 1) &= u\\ C(1, v) &= v \end{align*}

and is 2-increasing.

The idea of a 2-increasing function is that “gradients point northeast.” Specifically, for all points (x1, y1) and (x2, y2) with x1x2 and y1y2, we have

C(x_2, y_2) - C(x_2, y_1) - C(x_1, y_2) + C(x_1, y_1) \,\geq\, 0

The definition of copula makes no mention of probability, but the 2-increasing condition says that C acts like the joint CDF of two random variables.

Survival copula, dual copula, co-copula

For a given copula C, the corresponding survival copula, dual copula, and co-copula are defined by

\begin{align*} \hat{C}(u, v) &= u + v - 1 + C(1-u, 1-v) \\ \tilde{C}(u, v) &= u + v - C(u,v) \\ C^*(u,v) &= 1 - C(1-u, 1-v) \end{align*}


The reason for the name “survival” has to do with a survival function, i.e. complementary CDF of a random variable. The survival copula is another copula, but the dual copula and co-copulas aren’t actually copulas.

This post hasn’t said much too about motivation or application—that would take a lot more than a short blog post—but it has included enough that you could verify that the operations do compose as advertised.

Update: See this post for more algebraic structure for copulas, a sort of convolution product.