The center may not hold

Posted on 18 December 2021 by John

“… Things fall apart; the centre cannot hold …” — Yeats, The Second Coming

Center of a group

The center of a group is the set of elements that commute with everything else in the group. For example, matrix multiplication is not commutative in general. You can’t count on AB being equal to BA, though of course sometimes it does, and in fact there are some matrices that commute with all other matrices.

The identity matrix I, for example, commutes with everything. So do multiples of the identity matrix. In fact, those are the only matrices that commute with everything, so the center of the group of invertible n × n matrices consists of multiples of the identity matrix [1].

Homomorphisms

Now suppose we have a homomorphism f between two groups, G and H. That is, f is a function from G to H such that for any two elements g₁ and g₂ in G,

f(g₁ g₂) = f(g₁) f(g₂).

The function f gives a picture of G inside H that preserves multiplication in the sense above. Does f map the center of G to the center of H? In terms of a diagram, does the following diagram commute?

Here the center of a group is denoted with Z(). Why? The convention goes back to the German word zentrum for center. The hooks on the arrows indicate inclusion. If we restrict f to Z(G), is its image contained in Z(H)? Maybe.

Positive example: Quaternions

A homomorphism may or may not take the center to the center. If a group is Abelian, then the center of the group is the entire group because everything commutes with everything. So then if G and H are Abelian, f takes the center of G (i.e. all of G) to the center of H (i.e. H).

For a less trivial example, let H^× be the group of non-zero quaternions under multiplication. Quaternion multiplication is not commutative, so the center Z(H^×) is not trivial, and in fact the center is R^×, the non-zero real numbers. Let q be a quaternion of unit length and let q^* be its conjugate. Define f to be a rotation

f: p ↦ qpq^*

Then f is a homomorphism and it takes the center, i.e. non-zero real numbers, to itself because if r is real then

qrq^* = rqq^* = r.

Now for an example where the center does not hold.

Negative example: Heisenberg matrices

Consider the group of matrices of the form

$\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}$

I’ve gotten a little ahead of myself by calling the set of such matrices a group, but in fact it is a group, and it’s known as the Heisenberg group.

The center of the Heisenberg group is the set of matrices of the form above where a = c = 0, matrices with 1s on the diagonal, a possibly non-zero element in the top right corner, and zeros everywhere else.

Let f be the inclusion map from the Heisenberg group to the group of all invertible 3 × 3 matrices. The image of the center is itself, so it contains matrices with non-zero elements in the upper right corner. But as we said at the top of the post, the center of the group of invertible matrices is the set of diagonal matrices (with all diagonal elements equal) and so it doesn’t contain matrices with non-zero elements off the main diagonal.

[1] To be precise, let K be a field and n a positive integer. The center of the general linear group GL_n(K) is the set of elements of the form kI for k in K.

Why is the word problem hard?

Posted on 14 September 2021 by John

This post is about the word problem. When mathematicians talk about “the word problem” we’re not talking about elementary math problems expressed in prose, such as “If Johnny has three apples, ….”

The word problem in algebra is to decide whether two strings of symbols are equivalent given a set of algebraic rules. I go into a longer description here.

I know that the word problem is hard, but I hadn’t given much thought to why it is hard. It has always seemed more like a warning sign than a topic to look into.

“Why don’t we just …?”

“That won’t work because … is equivalent to the word problem.”

In logic terminology, the word problem is undecidable. There can be no algorithm to solve the word problem in general, though there are algorithms for special cases.

So why is the word problem undecidable? As with most (all?) things that have been shown to be undecidable, the word problem is undecidable because the halting problem is undecidable. If there were an algorithm that could solve the word problem in general, you could use it to solve the halting problem. The halting problem is to write a program that can determine whether another arbitrary program will always terminate, and it can be shown that no such program can exist.

The impulse for writing this post came from stumbling across Keith Conrad’s expository article Why word problems are hard. His article goes into some of the algebraic background of the problem—free groups, (in)finitely generated groups, etc.—and gives some flavor for why the word problem is harder than it looks. The bottom line is that the word problem is hard because the halting problem is hard, but Conrad’s article gives you a little more of an idea what’s going on that just citing a logic theorem.

I still don’t have a cocktail-party answer for why the word problem is hard. Suppose a bright high school student who had heard of the word problem were at a cocktail party (drinking a Coke, of course) and asked why the word problem is hard. Suppose also that this student had not heard of the halting problem. Would the simplest explanation be to start by explaining the halting problem?

Suppose we change the setting a bit. You’re teaching a group theory course and the students know about groups and generators, but not about the halting problem, how might you give them a feel for why the word problem is hard? You might ask them to read Keith Conrad’s paper and point out that it shows that simpler problems than the word problem are harder than they seem at first.

Encryption in groups of unknown order

Posted on 25 August 2021 by John

One way of looking at RSA encryption, a way that generalizes to new methods, is that the method is based on group operations inside a group of unknown order, i.e. unknown to most people. Another way of putting it is that RSA encryption takes place in a group where everybody knows how to multiply but not everyone knows how to divide. This will be explained below.

RSA encryption starts by finding two large primes p and q. You compute the product

n = pq

and make it public, but keep the factors p and q secret. The RSA method lets anyone send you encrypted messages by doing certain operations in the multiplicative group of the integers mod n. (Details here.) Let’s call this group G.

The elements of G are the integers from 1 to n − 1 that are relative prime to n. The group operation is multiplication mod n, i.e. to multiply two elements of G, multiply them first as ordinary integers, then divide the result by n and keep the remainder.

The order of G, the number of elements in G, is

φ(n) = (p − 1)(q − 1).

You know p and q, and so you know φ(n), but the public does not. The security of RSA depends on the assumption that the public cannot compute φ(n). If someone could factor n, they could compute φ(n), but it is assumed that for large enough p and q it is not computationally feasible to factor n.

The public knows how to multiply in G but not how to divide. That is, anyone can carry out multiplication, but they cannot compute multiplicative inverses. Only you know how to divide in G, because you know φ(n) and Euler’s theorem.

In some sense the public knows everything there is to know about G. It’s the multiplicative group of integers mod n, and you tell them what n is. And that does tell them all they need to know to send you messages, but in practice it doesn’t tell them enough to decrypt messages anyone else sends you.

When you’re using RSA for public key cryptography, you’re telling the world “Here’s an n. To communicate securely with me, carry out certain algorithms with the integers relatively prime to n.”

Someone might object “But how do we know whether an integer is relatively prime to n? You haven’t told us its factors.”

You could reply “Don’t worry about it. It’s extremely unlikely that you’d run into a number that isn’t relatively prime to n. In fact, if you did, you’d break my system wide open. But if you’re worried about it, you can efficiently confirm that your numbers are relatively prime to n.”

Let’s unpack that last statement. We’ve already said that the number of positive integers less and n and relatively prime to n is (p – 1)(q – 1). So the number that are not relatively prime is

pq − (p − 1)(q – 1) = p + q − 1

and the probability of accidentally running into a one of these numbers is

(p + q − 1)/pq

Now if p and q are 300 digit numbers, for example, then this probability is on the order of one chance in 10³⁰⁰.

The Euclidean algorithm lets you find the greatest common factor of enormous numbers quickly. If you have a number k and you want to test whether it’s relatively prime to n, you can compute gcd(k, n). If k was chosen at random, the gcd is extremely likely to be 1, i.e. relatively prime to n. But if the answer is not 1, then it’s either p or q, in which case you’ve broken the encryption.

***

If you can factor large numbers, you can break RSA encryption. But it’s conceivable that you may be able to break RSA without being able to factor large numbers. That is in fact the case when RSA is implemented poorly. But aside from implementation flaws, nobody knows whether breaking RSA is as hard as factoring. Michael Rabin came up with a variation on RSA that is provably as hard as factoring, though I don’t know whether it has ever been used in practice.

The word problem

Posted on 19 October 2020 by John

Most people have heard of word problems, but not as many have heard of the word problem. If you’re imagining that the word problem is some superlatively awful word problem, I can assure you it’s not. It’s both simpler and weirder than that.

The word problem is essentially about whether you can always apply algebraic rules in an automated way. The reason it is called the word problem is that you start by a description of your algebraic system in terms of symbols (“letters”) and concatenations of symbols (“words”) subject to certain rules, also called relations.

The word problem for groups

For example, you can describe a group by saying it contains a and b, and it satisfies the relations

a² = b²

and

a⁻¹ba = b⁻¹.

A couple things are implicit here. We’ve said this a group, and since every element in a group has an inverse, we’ve implied that a⁻¹ and b⁻¹ are in the group as well. Also from the definition of a group comes the assumption that multiplication is associative, that there’s an identity element, and that inverses work like they’re supposed to.

In the example above, you could derive everything about the group from the information given. In particular, someone could give you two words—strings made up of a, b, a⁻¹, and b⁻¹—and you could determine whether they are equal by applying the rules. But in general, this is not possible for groups.

Undecidable

The bad news is that in general this isn’t possible. In computer science terminology, the word problem is undecidable. There is no algorithm that can tell whether two words are equal given a list of relations, at least not in general. There are special cases where the word problem is solvable, but a general algorithm is not possible.

The word problem for semigroups

I presented the word problem above in the context of groups, but you could look at the word problem in more general contexts, such as semigroups. A semigroup is closed under some associative binary operation, and that’s it. There need not be any inverses or even an identity element.

Here’s a concrete example of a semigroup whose word problem has been proven to be undecidable. As before we have two symbols, a and b. And because we are in a semigroup, not a group, there are no inverses. Our semigroup consists of all finite sequences make out of a‘s and b‘s, subject to these five relations.

aba²b² = b²a²ba

a²bab²a = b²a³ba

aba³b² = ab²aba²

b³a²b²a²ba = b³a²b²a⁴

a⁴b²a²ba = b²a⁴

Source: Term Rewriting and All That

Experience

When I first saw groups presented this as symbols and relations, I got my hopes up that a large swath of group theory could be automated. A few minutes later my naive hopes were dashed. So in my mind I thought “Well, then this is hopeless.”

But that is not true. Sometimes the word problem is solvable. It’s like many other impossibility theorems. There’s no fifth degree analog of the quadratic equation in general, but there are fifth degree polynomials whose roots can be found in closed form. There’s no program that can tell whether any arbitrary program will halt, but that doesn’t mean you can’t tell whether some programs halt.

It didn’t occur to me at the time that it would be worthwhile to explore the boundaries, learning which word problems can or cannot be solved. It also didn’t occur to me that I would run into things like the word problem in practical applications, such as simplifying symbolic expressions and optimizing their evaluation. Undecidable problems lurk everywhere, but you can often step around them.

Nonlinear mod 5

Posted on 10 October 2020 by John

This post is a follow-on to the previous post on perfectly nonlinear functions. In that post we defined a way to measure the degree of nonlinearity of a function between two Abelian groups. We looked at functions that take sequences of four bits to a single bit. In formal terms, our groups were GF(2⁴) and GF(2).

In this post we’ll start out by looking at integers mod 5 to show how the content of the previous post takes a different form in different groups. Sometimes the simplicity of working in binary makes things harder to see. For example, we noted in the previous post that the distinction between addition and subtraction didn’t matter. It matters in general!

Let B be the group of integers mod 5 and A the group of pairs of integers mod 5. That is, A is the direct product B × B. We will compute the uniformity of several functions from A to B. (Recall that less uniformity means more nonlinearity.) Then we’ll conclude by looking what happens if we work modulo other integers.

Python code

Here’s the code we’ll use to compute uniformity.

    from itertools import product
    from numpy import array

    m = 5
    r = range(m)

    def deriv(f, x, a):
        return (f(x + a) - f(x))%m

    def delta(f, a, b):
        return {x for x in product(r, r) if all(deriv(f, array(x), a) == b)}

    def uniformity(f):
        u = 0
        for a in product(r, r):
            if a == (0, 0):
                continue
            for b in product(r, r):
                t = len(delta(f, array(a), array(b)))
                if t > u:
                    u = t
        return u

We didn’t call attention to it last time, but the function f(x + a) – f(x) is called a derivative of f, and that’s why we named the function above deriv.

Here are a few functions whose uniformity we’d like to compute.

    # (u, v) -> u^2 + v^2 
    def F2(x): return sum(x**2)%m

    # (u, b) -> u^3 + v^3 
    def F3(x): return sum(x**3)%m

    # (u, b) -> u^2 + v
    def G(x):  return (x[0]**2 + x[1])%m

    # (u, v) -> uv
    def H(x):  return (x[0]*x[1])%m

Now let’s look at their uniformity values.

    print( uniformity(F2) )
    print( uniformity(F3) )
    print( uniformity(G) )
    print( uniformity(H) )

Results mod 5

This prints out 5, 10, 25, and 5. This says that the functions F2 and H are the most nonlinear, in fact “perfectly nonlinear.” The function G, although nonlinear, has the same uniformity as a linear function.

The function F3 may be the most interesting, having intermediate uniformity. In some sense the sum of cubes is closer to being a linear function than the sum of squares is.

Other moduli

We can easily look at other groups by simply changing the value of m at the top of the code.

If we set the modulus to 7, we get analogous results. The uniformities of the four functions are 7, 14, 49, and 7. They’re ranked exactly as they were when the modulus was 5.

But that’s not always the case for other moduli as you can see in the table below. Working mod 8, for example, the sum of squares is more uniform than the sum of cubes, the opposite of the case mod 5 or mod 7.

    |----+-----+-----+-----+----|
    |  m |  F2 |  F3 |   G |  H |
    |----+-----+-----+-----+----|
    |  2 |   4 |   4 |   4 |  2 |
    |  3 |   3 |   9 |   9 |  3 |
    |  4 |  16 |   8 |  16 |  8 |
    |  5 |   5 |  10 |  25 |  5 |
    |  6 |  36 |  36 |  36 | 18 |
    |  7 |   7 |  14 |  49 |  7 |
    |  8 |  64 |  32 |  64 | 32 |
    |  9 |  27 |  81 |  81 | 27 |
    | 10 | 100 | 100 | 100 | 50 |
    | 11 |  11 |  22 | 121 | 11 |
    | 12 | 144 | 144 | 144 | 72 |
    |----+-----+-----+-----+----|

In every case, G is the most uniform function and H is the least. Also, G is strictly more uniform than H. But there are many different ways F2 and F3 can fit between G and H.

Group symmetry of copula operations

Posted on 16 March 2020 by John

You don’t often see references to group theory in a statistics book. Not that there aren’t symmetries in statistics that could be described in terms of groups, but this isn’t often pointed out.

Here’s an example from An Introduction to Copulas by Roger Nelsen.

Show that under composition the set of operations of forming the survival copula, the dual of a copula, and the co-copula of a given copula, along with the identity (i.e., ^, ~, *, and i) yields the dihedral group.

Nelsen gives the following multiplication table for copula operations.

    o | i ^ ~ *
    -----------
    i | i ^ ~ *
    ^ | ^ i * ~
    ~ | ~ * i ^
    * | * ~ ^ i

The rest of this post explains what a copula is and what the operations above are.

What is a copula?

At a high level, a copula is a mathematical device for modeling the dependence between random variables. Sklar’s theorem says you can express the joint distribution of a set of random variables in terms of their marginal distributions and a copula. If the distribution functions are continuous, the copula is unique.

The precise definition of a copula is technical. We’ll limit ourselves to copulas in two dimensions to make things a little simpler.

Let I be the unit interval [0, 1]. Then a (two-dimensional) copula is a function from I × I to I that satisfies

$\begin{align*} C(0, v) &= 0\\ C(u, 0) &= 0\\ C(u, 1) &= u\\ C(1, v) &= v \end{align*}$

and is 2-increasing.

The idea of a 2-increasing function is that “gradients point northeast.” Specifically, for all points (x₁, y₁) and (x₂, y₂) with x₁ ≤ x₂ and y₁ ≤ y₂, we have

$C(x_2, y_2) - C(x_2, y_1) - C(x_1, y_2) + C(x_1, y_1) \,\geq\, 0$

The definition of copula makes no mention of probability, but the 2-increasing condition says that C acts like the joint CDF of two random variables.

Survival copula, dual copula, co-copula

For a given copula C, the corresponding survival copula, dual copula, and co-copula are defined by

$\begin{align*} \hat{C}(u, v) &= u + v - 1 + C(1-u, 1-v) \\ \tilde{C}(u, v) &= u + v - C(u,v) \\ C^*(u,v) &= 1 - C(1-u, 1-v) \end{align*}$

respectively.

The reason for the name “survival” has to do with a survival function, i.e. complementary CDF of a random variable. The survival copula is another copula, but the dual copula and co-copulas aren’t actually copulas.

This post hasn’t said much too about motivation or application—that would take a lot more than a short blog post—but it has included enough that you could verify that the operations do compose as advertised.

Update: See this post for more algebraic structure for copulas, a sort of convolution product.

More group theory posts

Center of a group

Homomorphisms

Positive example: Quaternions

Negative example: Heisenberg matrices

Related posts

Related posts

Related posts

The word problem for groups

Undecidable

The word problem for semigroups

Experience

Python code

Results mod 5

Other moduli

More on finite simple groups

What is a copula?

Survival copula, dual copula, co-copula