Normal and non-normal subgroups

The word “normal” in mathematical nomenclature does not always mean “usual” or “customary” as it does in colloquial English. Instead, it might mean that something has a convenient property. That is the case for normal subgroups.

We can do things with normal subgroups that we cannot do with other subgroups, such as take quotients, and so once normal subgroups are introduced in an algebra class, non-normal subgroups disappear from the discussion. A student might be left with the impression that non-normal subgroups don’t exist.

This post will give a very simple example of a group with a non-normal subgroup, and show that something we can do with every normal subgroup cannot be done with it.

Definition of normal subgroup

A normal subgroup of a group G is a subgroup N such that gN = Ng for every element g in G. That is, if I multiply everything in N by g on the left or I multiply everything in N by g on the right, I get the same set of elements.

This does not mean that gn = ng for every n in N. It means that for every n in N, there is some element m in N such that gn = mg. The elements n and m might be the same, but they might not.

In an Abelian (commutative) group, gn always equals ng, and so all subgroups are normal. But most groups are not Abelian.

Structure-preserving functions

Along with every algebraic structure there are functions that preserve aspects of that structure. In the category of groups, these structure-preserving functions are called homomorphisms, coming from the Greek words for “same” and “shape.”

A homomorphism between groups gives you a sort of picture of the first group inside the second group in a way that is consistent with the structure of groups. Specifically, if f is a homomorphism from a group G to a group H, then

f(xy) = f(x) f(y).

Here we denote the group operation by writing two things next to each other. So “xy” means the group operation in G applied to x and y. This operation may or may not be multiplication. The same is true on the right-hand side: f(x) f(y) means the group operation in H applied to f(x) and f(y).

For example, if we let G be the real numbers with the group operation being addition, and we let H be the positive real numbers with the group operation being multiplication, then the exponential function is a homomorphism between these groups:

exp(x + y) = exp(x) exp(y).
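
Here’s a quick numerical check of this identity in Python:

>>> from math import exp, isclose
>>> isclose(exp(1.7 + 2.9), exp(1.7) * exp(2.9))
True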

Kernels

The kernel of a homomorphism between G and H is the subset of things in G that get sent to the identity element in H. In the example above, the identity element in H is 1, and the only element in G that gets mapped to 1 is the number 0. It’s not a coincidence that 0 is the identity element in G: homomorphisms always send the identity element of one group to the identity element of the other group. The kernel of a homomorphism always includes the identity, but it may also include more.

Normal subgroups are subgroups that can be kernels. A subgroup of G is normal if and only if there is some homomorphism from G to another group that has that subgroup as its kernel.

A non-normal subgroup

Let G be the group of permutations on three elements. The group operation is composition, i.e. do one permutation then do the other.

The identity element is the permutation that leaves everything alone. Denote this by 1. Another element of G is the permutation that swaps the first two elements. We’ll denote this by (12).

So if our three elements are a, b, and c, then the permutation 1 takes (abc) to (abc). And the permutation (12) takes (abc) to (bac).

The two elements 1 and (12) form a subgroup of G. But we will show that a homomorphism from G to a group H cannot take these two elements, and only these two elements, to the identity on H. It won’t matter what group H is, but we’ll need a name for the identity element in H. Let’s call it e.

Let f be a homomorphism from G to H. By definition

f(xy) = f(x) f(y)

and by applying this to (xy)z we have

f(xyz) = f(x) f(y) f(z)

If we let z be the inverse of x, and suppose y is in the kernel of f, we have

f(xyx⁻¹) = f(x) f(y) f(x⁻¹) = f(x) e f(x)⁻¹ = e.

This says that if y is in the kernel of f, then xyx⁻¹ must also be in the kernel of f for every x. (Here we used the fact that a homomorphism sends inverses to inverses: f(x⁻¹) = f(x)⁻¹.)

The permutation that swaps the second and third elements, (23), is its own inverse: swap the second and third elements twice and you’re back where you started. So if (12) is in the kernel of f then so is (23)(12)(23). You can work out that (23)(12)(23) is the permutation that reverses all three elements. This shows that if the subgroup consisting of 1 and (12) is in the kernel of f, so is the reversal permutation, which is not part of our subgroup. So our subgroup alone cannot be the kernel of a homomorphism.
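
You can verify the conjugation calculation with sympy, which numbers positions starting from 0, so (12) becomes the cycle (0 1):

    from sympy.combinatorics import Permutation

    swap12 = Permutation(0, 1, size=3)   # (12): swap the first two elements
    swap23 = Permutation(1, 2, size=3)   # (23): swap the last two elements

    # conjugating (12) by (23) gives the reversal (13)
    assert swap23*swap12*swap23 == Permutation(0, 2, size=3)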

Related posts

Elliptic curve Diffie-Hellman key exchange

I concluded the previous post by saying elliptic curve Diffie-Hellman key exchange (ECDHE) requires smaller keys than finite field Diffie-Hellman (FFDHE) to obtain the same level of security.

How much smaller are we talking about? According to NIST recommendations, a 256-bit elliptic curve provides about the same security as working over a 3072-bit finite field. Not only are elliptic curve keys smaller, they scale better: a 512-bit elliptic curve is believed to be about as secure as a 15360-bit finite field. Doubling the elliptic curve size (a factor of 2) buys the same security increase as quintupling the finite field size (a factor of 5).

The core idea of Diffie-Hellman is to pick a group G, an element g, and a large number x. If y is the result of starting with g and applying the group operation x times, it is difficult to recover x from y. This is called the discrete logarithm problem, taking its name from the case of the group operation being multiplication. But the inverse problem is still called the discrete logarithm problem when the group operation is written additively.

In FFDHE the group G is the multiplicative group generated by g modulo a large prime p. Applying the group operation (i.e. multiplication) to g a total of x times amounts to computing

y = g^x

and x is rightly called a discrete logarithm; the process is directly analogous to taking the logarithm of a real number.

In ECDHE the group is given by addition on an elliptic curve. Applying the group operation x times to g, adding g to itself x times, is written xg. The problem of recovering x from xg is still called the discrete logarithm problem, though you could call it the discrete “division” problem.

Some groups are unsuited for Diffie-Hellman cryptography because the discrete logarithm problem is easy. If we let G be the additive group modulo a prime (not the multiplicative group) then it is easy to recover x from xg.

Note that when we talk about applying the group operation a large number of times, we mean a really large number of times in theory, though not in practice. If you’re working over an elliptic curve with on the order of 2^256 elements, and x is on the order of 2^256, then xg is formally the result of adding g to itself on the order of 2^256 times. But in practice you’d double g on the order of 256 times. See fast exponentiation.
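
Here’s a sketch of the double-and-add idea in Python, with ordinary integer addition standing in for the elliptic curve group operation; a real implementation would add curve points instead.

    def scalar_mult(x, g, add):
        # compute x*g with on the order of log2(x) doublings
        result = None                # None stands in for the group identity
        while x > 0:
            if x & 1:                # this bit of x contributes a copy of g
                result = g if result is None else add(result, g)
            g = add(g, g)            # double
            x >>= 1
        return result

    # sanity check with integers under addition standing in for curve points
    assert scalar_mult(2**255 + 3, 7, lambda a, b: a + b) == (2**255 + 3)*7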

In the post on FFDHE we said that you have to be careful that your choice of prime and generator doesn’t give the group a structure that a cryptanalyst could exploit. This is also true for the elliptic curves used in ECDHE, and even more so because elliptic curves are more subtle than finite fields.

If large-scale quantum computing ever becomes practical, Diffie-Hellman key exchange will be broken because a quantum computer can solve discrete logarithm problems efficiently via Shor’s algorithm. This applies equally to finite fields and elliptic curves.

Related posts

Group theory and RSA encryption

RSA encryption is a map from numbers mod n to numbers mod n, where n is a public key. A message is represented as an integer m and is encrypted by computing

c = m^e mod n

where e is part of the public key. In practice, e is usually 65537 though it does not have to be.

Multiplicative group

As discussed in an earlier post, not all messages m can be decrypted unless we require m to be relatively prime to n. In practice this is almost certainly the case: discovering a message m not relatively prime to n is equivalent to finding a factor of n and breaking the encryption.

If we limit ourselves to messages which can be encrypted and decrypted, our messages come not from the integers mod n but from the multiplicative group of integers mod n: the integers less than and relatively prime to n form a group G under multiplication.

The encryption function that maps m to m^e is an invertible function on G. Its inverse is the function that maps c to c^d where d is the private key. Encryption is an automorphism of G because

(m1 m2)^e = m1^e m2^e mod n.
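
Here’s a quick check of this identity with toy numbers, far too small for real cryptography:

>>> n, e = 91, 5
>>> m1, m2 = 3, 10
>>> pow(m1*m2, e, n) == pow(m1, e, n) * pow(m2, e, n) % n
True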

Totient functions

Euler’s theorem tells us that

m^φ(n) = 1 mod n

for all m in G. Here φ is Euler’s totient function. There are φ(n) elements in G, and so we could see this as a consequence of Lagrange’s theorem: the order of an element divides the order of the group.

Now the order of a particular m might be less than φ(n). That is, we know that if we raise m to the exponent φ(n) we will get 1, but maybe a smaller exponent would do. In fact, maybe a smaller exponent would do for all m.

Carmichael’s totient function λ(n) is the smallest exponent k such that

m^k = 1 mod n

for all m. For some values of n the two totient functions are the same, i.e. λ(n) = φ(n). But sometimes λ(n) is strictly less than φ(n). And going back to Lagrange’s theorem, λ(n) always divides φ(n).

For example, there are 4 positive integers less than and relatively prime to 8: 1, 3, 5, and 7. Since φ(8) = 4, Euler’s theorem says that the 4th power of any of these numbers will be congruent to 1 mod 8. That is true, but it’s also true that the square of any of these numbers is congruent to 1 mod 8. That is, λ(8) = 2.
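
This is easy to confirm:

>>> [n*n % 8 for n in (1, 3, 5, 7)]
[1, 1, 1, 1]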

Back to RSA

Now for RSA encryption, n = pq where p and q are large primes and p ≠ q. It follows that

φ(pq) = φ(p) φ(q) = (p − 1)(q − 1).

On the other hand,

λ(pq) = lcm( λ(p), λ(q) ) = lcm(p − 1, q − 1).

Since p − 1 and q − 1 at least share a factor of 2,

λ(n) ≤ φ(n)/2.

Example

It’s possible that λ(n) is smaller than φ(n) by more than a factor of 2. For example,

φ(7 × 13) = 6 × 12 = 72

but

λ(7 × 13) = lcm(6, 12) = 12.

You could verify this last calculation with the following Python code:

>>> from sympy import gcd
>>> G = set(n for n in range(1, 91) if gcd(n, 91) == 1)
>>> set(n**12 % 91 for n in G)

This returns {1}.

Implementation

The significance of Carmichael’s totient to RSA is that φ(n) can be replaced with λ(n) when finding private keys. Given a public exponent e, we can find d by solving

ed = 1 mod λ(n)

rather than

ed = 1 mod φ(n).

This gives us a smaller private key d which might lead to faster decryption.
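
In Python, d can be computed with the three-argument pow, which accepts an exponent of −1 for modular inverses as of Python 3.8. A sketch with the toy modulus n = 91 from the example above:

>>> e = 5                  # a toy public exponent, not 65537
>>> lamb = 12              # λ(91) = lcm(6, 12)
>>> d = pow(e, -1, lamb)   # modular inverse of e mod λ(n)
>>> d, e*d % lamb
(5, 1)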

OpenSSL example

I generated an RSA key with openssl as in this post

    openssl genpkey -out fd.key -algorithm RSA \
      -pkeyopt rsa_keygen_bits:2048 -aes-128-cbc

and read it using

    openssl pkey -in  fd.key -text -noout

The public exponent was 65537 as noted above. I then brought the numbers in the key over to Python.

    from sympy import lcm

modulus = 0xf227d5...a9
    prime1 = 0xf33514...d9
    prime2 = 0xfee496...51
    assert(prime1*prime2 == modulus)

    publicExponent = 65537
    privateExponent = 0x03896d...91

    phi = (prime1 - 1)*(prime2 - 1)
    lamb = lcm(prime1 - 1, prime2 - 1)
    assert(publicExponent*privateExponent % lamb == 1)
    assert(publicExponent*privateExponent % phi != 1)

This confirms that the private key d is the inverse of e = 65537 modulo λ(pq), not modulo φ(pq).

Related posts

Möbius transformations over a finite field

A Möbius transformation is a function of the form

g(z) = \frac{az + b}{cz + d}

where ad − bc = 1.

We usually think of z as a complex number, but it doesn’t have to be. We could define Möbius transformations in any context where we can multiply, add, and divide, i.e. over any field. In particular, we could work over a finite field such as the integers modulo a prime. The plot above represents a Möbius transformation over a finite field which we will describe below.

There is a subtle point, however. In the context of the complex numbers, the transformation above doesn’t quite map the complex plane onto the complex plane. It maps the complex plane minus one point to the complex plane minus one point. The domain is missing the point z = −d/c because that value makes the denominator zero. The range is also missing a point, namely a/c.

The holes in the domain and range are a minor irritant, analogous to the pea in The Princess and the Pea. You can work around the holes, though the formalism is a little complicated. But over a finite field, the holes are a big deal. If you’re working over the integers mod 7, for example, then 1/7th of your domain is missing.

In the case of the complex numbers, the usual fix is to replace the complex numbers ℂ with the extended complex numbers ℂ ∪ ∞ and say that g(−d/c) = ∞ and g(∞) = a/c. There are a couple ways to make this more rigorous/elegant. The topological approach is to think of ℂ ∪ ∞ as the Riemann sphere. The algebraic approach is to think of it as a projective space.

Now let’s turn to finite fields, say the integers mod 17, which we will write as ℤ17. For a concrete example, let’s set a = 3, b = 8, c = 6, and d = 5. Then ad − bc = 1 mod 17. The multiplicative inverse of 6 mod 17 is 3, so we have a hole in the domain when

z = −d/c = −5/6 = −5 × 3 = −15 = 2 mod 17.

Following the patch used with complex numbers, we define g(2) to be ∞, and we define

g(∞) = a/c = 3/6 = 3 × 3 = 9 mod 17.

That’s all fine, except now we’re not actually working over ℤ17 but rather ℤ17 ∪ ∞. We could formalize this by saying we’re working in a projective space over ℤ17. For this post let’s just say we’re working over a set G with 18 elements that mostly follows the rules of ℤ17 but has a couple additional rules.

Now our function g maps G onto G. No holes.

Here’s how we might implement g in Python.

    def g(n):
        # n = 17 stands in for the point at infinity
        if n == 2:       # the hole in the domain: g(2) = ∞
            return 17
        if n == 17:      # g(∞) = a/c = 9
            return 9
        a, b, c, d = 3, 8, 6, 5
        denom = c*n + d
        denom_inverse = pow(denom, -1, 17)  # modular inverse, Python 3.8+
        return (a*n + b)*denom_inverse % 17

The plot at the top of the post arranges 18 points uniformly around a circle and connects n to g(n).

    from numpy import pi, linspace, sin, cos
    import matplotlib.pyplot as plt

    θ = 2*pi/18
    t = linspace(0, 2*pi)
    plt.plot(cos(t), sin(t), 'b-')

    for n in range(18):
        plt.plot(cos(n*θ), sin(n*θ), 'bo')
        plt.plot([cos(n*θ), cos(g(n)*θ)],
                 [sin(n*θ), sin(g(n)*θ)], 'g-')
    plt.gca().set_aspect("equal")
    plt.show()

Application to cryptography

What use is this? Möbius transformations over finite fields [1] are “higgledy-piggledy” in the words of George Marsaglia [2], and so they can be used to create random-like permutations. In particular, Möbius transformations over finite fields are used to design S-boxes for use in symmetric encryption algorithms.

Related posts

[1] Technically, finite fields plus an element at infinity.

[2] “If the [pseudorandom] numbers are not random, they are at least higgledy-piggledy.” — RNG researcher George Marsaglia

Ruzsa distance

A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group.

Subset difference

Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A − B is defined as the set

A - B = \{a - b \mid a \in A, b \in B \}

As is customary with Abelian groups, we denote the group operation by + and a − b means the group operation applied to a and the inverse of b.

For example, let G be the group of integers mod 10. Let A = {1, 5} and B = {3, 7}. Then A − B is the set {2, 4, 8}. There are only three elements because 1 − 3 and 5 − 7 are both congruent to 8.
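
Sets this small are easy to work out by hand, but here’s the same computation in Python:

>>> A, B = {1, 5}, {3, 7}
>>> sorted({(a - b) % 10 for a in A for b in B})
[2, 4, 8]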

Ruzsa distance

The Ruzsa distance between two subsets of an Abelian group is defined by

d(A,B) = \log \frac{|A-B|}{|A|^{1/2}\, |B|^{1/2}}

where |S| denotes the number of elements in a set S.

The Ruzsa distance is not a metric, but it fails to be one in an unusual way. The four axioms of a metric are

  1. d(x, x) = 0
  2. d(x, y) > 0 unless x = y
  3. d(x, y) = d(y, x)
  4. d(x, z) ≤ d(x, y) + d(y, z)

The first axiom is usually trivial, but it’s the only one that doesn’t hold for Ruzsa distance. In particular, the last axiom, the triangle inequality, does hold.

To see that the first axiom does not always hold, let’s again let G be the integers mod 10 and let A = {1, 3}. Then A − A is the set {0, 2, 8} and d(A, A) = log 3/2 > 0.

Sometimes d(A, A) does equal zero. If A = G then A − A = A, and so d(A, A) = log 1 = 0.
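
Here’s a short Python function for the Ruzsa distance between subsets of the integers mod m, confirming both calculations above:

    from math import log, sqrt

    def ruzsa(A, B, m):
        # Ruzsa distance between subsets of the integers mod m
        diff = {(a - b) % m for a in A for b in B}
        return log(len(diff) / sqrt(len(A) * len(B)))

    assert ruzsa({1, 3}, {1, 3}, 10) > 0                    # log 3/2
    assert ruzsa(set(range(10)), set(range(10)), 10) == 0   # log 1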

Entropic Ruzsa distance

If we switch from subsets of G to random variables taking values in G we can define an analog of Ruzsa distance between random variables X and Y, the entropic Ruzsa distance

d_{\text{ent}}(X, Y) = H(X' + Y') - \frac{1}{2}H(X) - \frac{1}{2}H(Y)

where X′ and Y′ are independent copies of X and Y and H is Shannon entropy. For more on entropic Ruzsa distance, see this post by Terence Tao.

Note that if A and B are subsets of G, and X and Y are uniform random variables with support on A and B respectively, then the negative terms above correspond to the log of 1/(|A|^{1/2} |B|^{1/2}). The H(X′ + Y′) term isn’t the log of |A − B| though, because for one thing it’s a sum rather than a difference. For another, the sum of uniform random variables may not be uniform: there may be more than one way to end up at a particular sum, and so some sum values will have higher probability.


Cayley graphs in Mathematica

The previous post mentioned the group S4, the group of all permutations of a set with four elements. This post will show a way to visualize this group.

The Mathematica command

    CayleyGraph[
        SymmetricGroup[4], 
        VertexLabels -> Placed["Name", Center],
        VertexSize -> 0.4]

generates the graph below.

Cayley graph of the symmetric group S4

This is an interesting image, but what does it mean?

The elements of S4 are represented by the circled numbers. The numbers correspond to the permutations of four elements, listed in lexicographic order. If you label the four elements a, b, c, and d then the permutations are listed in alphabetical order. Permutation 1 takes [1, 2, 3, 4] to itself and Permutation 24 is its reverse, [4, 3, 2, 1].

In the Mathematica application, mousing over a number shows which permutation it represents, though the static image above doesn’t have this feature.

The blue arrows represent the permutation that swaps the first two elements. So the blue arrow between node 1 and node 7 says that swapping the first two elements of Permutation 1 gives you Permutation 7, which is [2, 1, 3, 4]. The blue arrow going back from 7 to 1 says that the same swapping operation applied to Permutation 7 returns you to Permutation 1.

All the blue arrows come in pairs because swapping is its own inverse.

The green arrows represent a rotation. For example, the green arrow from 1 to 10 says that rotation turns [1, 2, 3, 4] into [2, 3, 4, 1]. The rotation operation is not its own inverse, so the arrows only go in one direction. But every green arrow is part of a diamond because applying the rotation operation four times sends you back where you started.

You can get from any permutation to any other permutation by repeatedly either swapping the first two elements or applying a rotation. In group theoretical terminology, these two permutations generate the group S4.
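
You can confirm with sympy that these two permutations generate a group of order 24, i.e. all of S4. Again sympy numbers positions from 0:

    from sympy.combinatorics import Permutation, PermutationGroup

    swap = Permutation(0, 1, size=4)   # swap the first two elements
    rotate = Permutation(0, 1, 2, 3)   # a 4-cycle rotating all four elements

    G = PermutationGroup(swap, rotate)
    assert G.order() == 24             # the whole of S4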

Related posts

Permutations and centralizers in Mathematica

I had a fleeting thought about how little I use group theory, then realized I had used it just this week.

A couple days ago I needed to know which permutations of 4 elements commute with reversal. If r takes a sequence and reverses it, I needed to find all permutations p such that pr = rp.

In group theory jargon, the group of all permutations of 4 elements is the symmetric group S4. The subgroup of elements that commute with r is the centralizer of r. So my task was to find the centralizer of r in S4. How do I pose this task to Mathematica?

Mathematica represents permutations as disjoint cycles. The permutation r is represented as

    Cycles[{{4, 1}, {2, 3}}]

because swapping the first and last elements, then swapping the middle two elements, reverses a list of four elements.

To find the centralizer of r I asked Mathematica

    GroupCentralizer[SymmetricGroup[4], Cycles[{{4, 1}, {2, 3}}]]

This returns

    PermutationGroup[{Cycles[{{1, 4}}], Cycles[{{2, 3}}], Cycles[{{1, 2}, {3, 4}}]}]

This does not list the permutations that commute with r directly, but rather the generators of the group of such permutations. If we ask for the elements of the group above with

    GroupElements[%]

this returns

    {Cycles[{}], 
     Cycles[{{2, 3}}], 
     Cycles[{{1, 2}, {3, 4}}], 
     Cycles[{{1, 2, 4, 3}}], 
     Cycles[{{1, 3, 4, 2}}], 
     Cycles[{{1, 3}, {2, 4}}], 
     Cycles[{{1, 4}}], 
     Cycles[{{1, 4}, {2, 3}}]}
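
As a cross-check, the same computation can be done in Python with sympy. Positions are numbered from 0, so the reversal r becomes (0 3)(1 2):

    from sympy.combinatorics import Permutation, SymmetricGroup

    r = Permutation([[0, 3], [1, 2]])      # reversal of four elements
    C = SymmetricGroup(4).centralizer(r)
    assert C.order() == 8                  # the eight permutations above
    assert all(p*r == r*p for p in C.elements)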

I use basic group theory and other algebra all the time, but I don’t think of it that way. In this example, I had a question about permutations, and it only occurred to me later that I could phrase my question in the vocabulary of group theory. I use ideas from algebra more often than I use the vocabulary of algebra.

Related posts

Groups of order 2023

How many groups are there with 2023 elements?

There’s obviously at least one: Z2023, the integers mod 2023.

Now 2023 = 7 × 289 = 7 × 17 × 17 and so we could also look at

Z7 + Z17 + Z17

where + denotes direct sum. An element of this group has the form (a, b, c) and the sum

(a, b, c) + (a′, b′, c′)

is defined by

((a + a′) mod 7, (b + b′) mod 17, (c + c′) mod 17).

Is this a different group than Z2023? Are there any other groups of order 2023?

Let’s first restrict our attention to Abelian groups. The classification theorem for finite Abelian groups tells us that there are two Abelian groups of order 2023:

Z7 + Z289

and

Z7 + Z17 + Z17

But what about Z2023? There’s a theorem [1] that says

Zmn ≅ Zm + Zn

if and only if m and n are relatively prime. Since 7 and 289 are relatively prime,

Z2023 ≅ Z7 + Z289.

The theorem also says that Z17 + Z17 is not isomorphic to Z289 and it follows that their direct sums with Z7 are not isomorphic.

So we’ve demonstrated two non-isomorphic Abelian groups of order 2023, and a classification theorem says these are the only Abelian groups. There are no non-Abelian groups of order 2023, though that’s harder to show, and so we’ve found all the groups with 2023 elements.
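
One way to see that the two Abelian groups are not isomorphic is to compare element orders by brute force. Here’s a sketch in Python (math.lcm requires Python 3.9 or later):

    from math import gcd, lcm
    from itertools import product

    def order(k, m):
        # order of k in the additive group Z_m
        return m // gcd(k, m)

    def max_order(moduli):
        # largest order of any element of the direct sum of the Z_m's
        return max(lcm(*(order(k, m) for k, m in zip(elt, moduli)))
                   for elt in product(*(range(m) for m in moduli)))

    assert max_order([7, 289]) == 2023    # Z7 + Z289 is cyclic
    assert max_order([7, 17, 17]) == 119  # no element of order 2023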

More group theory posts

[1] Sketch of proof. Let d be the greatest common divisor of m and n. If d > 1 then the order of every element of Zm + Zn divides lcm(m, n) = mn/d < mn, and so Zm + Zn cannot be isomorphic to Zmn. On the other hand, if d = 1, then Zm + Zn has an element of order mn and so is cyclic.

The center may not hold

“… Things fall apart; the centre cannot hold …” — Yeats, The Second Coming


Center of a group

The center of a group is the set of elements that commute with everything else in the group. For example, matrix multiplication is not commutative in general. You can’t count on AB being equal to BA, though of course sometimes it is, and in fact there are some matrices that commute with all other matrices.

The identity matrix I, for example, commutes with everything. So do multiples of the identity matrix. In fact, those are the only matrices that commute with everything, so the center of the group of invertible n × n matrices consists of multiples of the identity matrix [1].

Homomorphisms

Now suppose we have a homomorphism f between two groups, G and H. That is, f is a function from G to H such that for any two elements g1 and g2 in G,

f(g1 g2) = f(g1) f(g2).

The function f gives a picture of G inside H that preserves multiplication in the sense above. Does f map the center of G to the center of H? In terms of a diagram, does the following diagram commute?

Here the center of a group is denoted with Z(·). Why? The convention goes back to the German word Zentrum for center. The hooks on the arrows indicate inclusion. If we restrict f to Z(G), is its image contained in Z(H)? Maybe.

Positive example: Quaternions

A homomorphism may or may not take the center to the center. If a group is Abelian, then the center of the group is the entire group because everything commutes with everything. So then if G and H are Abelian, f takes the center of G (i.e. all of G) to the center of H (i.e. H).

For a less trivial example, let H× be the group of non-zero quaternions under multiplication. Quaternion multiplication is not commutative, so the center Z(H×) is not trivial, and in fact the center is R×, the non-zero real numbers. Let q be a quaternion of unit length and let q* be its conjugate. Define f to be a rotation

f: p ↦ qpq*

Then f is a homomorphism and it takes the center, i.e. non-zero real numbers, to itself because if r is real then

qrq* = rqq* = r.
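
You can check this with sympy’s Quaternion class, writing the conjugate by hand to keep the sketch self-contained:

    from sympy import Rational
    from sympy.algebras.quaternion import Quaternion

    def conj(p):
        # quaternion conjugate: negate the imaginary components
        return Quaternion(p.a, -p.b, -p.c, -p.d)

    half = Rational(1, 2)
    q = Quaternion(half, half, half, half)   # unit length: 4(1/2)² = 1
    r = Quaternion(3, 0, 0, 0)               # a real number as a quaternion

    assert q * conj(q) == Quaternion(1, 0, 0, 0)
    assert q * r * conj(q) == r              # conjugation fixes the center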

Now for an example where the center does not hold.

Negative example: Heisenberg matrices

Consider the group of matrices of the form

\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}

I’ve gotten a little ahead of myself by calling the set of such matrices a group, but in fact it is a group, and it’s known as the Heisenberg group.

The center of the Heisenberg group is the set of matrices of the form above where a = c = 0, matrices with 1s on the diagonal, a possibly non-zero element in the top right corner, and zeros everywhere else.

Let f be the inclusion map from the Heisenberg group into the group of all invertible 3 × 3 matrices. The image of the center is itself, so it contains matrices with non-zero elements in the upper right corner. But as we said at the top of the post, the center of the group of invertible matrices is the set of scalar matrices, i.e. diagonal matrices with all diagonal elements equal, and so it doesn’t contain matrices with non-zero elements off the main diagonal.
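
Here’s a sympy check: a central Heisenberg matrix commutes with Heisenberg matrices but not with every invertible matrix.

    from sympy import Matrix

    # a central element of the Heisenberg group: a = c = 0, b = 5
    Z = Matrix([[1, 0, 5],
                [0, 1, 0],
                [0, 0, 1]])

    # Z commutes with an arbitrary Heisenberg matrix ...
    H = Matrix([[1, 2, 3],
                [0, 1, 4],
                [0, 0, 1]])
    assert Z*H == H*Z

    # ... but not with this invertible permutation matrix
    P = Matrix([[0, 0, 1],
                [0, 1, 0],
                [1, 0, 0]])
    assert Z*P != P*Z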

Related posts

[1] To be precise, let K be a field and n a positive integer. The center of the general linear group GLn(K) is the set of elements of the form kI for k in K.

Why is the word problem hard?

This post is about the word problem. When mathematicians talk about “the word problem” we’re not talking about elementary math problems expressed in prose, such as “If Johnny has three apples, ….”

The word problem in algebra is to decide whether two strings of symbols are equivalent given a set of algebraic rules. I go into a longer description here.

I know that the word problem is hard, but I hadn’t given much thought to why it is hard. It has always seemed more like a warning sign than a topic to look into.

“Why don’t we just …?”

“That won’t work because … is equivalent to the word problem.”

In logic terminology, the word problem is undecidable. There can be no algorithm to solve the word problem in general, though there are algorithms for special cases.

So why is the word problem undecidable? As with most (all?) things that have been shown to be undecidable, the word problem is undecidable because the halting problem is undecidable. If there were an algorithm that could solve the word problem in general, you could use it to solve the halting problem. The halting problem is to determine, by an algorithm, whether an arbitrary program will eventually terminate, and it can be shown that no such algorithm can exist.

The impulse for writing this post came from stumbling across Keith Conrad’s expository article Why word problems are hard. His article goes into some of the algebraic background of the problem—free groups, (in)finitely generated groups, etc.—and gives some flavor for why the word problem is harder than it looks. The bottom line is that the word problem is hard because the halting problem is hard, but Conrad’s article gives you a little more of an idea of what’s going on than just citing a logic theorem.

I still don’t have a cocktail-party answer for why the word problem is hard. Suppose a bright high school student who had heard of the word problem were at a cocktail party (drinking a Coke, of course) and asked why the word problem is hard. Suppose also that this student had not heard of the halting problem. Would the simplest explanation be to start by explaining the halting problem?

Suppose we change the setting a bit. You’re teaching a group theory course and the students know about groups and generators, but not about the halting problem. How might you give them a feel for why the word problem is hard? You might ask them to read Keith Conrad’s paper and point out that it shows that problems simpler than the word problem are harder than they seem at first.

Related posts