Group theory and RSA encryption

RSA encryption a map from numbers mod n to numbers mod n where n is a public key. A message is represented as an integer m and is encrypted by computing

c = me mod n

where e is part of the public key. In practice, e is usually 65537 though it does not have to be.

Multiplicative group

As discussed in an earlier post, not all messages m can be decrypted unless we require m to be relatively prime to n. In practice this is almost certainly the case: discovering a message m not relatively prime to n is equivalent to finding a factor of n and breaking the encryption.

If we limit ourselves to messages which can be encrypted and decrypted, our messages come not from the integers mod n but from the multiplicative group of integers mod n: the integers less than and relatively prime to n form a group G under multiplication.

The encryption function that maps m to me is an invertible function on G. Its inverse is the function that maps c to cd where d is the private key. Encryption is an automorphism of G because

(m1 m2) e = m1e m2e mod n.

Totient functions

Euler’s theorem tells us that

mφ(n) = 1 mod n

for all m in G. Here φ is Euler’s totient function. There are φ(n) elements in G, and so we could see this as a consequence of Lagrange’s theorem: the order of an element divides the order of a group.

Now the order of a particular m might be less than φ(n). That is, we know that if we raise m to the exponent φ(n) we will get 1, but maybe a smaller exponent would do. In fact, maybe a smaller exponent would do for all m.

Carmichael’s totient function λ(n) is the smallest exponent k such that

mk = 1 mod n

for all m. For some values of n the two totient functions are the same, i.e. λ(n) = φ(n). But sometimes λ(n) is strictly less than φ(n). And going back to Lagrange’s theorem, λ(n) always divides φ(n).

For example, there are 4 positive integers less than and relatively prime to 8: 1, 3, 5, and 7. Since φ(8) = 4, Euler’s theorem says that the 4th power of any of these numbers will be congruent to 1 mod 8. That is true, but its also true that the square of any of these numbers is congruent to 1 mod 8. That is, λ(8) = 2.

Back to RSA

Now for RSA encryption, n = pq where p and q are large primes and pq. It follows that

φ(pq) = φ(p) φ(q) = (p − 1)(q − 1).

On the other hand,

λ(pq) = lcm( λ(p), λ(q) ) = lcm(p − 1, q − 1).

Since p − 1 and q − 1 at least share a factor of 2,

λ(n) ≤ φ(n)/2.


It’s possible that λ(n) is smaller than φ(n) by more than a factor of 2. For example,

φ(7 × 13) = 6 × 12 = 72


λ(7 × 13) = lcm(6, 12) = 12.

You could verify this last calculation with the following Python code:

>>> from sympy import gcd
>>> G = set(n for n in range(1, 91) if gcd(n, 91) == 1)
>>> set(n**12 % 91 for n in s)

This returns {1}.


The significance of Carmichael’s totient to RSA is that φ(n) can be replaced with λ(n) when finding private keys. Given a public exponent e, we can find d by solving

ed = 1 mod λ(n)

rather than

ed = 1 mod φ(n).

This gives us a smaller private key d which might lead to faster decryption.

OpenSSL example

I generated an RSA key with openssl as in this post

    openssl genpkey -out fd.key -algorithm RSA \
      -pkeyopt rsa_keygen_bits:2048 -aes-128-cbc

and read it using

    openssl pkey -in  fd.key -text -noout

The public exponent was 65537 as noted above. I then brought the numbers in the key over to Python.

    from sympy import lcm

    modulus = xf227d5...a9
    prime1 = 0xf33514...d9
    prime2 = 0xfee496...51
    assert(prime1*prime2 == modulus)

    publicExponent = 65537
    privateExponent = 0x03896d...91

    phi = (prime1 - 1)*(prime2 - 1)
    lamb = lcm(prime1 - 1, prime2 - 1)
    assert(publicExponent*privateExponent % lamb == 1)
    assert(publicExponent*privateExponent % phi != 1)

This confirms that the private key d is the inverse of e = 65537 using modulo λ(pq) and not modulo φ(pq).

Related posts

Möbius transformations over a finite field

A Möbius transformation is a function of the form

g(z) = \frac{az + b}{cz + d}

where adbc = 1.

We usually think of z as a complex number, but it doesn’t have to be. We could define Möbius transformations in any context where we can multiply, add, and divide, i.e. over any field. In particular, we could work over a finite field such as the integers modulo a prime. The plot above represents a Möbius transformation over a finite field which we will describe below.

There is a subtle point, however. In the context of the complex numbers, the transformation above doesn’t quite map the complex plane onto the complex plane. It maps the complex plane minus one point to the complex plane minus one point. The domain is missing the point z = −d/c because that value makes the denominator zero. It’s also missing a point in the range, namely a/c.

The holes in the domain and range are a minor irritant, analogous to the pea in The Princess and the Pea. You can work around the holes, though the formalism is a little complicated. But over a finite field, the holes are a big deal. If you’re working over the integers mod 7, for example, then 1/7th of your domain is missing.

In the case of the complex numbers, the usual fix is to replace the complex numbers ℂ with the extended complex numbers ℂ ∪ ∞ and say that g(−d/c) = ∞ and g(∞) = a/c. There are a couple ways to make this more rigorous/elegant. The topological approach is to think of ℂ ∪ ∞ as the Riemann sphere. The algebraic approach is to think of it as a projective space.

Now let’s turn to finite fields, say the integers mod 17, which we will write as ℤ17. For a concrete example, let’s set a = 3, b = 8, c = 6, and d = 5. Then adbc = 1 mod 17. The multiplicative inverse of 6 mod 17 is 3, so we have a hole in the domain when

z = −d/c = −5/6 = −5 × 3 = − 15 = 2 mod 17.

Following the patch used with complex numbers, we define g(2) to be ∞, and we define

g(∞) = a/c = 3/6 = 3 × 3 = 9 mod 17.

That’s all fine, except now we’re not actually working over ℤ17 but rather ℤ17 ∪ ∞. We could formalize this by saying we’re working in a projective space over ℤ17. For this post let’s just say we’re working over set G with 18 elements that mostly follows the rules of ℤ17 but has a couple additional rules.

Now our function g maps G onto G. No holes.

Here’s how we might implement g in Python.

    def g(n):
        if n == 2:
            return 17
        if n == 17:
            return 9
        a, b, c, d = 3, 8, 6, 5
        denom = c*n + d
        denom_inverse = pow(denom, -1, 17)
        return (a*n + b)*denom_inverse % 17

The plot at the top of the post arranges 18 points uniformly around a circle and connects n to g(n).

    from numpy import pi, linspace, sin, cos
    import matplotlib.pyplot as plt

    θ = 2*pi/18
    t = linspace(0, 2*pi)
    plt.plot(cos(t), sin(t), 'b-')

    for n in range(18):
        plt.plot(cos(n*θ), sin(n*θ), 'bo')
        plt.plot([cos(n*θ), cos(g(n)*θ)],
                 [sin(n*θ), sin(g(n)*θ)], 'g-')

Application to cryptography

What use is this? Möbius transformations over finite fields [1] are “higgledy-piggledy” in the words of George Marsaglia, and so they can be used to create random-like permutations. In particular, Möbius transformations over finite fields are used to design S-boxes for use in symmetric encryption algorithms.

Related posts

[1] Technically, finite fields plus an element at infinity.

[2] “If the [pseudorandom] numbers are not random, they are at least higgledy-piggledy.” — RNG researcher George Marsaglia

Ruzsa distance

A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group.

Subset difference

Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A B is defined the set

A - B = \{a - b \mid a \in A, b \in B \}

As is customary with Abelian groups, we denote the group operation by + and a b means the group operation applied to a and the inverse of b.

For example, let G be the group of integers mod 10. Let A = {1, 5} and B = {3, 7}. Then A B is the set {2, 4, 8}. There are only three elements because 1 − 3 and 5 − 7 are both congruent to 8.

Ruzsa distance

The Ruzsa distance between two subsets of an Abelian group is defined by

d(A,B) = \log \frac{|A-B|}{|A|^{1/2}\, |B|^{1/2}}

where |S| denotes the number of elements in a set S.

The Ruzsa distance is not a metric, but it fails in an unusual way. the four axioms of a metric are

  1. d(x, x) = 0
  2. d(x, y) > 0 unless x = y
  3. d(x, y) = d(y, x)
  4. d(x, z) ≤ d(x, y) + d(y, z)

The first axiom is usually trivial, but it’s the only one that doesn’t hold for Ruzsa distance. In particular, the last axiom, the triangle inequality, does hold.

To see that the first axiom does not always hold, lets again let G be the integers mod 10 and let A = {1, 3}. Then A A is the set {0, 2, 8} and d(A, A) = log 3/2 > 0.

Sometimes d(A, A) does equal zero. If A = G then A A = A, and so d(A, A) = log 1 = 0.

Entropic Ruzsa distance

If we switch from subsets of G to random variables taking values in G we can define an analog of Ruzsa distance between random variables X and Y, the entropic Ruzsa distance

d_{\text{ent}}(X, Y) = H(X' + Y') - \frac{1}{2}H(X) - \frac{1}{2}H(Y)

where X′ and Y′ are independent copies of X and Y and H is Shannon entropy. For more on entropic Ruzsa distance, see this post by Terence Tao.

Note that if A and B are subsets of G, and X and Y are uniform random variables with support on A and B respectively, then the negative terms above correspond to the log of 1/|A|½ |B|½.  The H(X′ + Y′) term isn’t the log of |AB| though because for one thing its a sum rather than a difference. For another, the sum of uniform random variables may not be uniform: there may be more than one way to end up at a particular sum, and so sum values will have higher probability.


Cayley graphs in Mathematica

The previous post mentioned the group S4, the group of all permutations of a set with four elements. This post will show a way to visualize this group.

The Mathematica command

        VertexLabels -> Placed["Name", Center],
        VertexSize -> 0.4]

generates the graph below.

Cayley graph of alternating group S4

This is an interesting image, but what does it mean?

The elements of S4 are represented by the circled numbers. The numbers correspond to the permutations of four elements, listed in lexicographical order. If you label the four elements a, b, c, and d then the permutations are listed in alphabetical order. Permutation 1 is [1, 2, 3, 4] to itself and Permutation 24 is its reverse [4, 3, 2, 1].

In the Mathematica application, mousing over a number shows which permutation it represents, though the static image above doesn’t have this feature.

The blue arrows represent the permutation that swaps the first two elements. So the blue arrow between node 1 and node 7 says that swapping the first two elements of Permutation 1 gives you Permutation 7, which is [2, 1, 3, 4]. The blue arrow going back from 7 to 1 says that the same swapping operation applied to Permutation 7 returns you to Permutation 1.

All the blue arrows come in pairs because swapping is its own inverse.

The green arrows represent a rotation. For example, the green arrow from 1 to 10 says that rotation turns [1, 2, 3, 4] into [2, 3, 4, 1]. The rotation operation is not its own inverse, so the arrows only go in one direction. But every green arrow is part of a diamond because applying the rotation operation four times sends you back where you started.

You can get from any permutation to any other permutation by repeatedly either swapping the first two elements or applying a rotation. In group theoretical terminology, these two permutations generate the group S4.

Related posts

Permutations and centralizers in Mathematica

I had a fleeting thought about how little I use group theory when I realize I used it just this week.

A couple days ago I needed to know which permutations of 4 elements commute with reversal. If r takes a sequence and reverses it, I need to find all permutations p such that pr = rp.

In group theory jargon, the group of all permutations of 4 elements is the symmetric group S4. The subgroup of elements that commute with r is the centralizer of r. So my task was to find the centralizer of r in S4. How do I pose this task to Mathematica?

Mathematica represents permutations as disjoint cycles. The permutation r is represented as

    Cycles[{{4, 1}, {2, 3}}]

because swapping the first and last elements, then swapping the middle two elements, reverses a list of four elements.

To find the centralizer of r I asked Mathematica

    GroupCentralizer[SymmetricGroup[4], Cycles[{{4, 1}, {2, 3}}]]

This returns

    PermutationGroup[{Cycles[{{1, 4}}], Cycles[{{2, 3}}], Cycles[{{1, 2}, {3, 4}}]}]

This does list the permutations that commute with r but rather the generators of the group of such permutations. If we ask for the elements of the group above with


this returns

     Cycles[{{2, 3}}], 
     Cycles[{{1, 2}, {3, 4}}], 
     Cycles[{{1, 2, 4, 3}}], 
     Cycles[{{1, 3, 4, 2}}], 
     Cycles[{{1, 3}, {2, 4}}], 
     Cycles[{{1, 4}}], 
     Cycles[{{1, 4}, {2, 3}}]}

I use basic group theory and other algebra all the time, but I don’t think of it that way. In this example, I had a question about permutations, and it only occurred to me later that I could phrase my question in the vocabulary of group theory. I use ideas from algebra more often than I use the vocabulary of algebra.

Related posts

Groups of order 2023

How many groups are there with 2023 elements?

There’s obviously at least one: Z2023, the integers mod 2023.

Now 2023 = 7 × 289 = 7 × 17 × 17 and so we could also look at

Z7 + Z17 + Z17

where + denotes direct sum. An element of this group has the form (a, b, c) and the sum

(a, b, c) + (a′, b′, c′)

is defined by

((a + a)′ mod 7, (b + b′) mod 17, (c + c)′ mod 17).

Is this a different group than Z2023? Are there any other groups of order 2023?

Let’s first restrict our attention to Abelian groups. The classification theorem for finite Abelian groups tells us that there are two Abelian groups of order 2023:

Z7 + Z289


Z7 + Z17 + Z17

But what about Z2023? There’s a theorem [1] that says

ZmnZm + Zn

if and only if m and n are relatively prime. Since 7 and 289 are relatively prime, t

Z2023Z7 + Z289.

The theorem also says that Z17 + Z17 is not isomorphic to Z289 and it follows that their direct sums with Z7 are not isomorphic.

So we’ve demonstrated two non-isomorphic Abelian groups of order 2023, and a classification theorem says these are the only Abelian groups. There are no non-Abelian groups of order 2023, though that’s harder to show, and so we’ve found all the Abelian groups with 2023 elements.

More group theory posts

[1] Sketch of proof. Let d be the greatest common divisor of m and n. If d > 1 then every element of Zm + Zn has order mn/d < mn and so Zm + Zn if cannot be isomorphic to Zmn. On the other hand, if d = 1, then Zm + Zn has an element of order mn and so is cyclic.

The center may not hold

“… Things fall apart; the centre cannot hold …” — Yeats, The Second Coming


Center of a group

The center of a group is the set of elements that commute with everything else in the group. For example, matrix multiplication is not commutative in general. You can’t count on AB being equal to BA, though of course sometimes it does, and in fact there are some matrices that commute with all other matrices.

The identity matrix I, for example, commutes with everything. So do multiples of the identity matrix. In fact, those are the only matrices that commute with everything, so the center of the group of invertible n × n matrices consists of multiples of the identity matrix [1].


Now suppose we have a homomorphism f between two groups, G and H. That is, f is a function from G to H such that for any two elements g1 and g2 in G,

f(g1 g2) = f(g1) f(g2).

The function f gives a picture of G inside H that preserves multiplication in the sense above. Does f map the center of G to the center of H? In terms of a diagram, does the following diagram commute?

Here the center of a group is denoted with Z(). Why? The convention goes back to the German word zentrum for center. The hooks on the arrows indicate inclusion. If we restrict f to Z(G), is its image contained in Z(H)? Maybe.

Positive example: Quaternions

A homomorphism may or may not take the center to the center. If a group is Abelian, then the center of the group is the entire group because everything commutes with everything. So then if G and H are Abelian, f takes the center of G (i.e. all of G) to the center of H (i.e. H).

For a less trivial example, let H× be the group of non-zero quaternions under multiplication. Quaternion multiplication is not commutative, so the center Z(H×) is not trivial, and in fact the center is R×, the non-zero real numbers. Let q be a quaternion of unit length and let q* be its conjugate. Define f to be a rotation

f: pqpq*

Then f is a homomorphism and it takes the center, i.e. non-zero real numbers, to itself because if r is real then

qrq* = rqq* = r.

Now for an example where the center does not hold.

Negative example: Heisenberg matrices

Consider the group of matrices of the form

\begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}

I’ve gotten a little ahead of myself by calling the set of such matrices a group, but in fact it is a group, and it’s known as the Heisenberg group.

The center of the Heisenberg group is the set of matrices of the form above where a = c = 0, matrices with 1s on the diagonal, a possibly non-zero element in the top right corner, and zeros everywhere else.

Let f be the inclusion map from the Heisenberg group to the group of all invertible 3 × 3 matrices. The image of the center is itself, so it contains matrices with non-zero elements in the upper right corner. But as we said at the top of the post, the center of the group of invertible matrices is the set of diagonal matrices (with all diagonal elements equal) and so it doesn’t contain matrices with non-zero elements off the main diagonal.

Related posts

[1] To be precise, let K be a field and n a positive integer. The center of the general linear group GLn(K) is the set of elements of the form kI for k in K.

Why is the word problem hard?

This post is about the word problem. When mathematicians talk about “the word problem” we’re not talking about elementary math problems expressed in prose, such as “If Johnny has three apples, ….”

The word problem in algebra is to decide whether two strings of symbols are equivalent given a set of algebraic rules. I go into a longer description here.

I know that the word problem is hard, but I hadn’t given much thought to why it is hard. It has always seemed more like a warning sign than a topic to look into.

“Why don’t we just …?”

“That won’t work because … is equivalent to the word problem.”

In logic terminology, the word problem is undecidable. There can be no algorithm to solve the word problem in general, though there are algorithms for special cases.

So why is the word problem undecidable? As with most (all?) things that have been shown to be undecidable, the word problem is undecidable because the halting problem is undecidable. If there were an algorithm that could solve the word problem in general, you could use it to solve the halting problem. The halting problem is to write a program that can determine whether another arbitrary program will always terminate, and it can be shown that no such program can exist.

The impulse for writing this post came from stumbling across Keith Conrad’s expository article Why word problems are hard. His article goes into some of the algebraic background of the problem—free groups, (in)finitely generated groups, etc.—and gives some flavor for why the word problem is harder than it looks. The bottom line is that the word problem is hard because the halting problem is hard, but Conrad’s article gives you a little more of an idea what’s going on that just citing a logic theorem.

I still don’t have a cocktail-party answer for why the word problem is hard. Suppose a bright high school student who had heard of the word problem were at a cocktail party (drinking a Coke, of course) and asked why the word problem is hard. Suppose also that this student had not heard of the halting problem. Would the simplest explanation be to start by explaining the halting problem?

Suppose we change the setting a bit. You’re teaching a group theory course and the students know about groups and generators, but not about the halting problem, how might you give them a feel for why the word problem is hard? You might ask them to read Keith Conrad’s paper and point out that it shows that simpler problems than the word problem are harder than they seem at first.

Related posts

How small can a multiplicative group be?

The previous post looked at the multiplicative group of integers modulo a number of the form n = pq where p and q are prime. This post looks at general n.

The multiplicative group mod n consists of the integers from 1 to n − 1 that are relative prime to n. So the size of this group is φ(n) where φ is Euler’s “totient” function.

If n is a large number, how large might the multiplicative group of integers mod n be? Or equivalently, what can we say about the size of φ(n)?

Upper bounds are easy. If n is prime, then φ(n) is n-1, and so for large n, φ(n) can be essentially as large as n. The more interesting question is how small φ(n) can be.

The following lower bound comes from [1]. Theorem 15 from that reference says that for n > 2, we have [2]

n/\varphi(n) \leq e^\gamma \log \log n + 5/(2 \log \log n)

Taking the reciprocal of both sides, we have a lower bound on the ratio of φ(n) to n.

We can use this to show that if n is a k bit number, with k somewhere in the thousands, then φ(n) is at least a k − 4 bit number.

Related posts

[1] J. Barkley Rosser, Lowell Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois J. Math. 6(1): 64-94 (March 1962). DOI: 10.1215/ijm/1255631807

[2] Except when

n = 223092870 = 2×3×5×7×11×13×17×19×23,

the product of the first nine primes. With this value of n, φ(n)/n = 0.1636.

Encryption in groups of unknown order

One way of looking at RSA encryption, a way that generalizes to new methods, is that the method is based on group operations inside a group of unknown order, i.e. unknown to most people. Another way of putting it is that RSA encryption takes place in a group where everybody knows how to multiply but not everyone knows how to divide. This will be explained below.

RSA encryption starts by finding two large primes p and q. You compute the product

n = pq

and make it public, but keep the factors p and q secret. The RSA method lets anyone send you encrypted messages by doing certain operations in the multiplicative group of the integers mod n. (Details here.) Let’s call this group G.

The elements of G are the integers from 1 to n − 1 that are relative prime to n. The group operation is multiplication mod n, i.e. to multiply two elements of G, multiply them first as ordinary integers, then divide the result by n and keep the remainder.

The order of G, the number of elements in G, is

φ(n) = (p − 1)(q − 1).

You know p and q, and so you know φ(n), but the public does not. The security of RSA depends on the assumption that the public cannot compute φ(n). If someone could factor n, they could compute φ(n), but it is assumed that for large enough p and q it is not computationally feasible to factor n.

The public knows how to multiply in G but not how to divide. That is, anyone can carry out multiplication, but they cannot compute multiplicative inverses. Only you know how to divide in G, because you know φ(n) and Euler’s theorem.

In some sense the public knows everything there is to know about G. It’s the multiplicative group of integers mod n, and you tell them what n is. And that does tell them all they need to know to send you messages, but in practice it doesn’t tell them enough to decrypt messages anyone else sends you.

When you’re using RSA for public key cryptography, you’re telling the world “Here’s an n. To communicate securely with me, carry out certain algorithms with the integers relatively prime to n.”

Someone might object “But how do we know whether an integer is relatively prime to n? You haven’t told us its factors.”

You could reply “Don’t worry about it. It’s extremely unlikely that you’d run into a number that isn’t relatively prime to n. In fact, if you did, you’d break my system wide open. But if you’re worried about it, you can efficiently confirm that your numbers are relatively prime to n.”

Let’s unpack that last statement. We’ve already said that the number of positive integers less and n and relatively prime to n is (p – 1)(q – 1). So the number that are not relatively prime is

pq − (p − 1)(q – 1) = p + q − 1

and the probability of accidentally running into a one of these numbers is

(p + q − 1)/pq

Now if p and q are 300 digit numbers, for example, then this probability is on the order of one chance in 10300.

The Euclidean algorithm lets you find the greatest common factor of enormous numbers quickly. If you have a number k and you want to test whether it’s relatively prime to n, you can compute gcd(k, n). If k was chosen at random, the gcd is extremely likely to be 1, i.e. relatively prime to n. But if the answer is not 1, then it’s either p or q, in which case you’ve broken the encryption.


If you can factor large numbers, you can break RSA encryption. But it’s conceivable that you may be able to break RSA without being able to factor large numbers. That is in fact the case when RSA is implemented poorly. But aside from implementation flaws, nobody knows whether breaking RSA is as hard as factoring. Michael Rabin came up with a variation on RSA that is provably as hard as factoring, though I don’t know whether it has ever been used in practice.

More cryptography posts here.