Circular coordinate art

About three years ago I ran across a strange coordinate system in which familiar functions lead to interesting plots. The system is called “circular coordinates” but it is not polar coordinates.

This morning I was playing around with this again.

Here’s a plot of f(x) = x.

And here’s a plot of f(x) = cos(8x).

See this post for details of circular coordinates.

Here is Python code to make the plots. You can experiment with your own plots by changing the definition of f.

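    # See Mathematics Magazine, v 52 no 3, p175

    from numpy import cos, linspace
    import matplotlib.pyplot as plt

    plt.style.use('seaborn-v0_8-muted')

    def g(u, c, f):
        # Cartesian x-coordinate of the point with circular coordinates (u, t), t = f(u) + c
        t = f(u) + c
        return 2*u*t**2 / (u**2 + t**2)

    def h(u, c, f):
        # Cartesian y-coordinate of the same point
        t = f(u) + c
        return 2*u*u*t / (u**2 + t**2)

    t = linspace(-7, 7, 10000)
    fig, ax = plt.subplots()
    f = lambda x: cos(8*x)  # change this to plot a different function
    # Plot the vertically shifted curves f(x) + c for integers c from -10 to 10
    for c in range(-10, 11):
        ax.plot(g(t, c, f), h(t, c, f))
    plt.show()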

When there is only one group of a given size

Today’s date, US style, is 9/26/2023, and there is only one group, up to isomorphism, of size 9262023. You could verify this in Mathematica with the command

    FiniteGroupCount[9262023]

which returns 1.

For a given n, when is there only one group of size n?

There are two requirements. First, n has to be the product of distinct primes, i.e. no prime appears in the factorization with a power greater than 1. Second, no prime factor of n divides one less than another prime factor of n.

For example,

9262023 = 3 × 41 × 257 × 293

and you can check that 3 does not divide 40, 256, or 292, nor does 41 divide 2, 256, or 292, etc.
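If you would rather not check these divisibility conditions by hand, here is a minimal Python sketch that does it. It factors n by trial division, which is fast enough for numbers of this size, and then tests the two requirements above. The function names are my own, not from any library.

```python
def prime_factors(n):
    """Return the prime factorization of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def unique_group_order(n):
    """True if n is squarefree and no prime factor p of n divides q - 1 for another prime factor q."""
    factors = prime_factors(n)
    if any(e > 1 for e in factors.values()):
        return False
    primes = list(factors)
    return all((q - 1) % p != 0 for p in primes for q in primes if p != q)

print(prime_factors(9262023))       # {3: 1, 41: 1, 257: 1, 293: 1}
print(unique_group_order(9262023))  # True
```

For contrast, `unique_group_order(6)` is False because 2 divides 3 − 1. The cyclic group of order 6 and the permutations of three elements are two non-isomorphic groups of order 6.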

A more compact way to state the criteria above is to say

gcd(n, φ(n)) = 1

where φ(n) is Euler’s totient function, the number of positive integers less than n and relatively prime to n.

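Why are these criteria equivalent? If n is a product of distinct primes,

n = pqr…

then

φ(n) = (p − 1)(q − 1)(r − 1)…

If n and φ(n) have a nontrivial common factor, then some prime factor of n divides φ(n). It therefore divides one of the terms of φ(n), which the second requirement rules out. Conversely, if p² divides n for some prime p, then p also divides φ(n). And if a prime factor p of n divides q − 1 for another prime factor q, then p divides both n and φ(n). Either way gcd(n, φ(n)) > 1.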
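As a sanity check on this equivalence, the Python sketch below compares the condition gcd(n, φ(n)) = 1 with the two-part criterion for every n below 10,000 and collects any disagreements. The helpers are written from scratch for illustration. φ is computed from the factorization using φ(n) = n ∏ (1 − 1/p) over the primes p dividing n, and the bound 10,000 is arbitrary.

```python
from math import gcd

def prime_factors(n):
    """Prime factorization of n as {prime: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def totient(n):
    """Euler's totient: n times the product of (1 - 1/p) over primes p dividing n."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result

def two_part_criterion(n):
    """Squarefree, and no prime factor divides (another prime factor) - 1."""
    factors = prime_factors(n)
    squarefree = all(e == 1 for e in factors.values())
    primes = list(factors)
    no_divisibility = all((q - 1) % p != 0 for p in primes for q in primes if p != q)
    return squarefree and no_divisibility

mismatches = [n for n in range(1, 10000)
              if (gcd(n, totient(n)) == 1) != two_part_criterion(n)]
print(mismatches)  # []
```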

Source: Dieter Jungnickel. On the Uniqueness of the Cyclic Group of Order n. The American Mathematical Monthly, Vol. 99, No. 6. (Jun. – Jul., 1992), pp. 545–547.

Analogy between prime numbers and simple groups

Simple groups are the building blocks of groups in much the same way that prime numbers are the building blocks of integers. This post will unpack this analogy in two ways:

  1. How do simple groups compare to prime numbers?
  2. How does the composition of simple groups compare to the composition of prime numbers?

The former analogy is stronger than the latter.

Primes and simple groups

A simple group has no nontrivial subgroups, just as a prime number has no nontrivial factors. Except that’s not quite right. A simple group is defined as having no nontrivial normal subgroups. The previous post compares normal and non-normal subgroups. Normal subgroups have nice properties that are necessary for decomposition and composition. You can’t define quotients by non-normal subgroups.

Every subgroup of an Abelian group is normal, so in the context of Abelian groups it is true that simple groups have no nontrivial subgroups, i.e. the only subgroups of a simple Abelian group G are the identity and G itself. It follows from Sylow’s theorems that the order of a finite Abelian group with no nontrivial subgroups must be an integer with no nontrivial factors, i.e. a prime number. So every finite simple Abelian group is isomorphic to the integers mod p for some prime p.
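The cyclic case is easy to see concretely. Every subgroup of ℤₙ (the integers mod n under addition) is generated by a single element. The brute-force Python sketch below therefore finds all subgroups by generating one from each element and recording the distinct results. For a prime n, only the trivial subgroup and the whole group turn up. For a composite n there is one subgroup for each divisor. The helper names are hypothetical.

```python
def generated_subgroup(k, n):
    """The subgroup of Z_n (addition mod n) generated by k, as a frozenset."""
    elements = {0}
    x = k % n
    while x not in elements:
        elements.add(x)
        x = (x + k) % n
    return frozenset(elements)

def cyclic_subgroup_orders(n):
    """Sorted orders of the distinct subgroups of Z_n (every one of them is cyclic)."""
    subgroups = {generated_subgroup(k, n) for k in range(n)}
    return sorted(len(s) for s in subgroups)

print(cyclic_subgroup_orders(7))   # [1, 7]
print(cyclic_subgroup_orders(12))  # [1, 2, 3, 4, 6, 12]
```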

Non-Abelian finite simple groups do not have prime order, but they are not decomposable in the sense described below.

Composition and decomposition

Prime numbers compose to form other numbers by products. You can also compose groups by taking products, though you need more than that. It is not the case that all finite groups are products of finite simple groups.

Let ℤₙ denote the cyclic group of order n and let ⊕ denote direct sum. The group ℤ₄ is not simple, since it has a subgroup of order 2. But it is not isomorphic to ℤ₂ ⊕ ℤ₂ either: ℤ₄ has an element of order 4 and ℤ₂ ⊕ ℤ₂ does not. So even among Abelian groups, not every group is the direct sum or direct product of simple groups. [1]
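One quick way to see that ℤ₄ and ℤ₂ ⊕ ℤ₂ differ is to compare the orders of their elements, since an isomorphism preserves element orders. Here is a brute-force Python sketch. The `order` helper and the encodings (integers mod 4, pairs of bits) are my own choices for illustration.

```python
from itertools import product

def order(x, add, zero):
    """Order of x: the least k >= 1 such that x added to itself k times gives zero."""
    k, y = 1, x
    while y != zero:
        y = add(y, x)
        k += 1
    return k

# Z_4: integers mod 4 under addition
z4_orders = sorted(order(x, lambda a, b: (a + b) % 4, 0) for x in range(4))

# Z_2 ⊕ Z_2: pairs of bits added componentwise mod 2
def add_pairs(a, b):
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

z2z2_orders = sorted(order(x, add_pairs, (0, 0)) for x in product(range(2), repeat=2))

print(z4_orders)    # [1, 2, 4, 4]
print(z2z2_orders)  # [1, 2, 2, 2]
```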

Finite groups can be decomposed into smaller finite simple groups, but we can’t easily or uniquely rebuild a group from this decomposition.

The Jordan-Hölder theorem says that a finite group G has a composition series

1 = H₀ ⊲ H₁ ⊲ … ⊲ Hₙ = G

where each Hᵢ is a maximal normal subgroup of Hᵢ₊₁, so that the quotients Hᵢ₊₁/Hᵢ of consecutive terms are simple groups. The composition series is not unique, but all such series are equivalent in a sense that the Jordan-Hölder theorem makes precise.
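For cyclic groups you can watch the theorem at work. The subgroups of ℤₙ correspond to the divisors of n, one subgroup of each order, nested according to divisibility. A composition series of ℤₙ therefore amounts to a chain of divisors 1 = d₀ | d₁ | … | dₖ = n in which each ratio dᵢ₊₁/dᵢ is prime. That ratio is the order of the simple quotient at that step. This Python sketch, with hypothetical helper names, lists every such chain for n = 12. The chains differ, but the multiset of quotient orders is always {2, 2, 3}, which is the prime factorization of 12.

```python
def is_prime(m):
    """Trial-division primality test for small m."""
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

def composition_series(n):
    """All chains of subgroup orders 1 = d0 < d1 < ... < dk = n in the cyclic group Z_n
    where each step d_{i+1} / d_i is prime, i.e. each quotient is cyclic of prime order."""
    if n == 1:
        return [[1]]
    chains = []
    for p in range(2, n + 1):
        if n % p == 0 and is_prime(p):
            # The top quotient has order p; recurse on the subgroup of order n // p
            for chain in composition_series(n // p):
                chains.append(chain + [n])
    return chains

for chain in composition_series(12):
    quotients = [big // small for small, big in zip(chain, chain[1:])]
    print(chain, sorted(quotients))
```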

It seems to me that the composition series ought to be called a decomposition series, in that you can start with G and find the H’s and their quotients. But it’s a difficult problem, known as “the extension problem,” to reconstruct G from the quotients, and in general there are multiple solutions.

The analogy to prime numbers would be if there were an essentially unique way to factor a number, but not a unique way to multiply the factors back together.


Some people thought that the classification of finite simple groups would be the end of group theory. That has not been the case. Some also thought sequencing of the human genome would lead to cures for a huge range of diseases. That has not been the case either. Reductionism often produces disappointing results.

[1] In the context of Abelian groups, direct products and direct sums (coproducts) of finitely many groups are isomorphic.

Normal and non-normal subgroups

The word “normal” in mathematical nomenclature does not always mean “usual” or “customary” as it does in colloquial English. Instead, it might mean that something has a convenient property. That is the case for normal subgroups.

We can do things with normal subgroups that we cannot do with other subgroups, such as take quotients, and so once normal subgroups are introduced in an algebra class, non-normal subgroups disappear from the discussion. A student might be left with the impression that non-normal subgroups don’t exist.

This post will give a very simple example of a group with a non-normal subgroup, and show that we can’t do things with this subgroup that we can do with normal subgroups.

Definition of normal subgroup

A normal subgroup of a group G is a subgroup N such that gN = Ng for every element g in G. That is, if I multiply everything in N by g on the left or I multiply everything in N by g on the right, I get the same set of elements.

This does not mean that gn = ng for every n in N. It means that for every n in N, there is some element m in N such that gn = mg. The elements n and m might be the same, but they might not.
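Because the definition compares two sets, it is easy to test by brute force in a small group. The Python sketch below uses the permutations of three elements, the example taken up later in this post. It stores a permutation as the tuple of images of 0, 1, 2 and checks whether gN = Ng for every g. The encoding and function names are my own. The convention for composition does not affect which subgroups come out normal.

```python
from itertools import permutations

def compose(s, t):
    """The permutation 'apply t, then s'; permutations are tuples of images of 0, 1, 2."""
    return tuple(s[t[i]] for i in range(len(t)))

G = list(permutations(range(3)))       # all 6 permutations of three elements
identity = (0, 1, 2)
swap12 = (1, 0, 2)                     # (12): swaps the first two elements
N = {identity, swap12}                 # the subgroup {1, (12)}
A3 = {identity, (1, 2, 0), (2, 0, 1)}  # the identity and the two rotations

def is_normal(H):
    """Check that the left coset gH equals the right coset Hg, as sets, for every g in G."""
    return all({compose(g, h) for h in H} == {compose(h, g) for h in H} for g in G)

print(is_normal(N))   # False
print(is_normal(A3))  # True
```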

In an Abelian (commutative) group, gn always equals ng, and so all subgroups are normal. But most groups are not Abelian.

Structure-preserving functions

Along with every algebraic structure there are functions that preserve aspects of that structure. In the category of groups, these structure-preserving functions are called homomorphisms, coming from the Greek words for “same” and “shape.”

A homomorphism between groups gives you a sort of picture of the first group inside the second group in a way that is consistent with the structure of groups. Specifically, if f is a homomorphism from a group G to a group H, then

f(xy) = f(x) f(y).

Here we denote the group operation by writing two things next to each other. So “xy” means the group operation in G applied to x and y. This operation may or may not be multiplication. The same is true on the right-hand side: f(x) f(y) means the group operation in H applied to f(x) and f(y).

For example, if we let G be the real numbers with the group operation being addition, and we let H be the positive real numbers with the group operation being multiplication, then the exponential function is a homomorphism between these groups:

exp(x + y) = exp(x) exp(y).


The kernel of a homomorphism between G and H is the subset of things in G that get sent to the identity element in H. In the example above, the identity element in H is 1, and the only element in G that gets mapped to 1 is the number 0. It’s not a coincidence that 0 is the identity element in G: homomorphisms always send the identity element of one group to the identity element of the other group. The kernel of a homomorphism always includes the identity, but it may also include more.

Normal subgroups are subgroups that can be kernels. A subgroup of G is normal if and only if there is some homomorphism from G to another group that has that subgroup as its kernel.
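A standard example is the sign of a permutation, +1 for even and −1 for odd. It is a homomorphism from the permutations of three elements to the group {1, −1} under multiplication. Its kernel is the subgroup of even permutations, the identity and the two rotations, so that subgroup is normal. The Python sketch below checks the homomorphism property exhaustively and computes the kernel. Permutations are stored as tuples of images of 0, 1, 2, and computing the sign by counting inversions is my choice of method.

```python
from itertools import permutations

def compose(s, t):
    """Apply t, then s; permutations are tuples of images of 0, 1, 2."""
    return tuple(s[t[i]] for i in range(len(t)))

def sign(s):
    """+1 if s has an even number of inversions, -1 if odd."""
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return 1 if inversions % 2 == 0 else -1

G = list(permutations(range(3)))

# Homomorphism property: sign(xy) = sign(x) sign(y) for every pair x, y
assert all(sign(compose(x, y)) == sign(x) * sign(y) for x in G for y in G)

kernel = sorted(s for s in G if sign(s) == 1)
print(kernel)  # [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
```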

A non-normal subgroup

Let G be the group of permutations on three elements. The group operation is composition, i.e. do one permutation then do the other.

The identity element is the permutation that leaves everything alone. Denote this by 1. Another element of G is the permutation that swaps the first two elements. We’ll denote this by (12).

So if our three elements are a, b, and c, then the permutation 1 takes (abc) to (abc), and the permutation (12) takes (abc) to (bac).

The two elements 1 and (12) form a subgroup of G. But we will show that a homomorphism from G to a group H cannot take these two elements, and only these two elements, to the identity on H. It won’t matter what group H is, but we’ll need a name for the identity element in H. Let’s call it e.

Let f be a homomorphism from G to H. By definition

f(xy) = f(x) f(y)

and by applying this to (xy)z we have

f(xyz) = f(x) f(y) f(z)

If we let z = x⁻¹ and suppose y is in the kernel of f, we have

f(xyx⁻¹) = f(x) f(y) f(x⁻¹) = f(x) e f(x)⁻¹ = e.

This says that if y is in the kernel of f, then xyx⁻¹ must also be in the kernel of f for every x.

The permutation that swaps the second and third elements, (23), is its own inverse: swap the second and third elements twice and you’re back where you started. So if (12) is in the kernel of f, then so is (23)(12)(23). You can work out that (23)(12)(23) swaps the first and third elements, which reverses the order of the three elements: it takes (abc) to (cba). This shows that if the subgroup consisting of 1 and (12) is in the kernel of f, so is the reversing permutation, which is not part of our subgroup. So our subgroup alone cannot be the kernel of a homomorphism.
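The computation in the last paragraph is short enough to do by hand, but here is a Python sketch that carries it out on the word abc. A permutation is stored as a tuple s, and applying it to a word puts the letter at position s[i] into position i. This encoding is my own. Since (23)(12)(23) reads the same forward and backward, the order in which the three swaps are applied does not matter.

```python
def act(s, word):
    """Rearrange word by the permutation s: position i receives the entry at position s[i]."""
    return tuple(word[s[i]] for i in range(len(s)))

swap12 = (1, 0, 2)  # (12): swap the first two elements
swap23 = (0, 2, 1)  # (23): swap the second and third elements

word = ("a", "b", "c")
print(act(swap12, word))  # ('b', 'a', 'c')

# Conjugate (12) by (23): apply (23), then (12), then (23) again
result = act(swap23, act(swap12, act(swap23, word)))
print(result)  # ('c', 'b', 'a')
```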

Mersenne primes are unsafe

In the previous post I mentioned that a particular Mersenne prime would be unsuitable for cryptography. In fact, all Mersenne primes are unsuitable for cryptography.

A prime number p is called “safe” if

p = 2q + 1

where q is also a prime. Safe primes are called safe because p − 1 does not have small factors (other than 2). The factors of p − 1 correspond to subgroups of the group used for encryption, and small groups can be exploited to attack encryption.

Mersenne numbers are numbers of the form

Mₙ = 2ⁿ − 1.

Mersenne primes are Mersenne numbers that are also prime. A necessary condition for Mₙ to be prime is for n to be prime. This condition is not sufficient. For example,

2¹¹ − 1 = 23 × 89.

But it is necessary, for reasons we’ll get into shortly.

If Mₙ = 2q + 1, then q = Mₙ₋₁. But if n is a prime, then n − 1 is not a prime, with one exception: n = 3. For any other prime n, Mₙ₋₁ therefore cannot be prime. So the only Mersenne prime that is a safe prime is M₃ = 7, which is not a particularly large prime. Public key cryptography uses numbers in the thousands of digits, not single digits.
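Here is a small Python sketch illustrating the argument. It finds the Mersenne primes Mₙ with n < 32 by trial division and tests whether (Mₙ − 1)/2, which equals Mₙ₋₁, is also prime. The cutoff 32 is arbitrary, chosen so plain trial division stays fast.

```python
def is_prime(m):
    """Trial-division primality test; fine for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def mersenne(n):
    return 2**n - 1

for n in range(2, 32):
    p = mersenne(n)
    if is_prime(p):
        q = (p - 1) // 2  # equals mersenne(n - 1)
        print(n, p, "safe" if is_prime(q) else "not safe")
# Only n = 3 (p = 7, q = 3) is reported as safe
```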

Why does n have to be prime before Mn stands a chance of being prime?

If a > 1, then xᵃ − 1 can be factored:

xᵃ − 1 = (x − 1)(xᵃ⁻¹ + xᵃ⁻² + … + 1)

If n can be factored as ab with a, b > 1, then set x = 2ᵇ. This shows that 2ᵃᵇ − 1 = xᵃ − 1 has a nontrivial factor, namely 2ᵇ − 1.
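The factorization is easy to check numerically. The Python sketch below builds the two factors x − 1 and xᵃ⁻¹ + … + x + 1 with x = 2ᵇ and confirms that their product is 2ᵃᵇ − 1 for a range of small a and b. The function name and the ranges are illustrative choices.

```python
def mersenne_factor(a, b):
    """Split 2**(a*b) - 1 as (x - 1) * (x**(a-1) + ... + x + 1), where x = 2**b."""
    x = 2**b
    cofactor = sum(x**i for i in range(a))
    return x - 1, cofactor

# n = 10 = 5 * 2: take a = 5, b = 2, so x = 4
small, big = mersenne_factor(5, 2)
print(small, big, small * big == 2**10 - 1)  # 3 341 True

# The identity holds for every a, b > 1 in this range, and the small factor is nontrivial
for a in range(2, 8):
    for b in range(2, 8):
        s, c = mersenne_factor(a, b)
        assert s * c == 2**(a * b) - 1 and 1 < s < 2**(a * b) - 1
```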

In the previous post we said that M₁₂₇ − 1 has a lot of small factors. We can find some of those factors easily:

M₁₂₇ − 1 = 2 M₁₂₆ = 2 (2¹²⁶ − 1)

and 2¹²⁶ − 1 is divisible by 2ᵏ − 1 for every k that divides 126.

The nontrivial factors of 126 are 2, 3, 6, 7, 9, 14, 18, 21, 42, 63, and so 2ᵏ − 1 is a factor of M₁₂₆ for k equal to each of these numbers. This is enough to fully factor 2¹²⁶ − 1 into

3³ × 7² × 19 × 43 × 73 × 127 × 337 × 5419 × 92737 × 649657 × 77158673929

given in the footnote from the previous post. You could easily come up with this factorization using pencil and paper, though it would not be easy to determine by hand that the last factor is a prime number.
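A computer confirms the arithmetic instantly. This Python sketch multiplies out the factors listed above and checks that 2ᵏ − 1 divides 2¹²⁶ − 1 for each nontrivial divisor k of 126. It only checks the product. It does not prove that the individual factors are prime.

```python
from math import prod

N = 2**126 - 1

# Factors of 2^126 - 1 as listed above
factors = [3**3, 7**2, 19, 43, 73, 127, 337, 5419, 92737, 649657, 77158673929]
print(prod(factors) == N)  # True

# 2^k - 1 divides 2^126 - 1 for each divisor k of 126 with 1 < k < 126
ks = [k for k in range(2, 126) if 126 % k == 0]
print(ks)  # [2, 3, 6, 7, 9, 14, 18, 21, 42, 63]
print(all(N % (2**k - 1) == 0 for k in ks))  # True
```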

Victorian public key cryptography

Electronic computers were invented before public key cryptography. Would public key cryptography have been possible before computers?

The security of RSA encryption depends on the ratio of the difficulty of factoring to the difficulty of multiplication. This ratio was high before modern computers, maybe even higher than it is today.

Suppose the idea of RSA encryption had occurred to Lewis Carroll (1832–1898). What key size would have been necessary for security in his day? Would it have been practical to manually encrypt data using keys of that size?

I imagine if you handed Victorians a pair of public and private RSA keys, they could have manually carried out public key encryption. But coming up with private keys, i.e. finding pairs of large prime numbers, would be harder than carrying out the encryption process.
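To get a feel for the hand computation involved, here is a toy RSA round trip in Python. It uses small textbook primes, 61 and 53, and public exponent 17. These numbers are for illustration only and would offer no security. Each step is a modular exponentiation, which can be carried out by repeated squaring. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd

p, q = 61, 53            # toy primes; far too small for real use
n = p * q                # public modulus, 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent; must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)      # private exponent: the inverse of e mod phi

message = 65
ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)  # decrypt with the private key (d, n)
print(d, ciphertext, recovered)    # 2753 2790 65
```

The hard part for a Victorian would not be these steps but finding the primes p and q in the first place, as noted above.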

The largest prime discovered without a computer was 2¹²⁷ − 1, proved prime by Edouard Lucas in 1876. Primes that size would have been large enough (I doubt it was feasible to factor the product of two 40-digit primes at the time), but this was a prime of a very special form, a Mersenne prime. Lucas had developed an algorithm for efficiently testing whether a Mersenne number is prime. To this day the largest known primes are Mersenne primes. More on this here.

Lucas would not have been able to produce two 40-digit primes. The largest known prime in 1851 had 12 digits:


Because of the special form of this number, it would seem that even coming up with 12-digit primes was quite an achievement. Euler (1707–1783) had found a 10-digit prime, but it was also a Mersenne prime. Large primes without special structure were unknown.

Perhaps if Lewis Carroll had found a couple moderately large primes, he might have presented them to his queen to be used in public key cryptography. Their product could be published in newspapers, but the factors would be state secrets. Anyone could send Queen Victoria encrypted messages via public communication.

Diffie-Hellman key exchange might have been more practical. It only requires one large prime, and that prime can be made public. Everyone can share it.
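Here is a toy version of the exchange, sketched in Python with the tiny safe prime p = 23 = 2 × 11 + 1 and base g = 5, which generates the multiplicative group mod 23. The secret exponents are hard-coded so the output is reproducible. In practice they would be chosen at random and p would be hundreds of digits long. Only p, g, A, and B are made public.

```python
p = 23  # toy safe prime: 23 = 2 * 11 + 1
g = 5   # base; 5 generates the multiplicative group mod 23

a = 4   # Alice's secret exponent (hard-coded for illustration)
b = 3   # Bob's secret exponent

A = pow(g, a, p)  # Alice publishes A = g^a mod p
B = pow(g, b, p)  # Bob publishes B = g^b mod p

shared_alice = pow(B, a, p)  # Alice computes (g^b)^a mod p
shared_bob = pow(A, b, p)    # Bob computes (g^a)^b mod p
print(A, B, shared_alice, shared_bob)  # 4 10 18 18
```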

The prime p that Lucas discovered would do, until people realized that p − 1 has a lot of small factors [1], which could be used to break Diffie-Hellman cryptography. I don’t know that any large safe primes were known until more recently.

If someone from the future had given the Victorians a large safe prime, Diffie-Hellman cryptography would have been possible, though laborious. Someone could write a steampunk novel about a time traveler giving the pre-computerized world a big safe prime and teaching them Diffie-Hellman cryptography.


[1] 2¹²⁶ − 1 = 3³ × 7² × 19 × 43 × 73 × 127 × 337 × 5419 × 92737 × 649657 × 77158673929

See the next post for a theorem that would allow you to find this factorization by hand.

Primes, weeds, and military precision

Here’s a quote from Don Zagier that I found in Larry Rolen’s lecture notes on modular forms.

There are two facts about the distribution of prime numbers of which I hope to convince you so overwhelmingly that they will be permanently engraved in your hearts. The first is that, despite their simple definition and role as the building blocks of the natural numbers, the prime numbers grow like weeds among the natural numbers, seeming to obey no other law than that of chance, and nobody can predict where the next one will sprout. The second fact is even more astonishing, for it states just the opposite: that the prime numbers exhibit stunning regularity, that there are laws governing their behavior, and that they obey these laws with almost military precision.

Emphasis added.

Continued fractions as matrix products

A continued fraction of the form

\cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \ddots}}}

with n terms is the value at z = 0 of the composition

f_1 \circ f_2 \circ f_3 \circ \cdots \circ f_n

where

f_i(z) = \frac{a_i}{b_i + z}

As discussed in the previous post, a Möbius transformation can be associated with a matrix. And the composition of Möbius transformations is associated with the product of corresponding matrices. So the continued fraction at the top of the post is associated with the following product of matrices.

\begin{pmatrix} 0 & a_1 \\ 1 & b_1\end{pmatrix} \begin{pmatrix} 0 & a_2 \\ 1 & b_2\end{pmatrix} \begin{pmatrix} 0 & a_3 \\ 1 & b_3\end{pmatrix} \cdots \begin{pmatrix} 0 & a_n \\ 1 & b_n\end{pmatrix}
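Here is a small Python check of this correspondence using exact rational arithmetic. Since fᵢ(z) = aᵢ/(bᵢ + z) = (0·z + aᵢ)/(1·z + bᵢ), each fᵢ gets the matrix shown above. Evaluating the composition at z = 0 then amounts to applying the product matrix to the vector (0, 1) and taking the ratio of its components. The coefficients are arbitrary and the helper names are mine.

```python
from fractions import Fraction

def cf_direct(a, b):
    """Evaluate a1/(b1 + a2/(b2 + ... + an/bn)) from the inside out."""
    value = Fraction(0)
    for ai, bi in zip(reversed(a), reversed(b)):
        value = Fraction(ai) / (bi + value)
    return value

def mat_mul(m, k):
    """Product of two 2x2 matrices stored as ((p, q), (r, s))."""
    return ((m[0][0]*k[0][0] + m[0][1]*k[1][0], m[0][0]*k[0][1] + m[0][1]*k[1][1]),
            (m[1][0]*k[0][0] + m[1][1]*k[1][0], m[1][0]*k[0][1] + m[1][1]*k[1][1]))

def cf_matrix(a, b):
    """Multiply the matrices ((0, a_i), (1, b_i)) in order, then apply the result to (0, 1)."""
    m = ((1, 0), (0, 1))
    for ai, bi in zip(a, b):
        m = mat_mul(m, ((0, ai), (1, bi)))
    # m applied to the column (0, 1) is the column (m[0][1], m[1][1]); take its ratio
    return Fraction(m[0][1], m[1][1])

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(cf_direct(a, b), cf_matrix(a, b))  # 16/85 16/85
```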

The previous post makes precise the term “associated with” above: Möbius transformations on the complex plane ℂ correspond to linear transformations on the projective line P(ℂ). This allows us to include ∞ in the domain and range without resorting to hand waving.

Matrix products are easier to understand than continued fractions, and so moving to the matrix product representation makes it easier to prove theorems.

Fractional linear and linear

A function of the form

g(z) = \frac{az + b}{cz + d}

where ad − bc ≠ 0 is sometimes called a fractional linear transformation or a bilinear transformation. I usually use the name Möbius transformation.

In what sense are Möbius transformations linear transformations? They’re nonlinear functions unless b = c = 0. And yet they’re analogous to linear transformations. For starters, the condition ad − bc ≠ 0 appears to be saying that a determinant is non-zero, i.e. that a matrix is non-singular.

The transformation g is closely associated with the matrix

\begin{pmatrix} a & b \\ c & d \end{pmatrix}

but there’s more going on than a set of analogies. The reason is that Möbius transformations are linear transformations, but not on the complex numbers ℂ.

When you’re working with Möbius transformations, you soon want to introduce ∞. Things get complicated if you don’t. Once you add ∞, theorems become much easier to state, and yet there’s a nagging feeling that you may be doing something wrong by informally introducing ∞. This feeling is justified because tossing around ∞ without being careful can lead to wrong conclusions.

So how can we rigorously deal with ∞? We could move from numbers (real or complex) to pairs of numbers, as with fractions. We replace the complex number z with the equivalence class of all pairs of complex numbers whose ratio is z. The advantage of this approach is that you get to add one special number, the equivalence class of all pairs whose second number is 0, i.e. fractions with zero in the denominator. This new number system is called P(ℂ), where “P” stands for “projective.”

Möbius transformations are projective linear transformations. They’re linear on P(ℂ), though not on ℂ.

When we multiply the matrix above by the column vector (z 1)ᵀ we get

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} z \\ 1 \end{pmatrix} = \begin{pmatrix} az + b \\ cz + d \end{pmatrix}

and since our vectors are essentially fractions, the right hand side corresponds to g(z) if the second component of the vector, cz + d, is not zero.

If cz + d = 0, that’s OK. Everything is fine while we’re working in P(ℂ), but we get an element of P(ℂ) that does not correspond to an element of ℂ, i.e. we get ∞.

We’ve added ∞ to the domain and range of our Möbius transformations without any handwaving. We’re just doing linear algebra on finite complex numbers.
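Here is a minimal Python sketch of this idea, under my own conventions. A point of P(ℂ) is a pair (z₁, z₂), not both zero, and the Möbius transformation is an ordinary matrix-vector product. Only at the end do we convert back to a complex number, or to ∞ when the second component is zero. The string used to stand for ∞ and the example transformation are arbitrary choices.

```python
INF = "∞"  # stand-in for the point at infinity, i.e. the pairs (w, 0) with w != 0

def apply(matrix, pair):
    """Multiply the 2x2 matrix ((a, b), (c, d)) by the column vector pair = (z1, z2)."""
    (a, b), (c, d) = matrix
    z1, z2 = pair
    return (a*z1 + b*z2, c*z1 + d*z2)

def to_point(pair):
    """Convert a projective pair to a number z1/z2, or to INF if the second entry is 0."""
    z1, z2 = pair
    return INF if z2 == 0 else z1 / z2

# g(z) = (2z + 1)/(z - 3); here ad - bc = -6 - 1 = -7, which is nonzero
g = ((2, 1), (1, -3))

print(to_point(apply(g, (1, 1))))  # z = 1: (2 + 1)/(1 - 3) = -1.5
print(to_point(apply(g, (3, 1))))  # z = 3: cz + d = 0, so g(3) = ∞
print(to_point(apply(g, (1, 0))))  # z = ∞ maps to a/c = 2.0
```

Nothing special happens at z = 3 or at ∞ until the final conversion. Scaling a pair, e.g. using (2, 2) instead of (1, 1), gives the same point.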

There’s a little bit of fine print. In P(ℂ) we can’t allow both components of a pair to be 0, and non-zero multiples of the same vector are equivalent, so we’re not quite doing linear algebra. Strictly speaking a Möbius transformation is a projective linear transformation, not a linear transformation.

It takes a while to warm up to the idea of moving from complex numbers to equivalence classes of pairs of complex numbers. The latter seems unnecessarily complicated. And it often is unnecessary. In practice, you can work in P(ℂ) by thinking in terms of ℂ until you need to think about ∞. Then you go back to thinking in terms of P(ℂ). You can think of P(ℂ) as ℂ with a safety net for working rigorously with ∞.

Textbooks usually introduce higher dimensional projective spaces before speaking later, if ever, of one-dimensional projective space. (Standard notation would write P¹(ℂ) rather than P(ℂ) everywhere above.) But one-dimensional projective space is easier to understand by analogy to fractions, i.e. fractions whose denominator is allowed to be zero, provided the numerator is not also zero.

I first saw projective coordinates as an unmotivated definition. “Good morning everyone. We define Pⁿ(ℝ) to be the set of equivalence classes of ℝⁿ⁺¹ where ….” There had to be some payoff for this added complexity, but we were expected to delay the gratification of knowing what that payoff was. It would have been helpful if someone had said “The extra coordinate is there to let us handle points at infinity consistently. These points are not special at all if you present them this way.” It’s possible someone did say that, but I wasn’t ready to hear it at the time.

Geometric mean on unit circle

Warm up

The geometric mean of two numbers is the square root of their product. For example, the geometric mean of 9 and 25 is 15.

More generally, the geometric mean of a set of n numbers is the nth root of their product.

Alternatively, the geometric mean of a set of n positive numbers is the exponential of their average logarithm.

\left(\prod_{i=1}^n x_i\right)^{1/n} = \exp\left(\frac{1}{n} \sum_{i=1}^n \log x_i\right)
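Both forms are one-liners in Python. The sketch below computes the geometric mean of 9 and 25 each way. The function names are my own.

```python
import math

def geometric_mean_product(xs):
    """nth root of the product of n numbers."""
    return math.prod(xs) ** (1 / len(xs))

def geometric_mean_log(xs):
    """Exponential of the average logarithm; every x must be positive."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

print(geometric_mean_product([9, 25]), geometric_mean_log([9, 25]))  # both approximately 15
```

The logarithmic form has a practical advantage as well. Summing logarithms avoids the floating point overflow (or underflow) you can get by multiplying many large (or small) numbers before taking the root.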

The advantage of the alternative definition is that it extends to integrals. The geometric mean of a function over a set is the exponential of the average value of its logarithm. And the average of a function over a set is its integral over that set divided by the measure of the set.

Mahler measure

The Mahler measure of a polynomial is the geometric mean over the unit circle of the absolute value of the polynomial.

M(p) = \exp\left( \int_0^1 \log \left|p(e^{2\pi i \theta})\right| \, d\theta\right)

The Mahler measure equals the absolute value of the leading coefficient times the product of the absolute values of the roots outside the unit circle. That is, if

p(z) = a \prod_{i=1}^n(z - a_i)

then

M(p) = |a| \prod_{i=1}^n\max(1, |a_i|)


For example, let p(z) = 7(z − 2)(z − 3)(z + 1/2). Based on the leading coefficient and the roots, we would expect M(p) to be 7 × 2 × 3 = 42, since the root −1/2 lies inside the unit circle. The following Mathematica code confirms this by returning 42.

    z = Exp[2 Pi I theta]
    Exp[Integrate[Log[7 (z - 2) (z - 3) (z + 1/2)], {theta, 0, 1}]]
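For a check that does not depend on symbolic integration, here is a Python sketch. It approximates the integral by averaging log|p| over equally spaced points on the unit circle and compares the result with the formula based on the roots. Because the integrand is smooth and periodic (p has no roots on the unit circle), this simple rule converges very quickly. The sample count of 2000 is an arbitrary choice.

```python
import cmath
import math

def p(z):
    return 7 * (z - 2) * (z - 3) * (z + 0.5)

def mahler_numeric(poly, samples=2000):
    """exp of the average of log|poly| over `samples` equally spaced points on |z| = 1."""
    total = 0.0
    for k in range(samples):
        z = cmath.exp(2j * math.pi * k / samples)
        total += math.log(abs(poly(z)))
    return math.exp(total / samples)

def mahler_from_roots(leading, roots):
    """|leading| times the product of max(1, |root|) over all roots."""
    return abs(leading) * math.prod(max(1, abs(r)) for r in roots)

print(mahler_numeric(p))                   # approximately 42
print(mahler_from_roots(7, [2, 3, -0.5]))  # 42
```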

Multiplication and heights

Mahler measure is multiplicative: for any two polynomials p and q, the measure of their product is the product of their measures.

M(pq) = M(p)\,M(q)
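Here is a numerical illustration: approximate M(p), M(q), and M(pq) by averaging the logarithm of the absolute value over the unit circle. Here p is the example polynomial from above, and q(z) = 3(z − 1/3)(z + 5) is an arbitrary second example with M(q) = 3 × 5 = 15. So M(pq) should come out to 42 × 15 = 630.

```python
import cmath
import math

def mahler_numeric(poly, samples=2000):
    """exp of the average of log|poly| over `samples` equally spaced points on |z| = 1."""
    total = 0.0
    for k in range(samples):
        z = cmath.exp(2j * math.pi * k / samples)
        total += math.log(abs(poly(z)))
    return math.exp(total / samples)

def p(z):
    return 7 * (z - 2) * (z - 3) * (z + 0.5)  # M(p) = 7 * 2 * 3 = 42

def q(z):
    return 3 * (z - 1/3) * (z + 5)            # M(q) = 3 * 5 = 15

def pq(z):
    return p(z) * q(z)

print(mahler_numeric(p) * mahler_numeric(q), mahler_numeric(pq))  # both approximately 630
```

In a sense this is no surprise. Since log|pq| = log|p| + log|q| at every point of the circle, multiplicativity follows directly from the definition as the exponential of an average.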

A few days ago I wrote about height functions for rational numbers. Mahler measure is a height function for polynomials, and there are theorems bounding Mahler measure by other height functions, such as the sum or maximum of the absolute values of the coefficients.
