# Consecutive coupon collector problem

## Coupon collector problem

Suppose you have a bag of balls labeled 1 through 1,000. You draw draw balls one at a time and put them back after each draw. How many draws would you have to make before you’ve seen every ball at least once?

This is the coupon collector problem with N = 1000, and the expected number of draws is

N HN

where

HN = 1 + 1/2 + 1/3 + … + 1/N

is the Nth harmonic number.

As N increases, HN approaches log(N) + γ where γ = 0.577… is the Euler-Mascheroni constant, and so the expected time for the coupon collector problem is approximately

N (log(N) + γ).

## Consecutive draws

Now suppose that instead of drawing single items, you draw blocks of consecutive items. For example, suppose the 1,000 balls are arranged in a circle. You pick a random starting point on the circle, then scoop up 10 consecutive balls, then put them back. Now how long would it take to see everything?

By choosing consecutive balls, you make it harder for a single ball to be a hold out. Filling in the holes becomes easier.

## Bucketed problem

Now suppose the 1,000 balls are placed in 100 buckets and the buckets are arranged in a circle. Now instead of choosing 10 consecutive balls, you choose a bucket of 10 balls. Now you have a new coupon collector problem with N = 100.

This is like the problem above, except you are constraining your starting point to be a multiple of n.

## Upper and lower bounds

I’ll use the word “scoop” to mean a selection of n balls at a time to avoid possible confusion over drawing individual balls or groups of balls.

If you scoop n balls at a time by making n independent draws, then you just have the original coupon collector problem, with the expected time divided by n.

If you scoop up n consecutively numbered balls each time, you reduce the expected time to see everything at least once. But your scoops can still overlap. For example, maybe you selected 13 through 22 on one draw, and 19 through 38 on the next.

In the bucketed problem, you reduce the expected time even further. Now your scoops will not partially overlap. (But they may entirely overlap, so it’s not clear that this reduces the total time.)

It would seem that we have sandwiched our problem between two other problems we have the solution to. The longest expected time would be if our scoop is made of n independent draws. Then the expected number of scoops is

N HN / n.

The shortest time is the bucketed problem in which the expected number of scoops is

(N/n) H(N/n).

It seems the problem of scooping n consecutive balls, with no constraint on the starting point, would have expected time somewhere between these two bounds. I say “it seems” because I haven’t proven anything here, just given plausibility arguments.

By the way, we can see how much bucketing reduces the expected time by using the log approximation above. With n independent draws each time, the expected number of scoops is roughly

(N/n) log(N)

whereas with the bucketed problem the expected number of scoops is roughly

(N/n) log(N/n).

## Expected number of scoops

I searched a bit on this topic, and I found many problems with titles like “A variation on the coupon collector problem,” but none of the papers I found considered the variation I’ve written about here. If you work out the expected number of scoops, or find a paper where someone has worked this out, please let me know.

The continuous analog seems like an easier problem, and one that would provide a good approximation. Suppose you have a circle of circumference N and randomly place arcs of length n on the circle. What is the expected time until the circle is covered? I imagine this problem has been worked out many times and may even have a name.

## Simulation results

When N = 1000 and n = 10, the upper and lower bounds work out to 748 and 518.

When I simulated the consecutive coupon collector problem I got an average of 675 scoops, a little more than the average of the upper and lower bounds.

# Regular solids and Monte Carlo integration

Monte Carlo integration is not as simple in practice as it is often introduced. A homework problem might as you to integrate a function of two variables by selecting random points from a cube and counting how many of the points fall below the graph of the function. This would indeed give you an estimate of the volume bounded by the surface and hence the value of the integral.

But Monte Carlo integration is most often used in high dimensions, and the region of interest takes up a tiny proportion of a bounding box. In practice you’d rarely sample uniformly from a high-dimensional box. This post will look at sampling points on a (possibly high-dimensional) sphere.

The rate of convergence of Monte Carlo integration depends on the variance of the samples, and so people look for ways to reduce variance. Antipodal sampling is one such approach. The idea is that a function on a sphere is likely to take on larger values on one side of the sphere and smaller values on the other. So for every point x where the function is sampled, it is also sampled at the diametrically opposite point −x on the assumption/hope that the values of the function at the two points are negatively correlated.

Antipodal sampling is a first step in the direction of a hybrid of regular and random sampling, sampling by random choices of regularly spaced points, such as antipodal points. When this works well, you get a sort of synergy, an integration method that converges faster than either purely systematic or purely random sampling.

If a little is good, then more is better, right? Not necessarily, but maybe, so it’s worth exploring. If I remember correctly, Alan Genz explored this. Instead of just taking antipodal points, you could sample at the points of a regular solid, like a tetrahedron. Randomly select and initial point, create a tetrahedron on the sphere with this as one of the vertices, and sample your integrand at each of the vertices. Or you could think of having a tetrahedron fixed in space and randomly rotating the sphere so that the sphere remains in contact with the vertices of the tetrahedron.

If you’re going to sample at the vertices of a regular solid, you’d like to know what regular solids are possible. In three dimensions, there are five: tetrahedron, hexahedron (cube), octahedron, dodecahedon, and icosahedron. Only the first three of these generalize to dimensions 5 and higher, so you only have three choices in high dimension if you want to sample at the vertices of a regular solid.

Here’s more about the cross polytope, the generalization of the octahedron.

If you want more regularly-spaced points on a sphere than regular solids will permit, you could compromise and use points whose spacing is approximately regular, such as the Fibonacci lattice. You could randomly rotate your Fibonacci lattice to create a randomized quasi-Monte Carlo (RQMC) method.

You have a knob you can turn determining the amount of regularity and the amount of randomness. At one extreme is purely random sampling. As you turn the knob you go from antipodes to tetrahedra and up to cross polytopes. Then there’s a warning line, but you can keep going with things like the Fibonacci lattice, introducing some distortion, sorta like turning the volume of a speaker up past the point of distortion.

In my experience, I’ve gotten the best results near the random sampling end of the spectrum. Antipodal sampling sped up convergence, but other methods not so much. But your mileage may vary; the results depend on the kind of function you’re integrating.

# Cross-platform way to enter Unicode characters

The previous post describes the hoops I jumped through to enter Unicode characters on a Mac. Here’s a script to run from the command line that will copy Unicode characters to the system clipboard. It runs anywhere the Python module `pyperclip` runs.

```    #!/usr/bin/env python3

import sys
import pyperclip

cp = sys.argv
ch = eval(f"chr(0x{cp})")
print(ch)
pyperclip.copy(ch)
```

I called this script `U` so I could type

`    U 03c0`

at the command line, for example, it would print π to the command line and also copy it to the clipboard.

Unlike the MacOS solution in the previous post, this works for any Unicode value, i.e. for code points above FFFF.

On my Linux box I had to install `xclip` before `pyperclip` would work.

# Using Unicode on MacOS

Setting up Unicode on my MacBook took some research, so I’m leaving myself a note here if I need to do it again. Maybe it’ll help someone else too.

Update: I’ve gotten some feedback on this article that suggest people imagine that I want to use this approach to enter large quantities of text, such as typing Cyrillic text one Unicode code point at a time. This is not the case. If I wanted to type Cyrillic text, I’d use a (virtual) Cyrillic keyboard. The use case I have in mind is typing symbols or maybe a single word from a foreign language.

From the System Settings dialog, go to Keyboard and click the Edit button next to Input Sources. Click on the + sign in the lower left corner to add a keyboard. Scroll down to the bottom and click on Other to add Unicode as a keyboard. Now you can toggle between your previous keyboard(s) and Unicode Hex Input. When the latter is active, you can hold down the globe key and type the hex value of a Unicode character to enter that character. For example, you can hold down the globe key and type 03C0 to enter a π symbol. This only works for Unicode characters that can be written with four hexadecimal characters, i.e. up to FFFF, code points in the Basic Multilingual Plane.

## Remapping keys

I’ve remapped my keys so that my muscle memory works across operating systems, so I have the functionality of the globe key mapped to the key. So I hold down the key labeled “command” to enter Unicode characters. More on how I remap keys here.

## Fixing Emacs

When I added the Unicode keyboard, C + space quit working in Emacs because the operating system hijacked that key. Removing the Unicode keyboard doesn’t put the system back like it was.

You can use C + @ instead of C + space for `set-mark-command` but this is an awkward keybinding, especially for such a common function. I bound a different key sequence to `set-mark-command` that I find easier to type.

## Command line

See the next post for a command line script that will copy any Unicode character to the clipboard, including code points higher than the solution above can handle.

# Circular coordinate art

About three years ago I ran across a strange coordinate system in which familiar functions lead to interesting plots. The system is called “circular coordinates” but it is not polar coordinates.

This morning I was playing around with this again.

Here’s a plot of f(x) = x. And here’s a plot of f(x) = cos(8x). See this post for details of circular coordinates.

Here is Python code to make the plots. You can experiment with your own plots by changing the definition of f.

```# See Mathematics Magazine, v 52 no 3, p175

from numpy import cos
from numpy import linspace
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-muted')

def g(u, c, f):
t = f(u) + c

return 2*u*t**2 / (u**2 + t**2)

def h(u, c, f):
t = f(u) + c
return 2*u*u*t / (u**2 + t**2)

t = linspace(-7, 7, 10000)
fig, ax = plt.subplots()
f = lambda x: cos(8*x)
for c in range(-10, 11):
ax.plot(g(t, c, f), h(t, c, f))
plt.axis("off")
plt.show()
```

# When there is only one group of a given size

Today’s date, US style, is 9/26/2023, and there is only one group, up to isomorphism, of size 9262023. You could verify this in Mathematica with the command

`    FiniteGroupCount`

which returns 1.

For a given n, when is there only one group of size n?

There are two requirements. First, n has to be the product of distinct primes, i.e. no prime appears in the factorization with a power greater than 1. Second, no prime divides one less than another prime.

Now

9262023 = 3 × 41 × 257 ×293

and you can check that 3 does not divide 40, 256, or 292, nor does 41 divide 2, 252, or 292, etc.

A more compact way to state the criteria above is to say

gcd(n, φ(n)) = 1

where φ(n) is Euler’s totient function, the number of positive numbers less than n and relatively prime to n.

Why are these criteria equivalent? If

n = pqr

then

φ(n) = (p − 1)(q − 1)(r − 1)…

If n and φ(n) have a nontrivial common factor, it has to be one of the prime factors of n, and none of these divide any term of φ(n).

Source: Dieter Jungnickel. On the Uniqueness of the Cyclic Group of Order n. The American Mathematical Monthly, Vol. 99, No. 6. (Jun. – Jul., 1992), pp. 545–547.

# Analogy between prime numbers and simple groups

Simple groups are the building blocks of groups similar to the way prime numbers are the building blocks of integers. This post will unpack this analogy in two ways:

1. How do simple groups compare to prime numbers?
2. How does the composition of simple groups compare to the composition of prime numbers?

The former analogy is stronger than the latter.

## Primes and simple groups

A simple group has no nontrivial subgroups, just a prime number has no nontrivial factors. Except that’s not quite right. A simple group is defined as having no nontrivial normal subgroups. The previous post compares normal and non-normal subgroups. Normal subgroups have nice properties which are necessary for decomposition and composition. You can’t define quotients for non-normal groups.

Every subgroup of an Abelian group is normal, so in the context of Abelian groups it is true that simple groups have no nontrivial subgroups, i.e. the only subgroups of a simple Abelian group G are the identity and G itself. It follows from Sylow’s theorems that the order of a finite Abelian group with no nontrivial factors must be an integer with no nontrivial factors, i.e. a prime number. Every Abelian finite simple group must be isomoprphic to the integers mod p for some prime p.

Non-Abelian finite simple groups do not have prime order, but they not decomposable in the sense described below.

## Composition and decomposition

Prime numbers compose to form other numbers by products. You can also compose groups by taking products, though you need more than that. It is not the case that all finite groups are products of finite simple groups.

Let ℤn denote the cyclic group of order n and let ⊕ denote direct sum. The group ℤ4 is not isomorphic to ℤ2 ⊕ ℤ2. Even in the case of Abelian groups, not all Abelian groups are the direct sum or direct product of simple groups. 

Finite groups can be decomposed into smaller finite simple groups, but we can’t easily or uniquely rebuild a group from this decomposition.

The Jordan-Hölder theorem says that a finite group G has a composition series

1 = H0H1 ⊲ … ⊲ Hn = G

where each H is a maximal normal subgroup of the next, the quotients Hi+1 / Hi of consecutive are simple groups. The composition series is not unique, but all such series are equivalent in a sense that the Jordan-Hölder theorem makes precise.

It seems to me that the composition series ought to be called a decomposition series in that you can start with G and find the H‘s, but it’s a difficult problem, known as “the extension problem,” to reconstruct G from the H‘s, and in general there are multiple solutions.

The analogy to prime numbers would be if there was an essentially unique way to factor a number, but not a unique way to multiply the factors back together.

## Reductionism

Some people thought that the classification of finite simple groups would be the end group theory. That has not been the case. Some also thought sequencing of the human genome would lead to cures for a huge range of diseases. That has not been the case either. Reductionism often produces disappointing results.

## Related posts

 In the context of Abelian groups, (direct) products and coproducts (i.e. direct sums) are isomorphic.

# Normal and non-normal subgroups

The word “normal” in mathematical nomenclature does not always means “usual” or “customary” as it does in colloquial English. Instead, it might that something has a convenient property. That is the case for normal subgroups.

We can do things with normal subgroups that we cannot do with other subgroups, such as take quotients, and so once normal subgroups are introduced in an algebra class, non-normal subgroups disappear from the discussion. A student might be left with the impression that non-normal subgroups don’t exist.

This post will give a very simple example of a group with a non-normal subgroup, and show how we can’t do operations with this group that we can with normal subgroups.

## Definition of normal subgroup

A normal subgroup of a group G is a subgroup N such gN = Ng for any element g in G. That is, if I multiply everything in N by g on the left or I multiply everything in N by g on the right, I get the same set of elements.

This does not mean that gn = ng for every n in N. It means that for every n in N, there is some element m in N such that gn = mg. The elements n and m might be the same, but they might not.

In an Abelian (commutative) group, gn always equals ng, and so all subgroups are normal. But most groups are not Abelian.

## Structure-preserving functions

Along with every algebraic structure there are functions that preserve aspects of that structure. In the category of groups, these structure-preserving functions are called homomorphisms, coming from the Greek words for “same” and “shape.”

A homomorphism between groups gives you a sort of picture of the first group inside the second group in a way that is consistent with the structure of groups. Specifically, if f is a homomorphism from a groups G to a group H, then

f(xy) = f(x) f(y).

Here we denote the group operation by writing two things next to each other. So “xy” means the group operation in G applied to x and y. This operation may or may not be multiplication. The same is true on the right-hand side: f(x) f(y) means the group operation in H applied to f(x) and f(y).

For example, if we let G be the real numbers with the group operation being addition, and we let H be the positive real numbers with the group operation being multiplication, then the exponential function is a homomorphism between these groups:

exp(x + y) = exp(x) exp(y).

## Kernels

The kernel of a homomorphism between G and H is the subset of things in G that get sent to the identity element in H. In the example above, the identity element in H is 1, and the only element in G that gets mapped to 1 is the number 0. It’s not a coincidence that 0 is the identity element in G: homomorphisms always send the identify element of one group to the identity element of the other group. The kernel of a homomorphism always includes the identity, but it may also include more.

Normal subgroups are subgroups that can be kernels. A subgroup of G is normal if and only if there is some homomorphism from G to another group that has that subgroup as its kernel.

## A non-normal subgroup

Let G be the group of permutations on three elements. The group operation is composition, i.e. do one permutation then do the other.

The identity element is the permutation that leaves everything alone. Denote this by 1. Another element of G is the permutation that swaps the first two elements. We’ll denote this by (12).

So if our three elements are a, b, and c, then the permutation 1 takes (abc) to (abc). And the permutation (12) takes (abc) to (b, ac).

The two elements 1 and (12) form a subgroup of G. But we will show that a homomorphism from G to a group H cannot take these two elements, and only these two elements, to the identity on H. It won’t matter what group H is, but we’ll need a name for the identity element in H. Let’s call it e.

Let f be a homomorphism from G to H. () By definition

f(xy) = f(x) f(y)

and by applying this to (xy)z we have

f(xyz) = f(x) f(y) f(z)

If we let z be the inverse of x and y is in the kernel of f, we have

f(xyx-1) = f(x) f(y) f(x-1) = f(x) e f(x)-1 = e.

This says that if y is in the kernel of f, xyx-1 must also be in the kernel of f for every x.

The permutation that swaps the second and third elements, (23), is its own inverse: swap the second and third elements twice and you’re back where you started. So if (12) is in the kernel of f then so is (23)(12)(23). You can work out that (23)(12)(23) reverses three elements. This shows that if the subgroup consisting of 1 and (12) is in the kernel of f, so is the reverse permutation, which is not part of our subgroup. So our subgroup alone cannot be a kernel of a homomorphism.

# Mersenne primes are unsafe

In the previous post I mentioned that a particular Mersenne prime would be unsuitable for cryptography. In fact, all Mersenne primes are unsuitable for cryptography.

A prime number p is called “safe” if

p = 2q + 1

where q is also a prime. Safe primes are called safe because p − 1 does not have small factors (other than 2). The factors of p − 1 correspond to subgroups of the group used for encryption, and small groups can be exploited to attack encryption.

Mersenne numbers are numbers of the form

Mn = 2n − 1.

Mersenne primes are Mersenne numbers that are also prime. A necessary condition for Mn to be prime is for n to be prime. This condition is not sufficient. For example,

211 − 1 = 23 × 89.

But is necessary, for reasons we’ll get into shortly.

If Mn = 2q + 1, then q = Mn−1. But if n is a prime, then n − 1 is not a prime, with one exception: n = 3. So the only Mersenne prime that is a safe prime is M3 = 7, which is not a particularly large prime. Public key cryptography uses numbers in the thousands of digits, not single digits.

Why does n have to be prime before Mn stands a chance of being prime?

If a > 1, then xa − 1 can be factored:

xa − 1 = (x − 1)(xa−1 + xa−2 + … + 1)

If n can be factored into ab, then set x = 2b. This shows that 2ab − 1 has a factor, namely 2b − 1.

In the previous post we said that M127 − 1 has a lot of small factors. We can find some of those factors easily:

M127 − 1 = 2 M126 = 2 (2126 − 1)

and (2126 − 1) is divisible by 2k – 1 for every k that divides 126.

The nontrivial factors of 126 are 2, 3, 6, 7, 9, 14, 18, 21, 42, 63, and so 2k – 1 is a factor of M126 for k equal to each of these numbers. This is enough to fully factor 2126 − 1 into

3³ × 7² × 19 × 43 × 73 × 127 × 337 × 5419 × 92737 × 649657 × 77158673929

given in the footnote from the previous post. You could easily come up with this factorization using pencil and paper, though it would not be easy to determine by hand that the last factor is a prime number.

# Victorian public key cryptography

Electronic computers were invented before public key cryptography. Would public key cryptography have been possible before computers?

The security of RSA encryption depends on the ratio of the difficulty of factoring relative to the difficulty of multiplication. This ratio was high, maybe higher, before modern computers.

Suppose the idea of RSA encryption had occurred to Lewis Carroll (1832–1898). What key size would have been necessary for security in his day? Would it have been practical to manually encrypt data using keys of that size?

I imagine if you handed Victorians a pair of public and private RSA keys, they could have manually carried out public key encryption. But coming up with private keys, i.e. finding pairs of large prime numbers, would be harder than carrying out the encryption process.

The largest prime discovered without a computer was 2127 − 1, proved prime by Edouard Lucas in 1876. Such primes would have been large enough—I doubt it was feasible to factor the product of 40-digit primes at the time—but this was a prime of a very special form, a Mersenne prime. Lucas had developed an algorithm for efficiently testing whether a Mersenne number is prime. To this day the largest known primes are Mersenne primes. More on this here.

Lucas would not have been able to produce two 40-digit primes. The largest known prime in 1851 had 12 digits:

999,999,000,001

Because of the special form of this number, it would seem that even coming up with 12-digit primes was quite an achievement. Euler (1707–1783) had found a 10-digit prime, but it was also a Mersenne prime. Large primes without special structure were unknown.

Perhaps if Lewis Carroll had found a couple moderately large primes, he might have presented them to his queen to be used in public key cryptography. Their product could be published in newspapers, but the factors would be state secrets. Anyone could send Queen Victoria encrypted messages via public communication.

Diffie-Hellman public key encryption might have been more practical. It only requires one large prime, and that prime can be made public. Everyone can share it.

The prime p that Lucas discovered would do, until people realized that p − 1 has a lot of small factors , which could be used to break Diffie-Hellman cryptography. I don’t know that any large safe primes were known until more recently.

If someone from the future had given the Victorians a large safe prime, Diffie-Hellman cryptography would have been possible, though laborious. Someone could write a steampunk novel about a time traveler giving the pre-computerized world a big safe prime and teaching them Diffie-Hellman cryptography.

***

 2126 − 1 = 3³ × 7² × 19 × 43 × 73 × 127 × 337 × 5419 × 92737 × 649657 × 77158673929

See the next post for a theorem that would allow you to find this factorization by hand.