# Recognizing three-digit primes

If a three-digit number looks like it might be prime, there’s about a 2 in 3 chance that it is.

To be more precise about what it means for a number to “look like a prime,” let’s say that a number is obviously composite if it is divisible by 2, 3, 5, or 11. Then the following Python code quantifies the claim above.

```    from sympy import gcd, isprime

obviously_composite = lambda n: gcd(n, 2*3*5*11) > 1

primes = 0
nonobvious = 0

for n in range(100, 1000):
if not obviously_composite(n):
nonobvious += 1
if isprime(n):
primes += 1
print(primes, nonobvious)
```

This shows that out of 218 numbers that are not obviously composite, 143 are prime.

This is a fairly conservative estimate. It doesn’t consider 707 an obvious composite, for example, even though it’s pretty clear that 707 is divisible by 7. And if you recognize squares like 169, you can add a few more numbers to your list of obviously composite numbers.

# Overpowered proof that π is transcendental

There is no polynomial with rational coefficients that evaluates to 0 at π. That is, π is a transcendental number, not an algebraic number. This post will prove this fact as a corollary of a more advanced theorem. There are proof that are more elementary and direct, but the proof given here is elegant.

A complex number z is said to be algebraic if it is the root of a polynomial with rational coefficients. The set of all algebraic numbers forms a field F.

The Lindemann-Weierstrass theorem says that if

α1, α2, …, αn

is a set of distinct algebraic numbers, then their exponentials

exp(α1), exp(α2), …, exp(αn)

are linearly independent. That is, no linear combination of these numbers with rational coefficients is equal to 0 unless all the coefficients are 0.

Assume π is algebraic. Then πi would be algebraic, because i is algebraic and the product of algebraic numbers is algebraic.

Certainly 0 is algebraic, and so the Lindemann-Weierstrass theorem would say that exp(πi) and exp(0) are linearly independent. But these two numbers are not independent because

exp(πi) + exp(0) = -1 + 1 = 0.

So we have a proof by contradiction that π is not algebraic, i.e. π is transcendental.

I found this proof in Excursions in Number Theory, Algebra, and Analysis by Kenneth Ireland and Al Cuoco.

# Piranhas and prime factors

The piranha problem says an event cannot be highly correlated with a large number of independent predictors. If you have a lot of strong predictors, they must predict each other, analogous to having too many piranhas in a small body of water: they start to eat each other.

The piranha problem is subtle. It can sound obvious or mysterious, depending on how you state it. You can find several precise formulations of the piranha problem here.

## Prime piranhas

An analog of the piranha problem in number theory is easier to grasp. A number N cannot have two prime factors both larger than its square root, nor can it have three prime factors all larger than its cube root. This observation is simple, obvious, and useful.

For example, if N is a three-digit number, then the smallest prime factor of N cannot be larger than 31 unless N is prime. And if N has three prime factors, at least one of these must be less than 10, which means it must be 2, 3, 5, or 7.

There are various tricks for testing divisibility by small primes. The tricks for testing divisibility by 2, 3, and 5 are well known. Tricks for testing divisibility by 7, 11, and 13 are moderately well known. Tests for divisibility by larger primes are more arcane.

Our piranha-like observation about prime factors implies that if you know ways to test divisibility by primes less than p, then you can factor all numbers up to p² and most numbers up to p³. The latter part of this statement is fuzzy, and so we’ll explore it a little further.

## How much is “most”?

For a given prime p, what proportion of numbers less than p³ have two factors larger than p? We can find out with the following Python code.

```    from sympy import factorint

def density(p, N = None):
if not N:
N = p**3
count = 0
for n in range(1, N):
factors = factorint(n).keys()
if len([k for k in factors if k > p]) == 2:
count += 1
return count/N
```

The code is a little more general than necessary because in a moment we will like to consider a range that doesn’t necessarily end at p³.

Let’s plot the function above for the primes less than 100. Short answer: “most” means roughly 90% for primes between 20 and 100.

The results are very similar if we pass in a value of N greater than p³. About 9% of numbers less than 1,000 have two prime factors greater than 10, and about 12% of numbers less than 1,000,000 have two prime factors greater than 100.

# Density of safe primes

Sean Connolly asked in a comment yesterday about the density of safe primes. Safe primes are so named because Diffie-Hellman encryption systems based on such primes are safe from a particular kind of attack. More on that here.

If q and p = 2q + 1 are both prime, q is called a Sophie Germain prime and p is a safe prime. We could phrase Sean’s question in terms of Sophie Germain primes because every safe prime corresponds to a Sophie Germain prime.

It is unknown whether there are infinitely many Sophie Germain primes, so conceivably there are only a finite number of safe primes. But the number of Sophie Germain primes less than N is conjectured to be approximately

1.32 N / (log N)².

See details here.

Sean asks specifically about the density of safe primes with 19,000 digits. The density of Sophie Germain primes with 19,000 digits or less is conjectured to be about

1.32/(log 1019000)² = 1.32/(19000 log 10)² = 6.9 × 10-10.

So the chances that a 19,000-digit number is a safe prime are on the order of one in a billion.

# Famous constants and the Gumbel distribution

The Gumbel distribution, named after Emil Julius Gumbel (1891–1966), is important in statistics, particularly in studying the maximum of random variables. It comes up in machine learning in the so-called Gumbel-max trick. It also comes up in other applications such as in number theory.

For this post, I wanted to point out how a couple famous constants are related to the Gumbel distribution.

## Gumbel distribution

The standard Gumbel distribution is most easily described by its cumulative distribution function

F(x) = exp( −exp(−x) ).

You can introduce a location parameter μ and scale parameter β the usual way, replacing x with (x − μ)/β and dividing by β.

Here’s a plot of the density. ## Euler-Mascheroni constant γ

The Euler-Mascheroni constant γ comes up frequently in applications. Here are five posts where γ has come up.

The constant γ comes up in the context of the Gumbel distribution two ways. First, the mean of the standard Gumbel distribution is γ. Second, the entropy of a standard Gumbel distribution is γ + 1.

## Apéry’s constant ζ(3)

The values of the Riemann zeta function ζ(z) at positive even integers have closed-form expressions given here, but the values at odd integers do not. The value of ζ(3) is known as Apéry’s constant because Roger Apéry proved in 1978 that ζ(3) is irrational.

Like the Euler-Mascheroni constant, Apéry’s constant has come up here multiple times. Some examples:

The connection of the Gumbel distribution to Apéry’s constant is that the skewness of the distribution is

12√6 ζ(3)/π³.

# Prime numbers and Taylor’s law

The previous post commented that although the digits in the decimal representation of π are not random, it is sometimes useful to think of them as random. Similarly, it is often useful to think of prime numbers as being randomly distributed.

If prime numbers were samples from a random variable, it would be natural to look into the mean and variance of that random variable. We can’t just compute the mean of all primes, but we can compute the mean and variance of all primes less than an upper bound x.

Let M(x) be the mean of all primes less than x and let V(x) be the corresponding variance. Then we have the following asymptotic results:

M(x) ~ x / 2

and

V(x) ~ x²/12.

We can investigate how well these limiting results fit for finite x with the following Python code.

```    from sympy import sieve

def stats(x):
s = 0
ss = 0
count = 0
for p in sieve.primerange(x):
s += p
ss += p**2
count += 1
mean = s / count
variance = ss/count - mean**2
return (mean, variance)
```

So, for example, when x = 1,000 we get a mean of 453.14, a little less than the predicted value of 500. We get a variance of 88389.44, a bit more than the predicted value of 83333.33.

When x = 1,000,000 we get closer to values predicted by the limiting formula. We get a mean of 478,361, still less than the prediction of 500,000, but closer. And we get a variance of 85,742,831,604, still larger than the prediction 83,333,333,333, but again closer. (Closer here means the ratios are getting closer to 1; the absolute difference is actually getting larger.)

## Taylor’s law

Taylor’s law is named after ecologist Lionel Taylor (1924–2007) who proposed the law in 1961. Taylor observed that variance and mean are often approximately related by a power law independent of sample size, that is

V(x) ≈ a M(x)b

independent of x.

Taylor’s law is an empirical observation in ecology, but it is a theorem when applied to the distribution of primes. According to the asymptotic results above, we have a = 1/3 and b = 2 in the limit as x goes to infinity. Let’s use the code above to look at the ratio

V(x) / a M(x)b

for increasing values of x.

If we let x = 10k for k = 1, 2, 3, …, 8 we get ratios

0.612, 1.392, 1.291, 1.207, 1.156, 1.124, 1.102, 1.087

which are slowly converging to 1.

## Related posts

Reference: Joel E. Cohen. Statistics of Primes (and Probably Twin Primes) Satisfy Taylor’s Law from Ecology. The American Statistician, Vol. 70, No. 4 (November 2016), pp. 399–404

# The coupon collector problem and π

How far do you have to go down the decimal digits of π until you’ve seen all the digits 0 through 9?

We can print out the first few digits of π and see that there’s no 0 until the 32nd decimal place.

3.14159265358979323846264338327950

It’s easy to verify that the remaining digits occur before the 0, so the answer is 32.

Now suppose we want to look at pairs of digits. How far out do we have to go until we’ve seen all pairs of digits (or base 100 digits if you want to think of it that way)? And what about triples of digits?

We know we’ll need at least 100 pairs, and at least 1000 triples, so this has gotten bigger than we want to do by hand. So here’s a little Python script that will do the work for us.

```    from mpmath import mp

mp.dps = 30_000
s = str(mp.pi)[2:]

for k in [1, 2, 3]:
tuples = [s[i:i+k] for i in range(0, len(s), k)]
d = dict()
i = 0
while len(d) < 10**k:
d[tuples[i]] = 1
i += 1
print(i)
```

The output:

```    32
396
6076
```

This confirms that we at the 32nd decimal place we will have seen all 10 possible digits. It says we need 396 pairs of digits before we see all 100 possible digit pairs, and we’ll need 6076 triples before we’ve seen all possible triples.

We could have used the asymptotic solution to the “coupon collector problem” to approximately predict the results above.

Suppose you have an urn with n uniquely labeled balls. You randomly select one ball at a time, return the ball to the run, and select randomly again. The coupon collector problem ask how many draws you’ll have to make before you’ve selected each ball at least once.

The expected value for the number of draws is

n Hn

where Hn is the nth harmonic number. For large n this is approximately equal to

n(log n + γ)

where γ is the Euler-Mascheroni constant. (More on the gamma constant here.)

Now assume the digits of π are random. Of course they’re not random, but random is as random does. We can get useful estimates by making the modeling assumption that the digits behave like a random sequence.

The solution to the coupon collector problem says we’d expect, on average, to sample 28 digits before we see each digit, 518 pairs before we see each pair, and 7485 triples before we see each triple. “On average” doesn’t mean much since there’s only one π, but you could interpret this as saying what you’d expect if you repeatedly chose real numbers at random and looked at their digits, assuming the normal number conjecture.

The variance on the number of draws needed is asymptotically π² n²/6, so the number of draws with usually be an interval of the expected value ± 2n.

If you want the details of the coupon collector problem, not just the expected value but the probabilities for different number of draws, see Sampling with replacement until you’ve seen everything.

# Numbering minor league baseball teams Last week I wrote about how to number MLB teams so that the number n told you where they are in the league hierarchy:

• n % 2 tells you the league, American or National
• n % 3 tells you the division: East, Central, or West
• n % 5 is unique within a league/division combination.

Here n % m denotes n mod m, the remainder when n is divided by m.

This post will do something similar for minor league teams.

There are four minor league teams associated with each major league team. If we wanted to number them analogously, we’d need to do something a little different because we cannot specify n % 2 and n % 4 independently. We’d need an approach that is a hybrid of what we did for the NFL and MLB.

We could specify the league and the rank within the minor leagues by three bits: one bit for National or American league, and two bits for the rank:

• 00 for A
• 01 for High A
• 10 for AA
• 11 for AAA

It will be convenient later on if we make the ranks the most significant bits and the league the least significant bit.

So to place a minor league team on a list, we could write down the numbers 1 through 120, and for each n, calculate r = n % 8, d = n % 3, and k = n % 5.

The latest episode of 99% Invisible is called RoboUmp, a show about automating umpire calls. As part of the story, the show discusses the whimsical names of minor league teams and how the names allude to their location. For example, the El Paso Chihuahuas are located across the border from the Mexican state of Chihuahua and their mascot is a chihuahua dog. (The dog was named after the state.)

The El Paso Chihuahuas are the AAA team associated with the San Diego Padres, a team in the National League West, team #3 in the order listed in the MLB post. The number n for the Chihuahuas must equal 7 mod 8, 111two, the first bit for National League and the last two bits for AAA. We also require n to be 2 mod 3 because it’s in the West, and n = 3 mod 5 because the Padres are #3 in the list of National League West teams in our numbering. It works out that n = 23.

How do minor league and major league numbers relate? They have to be congruent mod 30. They have to have the same parity since they represent the same league, and must be congruent mod 3 because they have in the same division. And they must be congruent mod 5 to be in the same place in the list of associated major league teams.

So to calculate a minor league team’s number, start with the corresponding major league number, and add multiples of 30 until you get the right value mod 8.

For example, the Houston Astros are number 20 in the list from the earlier post. The Triple-A team associated with the Astros is the Sugar Land Space Cowboys. The number n for the Space Cowboys must be 6 mod 8 because 6 = 110two, and they’re a Triple-A team (11) in the American League (0). So n = 110.

The Astros’ Double-A team, the Corpus Christi Hooks, needs to have a number equal to 100two = 4 mod 8, so n = 20. The High-A team, the Asheville Tourists, are 50 and the Single-A team, the Fayetteville Woodpeckers, is 80.

You can determine what major league team is associated with a minor league team by taking the remainder by 30. For example, the Rocket City Trash Pandas has number 77, so they’re associated with the major league team with number 17, which is the Los Angeles Angels. The remainder when 77 is divided by 8 is 5 = 101two, which tells you they’re a Double-A team since the high order bits are 1 and 0.

# John Conway’s mental factoring method and friends

There are tricks for determining whether a number is divisible by various primes, but many of these tricks have to be applied one at a time. You can make a procedure for testing divisibility by any prime p that is easier than having to carry out long division, but these rules are of little use if every one of them is different.

Say I make a rule for testing whether a number is divisible by 59. That’s great, if you routinely need to test divisibility by 59. Maybe you work for a company that, for some bizarre reason, ships widgets in boxes of 59 and you frequently have to test whether numbers are multiples of 59.

When you want to factor numbers, you’d like to test divisibility by a set of primes at once, using fewer separate algorithms, and taking advantage of work you’ve already done.

John Conway came up with his 150 Method to test for divisibility by a sequence of small primes. This article explains how Conway’s 150 method and a couple variations work. The core idea behind Conway’s 150 Method, his 2000 Method, and analogous methods developed by others is this:

1. Find a range of integers, near a round number, that contains a lot of distinct prime factors.
2. Reduce your number modulo the round number, then test for divisibility sequentially, reusing work.

Conway’s 150 Method starts by taking the quotient and remainder by 150. And you’ll never guess what his 2000 Method does. :)

This post will focus on the pattern behind Conway’s method, and similar methods. For examples and practical tips on carrying out the methods, see the paper linked above and a paper I’ll link to below.

## The 150 Method

Conway exploited the fact that the numbers 152 through 156 are divisible by a lot of primes: 2, 3, 5, 7, 11, 13, 17, 19, and 31.

He starts his method with 150 rather than 152 because 150 is a round number and easier to work with. We start by taking the quotient and remainder by 150.

Say n = 150q + r. Then n – 152q = r – 2q. If n has three or four digits, q only has one or two digits, and so subtracting q is relatively easy.

Since 19 divides 152, we can test whether n is divisible by 19 by testing whether r – 2q is divisible by 19.

The next step is where sequential testing saves effort. Next we want to subtract off a multiple of 153 to test for divisibility by 17, because 17 divides 153. But we don’t have to start over. We can reuse our work from the previous step.

We want n – 153q = (n – 152q) – q, and we’ve already calculated n – 152q in the previous step, so we only need to subtract q.

The next step is to find n – 154q, and that equals (n – 153q) – q, so again we subtract q from the result of the previous step. We repeat this process, subtracting q each time, and testing for divisibility by a new set of primes each time.

## The 2000 method

Conway’s more extensive method exploited the fact that the numbers 1998 through 2021 are divisible by all primes up to 67. So he would start by taking the quotient and remainder by 2000, which is really easy to do.

Say n = 2000q + r. Then we would add (or subtract) q each time.

You could start with r, then test r for divisibility by the factors of 2000, then test rq for divisibility by the factors of 2001, then test r – 2q for divisibility by the factors of 2002, and so on up to testing r – 21q for divisibility by the factors of 2021. Then you’d need to go back and test r + q for divisibility by the factors of 1999 and test r + 2q for divisibility by the factors of 1998.

In principle that’s how Conways 2000 Method works. In practice, he did something more clever.

Most of the prime factors of the numbers 1998 through 2021 are prime factors of 1998 through 2002, so it makes sense to test this smaller range first hoping for early wins. Also, there’s no need to test divisibility by the factors of 1999 because 1999 is prime.

Conway tested rkq for k = -2 through 21, but not sequentially. He would try out the values of k in an order most likely to terminate the factoring process early.

## The 10,000 method

This paper gives a much more extensive approach to mental factoring than Conway’s 150 method. The authors, Hilarie Orman and Richard Schroeppel, outline a strategy for factoring any six-digit number. Conway’s rule is more modest, intended for three and four digit numbers.

Orman and Schroeppel suggest a sequence of factoring methods, including more advanced techniques to use after you’ve tried testing for divisibility by small primes. One of the techniques in the paper might be called the 10,000 Method by analogy to Conway’s method, though the authors don’t call it that. They call it “check the m‘s” for reasons that make more sense if you read the paper.

The 10,000 Method is much like the 2000 Method. The numbers 10,001 through 10,019 have a lot of prime factors, and the method tests for divisibility by these factors sequentially, taking advantage of previous work at each step, just as Conway’s methods do. The authors do not backtrack the way Conway did; they test numbers in order. However, they do skip over some numbers, like Conway skipped over 1999.

# Major League Baseball and number theory

The previous post took a mathematical look at the National Football League. This post will do the same for Major League Baseball.

Like the NFL, MLB teams are organized into a nice tree structure, though the MLB tree is a little more complicated. There are 32 NFL teams organized into a complete binary tree, with a couple levels collapsed. There are 30 MLB teams, so the tree structure has to be a bit different.

MLB has leagues rather than conferences, but the top-level division is into American and Nation as with the NFL. So the top division is into the American League and the National League.

And as with football, the next level of the hierarchy is divisions. But baseball has three divisions—East, Central, and West—in contrast to four in football.

Each division has five baseball teams, while each football division has four teams.

Here’s the basic tree structure. Under each division are five teams. Here’s a PDF with the full graph including teams.

## Geography

How do the division names correspond to actual geography?

Within each league, the Central teams are to the west of the East teams and to the east of the West teams, with one exception: in the National League, the Pittsburgh Pirates are a Central division team, but they are east of the Atlanta Braves and Miami Marlins in the East division. But essentially the East, Central, and West divisions do correspond to geographic east, center, and west, within a league.

## Numbering

We can’t number baseball teams as elegantly as the previous post numbered football teams. We’d need a mixed-base number. The leading digit would be binary, the next digit base 3, and the final digit base 5.

We could number the teams so that you could tell the league and division of the team by looking at the remainders when the number is divided by 2 and 3, and each team is unique mod 5. By the Chinese Remainder Theorem, we can solve the system of congruence equations mod 30 that specify the value of a number mod 2, mod 3, and mod 5.

If we number the teams as follows, then even numbered teams are in the American League and odd numbered teams are in the National League. When the numbers are divided by 3, those with remainder 0 are in an Eastern division, those with remainder 1 are in a Central division, and those with remainder 2 are in a Western division. Teams within the same league and division have unique remainders by 5.

1. Cincinnati Reds
2. Oakland Athletics
4. Minnesota Twins
5. Arizona Diamondbacks
6. Boston Red Sox
7. Milwaukee Brewers
8. Seattle Mariners
9. Washington Nationals
10. Chicago Whitesocks
12. New York Yankees
13. Pittsburgh Pirates
14. Texas Rangers
15. Atlanta Braves
16. Cleveland Guardians
17. Los Angeles Dodgers
18. Tampa Bay Rays
19. St. Louis Cardinals
20. Houston Astros
21. Miami Marlins
22. Detroit Tigers