Refinements to the prime number theorem

Let π(x) be the number of primes less than x. The simplest form of the prime number theorem says that π(x) is asymptotically equal to x/log(x), where log means natural logarithm. That is,

\lim_{x\to\infty} \frac{\pi(x)}{x/\log(x)} = 1

This means that in the limit as x goes to infinity, the relative error in approximating π(x) with x/log(x) goes to 0. However, there is room for improvement. The relative approximation error goes to 0 faster if we replace x/log(x) with li(x) where

\text{li}(x) = \int_0^x \frac{dt}{\log t}

The prime number theorem says that for large x, the error in approximating π(x) by li(x) is small relative to π(x) itself. It would appear that li(x) is not only an approximation for π(x), but it is also an upper bound. That is, it seems that li(x) > π(x). However, that’s not true for all x.

Littlewood proved in 1914 that there is some x for which π(x) > li(x). We still don’t know a specific number x for which this holds, though we know such numbers exist. The smallest such x is the definition of Skewes’ number. The number of digits in Skewes’ number is known to be between 20 and 317, and is believed to be close to the latter.

Littlewood not only proved that li(x) – π(x) is sometimes negative, he proved that it changes sign infinitely often. So naturally there is interest in estimating li(x) – π(x) for very large values of x.

A new result was published a few days ago [1] refining previous bounds to prove that

|\pi(x) - \text{li}(x)| \leq 235 x (\log x)^{0.52} \exp(-0.8\sqrt{\log x})

for all x > exp(2000).

When x = exp(2000), the right side is roughly 10857 and π(x) is roughly 10865, and so the relative error is roughly 10-8. That is, the li(x) approximation to π(x) is accurate to 8 significant figures, and the accuracy increases as x gets larger.


[1] Platt and Trudgian. The error term in the prime number theorem. Mathematics of Computation. November 16, 2020.

Minimizing random Boolean expressions

The previous post looked at all Boolean expressions on three or four variables and how much they can be simplified. The number of Boolean expressions on n variables is


and so the possibilities explode as n increases. We could do n = 3 and 4, but 5 would be a lot of work, and 6 is out of the question.

So we do what we always do when a space is too big to explore exhaustively: we explore at random.

The Python module we’ve been using, qm, specifies a function of n Boolean variables in terms of the set of product terms on which the function evaluates to 1. These product terms can be encoded as integers, and so a Boolean function of n variables corresponds to a subset of the integers 0 through 2n – 1.

We can generate a subset of these numbers by generating a random mask consisting of 0s and 1s, and keeping the numbers where the mask value is 1. We could do this with code like the following.

     N= 2**n
     x = np.arange(N)
     mask = np.random.randint(2, size=N)
     ones = set(mask*x)

There’s a small problem with this approach: the set ones always contains 0. We want it to contain 0 if and only if the 0th mask value is a 1.

The following code generates a Boolean expression on n variables, simplifies it, and returns the length of the simplified expression [1].

    def random_sample(n):
        N = 2**n
        x = np.arange(N)
        mask = np.random.randint(2, size=N)
        ones = set(mask*x)
        if mask[0] == 0:
        return len(qm(ones=ones, dc={}))

We can create several random samples and make a histogram with the following code.

    def histogram(n, reps):
        counts = np.zeros(2**n+1, dtype=int)
        for _ in range(reps):
            counts[random_sample(n)] += 1
        return counts

The data in the following graph comes from calling histogram(5, 1000).

data = [0, 0, 0, 0, 4, 44, 145, 339, 296, 128, 38, 5, 0, 1]

Note that the length of the random expressions is distributed symmetrically around 16 (half of 25). So minimization turns a distribution centered around 16 into a distribution centered around 8.

The code is slow because the Quine-McCluskey algorithm is slow, and our Python implementation of the algorithm isn’t as fast as it could be. But Boolean minimization is an NP problem, so no exact algorithm is going to scale well. To get faster results, we could switch to something like the Expresso Heuristic Logic Minimizer, which often gets close to a minimum expression.


[1] The code above will fail if the set of terms where the function is 1 is empty. However this is extremely unlikely: we’d expect it to happen once in every 2^(2^n) times and so when n = 5 this is less than one time in four billion. The fully correct approach would be to call qm with zeros=x when ones is empty.

How much can Boolean expressions be simplified?

In the previous post we looked at how to minimize Boolean expressions using a Python module qm. In this post we’d like to look at how much the minimization process shortens expressions.

Witn n Boolean variables, you can create 2^n terms that are a product of distinct variables. You can specify a Boolean function by specifying the subset of such terms on which it takes the value 1, and so there are 2^(2^n) Boolean functions on n variables. For very small values of n we can can minimize every possible Boolean function.

To do this, we need a way to iterate through the power set (set of all subsets) of the integers up to 2^n. Here’s a function to do that, borrowed from itertools recipes.

    from itertools import chain, combinations
    def powerset(iterable):
        xs = list(iterable)
        return chain.from_iterable(
            combinations(xs, n) for n in range(len(xs) + 1))

Next, we use this code to run all Boolean functions on 3 variables through the minimizer. We use a matrix to keep track of how long the input expressions are and how long the minimized expressions are.

    from numpy import zeros
    from qm import q 

    n = 3
    N = 2**n
    tally = zeros((N,N), dtype=int)
    for p in powerset(range(N)):
        if not p:
            continue # qm can't take an empty set
        i = len(p)
        j = len(qm(ones=p, dc={}))
        tally[i-1, j-1] += 1 

Here’s a table summarizing the results [1].

The first column gives the number of product terms in the input expression and the subsequent columns give the number of product terms in the output expressions.

For example, of the expressions of length 2, there were 12 that could be reduced to expressions of length 1 but the remaining 16 could not be reduced. (There are 28 possible input expressions of length 2 because there are 28 ways to choose 2 items from a set of 8 things.)

There are no nonzero values above the main diagonal, i.e. no expression got longer in the process of minimization. Of course that’s to be expected, but it’s reassuring that nothing went obviously wrong.

We can repeat this exercise for expressions in 4 variables by setting n = 4 in the code above. This gives the following results.

We quickly run into a wall as n increases. Not only does the Quine-McCluskey algorithm take about twice as long every time we add a new variable, the number of possible Boolean functions grows even faster. There were 2^(2^3) = 256 possibilities to explore when n = 3, and 2^(2^4) = 65,536 when n = 4.

If we want to explore all Boolean functions on five variables, we need to look at 2^(2^5) = 4,294,967,296 possibilities. I estimate this would take over a year on my laptop. The qm module could be made more efficient, and in fact someone has done that. But even if you made the code a billion times faster, six variables would still be out of the question.

To explore functions of more variables, we need to switch from exhaustive enumeration to random sampling. I may do that in a future post. (Update: I did.)


[1] The raw data for the tables presented as images is available here.

Minimizing boolean expressions

This post will look at how to take an expression for a Boolean function and look for a simpler expression that corresponds to the same function. We’ll show how to use a Python implementation of the Quine-McCluskey algorithm.


We will write AND like multiplication, OR like addition, and use primes for negation. For example,

wx + z


(w AND x) OR (NOT z).

Minimizing expressions

You may notice that the expression

wxz + wxz

can be simplified to wz, for example, but it’s not feasible to simplify complicated expressions without a systematic approach.

One such approach is the Quine-McCluskey algorithm. Its run time increases exponentially with the problem size, but for a small number of terms it’s quick enough [1]. We’ll show how to use the Python module qm which implements the algorithm.

Specifying functions

How are you going to pass a Boolean expression to a Python function? You could pass it an expression as a string and expect the function to parse the string, but then you’d have to specify the grammar of the little language you’ve created. Or you could pass in an actual Python function, which is more work than necessary, especially if you’re going to be passing in a lot of expressions.

A simpler way is pass in the set of places where the function evaluates to 1, encoded as numbers.

For example, suppose your function is

wxyz + wxyz

This function evaluates to 1 when either the first term evaluates to 1 or the second term evaluates to 1. That is, when either

(w, x, y, z) = (1, 1, 0, 1)


(w, x, y, z) = (0, 1, 1, 0).

Interpreting the left sides as binary numbers, you could specify the expression with the set {13, 6} which describes where the function is 1.

If you prefer, you could express your numbers in binary to make the correspondence to terms more explicit, i.e. {0b1101,0b110}.

Using qm

One more thing before we use qm: your Boolean expression might not be fully specified. Maybe you want it to be 1 on some values, 0 on others, and you don’t care what it equals on the rest.

The qm module lets you specify these with arguments ones, zeroes, and dc. If you specify two out of these three sets, qm will infer the third one.

For example, in the code below

    from qm import qm
    print(qm(ones={0b111, 0b110, 0b1101}, dc={}))

we’re asking qm to minimize the expression

xyz + xyz‘ + wxyz.

Since the don’t-care set is empty, we’re saying our function equals 0 everywhere we haven’t said that it equals 1. The function prints

    ['1101', '011X']

which corresponds to

wxyz + wxy,

the X meaning that the fourth variable, z, is not part of the second term.

Note that the minimized expression is not unique: we could tell by inspection that

xyz + xyz‘ + wxyz.

could be reduced to

xy + wxyz.

Also, our code defines a minimum expression to be one with the fewest sums. Both simplifications in this example have two sums. But xy + wxyz is simpler than wxyz + wxy in the sense of having one less term, so there’s room for improvement, or at least discussion, as to how to quantify the complexity of an expression.

In the next post I use qm to explore how much minimization reduces the size of Boolean expressions.


[1] The Boolean expression minimization problem is in NP, and so no known algorithm that always produces an exact answer will scale well. But there are heuristic algorithms like Espresso and its variations that usually provide optimal or near-optimal results.

Rotating symbols in LaTeX

Linear logic uses an unusual symbol, an ampersand rotated 180 degrees, for multiplicative disjunction.


The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package cmll has two commands that produce this character, one semantic and one descriptive: \parr and \invamp [1].

This got me to wondering how you might create a symbol like the one above if there wasn’t one built into a package. You can do that by using the graphicx package and the \rotatebox command. Here’s how you could roll your own par operator:


There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.

The \rotatebox command can rotate any number of degrees, and so you could rotate an ampersand 30° with


to produce a tilted ampersand.


Related posts

[1] The name \parr comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply \par because LaTeX already has a command \par for inserting a paragraph break.)

The name \invamp is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.

The smallest number with a given number of divisors

Suppose you want to find the smallest number with 5 divisors. After thinking about it a little you might come up with 16, because

16 = 24

and the divisors of 16 are 2k where k = 0, 1, 2, 3, or 4.

This approach generalizes: For any prime q, the smallest number with q divisors is 2q-1.

Now suppose you want to find the smallest number with 6 divisors. One candidate would be 32 = 25, but you could do better. Instead of just looking at numbers divisible by the smallest prime, you could consider numbers that are divisible by the two smallest primes. And in fact

12 = 22 3

is the smallest number with 6 divisors.

This approach also generalizes. If h is the product of 2 primes, say h = pq where pq, then the smallest number with h divisors is

2p-1 3q-1.

The divisors come from letting the exponent on 2 range from 0 to p-1 and letting the exponent on 3 range from 0 to q-1.

For example, the smallest number with 35 divisors is

5184 = 27-1 35-1.

Note that we did not require p and q to be different. We said pq, and not p > q. And so, for example, the smallest number with 25 divisors is

1296 = 25-1 35-1.

Now, suppose we want to find the smallest number with 1001 divisors. The number 1001 factors as 7*11*13, which has some interesting consequences. It turns out that the smallest number with 1001 divisors is

213-1 311-1 57-1.

Does this solution generalize? Usually, but not always.

Let h = pqr where p, q, and r are primes with pqr. Then the smallest number with h divisors is

2p-1 3q-1 5r-1

with one exception. The smallest number with 8 divisors would be 30 = 2*3*5 if the theorem always held, but in fact the smallest number with 8 divisors is 24.

In [1] M. E. Gorst examines the exceptions to the general pattern. We’ve looked at the smallest number with h divisors when h is the product of 1, or 2, or 3 (not necessarily distinct) primes. Gorst considers values of h equal to the product of up to 6 primes.

We’ve said that the pattern above holds for all h the product of 1 or 2 primes, and for all but one value of h the product of 3 primes. There are two exceptions for h the product of 4 primes. That is, if h = pqrs where pqrs are primes, then the smallest number with h divisors is

2p-1 3q-1 5r-1 7s-1

with two exceptions. The smallest number with 24 divisors is 23 × 3 × 5, and the smallest number with 3 × 23 divisors is 23 × 32 × 5.

When h is the product of 5 or 6 primes, there are infinitely many exceptions, but they have a particular form given in [1].

The result discussed here came up recently in something I was working on, but I don’t remember now what. If memory serves, which it may not, I wanted to assume something like what is presented here but wasn’t sure it was true.


[1] M. E. Grost. The Smallest Number with a Given Number of Divisors. The American Mathematical Monthly, September 1968, pp. 725-729.

Good news from Pfizer and Moderna

Both Pfizer and Moderna have announced recently that their SARS-COV2 vaccine candidates reduce the rate of infection by over 90% in the active group compared to the control (placebo) group.

That’s great news. The vaccines may turn out to be less than 90% effective when all is said and done, but even so they’re likely to be far more effective than expected.

But there’s other good news that might be overlooked: the subjects in the control groups did well too, though not as well as in the active groups.

The infection rate was around 0.4% in the Pfizer control group and around 0.6% in the Moderna control group.

There were 11 severe cases of COVID in the Moderna trial, out of 30,000 subjects, all in the control group.

There were 0 severe cases of COVID in the Pfizer trial in either group, out of 43,000 subjects.


I think I’ll pass

The other day I saw an article about some math test and thought “I bet I’d blow that away now.”

Anyone who has spent a career using some skill ought to blow away an exam intended for people who have been learning that skill for a semester.

However, after thinking about it more, I’m pretty sure I’d pass the test in question, but I’m not at all sure I’d ace it. Academic exams often test unimportant material that is in the short term memory of both the instructor and the students.

From Timbuktu to …

When I was in middle school, I remember a question that read

It is a long way from ________ to ________.

I made up two locations that were far apart but my answer was graded as wrong.

My teacher was looking for a direct quote from a photo caption in our textbook that said it was a long way from Timbuktu to some place I can’t remember.

That stuck in my mind as the canonical example of a question that doesn’t test subject matter knowledge but tests the incidental minutia of the course itself [1]. A geography professor would stand no better chance of giving the expected answer than I did.

The three reasons …

Almost any time you see a question asking for “the 3 reasons” for something or “the 5 consequences” of this or that, it’s likely a Timbuktu question. In open-world contexts [2], I’m suspicious whenever I see “the” followed by a specific number.

In some contexts you can make exhaustive lists—it makes sense to talk about the 3 branches of the US government or the 5 Platonic solids, but it doesn’t make sense to talk about the 4 causes of World War I. Surely historians could come up with more than 4 causes, and there’s probably no consensus regarding what the 4 most important causes are.

There’s a phrase teaching to the test for when the goal is not to teach the subject per se but to prepare the students to pass a standardized test related to the subject. The phenomena discussed here is sort of the opposite, testing to the teaching.

When you ask students for the 4 causes of WWI, you’re asking for the 4 causes given in lecture or the 4 causes in the text book. You’re not testing knowledge of WWI per se but knowledge of the course materials.

Related posts

[1] Now that I’m in middle age rather than middle school, I could say that the real question was not geography but psychology. The task was to reverse-engineer from an ambiguous question what someone was thinking. That is an extremely valuable skill, but not one I possessed in middle school.

[2] A closed world is one in which the rules are explicitly known, finite, and exhaustive. Chess is a closed world. Sales is not. Academia often puts a box around some part of an open world so it can think of it as a closed world.

Probability of commuting

A couple years ago I wrote a blog post looking at how close the quaternions come to commuting. That is, the post looked at the average norm of xyyx.

A related question would be to ask how often quaternions do commute, i.e. the probability that xyyx = 0 for randomly chosen x and y.

There’s a general theorem for this [1]. For a discrete non-abelian group, the probability that two elements commute, chosen uniformly at random, is never more than 5/8 for any group.

To put it another way, in a finite group either all pairs of elements commute with each other or no more than 5/8 of all pairs commute, with no possibilities in between. You can’t have a group, for example, in which exactly 3 out of 4 pairs commute.

What if we have an infinite group like the quaternions?

Before we can answer that, we’ve got to say how we’d compute probabilities. With a finite group, the natural thing to do is make every point have equal probability. For a (locally compact) infinite group the natural choice is Haar measure.

Subject to some technical conditions, Haar measure is the only measure that interacts as expected with the group structure. It’s unique up to a constant multiple, and so it’s unique when we specify that the measure of the whole group has to be 1.

For compact non-abelian groups with Haar measure, we again get the result that no more than 5/8 of pairs commute.

[1] W. H. Gustafson. What is the Probability that Two Group Elements Commute? The American Mathematical Monthly, Nov., 1973, Vol. 80, No. 9, pp. 1031-1034.

Test for divisibility by 13

There are simple rules for telling whether a number is divisible by 2, 3, 4, 5, and 6.

  • A number is divisible by 2 if its last digit is divisible by 2.
  • A number is divisible by 3 if the sum of its digits is divisible by 3.
  • A number is divisible by 4 if the number formed by its last two digits is divisible by 4.
  • A number is divisible by 5 if its last digit is divisible by 5.
  • A number is divisible by 6 if it is divisible by 2 and by 3.

There is a rule for divisibility by 7, but it’s a little wonky. Let’s keep going.

  • A number is divisible by 8 if the number formed by its last three digits is divisible by 8.
  • A number is divisible by 9 if the sum of its digits is divisible by 9.
  • A number is divisible by 10 if its last digit is 0.

There’s a rule for divisibility by 11. It’s a little complicated, though not as complicated as the rule for 7. I describe the rule for 11 in the penultimate paragraph here.

A number is divisible by 12 if it’s divisible by 3 and 4. (It matters here that 3 and 4 are relatively prime. It’s not true, for example, that a number is divisible by 12 if it’s divisible by 2 and 6.)

But what do you do when you get to 13?

Testing divisibility by 7, 11, and 13

We’re going to kill three birds with one stone by presenting a rule for testing divisibility by 13 that also gives new rules for testing divisibility by 7 and 11. So if you’re trying to factor a number by hand, this will give a way to test three primes at once.

To test divisibility by 7, 11, and 13, write your number with digits grouped into threes as usual. For example,


Then think of each group as a separate number — e.g. 11, 37, and 989 — and take the alternating sum, starting with a + sign on the last term.

989 – 37 + 11

The original number is divisible by 7 (or 11 or 13) if this alternating sum is divisible by 7 (or 11 or 13 respectively).

The alternating sum in our example is 963, which is clearly 9*107, and not divisible by 7, 11, or 13. Therefore 11,037,989 is not divisible by 7, 11, or 13.

Here’s another example. Let’s start with


The alternating sum is

518 – 498 + 894 – 4 = 910

The sum takes a bit of work, but less work than dividing a 10-digit number by 7, 11, and 13.

The sum 910 factors into 7*13*10, and so it is divisible by 7 and by 13, but not by 11. That tells us 4,894,498,518 is divisible by 7 and 13 but not by 11.

Why this works

The heart of the method is that 7*11*13 = 1001. If I subtract a multiple of 1001 from a number, I don’t change its divisibility by 7, 11, or 13. More than that, I don’t change its remainder by 7, 11, or 13.

The steps in the method amount to adding or subtracting multiples of 1001 and dividing by 1000. The former doesn’t change the remainder by 7, 11, or 13, but the latter multiplies the remainder by -1, hence the alternating sum. (1000 is congruent to -1 mod 7, mod 11, and mod 13.) See more formal argument in footnote [1].

So not only can we test for divisibility by 7, 11, and 13 with this method, we can also find the remainders by 7, 11, and 13. The original number and the alternating sum are congruent mod 1001, so they are congruent mod 7, mod 11, and mod 13.

In our first example, n = 11,037,989 and the alternating sum was m = 963. The remainder when m is divided by 7 is 4, so the remainder when n is divided by 7 is also 4. That is, m is congruent to 4 mod 7, and so n is congruent to 4 mod 7. Similarly, m is congruent to 6 mod 11, and so n is congruent to 6 mod 11. And finally m is congruent to 1 mod 13, so n is congruent to 1 mod 13.

Related posts

[1] The key calculation is
10^3 \equiv -1 \bmod 1001 \implies \sum_{i=0}^n a_i10^{3i} \equiv \sum_{i=0}^n (-1)^ia_i \bmod 1001