The fastest ways to test whether a number is prime have some small probability of being wrong. Said another way, it’s easier to tell whether a number is “probably” prime than to tell with certainty that it’s prime. This post looks briefly at algorithms for primality testing then focuses on what exactly it means to say a number is “probably prime.”

The simplest probabilisitic method for testing primes is based on Fermat’s Little Theorem. This theorem says that if *p* is prime then *a*^{p-1} has a remainder of 1 when divided by *p*. If the remainder when dividing *a*^{n-1} by *n* is anything other than 1, we know we do not have a prime. If the remainder is 1, we don’t necessarily have a prime, but the chances that the number is prime have gone up. (At this point, some statisticians are cringing at my wording. I’ll get to these objections in a minute.)

If I repeat the test for several different values of *a*, the probability that my number is prime keeps going up. In principle I could keep testing with new values of *a* until I’m sufficiently satisfied that my number is prime. That’s not quite true, however, since there are some numbers known as Carmichael numbers that will slip through the test for every value of *a*. The smallest Carmichael number is 561. But these numbers are sufficiently rare that you are extraordinarily unlikely to run into one by accident when testing large numbers for primality.

Primality testing involves some very interesting mathematics. I may blog about that sometime, but in this post I want to look at what it means to say a number is “probably prime.” This issue illustrates the differences between two schools of statistical interpretation.

A strict frequentist statistician would argue as follows. Suppose you have a large number *N*, and you run it through a probabilistic primality test. If it fails, the probability of it being prime is 0 because the number is certainly composite. If it passes, the probability that it is prime is either 0 or 1, but we just don’t know which. If it is prime, the probability that it is prime is 1. If it’s not prime, the probability is 0. It’s not very helpful to ask for the probability of a specific number being prime because there’s nothing random going on.

However, just before you write off the statistician as being worthless, he gives a further explanation. He explains that you can indeed talk about the probability of a number being prime if you rephrase your question. Suppose you pick a number at random from some range. Then you can speak of the probability that you *chose* a prime number. The number’s properties are not random but your choice is random. And further you can ask about the conditional probability that your selection will be prime given that it passes a primality test.

Perhaps you don’t find the argument above satisfying. You still want to say “I’ve got a number here and it passed the primality tests. Surely I can say it’s probably prime.” A Bayesian statistician may agree with you. There are varieties of Bayesians, but one school of thought says that probabilities represent uncertainty. Randomness is one cause of uncertainty, but not the only one or even the most important one. E. T. Jaynes was a major proponent of this view. He argued that much of what people say about randomness is nonsense and that we should focus on information instead. As he once said “probabilities do not describe reality, only our information about reality.”

Of course a particular number is either prime or not, but *our knowledge* of whether it is prime is uncertain. So the probability refers to our mental state, not to the number. (In a sense this isn’t too different from the frequentist saying the probability applies to our process of choosing numbers, not the number we drew.)

One criticism of Bayesian statistics is that Bayesian analysis begins with a prior distribution. So our Bayesian statistician would ask “What is your probability that the number will be prime before you select it?” A sensible answer would be to use the Prime Number Theorem to come up with an initial guess. For example, if you’re picking a number with 200 digits, the theorem says that the proportion of 200-digit numbers that are prime is roughly 1/(200 log 10) or about 1/460.

But what if someone chooses a prior that isn’t as clever? They’ll get a different probability that a particular number is prime! But that’s to be expected. If a probability represents uncertainty, some people will have more uncertainty than others because people know different things. What if someone secretly created your 200-digit number by multiplying two 100-digit numbers? His prior probability that the product is prime will be zero because he has no uncertainty.

Leaving aside primality for a moment, suppose I flip a fair coin. Before I flip the coin, a frequentist and a Bayesian agree that the probability that it will come up heads is 1/2. Now suppose the coin has fallen to the floor and only I can see the outcome. My probability of heads is now either 0 or 1. The Bayesian would say that his probability is still 1/2 because his information hasn’t changed. A strict frequentist would say it is now improper to assign a probability to the outcome. The coin’s status has changed from random to fixed but unknown, and one may not use probabilities to quantify uncertain knowledge.

Going back to primality testing, recall we first said 1/460 was a good prior distribution on the probability of a 200-digit number being prime. But we could improve this by saying the prior probability is 0 for even numbers and 1/230 for odd numbers since we can easily rule out half of the 200 digit numbers. Our prior probability has become more refined because it now incorporates more knowledge, namely the fact that big even numbers are not prime. But why stop there?

We could rule out multiples of 3 as well and put all the positive probability on numbers not divisible by 2 or 3. We could continue, and at each step our prior distribution would change. Some people are disturbed by the thought of the prior changing like this, but it’s perfectly consonant with the idea of a prior distribution expressing prior knowledge. The more work you do up front, the less uncertainty you have.

We could even continue this process, *in theory*, until we know the primality status of every 200-digit number and there’s no uncertainty left. We’d be done before we got around to conducting our probabilistic primality test. In practice we’d never finish if we tried to test every 200 digit number for primality. There would be nowhere to store the results even if we could compute them. There are fewer than 10^80 atoms in the universe, so if we stored one result per atom, we could only store a table of primes up to about 80 digits.

Setting aside statistical fine points, suppose we are now quite certain that our number is prime. Say we report that the probability of our number not being prime is less than 1 chance in 10^{50}. Is that really believable? For such a tiny probability, isn’t it more likely that there was a bug in our computer program that produced the probability? Or if we assume the software is perfect, we have to admit that computer hardware has errors. These errors are rare, but are they as rare as one chance in 10^{50}? Or for that matter, what is the probability that radiation may have caused a bit to flip somewhere in the hardware as we were computing?

Suppose we want to know the primality of a number because we’re going to use the number in an encryption algorithm. Then our ultimate concern is whether the encryption keeps our data secure. If the probability of the primality algorithm failing is orders of magnitude less than the probability of other security threats, we should focus our attention elsewhere.

**Related posts**:

Probability and information

What a probability means

Connecting probability and number theory

Thanks!

why is it that a Carmichael number has to be a compositie of at least 3 prime numbers?

Thomas: I don’t know the answer to your question, but here’s a different form of your question that may be easier to prove.

Claim: If p and q are distinct primes, there exists a number a not divisible by p or q such that a

^{pq-1}is congruent to 1 mod pq.I don’t have a proof for this. Maybe it has something to do with φ(pq) = (p-1)(q-1) and Euler’s generalization of Fermat’s theorem: a

^{φ(n)}is congruent to 1 if a is relatively prime to n.John, I have never seen your claim, much less a proof. Thanks anyways though.

I have come to believe that a lot of the frequentist/bayesian debate is due to the lack of a solid definition for “random sample.” Let f be the standard normal distribution on R. Then x in R is a “random sample” with respect to f if… if what? What is the definition? I have a 200-digit number written down on a piece of paper. I assume it to be a random sample from the set of all 200-digit numbers. Does such a statement have any meaning? I could see that if you had an infinite sequence of x’s, then you *might* be able to start to make such statements meaningful. But a single number?

Look up “random sample” on Wikipedia: “A random sample is one chosen by a method involving an unpredictable component.” So helpful! Reading the article gives me the impression that the entire concept is built on sand.

Dear Thomas (and other people caring why pq with p and q prime is never a carmichael number),

I just posted a one-line proof here: http://bylosingallgenerality.wordpress.com/2010/10/14/a-carmichael-number-has-at-least-3-prime-divisors/

The frequentist interpretation of the randomized primality tests can be better done in the language of statistical tests, i.e. null hypothesis vs alternative.

So the null-hypothesis is that your number is composite. Then you do a lot of primality testing. Afterwards you can say that you can reject the null hypothesis with some confidence. Interestingly one failing primality test will instantly verify the null hypothesis.

SteveBrooklineMA, I somewhat agree with your notion. I guess that’s why our professor who read probability theory in a frequentist fashion carefully avoided relying on a definition of random sample. All we did was talking about probability distributions and integrals over them.

Another way to look at the difference between frequentists and bayesians is to look at how they treat parameters: Bayesians believe that all parameters also have a probability distributions, while frequentists invent notions like likelihood to cope with their refusal to assign probalitities to parameter values. I can sympathize with both of them.