Probability problem with Pratt prime proofs

In the process of creating a Pratt certificate to prove that a number n is prime, you have to find a number a that seems kinda arbitrary. As we discussed here, a number n is prime if there exists a number a such that

an-1 = 1 mod n


a(n-1)/p ≠ 1 mod n

for all primes p that divide n – 1. How do you find a? You try something and see if it works. If it doesn’t, you try again.

How many values of a might you have to try? Could this take a long time?

You need to pick 2 ≤ an – 2, and there are φ(n-1) values of a that will work. Here φ is Euler’s totient function. So the probability of picking a value of a that will work is

p = φ(n-1) / (n – 3).

The number of attempts before success has a geometric distribution with parameter p, and an expected value of 1/p.

So about how big is p? There’s a theorem that says that φ(n)/n is asymptotically bounded below by

exp(-γ) / log log n

where γ = 0.577… is the Euler-Mascheroni constant. So for a 50-digit number, for example, we would expect p to be somewhere around 0.12, which says we might have to try eight or nine values of a. This estimate may be a little pessimistic since we based it on a lower bound for φ.

Factoring b^n + 1

The previous post illustrated a technique for finding factors of number of the form bn – 1. This post will look at an analogous, though slightly less general, technique for numbers of the form bn + 1.

There is a theorem that says that if m divides n then bm + 1 divides bn + 1 if n is odd. To see that the exponent must be odd, let b = 10, n = 4, and m = 2: 10,0001 is not divisible by 101.

Last time we looked at 22023 – 1, so this time let’s look at J = 22023 + 1.

Since 2023 = 7×17×17, we know J is divisible by 2n + 1 for n = 7, 17, 7×17, and 17×17. Factoring each of these factors tells us that J is divisible by

  • 3
  • 43691
  • 72251
  • 79187
  • 1077971
  • 823679683
  • 18360250452977
  • 197766803208315851
  • 143162553165560959297
  • 338858733065598401355195539629373089

(If it seems this post is moving too fast, you might want to go back and read the previous post and then come back.)

Let P be the product of the factors above, and let F = J/P.  Now F is a 493-digit number, so we don’t expect to be able to get much further unless we’re lucky. We can find three factors of F

  • 43
  • 4382432993
  • 1809032819104754041

but then we get stuck. We know there are more factors, but it seems doubtful that I’m going to find any more anytime soon.


As in the previous post, we’ll conclude with Python code to verify the claims above.

from sympy import isprime

def verify_factors(N, factors, full=True):
    prod = 1
    for f in factors:
        prod *= f
    if full:
        assert(N == prod)
        assert(N % prod == 0)
        assert(not isprime(N // prod))
    return N // prod

factors = [

J = 2**2023 + 1
F = verify_factors(J, factors, False)

factors = [43, 4382432993, 1809032819104754041]
verify_factors(F, factors, False)


Factoring b^n – 1

Suppose you want to factor a number of the form bn – 1.

There is a theorem that says that if m divides n then bm – 1 divides bn – 1.

Let’s use this theorem to try to factor J = 22023 – 1, a 609-digit number. Factoring such a large number would be more difficult if it didn’t have a special form that we can exploit.

Our theorem tells us J is divisible by 27 – 1 and 2289 – 1 because 2023 = 7×17×17.

Is J divisible by (27 – 1)(2289 – 1)? In fact it is, but this doesn’t necessarily follow from the theorem above because (bm – 1) and (bn/m -1) could share factors, though in this case they do not.

So we have

J = (27 – 1) (2289 – 1) F

for some factor F.  Now 27 – 1 = 127, a prime. What about 2289 – 1?

By the theorem above we know that 2289 – 1 is divisible by 217 – 1 = 131071, which turns out to be a prime. We can get a couple more factors of 2289 – 1 by consulting The Cunningham Project, a project that catalogs known factors of bn ± 1 for small values of b. We learn from their site that 2289 – 1 is also divisible by 12761663 and 179058312604392742511009. All together

2289 – 1 = 131071 × 12761663 × 179058312604392742511009 × R


R = 3320934994356628805321733520790947608989420068445023

and R turns out to be prime.

So now we have five prime factors of J:

  • 127
  • 131071
  • 12761663
  • 179058312604392742511009
  • R.

That leaves us with F above, a 520-digit number. It would seem we’ve gotten all the mileage we can out of our factoring trick. But there’s something we haven’t tried: We know that J is divisible by 2119 – 1 because 7 × 17 = 119 is a factor of 2023.


2119 – 1 = 127 × 239 × 20231 × 131071 × 62983048367 × 131105292137

and so these prime factors divide J. However, two of these, 127 and 131071, we’ve already discovered. But we do learn 4 more prime factors. So

F = 239 × 20231 × 62983048367 × 131105292137 × G

where G is a 492-digit number. We can tell by Fermat’s test that G is not prime, but I’m unaware of any clever technique for easily finding any of the factors of G.


In general factoring a 492-digit number is hard. There are RSA challenge numbers smaller than this that have not yet been factored, such as RSA-260, a 260-digit number. On the other hand, the RSA numbers are designed to be hard to factor. RSA-260 has two prime factors, presumably both the same order of magnitude. We get a little luckier with G. It has three relatively small factors that I was able to find:

G = 166684901665026193 × 3845059207282831441 × 153641005986537578671 × H

where H is a 436-digit number. I know from Fermat’s test that H is composite but I could not find any factors.

Update: From the comments, 2605053667526976413491923719 is also a factor of G. I’ve updated the code below accordingly. Now the unfactored part H is a 408-digit number.


Here’s Python code to verify the claims made above.

from sympy import isprime

def verify_factors(N, factors, full=True):
    prod = 1
    for f in factors:
        prod *= f
    assert(N % prod == 0)
    if full:
        assert(N == prod)

R = 3320934994356628805321733520790947608989420068445023
factors = [131071, 12761663, 179058312604392742511009, R]
verify_factors(2**289 - 1, factors)

J = 2**2023 - 1
prod = 127*(2**289 - 1)
F = J // prod
assert(J == prod*F)

factors = [127, 239, 20231, 131071, 62983048367, 131105292137]
verify_factors(2**119 - 1, factors)

prod = 239*20231*62983048367*131105292137
G = F // prod
assert(F == G*prod)

factors = [166684901665026193, 3845059207282831441, 153641005986537578671, 2605053667526976413491923719]
verify_factors(G, factors, False)

prod = 1
for f in factors:
    prod *= f
H = G // prod
assert(G == H*prod)
assert(not isprime(H))

assert(len(str(J)) == 609)
assert(len(str(F)) == 520)
assert(len(str(G)) == 492)
assert(len(str(H)) == 408)

Related post: Primality certificates

Update: See the next post for the case of bn + 1.

Converting between barycentric and trilinear coordinates

Barycentric coordinates describe the position of a point relative to the three vertices of a triangle. Trilinear coordinates describe the position of a point relative to the three sides of a triangle. It’s surprisingly simple to convert from one to the other.

Why should this be surprising? Because the distance from a point to a line is more complicated to compute than the distance between two points. Hiding this complication is one of the things that makes trilinear coordinates convenient for some tasks.

Both barycentric and trilinear coordinates are homogeneous. This means the coordinates are only unique up to a scaling factor. The proportions between the components are unique, not the components themselves. Barycentric and trilinear coordinates are written with colons separating the components as a reminder that they are proportions rather than absolute numbers.

If the vertices of our triangle are A, B, and C, the barycentric coordinates of a point P are proportional to the distances from P to A, B, and C. The trilinear coordinates of P are proportional to the distances from P to the sides opposite A, B, and C.

Let a be the length of the side opposite vertex A and similarly for b and c. Then a point with trilinear coordinates

x : y : z

has barycentric coordinates

ax : by : cz

Similarly, a point with barycentric coordinates

α : β : γ

has trilinear coordinates

α/a : β/b : γ/b.

If you have trilinear coordinates x : y : z you can find the actual distances to the sides of the triangle by scaling by a factor involving the area of the triangle. The distances to sides opposite A, B, and C are kx, ky, and kz respectively, where

k = area / (ax + by + cz).

Note that the denominator is the sum of the corresponding barycentric coordinates. This is because barycentric coordinates of P can be interpreted as the areas of the three triangles formed by drawing lines from P to each of the vertices.

Special primality proofs

I’ve written lately about two general ways to prove that a number is prime: Pratt certificates for moderately-large primes and elliptic curve certificates for very large primes.

If you can say more about the prime you wish to certify, there may be special forms of certificates that are more efficient. In particular, there are efficient tests to determine whether Fermat number or a Mersenne number is prime.

Pepin’s test, implemented in four lines of Python here, determines whether or not a Fermat number is prime. It can instantly prove that the known Fermat primes are indeed prime. It can quickly show that the next several Fermat numbers are composite, but the time required to run Pepin’s test increases rapidly for larger numbers.

The Lucas-Lehmer test, implemented in six lines of Python here, tests whether a Mersenne number is prime. The largest known primes are Mersenne primes because the Lucas-Lehmer test is relatively efficient, though it still takes a lot of effort to search for new Mersenne primes.

Certifying that a number is composite is potentially much easier than certifying that a number is prime. A famous example of a proof that a number is composite was Euler’s announcement in 1732 that

232 + 1 = 641 × 6700417,

thereby proving that the fifth Fermat number is not prime, disproving Fermat’s conjecture.

Several Fermat numbers have been fully factored, concretely proving that they are composite, but some Fermat numbers have been proven composite via Pepin’s test that have not been factored. The analogous statement is true for Mersenne primes and the Lucas-Lehmer test.

Zeta sum vs zeta product

The Riemann zeta function ζ(s) is given by an infinite sum and an infinite product

\zeta(s) = \sum_{n=1}^\infty\frac{1}{n^s} = \prod_{p \text{ prime}} \frac{1}{1-p^{-s}}

for complex numbers s with real part greater than 1 [*].

The infinite sum is equal to the infinite product, but which would give you more accuracy: N terms of the sum or N terms of the product? We’ll take a look at this question empirically.

The accuracy of the partial sum and the partial product depends on N and on s. For starters, let’s fix s = 3 and look at the approximation error as a function of N. Note that the error is plotted on a log scale.

Now let’s fix N = 10 and look at the error as a function of s.

So our two examples suggest that taking N terms of the product gives more accuracy than taking N terms of the sum.

In the paper mentioned in the previous post, the author approximates the zeta function with a few terms of the product. That’s what motivated this post. I thought the product might give more accuracy than the sum, and that appears to be the case.

Related posts

[*] If you evaluate the series for ζ at s = -1, despite the fact that the equation does not hold at that point, and evaluate ζ at -1 using analytic continuation, you get the infamous nonsensical result that the positive integers sum to -1/12.

Approximating pi with Bernoulli numbers

In a paper on arXiv Simon Plouffe gives the formula

\pi \approx \left(\frac{2 n!}{B_n 2^n}\right)^{1/n}.

which he derives from an equation in Abramowitz and Stegun (A&S).

It took a little while for me to understand what Plouffe intended. I don’t mean my remarks here to be criticism of the author but rather helpful hints for anyone else who might have difficulty reading the paper.

He says “Here the Bn are positive and n is even.” This doesn’t mean that the Bn are positive; it means that we should make them positive if necessary by taking the absolute value.

Plouffe says that he derives his estimate from an equation on page 809 of A&S, but at least in my copy of A&S, there are no equations on page 809. I believe he had in mind equation 23.2.16, which is on page 807 in my edition.

This equation says

\zeta(2n) = \frac{(2\pi)^{2n}}{2(2n)!} = |B_{2n}|.

The first inequality in the paper does not appear directly in A&S but can be derived from the equation above. The approximation for π at the top of the post approximates ζ(2n) by 1. You could obtain a better approximation for π by using a better approximation for ζ(2n), which Plouffe does, retaining the first few terms in Euler’s product representation for the zeta function.

How good is the approximation above? Plotting the approximation error on a log scale gives nearly a straight line, i.e. the error decreases exponentially.



This plot was produced with the following Mathematica code.

    f[m_] := (2 Factorial[2m] / Abs[BernoulliB[2m] 2^(2m)])^(1/(2m))
    ListPlot[Table[Log[Abs[f[m] - Pi]], {m, 1, 20}]]

Related posts

Reverse engineering options

There are 221,184 ways to have a Whopper. You Rule.

This weekend I saw a sign in the window of a Burger King™ that made me think of an interesting problem. If you know the number of possibilities like this, how would you reverse engineer what the options that created the possibilities?

In the example above, there are 211,184 = 213×33 possible answers, and so most likely there are 13 questions that have a yes or no answer, and 3 questions that have 3 possible answers each. But there are other possibilities in principle, a lot of other possibilities.

Maybe there are 15 questions: 1 with 4 possible answers, 12 with 2 possible answers, and 3 with 3 possible answers. Or maybe there are again 15 questions, but one of the questions has 9 answers. If there are 14 questions, maybe one question has 8 answers, or 27 answers, or maybe two questions have 6 answers, … it gets complicated.

More general problem

How would you address this kind of problem in general?

Suppose there is some list of questions Q1, Q2, …, Qk where there are qi possible answers to question Qi. Then the product of all the qi equals the total number of possible responses P to the questions. Note that we don’t know the number of questions k; all we know is the product P. Let’s call the ordered list {q1, q2, …, qk} of number of responses to questions a survey.

There are as many possible surveys as there are multiplicative partitions of P. The number of possible surveys only depends on the exponents in the prime factorization of P, not on the prime factors themselves. It would make no difference if in the example above we had 513×73 possibilities rather than 213×33 possibilities. We’d reason the same way when counting the possible surveys. The number of questions is somewhere between 1 and the sum of the exponents in the prime factorization.

We have to decide how we want to count surveys. For example, if we do have 13 questions with binary answers and 3 questions with trinary answers, does it matter which questions have three answers? If it does, we multiply the number of possibilities by the binomial coefficient C(16, 3) because we choose which of the 16 total questions have three possible answers. And then we have to reason about the case of 15 questions, 14 questions, etc.

Multiplicative partitions are complicated, and its not possible to write down simple expressions for the number of multiplicative partitions of numbers with many prime factors. But in the special case of only two distinct prime factors, i.e.

P = p_1^i \, p_2^j

then there is a relatively simple expression for the total number of possible surveys, assuming order matters [1]:

2^{i+j-1} \sum_{k=0}^j {i \choose k}{j \choose k} 2^{-k}

This works out to 3,760,128 in the example above. Curiously,

3760128 = 221184 × 17.

But note that our result does not depend on the primes in the factorization of P, only on their exponents. There are no p‘s in the summation above, only i‘s and j‘s. As we noted earlier, we’d come to the same count if we’d started with P = 513×73. The factors of 2 and 3 in our count arrive for different reasons than being the primes in the factorization of P.

Related posts

[1] A. Knopfmacher and M. E. Mays. A survey of factorization counting functions, International Journal of Number Theory. Vol. 01, No. 04, pp. 563-581 (2005)

Foreshadowing Page Rank

Douglas Hofstadter, best known as the author of Godel, Escher, Bach, wrote the foreword to Clark Kimberling’s book Triangle Centers and Central Triangles. Hofstadter begins by saying that in his study of math he “sadly managed to sidestep virtually all of geometry” and developed an interest in geometry, specifically triangle centers, much later.

The ancient Greeks had four ways to define the center of a triangle—circumcenter, incenter, orthocenter, and centroid. These are marked in the diagram above in blue, red, green, and orange respectively. But there are many more notions of a center of a triangle. Kimberling’s book (published in 1998) gives 400 triangle centers, and his website now lists tens of thousands of triangle centers.

In the foreword to Kimberling’s book Hofstadter recounts his speculation regarding which notion of triangle center is most central, the most important. He debates several criteria, then focuses on the fact that triangle centers tend to lie on lines, like the Euler line drawn in the image above. He calculates that the first 100 centers on Kimberling’s list fall on far fewer lines than one would expect if the points were distributed randomly.

So one could rank the lines according to how many centers each had, and rank the centers by how many lines pass through each. But then you want to count the ranks of the lines, giving points a higher rank for being on more high ranking lines, and you want rank lines higher for having more high ranking points on them ….

Hofstadter says that although this sounds circular, and it is, his ranking system could be implemented as a system of linear equations.

This system for ranking triangle centers is essentially the same as Google’s PageRank algorithm, giving higher rank to pages that are linked to by high ranking pages. And computing PageRank is indeed an enormous linear algebra problem, specifically an eigenvalue problem.

Hofstadter was not the first to think of this kind of ranking. According to Wikipedia

The eigenvalue problem behind PageRank’s algorithm was independently rediscovered and reused in many scoring problems. In 1895, Edmund Landau suggested using it for determining the winner of a chess tournament.

Hofstadter says

It would be fascinating to see what would come out of this attempt to assign a rank-ordering to the special points and special lines associated with any triangle.

But apparently he did not carry out the calculation, at least not as of 1998. Perhaps the idea of solving an eigenvalue problem for a matrix with hundreds of rows seemed impractical to Hofstadter at the time.

It would be interesting to know whether Hofstadter or anyone else ever followed through on this ranking project.

Kimberling lists his triangle centers in roughly chronological order, which correlates with subjective importance. If someone were to carry out a PageRank of the triangle centers, I’d recommend using only the 400 centers in Kimberling’s book. Or maybe even better, repeat the exercise for the first N centers for an increasing sequence of values of N.

Third order ordinary differential equations

Most applied differential equations are second order. This probably has something to do with the fact that Newton’s laws are second order differential equations.

Higher order equations are less common in application, and when they do pop up they usually have even order, such as the 4th order beam equation.

What about 3rd order equations? Third order equations are rare in application, and third order linear equations are even more rare.

This post will focus on ordinary differential equations (ODEs), but similar remarks apply to PDEs.

Third order nonlinear equations

One example of a third order nonlinear ODE is the Blasius boundary layer equation from fluid dynamics

y''' + yy'' = 0

and its generalization, the Falkner-Skan equation

y''' + yy'' + \beta\left(1 - (y')^2\right) = 0.

Third order linear equations

I’ve seen two linear third order ODEs in application, but neither one is very physical.

The first [1] is an equation satisfied by the product of Airy functions:

 y''' - 4xy' - 2y = 0.

Here is a short proof that the product of Airy functions satisfy this equation.

Airy functions come up in applications such as quantum mechanics, optics, and probability. I suppose products of Airy functions may come up in those areas, but the equation above seems like something discovered after the fact. It seems someone thought it would be useful to find a differential equation that products of Air functions satisfy. I doubt someone wrote down the equation because it came up in applications, then discovered that products of Airy functions were the solutions.

The second [2] is a statistical model for analyzing handwriting:

x'''(t) = \beta_1(t) x'(t) + \beta_2(t) x''(t) + f(t).

Here someone decided to try modeling handwriting movements as a function of velocity, acceleration, and “jerk” (third derivative of position). It may be a useful model, but it wasn’t derived in the same physical sense as say the equations of mechanical vibrations. You could also object that since x(t) does not appear in the equation, this is a second order differential equation for y(t) = x′(t).

Related posts

[1] Abramowitz and Stegun, Handbook of Mathematical Functions, equation 10.4.57.

[2] Ransay and Silverman. Applied Functional Data Analysis: Methods and Case Studies. Equation 12.1.