Big O tilde notation

There’s a variation on Landau’s big-O notation [1] that’s starting to become more common, one that puts a tilde on top of the O. At first it looks like a typo, a stray diacritic mark. What does that mean? In short,

{\cal O}(h(n) \log^k n) = \tilde{{\cal O}}(h(n))

That is, big O tilde notation ignores logarithmic factors. For example, the FFT algorithm computes the discrete Fourier transform of a sequence of length n in

O(n log n)

steps, and so you could write this as Õ(n). This notation comes up frequently in computational number theory where algorithms often have a mix of polynomial and logarithmic terms.

A couple of days ago I blogged about algorithms for multiplying large numbers. The Schönhage-Strassen algorithm has a run time on the order of

O(n log(n) log(log(n))),

which you could write as simply Õ(n).

Shor’s quantum algorithm for factoring n-bit integers has a run time of

O(n² log(n) log(log(n))),

which we could write as Õ(n²). The fast Euclidean algorithm improves on the ancient Euclidean algorithm by reducing run time from O(n²) down to

O(n log²(n) log(log(n))),

which could be written simply as Õ(n).

The definition at the top of the post says we can ignore powers of the logarithm, but the previous examples contained iterated logarithms. This is permissible because log(x) < x, and so log(log(x)) < log(x). [2]

Related posts

[1] Big O notation can be confusing at first. For example, the equal sign doesn’t have its standard meaning. For more details, see these notes.

[2] Sometimes in mathematics a superscript on a function is an exponent to be applied to the output, and sometimes it indicates the number of times a function should be iterated. That is, f²(x) could mean f(x)² or f( f(x) ). The former is the convention for logarithms, and we follow that convention here.

How fast can you multiply really big numbers?

How long does it take to multiply very large integers? Using the algorithm you learned in elementary school, it takes O(n²) operations to multiply two n digit numbers. But for large enough numbers it pays to carry out multiplication very differently, using FFTs.

If you’re multiplying integers with tens of thousands of decimal digits, the most efficient algorithm is the Schönhage-Strassen algorithm, which takes

O(n log n  log log n)

operations. For smaller numbers, another algorithm, such as Karatsuba’s algorithm, might be faster. And for impractically large numbers, Fürer’s algorithm is faster.
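
To give a feel for the divide-and-conquer idea behind these faster algorithms, here is a minimal Python sketch of Karatsuba’s algorithm for non-negative integers. It is an illustration, not a tuned implementation; the 2^32 cutoff below is arbitrary, and production code switches to schoolbook multiplication well before that.

    def karatsuba(x, y):
        # Multiply non-negative integers using three recursive
        # multiplications instead of four.
        if x < 2**32 or y < 2**32:
            return x * y
        m = max(x.bit_length(), y.bit_length()) // 2
        xh, xl = x >> m, x & ((1 << m) - 1)
        yh, yl = y >> m, y & ((1 << m) - 1)
        a = karatsuba(xh, yh)                    # high halves
        b = karatsuba(xl, yl)                    # low halves
        c = karatsuba(xh + xl, yh + yl) - a - b  # cross terms
        return (a << 2*m) + (c << m) + b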

What is impractically large? Let’s say a number is impractically large if storing it requires more than a billion dollars’ worth of RAM. If I did my back-of-the-envelope math correctly, you can buy enough RAM to store about 2^57 bits for about a billion dollars. The Schönhage-Strassen algorithm is more efficient than Fürer’s algorithm for numbers with less than 2^64 bits.

Related post: How fast can you multiply matrices?

Goldbach’s conjecture, Lagrange’s theorem, and 2019

The previous post showed how to find all groups whose order is a product of two primes using 2019 as an example. Here are a couple more observations along the same line, illustrating the odd Goldbach conjecture and Lagrange’s four-square theorem with 2019.

Odd Goldbach Conjecture

Goldbach’s conjecture says that every even number greater than 2 can be written as the sum of two primes. The odd Goldbach conjecture, a.k.a. the weak Goldbach conjecture, says that every odd number greater than 5 can be written as the sum of three primes. The odd Goldbach conjecture isn’t really a conjecture anymore because Harald Helfgott proved it in 2013, though the original Goldbach conjecture remains unsettled.

The odd Goldbach conjecture says it should be possible to write 2019 as the sum of three primes. And in fact there are 2,259 ways to write 2019 as a sum of three primes in non-decreasing order.

      3 +   5 + 2011
      3 +  13 + 2003
      3 +  17 + 1999
          ...
    659 + 659 +  701
    659 + 677 +  701
    673 + 673 +  673
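
Here’s a brute-force Python sketch, using SymPy, that confirms this count by enumerating the non-decreasing prime triples summing to 2019.

    from sympy import primerange, isprime

    primes = list(primerange(2, 2020))
    count = 0
    for i, p in enumerate(primes):
        for q in primes[i:]:
            r = 2019 - p - q
            if r < q:
                break          # keep p <= q <= r
            if isprime(r):
                count += 1
    print(count)  # 2259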

Lagrange’s four-square theorem

Lagrange’s four-square theorem says that every non-negative integer can be written as the sum of four squares. 2019 is a non-negative integer, so it can be written as the sum of four squares. In fact there are 66 ways to write 2019 as a sum of four squares.

     0  1 13 43
     0  5 25 37
     0  7 11 43
         ...
    16 19 21 31
    17 23 24 25
    19 20 23 27
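
A similar brute-force sketch counts these representations, enumerating quadruples with 0 ≤ a ≤ b ≤ c ≤ d. (This uses math.isqrt, available in Python 3.8 and later.)

    from math import isqrt

    n = 2019
    count = 0
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a*a) + 1):
            for c in range(b, isqrt(n - a*a - b*b) + 1):
                d2 = n - a*a - b*b - c*c
                d = isqrt(d2)
                if d >= c and d*d == d2:
                    count += 1
    print(count)  # 66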

Sometimes there is a unique way to write a number as the sum of four squares. The last time a year could be written uniquely as a sum of four squares was 1536, and the next time will be in 2048.

Related posts

Check sums and error detection

The previous post looked at Crockford’s base 32 encoding, a minor variation on the way math conventionally represents base 32 numbers, with concessions for human use. By not using the letter O, for example, it avoids confusion with the digit 0.

Crockford recommends the following check sum procedure, a simple error detection code:

The check symbol encodes the number modulo 37, 37 being the least prime number greater than 32.

That is, we take the remainder when the base 32 number is divided by 37 and append the result to the original encoded number. The remainder could be larger than 31, so we need to expand our alphabet of symbols. Crockford recommends using the symbols *, ~, $, =, and U to represent remainders of 32, 33, 34, 35, and 36.
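
Here’s a minimal sketch of the procedure in Python. The 37-character string below is Crockford’s 32 digits followed by the five extra check symbols; the function name is mine, for illustration.

    # Crockford's 32 digits (no I, L, O, U) plus the 5 check-only symbols
    SYMBOLS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U"

    def check_symbol(n):
        # the check symbol encodes the number modulo 37
        return SYMBOLS[n % 37]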

Crockford says his check sum will “detect wrong-symbol and transposed-symbol errors.” We will show that this is the case in the proof below.

Python example

Here’s a little Python code to demonstrate how the checksum works.

    from base32_crockford import encode, decode

    s = "H88CMK9BVJ1V"
    w = "H88CMK9BVJ1W" # wrong last char
    t = "H88CMK9BVJV1" # transposed last chars

    def append_checksum(s):
        return encode(decode(s), checksum=True)

    print(append_checksum(s))
    print(append_checksum(w))
    print(append_checksum(t))

This produces the following output.

    H88CMK9BVJ1VP
    H88CMK9BVJ1WQ
    H88CMK9BVJV1E

The checksum character of the original string is P. When the last character is changed, the checksum changes from P to Q. Similarly, transposing the last two characters changes the checksum from P to E.

The following code illustrates that the check sum can be a non-alphabetic character.

    s = "H88CMK9BVJ10"
    n = decode(s)
    r = n % 37
    print(r)
    print(encode(n, checksum=True))

This produces

    32
    H88CMK9BVJ10*

As we said above, a remainder of 32 is represented by *.

Proof

If you change one character in a base 32 number, its remainder by 37 will change as well, and so the check sum changes.

Specifically, if you change the nth digit from the right, counting from 0, by an amount k, then you change the number by k 2^(5n), since each base 32 digit corresponds to five bits. Since 0 < |k| < 32, k is not divisible by 37, nor is 2^(5n). Because 37 is prime, k 2^(5n) is not divisible by 37 [1]. The same argument holds if we replace 37 by any larger prime.

Now what about transpositions? If you swap consecutive digits a and b in a number, you also change the remainder by 37 (or any larger prime) and hence the check sum.

Again, let’s be specific. Suppose we transpose the nth and (n+1)st digits from the right, again counting from 0. Denote these digits by a and b respectively. Then swapping these two digits changes the number by an amount

(b 2^(5(n+1)) + a 2^(5n)) – (a 2^(5(n+1)) + b 2^(5n)) = 31 (b – a) 2^(5n)

If a ≠ b, then b – a is a number between -31 and 31, but not 0, and so b – a is not divisible by 37. Neither is 31, nor any power of 2, divisible by 37, so we’ve changed the remainder by 37, i.e. changed the check sum. And as before, the same argument works for any prime larger than 37.
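
You can also confirm the proof exhaustively for two-digit base 32 numbers with a short script: every single-digit change and every transposition of distinct digits changes the value mod 37.

    ok = True
    for a in range(32):
        for b in range(32):
            n = 32*a + b
            for x in range(32):
                # change one digit at a time
                if x != b and (32*a + x) % 37 == n % 37:
                    ok = False
                if x != a and (32*x + b) % 37 == n % 37:
                    ok = False
            # transpose the two digits
            if a != b and (32*b + a) % 37 == n % 37:
                ok = False
    print(ok)  # True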

Related posts

[1] A prime p divides a product ab only if it divides a or it divides b. This isn’t true for composite numbers. For example, 6 divides 4*9 = 36, but 6 doesn’t divide 4 or 9.

Base 32 and base 64 encoding

Math has a conventional way to represent numbers in bases larger than 10, and software development has a couple variations on this theme that are only incidentally mathematical.

Math convention

By convention, math books typically represent numbers in bases larger than 10 by using letters as new digit symbols following 9. For example, base 16 would use 0, 1, 2, …, 9, A, B, C, D, E, and F as its “digits.” This works for bases up to 36; base 36 would use all the letters of the alphabet. There’s no firm convention for whether to use upper or lower case letters.

Base 64 encoding

The common use for base 64 encoding isn’t to represent bits as numbers per se, but to have an efficient way to transmit bits in a context that requires text characters.

There are around 100 possible characters on a keyboard, and 64 is the largest power of 2 less than 100 [1], and so base 64 is the most dense encoding using common characters in a base that is a power of 2.

Base 64 encoding does not follow the math convention of using the digits first and then adding more symbols; it’s free not to because there’s no intention of treating the output as numbers. Instead, the capital letters A through Z represent the numbers 0 though 25, the lower case letters a through z represent the numbers 26 through 51, and the digits 0 through 9 represent the numbers 52 through 61. The symbol + is used for 62 and / is used for 63.
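
For example, here is the encoding in action via Python’s standard base64 module. Each group of three bytes becomes four characters from the alphabet just described.

    import base64
    print(base64.b64encode(b"Hello, world"))  # b'SGVsbG8sIHdvcmxk'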

Crockford’s base 32 encoding

Douglas Crockford proposed an interesting form of base 32 encoding. His encoding mostly follows the math convention: 0, 1, 2, …, 9, A, B, …, except he does not use the letters I, L, O, and U. This eliminates the possibility of confusing i, I, or l with 1, or confusing O with 0. Crockford had one more letter he could eliminate, and he chose U in order to avoid an “accidental obscenity.” [2]

Crockford’s base 32 encoding is a compromise between efficiency and human legibility. It is more efficient than hexadecimal, representing 25% more bits per character. It’s less efficient than base 64, representing 17% fewer bits per character, but is more legible than base 64 encoding because it eliminates commonly confused characters.

His encoding is also case insensitive. He recommends using only capital letters for output, but permitting upper or lower case letters in input. This is in the spirit of Postel’s law, also known as the robustness principle:

Be conservative in what you send, and liberal in what you accept.

See the next post for an explanation of Crockford’s check sum proposal.

A password generator

Here’s a Python script to generate passwords using Crockford’s base 32 encoding.

    from secrets import randbits
    from base32_crockford import encode

    def gen_pwd(numbits):
        print(encode(randbits(numbits)))

For example, gen_pwd(60) would create a 12-character password with 60 bits of entropy, and this password would be free of commonly confused characters.

Related posts

[1] We want to use powers of 2 because it’s easy to convert between base 2 and base 2^n: start at the right end and convert bits in groups of n. For example, to convert a binary string to hexadecimal (base 2^4 = 16), convert groups of four bits each to hexadecimal. So to convert the binary number 101111001 to hex, we break it into 1 0111 1001 and convert each piece to hex, with 1 -> 1, 0111 -> 7, and 1001 -> 9, to find 101111001 -> 179. If we used a base that is not a power of 2, the conversion would be more complicated and not so localized.

[2] All the words on George Carlin’s infamous list include either an I or a U, and so none can result from Crockford’s base 32 encoding. If one were willing to risk accidental obscenities, it would be good to put U back in and remove S since the latter resembles 5, particularly in some fonts.

New prime record: 51st Mersenne prime discovered

A new prime record was announced yesterday. The largest known prime is now

2^{82,589,933} - 1

Written in hexadecimal the newly discovered prime is

1\underbrace{\mbox{FFF \ldots FFF}}_\mbox{{\normalsize 20,647,483 F's}}

For decades the largest known prime has been a Mersenne prime because there’s an efficient test for checking whether a Mersenne number is prime. I explain the test here.

There are now 51 known Mersenne primes. There may be infinitely many Mersenne primes, but this hasn’t been proven. Also, the newly discovered Mersenne prime is the 51st known Mersenne prime, but it may not be the 51st Mersenne prime, i.e. there may be undiscovered Mersenne primes hiding between the last few that have been discovered.

Three weeks ago I wrote a post graphing the trend in Mersenne primes. Here’s the updated post. Mersenne primes have the form 2^p – 1 where p is a prime, and this graph plots the values of p on a log scale. See the earlier post for details.

[Graph: Trend in Mersenne primes]

Because there’s a one-to-one correspondence between Mersenne primes and even perfect numbers, the new discovery means there is also a new perfect number. M is a Mersenne prime if and only if M(M + 1)/2 is an even perfect number. This is also the Mth triangular number.

Related posts

RSA with one shared prime

The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product.

Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p and r, and q ≠ r. (This has happened [1].) Then you can easily find p. All three primes can be obtained in less than a millisecond as illustrated by the Python script below.

In a way it would be better to share both your primes with someone else than to share just one. If both your primes are the same as someone else’s, you could read each other’s messages, but a third party attacker could not. If you’re lucky, the person you share primes with doesn’t know that you share primes or isn’t interested in reading your mail. But if that person’s private key is ever disclosed, so is yours.

Python code

Nearly all the time required to run the script is devoted to finding the primes, so we time just the factorization.

    from secrets import randbits
    from sympy import nextprime, gcd
    from timeit import default_timer as timer
    
    numbits = 2048
    
    p = nextprime(randbits(numbits))
    q = nextprime(randbits(numbits))
    r = nextprime(randbits(numbits))                                      
    
    m = p*q
    n = p*r
    
    t0 = timer()
    g = gcd(m, n)
    assert(p == g)
    assert(q == m//p)
    assert(r == n//p)
    t1 = timer()
    
    # Print time in milliseconds
    print(1000*(t1 - t0))

Python notes

There are a few noteworthy things about the Python code.

First, it uses a cryptographic random number generator, not the default generator, to create 2048-bit random numbers.

Second, it uses the portable default_timer which chooses the appropriate timer on different operating systems. Note that it returns wall clock time, not CPU time.

Finally, it uses integer division // to recover q and r. If you use the ordinary division operator / instead, Python will try to return the quotient as a float, which fails for integers this large with the error message “OverflowError: integer division result too large for a float.”

GCD vs factoring

If you wanted to find the greatest common divisor of two small numbers by hand, you’d probably start by factoring the two numbers. But as Euclid knew, you don’t have to factor two numbers before you can find the greatest common factor of the numbers. For large numbers, the Euclidean algorithm is orders of magnitude faster than factoring.
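
For reference, here’s the classic Euclidean algorithm in a few lines of Python. It uses only division with remainder and never factors anything.

    def euclid_gcd(a, b):
        # greatest common divisor by repeated division with remainder
        while b:
            a, b = b, a % b
        return a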

Nobody has announced being able to factor a 1024-bit number yet. The numbers m and n above have four times as many bits. The largest of the RSA numbers factored so far has 768 bits. This was done in 2009 and took approximately two CPU millennia, i.e. around 2000 CPU years.

Fast Euclidean algorithm

The classic Euclidean algorithm takes O(n²) operations to find the greatest common divisor of two integers of length n. But there are modern sub-quadratic variations on the Euclidean algorithm that take

O(n log²(n) log(log(n)))

operations, which is much faster for very large numbers. I believe SymPy is using the classic Euclidean algorithm, but it could transparently switch to using the fast Euclidean algorithm for large inputs.

Related posts

[1] As noted in the comments, this has happened due to faulty software implementations, but it shouldn’t happen by chance. By the prime number theorem, the number of n-bit primes is approximately 2^n / (n log 2). If M people each choose two primes n bits long, 2M primes in all, the probability of two of these primes being the same is roughly

2^(2-n) M n log 2.

If M = 7.5 billion (the current world population) and n = 1024, the probability of two primes being the same is about 10^-295. This is roughly the probability of winning the Powerball jackpot 40 times in a row.

RSA with Pseudoprimes

RSA setup

Recall the setup for RSA encryption given in the previous post.

  1. Select two very large prime numbers p and q.
  2. Compute n = pq and φ(n) = (p – 1)(q – 1).
  3. Choose an encryption key e relatively prime to φ(n).
  4. Calculate the decryption key d such that ed = 1 (mod φ(n)).
  5. Publish e and n, and keep d, p, and q secret.

φ is Euler’s totient function, defined here.

There’s a complication in the first step. Maybe you think the numbers p and q are prime, but they’re not. In that case the calculation of φ in step 2 is wrong.

Pseudoprimes

The numbers p and q need to be “very large,” where the definition of what constitutes large changes over time due to Moore’s law and progress with factoring algorithms. Currently p and q would typically have at least 2048 bits each. It’s easy to find numbers that large that are probably prime, but it’s difficult to be certain.

At the moment, the fastest way to test for primes has a small chance of making a false positive error, but no chance of a false negative. That is, if the test says a number is composite, it is certainly composite. But if the test says a number may be prime, there is a small chance that it is not prime. (Details here.) So in practice RSA starts by finding two large probable primes or pseudoprimes.

Discussions of RSA encryption often point out that large pseudoprimes are very rare, so it isn’t a problem that RSA starts with pseudoprimes. But what does that mean? Is there a one in a trillion chance that your private key won’t work and nobody can send you messages? Or that you can receive some messages and not others?

Encryption and decryption

RSA encryption works by taking a message m and raising it to the exponent e modulo n where e and n are defined at the top of the post. To decrypt the message, you raise it to the exponent d modulo n where d is your private decryption key. Because d was computed to satisfy

ed = 1 (mod φ(n)),

Euler’s theorem says that we’ll get our message back. We’ll give an example below.

What if p or q are pseudoprime?

If p and q are prime, then φ(n) = (p – 1)(q – 1). But if we’re wrong in our assumption that one of these factors is prime, our calculation of φ(n) will be wrong. Will our encryption and decryption process work anyway? Maybe.

We’ll do three examples below, all using small numbers. We’ll use e = 257 as our public encryption exponent and m = 42 as the message to encrypt in all examples.

In the first example p and q are indeed prime, and everything works as it should. In the next two examples we will replace p with a pseudoprime. In the second example everything works despite our error, but in the third example decryption fails.

Example 1: p and q prime

We’ll start with p = 337 and q = 283. Both are prime. The Python code below shows that d = 60833 and the encrypted message is 45431. Everything works as advertised.

Example 2: p pseudoprime

Now we use p = 341 and q = 283. Here p is a pseudoprime for base 2, i.e.

2^340 = 1 mod 341

and so 341 passes Fermat’s primality test [1] for b = 2. Now d = 10073 and the encrypted message is 94956. Decrypting the encrypted message works, even though p is not prime and our calculation for φ is wrong. In fact, the process works not just for our message m = 42 but for every message.

Example 3: p pseudoprime

Here again we let p be the pseudoprime 341 but set q to the prime 389. Now d = 6673, the encrypted message is 7660, but decrypting the encrypted message returns 55669, not 42 as we started with. Decryption fails for all other messages as well.

If we use the correct value for φ(pq) the example works. That is, if we use

φ(341*389) = φ(11*31*389) = 10*30*388

rather than the incorrect value 340*388 the decryption process recovers our original message.
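
Here’s a quick sketch confirming this: with the correct value of φ, decryption recovers the original message 42.

    from sympy import mod_inverse

    n = 341*389
    phi = 10*30*388        # the correct phi(11*31*389)
    d = mod_inverse(257, phi)
    c = pow(42, 257, n)    # encrypt
    print(pow(c, d, n))    # decrypt: prints 42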

What can go wrong

The examples above show that if we mistakenly think one of our numbers is a prime when it is only a pseudoprime, decryption might succeed or it might fail. In either case, we assume n = pq has two prime factors when in fact it has more, and so n is easier to factor than we thought.

Python code

Here’s the code for the examples above.

    from sympy import gcd, mod_inverse
    
    message = 42
    e = 257
    
    def example(p, q):
        n = p*q
        phi = (p-1)*(q-1)
        assert( gcd(e, phi) == 1 )
        
        d = mod_inverse(e, phi)
        assert( d*e % phi == 1 )
    
        encrypted = pow(message, e, n)
        decrypted = pow(encrypted, d, n)
        return (message == decrypted)
    
    print(example(337, 283))
    print(example(341, 283))
    print(example(341, 389))

Related posts

[1] Fermat’s primality test is explained here. In our example we only tried one base, b = 2. If we had also tried b = 3 we would have discovered that 341 is not prime. In practice one would try several different bases. Using the heuristic that failures are independent for each base, it’s very unlikely that a composite number would be a pseudoprime for each of, say, 50 different bases.

RSA encryption exponents are mostly all the same

The big idea of public key cryptography is that it lets you publish an encryption key e without compromising your decryption key d. A somewhat surprising detail of RSA public key cryptography is that in practice e is nearly always the same number, specifically e = 65537. We will review RSA, explain how this default e was chosen, and discuss why this may or may not be a problem.

RSA setup

Here’s the setup for RSA encryption in a nutshell.

  1. Select two very large prime numbers p and q.
  2. Compute n = pq and φ(n) = (p – 1)(q – 1).
  3. Choose an encryption key e relatively prime to φ(n).
  4. Calculate the decryption key d such that ed = 1 (mod φ(n)).
  5. Publish e and n, and keep d, p, and q secret.

The asymmetry in the encryption scheme comes from the fact that the person who knows p and q can compute n, φ(n), and d easily, but no one else can find d without factoring n. (Or so it seems.)

The encryption exponent

Here we focus on the third step above. In theory e could be very large, but very often e = 65537. In that case, we don’t actually choose e to be relatively prime to φ(n). Instead, we pick p and q, and hence n, and usually φ(n) will be relatively prime to 65537, but if it’s not, we choose p and q again until gcd(65537, φ(n)) = 1. Kraft and Washington [1] have this to say about the choice of e:

… surprisingly, e does not have to be large. Sometimes e = 3 has been used, although it is usually recommended that larger values be used. In practice, the most common value is e = 65537. The advantages are that 65537 is a relatively large prime, so it’s easier to arrange that gcd(e, φ(n)) = 1, and it is one more than a power of 2, so raising a number to the 65537th power consists mostly of squarings …

In short, the customary value of e was chosen for efficiency. Also, there aren’t many choices for similar values of e [3].
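
Here’s a sketch of what that efficiency means: computing x^65537 mod n takes 16 squarings plus one multiplication. (In practice you would simply call Python’s built-in pow(x, 65537, n).)

    def pow_65537(x, n):
        # x^65537 mod n, where 65537 = 2^16 + 1
        y = x
        for _ in range(16):
            y = y*y % n    # 16 squarings give x^(2^16)
        return y*x % n     # one more multiplication gives x^(2^16 + 1)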

Heninger and Shacham [2] go into more detail.

The choice e = 65537 = 2^16 + 1 is especially widespread. Of the certificates observed in the UCSD TLS Corpus (which was obtained by surveying frequently-used TLS servers), 99.5% had e = 65537, and all had e at most 32 bits.

So the “most common value” mentioned by Kraft and Washington appears to be used 99.5% of the time.

Is this a problem?

Is it a problem that e is very often the same number? At first glance, no. If the number e is going to be public anyway, there doesn’t seem to be any harm in selecting it even before you choose p and q.

There may be a problem, however, that the value nearly everyone has chosen is fairly small. In particular, [2] gives an algorithm for recovering private RSA keys when some of the bits are known and the exponent e is small. How would some of the bits be known? If someone has physical access to a computer, they can recover some bits from its memory.

Related posts

[1] James S. Kraft and Lawrence C. Washington. An Introduction to Number Theory with Cryptography. 2nd edition.

[2] Nadia Heninger and Hovav Shacham. Reconstructing RSA Private Keys from Random Key Bits.

[3] The property that makes e an efficient exponent is that it is a Fermat prime, i.e. it has the form 2^(2^m) + 1, here with m = 4. Raising a number to the power 2^(2^m) + 1 takes 2^m squarings and one multiplication. And being prime increases the chances that it will be relatively prime to φ(n). Unfortunately 65537 may be the largest Fermat prime. We know for certain it’s the largest Fermat prime that can be written in less than a billion digits. So if there are more Fermat primes, they’re too large to use.

However, there are smaller Fermat primes than 65537, and in fact the quote from [1] mentions that the smallest, 3, has been used in practice. The remaining Fermat primes are 5, 17, and 257. Perhaps these have been used in practice as well, though that may not be a good idea.

Mersenne prime trend

Mersenne primes have the form 2^p – 1 where p is a prime. The graph below plots the trend in the size of these numbers based on the 51 Mersenne primes currently known.

The vertical axis plots the exponents p, which are essentially the logs base 2 of the Mersenne primes. The scale is logarithmic, so this plot suggests that the Mersenne primes grow approximately linearly on a double logarithmic scale.

[Graph: Trend in Mersenne primes]

Update: A new Mersenne prime was announced December 21, 2018. The graph was updated to include this new data point.

Related posts

Searching for Mersenne primes

The nth Mersenne number is

M_n = 2^n – 1.

A Mersenne prime is a Mersenne number which is also prime. So far 51 have been found [1].

A necessary condition for M_n to be prime is that n is prime, so searches for Mersenne primes only test prime values of n. It’s not sufficient for n to be prime as you can see from the example

M_11 = 2047 = 23 × 89.

Lucas-Lehmer test

The largest known prime has been a Mersenne prime since 1952, with one exception in 1989. This is because there is an efficient algorithm, the Lucas-Lehmer test, for determining whether a Mersenne number is prime. This is the algorithm used by GIMPS (Great Internet Mersenne Prime Search).

The Lucas-Lehmer test is very simple. The following Python code tests whether M_p is prime.

    def lucas_lehmer(p):
        # For odd prime p, M = 2^p - 1 is prime if and only if
        # s ends at 0, where s starts at 4 and s -> s^2 - 2 mod M.
        M = 2**p - 1
        s = 4
        for _ in range(p-2):
            s = (s*s - 2) % M
        return s == 0
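
For example, applying the test to the first few odd prime exponents picks out exactly the Mersenne prime exponents, rejecting p = 11 as in the example above.

    print([p for p in [3, 5, 7, 11, 13, 17, 19] if lucas_lehmer(p)])
    # [3, 5, 7, 13, 17, 19]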

Using this code I was able to verify the first 25 Mersenne primes in under 50 seconds. This includes all Mersenne primes that were known as of 40 years ago.

History

Mersenne primes are named after the French monk Marin Mersenne (1588–1648) who compiled a list of Mersenne primes.

Édouard Lucas came up with his test for Mersenne primes in 1856 and in 1876 proved that M127 is prime. That is, he found a 39-digit prime number by hand. Derrick Lehmer refined the test in 1930.

As of January 2018, the largest known prime is M77,232,917.

Related posts

[1] We’ve found 51 Mersenne primes as of December 22, 2018, but we’re not sure whether we’ve found the first 51 Mersenne primes. We know we’ve found the 47 smallest Mersenne primes. It’s possible that there are other Mersenne primes between the 47th and the 50th known Mersenne primes, and there may be one hiding between the 50th and 51st.

Searching for Fermat primes

Fermat numbers have the form

F_n = 2^{2^n} + 1

Fermat numbers are prime if n = 0, 1, 2, 3, or 4. Nobody has confirmed that any other Fermat numbers are prime. Maybe there are only five Fermat primes and we’ve found all of them. But there might be infinitely many Fermat primes. Nobody knows.

There’s a specialized test for checking whether a Fermat number is prime, Pépin’s test. It says that for n ≥ 1, the Fermat number F_n is prime if and only if

3^{(F_n - 1)/2} \equiv -1 \pmod {F_n}

We can verify fairly quickly that F1 through F4 are prime and that F5 through F14 are not with the following Python code.

    def pepin(n):
        # Pepin's test: F_n = 2^(2^n) + 1 is prime iff
        # 3^((F_n - 1)/2) = -1 mod F_n
        x = 2**(2**n - 1)  # (F_n - 1)/2
        y = 2*x            # F_n - 1
        return y == pow(3, x, y+1)

    for i in range(1, 15):
        print(pepin(i))

After that the algorithm gets noticeably slower.

We have an efficient algorithm for testing whether Fermat numbers are prime, efficient relative to the size of numbers involved. But the size of the Fermat numbers is growing exponentially. The number of digits in F_n is

1 + \lfloor 2^n \log_{10} 2 \rfloor \approx 0.3 \times 2^n

So F14 has 4,933 digits, for example.

The Fermat numbers for n = 5 to 32 are known to be composite. Nobody knows whether F33 is prime. In principle you could find out by using Pépin’s test, but the number you’d need to test has 2,585,827,973 digits, and that will take a while. The problem is not so much that the exponent is so big but that the modulus is also very big.

The next post presents an analogous test for whether a Mersenne number is prime.

Prime denominators and nines complement

Let p be a prime. If the repeating decimal for the fraction a/p has even period, the second half of the decimals are the 9’s complement of the first half. This is known as Midy’s theorem.

For a small example, take

1/7 = 0.142857142857…

and notice that 142 + 857 = 999. That is, 8, 5, and 7 are the nine’s complements of 1, 4, and 2 respectively.

For a larger example, we can use Mathematica to look at the decimal expansion of 6/47:

    In:  N[6/47, 60]
    Out: 0.127659574468085106382978723404255319148936170212765957446809

and we can confirm

    12765957446808510638297 + 
    87234042553191489361702 =
    99999999999999999999999
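
We can check Midy’s theorem programmatically as well. Here’s a Python sketch (the helper name is mine) that computes the repeating digits of a/p by long division, assuming p is a prime other than 2 or 5, and verifies that the two halves of the period sum to all 9’s.

    def period_digits(a, p):
        # digits of the repeating part of a/p, by long division;
        # assumes the expansion is purely periodic, i.e. p != 2, 5
        digits, r = [], a % p
        while True:
            r *= 10
            digits.append(r // p)
            r %= p
            if r == a % p:
                return digits

    d = period_digits(6, 47)
    half = len(d) // 2
    first = int("".join(map(str, d[:half])))
    second = int("".join(map(str, d[half:])))
    print(first + second == 10**half - 1)  # True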

Let’s do another example with 6/43:

    In:  N[6/43, 50]
    Out: 0.13953488372093023255813953488372093023255813953488

Does the theorem hold here? The hypothesis of the theorem doesn’t hold because the fraction has period 21, which is not even. Can the conclusion of the theorem hold anyway? No, because we can’t split 21 digits in half! But we can split 21 digits into three parts, and if we do we find

    1395348 + 8372093 + 232558 = 9999999

Let’s finish up with an example in another base. We’ll look at the fraction 3/7 in base 17.

    In:  BaseForm[N[3/7, 24], 17]
    Out: 0.74e9c274e9c274e9c27_17

If we split the repeating part of the decimal into two parts we get 74e and 9c2, and these add up to the base 17 analog of all 9’s.

    In:  BaseForm[17^^74e + 17^^9c2, 17]
    Out: ggg_17

That is, the base 17 numbers 9, c, and 2, are the g’s complements of 7, 4, and e respectively.

[The Mathematica operator ^^ interprets its right argument as a number whose base is given by the left argument. The BaseForm function takes a number as its first argument and returns its representation in the base given as its second argument.]

Related post: Casting out Z’s

Integer odds and prime numbers

For every integer m > 1, it’s possible to choose N so that the proportion of primes in the sequence 1, 2, 3, … N is 1/m. To put it another way, you can make the odds against one of the first N natural numbers being prime any integer value you’d like [1].

For example, suppose you wanted to find N so that 1/7 of the first N positive integers are prime. Then the following Python code shows you could pick N = 3059.

    from sympy import primepi

    m = 7

    N = 2*m
    while N / primepi(N) != m:
        N += m
    print(N)

Related posts

[1] Solomon Golomb. On the Ratio of N to π(N). The American Mathematical Monthly, Vol. 69, No. 1 (Jan., 1962), pp. 36-37.

The proof is short, and doesn’t particularly depend on the distribution of primes. Golomb proves a more general theorem for any class of integers whose density goes to zero.

Prime interruption

Suppose you have a number that you believe to be prime. You start reading your number aloud, and someone interrupts “Stop right there! No prime starts with the digits you’ve read so far.”

It turns out the person interrupting you shouldn’t be so sure. There are no restrictions on the digits a prime number can begin with. (Ending digits are another matter. No prime ends in 0, for example.) Said another way, given any sequence of digits, it’s possible to add more digits to the end and make a prime. [1]

For example, take today’s date in ISO format: 20181008. Obviously not a prime. Can we find digits to add to make it into a prime? Yes, we can add 03 on to the end because 2018100803 is prime.

What about my work phone number: 83242286846? Yes, just add a 9 on the end because 832422868469 is prime.

This works in any base you’d like. For example, the hexadecimal number CAFEBABE is not prime, but CAFEBABE1 is. Or if you interpret SQUEAMISH as a base 36 number, you can form a base 36 prime by sticking a T on the end. [2]

In each of these examples, we haven’t had to add much on the end to form a prime. You can show that this is to be expected from the distribution of prime numbers.
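
Here’s a brute-force sketch of the search in Python, using SymPy’s isprime. The function name is mine, for illustration: it tries appending suffixes of increasing length until the result is prime.

    from sympy import isprime

    def extend_to_prime(prefix):
        # append the shortest digit string that makes the number prime
        k = 1
        while True:
            for s in range(10**k):
                n = int(str(prefix) + str(s).zfill(k))
                if isprime(n):
                    return n
            k += 1

    print(extend_to_prime(20181008))  # 2018100803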

Related posts

[1] Source: R. S. Bird. Integers with Given Initial Digits. The American Mathematical Monthly, Vol. 79, No. 4 (Apr., 1972), pp. 367-370

[2] CAFEBABE is a magic number used at the beginning of Java bytecode files. The word “squeamish” here is an homage to “The Magic Words are Squeamish Ossifrage,” an example cleartext used by the inventors of RSA encryption.