Closest consecutive reciprocal sum to an integer

József Kürschák proved in 1908 that the function

f(m, n) = \sum_{k = m}^n \frac{1}{k}

is never an integer for 0 < m < n. In particular, the harmonic numbers

H_n = f(1, n)

are never integers for n > 1.

The function f(m, n) can get arbitrarily close to any integer value by taking m and n large enough, but it can never exactly equal an integer.

For this post, I’d like to look at how close f(m, n) comes to an integer value when 0 < m < n ≤ N for some large N, say N = 100,000.

Computation strategy

The most naive way to approach this would be to compute f(m, n) for all m and n and keep track of which values were closest to integers. This would be wasteful since it would recompute the same terms over and over. Instead, we could take advantage of the fact that

f(m, n + 1) = f(m, n) + \frac{1}{n+1}

Instead of working with f(m, n), it will be convenient to work with just its fractional part

g(m, n) = f(m, n) - \lfloor f(m, n) \rfloor

because it won’t hurt to throw away the integer parts as we go. The values of m and n minimizing g(m, n) will be the values for which f(m, n) comes closest to an integer from above. The values of m and n maximizing g(m, n) will be the values for which f(m, n) comes closest to an integer from below.

We could calculate a matrix with all values of g(m, n), but this would take O(N²) memory. Instead, for each n we will calculate g(m, n), save the maximum and minimum values, then overwrite that memory with g(m, n + 1). This approach will only take O(N) memory.

Floating point error

When we compute f(m, n) for large values of n, can we rely on floating point arithmetic?

If N = 100,000, then f(m, n) < 16 = 2^4. A floating point fraction has 53 bits, so we’d expect each addition to be correct to within an error of 2^−49, and so we’d expect our total error to be less than 2^−49 N, which is about 1.8 × 10^−10 for our value of N.

Python code

The following code computes the values of g(m, n) closest to 0 and 1.

import numpy as np

N = 100_000
f_m = np.zeros(N+1) # f_m[m] holds the fractional part of f(m, n) for the current n

# extreme values of g(m, n) over m, for each n
min_fm = np.zeros(N+1)
max_fm = np.zeros(N+1)

# initialize at n = 2: f(1, 2) = 1 + 1/2
f_m[1] = 1.5

for n in range(3, N+1):
    f_m[n-1] = 1/(n-1)               # start the new sum beginning at m = n-1
    f_m[1:n] += 1/n                  # extend every sum by the term 1/n
    f_m[1:n] -= np.floor(f_m[1:n])   # keep only fractional parts
    min_fm[n] = np.min(f_m[1:n])
    max_fm[n] = np.max(f_m[1:n])

print(min(min_fm[3:]))
print(max(max_fm))

This reports a minimum value of 5.2841953035454026e-11 and a maximum value of 0.9999999996613634. The minimum value is closer to 0 than our (pessimistic) error estimate, though the maximum value is further from 1 than our error estimate.

Modifying the code a bit shows that the minimum occurs at (27134, 73756), and this is the only input to g that comes within our error estimate of an integer. So we can be confident that it is the minimum, though we can’t be confident of its value. So next we turn to Mathematica to find the exact value of g(27134, 73756) as a rational number, a fraction with 32024 digits in the numerator and denominator, and convert it to a floating point number. The result agrees with our estimate in magnitude and to four significant figures.
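
As a cross-check in Python, we could redo the sum with extended precision. Here’s a sketch using the mpmath library, assuming it is installed:

from mpmath import mp, mpf

mp.dps = 50  # work with 50 significant decimal digits
s = sum(1/mpf(k) for k in range(27134, 73757))
print(s - 1)  # expect a value around 5.284e-11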

So in summary

\sum_{k = 27134}^{73756} \frac{1}{k} \approx 1

with an error on the order of 10^−11, and this is the closest value of f(m, n) to an integer for 0 < m < n ≤ 100,000.


Freshman’s dream

The “Freshman’s dream” is the statement

(x + y)^p = x^p + y^p

It’s not true in general, but it is true mod p if p is a prime. It’s a cute result, but it’s also useful in applications, such as finite field computations in cryptography.

Here’s a demonstration of the Freshman’s dream in Python.

>>> p = 5
>>> [((x + y)**p - x**p - y**p) % p for x in range(p) for y in range(p)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Here’s an example using a prime too large to verify the results by looking at the output.

>>> import numpy as np
>>> p = 103
>>> v = [((x + y)**p - x**p - y**p) % p for x in range(p) for y in range(p)]
>>> np.all( np.array(v) == 0 )
True

You can use the same code to show that the Freshman’s dream is not true in general if p is not a prime, and it’s not true in general if p is a prime but the exponent is less than p.
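
For example, a value of True below means a counterexample exists, first for a composite modulus and then for a prime modulus with a smaller exponent:

>>> p = 6
>>> any(((x + y)**p - x**p - y**p) % p for x in range(p) for y in range(p))
True
>>> p, e = 5, 3
>>> any(((x + y)**e - x**e - y**e) % p for x in range(p) for y in range(p))
True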

987654321 / 123456789

I recently saw someone post [1] that 987654321/123456789 is very nearly 8, specifically 8.0000000729.

I wondered whether there’s anything distinct about base 10 in this. For example, would the ratio of 54321 and 12345, both read as base 6 numbers, be close to an integer? The ratio is 4.00268, which is pretty close to 4.
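
Here’s that computation in Python:

>>> int("54321", 6) / int("12345", 6)
4.002680965147453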

What about a larger base? Let’s try base 16. The expression

0xFEDCBA987654321 / 0x123456789ABCDEF

in Python returns 14. The exact ratio is not 14, but it’s as close to 14 as a standard floating point number can be.

For a base b, let denom(b) be the number formed by concatenating all the digits in ascending order and let num(b) be the number formed by concatenating all the digits in descending order.

\begin{align*} \text{num}(b) &= \sum_{k=1}^{b-1} kb^{k-1} \\ \text{denom}(b) &= \sum_{k=1}^{b-1} (b-k)b^{k-1} \end{align*}

Then for b > 2 we have

\frac{\text{num}(b)}{\text{denom}(b)} = b - 2 + \frac{b-1}{\text{denom}(b)}

The following Python code demonstrates [2] that this is true for b up to 1000.

num = lambda b: sum([k*b**(k-1) for k in range(1, b)])
denom = lambda b: sum([(b-k)*b**(k-1) for k in range(1, b)])

for b in range(3, 1001):
    n, d = num(b), denom(b)
    assert(n // d == b-2)
    assert(n % d == b-1)

So for any base the ratio is nearly an integer, namely b − 2, and the fractional part is roughly 1/b^(b−2).

When b = 16, as in the example above, the result is approximately

14 + 16^−14 = 8 + 4 + 2 + 2^−56

which would take 60 bits to represent exactly, but a floating point fraction only has 53 bits. That’s why our calculation returned exactly 14 with no fractional part.
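
We can confirm this with exact rational arithmetic: the true ratio is not 14, but the nearest floating point number to it is exactly 14.

>>> from fractions import Fraction
>>> r = Fraction(0xFEDCBA987654321, 0x123456789ABCDEF)
>>> r == 14
False
>>> float(r) == 14
True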


[1] I saw @ColinTheMathmo post it on Mastodon. He said he saw it on Fermat’s Library somewhere. I assume it’s a very old observation and that the analysis I did above has been done many times before.

[2] Why include a script rather than a proof? One reason is that the proof is straightforward but tedious, while the script is compact.

A more general reason I give computational demonstrations of theorems is that programs are complementary to proofs. Programs and proofs are both subject to bugs, but they’re not likely to have the same bugs. And because programs make details explicit by necessity, a program might fill in gaps that aren’t sufficiently spelled out in a proof.

Turning trig identities into Fibonacci identities

In 2013, John Conway and Alex Ryba published a brief note [1] on how to convert identities involving sine and cosine into identities involving Fibonacci and Lucas numbers.

Fibonacci and Lucas

The Fibonacci numbers F_n are defined by F_0 = 0, F_1 = 1, and

F_{n+2} = F_n + F_{n+1}

for n ≥ 0. Similarly, the Lucas numbers L_n are defined by the same recurrence relation

L_{n+2} = L_n + L_{n+1}

but starting with L_0 = 2 and L_1 = 1.

The recipe

The recipe for turning trigonometric identities into Fibonacci and Lucas number identities is as follows.

  1. Arguments become subscripts.
  2. Replace sines by ½ i^n F_n.
  3. Replace cosines by ½ i^n L_n.
  4. Insert a factor of (−5)^k in front of every term that contains 2k or 2k + 1 sine terms.

The first step is admittedly unclear. It’s unclear in [1] as well. If sine or cosine takes an argument of

pα + qβ

then this becomes the subscript

pa + qb

where the Latin letters are integers and the Greek letters are angles.

There’s no proof in [1] of why the recipe works, but an earlier paper [2] gives more details.

Examples

Double angle

For a simple example, let’s start with the double angle identity

sin 2θ = 2 sin θ cos θ.

This becomes

\frac{1}{2} i^{2n} F_{2n} = 2 \left(\frac{1}{2} i^n F_n\right) \left(\frac{1}{2} i^n L_n\right)

which simplifies to

F_{2n} = F_n L_n.

Sum angle

For a little more involved example, let’s look at the sum identity

cos(θ + φ) = cos θ cos φ − sin θ sin φ.

This becomes

\frac{1}{2} i^{a+b} L_{a+b} = \frac{1}{2} i^a L_a \cdot \frac{1}{2} i^b L_b - (-5) \cdot \frac{1}{2} i^a F_a \cdot \frac{1}{2} i^b F_b

which simplifies to

2L_{a+b} = L_a L_b + 5 F_a F_b.

Triple angle

In the previous examples the powers of i in the recipe above cancelled out. This example will show that they are necessary.

We start with the triple angle identity

sin 3θ = 3 sin θ − 4 sin³ θ.

This becomes

\frac{1}{2} i^{3n} F_{3n} = 3 \cdot \frac{1}{2} i^n F_n - 4 (-5) \left(\frac{1}{2} i^n F_n\right)^3

which simplifies to

F_{3n} = 3 (-1)^n F_n + 5 F_n^3.

Demonstration

The following Python code demonstrates that the Fibonacci / Lucas identities above are correct.

def F(n):
    # Fibonacci numbers by naive recursion, fast enough for the small n used here
    if n == 0: return 0
    if n == 1: return 1
    return F(n-1) + F(n-2)

def L(n):
    # Lucas numbers
    if n == 0: return 2
    if n == 1: return 1
    return L(n-1) + L(n-2)

for n in range(10):
    assert(F(2*n) == F(n) * L(n))

for n in range(10):
    for m in range(10):
        assert(2*L(m + n) == L(m)*L(n) + 5*F(m)*F(n))

for n in range(10):
    assert(F(3*n) == 3*(-1)**n * F(n) + 5*F(n)**3)


[1] John Conway and Alex Ryba. Fibonometry. The Mathematical Gazette, November 2013, pp. 494–495.

[2] Barry Lewis. Trigonometry and Fibonacci Numbers. The Mathematical Gazette, July 2007, pp. 216–226.

More on Carmichael

This morning I took an old blog post and turned it into an X thread. I think the thread is easier to read. More expository and less rigorous.

The post and thread look at generalizations of the fact that every integer and its fifth power end in the same digit. The conclusion is that n and n^k end in the same digit in base b if b is square-free and k = φ(b) + 1. So, for example, 10 is square-free (i.e. not divisible by a non-trivial square) and φ(10) + 1 = 5.
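
Here’s a quick check of the base 10 case:

>>> all(n**5 % 10 == n % 10 for n in range(10_000))
True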

Benjamin Clark replied suggesting looking at λ(b) + 1 in addition to φ(b) + 1.

Euler and Carmichael

To back up a bit, φ(n) is Euler’s totient function, the number of positive integers less than and relatively prime to n. Robert Carmichael’s totient function λ(n) is a closely related function, one that replaced Euler’s function in implementations of RSA encryption—more specifically in RSA decryption—something I wrote about earlier this week.

Euler’s generalization of Fermat’s little theorem says that if a is relatively prime to n, then

a^φ(n) = 1 (mod n).

Carmichael’s totient λ(n) is the smallest number such that

a^λ(n) = 1 (mod n)

when a is relatively prime to n.

Sometimes φ(n) = λ(n), namely for n = 1, 2, 4, odd prime powers, and twice odd prime powers. Otherwise λ(n) is a proper divisor of φ(n).
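
SymPy implements both functions, with λ available as reduced_totient in recent versions, so we can watch the two agree and disagree:

>>> from sympy import totient, reduced_totient
>>> [(n, totient(n), reduced_totient(n)) for n in (4, 8, 9, 15, 16)]
[(4, 2, 2), (8, 4, 2), (9, 6, 6), (15, 8, 4), (16, 8, 4)]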

Carmichael’s conjecture

Carmichael conjectured that for every n there is at least one other integer m ≠ n such that φ(m) = φ(n). He proved that this is true at least for n less than 10^37. The conjecture is now known to be true for n less than 10^(10^10).

RSA with multiple primes

Typically RSA public keys are the product of two large primes, n = pq. But there’s no reason they couldn’t be the product of, say, three primes, n = pqr, or more primes, as long as φ(n), or λ(n), is calculated correctly.

Encryption is done the same way. Decryption could be done the same way, except there is the opportunity for it to be more efficient. The trick is to use the CRT (Chinese Remainder Theorem) in a way similar to Garner’s algorithm. This is why RSA with multiple primes is sometimes used for digital signatures.
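
Here’s a toy sketch of three-prime RSA with CRT decryption, using straightforward CRT recombination rather than Garner’s algorithm. The parameters are made up for illustration; a real implementation would differ in many details.

from math import lcm
from sympy import randprime

# Toy example: three 128-bit primes. Real keys would use much larger primes.
p, q, r = [randprime(2**127, 2**128) for _ in range(3)]
n = p * q * r
e = 65537
d = pow(e, -1, lcm(p - 1, q - 1, r - 1))  # decryption exponent via lambda(n)

msg = 123456789
c = pow(msg, e, n)  # encryption is the same as in two-prime RSA

# Decrypt modulo each prime with a reduced exponent, then recombine via CRT.
x = 0
for s in (p, q, r):
    m_s = pow(c, d % (s - 1), s)  # c^d mod s, using Fermat's little theorem
    t = n // s
    x += m_s * t * pow(t, -1, s)
assert x % n == msg

The potential speedup comes from carrying out the modular exponentiations with moduli and exponents a third the size of n.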

The difficulty of factoring n using the GNFS algorithm doesn’t change depending on the number of factors n has, as long as all the factors are sufficiently large, far too large to find using trial division.

Daniel Bernstein’s post-quantum RSA paper was based on keys that are the product of a large number of 4096-bit primes. This way all the arithmetic is carried out modulo 4096-bit primes, not modulo terabyte primes.

Fermat primes and tangent numbers

Fermat numbers

The nth Fermat number is defined by

F(n) = 2^{2^n} + 1

Pierre Fermat conjectured that the F(n) were prime for all n, and they are for n = 0, 1, 2, 3, and 4, but Leonhard Euler factored F(5), showing that it is not prime.

Tangent numbers

The nth tangent number is defined by the Taylor series for tangent:

\tan(z) = \sum_{n=0}^\infty T(n) \frac{z^n}{n!}

Another way to put it is that the exponential generating function for T(n) is tan(z).

Fermat primes and tangent numbers

Here’s a remarkable connection between Fermat numbers and tangent numbers, discovered by Richard McIntosh as an undergraduate [1]:

F(n) is prime if and only if F(n) does not divide T(F(n) − 2).

That is, the nth Fermat number is prime if and only if it does not divide the (F(n) − 2)th tangent number.

We could duplicate Euler’s assessment that F(5) is not prime by showing that 4294967297 divides the 4294967295th tangent number. That doesn’t sound very practical, but it’s interesting.
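
Here’s a sketch that checks McIntosh’s criterion on the first four Fermat numbers, computing tangent numbers from Bernoulli numbers via the classical identity T(2k − 1) = (−1)^(k−1) 2^(2k) (2^(2k) − 1) B(2k) / 2k. All four of these Fermat numbers are prime, so the remainder should be nonzero in each case.

from sympy import bernoulli, isprime, Integer

def tangent_number(n):
    # T(n) for odd n, computed from the Bernoulli number B(n+1)
    k = (n + 1) // 2
    t = (-1)**(k - 1) * 2**(2*k) * (2**(2*k) - 1) * bernoulli(2*k) / (2*k)
    return Integer(t)  # tangent numbers are integers

for m in range(4):
    F = 2**(2**m) + 1  # 3, 5, 17, 257
    assert isprime(F) == (tangent_number(F - 2) % F != 0)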

Update: To see just how impractical the result in this post would be for testing whether a Fermat number is prime, I found an asymptotic estimate of tangent numbers on OEIS and estimated that the 4294967295th tangent number has about 80 billion digits.

[1] Richard McIntosh. A Necessary and Sufficient Condition for the Primality of Fermat Numbers. The American Mathematical Monthly, Vol. 90, No. 2 (Feb., 1983), pp. 98–99

Time needed to factor large integers

The optimal choice of factoring algorithm depends on the size of the number you want to factor. For numbers larger than a googol (10^100) the GNFS (general number field sieve) is the fastest known factoring algorithm, making GNFS the algorithm of choice for trying to factor public keys for RSA encryption.

The expected time required to factor a number n using the GNFS is proportional to

exp( f(n) g(n) )

where f(n) is relatively constant and g(n) varies more strongly with n. More specifically

f(n) = (64/9)^(1/3) + o(1)

and

g(n) = (log n)^(1/3) (log log n)^(2/3).

The notation o(1) means a term that goes to zero as n increases. Very often in practice you can ignore o(1) terms when your value of n is large. In cryptographic applications n is certainly large, at least 2^1024, and yet the o(1) term is still significant for values of n seen in practice. I’ve seen one source say that for keys used in practice the o(1) term is around 0.27.

Security levels

It is widely stated that factoring a 1024-bit private key is comparable in difficulty to breaking an 80-bit symmetric encryption key, i.e. that 1024-bit keys provide 80-bit security.

To find the security level of a 1024-bit RSA key, the size of the o(1) term matters. But given that, if we want to find the security level of other RSA key sizes, it doesn’t matter how big the o(1) term is. It only matters that the term is roughly constant over the range of key sizes we are interested in.

If f(n) is relatively constant, then the log of the time to factor n is proportional to g(n), and the log of the time to break an encryption with security level k is proportional to k. So the security level of a key n should be proportional to g(n). Let’s see if that’s the case, using the reported security levels of various key sizes.

import numpy as np

# RSA key sizes and security levels
data = {
    1024 : 80,
    2048 : 112,
    3072 : 128,
    4096 : 140,
    7680 : 192,
}
for d in data:
    x = d*np.log(2) # natural log of 2^d
    y = x**(1/3)*(np.log(x)**(2/3)) 
    print(data[d]/y)

This prints the following:

2.5580
2.6584
2.5596
2.4819
2.6237

So the ratio is fairly constant, as expected. All the reported security levels are multiples of 4, which suggests there was some rounding done before reporting them. This would account for some of the variation above.

The output above suggests that the security level of an RSA key with b bits is roughly

2.55 x^(1/3) (log x)^(2/3)

where x = log(2) b.
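
Packaged as a rule of thumb, with the caveat that the constant 2.55 is roughly the average of the ratios above:

import numpy as np

def security_level(bits):
    # rough empirical fit to the reported security levels above
    x = bits * np.log(2)
    return 2.55 * x**(1/3) * np.log(x)**(2/3)

print(security_level(3072))  # about 127.5, vs. the reported 128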

Aside on RSA security

RSA encryption is not broken by factoring keys but by exploiting implementation flaws.

Factoring a 2048-bit RSA key would require more energy than the world produces in a year, as explained here.

Extrapolating quantum factoring

In 2001, a quantum computer was able to factor 15.

In 2012, a quantum computer was able to factor 21, sorta. They cheated by not implementing gates that they knew a priori would not be used. To this day, there still hasn’t been a quantum computer to factor 21 without taking some shortcuts. Some reasons why are given here.

Extrapolating from two data points is kinda ridiculous, but we only have two data points, and we’re going to do it anyway.

xkcd 605: Extrapolating

Some may object that extrapolating the size of the numbers that can be factored isn’t the right approach. Instead we should be extrapolating the number of qubits or something else. Fine, but that’s not what we’re doing in this post.

Linear extrapolation

Using linear extrapolation, the size of number a quantum computer can factor grows by about half a unit per year, going from 15 to 21 in eleven years. At that rate, a quantum computer wouldn’t be able to factor a cryptographically relevant number, which is on the order of 10^308, before the heat death of the universe.

Exponential extrapolation

Let’s switch to exponential extrapolation. Instead of assuming the size of the numbers grows linearly, we’ll assume the number of bits in the numbers grows linearly, meaning the numbers grow exponentially.

In about 10 years, the size of number that could be factored increased by

21/15 = 1.4 ≈ √2.

This means the size of the numbers increased by about half a bit in a decade. By now we should be able to factor 35.

RSA keys have at least 1024 bits, so at half a bit per decade, quantum computers should be able to factor RSA keys 20,000 years from now.

Double exponential extrapolation

Now let’s assume the number of bits in a number that a quantum computer can factor is itself growing exponentially, meaning that the numbers are growing doubly exponentially, doubling every decade. Then we should be able to factor 9-bit numbers by now, and in 100 years we will be able to factor RSA keys.

Breaking RSA in 10 years

The estimate above is kinda arbitrary because a double exponential function has three degrees of freedom and so we’d need three data points, which we don’t have.

We can fit a double exponential by making up a third data point. Some researchers speculate that a quantum computer will be able to factor 1024-bit RSA keys 10 years from now. If so, that gives us a third data point.

To make things easier, let’s work in terms of bits and set x to be the number of years since 2000. Then we have

f(x) = a + b 2^(cx)

with f(1) = log₂ 15, f(12) = log₂ 21, and f(35) = 1024. We can fit this function using Python [1].

from math import log2
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a + b * 2**(c*x)

x_data = [1.0, 12.0, 35.0]
y_data = [log2(15), log2(21), 1024]
initial_guess = [4, 0.01, 0.3]

popt, pcov = curve_fit(f, x_data, y_data, p0=initial_guess, maxfev=10000)
a, b, c = popt
print(f"Fitted parameters: {a=}, {b=}, {c=}")

The parameter values are a = 3.893887005243357, b = 0.0093347992761344, c = 0.4782191085655488.

Here’s a plot of the fitted function f.
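
Something like the following matplotlib code, reusing f, popt, x_data, and y_data from the script above, will draw it.

import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 35, 200)
plt.plot(xs, f(xs, *popt), label="fitted double exponential")
plt.scatter(x_data, y_data, color="red", label="data")
plt.xlabel("years since 2000")
plt.ylabel("bits")
plt.legend()
plt.show()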

If we’re going to be able to factor 1024-bit integers by 2035, according to a double exponential model, we’re already behind schedule. We should be factoring 40-bit numbers by now, but so far we’ve only factored a 4-bit number.

Or to put it another way, those who believe we will be able to factor 1024-bit integers by 2035 are implicitly assuming a future rate of growth faster than double exponential. And maybe they’re right.

If we’re going to break 1024-bit RSA keys by 2035, at some point we’ll have to do better than the curve above, and so far we’re doing worse.

Bookmark this post and come back when a new quantum factoring record is set and rerun the code with new data.


[1] The curve_fit function needed a lot of help to work. See the next post for a better way to fit the function.

Impossible rational triangles

A rational triangle is a triangle whose sides have rational length and whose area is rational. Can any two rational numbers be two of the sides of a rational triangle? Surprisingly, no. You can always find a third side of rational length, but it might not be possible to do so while keeping the area rational.

The following theorem comes from [1].

Let q be a prime number not congruent to 1 or −1 mod 8, and let ω be an odd integer such that no prime factor of q^(2ω) − 1 is congruent to 1 mod 4. Then a = q^ω + 1 and b = q^ω − 1 are not the sides of any rational triangle.

To make the theorem explicitly clear, here’s a Python implementation.

from sympy import isprime, primefactors

def test(q, omega):
    # True if (q, omega) satisfies the hypotheses of the theorem above
    if not isprime(q):
        return False
    if q % 8 == 1 or q % 8 == 7:  # q congruent to 1 or -1 mod 8
        return False
    if omega % 2 == 0:  # omega must be odd
        return False
    factors = primefactors(q**(2*omega) - 1)
    for f in factors:
        if f % 4 == 1:
            return False
    return True

We can see that q = 2 and ω = 1 passes the test, so there is no rational triangle with sides a = 2^1 + 1 = 3 and b = 2^1 − 1 = 1.

You could use the code above to search for (a, b) pairs systematically.

from sympy import primerange

for q in primerange(20):
    for omega in range(1, 20, 2):
        if test(q, omega):
            a, b = q**omega + 1, q**omega - 1
            print(a, b)

This outputs the following:

3 1
9 7
33 31
129 127
8193 8191
32769 32767
131073 131071
524289 524287
4 2
6 4
126 124
14 12

Note that one of the outputs is (4, 2). Since multiplying the sides of a rational triangle by a rational number gives another rational triangle (the area scales by the square of that number, which is still rational), there cannot be a rational triangle in which one side is twice as long as another.

As the author in [1] points out, “There are undoubtedly many pairs not covered by [the theorem cited above]. It would be of interest to characterize all pairs.” The paper was published in 1976. Maybe there has been additional work since then.


[1] N. J. Fine. On Rational Triangles. The American Mathematical Monthly. Vol. 83. No. 7, pp. 517–521.