Expressing a prime as the sum of two squares

I saw where Elon Musk posted Grok’s answer to the prompt “What are the most beautiful theorems.” I looked at the list, and there were no surprises, as you’d expect from a program that works by predicting the most likely sequence of words based on analyzing web pages.

There’s only one theorem on the list that hasn’t appeared on this blog, as far as I can recall, and that’s Fermat’s theorem that an odd prime p can be written as the sum of two squares if and only if p = 1 mod 4. The “only if” direction is easy [1] but the “if” direction takes more effort to prove.

If p is a prime and p = 1 mod 4, Fermat’s theorem guarantees the existence of x and y such that

x^2 + y^2 = p

Gauss’ formula

Stan Wagon [2] gave an algorithm for finding a pair (x, y) satisfying the equation above. He also presents “a beautiful formula due to Gauss” which “does not seem to be of any value for computation.” Gauss’ formula says that if p = 4k + 1, then a solution is

\begin{align*} x &= \frac{1}{2} \binom{2k}{k} \pmod p \\ y &= (2k)\!!\, x \pmod p \end{align*}

For x and y we choose the residues mod p with |x| and |y| less than p/2.

Why would Wagon say Gauss’ formula is computationally useless? The number of multiplications required is apparently on the order of p, and the size of the numbers involved grows like ((p − 1)/2)!.

You can get around the problem of intermediate numbers getting too large by carrying out all calculations mod p, but I don’t see a way of implementing Gauss’ formula with less than O(p) modular multiplications [3].
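To make the O(p) claim concrete, here is a minimal sketch of Gauss’ formula carried out entirely mod p. The function is my own illustration, not from Wagon’s paper; it uses Python’s pow(·, −1, p) for modular inverses.

def gauss_two_squares(p):
    # Assumes p is prime and p = 4k + 1
    k = (p - 1) // 4

    # binomial(2k, k) mod p, built up with O(p) modular operations
    binom = 1
    for i in range(1, k + 1):
        binom = binom * (k + i) % p
        binom = binom * pow(i, -1, p) % p

    x = binom * pow(2, -1, p) % p

    # (2k)! mod p
    fact = 1
    for i in range(2, 2*k + 1):
        fact = fact * i % p
    y = fact * x % p

    # choose the residues with absolute value less than p/2
    if x > p // 2: x -= p
    if y > p // 2: y -= p
    return x, y

For example, gauss_two_squares(13) returns (-3, -2), and (−3)² + (−2)² = 13.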

Wagon’s algorithm

If we want to express a large prime p as a sum of two squares, an algorithm requiring O(p) multiplications is impractical. Wagon’s algorithm is much more efficient.

You can find the details of Wagon’s algorithm in [2], but the two key components are finding a quadratic non-residue mod p (a number c such that c ≢ x² mod p for any x) and the Euclidean algorithm. Since half the numbers between 1 and p − 1 are quadratic non-residues, you’re very likely to find a non-residue after a few random attempts. A sketch of both components follows.
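Here is a sketch of those components as I understand them from [2]: a non-residue c yields a square root t of −1 mod p via t = c^((p−1)/4), and applying the Euclidean algorithm to p and t produces the solution as the first two remainders below √p. See the paper for the precise statement and proof; the code is my own reconstruction.

from math import isqrt
from random import randrange

def wagon_two_squares(p):
    # Assumes p is prime and p = 1 mod 4

    # Find a quadratic non-residue c by trial. By Euler's criterion,
    # c is a non-residue iff c^((p-1)/2) = -1 mod p. Half of all
    # candidates qualify, so a few random attempts suffice on average.
    c = 2
    while pow(c, (p - 1) // 2, p) != p - 1:
        c = randrange(2, p)

    # t is a square root of -1 mod p
    t = pow(c, (p - 1) // 4, p)

    # Euclidean algorithm on p and t: the first two remainders
    # less than sqrt(p) are x and y
    a, b = p, t
    s = isqrt(p)
    while b > s:
        a, b = b, a % b
    return b, a % b

For example, wagon_two_squares(13) returns (3, 2), and 3² + 2² = 13.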

 

[1] The square of an integer is either equal to 0 or 1 mod 4, so the sum of two squares cannot equal 3 mod 4.

[2] Stan Wagon. The Euclidean Algorithm Strikes Again. The American Mathematical Monthly, Vol. 97, No. 2 (Feb., 1990), pp. 125-129.

[3] Wilson’s theorem gives a fast way to compute (n − 1)! mod n. Maybe there’s some analogous identity that could speed up the calculation of the necessary factorials mod p, but I don’t know what it would be.

 

Aligning one matrix with another

Suppose you have two n × n matrices, A and B, and you would like to find a rotation matrix Ω that lines up B with A. That is, you’d like to find Ω such that

A = ΩB.

This is asking too much, except in the trivial case of A and B being 1 × 1 matrices. You could view the matrix equation above as a set of n² equations in real numbers, but the space of orthogonal matrices only has n(n − 1)/2 degrees of freedom [1].

When an equation doesn’t have an exact solution, the next best thing is to get as close as possible to a solution, typically in a least squares sense. The orthogonal Procrustes problem is to find an orthogonal matrix Ω minimizing the distance between A and ΩB. That is, we want to minimize

|| A − ΩB ||

subject to the constraint that Ω is orthogonal. The matrix norm used in this problem is the Frobenius norm, the square root of the sum of the squares of the matrix entries. The Frobenius norm is the 2-norm if we straighten the matrices into vectors of dimension n².

Peter Schönemann found a solution to the orthogonal Procrustes problem in 1964. His solution involves singular value decomposition (SVD). This shouldn’t be surprising since SVD solves the problem of finding the closest thing to an inverse of a non-invertible matrix. (More on that here.)

Schönemann’s solution is to set M = ABᵀ and find its singular value decomposition

M = UΣVᵀ.

Then

Ω = UVᵀ.

Python code

The following code illustrates solving the orthogonal Procrustes problem for random matrices.

import numpy as np

n = 3

# Generate random n x n matrices A and B
rng = np.random.default_rng(seed=20260211) 
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Compute M = A * B^T
M = A @ B.T

# SVD: M = U * Sigma * V^T
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# R = U * V^T
R = U @ Vt

# Verify that R^T R is very nearly the identity matrix
print("||R^T R - I||_F =", np.linalg.norm(R.T @ R - np.eye(n), ord="fro"))

In this example the Frobenius norm of RᵀR − I is 4 × 10⁻¹⁶, so essentially RᵀR = I to machine precision.


[1] Every column of an orthogonal matrix Ω must have length 1, so that gives n constraints. Furthermore, each pair of columns must be orthogonal, which gives n choose 2 more constraints. We start with Ω having n² degrees of freedom, but then remove n and then n(n − 1)/2 degrees of freedom.

n² − n − n(n − 1)/2 = n(n − 1)/2

Computing large Fibonacci numbers

The previous post discussed two ways to compute the nth Fibonacci number. The first is to compute all the Fibonacci numbers up to the nth iteratively using the defining property of Fibonacci numbers

Fₙ₊₂ = Fₙ + Fₙ₊₁

with extended integer arithmetic.

The second approach is to use Binet’s formula

Fₙ = round(φⁿ / √5)

where φ is the golden ratio.

It’s not clear which approach is more efficient. You might naively say that the iterative approach has run time O(n) while Binet’s formula is O(1). That doesn’t take into account how much work goes into each step, but it does suggest that eventually Binet wins.

The relative efficiency of each algorithm depends on how it is implemented. In this post I will compare using Python’s integer arithmetic and the mpmath library for floating point. Here’s my code for both methods.

import time
from math import log10
import mpmath as mp

def fib_iterate(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def digits_needed(n):

    phi = (1 + 5**0.5) / 2
    return int(n*log10(phi) - 0.5*log10(5)) + 1

def fib_mpmath(n, guard_digits=30):

    digits = digits_needed(n)

    # Set decimal digits of precision
    mp.mp.dps = digits + guard_digits

    sqrt5 = mp.sqrt(5)
    phi = (1 + sqrt5) / 2
    x = (phi ** n) / sqrt5

    return int(mp.nint(x))

Next, here’s some code to compare the run times.

def compare(n):
    
    start = time.perf_counter()
    x = fib_iterate(n)
    elapsed = time.perf_counter() - start
    print(elapsed)

    start = time.perf_counter()
    y = fib_mpmath(n)
    elapsed = time.perf_counter() - start
    print(elapsed)

    if (x != y):
        print("Methods produced different results.")

This code shows that the iterative approach is faster for n = 1,000 but Binet’s method is faster for n = 10,000.

>>> compare(1_000)
0.0002502090001144097
0.0009207079999669077
>>> compare(10_000)
0.0036547919999065925
0.002145750000181579

For larger n, the efficiency advantage of Binet’s formula becomes more apparent.

>>> compare(1_000_000)
11.169050417000108
2.0719056249999994

Guard digits and correctness

There is one unsettling problem with the function fib_mpmath above: how many guard digits do you need? To compute a number correctly to 100 significant figures, for example, requires more than 100 digits of working precision. How many more? It depends on the calculation. What about our calculation?

If we compute the 10,000th Fibonacci number using fib_mpmath(10_000, 2), i.e. with 2 guard digits, we get a result that is incorrect in the last digit. To compute the 1,000,000th Fibonacci number correctly, we need 5 guard digits.

We don’t need many guard digits, but we’re guessing at how many we need. How might we test whether we’ve guessed correctly? One way would be to compute the result using fib_iterate and compare results, but that defeats the purpose of using the more efficient fib_mpmath.

If floating point calculations produce an incorrect result, the error is likely to be in the least significant digits. If we knew that the last digit was correct, that would give us more confidence that all the digits are correct. More generally, we could test the result mod m. I discussed Fibonacci numbers mod m in this post.

When m = 10, the last digits of the Fibonacci numbers repeat with a cycle of 60. So the Fibonacci numbers with index n and with index n mod 60 should have the same last digit.
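Here’s a quick empirical check of that cycle, reusing fib_iterate from above:

# the last digits of Fibonacci numbers repeat with period 60
assert all(fib_iterate(i) % 10 == fib_iterate(i + 60) % 10 for i in range(100))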

The post I just mentioned links to a paper by Niederreiter that says the Fibonacci numbers are evenly distributed mod m if and only if m is a power of 5, in which case the cycle length is 4m.

The following code could be used as a sanity check on the result of fib_mpmath.

def mod_check(n, Fn):
    # The Fibonacci numbers mod 5^k repeat with period 4 * 5^k,
    # so F_n and F_(n mod period) should agree mod 5^k.
    k = 3
    base = 5**k
    period = 4*base
    return Fn % base == fib_iterate(n % period) % base

With k = 3, we are checking the result of our calculation mod 125. It is unlikely that an incorrect result would be correct mod 125. It’s hard to say just how unlikely. Naively we could say there’s 1 chance in 125, but that ignores the fact that the errors are most likely in the least significant bits. The chances of an incorrect result being correct mod 125 would be much less than 1 in 125. For more assurance you could use a larger power of 5.
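For example, here is how the check might be used. The expected output is my assumption; it holds provided fib_mpmath carried enough guard digits.

n = 1_000_000
Fn = fib_mpmath(n)
print(mod_check(n, Fn))  # True if Fn was computed correctly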

Fibonacci numbers and time-space tradeoffs

A few days ago I wrote about Fibonacci numbers and certificates. As I pointed out in the article, there’s no need to certify Fibonacci numbers, but the point of the post was to illustrate the idea of a solution certificate in a simple context. Practical uses of certificates are more complicated.

This time I want to use Fibonacci numbers to illustrate time-space tradeoffs. The post on Fibonacci certificates imagined providing someone a pair (F, r) where F is a large Fibonacci number, and r is a certificate proving that F is indeed a Fibonacci number. The goal was to minimize the computational effort to verify F. All the recipient needed to do was compute

| 5F² − r² |.

The number F is a Fibonacci number if and only if this number equals 4. The problem with this scenario is that F and r are both large numbers. They require transmitting and storing a lot of bits.

A much more space-efficient approach would be to transmit the index of the Fibonacci number and have the user compute the number. The example in the certificate post was (12586269025, 28143753123). Since 12586269025 is the 50th Fibonacci number, I could communicate it to someone by simply transmitting the number 50. That saves space, but it puts more computational burden on the recipient.

Fibonacci numbers grow exponentially with their index, and so the size of the nth Fibonacci number in bits is proportional to n. But the number of bits in n itself is proportional to log n. When n is large, the difference is dramatic.

How many bits are in the 1,000,000th Fibonacci number? The nth Fibonacci number is φⁿ/√5 rounded to the nearest integer, so the number of bits in the millionth Fibonacci number would be

log₂(φ¹⁰⁰⁰⁰⁰⁰/√5) = 1000000 log₂ φ − 0.5 log₂ 5

which is roughly 700,000. By contrast, one million is a 20 bit number. So transmitting “1000000” is far more efficient than transmitting the millionth Fibonacci number.
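Here’s a quick check of these counts; the snippet is my own, using the approximation above for the size of the millionth Fibonacci number.

from math import log2

phi = (1 + 5**0.5) / 2
bits = int(1_000_000 * log2(phi) - 0.5 * log2(5)) + 1
print(bits)                      # roughly 700,000
print((1_000_000).bit_length())  # 20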

What does it take to compute the nth Fibonacci number? For small n, it’s fast and easy to compute the Fibonacci numbers up to n sequentially using the definition of the sequence. For large enough n, it would be faster to compute φn/√5. However, the former requires (extended) integer arithmetic, and the latter requires (extended) floating point arithmetic. It’s not clear where the crossover point would be where floating point would be more efficient. That’s the topic of the next post.

Minimum of cosine sum

Suppose f(x) is the sum of terms of the form cos(kx) where k is an integer from a set A with n elements.

f_A(x) = \sum_{k \in A} \cos(kx)

Then the maximum value of f is f(0) = n. But what is the minimum value of f?

The Chowla cosine conjecture says that the minimum should be less than −√n for large n. For now the best proven results are much smaller in absolute value [1].

I was playing around with this problem, and the first thing I thought of was to let the set A be the first n primes. This turned out not to be the most interesting example. Since all the primes except the first are odd, and cos(kπ) = −1 for odd k, the minimum is 2 − n and occurs at x = π.

Here’s a plot where A is the set of primes less than 100.

For the cosine conjecture to be interesting, the set A should contain a mix of even and odd numbers.

Here’s a plot with A equal to a random selection of 25 points between 1 and 100. (I chose 25 because there are 25 primes less than 100.)

Here’s the Python code I used to generate the two sets A and the function to plot.

import numpy as np
from sympy import prime

def f(x, A):
    return sum([np.cos(k*x) for k in A])

n = 25
A_prime = [prime(i) for i in range(1, n+1)]
np.random.seed(20260207)
A_random = np.random.choice(range(1, 101), size=n, replace=False)

If you wanted to explore the Chowla conjecture numerically, direct use of minimization software is impractical. As you can tell from the plots above, there are a lot of local minima. If the values in A are not too large, you can look at a plot to see approximately where the minimum occurs, then use a numerical method to find the minimum in this region, but that doesn’t scale.

Here’s an approach that would scale better. You could find all the zeros of the derivative of f_A and evaluate the function at each. One of these is the minimum. The derivative is a sum of sines with integer frequencies, and so it could be written as a polynomial in z = exp(ix) [2]. You could find all the zeros of this polynomial using the QR algorithm as discussed in the previous post.
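Here’s a sketch of that approach. The construction is my own and glosses over numerical issues; conveniently, np.roots itself finds roots by computing the eigenvalues of a companion matrix.

import numpy as np

def min_cosine_sum(A):
    # f_A'(x) = -sum_k k sin(kx). With z = exp(ix) and N = max(A),
    # z^N f_A'(x) is, up to a constant factor, the polynomial
    # p(z) = sum_k k (z^(N+k) - z^(N-k)).
    N = max(A)
    coeffs = np.zeros(2*N + 1)       # coeffs[j] multiplies z^j
    for k in A:
        coeffs[N + k] += k
        coeffs[N - k] -= k
    roots = np.roots(coeffs[::-1])   # np.roots wants highest degree first
    # critical points of f_A correspond to roots on the unit circle
    roots = roots[np.abs(np.abs(roots) - 1) < 1e-6]
    return min(f(x, A) for x in np.angle(roots))

This uses the function f defined above. With A = A_prime, it should return approximately 2 − n = −23.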

 

[1] Benjamin Bedert. Polynomial bounds for the Chowla cosine problem. arXiv

[2] You get a polynomial of degree n in z and 1/z. Then multiply by zⁿ to get a polynomial in z only of degree 2n.

Eigenvalue homework problems are backward

Classroom

When you take a linear algebra course and get to the chapter on eigenvalues, your homework problems will include a small matrix A and you will be asked to find the eigenvalues. You do this by computing the determinant

det(A − λI) = P(λ)

and getting P(λ), a polynomial in λ. The roots of P are the eigenvalues of A.

Either A will be a 2 × 2 matrix, in which case you can find the roots using the quadratic formula, or the matrix will have been carefully selected so that P(λ) will be easy to factor. Otherwise, finding the roots of a polynomial is hard.

Real world

Numerical algorithms to find eigenvalues have gotten really good. In practice, you don’t compute determinants or find roots of polynomials. Instead you do something like the QR algorithm.

Finding all the roots of a polynomial is a challenging problem, and so what you might do in practice is find the roots by constructing a matrix, called the companion matrix, whose eigenvalues correspond to the roots you’re after.
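For example, here is a small illustration of my own: finding the roots of p(x) = x³ − 6x² + 11x − 6 = (x − 1)(x − 2)(x − 3) by handing its companion matrix to an eigenvalue solver. This is what np.roots does internally.

import numpy as np

# coefficients of the monic polynomial, constant term first
a = np.array([-6.0, 11.0, -6.0])   # p(x) = x^3 - 6x^2 + 11x - 6
n = len(a)

# companion matrix: ones on the subdiagonal, -a down the last column
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -a

print(np.linalg.eigvals(C))  # the roots 1, 2, 3, in some order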

Summary

As a classroom exercise, you calculate roots of polynomials to find eigenvalues.

In the real world, you might use an eigenvalue solver to find the roots of polynomials.

I wrote a similar post a few years ago. It explains that textbooks define hyperbolic functions using eˣ, but you might want to compute eˣ using hyperbolic functions.

Fibonacci number certificates

Suppose I give you a big number F and claim that F is a Fibonacci number. How could you confirm this?

Before I go further, let me say what this post is really about. It’s not about Fibonacci numbers so much as it is about proofs and certificates. There’s no market for large Fibonacci numbers, and certainly no need to quickly verify that a number is a Fibonacci number.

You could write a program to generate Fibonacci numbers and run it until it either produces F, in which case you know F is a Fibonacci number, or produces a number larger than F without having produced F, in which case you know it’s not a Fibonacci number. But there’s a faster way.

A certificate is data that allows you to confirm a solution to a problem in less time, usually far less time, than it took to generate the solution. For example, Pratt certificates give you a way to prove that a number is prime. For a large prime, you could verify its Pratt certificate much faster than directly trying to prove the number is prime.

There is a theorem that says a number f is a Fibonacci number if and only if one of 5f² ± 4 is a perfect square. So in addition to F, I can give you another number r that serves as a certificate that F is a Fibonacci number. You compute

N = 5F² − r²

and if N is equal to 4 or −4, you know that F is a Fibonacci number. Otherwise it is not.

Here’s a small example. Suppose I give you (12586269025, 28143753123) and claim that the first number is a Fibonacci number and the second number is its certificate. You can compute

5 × 12586269025² − 28143753123²

and get −4, verifying the claim.
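Here’s the same check in Python:

F, r = 12586269025, 28143753123
print(5*F**2 - r**2)  # -4, so F is a Fibonacci number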

Certificates are all about the amount of computation the verifier needs to do. The prover, i.e. the person producing the certificate, has to do extra work to provide a certificate in addition to a problem solution. This trade-off is acceptable, for example, in a blockchain where a user posts one transaction but many miners will verify many transactions.


Γ(1/n)

If n is a positive integer, then rounding Γ(1/n) up to the nearest integer gives n. In symbols,

\left\lceil \Gamma\left( \tfrac{1}{n}\right) \right\rceil = n

We can illustrate this with the following Python code.

>>> from scipy.special import gamma
>>> from math import ceil
>>> for n in range(1, 101):
...     assert ceil(gamma(1/n)) == n

You can find a full proof in [1]. I’ll give a partial proof that may be more informative than the full proof.

The asymptotic expansion of the gamma function near zero is

\Gamma(z) = \frac{1}{z} - \gamma + {\cal O}(z)

where γ is the Euler-Mascheroni constant.

So when we set z = 1/n we find Γ(1/n) ≈ n − γ + O(1/n). Since 0 < γ < 1, the theorem above is true for sufficiently large n. And it turns out “sufficiently large” can be replaced with n ≥ 1.
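To illustrate with n = 10 (the numbers here are my own quick calculation): Γ(1/10) ≈ 9.51 and 10 − γ ≈ 9.42, and both have ceiling 10.

from scipy.special import gamma
print(gamma(1/10))        # about 9.51; ceiling 10
print(10 - 0.5772156649)  # n - gamma, about 9.42; ceiling 10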

[1] Gamma at reciprocals of integers: 12225. American Mathematical Monthly. October 2022. pp 789–790.

Polish serenity

Yesterday I ran across the following mashup by Amy Swearer of a Polish proverb and the Serenity Prayer.

Lord, grant me the serenity to accept when it’s no longer my circus,
the courage to control the monkeys that are still mine,
and the wisdom to know the difference.

The proverb is “Nie mój cyrk, nie moje małpy,” literally “Not my circus, not my monkeys”.

Satellites have a lot of room

I saw an animation this morning showing how the space above our planet is dangerously crowded with satellites. That motivated me to do a little back-of-the-envelope math.

The vast majority of satellites are in low earth orbit (LEO), which extends from 160 to 2000 km above the earth’s surface. The radius of the earth is about 6400 km, so the volume of the LEO region is

\frac{4\pi}{3} \left((6400 + 2000)^3 - (6400 + 160)^3\right) \text{km}^3 = 1.3 \times 10^{12} \,\text{km}^3

There are about 12,500 satellites in LEO, so the average volume of LEO per satellite is about 100,000,000 km³.
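Here’s the arithmetic, using the same round numbers as above:

from math import pi

R = 6400   # approximate radius of the earth, km
leo = 4*pi/3 * ((R + 2000)**3 - (R + 160)**3)
print(f"{leo:.2g} km^3")           # about 1.3e+12
print(f"{leo / 12_500:.2g} km^3")  # about 1e+08 per satellite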

Now this isn’t the last word in collision avoidance—there are lots of complications we’re not going to get into here—but it is the first word: there’s a lot of space in space.