Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we take products instead, how many distinct products are there?

Proven results

Erdős and Szemerédi proved that there are constants c and ε > 0 such that

max{|A+A|, |A*A|} ≥ c|A|1+ε

In other words, either A+A or A*A is substantially bigger than A. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |A+A| or |A*A| is bounded below by a fixed multiple of |A|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342899…

but the conjecture remains that the upper limit on ε is 1.

Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size N from the non-negative integers less than R, then computes the sum and product sets using set comprehensions.

    from numpy.random import choice

    def trial(R, N):
        # R = integer range, N = sample size
        x = choice(R, N, replace=False)
        s = {a+b for a in x for b in x}
        p = {a*b for a in x for b in x}
        return (len(s), len(p))

When I first tried this code I thought it had a bug. I called trial 10 times and got the same values for |A+A| and |A*A| every time. That was because I chose R large relative to N. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if R >> N, it is likely that |A+A| and |A*A| will both equal N(N+1)/2. Things get more interesting when N is closer to R.

Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about lower bounds on |A+A| and |A*A| you could ask for the distribution on |A+A| and |A*A| when the sets A are drawn from some probability distribution.

If the set A is drawn from a continuous distribution, then |A+A| and |A*A| both equal N(N+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case R >> N above.

If the set A is an arithmetic sequence then |A+A| is small and |A*A| is large, and the opposite holds if A is a geometric sequence. So it might be interesting to look at the correlation of |A+A| and |A*A| when A comes from a discrete distribution, such as choosing N integers uniformly from [1, R] when N/R is not too small.

RSA with one shared prime

The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product.

Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p and r, and qr. (This has happened [1].) Then you can easily find p. All three primes can be obtained in less than a millisecond as illustrated by the Python script below.

In a way it would be better to share both your primes with someone else than to share just one. If both your primes are the same as someone else’s, you could read each other’s messages, but a third party attacker could not. If you’re lucky, the person you share primes with doesn’t know that you share primes or isn’t interested in reading your mail. But if that person’s private key is ever disclosed, so is yours.

Python code

Nearly all the time required to run the script is devoted to finding the primes, so we time just the factorization.

    from secrets import randbits
    from sympy import nextprime, gcd
    from timeit import default_timer as timer
    
    numbits = 2048
    
    p = nextprime(randbits(numbits))
    q = nextprime(randbits(numbits))
    r = nextprime(randbits(numbits))                                      
    
    m = p*q
    n = p*r
    
    t0 = timer()
    g = gcd(m, n)
    assert(p == g)
    assert(q == m//p)
    assert(r == n//p)
    t1 = timer()
    
    # Print time in milliseconds
    print(1000*(t1 - t0))

Python notes

There are a few noteworthy things about the Python code.

First, it uses a cryptographic random number generator, not the default generator, to create 2048-bit random numbers.

Second, it uses the portable default_timer which chooses the appropriate timer on different operating systems. Note that it returns wall clock time, not CPU time.

Finally, it uses integer division // to recover q and r. If you use the customary division operator Python will carry out integer division and attempt to cast the result to a float, resulting in the error message “OverflowError: integer division result too large for a float.”

GCD vs factoring

If you wanted to find the greatest common divisor of two small numbers by hand, you’d probably start by factoring the two numbers. But as Euclid knew, you don’t have to factor two numbers before you can find the greatest common factor of the numbers. For large numbers, the Euclidean algorithm is orders of magnitude faster than factoring.

Nobody has announced being able to factor a 1024 bit number yet. The number m and n above have four times as many bits. The largest of the RSA numbers factored so far has 768 bits. This was done in 2009 and took approximately two CPU millennia, i.e. around 2000 CPU years.

Fast Euclidean algorithm

The classic Euclidean algorithm takes O(n²) operations to find the greatest common divisor of two integers of length n. But there are modern sub-quadratic variations on the Euclidean algorithm that take

O(n log²(n) log(log(n)))

operations, which is much faster for very large numbers. I believe SymPy is using the classic Euclidean algorithm, but it could transparently switch to using the fast Euclidean algorithm for large inputs.

Related posts

[1] As noted in the comments, this has happened due to faulty software implementation, but it shouldn’t happen by chance. By the prime number theorem, the number of n-bit primes is approximately 2n / (n log 2). If M people choose 2M primes each n bits long, the probability of two of these primes being the same is roughly

22-n M n log 2.

If M = 7.5 billion (the current world population) and n = 1024, the probability of two primes being the same is about 10-295. This roughly the probability of winning the Powerball jackpot 40 times in a row.

Searching for Mersenne primes

The nth Mersenne number is

Mn = 2n – 1.

A Mersenne prime is a Mersenne number which is also prime. So far 50 51 have been found [1].

A necessary condition for Mn to be prime is that n is prime, so searches for Mersenne numbers only test prime values of n. It’s not sufficient for n to be prime as you can see from the example

M11 = 2047 = 23 × 89.

Lucas-Lehmer test

The largest known prime has been a Mersenne prime since 1952, with one exception in 1989. This is because there is an efficient algorithm, the Lucas-Lehmer test, for determining whether a Mersenne number is prime. This is the algorithm used by GIMPS (Great Internet Mersenne Prime Search).

The Lucas-Lehmer test is very simple. The following Python code tests whether Mp is prime.

    def lucas_lehmer(p):
        M = 2**p - 1
        s = 4
        for _ in range(p-2):
            s = (s*s - 2) % M
        return s == 0

Using this code I was able to verify the first 25 Mersenne primes in under 50 seconds. This includes all Mersenne primes that were known as of 40 years ago.

History

Mersenne primes are named after the French monk Marin Mersenne (1588–1648) who compiled a list of Mersenne primes.

Édouard Lucas came up with his test for Mersenne primes in 1856 and in 1876 proved that M127 is prime. That is, he found a 39-digit prime number by hand. Derrick Lehmer refined the test in 1930.

As of January 2018, the largest known prime is M77,232,917.

Related posts

[1] We’ve found 51 Mersenne primes as of December 22, 2018, but we’re not sure whether we’ve found the first 51 Mersenne primes. We know we’ve found the 47 smallest Mersenne primes. It’s possible that there are other Mersenne primes between the 47th and the 50th known Mersenne primes, and there may be one hiding between the 50th and 51st.

Ellipsoid distance on Earth

globe

To first approximation, Earth is a sphere. But it bulges at the equator, and to second approximation, Earth is an oblate spheroid. Earth is not exactly an oblate spheroid either, but the error in the oblate spheroid model is about 100x smaller than the error in the spherical model.

Finding the distance between two points on a sphere is fairly simple. Here’s a calculator to compute the distance, and here’s a derivation of the formula used in the calculator.

Finding the distance between two points on an ellipsoid is much more complicated. (A spheroid is a kind of ellipsoid.) Wikipedia gives a description of Vincenty’s algorithm for finding the distance between two points on Earth using an oblate spheroid model (specifically WGS-84). I’ll include a Python implementation below.

Comparison with spherical distance

How much difference does it make when you calculate difference on an oblate spheroid rather than a sphere? To address that question I looked at the coordinates of several cities around the world using the CityData function in Mathematica. Latitude is in degrees north of the equator and longitude is in degrees east of the prime meridian.

    |--------------+--------+---------|
    | City         |    Lat |    Long |
    |--------------+--------+---------|
    | Houston      |  29.78 |  -95.39 |
    | Caracas      |  10.54 |  -66.93 |
    | London       |  51.50 |   -0.12 |
    | Tokyo        |  35.67 |  139.77 |
    | Delhi        |  28.67 |   77.21 |
    | Honolulu     |  21.31 | -157.83 |
    | Sao Paulo    | -23.53 |  -46.63 |
    | New York     |  40.66 |  -73.94 |
    | Los Angeles  |  34.02 | -118.41 |
    | Cape Town    | -33.93 |   18.46 |
    | Sydney       | -33.87 |  151.21 |
    | Tromsø       |  69.66 |   18.94 |
    | Singapore    |   1.30 |  103.85 |
    |--------------+--------+---------|

Here are the error extremes.

The spherical model underestimates the distance from London to Tokyo by 12.88 km, and it overestimates the distance from London to Cape Town by 45.40 km.

The relative error is most negative for London to New York (-0.157%) and most positive for Tokyo to Sidney (0.545%).

Update: The comparison above used the same radius for both the spherical and ellipsoidal models. As suggested in the comments, a better comparison would use the mean radius (average of equatorial and polar radii) in the spherical model rather than the equatorial radius.

With that change the most negative absolute error is between Tokyo and New York at -30,038 m. The most negative relative error is between London and New York at -0.324%.

The largest positive error, absolute and relative, is between Tokyo and Sydney. The absolute error is 29,289 m and the relative error is 0.376%.

Python implementation

The code below is a direct implementation of the equations in the Wikipedia article.

Note that longitude and latitude below are assumed to be in radians. You can convert from degrees to radians with SciPy’s deg2rad function.

from scipy import sin, cos, tan, arctan, arctan2, arccos, pi

def spherical_distance(lat1, long1, lat2, long2):
    phi1 = 0.5*pi - lat1
    phi2 = 0.5*pi - lat2
    r = 0.5*(6378137 + 6356752) # mean radius in meters
    t = sin(phi1)*sin(phi2)*cos(long1-long2) + cos(phi1)*cos(phi2)
    return r * arccos(t)

def ellipsoidal_distance(lat1, long1, lat2, long2):

    a = 6378137.0 # equatorial radius in meters 
    f = 1/298.257223563 # ellipsoid flattening 
    b = (1 - f)*a 
    tolerance = 1e-11 # to stop iteration

    phi1, phi2 = lat1, lat2
    U1 = arctan((1-f)*tan(phi1))
    U2 = arctan((1-f)*tan(phi2))
    L1, L2 = long1, long2
    L = L2 - L1

    lambda_old = L + 0

    while True:
    
        t = (cos(U2)*sin(lambda_old))**2
        t += (cos(U1)*sin(U2) - sin(U1)*cos(U2)*cos(lambda_old))**2
        sin_sigma = t**0.5
        cos_sigma = sin(U1)*sin(U2) + cos(U1)*cos(U2)*cos(lambda_old)
        sigma = arctan2(sin_sigma, cos_sigma) 
    
        sin_alpha = cos(U1)*cos(U2)*sin(lambda_old) / sin_sigma
        cos_sq_alpha = 1 - sin_alpha**2
        cos_2sigma_m = cos_sigma - 2*sin(U1)*sin(U2)/cos_sq_alpha
        C = f*cos_sq_alpha*(4 + f*(4-3*cos_sq_alpha))/16
    
        t = sigma + C*sin_sigma*(cos_2sigma_m + C*cos_sigma*(-1 + 2*cos_2sigma_m**2))
        lambda_new = L + (1 - C)*f*sin_alpha*t
        if abs(lambda_new - lambda_old) <= tolerance:
            break
        else:
            lambda_old = lambda_new

    u2 = cos_sq_alpha*((a**2 - b**2)/b**2)
    A = 1 + (u2/16384)*(4096 + u2*(-768+u2*(320 - 175*u2)))
    B = (u2/1024)*(256 + u2*(-128 + u2*(74 - 47*u2)))
    t = cos_2sigma_m + 0.25*B*(cos_sigma*(-1 + 2*cos_2sigma_m**2))
    t -= (B/6)*cos_2sigma_m*(-3 + 4*sin_sigma**2)*(-3 + 4*cos_2sigma_m**2)
    delta_sigma = B * sin_sigma * t
    s = b*A*(sigma - delta_sigma)

    return s

Physical constants in Python

You can find a large collection of physical constants in scipy.constants. The most frequently used constants are available directly, and hundreds more are in a dictionary physical_constants.

The fine structure constant α is defined as a function of other physical constants:

\alpha = \frac{e^2}{4 \pi \varepsilon_0 \hbar c}

The following code shows that the fine structure constant and the other constants that go into it are available in scipy.constants.

    import scipy.constants as sc

    a = sc.elementary_charge**2
    b = 4 * sc.pi * sc.epsilon_0 * sc.hbar * sc.c
    assert( abs(a/b - sc.fine_structure) < 1e-12 )

Eddington’s constant

In the 1930’s Arthur Eddington believed that the number of protons in the observable universe was exactly the Eddington number

N_{\mathrm{Edd}} = \frac{2^{256}}{\alpha}

Since at the time the fine structure constant was thought to be 1/136, this made the number of protons a nice even 136 × 2256.  Later he revised his number when it looked like the fine structure constant was 1/137. According to the Python code above, the current estimate is more like 1/137.036.

Eddington was a very accomplished scientist, though he had some ideas that seem odd today. His number is a not a bad estimate, though nobody believes it could be exact.

Related posts

The constants in scipy.constants have come up in a couple previous blog posts.

The post on Koide’s coincidence shows how to use the physical_constants dictionary, which includes not just the physical constant values but also their units and uncertainty.

The post on Benford’s law shows that the leading digits of the constants in scipy.constants follow the logarithmic distribution observed by Frank Benford (and earlier by Simon Newcomb).

How fast can you multiply matrices?

Suppose you want to multiply two 2 × 2 matrices together. How many multiplication operations does it take? Apparently 8, and yet in 1969 Volker Strassen discovered that he could do it with 7 multiplications.

Upper and lower bounds

The obvious way to multiply two n × n matrices takes n³ operations: each entry in the product is the inner product of a row from the first matrix and a column from the second matrix. That amounts to n² inner products, each requiring n multiplications.

You can multiply two square matrices with O(n³) operations with the method described above, and it must take at least O(n²) operations because the product depends on all of the 2n² entries of the two matrices. Strassen’s result suggests that the optimal algorithm for multiplying matrices takes O(nk) operations for some k between 2 and 3. By applying Strassen’s algorithm recursively to larger matrices you can get k = log2 7 = 2.807.

The best known value at the moment is k = 2.3728639.

Bounds on bounds

Yesterday the blog Gödel’s Lost Letter and P = NP posted an article Limits on Matrix Multiplication where they report on recent developments for finding the smallest value of k. A new paper doesn’t report a new value of k, but a limit on what current approaches to the problem can prove. Maybe k can equal 2, but there is a lower bound, strictly bigger than 2, on how small current approaches can go.

Is this practical?

When I first heard of Strassen’s method, I was told it’s a curious but impractical result. Strassen saved one multiplication at the expense of introducing several more addition operations.

According to the Wikipedia article on matrix multiplication, recursively applying Strassen’s method can save time for n > 100. But there’s more to consider than counting operations. Strassen’s method, and subsequent algorithms, are more complicated. They may not be more efficient in practice even if they use fewer operations because the operations may not vectorize well.

Wikipedia reports that Strassen’s algorithm is not as numerically stable as the traditional approach, but this doesn’t matter when working over finite fields where arithmetic is exact.

Strassen’s method

Let’s look at just what Strassen’s method does. We want to find the product of two matrices:

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22}\end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22}\end{pmatrix}

I started to spell out Strassen’s method in LaTeX equations, but I thought it would be much better to write it out in code so I can be sure that I didn’t make a mistake.

The following Python code randomly fills in the values of the a’s and b’s, computes the c’s using the conventional method, then asserts that you can find these values from the q’s computed from Strassen’s method. Note there is one multiplication in each of the seven q’s.

    from random import randint
    
    # Fill matrices with random integer values
    a11 = randint(0, 9)
    a12 = randint(0, 9)
    a21 = randint(0, 9)
    a22 = randint(0, 9)
    b11 = randint(0, 9)
    b12 = randint(0, 9)
    b21 = randint(0, 9)
    b22 = randint(0, 9)
    
    # Traditional matrix multiplication
    c11 = a11*b11 + a12*b21
    c12 = a11*b12 + a12*b22
    c21 = a21*b11 + a22*b21
    c22 = a21*b12 + a22*b22
    
    # Strassen's method
    q1 = (a11 + a22)*(b11 + b22)
    q2 = (a21 + a22)*b11
    q3 = a11*(b12 - b22)
    q4 = a22 * (-b11 + b21)
    q5 = (a11 + a12)*b22
    q6 = (-a11 + a21)*(b11 + b12)
    q7 = (a12 - a22)*(b21 + b22)
    
    assert(c11 == q1 + q4 - q5 + q7)
    assert(c21 == q2 + q4)
    assert(c12 == q3 + q5)
    assert(c22 == q1 + q3 - q2 + q6)

Since Strassen’s method takes more operations than the traditional method for multiplying 2 × 2 matrices, how can it take fewer operations than the traditional method for multiplying large matrices?

When you apply Strassen’s method to a matrix partitioned into submatrices, its multiplications become matrix multiplications, and its additions become matrix additions. These operations are O(n2.807) and O(n2) respectively, so saving multiplications at the cost of more additions is a win.

Related articles

Drawing Spirograph curves in Python

I was looking back over an old blog post and noticed some code in the comments that I had overlooked. Tom Pollard gives the following code for drawing Spirograph-like curves.

import matplotlib.pyplot as plt
from numpy import pi, exp, real, imag, linspace

def spiro(t, r1, r2, r3):
    """
    Create Spirograph curves made by one circle of radius r2 rolling 
    around the inside (or outside) of another of radius r1.  The pen
    is a distance r3 from the center of the first circle.
    """
    return r3*exp(1j*t*(r1+r2)/r2) + (r1+r2)*exp(1j*t)

def circle(t, r):
    return r * exp(1j*t)

r1 = 1.0
r2 = 52.0/96.0
r3 = 42.0/96.0

ncycle = 13 # LCM(r1,r2)/r2

t = linspace(0, ncycle*2*pi, 1000)
plt.plot(real(spiro(t, r1, r2, r3)), imag(spiro(t, r1, r2, r3)))
plt.plot(real(circle(t, r1)), imag(circle(t, r1)))
 
fig = plt.gcf()
fig.gca().set_aspect('equal')
plt.show()

Tom points out that with the r parameters above, this produces the default curve on Nathan Friend’s Inspirograph app.

Related: Exponential sum of the day

Distribution of carries

Suppose you add two long numbers together. As you work your way toward the result, adding digits from right to left, sometimes you carry a 1 and sometimes you don’t. How often do you carry 1?

Now suppose you add three numbers at a time. Now your carry might be 0, 1, or 2. How often does each appear? How do things change if you’re not working in base 10?

John Holte goes into this problem in detail in [1]. We’ll only look at his first result here.

Suppose we’re adding m numbers together in base b, with each digit being uniformly distributed. Then at each step of the addition process, the probability distribution of the amount we carry out only depends on the amount we carried in from the previous step. In other words, the carries form a Markov chain!

This means that we can do more than describe the distribution of the amounts carried. We can specify the transition probabilities for carries, i.e. the distribution on the amount carried out, conditional on the the amount carried in.

Examples

When we add two base ten numbers, m = 2 and b = 10, we have the transition matrix

\begin{pmatrix} 0.55 & 0.45 \\ 0.45 & 0.55 \end{pmatrix}

This means that on average, we’re as likely to carry 0 as to carry 1. But more specifically, there’s a 55% chance that we’ll carry a 1 if we carried a 1 on the previous step, and also a 55% chance that we won’t have a carry this time if we didn’t have one last time.

If we add three numbers, things get a little more interesting. Now the transition matrix is

\begin{pmatrix} 0.220 & 0.660 & 0.120 \\ 0.165 & 0.670 & 0.165 \\ 0.120 & 0.660 & 0.220 \end{pmatrix}

This gives the probabilities of each out carry for each in carry. We can compute the long run distribution from the matrix above to find that when adding three long numbers, we’ll carry 0 or 1 with probability 1/6 each, and carry 1 with probability 2/3.

When adding three binary numbers, the transition matrix is fairly different than for base 10

\begin{pmatrix} 0.500 & 0.500 & 0.000 \\ 0.125 & 0.750 & 0.125 \\ 0.000 & 0.500 & 0.500 \\ \end{pmatrix}

but the long run distributions are the same: carry a 1 two thirds of the time, and split the remaining probability equally between carrying 0 and 2.

The transition matrices don’t change much for larger bases; base 100 doesn’t look much different than base 100, for example.

Finally, let’s do an example adding five numbers in base 10. The transition matrix is

\begin{pmatrix} 0.020 & 0.305 & 0.533 & 0.140 & 0.003 \\ 0.013 & 0.259 & 0.548 & 0.176 & 0.005 \\ 0.008 & 0.216 & 0.553 & 0.216 & 0.008 \\ 0.005 & 0.176 & 0.548 & 0.260 & 0.013 \\ 0.003 & 0.140 & 0.533 & 0.305 & 0.020 \\ \end{pmatrix}

and the long run distribution is 1/60, 13/60, 33/60, 13/60, and 1/60 for carries of 0 through 4.

General case

In general, when adding m numbers in base b, Holte gives the transition probabilities as

\pi_{ij} = b^{-m} \sum_{r=0}^{j-\lfloor i/b\rfloor}(-1)^r {m+1\choose r}{m - i -1 + b(j-r+1) \choose m}

Here is some Python code to compute the transition matrices.

from scipy import floor, zeros, array
from scipy.special import comb
from numpy.linalg import matrix_power

# transition probability
def pi(i, j, b, m):
    s = 0
    for r in range(0, j - int(floor(i/b)) + 1):
        s += (-1)**r * comb(m+1, r) * comb(m-i-1 +b*(j-r+1), m)
    return s/b**m

def transition_matrix(b, m):
    P =  [ [pi(i, j, b, m) for j in range(m)]
            for i in range(m) ]
    return array(P)

You can find the long run probabilities by raising the transition matrix to a large power.

Related posts

[1] John M. Holte. The American Mathematical Monthly, Vol. 104, No. 2 (Feb., 1997), pp. 138-149

Circular law for random matrices

Suppose you create a large matrix M by filling its components with random values. If has size n by n, then we require the probability distribution for each entry to have mean 0 and variance 1/n. Then the Girko-Ginibri circular law says that the eigenvalues of M are approximately uniformly distributed in the unit disk in the complex plane. As the size n increases, the distribution converges to a uniform distribution on the unit disk.

The probability distribution need not be normal. It can be any distribution, shifted to have mean 0 and scaled to have variance 1/n, provided the tail of the distribution isn’t so thick that the variance doesn’t exist. If you don’t scale the variance to 1/n you still get a circle, just not a unit circle.

We’ll illustrate the circular law with a uniform distribution. The uniform distribution has mean 1/2 and variance 1/12, so we will subtract 1/2 and multiply each entry by √(12/n).

Here’s our Python code:

    import matplotlib.pyplot as plt
    import numpy as np
    
    n = 100
    M = np.random.random((n,n)) - 0.5
    M *= (12/n)**0.5
    w = np.linalg.eigvals(M)
    plt.scatter(np.real(w), np.imag(w))
    plt.axes().set_aspect(1)
    plt.show()

When n=100 we get the following plot.

eigenvalues of 100 by 100 matrix

When n=1000 we can see the disk filling in more.

eigenvalues of 1000 by 1000 matrix

Note that the points are symmetric about the real axis. All the entries of M are real, so its characteristic polynomial has all real coefficients, and so its roots come in conjugate pairs. If we randomly generated complex entries for M we would not have such symmetry.

Related post: Fat-tailed random matrices

Refactored code for division algebras

An earlier post included code for multiplying quaternions, octonions, and sedenions. The code was a little clunky, so I refactor it here.

    def conj(x):
        xstar = -x
        xstar[0] *= -1
        return xstar 

    def CayleyDickson(x, y):
        n = len(x)

        if n == 1:
            return x*y

        m = n // 2
    
        a, b = x[:m], x[m:]
        c, d = y[:m], y[m:]
        z = np.zeros(n)
        z[:m] = CayleyDickson(a, c) - CayleyDickson(conj(d), b)
        z[m:] = CayleyDickson(d, a) + CayleyDickson(b, conj(c))
        return z

The CayleyDickson function implements the Cayley-Dickson construction and can be used to multiply real, complex, quaternion, and octonion numbers. In fact, it can be used to implement multiplication in any real vector space of dimension 2n. The numeric types listed above correspond to n = 0, 1, 2, and 3. These are the only normed division algebras over the reals.

When n = 4 we get the sedenions, which are not a division algebra because they contain zero divisors, and the code can be used for any larger value of n as well. As noted before, the algebraic properties degrade as n increases, though I don’t think they get any worse after n = 4.

If you wanted to make the code more robust, you could add code to verify that the arguments x and y have the same length, and that their common length is a power of 2. (You could just check that the length is either 1 or even; if it’s not a power of 2 the recursion will eventually produce an odd argument.)

Related posts