Physical constants in Python

You can find a large collection of physical constants in scipy.constants. The most frequently used constants are available directly, and hundreds more are in a dictionary physical_constants.

The fine structure constant α is defined as a function of other physical constants:

\alpha = \frac{e^2}{4 \pi \varepsilon_0 \hbar c}

The following code shows that the fine structure constant and the other constants that go into it are available in scipy.constants.

    import scipy.constants as sc

    a = sc.elementary_charge**2
    b = 4 * sc.pi * sc.epsilon_0 * sc.hbar * sc.c
    assert( abs(a/b - sc.fine_structure) < 1e-12 )

Eddington’s constant

In the 1930’s Arthur Eddington believed that the number of protons in the observable universe was exactly the Eddington number

N_{\mathrm{Edd}} = \frac{2^{256}}{\alpha}

Since at the time the fine structure constant was thought to be 1/136, this made the number of protons a nice even 136 × 2256.  Later he revised his number when it looked like the fine structure constant was 1/137. According to the Python code above, the current estimate is more like 1/137.036.

Eddington was a very accomplished scientist, though he had some ideas that seem odd today. His number is a not a bad estimate, though nobody believes it could be exact.

Related posts

The constants in scipy.constants have come up in a couple previous blog posts.

The post on Koide’s coincidence shows how to use the physical_constants dictionary, which includes not just the physical constant values but also their units and uncertainty.

The post on Benford’s law shows that the leading digits of the constants in scipy.constants follow the logarithmic distribution observed by Frank Benford (and earlier by Simon Newcomb).

How fast can you multiply matrices?

Suppose you want to multiply two 2 × 2 matrices together. How many multiplication operations does it take? Apparently 8, and yet in 1969 Volker Strassen discovered that he could do it with 7 multiplications.

Upper and lower bounds

The obvious way to multiply two n × n matrices takes n³ operations: each entry in the product is the inner product of a row from the first matrix and a column from the second matrix. That amounts to n² inner products, each requiring n multiplications.

You can multiply two square matrices with O(n³) operations with the method described above, and it must take at least O(n²) operations because the product depends on all of the 2n² entries of the two matrices. Strassen’s result suggests that the optimal algorithm for multiplying matrices takes O(nk) operations for some k between 2 and 3. By applying Strassen’s algorithm recursively to larger matrices you can get k = log2 7 = 2.807.

The best known value at the moment is k = 2.3728639.

Bounds on bounds

Yesterday the blog Gödel’s Lost Letter and P = NP posted an article Limits on Matrix Multiplication where they report on recent developments for finding the smallest value of k. A new paper doesn’t report a new value of k, but a limit on what current approaches to the problem can prove. Maybe k can equal 2, but there is a lower bound, strictly bigger than 2, on how small current approaches can go.

Is this practical?

When I first heard of Strassen’s method, I was told it’s a curious but impractical result. Strassen saved one multiplication at the expense of introducing several more addition operations.

According to the Wikipedia article on matrix multiplication, recursively applying Strassen’s method can save time for n > 100. But there’s more to consider than counting operations. Strassen’s method, and subsequent algorithms, are more complicated. They may not be more efficient in practice even if they use fewer operations because the operations may not vectorize well.

Wikipedia reports that Strassen’s algorithm is not as numerically stable as the traditional approach, but this doesn’t matter when working over finite fields where arithmetic is exact.

Strassen’s method

Let’s look at just what Strassen’s method does. We want to find the product of two matrices:

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22}\end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22}\end{pmatrix}

I started to spell out Strassen’s method in LaTeX equations, but I thought it would be much better to write it out in code so I can be sure that I didn’t make a mistake.

The following Python code randomly fills in the values of the a’s and b’s, computes the c’s using the conventional method, then asserts that you can find these values from the q’s computed from Strassen’s method. Note there is one multiplication in each of the seven q’s.

    from random import randint
    
    # Fill matrices with random integer values
    a11 = randint(0, 9)
    a12 = randint(0, 9)
    a21 = randint(0, 9)
    a22 = randint(0, 9)
    b11 = randint(0, 9)
    b12 = randint(0, 9)
    b21 = randint(0, 9)
    b22 = randint(0, 9)
    
    # Traditional matrix multiplication
    c11 = a11*b11 + a12*b21
    c12 = a11*b12 + a12*b22
    c21 = a21*b11 + a22*b21
    c22 = a21*b12 + a22*b22
    
    # Strassen's method
    q1 = (a11 + a22)*(b11 + b22)
    q2 = (a21 + a22)*b11
    q3 = a11*(b12 - b22)
    q4 = a22 * (-b11 + b21)
    q5 = (a11 + a12)*b22
    q6 = (-a11 + a21)*(b11 + b12)
    q7 = (a12 - a22)*(b21 + b22)
    
    assert(c11 == q1 + q4 - q5 + q7)
    assert(c21 == q2 + q4)
    assert(c12 == q3 + q5)
    assert(c22 == q1 + q3 - q2 + q6)

Since Strassen’s method takes more operations than the traditional method for multiplying 2 × 2 matrices, how can it take fewer operations than the traditional method for multiplying large matrices?

When you apply Strassen’s method to a matrix partitioned into submatrices, its multiplications become matrix multiplications, and its additions become matrix additions. These operations are O(n2.807) and O(n2) respectively, so saving multiplications at the cost of more additions is a win.

Related articles

Drawing Spirograph curves in Python

I was looking back over an old blog post and noticed some code in the comments that I had overlooked. Tom Pollard gives the following code for drawing Spirograph-like curves.

import matplotlib.pyplot as plt
from numpy import pi, exp, real, imag, linspace

def spiro(t, r1, r2, r3):
    """
    Create Spirograph curves made by one circle of radius r2 rolling 
    around the inside (or outside) of another of radius r1.  The pen
    is a distance r3 from the center of the first circle.
    """
    return r3*exp(1j*t*(r1+r2)/r2) + (r1+r2)*exp(1j*t)

def circle(t, r):
    return r * exp(1j*t)

r1 = 1.0
r2 = 52.0/96.0
r3 = 42.0/96.0

ncycle = 13 # LCM(r1,r2)/r2

t = linspace(0, ncycle*2*pi, 1000)
plt.plot(real(spiro(t, r1, r2, r3)), imag(spiro(t, r1, r2, r3)))
plt.plot(real(circle(t, r1)), imag(circle(t, r1)))
 
fig = plt.gcf()
fig.gca().set_aspect('equal')
plt.show()

Tom points out that with the r parameters above, this produces the default curve on Nathan Friend’s Inspirograph app.

Related: Exponential sum of the day

Distribution of carries

Suppose you add two long numbers together. As you work your way toward the result, adding digits from right to left, sometimes you carry a 1 and sometimes you don’t. How often do you carry 1?

Now suppose you add three numbers at a time. Now your carry might be 0, 1, or 2. How often does each appear? How do things change if you’re not working in base 10?

John Holte goes into this problem in detail in [1]. We’ll only look at his first result here.

Suppose we’re adding m numbers together in base b, with each digit being uniformly distributed. Then at each step of the addition process, the probability distribution of the amount we carry out only depends on the amount we carried in from the previous step. In other words, the carries form a Markov chain!

This means that we can do more than describe the distribution of the amounts carried. We can specify the transition probabilities for carries, i.e. the distribution on the amount carried out, conditional on the amount carried in.

Examples

When we add two base ten numbers, m = 2 and b = 10, we have the transition matrix

\begin{pmatrix} 0.55 & 0.45 \\ 0.45 & 0.55 \end{pmatrix}

This means that on average, we’re as likely to carry 0 as to carry 1. But more specifically, there’s a 55% chance that we’ll carry a 1 if we carried a 1 on the previous step, and also a 55% chance that we won’t have a carry this time if we didn’t have one last time.

If we add three numbers, things get a little more interesting. Now the transition matrix is

\begin{pmatrix} 0.220 & 0.660 & 0.120 \\ 0.165 & 0.670 & 0.165 \\ 0.120 & 0.660 & 0.220 \end{pmatrix}

This gives the probabilities of each out carry for each in carry. We can compute the long run distribution from the matrix above to find that when adding three long numbers, we’ll carry 0 or 1 with probability 1/6 each, and carry 1 with probability 2/3.

When adding three binary numbers, the transition matrix is fairly different than for base 10

\begin{pmatrix} 0.500 & 0.500 & 0.000 \\ 0.125 & 0.750 & 0.125 \\ 0.000 & 0.500 & 0.500 \\ \end{pmatrix}

but the long run distributions are the same: carry a 1 two thirds of the time, and split the remaining probability equally between carrying 0 and 2.

The transition matrices don’t change much for larger bases; base 100 doesn’t look much different than base 10, for example.

Finally, let’s do an example adding five numbers in base 10. The transition matrix is

\begin{pmatrix} 0.020 & 0.305 & 0.533 & 0.140 & 0.003 \\ 0.013 & 0.259 & 0.548 & 0.176 & 0.005 \\ 0.008 & 0.216 & 0.553 & 0.216 & 0.008 \\ 0.005 & 0.176 & 0.548 & 0.260 & 0.013 \\ 0.003 & 0.140 & 0.533 & 0.305 & 0.020 \\ \end{pmatrix}

and the long run distribution is 1/60, 13/60, 33/60, 13/60, and 1/60 for carries of 0 through 4.

General case

In general, when adding m numbers in base b, Holte gives the transition probabilities as

\pi_{ij} = b^{-m} \sum_{r=0}^{j-\lfloor i/b\rfloor}(-1)^r {m+1\choose r}{m - i -1 + b(j-r+1) \choose m}

Here is some Python code to compute the transition matrices.

from scipy import floor, zeros, array
from scipy.special import comb
from numpy.linalg import matrix_power

# transition probability
def pi(i, j, b, m):
    s = 0
    for r in range(0, j - int(floor(i/b)) + 1):
        s += (-1)**r * comb(m+1, r) * comb(m-i-1 +b*(j-r+1), m)
    return s/b**m

def transition_matrix(b, m):
    P =  [ [pi(i, j, b, m) for j in range(m)]
            for i in range(m) ]
    return array(P)

You can find the long run probabilities by raising the transition matrix to a large power.

Related posts

[1] John M. Holte. The American Mathematical Monthly, Vol. 104, No. 2 (Feb., 1997), pp. 138-149

Circular law for random matrices

Suppose you create a large matrix M by filling its components with random values. If has size n by n, then we require the probability distribution for each entry to have mean 0 and variance 1/n. Then the Girko-Ginibri circular law says that the eigenvalues of M are approximately uniformly distributed in the unit disk in the complex plane. As the size n increases, the distribution converges to a uniform distribution on the unit disk.

The probability distribution need not be normal. It can be any distribution, shifted to have mean 0 and scaled to have variance 1/n, provided the tail of the distribution isn’t so thick that the variance doesn’t exist. If you don’t scale the variance to 1/n you still get a circle, just not a unit circle.

We’ll illustrate the circular law with a uniform distribution. The uniform distribution has mean 1/2 and variance 1/12, so we will subtract 1/2 and multiply each entry by √(12/n).

Here’s our Python code:

    import matplotlib.pyplot as plt
    import numpy as np
    
    n = 100
    M = np.random.random((n,n)) - 0.5
    M *= (12/n)**0.5
    w = np.linalg.eigvals(M)
    plt.scatter(np.real(w), np.imag(w))
    plt.axes().set_aspect(1)
    plt.show()

When n=100 we get the following plot.

eigenvalues of 100 by 100 matrix

When n=1000 we can see the disk filling in more.

eigenvalues of 1000 by 1000 matrix

Note that the points are symmetric about the real axis. All the entries of M are real, so its characteristic polynomial has all real coefficients, and so its roots come in conjugate pairs. If we randomly generated complex entries for M we would not have such symmetry.

Related post: Fat-tailed random matrices

Refactored code for division algebras

An earlier post included code for multiplying quaternions, octonions, and sedenions. The code was a little clunky, so I refactor it here.

    def conj(x):
        xstar = -x
        xstar[0] *= -1
        return xstar 

    def CayleyDickson(x, y):
        n = len(x)

        if n == 1:
            return x*y

        m = n // 2
    
        a, b = x[:m], x[m:]
        c, d = y[:m], y[m:]
        z = np.zeros(n)
        z[:m] = CayleyDickson(a, c) - CayleyDickson(conj(d), b)
        z[m:] = CayleyDickson(d, a) + CayleyDickson(b, conj(c))
        return z

The CayleyDickson function implements the Cayley-Dickson construction and can be used to multiply real, complex, quaternion, and octonion numbers. In fact, it can be used to implement multiplication in any real vector space of dimension 2n. The numeric types listed above correspond to n = 0, 1, 2, and 3. These are the only normed division algebras over the reals.

When n = 4 we get the sedenions, which are not a division algebra because they contain zero divisors, and the code can be used for any larger value of n as well. As noted before, the algebraic properties degrade as n increases, though I don’t think they get any worse after n = 4.

If you wanted to make the code more robust, you could add code to verify that the arguments x and y have the same length, and that their common length is a power of 2. (You could just check that the length is either 1 or even; if it’s not a power of 2 the recursion will eventually produce an odd argument.)

Related posts

Python code for octonion and sedenion multiplication

The previous post discussed octonions. This post will implement octonion multiplication in Python, and then sedenion multiplication.

Cayley-Dickson construction

There’s a way to bootstrap quaternion multiplication into octonion multiplication, so we’ll reuse the quaternion multiplication code from an earlier post. It’s called the Cayley-Dickson construction. For more on this construction , see John Baez’s treatise on octonions.

If you represent an octonion as a pair of quaternions, then multiplication can be defined by

(a, b) (c, d) = (acd*b, da + bc*)

where a star superscript on a variable means (quaternion) conjugate.

Note that this isn’t the only way to define multiplication of octonions. There are many (480?) isomorphic ways to define octonion multiplication.

(You can take the Cayley-Dickson construction one step further, creating sedenions as pairs of octonions. We’ll also provide code for sedenion multiplication below.)

Code for octonion multiplication

    import numpy as np

    # quaternion multiplication
    def qmult(x, y):
        return np.array([
            x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]
       ])

    # quaternion conjugate
    def qstar(x):
        return x*np.array([1, -1, -1, -1])

    # octonion multiplication
    def omult(x, y):
        # Split octonions into pairs of quaternions
        a, b = x[:4], x[4:]
        c, d = y[:4], y[4:]
    
        z = np.zeros(8)
        z[:4] = qmult(a, c) - qmult(qstar(d), b)
        z[4:] = qmult(d, a) + qmult(b, qstar(c))
        return z

Update: See this post for refactored code. Handles quaternions, octonions, sedenions, etc. all with one function.

Unit tests

Here are some unit tests to verify that our multiplication has the expected properties. We begin by generating three octonions with norm 1.

    from numpy.linalg import norm

    def random_unit_octonion():
        x = np.random.normal(size=8)
        return x / norm(x)

    x = random_unit_octonion()
    y = random_unit_octonion()
    z = random_unit_octonion()

We said in the previous post that octonions satisfy the “alternative” condition, a weak form of associativity. Here we verify this property as a unit test.

    eps = 1e-15

    # alternative identities

    a = omult(omult(x, x), y)
    b = omult(x, omult(x, y))
    assert( norm(a - b) < eps )

    a = omult(omult(x, y), y)
    b = omult(x, omult(y, y))
    assert( norm(a - b) < eps )

We also said in the previous post that the octonions satisfy the “Moufang” conditions.

    # Moufang identities
    
    a = omult(z, omult(x, omult(z, y)))
    b = omult(omult(omult(z, x), z), y)
    assert( norm(a - b) < eps )

    a = omult(x, omult(z, omult(y, z)))
    b = omult(omult(omult(x, z), y), z)
    assert( norm(a - b) < eps )

    a = omult(omult(z,x), omult(y, z))
    b = omult(omult(z, omult(x, y)), z)
    assert( norm(a  - b) < eps )

    a = omult(omult(z,x), omult(y, z))
    b = omult(z, omult(omult(x, y), z))
    assert( norm(a - b) < eps )

And finally, we verify that the product of unit length octonions has unit length.

    # norm condition
    n = norm(omult(omult(x, y), z))
    assert( abs(n - 1) < eps )

The next post uses the octionion multiplication code above to look at the distribution of the associator (xy)z – x(yz) to see how far multiplication is from being associative.

Sedenion multiplication

The only normed division algebras over the reals are the real numbers, complex numbers, quaternions, and octonions. These are real vector spaces of dimension 1, 2, 4, and 8.

You can proceed analogously and define a real algebra of dimension 16 called the sedenions. When we go from complex numbers to quaternions we lose commutativity. When we go from quaternions to octonions we lose full associativity, but retain a weak version of associativity. Even weak associativity is lost when we move from octonions to sedenions.  Non-zero octonions form an alternative loop with respect to multiplication, but sedenions do not.

Sedenions have multiplicative inverses, but there are also some zero divisors, i.e. non-zero vectors whose product is zero. So senenions do not form a division algebra. If you continue the Cayley-Dickson construction past the sedenions, you keep getting zero divisors, so no more division algebras.

Here’s Python code for sedenion multiplication, building on the code above.

    def ostar(x):
        mask = -np.ones(8)
        mask[0] = 1
        return x*mask
 
    # sedenion multiplication
    def smult(x, y):
        # Split sedenions into pairs of octonions
        a, b = x[:8], x[8:]
        c, d = y[:8], y[8:]
        z = np.zeros(16)
        z[:8] = omult(a, c) - omult(ostar(d), b)
        z[8:] = omult(d, a) + omult(b, ostar(c))
        return z

As noted above, see this post for more concise code that also generalizes further.

Computing extreme normal tail probabilities

Let me say up front that relying on the normal distribution as an accurate model of extreme events is foolish under most circumstances. The main reason to calculate the probability of, say, a 40 sigma event is to show how absurd it is to talk about 40 sigma events. See my previous post on six-sigma events for an explanation.

For this post I’ll be looking at two-tailed events, the probability that a normal random variable is either less than –kσ or greater than kσ. If you’re only interested in one of these two probabilities, divide by 2. Also, since the results are independent of σ, let’s assume σ = 1 for convenience.

The following Python code will print a table of k-sigma event probabilities.

    from scipy.stats import norm

    for k in range(1, 40):
        print(k, 2*norm.cdf(-k))

This shows, for example, that a “25 sigma event,” something I’ve heard people talk about with a straight face, has a probability of 6 × 10-138.

The code above reports that a 38 or 39 sigma event has probability 0, exactly 0. That’s because the actual value, while not zero, is so close to zero that floating point precision can’t tell the difference. This happens around 10-308.

What if, despite all the signs warning hic sunt dracones you want to compute even smaller probabilities? Then for one thing you’ll need to switch over to log scale in order for the results to be representable as floating point numbers.

Exactly computing these extreme probabilities is challenging, but there are convenient upper and lower bounds on the probabilities that can be derived from Abramowitz and Stegun, equation 7.1.13 with a change of variables. See notes here.

We can use these bounds to compute upper and lower bounds on the base 10 logs of the tail probabilities.

    from scipy import log, sqrt, pi

    def core(t, c):
        x = 2*sqrt(2/pi)/(t + sqrt(t**2 + c))
        ln_p = -0.5*t**2 + log(x)
        return ln_p/log(10)

    def log10_upper(t):
        return core(t, 8/pi)

    def log10_lower(t):
        return core(t, 4)

This tells us that the log base 10 of the probability of a normal random variable being more than 38 standard deviations away from its mean is between -315.53968 and -315.53979. The upper and lower bounds agree to about seven significant figures, and the accuracy only improves as k gets larger. So for large arguments, we can use either the upper or lower bound as an accurate approximation.

The code above was used to compute this table of tail probabilities for k = 1 to 100 standard deviations.

Computing SVD and pseudoinverse

In a nutshell, given the singular decomposition of a matrix A,

A = U \Sigma V^*

the Moore-Penrose pseudoinverse is given by

A^+ = V \Sigma^+ U^*.

This post will explain what the terms above mean, and how to compute them in Python and in Mathematica.

Singular Value Decomposition (SVD)

The singular value decomposition of a matrix is a sort of change of coordinates that makes the matrix simple, a generalization of diagonalization.

Matrix diagonalization

If a square matrix A is diagonalizable, then there is a matrix P such that

A = P D P^{-1}

where the matrix D is diagonal. You could think of P as a change of coordinates that makes the action of A as simple as possible. The elements on the diagonal of D are the eigenvalues of A and the columns of P are the corresponding eigenvectors.

Unfortunately not all matrices can be diagonalized. Singular value decomposition is a way to do something like diagonalization for any matrix, even non-square matrices.

Generalization to SVD

Singular value decomposition generalizes diagonalization. The matrix Σ in SVD is analogous to D in diagonalization. Σ is diagonal, though it may not be square. The matrices on either side of Σ are analogous to the matrix P in diagonalization, though now there are two different matrices, and they are not necessarily inverses of each other. The matrices U and V are square, but not necessarily of the same dimension.

The elements along the diagonal of Σ are not necessarily eigenvalues but singular values, which are a generalization of eigenvalues. Similarly the columns of U and V are not necessarily eigenvectors but left singular vectors and right singular vectors respectively.

The star superscript indicates conjugate transpose. If a matrix has all real components, then the conjugate transpose is just the transpose. But if the matrix has complex entries, you take the conjugate and transpose each entry.

The matrices U and V are unitary. A matrix M is unitary if its inverse is its conjugate transpose, i.e. M* M = MM* = I.

Pseudoinverse and SVD

The (Moore-Penrose) pseudoinverse of a matrix generalizes the notion of an inverse, somewhat like the way SVD generalized diagonalization. Not every matrix has an inverse, but every matrix has a pseudoinverse, even non-square matrices.

Computing the pseudoinverse from the SVD is simple.

If

A = U \Sigma V^*

then

A^+ = V \Sigma^+ U^*

where Σ+ is formed from Σ by taking the reciprocal of all the non-zero elements, leaving all the zeros alone, and making the matrix the right shape: if Σ is an m by n matrix, then Σ+ must be an n by m matrix.

We’ll give examples below in Mathematica and Python.

Computing SVD in Mathematica

Let’s start with the matrix A below.

A = \begin{bmatrix} 2 & -1 & 0 \\ 4 & 3 & -2 \end{bmatrix}

We can find the SVD of A with the following Mathematica commands.

    A = {{2, -1, 0}, {4, 3, -2}}
    {U, S, V} = SingularValueDecomposition[A]

From this we learn that the singular value decomposition of A is

\begin{bmatrix} \frac{1}{\sqrt{26}} & -\frac{5}{\sqrt{26}} \\ \frac{5}{\sqrt{26}} & \frac{1}{\sqrt{26}} \\ \end{bmatrix} \begin{bmatrix} \sqrt{30} & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix} \begin{bmatrix} \frac{11}{\sqrt{195}} & \frac{7}{\sqrt{195}} & -\sqrt{\frac{5}{39}} \\ -\frac{3}{\sqrt{26}} & 2 \sqrt{\frac{2}{13}} & -\frac{1}{\sqrt{26}} \\ \frac{1}{\sqrt{30}} & \sqrt{\frac{2}{15}} & \sqrt{\frac{5}{6}} \end{bmatrix}

Note that the last matrix is not V but the transpose of V. Mathematica returns V itself, not its transpose.

If we multiply the matrices back together we can verify that we get A back.

    U . S. Transpose[V]

This returns

    {{2, -1, 0}, {4, 3, -2}}

as we’d expect.

Computing pseudoinverse in Mathematica

The Mathematica command for computing the pseudoinverse is simply PseudoInverse. (The best thing about Mathematica is it’s consistent, predictable naming.)

    PseudoInverse[A]

This returns

    {{19/60, 1/12}, {-(11/30), 1/6}, {1/12, -(1/12)}}

And we can confirm that computing the pseudoinverse via the SVD

    Sp = {{1/Sqrt[30], 0}, {0, 1/2}, {0, 0}}
    V . Sp . Transpose[U]

gives the same result.

Computing SVD in Python

Next we compute the singular value decomposition in Python (NumPy).

    >>> a = np.matrix([[2, -1, 0],[4,3,-2]])
    >>> u, s, vt = np.linalg.svd(a, full_matrices=True)

Note that np.linalg.svd returns the transpose of V, not the V in the definition of singular value decomposition.

Also, the object s is not the diagonal matrix Σ but a vector containing only the diagonal elements, i.e. just the singular values. This can save a lot of space if the matrix is large. The NumPy method svd has other efficiency-related options that I won’t go into here.

We can verify that the SVD is correct by turning s back into a matrix and multiply the components together.

    >>> ss = np.matrix([[s[0], 0, 0], [0, s[1], 0]])
    >>> u*ss*vt

This returns the matrix A, within floating point accuracy. Since Python is doing floating point computations, not symbolic calculation like Mathematica, the zero in A turns into -3.8e-16.

Note that the singular value decompositions as computed by Mathematica and Python differ in a few signs here and there; the SVD is not unique.

Computing pseudoinverse in Python

The pseudoinverse can be computed in NumPy with np.linalg.pinv.

    >>> np.linalg.pinv(a)
    matrix([[ 0.31666667,  0.08333333],
            [-0.36666667,  0.16666667],
            [ 0.08333333, -0.08333333]])

This returns the same result as Mathematica above, up to floating point precision.

More linear algebra posts

Probability of coprime sets

The latest blog post from Gödel’s Lost Letter and P=NP looks at the problem of finding relatively prime pairs of large numbers. In particular, they want a deterministic algorithm. They mention in passing that the probability of a set of k large integers being relatively prime (coprime) is 1/ζ(k) where ζ is the Riemann zeta function. This blog post will look closer at that statement.

How large is large?

What does it mean that the numbers are large? That they are large enough that the asymptotic result given by the zeta function is accurate enough. We’ll explore this with simulation code shortly.

The exact statement is that in the limit as N goes to infinity, the probability of k integers chosen uniformly from 1 to N being coprime converges to 1/ζ(k). I won’t go into the proof here, but in a nutshell, it falls out of Euler’s product formula for the zeta function.

What does it mean for a set of numbers to be coprime?

The probability above seemed wrong to me at first. The function ζ(k) decreases as k increases, and so its reciprocal increases. That means it’s more likely that a set of numbers is coprime as the size of the set increases. But the larger the set of numbers, the more likely it is that two will have a common factor, so shouldn’t the probability that they’re coprime go down with k, not up?

The resolution to the apparent contradiction is that I had the wrong idea of coprime in mind. Two integers are coprime if they have no prime factors in common. How do you generalize that to more than two integers? You could define a set of numbers to be coprime if every pair of numbers in the set is coprime. That’s the definition I had in mind. But that’s not the standard definition. The right definition here is that the greatest common divisor of the set is 1. For example, (6, 14, 21) would be coprime because no single prime divides all three numbers, even though no pair of numbers from the set is coprime.

Python simulation

Let’s write some Python code to try this out. We’ll randomly generate some sets of large numbers and see how often they’re coprime.

How do you generate large random integers in Python? You can use the function getrandbits to generate numbers as large as you like. In the code below we’ll generate 128-bit integers.

How do you compute the greatest common divisor in Python? There’s a library function gcd, but it only takes two integers. We’ll use the reduce function to fold gcd over a list of integers.

How do you compute the zeta function in Python? SciPy doesn’t have an implementation of ζ(x) per se, but it does have an implementation of ζ(x)-1. Why not just implement ζ itself? Because for large x, ζ(x) is close to 1, so more precision can be saved by computing ζ – 1.

Putting these pieces together, here’s code to randomly generate triples of 128-bit integers, see how often they’re coprime, and compare the result to 1/ζ(3).

    from random import getrandbits
    from fractions import gcd
    from functools import reduce
    from scipy.special import zetac
    
    def riemann_zeta(x):
        return zetac(x) + 1
    
    k = 3
    size = 128
    N = 10000
    
    coprime_count = 0
    for _ in range(N):
        x = [getrandbits(size) for _ in range(k)]
        if reduce(gcd, x) == 1:
            coprime_count += 1
    print(coprime_count/N)
    print(1/riemann_zeta(k))

When I ran this I got

    0.8363
    0.831907372581

Simulation agrees with theory to two decimal places, which is just what we should expect from 10,000 repetitions. (We’d expect error on the order of 1/√N.)

Here’s what I got when I changed k to 4:

    0.9234
    0.923938402922

And here’s what I got for k set to 5:

    0.9632
    0.964387340429

Related posts