Distribution of coordinates on a sphere

Almost Sure posted an interesting fact on X:

If a point (x, y, z) is chosen at random uniformly on the unit sphere, then x, y, and z each have the uniform distribution on [−1, 1] and zero correlations (but not independent!) This follows from Archimedes’ “On the Sphere and Cylinder” published in 225BC.

Archimedes effectively showed that projecting a sphere to a cylinder wrapping round its equator preserves area, so projection of (x,y,z) is uniform on the cylinder. Used in Lambert cylindrical equal area projection.

In this post I want to empirically demonstrate the distribution of the coordinates, their lack of linear correlation, and their dependence.

The usual way to generate random points on a sphere is to generate three samples from a standard normal distribution and normalize by dividing by the Euclidean norm of the three points. So to generate a list of N points on a sphere, I generate an N × 3 matrix of normal samples and divide each row by its norm.

import numpy as np

N = 10000
# Generate N points on a sphere
M = norm.rvs(size=(N, 3))
M /= np.linalg.norm(M, axis=1)[:, np.newaxis]

Next, I find the correlation of each column with the others.

x = M[:, 0]
y = M[:, 1]
z = M[:, 2]
cxy, _ = pearsonr(x, y)
cyz, _ = pearsonr(y, z)
cxz, _ = pearsonr(x, z)
print(cxy, cyz, cxz)

When I ran this I got correlations of -0.0023, 0.0047, and -0.0177. For reasons explained in the previous post, I’d expect values in the interval [−0.02, 0.02] if the columns were independent, so the correlations are consistent with that hypothesis.

To demonstrate that each column is uniformly distributed on [−1, 1] you could look at histograms.

import matplotlib.pyplot as plt

plt.hist(x, bins=int(N**0.5))
plt.show()
plt.hist(y, bins=int(N**0.5))
plt.show()
plt.hist(z, bins=int(N**0.5))
plt.show()

I’ll spare you the plots, but they look like what you’d expect. If you wanted to do something more analytical than visual, you could compute something like the Kolmogorov-Smirnov statistic.

The coordinates are not independent, though they have zero linear correlation. Maybe a test for nonlinear dependence will work. I tried Spearman’s rank correlation and Kendall’s tau, and neither detected the dependence. But distance correlation did.

from dcor import distance_correlation

cxy = distance_correlation(x, y)
cyz = distance_correlation(y, z)
cxz = distance_correlation(x, z)
print(cxy, cyz, cxz)

Each of the distance correlations was approximately 0.11, substantially greater than zero, implying dependence between the coordinates.

Related posts