Last night after dinner, the conversation turned to high-dimensional geometry. (I realize how odd that sentence sounds; I was with some unusual company.)

Someone brought up the fact that two randomly chosen vectors in a high-dimensional space are very likely to be nearly orthogonal. This is a surprising but well known fact. Next the conversation turned to something also surprising but not so well known.

Suppose you’re in a 20,000 dimensional space and you choose 10,000 vectors at random. Now you choose one more, and ask what angle the new vector makes with the space spanned by the 10,000 previous vectors. The new vector is very likely to have a very small component in the direction of each of the previous vectors individually, but what about their span?

There are several ways to approach this problem. You can find the exact distribution on the angle analytically, but I thought it might be more interesting to demonstrate this by actually drawing random vectors. I started to write a simulation and discovered that the Python installation on my laptop is corrupted, and so I decided to take a third approach. I’ll give an informal explanation of what’s going on. Those who wish could either fill in the details to write a proof [1] or code it up to write a simulation.

First of all, what does it mean to pick a vector randomly from a vector space? We say we want to choose vectors uniformly, but that isn’t actually possible. What we actually mean is we want to pick vectors so that their *directions* are uniform. That’s possible, and we only need to work with unit vectors anyway.

The way to generate uniform random samples of unit vectors in *n* dimensions is to first choose a standard normal random value for each coordinate, then normalize so the vector has length 1.

Suppose we do as we discussed above. We work in 20,000 dimensions and choose 10,000 random vectors. Call their span *S*. Then we choose another random vector *v*. What angle does *v* make with *S*? You can’t simply project *v* onto each of the basis vectors for *S* because even though they are *nearly* orthogonal, they’re not *exactly* orthogonal.

It turns out that it doesn’t matter what the vectors are that defined *S*. All that matters is that *S* is almost certainly a 10,000 dimensional subspace. By symmetry it doesn’t matter *which* subspace it is, so we’ll assume for convenience that it’s the subspace spanned by the first 10,000 unit vectors. The angle that *v* makes with *S* is the angle between *v* and its projection *w* onto *S*, and *w* is simply the vector you get by zeroing out the last 10,000 components of *v*.

The cosine of the angle between two unit vectors is their dot product. Our vector *v* is a unit vector, but its projection *w* is not. So the cosine of the angle is

cos θ = *v*·*w*/||*w*||

Now *v*·*w* is the sum of the squares of the first half of *v*‘s components. This is probably very close to 1/2 since *v* is a unit vector. This is also *w*·*w*, and so ||*w*|| is probably very close to the square root of 1/2. And so cos θ is very nearly √2/2, which means θ = 45°.

So in summary, if you generate 10,000 vectors in 20,000 dimensions, then find the angle of a new random vector with the span of the previous vectors, it’s very likely to be very near 45°.

## Related posts

[1] If you’re in an *n*-dimensional space and *S* has dimension *m* < *n*, then cos² θ has the distribution of *X*/(*X* + *Y*) where *X* ~ χ²(*m*) and *Y* ~ χ²(*n*–*m*), which has distribution beta(*m*/2, (*n*–*m*)/2). If *m* = *n*/2 and *m* is large, this is a beta distribution concentrated tightly around 1/2, and so cos θ is concentrated tightly around √2/2. More generally, if *m* and *n* are large, cos² θ is concentrated around *m*/*n*.