Last night after dinner, the conversation turned to high-dimensional geometry. (I realize how odd that sentence sounds; I was with some unusual company.)
Someone brought up the fact that two randomly chosen vectors in a high-dimensional space are very likely to be nearly orthogonal. This is a surprising but well known fact. Next the conversation turned to something also surprising but not so well known.
Suppose you’re in a 20,000 dimensional space and you choose 10,000 vectors at random. Now you choose one more, and ask what angle the new vector makes with the space spanned by the 10,000 previous vectors. The new vector is very likely to have a very small component in the direction of each of the previous vectors individually, but what about their span?
There are several ways to approach this problem. You can find the exact distribution on the angle analytically, but I thought it might be more interesting to demonstrate this by actually drawing random vectors. I started to write a simulation and discovered that the Python installation on my laptop is corrupted, and so I decided to take a third approach. I’ll give an informal explanation of what’s going on. Those who wish could either fill in the details to write a proof  or code it up to write a simulation.
First of all, what does it mean to pick a vector randomly from a vector space? We say we want to choose vectors uniformly, but that isn’t actually possible. What we actually mean is we want to pick vectors so that their directions are uniform. That’s possible, and we only need to work with unit vectors anyway.
The way to generate uniform random samples of unit vectors in n dimensions is to first choose a standard normal random value for each coordinate, then normalize so the vector has length 1.
Suppose we do as we discussed above. We work in 20,000 dimensions and choose 10,000 random vectors. Call their span S. Then we choose another random vector v. What angle does v make with S? You can’t simply project v onto each of the basis vectors for S because even though they are nearly orthogonal, they’re not exactly orthogonal.
It turns out that it doesn’t matter what the vectors are that defined S. All that matters is that S is almost certainly a 10,000 dimensional subspace. By symmetry it doesn’t matter which subspace it is, so we’ll assume for convenience that it’s the subspace spanned by the first 10,000 unit vectors. The angle that v makes with S is the angle between v and its projection w onto S, and w is simply the vector you get by zeroing out the last 10,000 components of v.
The cosine of the angle between two unit vectors is their dot product. Our vector v is a unit vector, but its projection w is not. So the cosine of the angle is
cos θ = v·w/||w||
Now v·w is the sum of the squares of the first half of v‘s components. This is probably very close to 1/2 since v is a unit vector. This is also w·w, and so ||w|| is probably very close to the square root of 1/2. And so cos θ is very nearly √2/2, which means θ = 45°.
So in summary, if you generate 10,000 vectors in 20,000 dimensions, then find the angle of a new random vector with the span of the previous vectors, it’s very likely to be very near 45°.
More on high-dimensional spheres
 If you’re in an n-dimensional space and S has dimension m < n, then cos² θ has the distribution of X/(X + Y) where X ~ χ²(m) and Y ~ χ²(n–m), which has distribution beta(m/2, (n–m)/2). If m = n/2 and m is large, this is a beta distribution concentrated tightly around 1/2, and so cos θ is concentrated tightly around √2/2. More generally, if m and n are large, cos² θ is concentrated around m/n.
6 thoughts on “Random projection”
This reminds me of this exploration of picking random points in a sphere:
Not exactly the same thing, but it does suggest there is some subtlety in picking random vectors.
Thanks for sharing! Interesting facts. I’ve written some R code for the simulations, available in blog post form here: https://statisticaloddsandends.wordpress.com/2019/04/16/two-interesting-facts-about-high-dimensional-random-projections/
I think there’s a slight typo in the second last para? “This is also w·w” should be “||w||^2”?
Assuming it was possible to have a space of unknown (but high) dimensionality and it was possible to generate random unit vectors in that space: Does that mean that if you start with a space of unknown dimensionality, generate k random vectors in that space, and calculate the angle of the (k + 1)th random vector to the span of the preceding k vectors you could estimate the dimensionality of the space?
Let V equal the square of the cosine of the angle. Then V is distributed Beta(10000, 10000). The variance of V is small (=1/40004), so it is not unreasonable to say that with high probability the angle is 45 degrees. For those who like derivations and general results, see Section 5.2.2, eq. (25) of the remarkable book, An Introduction to Multivariate Statistical Analysis, by T.W. Anderson, 2nd Edition, Wiley, New York, 1984.
I’m having trouble with the statement “very likely to be very near 45°.” Why is it likely to be “very near”? Haven’t you just calculated an expected value kind, which gives no notion of its likliehood? Eg, suppose the distribution were uniform in theta, then you’d also expect 45° even though it would NOT likely for it to be “very near” that value.
Jared: See the footnote at the bottom of the post for the exact distribution of the angle.