Suppose you have data in an N-dimensional space where N is large and consider the cube [-1, 1]^N. The coordinate basis vectors start in the center of the cube and poke out through the middle of the faces. Each diagonal of the cube runs from the center to one of the corners.
If your points cluster along one of the coordinate axes, then projecting them to that axis will show the full width of the data. But if your points cluster along one of the diagonal directions, the projection along every coordinate axis will be a tiny smudge near the origin. There are a lot more diagonal directions than coordinate directions, 2^N versus N, and so there are a lot of orientations of your points that could be missed by every coordinate projection.
Here’s the math behind the loose statements above. The diagonal directions have the form (±1, ±1, …, ±1). A unit vector in one of these directions will have the form (1/√N)(±1, ±1, …, ±1) and so its inner product with any of the coordinate basis vectors is ±1/√N, which goes to zero as N gets large. Said another way, taking a set of points along a diagonal and projecting it to a coordinate axis divides its width by √N.
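A small numerical check can make the 1/√N shrinkage concrete. The sketch below is only an illustration, not part of the original argument: the helper names (`unit_diagonal`, `diagonal_points`, `projected_width`) are made up for this example, and it assumes the points lie exactly on the main diagonal (1, 1, …, 1), spanning a width of 2 along it.

```python
import math

def unit_diagonal(n):
    """Unit vector along the main diagonal (1, 1, ..., 1) of the cube [-1, 1]^n."""
    scale = 1.0 / math.sqrt(n)
    return [scale] * n

def diagonal_points(n, num_points=11):
    """Points t * d for t evenly spaced in [-1, 1], where d is the unit diagonal.

    Measured along the diagonal, these points span a width of 2.
    """
    d = unit_diagonal(n)
    ts = [-1 + 2 * k / (num_points - 1) for k in range(num_points)]
    return [[t * x for x in d] for t in ts]

def projected_width(points, axis):
    """Width (max minus min) of the points after projecting onto one coordinate axis."""
    coords = [p[axis] for p in points]
    return max(coords) - min(coords)

if __name__ == "__main__":
    for n in (4, 100, 10_000):
        pts = diagonal_points(n)
        on_axis = projected_width(pts, axis=0)
        predicted = 2.0 / math.sqrt(n)  # width along the diagonal divided by sqrt(n)
        print(f"N={n}: width on axis 0 = {on_axis:.6f}, 2/sqrt(N) = {predicted:.6f}")
```

The same width comes out for every coordinate axis, since all coordinates of the main diagonal are equal; picking a diagonal with different signs would only flip the sign of the projected coordinates, not their spread.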
2 thoughts on “Disappearing data projections”
This is interesting. Could you apply this to a word space of n words, and then talk of the data as the meaning to be conveyed in a sentence? Does this now illustrate understandability?
Meaningless word salad being the diagonal, or at least tending to it in long sentences of that kind.
Even more generally: take a random unit vector (uniformly distributed on the sphere S^(n-1)) and project it onto any 1-d subspace, e.g., the one spanned by the first coordinate. Its expected length is on the order of 1/sqrt(n). But the good news is that its length is, with high probability, on the order of 1/sqrt(n). So the data projection gets smaller, but aside from this scaling factor it’s very predictable, and this allows one to recover the (relative) similarity among high-dimensional points by projecting them onto low-dimensional spaces. This is one of the applications of the Johnson-Lindenstrauss lemma, and Dasgupta and Gupta use the 1-d subspace projection constructively to prove it.