We begin with a bit of geometry, then show its relevance to statistics.
Let X, Y, and Z be three unit vectors. If X is nearly parallel to Y, and Y is nearly parallel to Z, then X is nearly parallel to Z.
Here’s a proof. Think of X, Y, and Z as points on the unit sphere. Saying that X and Y are nearly parallel then means that the two points are close together on the sphere, where distance is measured along a great circle and so equals the angle between the two vectors. The statement above follows from the triangle inequality on the sphere:
dist(X, Z) ≤ dist(X, Y) + dist(Y, Z).
So if the two terms on the right are small, the term on the left is small too, though maybe not quite as small: it is no more than twice the larger of the other two distances.
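The triangle inequality argument can be checked numerically. The following is a minimal sketch, assuming we measure distance on the sphere as the angle between unit vectors; the helper names (`normalize`, `angle`, `random_unit_vector`, `triangle_inequality_holds`) and the particular nearly parallel triple are illustrative choices, not anything from the text.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def angle(u, v):
    """Angle in radians between unit vectors u and v (great-circle distance)."""
    d = sum(p * q for p, q in zip(u, v))
    # Clamp so rounding error never pushes the argument outside acos's domain.
    return math.acos(max(-1.0, min(1.0, d)))

def random_unit_vector(rng, dim=3):
    """A uniformly random point on the unit sphere, via normalized Gaussians."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

def triangle_inequality_holds(trials=10_000, seed=1, tol=1e-9):
    """Check dist(X, Z) <= dist(X, Y) + dist(Y, Z) on random triples of unit vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (random_unit_vector(rng) for _ in range(3))
        if angle(x, z) > angle(x, y) + angle(y, z) + tol:
            return False
    return True

# A nearly parallel triple: X and Z are small tilts of Y in different directions.
y = [1.0, 0.0, 0.0]
x = normalize([1.0, 0.05, 0.0])
z = normalize([1.0, 0.0, 0.05])
a, b, c = angle(x, y), angle(y, z), angle(x, z)

print(triangle_inequality_holds())  # True
print(c <= a + b <= 2 * max(a, b))  # True
```

The small tolerance in the random check only absorbs floating-point rounding; the inequality itself is exact.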
We can be a little more quantitative. Let a be the angle between X and Y, b the angle between Y and Z, and c the angle between X and Z. Then the law of cosines for spherical trigonometry says
cos c = cos a cos b + sin a sin b cos γ
where γ is the angle at Y between the arcs of length a and b. If a and b are small, then sin a and sin b are also small (since sin θ ≈ θ for small θ), and so we have the approximation
cos c ≈ cos a cos b.
The error in the approximation is sin a sin b cos γ: the product of two small numbers and a number whose absolute value is at most 1. So the error is never larger than sin a sin b.
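Here is a sketch that verifies the spherical law of cosines and the size of the error term for one concrete triangle. It assumes γ can be computed as the angle between the unit tangent vectors at Y pointing toward X and toward Z; the specific vectors and helper names are hypothetical, chosen only so that a and b are small.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [t / n for t in v]

def angle(u, v):
    """Angle in radians between unit vectors, clamped against rounding."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

def tangent_toward(p, y):
    """Unit tangent vector at y pointing along the great circle toward p."""
    d = dot(p, y)
    return normalize([pi - d * yi for pi, yi in zip(p, y)])

def spherical_triangle(x, y, z):
    """Return side lengths a = XY, b = YZ, c = XZ and the angle gamma at vertex Y."""
    a, b, c = angle(x, y), angle(y, z), angle(x, z)
    gamma = angle(tangent_toward(x, y), tangent_toward(z, y))
    return a, b, c, gamma

y = [1.0, 0.0, 0.0]
x = normalize([1.0, 0.10, 0.00])
z = normalize([1.0, 0.05, 0.08])
a, b, c, gamma = spherical_triangle(x, y, z)

exact = math.cos(a) * math.cos(b) + math.sin(a) * math.sin(b) * math.cos(gamma)
approx = math.cos(a) * math.cos(b)
error = math.sin(a) * math.sin(b) * math.cos(gamma)

print(abs(math.cos(c) - exact) < 1e-12)                        # True
print(abs(math.cos(c) - approx) <= math.sin(a) * math.sin(b))  # True
print(f"cos c = {math.cos(c):.6f}, cos a cos b = {approx:.6f}, error term = {error:.2e}")
```

With both sides near 0.1 radians, the error term comes out on the order of 0.005, roughly the product of the two side lengths.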
The geometric exercise above was inspired by a discussion of correlation.
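Correlation of random variables is not transitive. If we think of centered random variables as vectors, their correlation is the cosine of the angle between them, so nonzero correlation corresponds to the vectors not being perpendicular. If X is not perpendicular to Y, and Y is not perpendicular to Z, it can still happen that X is perpendicular to Z.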
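A tiny data set makes the failure of transitivity concrete. This sketch computes the Pearson correlation as the cosine of the angle between centered data vectors; the data values are made up for illustration, with y built as x + z so that it leans toward both while x and z are perpendicular.

```python
import math

def center(v):
    """Subtract the mean from each entry."""
    m = sum(v) / len(v)
    return [t - m for t in v]

def corr(u, v):
    """Pearson correlation, computed as the cosine of the angle between centered vectors."""
    cu, cv = center(u), center(v)
    num = sum(p * q for p, q in zip(cu, cv))
    den = math.sqrt(sum(p * p for p in cu)) * math.sqrt(sum(q * q for q in cv))
    return num / den

x = [1, -1, 1, -1]
z = [1, 1, -1, -1]
y = [p + q for p, q in zip(x, z)]  # y = x + z = [2, 0, 0, -2]

print(round(corr(x, y), 4))  # 0.7071
print(round(corr(y, z), 4))  # 0.7071
print(round(corr(x, z), 4))  # 0.0
```

Both X and Z are correlated with Y (the angle is 45 degrees each way), yet X and Z are uncorrelated.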
But if we replace “not perpendicular” with “nearly parallel” we see that we do have something like transitivity. That is, correlation of random variables is not transitive, but high correlation is.
If a, b, and c are the angles whose cosines are corr(X, Y), corr(Y, Z), and corr(X, Z), then the approximation cos c ≈ cos a cos b says
corr(X, Z) ≈ corr(X, Y) corr(Y, Z)
if all the correlations are near 1.
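A quick simulation illustrates the approximation. The model below (X and Z each equal to Y plus a little independent Gaussian noise, with noise scale 0.2 and seed 2024) is a hypothetical choice, not something from the text. Because sample correlations are themselves cosines of angles between centered data vectors, the bound |corr(X, Z) − corr(X, Y) corr(Y, Z)| ≤ sin a sin b holds exactly for them, not just on average.

```python
import math
import random

def corr(u, v):
    """Pearson correlation: the cosine of the angle between the centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cu = [t - mu for t in u]
    cv = [t - mv for t in v]
    num = sum(p * q for p, q in zip(cu, cv))
    den = math.sqrt(sum(p * p for p in cu) * sum(q * q for q in cv))
    return num / den

rng = random.Random(2024)
n = 10_000
# Hypothetical model: X and Z are each Y plus a small amount of independent noise,
# so all three pairwise correlations are close to 1.
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [t + 0.2 * rng.gauss(0.0, 1.0) for t in y]
z = [t + 0.2 * rng.gauss(0.0, 1.0) for t in y]

r_xy, r_yz, r_xz = corr(x, y), corr(y, z), corr(x, z)
product = r_xy * r_yz
# sin a * sin b, where cos a = r_xy and cos b = r_yz: the bound on the error term.
bound = math.sqrt((1 - r_xy**2) * (1 - r_yz**2))

print(f"corr(X, Z)              = {r_xz:.4f}")
print(f"corr(X, Y) * corr(Y, Z) = {product:.4f}")
print(f"|difference|            = {abs(r_xz - product):.4f}  (bound {bound:.4f})")
```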
Exercise for the reader: interpret the error term in the geometric problem in statistical terms.