I heard someone say the other day that the advantage of formal software validation methods is that they let you explore the corners, cases where intuition doesn’t naturally take you.
This made me think of corners in the geometric sense. If you have a sphere in a box in high dimensions, nearly all the volume is in the corners, i.e. outside the sphere. This is more than a metaphor. You can think of software options geometrically, with each independent choice corresponding to a dimension. Paths through a piece of software that are individually rare may account for nearly all use when considered together.
With a circle inside a square, nearly 78.5% of the area is inside the circle. With a ball sitting inside a 3-D box, 52.4% of the volume is inside the ball. As the dimension increases, the proportion of volume inside the sphere rapidly decreases. For a 10-dimensional sphere sitting in a 10-dimensional box, 0.25% of the volume is in the sphere. Said another way, 99.75% of the volume is in the corners.
When you go up to 100 dimensions, the proportion of volume inside the sphere is about 2 parts in 1070, a 1 followed by 70 zeros [1]. If 100 dimensions sounds like pure fantasy, think about a piece of software with more than 100 features. Those feature combinations multiply like geometric dimensions [2].
Here’s a little Python code you could use to see how much volume is in a sphere as a function of dimension.
from scipy.special import gamma from math import pi def unit_sphere_volume(n): return pi**(0.5*n)/gamma(0.5*n + 1) def unit_cube_volume(n): return 2**n def ratio(n): return unit_sphere_volume(n) / unit_cube_volume(n) print( [ratio(n) for n in range(1, 20)] )
* * *
[1] There are names for such extremely large numbers. These names are hardly ever used—scientific notation is much more practical— but they’re fun to say. 1070 is ten duovigintillion in American nomenclature, ten undecilliard in European.
[2] Geometric dimensions are perfectly independent, but software feature combinations are not. In terms of logic, some combinations may not be possible. Or in terms of probability, the probability of exploring some paths is conditional on the probability of exploring other paths. Even so, there are inconceivably many paths through any large software system. And in large-scale operations, events that should “never happen” happen regularly.
OK, I should have been working instead of surfing the web. But I followed this link http://blog.geekpress.com/2016/09/high-dimensional-sphere-packing.html
That pointed to an interesting article on Scientific American:
http://blogs.scientificamerican.com/roots-of-unity/why-you-should-care-about-high-dimensional-sphere-packing/
An interesting piece on error correction using Sphere-packing. As I’m reading it, I remember your intriguing post about corners. I find it fascinating and fun when connections are made from random points.
Does this mean that in a high-dimension, normal distribution most of the probability is in the tails, rather than in the core?
David: The surprising result about the normal distribution in high dimensions is that almost all the probability is in a thin spherical shell!
Details here: https://www.johndcook.com/blog/2011/09/01/multivariate-normal-shell/