# Gaussian correlation inequality

The Gaussian correlation inequality was proven in 2014, but the proof only became widely known this year. You can find Thomas Royen’s remarkably short proof here.

Let X be a multivariate Gaussian random variable with mean zero and let E and F be two symmetric convex sets, both centered at the origin. The Gaussian correlation inequality says that

Prob(X in E and F) ≥ Prob(X in E) Prob(X in F).

Here’s a bit of Python code for illustrating the inequality. For symmetric convex sets we take balls of p-norm r where p ≥ 1 and r > 0. We could, for example, set one of the values of p to 1 to get a cube and set the other p to 2 to get a Euclidean ball.

```python
from scipy.stats import norm as gaussian

def pnorm(v, p):
    return sum(abs(x)**p for x in v)**(1./p)

def simulate(dim, r1, p1, r2, p2, numReps):
    count_1, count_2, count_both = 0, 0, 0
    for _ in range(numReps):
        x = gaussian.rvs(0, 1, dim)
        in_1 = pnorm(x, p1) < r1  # inside the p1-norm ball of radius r1
        in_2 = pnorm(x, p2) < r2  # inside the p2-norm ball of radius r2
        if in_1:
            count_1 += 1
        if in_2:
            count_2 += 1
        if in_1 and in_2:
            count_both += 1
    print("Prob in both:", count_both / numReps)
    print("Lower bound: ", count_1*count_2 * numReps**-2)

simulate(3, 1, 2, 1, 1, 1000)
```

When `numReps` is large, we expect the simulated probability of the intersection to be greater than the simulated lower bound. In the example above, the former was 0.075 and the latter 0.015075, ordered as we’d expect.
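As a sanity check on the simulation, the probability for the Euclidean ball can be computed exactly: ‖X‖₂ < r exactly when the sum of squared components, a chi-square random variable with `dim` degrees of freedom, is below r². A quick sketch (the helper name `ball_prob` is mine):

```python
from scipy.stats import chi2

def ball_prob(r, dim):
    # P(||X||_2 < r) for a standard Gaussian in `dim` dimensions
    # equals P(chi-square with dim degrees of freedom < r^2).
    return chi2.cdf(r**2, dim)

# For the example above (dim = 3, r = 1) this is about 0.199,
# which the simulated count_1/numReps should approximate.
print(ball_prob(1, 3))
```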

If we didn’t know that the theorem has been proven, we could use code like this to try to find counterexamples. Of course a simulation cannot prove or disprove a theorem, but if we found what appeared to be a counterexample, we could see whether it persists with different random number generation seeds and with a large value of `numReps`. If so, then we could try to establish the inequality analytically. Now that the theorem has been proven we know that we’re not going to find real counterexamples, so the code is only useful as an illustration.
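To make such re-runs repeatable, the random draws can be tied to an explicit seed via scipy’s `random_state` argument. Here’s a sketch along those lines; the helper name `trial` and the parameter choices are mine, not from the original code:

```python
from scipy.stats import norm as gaussian

def pnorm(v, p):
    return sum(abs(x)**p for x in v)**(1./p)

def trial(dim, r1, p1, r2, p2, numReps, seed):
    # Same counting as the simulation above, but with a fixed seed
    # so an apparent counterexample can be re-examined exactly.
    x = gaussian.rvs(0, 1, (numReps, dim), random_state=seed)
    in_1 = [pnorm(row, p1) < r1 for row in x]
    in_2 = [pnorm(row, p2) < r2 for row in x]
    both = sum(a and b for a, b in zip(in_1, in_2)) / numReps
    bound = sum(in_1) * sum(in_2) / numReps**2
    return both, bound

# Repeat the same experiment under several seeds.
for seed in (0, 1, 2):
    both, bound = trial(3, 1, 2, 1, 1, 2000, seed)
    print(seed, both, bound, both >= bound)
```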

## 5 thoughts on “Gaussian correlation inequality”

1. Aleksandr

Hi, John,
could you think of a data science problem where the inequality would be useful?

2. It’s a cube when n = 2, but you’re right that it’s not a cube in higher dimensions. If p = ∞ then you do have a cube in every dimension. (The limit as p goes to infinity is the max norm.)
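To illustrate that limit numerically: for large p, the p-norm of a vector approaches its largest absolute component. A quick check, reusing the `pnorm` function from the post:

```python
def pnorm(v, p):
    return sum(abs(x)**p for x in v)**(1./p)

v = [1.0, -2.0, 3.0]
for p in (1, 2, 10, 100):
    print(p, pnorm(v, p))   # values shrink toward max(|x|) = 3 as p grows
```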