Big correlations and big interactions

An outcome cannot be highly correlated with a large number of independent predictors.

This observation has been called the piranha problem. Predictors are compared to piranha fish. If you have a lot of big piranhas in a small pond, they start eating each other. If you have a lot of strong predictors, they predict each other.

In [1] the authors quantify the piranha effect in several ways. I’ll just quote the first one here. See the paper for several other theorems and commentary on their implications.

If X₁, …, X_p, y are real-valued random variables with finite non-zero variance, then

$\sum_{i=1}^p |\text{corr}(X_i, y)| \leq \sqrt{p + \sum_{i\ne j}|\text{corr}(X_i, X_j)|}$

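Squaring both sides and rearranging gives $\sum_{i\ne j}|\text{corr}(X_i, X_j)| \geq \left(\sum_{i=1}^p |\text{corr}(X_i, y)|\right)^2 - p$. So if the left side of the theorem is large compared to $\sqrt{p}$, the sum of the interaction terms, i.e. the correlations among the predictors themselves, must be large too. In particular, if the predictors are mutually uncorrelated, the left side is at most $\sqrt{p}$, so the average of $|\text{corr}(X_i, y)|$ is at most $1/\sqrt{p}$.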
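Here is a small numerical check of the inequality, as a sketch rather than anything taken from the paper. It assumes NumPy is available; the function name `piranha_sides` and the two simulated data sets are my own choices for illustration. Sample correlations are the correlations of the empirical distribution, so the bound should hold for them as well.

```python
import numpy as np

def piranha_sides(X, y):
    """Return (lhs, rhs) of the inequality for predictors X (n x p) and outcome y (n,)."""
    p = X.shape[1]
    # Correlation matrix of the p predictors plus y; y is the last row/column.
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    # Left side: sum of |corr(X_i, y)| over the p predictors.
    lhs = np.abs(R[:p, p]).sum()
    # Sum over i != j of |corr(X_i, X_j)|: all entries minus the p ones on the diagonal.
    off_diag = np.abs(R[:p, :p]).sum() - p
    rhs = np.sqrt(p + off_diag)
    return lhs, rhs

rng = np.random.default_rng(0)
n, p = 1000, 20

# Case 1: nearly uncorrelated predictors; y is their sum plus noise.
X = rng.standard_normal((n, p))
y = X.sum(axis=1) + rng.standard_normal(n)
print("nearly uncorrelated predictors:", piranha_sides(X, y))

# Case 2: predictors that all share a common factor Z, so they predict each other.
Z = rng.standard_normal((n, 1))
X2 = Z + 0.5 * rng.standard_normal((n, p))
y2 = Z[:, 0] + 0.5 * rng.standard_normal(n)
print("strongly correlated predictors:", piranha_sides(X2, y2))
```

In the second case every population correlation is 0.8, so the two sides are 16 and 18 in the limit. The left side can exceed $\sqrt{20}$ only because the predictors are strongly correlated with each other.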

[1] The piranha problem: large effects swimming in a small pond. Available on arXiv.

One thought on “Big correlations and big interactions”

Mike Anderson

27 November 2022 at 07:13

I love it! When we teach about multicollinearity, we haven’t had a memorable name for the problem; now we do. Thanks!
