Big correlations and big interactions

An outcome cannot be highly correlated with a large number of independent predictors.

This observation has been called the piranha problem. Predictors are compared to piranha fish. If you have a lot of big piranhas in a small pond, they start eating each other. If you have a lot of strong predictors, they predict each other.

In [1] the authors quantify the piranha effect in several ways. I’ll quote just the first one here; see the paper for other theorems and commentary on their implications.

If X_1, …, X_p, y are real-valued random variables with finite non-zero variance, then

\sum_{i=1}^p |\text{corr}(X_i, y)| \leq \sqrt{p + \sum_{i\ne j}|\text{corr}(X_i, X_j)|}
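A quick way to see the inequality in action is to check it on simulated data. Sample correlations are exactly the correlations of the empirical distribution of the data points, so the theorem applies to them with no approximation. The sketch below is my own illustration, not code from [1]. The helper names `corr` and `piranha_sides` and the shared-factor data model are hypothetical choices made for the example.

```python
import math
import random

def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def piranha_sides(X, y):
    """Return (left side, right side) of the inequality.

    X is a list of p predictor columns, y is the outcome column.
    """
    p = len(X)
    lhs = sum(abs(corr(xi, y)) for xi in X)
    # Sum over ordered pairs i != j, matching the sum in the theorem.
    cross = sum(abs(corr(X[i], X[j]))
                for i in range(p) for j in range(p) if i != j)
    return lhs, math.sqrt(p + cross)

random.seed(1)
n, p = 200, 8
# Hypothetical data: a shared latent factor z drives every predictor and y,
# so each predictor is strongly correlated with y and with the others.
z = [random.gauss(0, 1) for _ in range(n)]
X = [[zk + 0.5 * random.gauss(0, 1) for zk in z] for _ in range(p)]
y = [zk + 0.5 * random.gauss(0, 1) for zk in z]

lhs, rhs = piranha_sides(X, y)
print(f"left side = {lhs:.3f}, right side = {rhs:.3f}")
# The inequality holds exactly, up to floating-point rounding.
assert lhs <= rhs + 1e-12
```

Here the left side is large only because the predictors share a common factor, which is exactly what makes the cross-correlation sum under the square root large.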

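So if the left side is large, either because p is large or because some of the correlations with y are large, then the right side must be large too. The p under the square root contributes only √p, so a left side much bigger than √p forces the sum of the cross-correlations |corr(X_i, X_j)| between predictors, the interaction terms, to be large.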
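To make this concrete, here is a small worked consequence of the inequality (my own illustration, not a theorem quoted from [1]). Suppose every predictor satisfies |corr(X_i, y)| ≥ c for some c > 0, and write S for the sum of the cross-correlations. Squaring both sides, which is valid since both are non-negative, gives a lower bound on S and on its average over the p(p − 1) ordered pairs:

p\,c \leq \sum_{i=1}^p |\text{corr}(X_i, y)| \leq \sqrt{p + S}, \qquad S = \sum_{i\ne j}|\text{corr}(X_i, X_j)|

S \geq p^2 c^2 - p, \qquad \frac{S}{p(p-1)} \geq \frac{p\,c^2 - 1}{p - 1}

For example, with p = 10 predictors each having correlation at least 0.5 with y, S ≥ 100(0.25) − 10 = 15, so the average absolute correlation between pairs of predictors is at least 15/90 = 1/6. As p grows, this lower bound on the average approaches c² = 0.25.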

[1] The piranha problem: large effects swimming in a small pond. Available on arXiv.
