An outcome cannot be highly correlated with a large number of independent predictors.

This observation has been called the **piranha problem**. Predictors are compared to piranha fish. If you have a lot of big piranhas in a small pond, they start eating each other. If you have a lot of strong predictors, they predict each other.

In [1] the authors quantify the piranha effect in several ways. I’ll just quote the first one here; see the paper for the other theorems and commentary on their implications.

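If $X_1, \ldots, X_p, y$ are real-valued random variables with finite non-zero variance, then

$$\sum_{i=1}^{p} |\operatorname{corr}(X_i, y)| \le \sqrt{p + \sum_{i \ne j} |\operatorname{corr}(X_i, X_j)|}.$$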

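So if the left side is large, either because $p$ is large or because some of the correlations with $y$ are large, then the right side is also large, and so the sum of the interaction terms (the correlations between pairs of predictors) is large. For example, if every $|\operatorname{corr}(X_i, y)|$ is at least $c$, the left side is at least $pc$, and squaring both sides gives $p^2c^2 \le p + \sum_{i \ne j} |\operatorname{corr}(X_i, X_j)|$. So the interaction terms must sum to at least $p(pc^2 - 1)$. With $p = 10$ predictors, each correlated at least 0.5 with the outcome, that is a sum of at least 15 over the 90 ordered pairs of distinct predictors.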
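As a quick numerical check, here is a minimal Python sketch, assuming NumPy is available, that computes both sides of the bound, $\sum_i |\operatorname{corr}(X_i, y)|$ and $\sqrt{p + \sum_{i \ne j} |\operatorname{corr}(X_i, X_j)|}$, from sample correlations. A sample correlation is just the correlation under the empirical distribution of the data, so the inequality should hold exactly, up to rounding. The helper names `piranha_sides` and `min_interaction_sum` and the two simulated data sets are my own illustrations, not something taken from the paper.

```python
import numpy as np


def piranha_sides(X, y):
    """Both sides of the piranha bound, using sample correlations.

    X is an (n, p) array whose columns are the predictors; y has length n.
    Returns (lhs, rhs) with
        lhs = sum_i |corr(X_i, y)|
        rhs = sqrt(p + sum_{i != j} |corr(X_i, X_j)|)
    """
    p = X.shape[1]
    # Correlation matrix of [X_1, ..., X_p, y]; columns are the variables.
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    r_xy = R[:p, p]                      # each predictor vs. the outcome
    off_diag = ~np.eye(p, dtype=bool)    # mask selecting ordered pairs i != j
    lhs = np.abs(r_xy).sum()
    rhs = np.sqrt(p + np.abs(R[:p, :p][off_diag]).sum())
    return lhs, rhs


def min_interaction_sum(p, c):
    """Lower bound on sum_{i != j} |corr(X_i, X_j)| implied by the inequality
    when every |corr(X_i, y)| >= c: (p*c)**2 <= p + S gives S >= p*(p*c**2 - 1)."""
    return max(0.0, p * (p * c**2 - 1))


rng = np.random.default_rng(0)
n, p = 100_000, 5

# Case 1: independent predictors and y = their sum. In the population each
# corr(X_i, y) = 1/sqrt(p), the predictors are uncorrelated, and both sides
# equal sqrt(p); the sample values should be close to sqrt(5) ~ 2.236.
X1 = rng.standard_normal((n, p))
y1 = X1.sum(axis=1)
lhs1, rhs1 = piranha_sides(X1, y1)

# Case 2: predictors driven by a shared factor z that also drives y.
# Each predictor correlates about 0.63 with y, so the left side (~3.16)
# exceeds sqrt(5) ~ 2.24; the bound allows this only because the
# predictors also correlate about 0.8 with one another.
z = rng.standard_normal(n)
X2 = z[:, None] + 0.5 * rng.standard_normal((n, p))
y2 = z + rng.standard_normal(n)
lhs2, rhs2 = piranha_sides(X2, y2)

print(f"independent predictors:   lhs={lhs1:.3f}  rhs={rhs1:.3f}")
print(f"shared-factor predictors: lhs={lhs2:.3f}  rhs={rhs2:.3f}")
print(min_interaction_sum(10, 0.5))  # 15.0
```

In the first case the two sides nearly coincide, as the population calculation predicts. In the second, the left side is larger than uncorrelated predictors would ever allow, which is only possible because the predictors are correlated with each other.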


[1] The piranha problem: Large effects swimming in a small pond. Available on arXiv.
