Understanding statistical error

A simple linear regression model has the form

y = μ + βx + ε.

This means that the output variable y is a linear function of the input variable x, plus some error term ε that is randomly distributed.
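To make the notation concrete, here is a minimal sketch that simulates data from this model and recovers μ and β by ordinary least squares. The parameter values, the normal distribution for ε, and the helper names `simulate` and `fit_line` are assumptions chosen for illustration; the model itself only says that ε is random.

```python
import random

def simulate(mu, beta, xs, sigma, seed=0):
    """Draw y = mu + beta*x + eps for each x, with eps ~ Normal(0, sigma).

    The normal distribution is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [mu + beta * x + rng.gauss(0.0, sigma) for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares estimates (intercept, slope) for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

xs = [i / 10 for i in range(101)]                  # inputs 0.0, 0.1, ..., 10.0
ys = simulate(mu=2.0, beta=0.5, xs=xs, sigma=1.0)  # illustrative parameter values
mu_hat, beta_hat = fit_line(xs, ys)

# Residuals are the fitted model's stand-in for the unobserved error term.
residuals = [y - (mu_hat + beta_hat * x) for x, y in zip(xs, ys)]
print(mu_hat, beta_hat)
```

In this simulation the world really is linear by construction, which is exactly the situation the naive view below takes for granted.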

There’s a common misunderstanding over whose error the error term is. A naive view is that the world really is linear, that

y = μ + βx

is some underlying Platonic reality, and that the only reason we don’t measure exactly that linear part is that we as observers have made some sort of error, that the fault is in the real world rather than in the model.

No, reality is what it is, and it’s our model that is in error. Some aspect of reality may indeed have a structure that is approximately linear (over some range, under some conditions), but when we truncate reality to only that linear approximation, we introduce some error. This error may be tolerable—and thankfully it often is—but the error is ours, not the world’s.
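A small sketch can illustrate the point. Below, the "reality" is a sine curve evaluated exactly, with no measurement noise at all, and we fit a straight line to it. The choice of sine, the two ranges, and the helper names are assumptions made for illustration. Any residual that remains comes entirely from truncating a curve to a line, and its size depends on the range over which we ask the line to hold.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares estimates (intercept, slope) for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def max_abs_residual(f, lo, hi, n=101):
    """Largest gap between f and its least-squares line on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]  # exact values: no observer error is added
    intercept, slope = fit_line(xs, ys)
    return max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

narrow = max_abs_residual(math.sin, 0.0, 0.5)  # range where sine is nearly linear
wide = max_abs_residual(math.sin, 0.0, 3.0)    # range where it clearly is not
print(narrow, wide)
```

On the narrow range the misfit is small and probably tolerable; on the wide range it is not. In neither case did anyone mismeasure anything.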

2 thoughts on “Understanding statistical error”

  1. Amen. I tell my students, using my best Inigo Montoya voice, “Error. In statistics, that word does not mean what you think it means.” Then I explain that Unexplained Variation is a better term, and urge them to think of the Rumsfeldian interpretation of “unexplained.”
