Shawn Achor comments on “the cult of the average” in science.
So one of the very first things we teach people in economics and statistics and business and psychology is how, in a statistically valid way, do we eliminate the weirdos. How do we eliminate the outliers so we can find the line of best fit? Which is fantastic if I’m trying to find out how many Advil the average person should be taking — two. But if I’m interested in potential, if I’m interested in your potential, or for happiness or productivity or energy or creativity, what we’re doing is we’re creating the cult of the average with science. … If we study what is merely average, we will remain merely average.
Point taken, but we can’t all be transcription errors :)
You could argue that outliers are more likely to be errors than correct observations, but there’s a bit of circular reasoning going on here: I think my model is correct, and this outlier has low probability according to my model, therefore it is incorrect. And when I remove the points that violate my assumptions, see how well my assumptions hold?
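Here is a minimal sketch of that circularity, under assumptions of my own choosing: the data are evenly spaced quantiles of a Laplace distribution (so the tails really are heavier than normal), and the "outlier rule" is a 2.5 standard deviation cutoff. The names `laplace_quantile`, `excess_kurtosis`, and `trim_by_zscore` are hypothetical helpers written for this illustration, not anything from a library.

```python
import math

def laplace_quantile(p):
    """Inverse CDF of a standard Laplace distribution (heavier tails than a normal)."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

def mean(xs):
    return sum(xs) / len(xs)

def excess_kurtosis(xs):
    """Population excess kurtosis: about 0 for normal data, positive for heavy tails."""
    m = mean(xs)
    var = mean([(x - m) ** 2 for x in xs])
    fourth = mean([(x - m) ** 4 for x in xs])
    return fourth / var ** 2 - 3

def trim_by_zscore(xs, k=2.5):
    """Drop every point more than k standard deviations from the mean."""
    m = mean(xs)
    sd = math.sqrt(mean([(x - m) ** 2 for x in xs]))
    return [x for x in xs if abs(x - m) <= k * sd]

# Assumed data: 1000 evenly spaced Laplace quantiles, so the result is deterministic.
n = 1000
data = [laplace_quantile((i + 0.5) / n) for i in range(n)]

# "Outliers" here are simply the points a normal model finds improbable.
trimmed = trim_by_zscore(data)

before = excess_kurtosis(data)
after = excess_kurtosis(trimmed)
print(f"kept {len(trimmed)} of {len(data)} points")
print(f"excess kurtosis before trimming: {before:.2f}")
print(f"excess kurtosis after trimming:  {after:.2f}")
```

None of the discarded points were errors. They were legitimate draws from a heavy-tailed distribution, and removing them makes the remaining data look closer to the normal model that was used to justify removing them.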
Often outliers are errors. If one mouse in a study weighs 1000 times as much as the others, probably somebody misplaced a decimal. But suppose you’re taking a survey of household income and Bill Gates happens to be in your sample. His income will be orders of magnitude larger than the other participants’. If you want to study average income, you’d discard his data as atypical. But if you want to study financial achievement, his data point is the most interesting one.
There are physical reasons to believe no mouse is orders of magnitude larger than average. But while an income orders of magnitude larger than average is surprising, and could be a transcription error, we can’t rule it out. Of course things are seldom this simple and it can be quite hard to tell whether you’re dealing with mice or millionaires. (OK, billionaires, but the alliteration with millionaires was too nice to pass up. :) )
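To make the income example concrete, here is a small sketch using Python's standard `statistics` module. Every number in it is made up for illustration, including the one very large income; none of it is real survey data.

```python
import statistics

# Hypothetical household incomes (made-up numbers, not survey data).
typical = [32_000, 41_000, 48_000, 55_000, 63_000,
           70_000, 85_000, 98_000, 120_000]

# One respondent whose income is orders of magnitude larger (also made up).
with_outlier = typical + [1_000_000_000]

for label, sample in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(label)
    print("  mean:  ", statistics.mean(sample))
    print("  median:", statistics.median(sample))
    print("  max:   ", max(sample))
```

The mean jumps by more than three orders of magnitude while the median barely moves. Which summary is "right" depends on the question: the median answers the question about the typical household, and the maximum is the whole story if the question is about financial achievement.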
Hmmm. Isn’t this simply a matter of trying to make a poor, abused statistic do more work than it was intended to bear? I mean, if I have a density and it is strongly multimodal, isn’t using the average bound to underspecify the interesting things in the population? Even a second moment won’t do it justice.
@Jan: Yes, but in the average case that statistical model does just fine, so why worry too much about the weirdo cases where it doesn’t? ;)
@Chris, depends upon the Loss Function … (Expect a comment from John on this, since we’ve Been There before.)
Cult of the average? That’s just mean.
@Chris :-D