Posts Tagged ‘Bias’

Bias

Wednesday, May 7th, 2008

An unbiased estimator, very roughly speaking, is a statistic that gives the correct result on average. For a precise definition, see Wikipedia. Unbiasedness is an intuitively desirable property. In fact, it seems indispensable at first.

In the colloquial sense, “bias” is practically synonymous with self-serving dishonesty. Who wants a self-serving, dishonest statistical estimate? But it’s important to remember that “bias” in statistical sense has a technical meaning that may not correspond to the colloquial meaning.

Here’s the big problem with statistical bias: if U is an unbiased estimator of θ, f(U) is NOT an unbiased estimator of f(θ) in general. For example, standard deviation is the square root of variance, but the square root of an unbiased estimator for variance is not an unbiased estimator for standard deviation. This shows bias has nothing to do with accuracy, since the square root of an accurate estimation of variance is an accurate estimate of standard deviation. In fact, unbiased estimators can be terrible.

The fact that unbiasedness is not preserved under transformations calls into question its usefulness. People seldom care directly about abstract statistical parameters directly. Instead they care about some calculation based on those parameters. An unbiased estimate of the parameters does not generally lead to an unbiased estimate of what people really want to estimate.

Unbiased estimators can be terrible

Saturday, January 12th, 2008

An estimator in statistics is a way of guessing a parameter based on data. An estimator is unbiased if over the long run, your guesses converge to the thing you’re estimating. Sounds eminently reasonable. But it might not be.

Suppose you’re estimating something like the number of car accidents per week in Texas and you counted 308 the first week. What would you estimate is the probability of seeing no accidents the next week?

If you use a Poisson model for the number of car accidents, a very common assumption for such data, there is a unique unbiased estimator. And this estimator would estimate the probability of no accidents during a week as 1. Worse, had you counted 307 accidents, the estimated probability would be -1! The estimator alternates between two ridiculous values, but in the long run these values average out to the true value. Exact in the limit, useless on the way there. A slightly biased estimator would be much more practical.

See Michael Hardy’s article for more details: An_Illuminating_Counterexample.pdf