12 November 2008 | John D. Cook

Folk wisdom says that for all practical purposes, a Student-t distribution with 30 or more degrees of freedom is a normal distribution. Well, not for all practical purposes.

For 30 or more degrees of freedom, the error in approximating the PDF or CDF of a Student-t distribution with a normal is at most about 0.005. So for many applications, the n > 30 rule of thumb is appropriate. (See these notes for details.)
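To see the size of these errors directly, here is a minimal sketch that compares the Student-t and standard normal PDFs and CDFs on a grid. It uses only the Python standard library: the t density comes from its closed form via `math.lgamma`, the t CDF is computed with Simpson's rule (my choice of method, not anything prescribed by the notes), and `statistics.NormalDist` supplies the normal side. The helper names `t_pdf`, `t_cdf`, and `max_abs_errors` are hypothetical, and the grid over [0, 8] is an assumption; symmetry covers the negative half-line.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of the Student-t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 400) -> float:
    """Student-t CDF: Simpson's rule for the density over [0, |x|]
    (steps must be even), then symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def max_abs_errors(df: float, x_max: float = 8.0, points: int = 401):
    """Largest |t - normal| gap in the PDF and in the CDF on a grid over [0, x_max].
    The absolute gaps are symmetric in x, so negative x adds nothing new."""
    z = NormalDist()
    xs = [x_max * i / (points - 1) for i in range(points)]
    pdf_err = max(abs(t_pdf(x, df) - z.pdf(x)) for x in xs)
    cdf_err = max(abs(t_cdf(x, df) - z.cdf(x)) for x in xs)
    return pdf_err, cdf_err


pdf_err, cdf_err = max_abs_errors(30)
print(f"df = 30: max PDF error {pdf_err:.4f}, max CDF error {cdf_err:.4f}")
```

At 30 degrees of freedom both maximum gaps come out on the order of 0.005, with the CDF gap the larger of the two.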

However, sometimes you need the quantiles of a t distribution rather than its CDF. When computing a confidence interval, for example, you don't evaluate the Student-t CDF per se but rather its inverse. And in that case, the error in the normal approximation may be larger than you'd expect.
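Since confidence intervals need the inverse CDF, it helps to see how a quantile can be computed when only the CDF is available. A minimal sketch, assuming the CDF is continuous and nondecreasing: bisection (one simple choice among several root-finding methods) halves a bracketing interval until cdf(x) = p. The helper `inverse_cdf` and its default bracket [-50, 50] are hypothetical; the check uses the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist
from typing import Callable


def inverse_cdf(cdf: Callable[[float], float], p: float,
                lo: float = -50.0, hi: float = 50.0, iters: int = 100) -> float:
    """Find x with cdf(x) = p by bisection.

    Assumes cdf is nondecreasing and cdf(lo) < p < cdf(hi).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return (lo + hi) / 2


z = NormalDist()
# Compare with the standard library's own normal quantile function.
print(inverse_cdf(z.cdf, 0.975), z.inv_cdf(0.975))  # both about 1.96
```

The same routine works for any CDF you can evaluate, including a Student-t CDF, which is what a confidence interval calculation needs.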

Say you're computing a 95% confidence interval for the mean of a set of 31 data points. You first find t^* such that P(t > t^*) = 0.025, where t is a Student-t random variable with 31 − 1 = 30 degrees of freedom. Your confidence interval is the sample mean ± t^* s/√n, where s is the sample standard deviation. For 30 degrees of freedom, t^* = 2.04. If you used the normal approximation, you'd get 1.96 instead of 2.04, a relative error of about 4%, meaning the half-width of your confidence interval would be about 4% too small.

While the error in the normal approximation to the CDF is about 0.005 or less for 30 or more degrees of freedom, the relative error in the normal approximation to the inverse CDF is an order of magnitude greater. Also, the error increases as the confidence level increases. For a 99% confidence interval, for example, the error is about 6.3%.
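The numbers in this example can be checked with a short, self-contained sketch. It computes t^* by inverting a numerically integrated Student-t CDF (Simpson's rule plus bisection, my choice rather than a library routine) and compares it with the normal quantile from `statistics.NormalDist`. The names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical helpers, and the bisection bracket [0, 50] assumes an upper quantile with 0.5 < p < 1.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of the Student-t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Student-t CDF via Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: float, iters: int = 60) -> float:
    """Upper quantile t^* with P(T <= t^*) = p, for 0.5 < p < 1, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 31        # sample size from the example
df = n - 1
for confidence in (0.95, 0.99):
    p = 1 - (1 - confidence) / 2          # two-sided: P(T > t^*) = (1 - confidence) / 2
    t_star = t_quantile(p, df)
    z_star = NormalDist().inv_cdf(p)
    rel_err = (t_star - z_star) / t_star  # fraction by which the normal interval is too narrow
    print(f"{confidence:.0%}: t* = {t_star:.3f}, z* = {z_star:.3f}, relative error {rel_err:.1%}")
# 95%: t* = 2.042, z* = 1.960, relative error 4.0%
# 99%: t* = 2.750, z* = 2.576, relative error 6.3%
```

The relative error is measured against the exact t quantile, so it says directly how much too narrow the normal-based interval is.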

It may be that none of this is a problem. If you only have 31 data points, there's a fair amount of uncertainty in your estimate of the mean, and there's no point in quantifying with great precision how uncertain you are! Modeling assumptions are probably a larger source of error than the normal approximation to the Student-t. But as a numerical problem, it's interesting that the approximation error may be larger than expected. For n = 300, the error in the normal approximation to t^* is about 0.4%. So the normal approximation to the inverse CDF is about as good at n = 300 as the normal approximation to the CDF itself is at n = 30.
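The same calculation shows how slowly the quantile error shrinks as the sample grows. This sketch is self-contained: it computes the t CDF with Simpson's rule and the quantile by bisection (hypothetical helper names, not a library API), then tabulates the relative error of the normal approximation to t^* for a 95% interval at a few sample sizes, assuming n − 1 degrees of freedom. You should see the error near 0.4% at n = 300.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of the Student-t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """Student-t CDF via Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: float, iters: int = 60) -> float:
    """Upper quantile with P(T <= t) = p, for 0.5 < p < 1, by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def normal_rel_error(n: int, confidence: float = 0.95) -> float:
    """Relative error of using the normal quantile in place of t^* with n - 1 df."""
    p = 1 - (1 - confidence) / 2
    t_star = t_quantile(p, n - 1)
    return (t_star - NormalDist().inv_cdf(p)) / t_star


for n in (31, 100, 300, 1000):
    print(f"n = {n:4d}: relative error {normal_rel_error(n):.2%}")
```

The error falls off roughly in proportion to 1/n, which is why it takes about ten times as many observations to bring the quantile error down to the level of the CDF error at n = 30.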


When the normal approximation for Student t isn’t good enough
