When the normal approximation for Student t isn't good enough

Folk wisdom says that for all practical purposes, a Student-t distribution with 30 or more degrees of freedom is a normal distribution. Well, not for all practical purposes.

For 30 or more degrees of freedom, the error in approximating the PDF or CDF of a Student-t distribution with a normal is less than 0.005. So for many applications, the n > 30 rule of thumb is appropriate. (See these notes for details.)

However, sometimes you need to look at the quantiles of a t distribution, such as when finding confidence intervals. For example, when computing confidence intervals, you don’t need to evaluate the CDF of a Student t distribution per se but rather the inverse of such a CDF. And in that case, the error in the normal approximation was larger than I expected.

Say you’re computing a 95% confidence interval for the mean of a set of 31 data points. You first find t* such that P(t > t*) = 0.025 where t is a Student-t random variable with 31 – 1 = 30 degrees of freedom. Your confidence interval is the sample mean +/- t* s/√n where s is the sample standard deviation. For 30 degrees of freedom, t* = 2.04. If you used the normal approximation, you’d get 1.96 instead of 2.04, a relative error of about 4% meaning the error in computing your confidence interval is about 4%. While the error in normal approximation to the CDF is less than 0.005 for n > 30, the error in the normal approximation to the CDF inverse is an order of magnitude greater. Also, the error increases as the confidence increases. For example, for a 99% confidence interval, the error is about 6.3%.

It may be that none of this is a problem. If you only have 31 data points, there’s a fair amount of uncertainty in your estimate of the mean, and there’s no point in quantifying with great precision an estimate of how uncertain you are! Modeling assumptions are probably a larger source of error than the normal approximation to the Student-t.  But as a numerical problem, it’s interesting that the approximation error may be larger than expected. For n = 300, the error in the normal approximation to t* is about 0.4%. This means the error in the normal approximation to the inverse CDF is as good at n=300 as the normal approximation to the CDF itself is at n=30.

Read More