Suppose you want to test whether something you’re doing is having any effect. You take a few measurements and you compute the average. The average is different than what it would be if what you’re doing had no effect, but is the difference significant? That is, how likely is it that you might see the same change in the average, or even a greater change, if what you’re doing actually had no effect and the difference is due to random effects?
The most common way to address this question is the one-sample t test. “One sample” doesn’t mean that you’re only taking one measurement. It means that you’re taking a set of measurements, a sample, from one thing. You’re not comparing measurements from two different things.
The t test assumes that the data are coming from some source with a normal (Gaussian) distribution. The Gaussian distribution has thin tails, i.e. the probability of seeing a value far from the mean drops precipitously as you move further out. What if the data are actually coming from a distribution with heavier tails, i.e. a distribution where the probability of being far from the mean drops slowly?
With fat tailed data, the t test loses power. That is, it is less likely to reject the null hypothesis, the hypothesis that the mean hasn’t changed, when it should. First we will demonstrate by simulation that this is the case, then we’ll explain why this is to be expected from theory.
We will repeatedly draw a sample of 20 values from a distribution with mean 0.8 and test whether the mean of that distribution is not zero by seeing whether the t test produces a p-value less than the conventional cutoff of 0.05. We will increase the thickness of the distribution tails and see what that does to our power, i.e. the probability of correctly rejecting the hypothesis that the mean is zero.
We will fatten the tails of our distribution by generating samples from a Student t distribution and decreasing the number of degrees of freedom: as degrees of freedom go down, the weight of the tail goes up.
With a large number of degrees of freedom, the t distribution is approximately normal. As the number of degrees of freedom decreases, the tails get fatter. With one degree of freedom, the t distribution is a Cauchy distribution.
Here’s our Python code:
from scipy.stats import t, ttest_1samp n = 20 N = 1000 for df in [100, 30, 10, 5, 4, 3, 2, 1]: rejections = 0 for _ in range(N): y = 0.8 + t.rvs(df, size=n) stat, p = ttest_1samp(y, 0) if p < 0.05: rejections += 1 print(df, rejections/N)
And here’s the output:
100 0.917 30 0.921 10 0.873 5 0.757 4 0.700 3 0.628 2 0.449 1 0.137
When the degrees of freedom are high, we reject the null about 90% of the time, even for degrees of freedom as small as 10. But with one degree of freedom, i.e. when we’re sampling from a Cauchy distribution, we only reject the null around 14% of the time.
Why do fatter tails lower the power of the t test? The t statistic is
where y bar is the sample average, μ0 is the mean under the null hypothesis (μ0 = 0 in our example), s is the sample standard deviation, and n is the sample size.
As distributions become fatter in the tails, the sample standard deviation increases. This means the denominator in the t statistic gets larger and so the t statistic gets smaller. The smaller the t statistic, the greater the probability that the absolute value of a t random variable is greater than the statistic, and so the larger the p-value.
t statistic, t distribution, t test
There are a lot of t‘s floating around in this post. I’ll finish by clarifying what the various t things are.
The t statistic is the thing we compute from our data, given by the expression above. It is called a t statistic because if the hypotheses of the test are satisfied, this statistic has a t distribution with n-1 degrees of freedom. The t test is a hypothesis test based on the t statistic and its distribution. So the t statistic, the t distribution, and the t test are all closely related.
The t family of probability distributions is a convenient example of a family of distributions whose tails get heavier or lighter depending on a parameter. That’s why in the simulation we drew samples from a t distribution. We didn’t need to, but it was convenient. We would get similar results if we sampled from some other distribution whose tails get thicker, and so variance increases, as we vary some parameter.