Suppose you need to find the slope of a line through a set of data. You can get a surprisingly good approximation by simply fitting a line to the first and last points. This is known as “Bancroft’s rule.” It seems too good to be true. Of course fitting a line to just two points is not as good as using all the data, but unless you have a fairly large amount of data, it’s not much worse either. It’s good enough for a quick estimate.
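To make the rule concrete, here is a minimal sketch that computes both slopes for a small data set. The function names and the simulated data (a line y = 1 + 2x with unit Gaussian noise) are assumptions chosen for illustration, not part of the rule itself.

```python
import random


def endpoint_slope(xs, ys):
    """Slope of the line through the first and last points (Bancroft's rule)."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])


def least_squares_slope(xs, ys):
    """Ordinary least squares slope using every point."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: true line y = 1 + 2x plus Gaussian noise with sigma = 1.
rng = random.Random(0)
xs = list(range(11))
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

print("endpoint slope:     ", endpoint_slope(xs, ys))
print("least squares slope:", least_squares_slope(xs, ys))
```

Both numbers should land near the true slope of 2, with the endpoint estimate typically a bit further off.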
Just how good is this estimate compared to using all the data? We’ll look at the technical details of an example below.
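Suppose you have a regression model y = α + βx + ε where ε is random noise. Suppose ε is normally distributed with mean 0 and variance σ². Let b be the least squares estimator of β. The variance of b is

Var(b) = σ² / Σ (xᵢ − x̄)²

where x̄ is the average of the xᵢ and the sum runs over the observations used in the fit.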
Now suppose we have observations yᵢ corresponding to xᵢ = 0, 1, 2, …, 2n. The average value of x is n, and the denominator in the expression for the variance of the slope estimator is 2(1² + 2² + 3² + … + n²) = (2n³ + 3n² + n)/3. If we just use the data at x = 0 and x = 2n, the denominator is (0 − n)² + (2n − n)² = 2n².
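The two denominators are easy to check by brute force. The sketch below (the helper names are my own) sums the squared deviations directly and compares them with the closed forms for a few small values of n.

```python
def sxx_all(n):
    """Sum of (x - n)^2 over x = 0, 1, ..., 2n, where n is the mean of x."""
    return sum((x - n) ** 2 for x in range(2 * n + 1))


def sxx_endpoints(n):
    """The same sum using only the points x = 0 and x = 2n."""
    return (0 - n) ** 2 + (2 * n - n) ** 2


for n in range(1, 6):
    # 2n^3 + 3n^2 + n = n(n + 1)(2n + 1), which is always divisible by 3.
    closed_form = (2 * n**3 + 3 * n**2 + n) // 3
    assert sxx_all(n) == closed_form
    assert sxx_endpoints(n) == 2 * n**2
    print(n, sxx_all(n), sxx_endpoints(n))
```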
If we divide the estimator variance based on Bancroft’s rule by the estimator variance using all the data, the σ² terms cancel and we are left with n/3 + 1/2 + 1/(6n). So Bancroft’s rule increases the variance in the estimate for the slope by a factor of roughly n/3 compared to using all the data. Thus it widens the confidence interval by a factor of roughly the square root of n/3. So if n = 12, which means 25 data points, the confidence interval would be about twice as wide. Said another way, the estimate based on all the data is only about twice as good as the estimate based on just the first and last points.
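As a sanity check, the sketch below computes the variance ratio exactly with rational arithmetic and then estimates it by simulation. The function names, trial count, and seed are arbitrary choices of mine, and the simulation sets α = β = 0, which is harmless because the variance of either slope estimate does not depend on the true coefficients.

```python
from fractions import Fraction
import random
import statistics


def variance_ratio(n):
    """Exact Var(endpoint slope) / Var(least squares slope) for x = 0..2n."""
    sxx_all = sum((x - n) ** 2 for x in range(2 * n + 1))
    sxx_endpoints = 2 * n**2
    return Fraction(sxx_all, sxx_endpoints)


def simulated_ratio(n, trials=10000, sigma=1.0, seed=0):
    """Monte Carlo estimate of the same ratio (assumes alpha = beta = 0)."""
    rng = random.Random(seed)
    xs = range(2 * n + 1)
    sxx = sum((x - n) ** 2 for x in xs)
    ols_slopes, endpoint_slopes = [], []
    for _ in range(trials):
        ys = [rng.gauss(0.0, sigma) for _ in xs]
        y_bar = sum(ys) / len(ys)
        ols_slopes.append(sum((x - n) * (y - y_bar) for x, y in zip(xs, ys)) / sxx)
        endpoint_slopes.append((ys[-1] - ys[0]) / (2 * n))
    return statistics.variance(endpoint_slopes) / statistics.variance(ols_slopes)


for n in (3, 6, 12):
    exact = variance_ratio(n)
    # The exact ratio matches the closed form n/3 + 1/2 + 1/(6n).
    assert exact == Fraction(n, 3) + Fraction(1, 2) + Fraction(1, 6 * n)
    print(n, float(exact), float(exact) ** 0.5, simulated_ratio(n))
```

For n = 12 the exact ratio is 325/72, a little over 4.5, so the interval is wider by a factor of about 2.1. The simulated ratio should come out close to that.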