Suppose you have a small number of samples, say between 2 and 10, and you’d like to estimate the standard deviation σ of the population these samples came from. Of course you could compute the sample standard deviation, but there is a simple and robust alternative.
Let W be the range of our samples, the difference between the largest and smallest value. Think “w” for “width.” Then
W / d_n

is an unbiased estimator of σ, where the constants d_n can be looked up in a table [1].
| n  | 1/d_n |
|----|-------|
| 2  | 0.886 |
| 3  | 0.591 |
| 4  | 0.486 |
| 5  | 0.430 |
| 6  | 0.395 |
| 7  | 0.370 |
| 8  | 0.351 |
| 9  | 0.337 |
| 10 | 0.325 |
The values d_n in the table were calculated from the expected value of W/σ for normal random variables, but the method may be used on data that do not come from a normal distribution.
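To make the lookup concrete, here's a minimal sketch of the estimator in Python. The dictionary INV_D and the function range_sd_estimate are hypothetical names introduced here for illustration, not part of the original post.

# Hypothetical helper: estimate sigma from the sample range using the 1/d_n table above.
INV_D = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430,
         6: 0.395, 7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}

def range_sd_estimate(x):
    # Estimate sigma as W/d_n: the sample range times the tabulated value of 1/d_n.
    n = len(x)
    w = max(x) - min(x)
    return w * INV_D[n]

print(range_sd_estimate([4.9, 5.3, 5.1, 4.7, 5.6]))  # five made-up samples, so uses 1/d_5 = 0.430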
Let’s try this out with a little Python code. First we’ll take samples from a standard normal distribution, so the population standard deviation is 1. We’ll draw ten samples at a time, repeat the experiment five times, and each time estimate the standard deviation two ways: by the method above and by the sample standard deviation.
from scipy.stats import norm, gamma

# Draw 10 samples from a standard normal distribution, five times.
# Print the range estimate W/d_10 = 0.325 W alongside the sample standard deviation,
# in the same order as the columns of the table below.
for _ in range(5):
    x = norm.rvs(size=10)
    w = x.max() - x.min()
    print(w*0.325, x.std(ddof=1))
Here’s the output:
| w/d_n | std   |
|-------|-------|
| 1.174 | 1.434 |
| 1.205 | 1.480 |
| 1.173 | 0.987 |
| 1.154 | 1.277 |
| 0.921 | 1.083 |
Just from this example it seems the range method does about as well as the sample standard deviation.
For a non-normal example, let’s repeat our exercise using a gamma distribution with shape 4, which has standard deviation 2.
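The gamma version of the code isn't shown, but presumably it's the same loop with the sampling line changed. Here's a sketch under that assumption, reusing the gamma import from the code above; gamma.rvs(4, size=10) draws from a gamma distribution with shape 4 and default scale 1, whose standard deviation is sqrt(4) = 2.

# Same experiment, sampling from a gamma distribution with shape 4 (standard deviation 2).
for _ in range(5):
    x = gamma.rvs(4, size=10)
    w = x.max() - x.min()
    print(w*0.325, x.std(ddof=1))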
| w/d_n | std   |
|-------|-------|
| 2.009 | 1.827 |
| 1.474 | 1.416 |
| 1.898 | 2.032 |
| 2.346 | 2.252 |
| 2.566 | 2.213 |
Once again, it seems both methods do about equally well. In both examples the uncertainty due to the small sample size is more important than the difference between the two methods.
Update: To calculate d_n for other values of n, see this post.
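For reference, d_n is the expected value of the range W of n standard normal samples, which equals the integral of 1 − Φ(x)^n − (1 − Φ(x))^n over the real line, where Φ is the standard normal CDF. Here is a small numerical sketch (not necessarily the approach taken in the linked post) that reproduces the 1/d_n column above:

from scipy.stats import norm
from scipy.integrate import quad

def d(n):
    # Expected range of n standard normal samples:
    # E[W] = integral of 1 - Phi(x)**n - (1 - Phi(x))**n over the real line.
    integrand = lambda x: 1 - norm.cdf(x)**n - (1 - norm.cdf(x))**n
    return quad(integrand, -10, 10)[0]

for n in range(2, 11):
    print(n, round(1/d(n), 3))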
[1] H. A. David. Order Statistics. John Wiley and Sons, 1970.
Hi, John. You should also look at
https://archives.collections.ed.ac.uk/repositories/2/archival_objects/59045 for a different view of the same underlying idea. We published an article in 2015 using Newman’s method to identify outliers in dose-response experiments. And now we are working on an application to find differential expression in paired omics data without needing replicates.
Now it would be useful to have a simple rule to remember the 1/d_n table. d_n = sqrt(n) or sqrt(n-0.5) might suffice.
@Andreas: It would be nice if these numbers were memorable.
I imagine in some contexts people always take samples of the same size; then they’d only have one number to remember.
Even using d_n = 3, as crude as it is, might be useful for back-of-the-envelope estimates.
d_n=3*logn^0.75 (where the log is base 10) seems to perform quite well even for larger n
Andreas’ approximation of d_n ~ sqrt(n) seems to work pretty well for n > 2, but Ashley’s (empirical?) fit seems at least an order of magnitude worse. Is there a typo in the formula?
Booklet 3 accompanying Stuart Hunter’s “Statistics for Problem Solving and Decision Making” (Westinghouse, 1971) uses sigma_hat = Range / d_2 ≈ Range / sqrt(n) on p. 10. His Table 2 lists d_2 for each value of n from 2 to 10, matching what you list as d_n.
If I’m doing approximations, I figure that the formula using sqrt(n) is by definition good enough, because I’ll not remember d_2 reliably.
How would the unbiased estimator change if we also knew the mean of the sample alongside the min and max?