Rolling dice for normal samples

Roll five dice and use the sum to simulate samples from a normal distribution. This is an idea I ran across in Teaching Statistics by Andrew Gelman and Deborah Nolan. The book mentions this idea in the context of a classroom demonstration where students are to simulate IQ scores.

I was surprised that just five dice would do a decent job.  So just how good a job does the dice generator do?

To find out, I looked at the difference between the cumulative densities of the dice generator and an exact normal generator. When you roll a single six-sided die, the outcomes have mean 3.5 and variance 35/12, and so the corresponding mean and variance for rolling 5 dice is 5 times greater. By the central limit theorem, the sum of the five rolls should have approximately the same distribution as a normal random variable with the same mean and variance. So what we want to do is compare the distribution of the sum of the five dice rolls to a Normal(17.5, 3.8192).

Now how do we get the distribution of the sum of the five dice? For one die it’s easy: the numbers 1 through 6 all have probability 1/6 of showing up. The sum of two dice is harder, but not too hard to brute force. But if you’re looking at five or more dice, you need to be a little clever.

By using generating functions, you can see that the probability of getting a sum of k spots on the five dice is the coefficient of xk in the expansion of

(x + x2 + x3 + x4 + x5 + x6)5 / 65

In Mathematica code, the probability density for the sum of the five dice is

pmf = Table[
  6^-5 Coefficient[(x + x^2 + x^3 + x^4 + x^5 + x^6)^5, x^i],
  {i, 30}]

To get the distribution function, we need to create a new list whose nth item is the sum of the first n items in the list above. Here’s the Mathematica code to do that.

cdf = Rest[FoldList[Plus, 0, pmf]]

The FoldList repeatedly applies the function supplied as its first argument, in this case Plus, starting with the second argument, 0. The Rest function removes the extra 0 that FoldList adds to the front of the list. (Like cdr for Lisp fans.)

Here’s the Mathematica code to produce the corresponding normal distribution.

norm = Table[ 
    CDF[NormalDistribution[3.5*5, Sqrt[5*35/12]], i], 
    {i, 30}]

Here’s what the difference between the dice distribution and the normal distribution looks like:

plot of cdf - norm

The maximum error is about 5%. It’s not an industrial quality random number generator, but it’s perfectly adequate for classroom demonstrations.

Now what would happen if we rolled more than five dice? We’d expect a better approximation. In fact, we can be more precise. The central limit theorem suggests that as we roll n dice, the maximum error will be proportionate to 1/√ n when n is large. So if we roll four times as many dice, we’d expect the error to go down by a factor of two. In fact, that’s just what we see. When we go from 5 dice to 20 dice, the maximum error goes from 0.0525 to 0.0262.

14 thoughts on “Rolling dice for normal samples

  1. A similar example I always thought was extremely clever was to add together 12 independent standard uniforms and subtract 6 from the sum. It does a great job of approximating a standard normal and requires no division.

  2. Regarding John Venier’s comment above, note why 12 is special. Not just because it’s a large number. 20 is even bigger, for example, but would not work.

    The variance of a uniform[0,1] random variable is 1/12. So adding 12 together makes the variance 1. That’s why his trick produces a standard random sample.

  3. Exactly so! Also, when using electronics it is almost always the case that (pseudo-)random values from the standard uniform distribution are available even if for no other distribution. Depending on your application, having something fast and easy and almost normally distributed can often be good enough. It is probably worth mentioning here that all of the usual methods of generating Normal random variables are approximations to some degree.

    Another point probably worth mentioning is that the values generated by this method cannot exceed 6 in absolute value, even in theory. But the theoretical probability of a Normal r.v. being outside of (- 6,6) is about 2 per thousand million. And having bounded values is not necessarily bad.

  4. Here’s an interesting empirical test of a random number generator: how many samples would it take before a goodness of fit test could reject the hypothesis that the samples came from the theoretical distribution?

    I tried this with the K-S test, but unfortunately the fact that the dice generator only has 26 possible values throws the test off: samples from a continuous distribution are not supposed to have repeats.

  5. Is it my imagination, or does that distribution of the differences look asymmetric?
    It’s been a long time since I thought about generating functions. You have made me get my battered copy of Cassela-Berger down off the shelf, again! ;-)

  6. You are correct: the difference between the two distributions is asymmetric. This is because one distribution is discrete and one is continuous.

    Imagine the density for the dice. Half of the probability is for rolls between 5 and 17, the other half between 18 and 30. So the CDF of the discrete dice distribution reaches 1/2 at 17, but the CDF of the continuous normal distribution reaches 1/2 at 17.5. Or think about the end. The CDF of the discrete distribution reaches 1 at 30, but the CDF of the normal is only 1 in the limit. So the discrete distribution is a little bit ahead of the continuous distribution.

  7. That makes sense, I should not have had the expectation of symmetry.
    Now that I think on it a bit longer, the asymmetry is small enough that I had real doubt, which also saying something about how well this simple approximation works.

  8. Hi John,

    I ran into your webpage and I have a question, I hope you can help.
    I am trying to find an equation that describe the probability of the following event:

    We have 10 cards (nmbered from 1 to 10) and these cards are shuffled and then leid out in random order. ie the first time we could get 3, 6, 5, 7, 8, 10, 9, 2, 1, 4 and and different order for the second time. Each card is then given a rank according to the order in which they appear (in the example above, card 3 is 1st).
    If one does that a number of times (eg 100), one can see that the average rank of each card has a mean of 5.5 and the distribution of these means is normal. The shape of the normal distribution gets narrower and narrowed the more events are carried out. There must be a function that describes this normal distrbution based on the number of cards and the number of shuffling events. I hope you can help, and /or maybe direct us to a reference (book or article) as a source of information.

  9. Here are the buzz words I think you need to find what you’re looking for: asymptotic distribution of order statistics. For example, if you look at the sample median, it does indeed become progressively more normal as the sample size n increases. In fact sqrt(n)*(sample median – mean) is asymptotically normal with mean 0 and variance 1 / (2 f(mean))^2 where f is the PDF of the distribution you’re sampling from. This comes from Casella and Berger’s book Statistical Inference, 2nd edition, page 484. Casella and Berger say to see Mathematical Statistics by J. Shao for a more general result.

  10. “It is probably worth mentioning here that all of the usual methods of generating Normal random variables are approximations to some degree.”

    The Box-Muller transformation of two uniform random variables actually gives a true normal distribution–provided you have a decent way of generating random numbers in the interval from 0 to 1. It’s a pretty elegant trick based on a polar transformation of a bivariate normal distribution: if X and Y are uniformly distributed random variables in [0,1], then

    Z = sqrt(-2*ln(X))*cos(2*pi*Y)

    is a standard normal random variable. The Mathworld article is here: And here is an online calculator that produces normally distributed random variables (with links to other random generators):

  11. A similar example I always thought was extremely clever was to add together 12 independent standard uniforms and subtract 6 from the sum.

    Why are the errors asymmetric about the mean?

Comments are closed.