Comments on: Rolling dice for normal samples

By: Irene Smith

Irene Smith — Wed, 23 Dec 2020 08:46:46 +0000

A similar example I always thought was extremely clever was to add together 12 independent standard uniforms and subtract 6 from the sum.

Why are the errors asymmetric about the mean?

By: Sean

Sean — Thu, 15 Sep 2011 20:24:22 +0000

“It is probably worth mentioning here that all of the usual methods of generating Normal random variables are approximations to some degree.”

The Box-Muller transformation of two uniform random variables actually gives a true normal distribution, provided you have a decent way of generating random numbers in the interval from 0 to 1. It’s a pretty elegant trick based on a polar transformation of a bivariate normal distribution: if X and Y are independent random variables uniformly distributed on [0,1], then

Z = sqrt(-2*ln(X))*cos(2*pi*Y)

is a standard normal random variable. The Mathworld article is here:
http://mathworld.wolfram.com/Box-MullerTransformation.html. And here is an online calculator that produces normally distributed random variables (with links to other random generators): http://www.had2know.com/academics/gaussian-normal-random-generator.html
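For anyone who wants to try the formula above, here is a minimal Python sketch. The names `box_muller` and `sample_mean_and_variance` are just illustrative, and it assumes the standard library's `random` module is a good enough uniform source. X is drawn from (0, 1] rather than [0, 1) so the logarithm never sees zero.

```python
import math
import random

def box_muller(rng=random):
    """Return one standard normal draw built from two independent uniforms."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and log() is never evaluated at zero.
    x = 1.0 - rng.random()
    y = rng.random()
    return math.sqrt(-2.0 * math.log(x)) * math.cos(2.0 * math.pi * y)

def sample_mean_and_variance(n, seed=0):
    """Draw n values and return their sample mean and sample variance."""
    rng = random.Random(seed)
    draws = [box_muller(rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var
```

With a large n the returned mean should sit near 0 and the variance near 1.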

By: John

John — Thu, 04 Nov 2010 10:04:53 +0000

In reply to Paolo. Here are the buzzwords I think you need to find what you're looking for: asymptotic distribution of order statistics. For example, if you look at the sample median, it does indeed become progressively more normal as the sample size n increases. In fact sqrt(n)*(sample median - population median) is asymptotically normal with mean 0 and variance 1 / (2 f(m))^2, where m is the population median and f is the PDF of the distribution you're sampling from. This comes from Casella and Berger's book Statistical Inference, 2nd edition, page 484. Casella and Berger say to see Mathematical Statistics by J. Shao for a more general result.
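A rough simulation can make the median result concrete. The sketch below is only a check under assumptions: it samples from a standard normal, where the mean and median coincide at 0 and the density there is 1/sqrt(2*pi), so the limiting variance 1 / (2 f)^2 works out to pi/2. The function names are made up for this example.

```python
import math
import random
import statistics

def scaled_median_variance(n, reps, seed=0):
    """Sample variance of sqrt(n) * median over `reps` standard normal samples of size n."""
    rng = random.Random(seed)
    scaled = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The population median is 0 here, so no centering term is needed.
        scaled.append(math.sqrt(n) * statistics.median(sample))
    return statistics.variance(scaled)

def asymptotic_variance(density_at_median):
    """Limiting variance 1 / (2 f)^2, with f evaluated at the population median."""
    return 1.0 / (2.0 * density_at_median) ** 2
```

Comparing `scaled_median_variance(201, 2000)` with `asymptotic_variance(1 / math.sqrt(2 * math.pi))` should show the two agreeing to within simulation noise.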

By: Paolo

Paolo — Thu, 04 Nov 2010 09:38:57 +0000

Hi John,

I ran into your webpage and I have a question; I hope you can help.
I am trying to find an equation that describes the probability of the following event:

We have 10 cards (numbered from 1 to 10), and these cards are shuffled and then laid out in random order, i.e. the first time we could get 3, 6, 5, 7, 8, 10, 9, 2, 1, 4, and a different order the second time. Each card is then given a rank according to the order in which it appears (in the example above, card 3 is 1st).
If one does that a number of times (e.g. 100), one can see that the average rank of each card has a mean of 5.5 and the distribution of these means is normal. The shape of the normal distribution gets narrower and narrower the more events are carried out. There must be a function that describes this normal distribution based on the number of cards and the number of shuffling events. I hope you can help and/or maybe direct us to a reference (book or article) as a source of information.
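One way to pin this down: in a single shuffle of n cards, the position of any given card is uniform on 1..n, with mean (n + 1)/2 and variance (n^2 - 1)/12. Averaging over m independent shuffles keeps the mean and divides the variance by m, so the standard deviation of the average rank is sqrt((n^2 - 1)/(12 m)), which is about 0.287 for 10 cards and 100 shuffles. The Python sketch below simulates the experiment; the function names are illustrative only. Note that the ten averages from one run are not independent of each other, since the ranks in every shuffle add up to the same total.

```python
import math
import random

def average_ranks(n_cards, n_shuffles, rng):
    """Average 1-based position of each card over repeated shuffles."""
    totals = [0] * n_cards
    deck = list(range(n_cards))
    for _ in range(n_shuffles):
        rng.shuffle(deck)
        for position, card in enumerate(deck, start=1):
            totals[card] += position
    return [t / n_shuffles for t in totals]

def theoretical_sd(n_cards, n_shuffles):
    """Standard deviation of one card's average rank: sqrt((n^2 - 1) / (12 m))."""
    return math.sqrt((n_cards ** 2 - 1) / (12 * n_shuffles))

# Example: ten cards, one hundred shuffles.
averages = average_ranks(10, 100, random.Random(1))
spread = theoretical_sd(10, 100)
```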

By: gb

gb — Mon, 02 Mar 2009 06:51:08 +0000

Oh, sorry, I see it’s already discussed. My apologies.

By: gb

gb — Mon, 02 Mar 2009 06:50:32 +0000

(maybe a continuity-correction issue?)

By: gb

gb — Mon, 02 Mar 2009 06:28:49 +0000

Why are the errors asymmetric about the mean?

By: EastwoodDC

EastwoodDC — Wed, 11 Feb 2009 23:36:41 +0000

That makes sense, I should not have had the expectation of symmetry.
Now that I think on it a bit longer, the asymmetry is small enough that I had real doubt, which also says something about how well this simple approximation works.

By: John

John — Wed, 11 Feb 2009 22:58:26 +0000

You are correct: the difference between the two distributions is asymmetric. This is because one distribution is discrete and one is continuous.

Imagine the density for the dice. Half of the probability is for rolls between 5 and 17, the other half between 18 and 30. So the CDF of the discrete dice distribution reaches 1/2 at 17, but the CDF of the continuous normal distribution reaches 1/2 at 17.5. Or think about the end. The CDF of the discrete distribution reaches 1 at 30, but the CDF of the normal is only 1 in the limit. So the discrete distribution is a little bit ahead of the continuous distribution.
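The gap between the two CDFs can be computed exactly. The Python sketch below assumes the comparison is against a normal distribution matched to the dice sum in mean (17.5) and variance (5 * 35/12); that choice, the function names, and the `offset` parameter are assumptions made for illustration. Setting `offset=0.5` evaluates the normal CDF half a unit to the right of each integer, which is the usual continuity correction.

```python
import math

def dice_sum_pmf(n_dice=5, sides=6):
    """Exact distribution of the sum of n fair dice, built by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / sides
        pmf = nxt
    return pmf

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def cdf_differences(n_dice=5, sides=6, offset=0.0):
    """Dice CDF at each total minus the matched normal CDF at total + offset."""
    pmf = dice_sum_pmf(n_dice, sides)
    mean = n_dice * (sides + 1) / 2
    sd = math.sqrt(n_dice * (sides ** 2 - 1) / 12)
    diffs = {}
    running = 0.0
    for total in sorted(pmf):
        running += pmf[total]
        diffs[total] = running - normal_cdf(total + offset, mean, sd)
    return diffs
```

At a total of 17 the dice CDF is 1/2 while the uncorrected normal CDF is still below 1/2, so the difference is positive there; with `offset=0.5` the difference at 17 vanishes.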

By: EastwoodDC

EastwoodDC — Wed, 11 Feb 2009 20:57:04 +0000

Is it my imagination, or does that distribution of the differences look asymmetric?
It’s been a long time since I thought about generating functions. You have made me get my battered copy of Casella-Berger down off the shelf, again! ;-)

By: John

John — Wed, 11 Feb 2009 00:09:01 +0000

In reply to John Venier. Here's an interesting empirical test of a random number generator: how many samples would it take before a goodness of fit test could reject the hypothesis that the samples came from the theoretical distribution? I tried this with the K-S test, but unfortunately the fact that the dice generator only has 26 possible values throws the test off: samples from a continuous distribution are not supposed to have repeats.

By: John Venier

John Venier — Wed, 11 Feb 2009 00:01:05 +0000

Exactly so! Also, when using electronics it is almost always the case that (pseudo-)random values from the standard uniform distribution are available even if for no other distribution. Depending on your application, having something fast and easy and almost normally distributed can often be good enough. It is probably worth mentioning here that all of the usual methods of generating Normal random variables are approximations to some degree.

Another point probably worth mentioning is that the values generated by this method cannot exceed 6 in absolute value, even in theory. But the theoretical probability of a Normal r.v. being outside of (-6, 6) is about 2 per thousand million. And having bounded values is not necessarily bad.
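The tail figure is easy to check with the complementary error function, since P(|Z| > z) = erfc(z / sqrt(2)) for a standard normal Z. A small Python sketch, with an illustrative function name:

```python
import math

def two_sided_normal_tail(z):
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2.0))

# Probability that a standard normal falls outside (-6, 6).
p_outside = two_sided_normal_tail(6.0)
```

The result is a little under 2e-9, in line with "about 2 per thousand million".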

By: John

John — Tue, 10 Feb 2009 19:39:37 +0000

Regarding John Venier’s comment above, note why 12 is special. Not just because it’s a large number. 20 is even bigger, for example, but would not work.

The variance of a uniform[0,1] random variable is 1/12. So adding 12 independent ones together makes the variance 1. The mean of the sum is 12 × 1/2 = 6, which is why 6 is subtracted. That’s why his trick produces an approximately standard normal sample.

By: John Venier

John Venier — Tue, 10 Feb 2009 19:01:07 +0000

A similar example I always thought was extremely clever was to add together 12 independent standard uniforms and subtract 6 from the sum. It does a great job of approximating a standard normal and requires no division.
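In Python the trick might look like the sketch below. The function name is illustrative, and the standard library's `random` module stands in for whatever uniform source is available.

```python
import random

def approx_standard_normal(rng=random):
    """Sum of 12 independent uniform(0, 1) draws, minus 6.

    The sum has mean 6 and variance 12 * (1/12) = 1, so the result has
    mean 0 and variance 1 and is approximately normal. It can never
    fall outside [-6, 6].
    """
    return sum(rng.random() for _ in range(12)) - 6.0

# Example: a reproducible batch of draws.
rng = random.Random(0)
draws = [approx_standard_normal(rng) for _ in range(1000)]
```

Only additions and one subtraction are involved, which is the appeal when division or transcendental functions are expensive.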