“Rolling dice for normal samples: Python version” is locked Rolling dice for normal samples: Python version

A handful of dice can make a decent normal random number generator, good enough for classroom demonstrations. I wrote about this a while ago.

My original post included Mathematica code for calculating how close to normal the distribution of the sum of the dice is. Here I’d like to redo the code in Python to show how to do the same calculations using SymPy. [Update: I’ll also give a solution that does not use SymPy and that scales much better.]

If you roll five dice and add up the spots, the probability of getting a sum of k is the coefficient of x^k in the expansion of

(x + x² + x³ + x⁴ + x⁵ + x⁶)⁵ / 6⁵.

Here’s code to find the probabilities by expanding the polynomial and taking coefficients.

    from sympy import Symbol

    sides = 6
    dice = 5
    rolls = range( dice*sides + 1 )

    # Tell SymPy that we want to use x as a symbol, not a number
    x = Symbol('x')

    # p(x) = (x + x^2 + ... + x^m)^n
    # where m = number of sides per die
    # and n = number of dice
    p = sum( x**i for i in range(1, sides + 1) )**dice

    # Extract the coefficients of p(x) and divide by sides**dice
    pmf = [sides**(-dice) * p.expand().coeff(x, i) for i in rolls]

If you’d like to compare the CDF of the dice sum to a normal CDF you could add this.

    from scipy import array, sqrt
    from scipy.stats import norm

    cdf = array(pmf).cumsum()

    # Normal CDF for comparison
    mean = 0.5*(sides + 1)*dice
    variance = dice*(sides**2 -1)/12.0
    temp = [norm.cdf(i, mean, sqrt(variance)) for i in roles]
    norm_cdf = array(temp)

    diff = abs(cdf - norm_cdf)
    # Print the maximum error and where it occurs
    print diff.max(), diff.argmax()

Question: Now suppose you want a better approximation to a normal distribution. Would it be better to increase the number of dice or the number of sides per dice? For example, would you be better off with 10 six-sided dice or 5 twelve-sided dice? Think about it before reading the solution.

Update: The SymPy code does not scale well. When I tried the code with 50 six-sided dice, it ran out of memory. Based on Andre’s comment, I rewrote the code using polypow. SymPy offers much more symbolic calculation functionality than NumPy, but in this case NumPy contains all we need. It is much faster and it doesn’t run out of memory.

    from numpy.polynomial.polynomial import polypow
    from numpy import ones

    sides = 6
    dice = 100

    # Create an array of polynomial coefficients for
    # x + x^2 + ... + x^sides
    p = ones(sides + 1)
    p[0] = 0

    # Extract the coefficients of p(x)**dice and divide by sides**dice
    pmf = sides**(-dice) * polypow(p, dice)
    cdf = pmf.cumsum()

That solution works for up to 398 dice. What’s up with that? With 399 dice, the largest polynomial coefficient overflows. If we divide by the number of dice before raising the polynomial to the power dice, the code becomes a little simpler and scales further.

    p = ones(sides + 1)
    p[0] = 0
    p /= sides
    pmf = polypow(p, dice)
    cdf = pmf.cumsum()

I tried this last approach on 10,000 dice with no problem.

3 thoughts on “Rolling dice for normal samples: Python version”

Andre Manoel

29 April 2013 at 09:12

Interesting!
Perhaps numpy.polynomial.polypow would be more practical in this situation, though? At least for me, since I’m not very used to Sympy :-)
Cem

29 April 2013 at 10:47

If M is the number of sides and N is the number of dice, I am inclined to say that increasing N would be better, at least for large N and M. That way, you are increasing the “exponent” in the number of outcomes, as opposed to “base”.

Also, CLT works for N going to infinity, while for M going to infinity and N=2, the pdf of the sum would be triangular shaped (due to the convolution of two uniform pdfs) rather than Gaussian-like.
Justin Patera

29 April 2013 at 21:53

…it’s things like this that show me I *NEED* more education!

I’ve written a (tiny & very experimental) library for doing some dice exploration & such, which lets you find the mean, variance, standard dev, and the raw probability of rolling a given number on nDx. It can be found at https://github.com/serialhex/dice – though unfortunately (for most people) it’s in Haskell. Anyway, I haven’t actually implemented a CDF for my dice exploder, but it shouldn’t be too hard. If you want to take a look at the code &| steal some ideas from it feel free (but beware: it’s ugly!!)

p.s. reading your blog helps push my boundaries… thank you!

Comments are closed.