I was running the NIST statistical test suite recently. I wanted an example of a random number generator where the tests failed, and so I used a simple generator, a linear congruence generator. But to my surprise, the generator passed nearly all the tests, even though some more sophisticated generators failed some of the same tests. This says more about the NIST test suite than it says about the LCG being tested.

This post will implement a couple of the simplest tests in Python and show that the generator does well.

The linear congruential generator used here starts with an arbitrary seed, then at each step produces a new number by multiplying the previous number by a constant and taking the remainder by 2^{31} – 1. The multiplier constant was chosen to be one of the multipliers recommended in [1].

We’ll need a couple math functions:

from math import sqrt, log

and we need to define the constants for our generator.

# Linear congruence generator (LCG) constants z = 20170705 # seed a = 742938285 # multiplier e = 31 # will need this later m = 2**e -1 # modulus

Next we form a long string of 0’s and 1’s using our generator

# Number of random numbers to generate N = 100000 # Format to print bits, padding with 0's on the left if needed formatstr = "0" + str(e) + "b" bit_string = "" for _ in range(N): z = a*z % m # LCG bit_string += format(z, formatstr)

Next we run a couple tests. First, we count the number of 1’s in our string of bits. We expect about half the bits to be 1’s. We can quantify “about” as within two standard deviations.

def count_ones(string): ones = 0 for i in range(len(string)): if string[i] == '1': ones += 1 return ones ones = count_ones(bit_string) expected = e*N/2 sd = sqrt(0.25*N) print( "Number of 1's: {}".format(ones) ) print( "Expected: {} to {}".format(expected - 2*sd, expected + 2*sd) )

The results are nothing unusual:

Number of 1's: 1550199 Expected: 1549683.8 to 1550316.2

Next we look at the length of the longest runs on 1’s. I’ve written before about the probability of long runs and the code below uses a couple results from that post.

def runs(string): max_run = 0 current_run = 0 for i in range(len(string)): if string[i] == '1': current_run += 1 else: current_run = 0 max_run = max(max_run, current_run) return max_run runlength = runs(bit_string) expected = -log(0.5*e*N)/log(0.5) sd = 1/log(2) print( "Run length: {}".format(runlength) ) print( "Expected: {} to {}".format(expected - 2*sd, expected + 2*sd) )

Again the results are nothing unusual:

Run length: 19 Expected: 17.7 to 23.4

Simple random number generators are adequate for many uses. Some applications, such as high dimensional integration and cryptography, require more sophisticated generators, but sometimes its convenient and sufficient to use something simple. For example, code using the LCG generator above would be easier to debug than code using the Mersenne Twister. The entire state of the LCG is a single number, whereas the Mersenne Twister maintains an internal state of 312 numbers.

One obvious limitation of the LCG used here is that it couldn’t possibly produce more than 2^{31} – 1 values before repeating itself. Since the state only depends on the last value, every time it comes to a given output, the next output will be whatever the next output was the previous time. In fact, [1] shows that it does produce 2^{31} – 1 values before cycling. If the multiplier were not chosen carefully it could have a shorter period.

So our LCG has a period of about two billion values. That’s a lot if you’re writing a little game, for example. But it’s not enough for many scientific applications.

* * *

[1] George S. Fishman and Louis R. Moore III, An exhaustive analysis of multiplicative congruential random number generators with modulus 2^{31} – 1, SIAM Journal of Scientific and Statistical Computing, Vol. 7, no. 1, January 1986.

How do you compute the standard deviation? Standard deviation of what, is what I’m really wondering?

A binomial random variable counts the number of successes out of n trials where the probability of success on each trial is p. The standard deviation of such a random variable is sqrt(np(1-p)).

In the post above, p = 0.5 and n = Ne.

I wonder how this random generator (x_{n-1},x_{n}) or (x_{n-2},x_{n-1},x_{n}) attractor would look like…

In 2001 and 2002 Michal Zalewski (lcamtuf) used the chaos theory technique described above to examine (non)-randomness of TCP/IP sequence numbers:

http://lcamtuf.coredump.cx/newtcp/

Consider 16 bits [0..32767], and the recurrence relation X[n] = (A*X[n-1]+c) mod m. The sequence must repeat after landing on a number previously generated, eg the number you start with. However… what if that happens BEFORE exhausting the whole ‘bit space’? Is the space divided into 2 sets, each forming such a cycle, eg odds vs evens?? I investigated this question with one of my students (~1972) and we found [SPOILER ALERT!] that half of the numbers in the range formed one group, and 1/4 of the remaining formed another group, 1/8 formed another group, … and one number transformed back to itself, i.e cycle length = 1! This subject is great for programming and/or math students to investigate. We couldn’t use an array to track what was going on (computer had only 8K or 16 words total!), so I think we represented each number as a bit in a bit string.