Predicting when an RNG will output a given value

A few days ago I wrote about how to pick the seed of a simple random number generator so that a desired output came n values later. The number n was fixed and we varied the seed. In this post, the seed will be fixed and we’ll solve for n. In other words, we ask when a pseudorandom sequence will produce a given value.

In principle you could just run the RNG until you get the output you’re looking for, but we’ll assume such a brute force approach is not feasible or at least not fast enough.

If an LCG (linear congruential generator) has seed z, multiplier a, and modulus m, then the nth output is aⁿz reduced mod m. So our task is to solve

x = aⁿz mod m

for n. If we forget for a moment that we’re working with integers mod m, we see that the solution is

n = logₐ(x / z)

We can actually make this work if we interpret division by z to mean multiplication by the inverse of z mod m and if we interpret the logarithm to be a discrete logarithm. For more on discrete logarithms and one algorithm for computing them, see this post.

In an earlier post I used a = 742938285 and m = 2³¹ – 1 = 2147483647. We set n = 100 and solved for z to make the 100th output equal to 20170816, the date of that post. It turned out that z = 1898888478.

Now let’s set the seed z = 1898888478 and ask when the LCG sequence will take on the value x = 20170816. Of course we know that n will turn out to be 100, but let’s pretend we don’t know that. We’ll write a little Python script to find n.

I expect there’s a simple way to compute modular inverses using SymPy, but I haven’t found it, so I used some code from StackOverflow.

The following code produces n = 100, as expected.

from sympy.ntheory import discrete_log

def egcd(a, b):
    # Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y = g = gcd(a, b)
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    # Modular inverse of a mod m via the extended Euclidean algorithm
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m

a = 742938285
z = 1898888478
m = 2**31 - 1
x = 20170816
zinv = modinv(z, m)

# Solve a**n = x*zinv (mod m) for n, i.e. take a discrete logarithm
n = discrete_log(m, x*zinv, a)
print(n)
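
As an aside, if you're on Python 3.8 or later, the built-in pow computes modular inverses directly (and I believe SymPy has a mod_inverse function as well), so the script could be shortened to something like this:

from sympy.ntheory import discrete_log

a = 742938285
z = 1898888478
m = 2**31 - 1
x = 20170816

# pow(z, -1, m) is the modular inverse of z mod m (Python 3.8+)
n = discrete_log(m, x * pow(z, -1, m), a)
print(n)   # 100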

Reverse engineering the seed of a linear congruential generator

The previous post gave an example of manipulating the seed of a random number generator to produce a desired result. This post will do something similar for a different generator.

A couple times I’ve used the following LCG (linear congruential random number generator) in examples. An LCG starts with an initial value of z and updates z at each step by multiplying by a constant a and taking the remainder by m. The particular LCG I’ve used has a = 742938285 and m = 2³¹ – 1 = 2147483647.

(I used these parameters because they have maximum range, i.e. every positive integer less than m appears eventually, and because it was one of the LCGs recommended in an article years ago. That is, it’s very good as far as 32-bit LCGs go, which isn’t very far. An earlier post shows how it quickly fails the PractRand test suite.)

Let’s pick the seed so that the 100th output of the generator is today’s date in ISO form: 20170816.

We need to solve

a¹⁰⁰ z = 20170816 mod 2147483647.

By reducing a¹⁰⁰ mod 2147483647, we find we need to solve

160159497 z = 20170816 mod 2147483647

and the solution is z = 1898888478. (See How to solve linear congruences.)
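
Here's a quick sketch of that calculation in Python, in case it's useful. It uses the three-argument pow for modular exponentiation; the modular inverse via pow(b, -1, m) requires Python 3.8 or later.

    a = 742938285
    m = 2**31 - 1
    x = 20170816

    b = pow(a, 100, m)             # a^100 reduced mod m: 160159497
    z = x * pow(b, -1, m) % m      # multiply by the modular inverse of b: 1898888478
    print(b, z)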

The following Python code verifies that the solution works.

    a = 742938285
    z = 1898888478
    m = 2**31 - 1

    for _ in range(100):
        z = a*z % m
    print(z)

Update: In this post, I kept n = 100 fixed and solved for the seed to give a specified output n steps later. In a follow up post I do the opposite, fixing the seed and solving for n.

Testing RNGs with PractRand

PractRand is a random number generator test suite, somewhat like the DIEHARDER and NIST tests I’ve written about before, but more demanding.

Rather than running to completion, it runs until a test fails with an infinitesimally small p-value. It runs all tests at a given sample size, then doubles the sample and runs the tests again.

32-bit generators

LCG

A while back I wrote about looking for an RNG that would fail the NIST test suite and being surprised that a simple LCG (linear congruential generator) did fairly well. PractRand, however, dismisses this generator with extreme prejudice:

RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x4a992b2c
test set = normal, folding = standard (32 bit)

rng=RNG_stdin32, seed=0x4a992b2c
length= 64 megabytes (2^26 bytes), time= 2.9 seconds
Test Name Raw Processed Evaluation
BCFN(2+0,13-3,T) R=+115128 p = 0 FAIL !!!!!!!!
BCFN(2+1,13-3,T) R=+105892 p = 0 FAIL !!!!!!!!
...
[Low1/32]FPF-14+6/16:(8,14-9) R= +25.8 p = 1.5e-16 FAIL
[Low1/32]FPF-14+6/16:(9,14-10) R= +15.5 p = 8.2e-9 very suspicious
[Low1/32]FPF-14+6/16:(10,14-11) R= +12.9 p = 1.8e-6 unusual
[Low1/32]FPF-14+6/16:all R=+380.4 p = 8.2e-337 FAIL !!!!!!!
[Low1/32]FPF-14+6/16:all2 R=+33954 p = 0 FAIL !!!!!!!!
[Low1/32]FPF-14+6/16:cross R= +33.6 p = 6.4e-25 FAIL !!
...and 9 test result(s) without anomalies

I don’t recall the last time I saw a p-value of exactly zero. Presumably the p-value was so small that it underflowed to zero.

MWC

Another 32 bit generator, George Marsaglia’s MWC generator, also fails, but not as spectacularly as LCG.

RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x187edf12
test set = normal, folding = standard (32 bit)

rng=RNG_stdin32, seed=0x187edf12
length= 64 megabytes (2^26 bytes), time= 2.9 seconds
Test Name Raw Processed Evaluation
DC6-9x1Bytes-1 R= +6.3 p = 2.2e-3 unusual
Gap-16:A R= +33.3 p = 1.6e-28 FAIL !!!
Gap-16:B R= +13.6 p = 7.9e-12 FAIL
...and 105 test result(s) without anomalies

64-bit generators

Next let’s see how some well-regarded 64-bit random number generators do. We’ll look at xorshift128+ and xoroshiro128+ by Sebastiano Vigna and David Blackman, the famous Mersenne Twister, and PCG by Melissa O’Neill.

The numbers generated by xoroshiro128+ and xorshift128+ are not random in the least significant bit and so the PractRand tests end quickly. The authors claim that xoroshiro128+ “passes the PractRand test suite up to (and included) 16TB, with the exception of binary rank tests.” I’ve only run PractRand with its default settings; I have not tried getting it to keep running the rest of the tests.

A lack of randomness in the least significant bits is inconsequential if you’re converting the outputs to floating point numbers, say as part of the process of generating Gaussian random values. For other uses, such as reducing the outputs modulo a small number, a lack of randomness in the least significant bits would be disastrous. (Here numerical analysis and number theory have opposite ideas regarding which parts of a number are most “significant.”)
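
For instance, a common way to turn a 64-bit output into a double in [0, 1), and the one I believe the xoroshiro authors recommend, keeps only the top 53 bits, so the weak low bits never affect the result. A sketch:

def to_double(x):
    # Keep the top 53 of the 64 bits; the low bits are discarded entirely.
    return (x >> 11) * 2.0**-53

By contrast, x % 2 is exactly the lowest bit, so any structure there shows up immediately.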

At the time of writing, both Mersenne Twister and PCG have gone through 256 GB of generated values and are still running. I intend to add more results tomorrow.

Update: Mersenne Twister failed a couple of tests with 512 GB of input. I stopped the tests after PCG passed 16 TB.

xoroshiro128+

RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xe15bf63c
test set = normal, folding = standard (64 bit)

rng=RNG_stdin64, seed=0xe15bf63c
length= 128 megabytes (2^27 bytes), time= 2.8 seconds
Test Name Raw Processed Evaluation
[Low1/64]BRank(12):256(2) R= +3748 p~= 3e-1129 FAIL !!!!!!!!
[Low1/64]BRank(12):384(1) R= +5405 p~= 3e-1628 FAIL !!!!!!!!
...and 146 test result(s) without anomalies

xorshift128+

RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0x8c7c5173
test set = normal, folding = standard (64 bit)

rng=RNG_stdin64, seed=0x8c7c5173
length= 128 megabytes (2^27 bytes), time= 2.8 seconds
Test Name Raw Processed Evaluation
[Low1/64]BRank(12):256(2) R= +3748 p~= 3e-1129 FAIL !!!!!!!!
[Low1/64]BRank(12):384(1) R= +5405 p~= 3e-1628 FAIL !!!!!!!!
...and 146 test result(s) without anomalies

Mersenne Twister

RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0x300dab94
test set = normal, folding = standard (64 bit)

rng=RNG_stdin64, seed=0x300dab94
length= 256 megabytes (2^28 bytes), time= 3.7 seconds
no anomalies in 159 test result(s)

rng=RNG_stdin64, seed=0x300dab94
length= 512 megabytes (2^29 bytes), time= 7.9 seconds
Test Name Raw Processed Evaluation
BCFN(2+2,13-2,T) R= -7.0 p =1-1.2e-3 unusual
[Low1/64]BCFN(2+2,13-6,T) R= -5.7 p =1-1.0e-3 unusual
...and 167 test result(s) without anomalies

...

rng=RNG_stdin64, seed=0x300dab94
length= 256 gigabytes (2^38 bytes), time= 3857 seconds
  no anomalies in 265 test result(s)

rng=RNG_stdin64, seed=0x300dab94
length= 512 gigabytes (2^39 bytes), time= 8142 seconds
 Test Name Raw Processed Evaluation
 BRank(12):24K(1) R=+99759 p~= 0 FAIL !!!!!!!!
 [Low16/64]BRank(12):16K(1) R= +1165 p~= 1.3e-351 FAIL !!!!!!!
 ...and 274 test result(s) without anomalies

PCG

RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0x82f88431
test set = normal, folding = standard (64 bit)

rng=RNG_stdin64, seed=0x82f88431
length= 128 megabytes (2^27 bytes), time= 2.0 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/64]DC6-9x1Bytes-1           R=  +6.6  p =  1.6e-3   unusual
  ...and 147 test result(s) without anomalies

rng=RNG_stdin64, seed=0x82f88431
length= 256 megabytes (2^28 bytes), time= 4.7 seconds
  no anomalies in 159 test result(s)

rng=RNG_stdin64, seed=0x82f88431
length= 512 megabytes (2^29 bytes), time= 9.5 seconds
  no anomalies in 169 test result(s)

...

rng=RNG_stdin64, seed=0x82f88431
length= 16 terabytes (2^44 bytes), time= 254943 seconds
  no anomalies in 329 test result(s)

Random walk on quaternions

The previous post was a riff on a tweet asking what you’d get if you extracted all the i‘s, j‘s, and k‘s from Finnegans Wake and multiplied them as quaternions. This post is a probabilistic variation on the previous one.

If you randomly select a piece of English prose, extract the i‘s, j‘s, and k‘s, and multiply them together as quaternions, what are you likely to get?

The probability that a letter in this sequence is an i is about 91.5%. There’s a 6.5% chance it’s a k, and a 2% chance it’s a j. (Derived from here.) We’ll assume the probabilities of each letter appearing next are independent.

You could think of the process of multiplying all the i‘s, j‘s, and k‘s together as a random walk on the unit quaternions, an example of a Markov chain. Start at 1. At each step, multiply your current state by i with probability 0.915, by j with probability 0.02, and by k with probability 0.065.

After the first step, you’re most likely at i. You could be at j or k, but nowhere else. After the second step, you’re most likely at -1, though you could be anywhere except at 1. For the first several steps you’re much more likely to be at some places than others. But after 50 steps, you’re about equally likely to be at any of the eight possible values.
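
Here's a quick simulation sketch (my code, nothing rigorous) that estimates the distribution after 50 steps, representing elements of the group as a sign and a basis letter and multiplying with a lookup table:

from random import random
from collections import Counter

# Multiplication table for the quaternion group Q = {±1, ±i, ±j, ±k}.
# Elements are stored as (sign, basis); the table gives the product of two basis letters.
mult = {
    ('1','1'): (1,'1'),  ('1','i'): (1,'i'),  ('1','j'): (1,'j'),  ('1','k'): (1,'k'),
    ('i','1'): (1,'i'),  ('i','i'): (-1,'1'), ('i','j'): (1,'k'),  ('i','k'): (-1,'j'),
    ('j','1'): (1,'j'),  ('j','i'): (-1,'k'), ('j','j'): (-1,'1'), ('j','k'): (1,'i'),
    ('k','1'): (1,'k'),  ('k','i'): (1,'j'),  ('k','j'): (-1,'i'), ('k','k'): (-1,'1'),
}

def walk(steps):
    # Start at 1 and repeatedly right-multiply by i, j, or k
    # with probabilities 0.915, 0.02, and 0.065.
    sign, basis = 1, '1'
    for _ in range(steps):
        r = random()
        step = 'i' if r < 0.915 else ('j' if r < 0.935 else 'k')
        s, basis = mult[(basis, step)]
        sign *= s
    return ('-' if sign < 0 else '') + basis

counts = Counter(walk(50) for _ in range(10000))
for state in ['1', '-1', 'i', '-i', 'j', '-j', 'k', '-k']:
    print(state, counts[state] / 10000)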

Wolfram Alpha, Finnegans Wake, and Quaternions

James Joyce

I stumbled on a Twitter account yesterday called Wolfram|Alpha Can’t. It posts bizarre queries that Wolfram Alpha can’t answer. Here’s one that caught my eye.

Suppose you did extract all the i‘s, j‘s, and k‘s from James Joyce’s novel Finnegans Wake. How would you answer the question above?

You could initialize an accumulator to 1 and then march through the list, updating the accumulator by multiplying it by the next element. But is there a more efficient way?

Quaternion multiplication is not commutative, i.e. the order in which you multiply things matters. So it would not be enough to have a count of how many times each letter appears. Is there any sort of useful summary of the data short of carrying out the whole multiplication? In other words, could you scan the list while doing something other than quaternion multiplication, something faster to compute? Something analogous to sufficient statistics.

We’re carrying out multiplications in the group Q of unit quaternions, a group with eight elements: ±1, ±i, ±j, ±k. But the input to the question about Finnegans Wake only involves three of these elements. Could that be exploited for some slight efficiency?

How would you best implement quaternion multiplication? Of course the answer depends on your environment and what you mean by “best.”

Note that we don’t actually need to implement quaternion multiplication in general, though that would be sufficient. All we really need is multiplication in the group Q.

You could implement multiplication by a table lookup. You could use an 8 × 3 table; the left side of our multiplication could be anything in Q, but the right side can only be i, j, or k. You could represent quaternions as a list of four numbers—coefficients of 1, i, j, and k—and write rules for multiplying these. You could also represent quaternions as real 4 × 4 matrices or as complex 2 × 2 matrices.
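
Here's a minimal sketch of the coefficient-list approach, just to make the idea concrete. The function below implements the general Hamilton product, which is more than this problem strictly needs, and the letter sequence at the end is made up for illustration.

from functools import reduce

def qmult(p, q):
    # Hamilton product of quaternions represented as (a, b, c, d) = a + bi + cj + dk
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# March through a list of letters with an accumulator, as described above.
letters = [i, j, k, i, i]
print(reduce(qmult, letters, (1, 0, 0, 0)))   # (1, 0, 0, 0), i.e. the product is 1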

If you have an interesting solution, please share it in a comment below. It could be interesting by any one of several criteria: fast, short, cryptic, amusing, etc.


The cross polytope

There are five regular solids in three dimensions:

  • tetrahedron
  • octahedron (pictured above)
  • hexahedron (cube)
  • dodecahedron
  • icosahedron.

I give a proof here that these are the only five.

The first three of these regular solids generalize to all dimensions, and these generalizations are the only regular solids in dimensions 5 and higher. (There are six regular solids in dimension 4.)

I’ve mentioned generalizations of the cube, the hypercube, lately. I suppose you could call the generalization of an octahedron a “hyperoctahedron” by analogy with the hypercube, though I’ve never heard anybody use that term. Instead, the most common name is cross polytope.

This post will focus on the cross polytope. In particular, we’re going to look at the relative volume of a ball inside a cross polytope.

The cross polytope in n dimensions is the convex hull of all n-dimensional vectors that are ±1 in one coordinate and 0 in all the rest. It is the “plus or minus” part that gives the cross polytope its name, i.e. the vertices are in pairs across the origin.

In analysis, the cross polytope is the unit ball in ℓ₁ (“little ell one”), the set of points (x₁, x₂, …, xₙ) such that

|x₁| + |x₂| + … + |xₙ| ≤ 1.

The ℓ₁ norm, and hence the ℓ₁ ball, comes up frequently in compressed sensing and in sparse regression.

In recent blog posts we’ve looked at how the relative volume of a ball inscribed in a hypercube drops quickly as dimension increases. What about the cross polytope? The relative volume of a ball inscribed in a cross polytope decreases rapidly with dimension as well. But does it decrease faster or slower than the relative volume of a ball inscribed in a hypercube? To answer this, we need to compute

\left.\frac{\mbox{vol ball in cross poly}}{\mbox{vol cross poly}}\middle/\frac{\mbox{vol ball in hypercube}}{\mbox{vol hypercube}}\right.

Let’s gather what we need to evaluate this. We need the volume of a ball of radius r in n dimensions, and as mentioned before this is

V = \frac{\pi^{\frac{n}{2}} r^n}{\Gamma\left(\frac{n}{2} + 1\right)}

A ball sitting inside an n-dimensional unit cross polytope will have radius 1/√n. This is because if n positive numbers sum to 1, the sum of their squares is minimized by making them all equal, and the point (1/n, 1/n, …, 1/n) has norm 1/√n. A ball inside a unit hypercube will have radius 1/2.

The cross polytope has volume 2ⁿ/n! and the hypercube has volume 1.

Putting this all together, the relative volume of a ball in a cross polytope divided by the relative volume of a ball inside a hypercube is

\left. \frac{ \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)} \left(\frac{1}{\sqrt{n}}\right)^n } { \frac{2^n}{n!} } \middle/ \frac{ \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)} \left(\frac{1}{2}\right)^n } { 1 } \right.

which fortunately reduces to just

\frac{n!}{n^{n/2}}

But how do we compare n! and n^{n/2}? That’s a job for Stirling’s approximation. It tells us that for large n, the ratio is approximately

\sqrt{2\pi n}\, n^{n/2}e^{-n}

and so the ratio diverges for large n, i.e. the ball in the cross polytope takes up increasingly more relative volume.

Looking back at just the relative volume of the ball inside the cross polytope, and applying Stirling’s approximation again, we see that the relative volume of the ball inside the cross polytope is approximately

\sqrt{2}\left( \frac{\pi}{2e} \right )^{n/2}

and so the relative volume decreases geometrically as n increases, decreasing much more slowly than the relative volume of a ball in a hypercube.
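
Here's a quick numerical check of the algebra above (my own sketch). It computes the two relative volumes directly, the exact ratio n!/n^{n/2}, and the Stirling approximation √2 (π/(2e))^{n/2}:

from math import factorial, sqrt, pi, e, gamma

def ball_volume(r, n):
    # Volume of an n-dimensional ball of radius r
    return pi**(n/2) * r**n / gamma(n/2 + 1)

for n in range(2, 21, 2):
    rel_cross = ball_volume(1/sqrt(n), n) / (2**n / factorial(n))  # ball / cross polytope
    rel_cube  = ball_volume(0.5, n)                                # ball / unit hypercube
    print(n, rel_cross, sqrt(2)*(pi/(2*e))**(n/2), rel_cross/rel_cube, factorial(n)/n**(n/2))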

Sphere packing

The previous couple blog posts touched on a special case of sphere packing.

We looked at the proportion of volume contained near the corners of a hypercube. If you take the set of points within a distance 1/2 of a corner of a hypercube, you could rearrange these points to form a full ball centered at one corner of the hypercube. Saying that not much volume is located near the corners is equivalent to saying that the sphere packing that centers spheres at points with integer coordinates is not very dense.

We also looked at centering balls inside hypercubes. This is the same sphere packing as above, just shifting every coordinate by 1/2. So saying that a ball in a box doesn’t take up much volume in high dimensions is another way of saying that the integer lattice sphere packing is not very dense.

How much better can we pack spheres? In 24 dimensions, balls centered inside hypercubes would have density equal to the volume of a ball of radius 1/2, or π¹²/(2²⁴ · 12!). The most dense packing in 24 dimensions, the Leech lattice sphere packing, has a density of π¹²/12!, i.e. it is 2²⁴ ≈ 16.8 million times more efficient.

The densest sphere packings have only been proven in dimensions 1, 2, 3, 8, and 24. (The densest regular (lattice) packings are known for dimensions up to 8, but it is conceivable that there exist irregular packings that are more efficient than the most efficient lattice packing.) Dimension 24 is special in numerous ways, and it appears that 24 is a local maximum as far as optimal sphere packing density goes. How does sphere packing based on an integer lattice compare to the best packing in other high dimensions?

Although optimal packings are not known in high dimensions, upper and lower bounds on packing density are known. If Δ is the optimal sphere packing density in dimension n, then we have the following upper and lower bounds for large n:

-1 \leq \frac{1}{n} \log_2 \Delta \leq -0.599

The following plot shows how the integer lattice packing density (solid line) compares to the upper and lower bounds (dashed lines).
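
I haven't seen the code behind the original plot, but here's a rough sketch of how one might reproduce it, plotting (1/n) log₂ of the integer lattice packing density, which is just the volume of a ball of radius 1/2, against the two constant bounds:

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gammaln

n = np.arange(2, 101)
# log2 of the volume of a ball of radius 1/2 in n dimensions
log2_density = (n/2)*np.log2(np.pi) - n - gammaln(n/2 + 1)/np.log(2)

plt.plot(n, log2_density/n, label="integer lattice packing")
plt.axhline(-0.599, linestyle="--", label="upper bound")
plt.axhline(-1, linestyle="--", label="lower bound")
plt.xlabel("dimension $n$")
plt.ylabel(r"$\frac{1}{n} \log_2 \Delta$")
plt.legend()
plt.show()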

The upper and lower bounds come from Sphere Packings, Lattices, and Groups, published in 1998. Perhaps tighter bounds have been found since then.

Is most volume in the corners or not?

I’ve written a couple blog posts that may seem to contradict each other. Given a high-dimensional cube, is most of the volume in the corners or not?

I recently wrote that the corners of a cube stick out more in high dimensions. You can quantify this by centering a ball at a corner and looking at how much of the ball comes from the cube and how much from surrounding space. That post showed that the proportion of volume near a corner goes down rapidly as dimension increases.

About a year ago I wrote a blog post about how formal methods let you explore corner cases. Along the way I said that most cases are corner cases, i.e. most of the volume is in the corners.

Both posts are correct, but they use the term “corner” differently. That is because there are two ideas of “corner” that are the same in low dimensions but diverge in higher dimensions.

Draw a circle and then draw a square just big enough to contain it. You could say that the area in the middle is the area inside the circle and the corners are everything else. Or you could say that the corners are the regions near a vertex of the square, and the middle is everything else. These two criteria aren’t that different. But in high dimensions they’re vastly different.

The post about pointy corners looked at the proportion of volume near the vertices of the cube. The post about formal methods looked at the proportion of volume not contained in a ball in the middle of the cube. As dimension increases, the former goes to zero and the latter goes to one.

In other words, in high dimensions most of the volume is neither near a vertex nor in a ball in the middle. This gives a hint at why sphere packing is interesting in high dimensions. The next post looks at how the sphere packings implicit in this post compare to the best possible packings.

Corners stick out more in high dimensions

High-dimensional geometry is full of surprises. For example, nearly all the area of a high-dimensional sphere is near the equator, and by symmetry it doesn’t matter which equator you take.

Here’s another surprise: corners stick out more in high dimensions. Hypercubes, for example, become pointier as dimension increases.

How might we quantify this? Think of a pyramid and a flag pole. If you imagine a ball centered at the top of a pyramid, a fair proportion of the volume of the ball contains part of the pyramid. But if you do the same for a flag pole, only a small proportion of the ball contains the pole; nearly all the volume of the ball is air.

So one way to quantify how pointy a corner is would be to look at a neighborhood of the corner and measure how much of the neighborhood intersects the solid that the corner is part of. The less volume, the pointier the corner.

Consider a unit square. Put a disk of radius r at a corner, with r ≤ 1/2 so that disks at different corners don’t overlap. One quarter of that disk will be inside the square. So the proportion of the square near a particular corner is πr²/4, and the proportion of the square near any corner is πr².

Now do the analogous exercise for a unit cube. Look at a ball of radius r ≤ 1/2 centered at a corner. One eighth of the volume of that ball contains part of the cube. The proportion of the cube’s volume located within a distance r of a particular corner is πr³/6, and the proportion located within a distance r of any corner is 4πr³/3.

79% of a square is within a distance 0.5 of a corner, while the proportion is only 52% for a cube. In that sense, the corners of a cube stick out a little more than the corners of a square.

Now let’s look at a hypercube of dimension n. Let V be the volume of an n-dimensional ball of radius r ≤ 1/2. The proportion of the hypercube’s volume located within a distance r of a particular corner is V/2ⁿ and the proportion located within a distance r of any corner is simply V.

The equation for the volume V is

V = \frac{\pi^{\frac{n}{2}} r^n}{\Gamma\left(\frac{n}{2} + 1\right)}

If we fix r and let n vary, this function decreases rapidly as n increases.

Saying that corners stick out more in high dimensions is a corollary of the more widely known fact that a ball in a box takes up less and less volume as the dimension of the ball and the box increase.

Let’s set r = 1/2 and plot how the volume of a ball varies with dimension n.

Volume of a ball of radius 1/2 in dimension n

You could think of this as the volume of a ball sitting inside a unit hypercube, or more relevant to the topic of this post, the proportion of the volume of the hypercube located with a distance 1/2 of a corner.
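
Here's a short script along those lines (mine, not from the original post). It checks the square and cube numbers from above and then plots the volume of a ball of radius 1/2 as the dimension grows:

from math import pi, gamma
import matplotlib.pyplot as plt

def ball_volume(r, n):
    # Volume of an n-dimensional ball of radius r
    return pi**(n/2) * r**n / gamma(n/2 + 1)

print(ball_volume(0.5, 2))   # pi/4, about 0.785
print(ball_volume(0.5, 3))   # pi/6, about 0.524

dims = range(1, 21)
plt.plot(dims, [ball_volume(0.5, n) for n in dims], "o-")
plt.xlabel("dimension $n$")
plt.ylabel("volume of ball of radius 1/2")
plt.show()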

Discrete example of concentration of measure

The previous post looked at a continuous example of concentration of measure. As you move away from a thin band around the equator, the remaining area in the rest of the sphere decreases as an exponential function of the dimension and the distance from the equator. This post will show a very similar result for discrete sequences.

Suppose you have a sequence X₁, X₂, …, Xₙ of n random variables that each take on the values {-1, 1} with equal probability. You could think of this as a random walk: you start at 0 and take a sequence of steps either to the left or the right.

Let Sₙ = X₁ + X₂ + … + Xₙ be the sum of the sequence. The expected value of Sₙ is 0 by symmetry, but it could be as large as n or as small as –n. We want to look at how large |Sₙ| is likely to be relative to the sequence length n.

Here’s the analytical bound:

P\left( \frac{|S_n|}{n} \geq r\right) \leq 2 \exp\left( - \frac{nr^2}{2}\right )


Here is a simulation in Python to illustrate the bound.

    from random import randint
    import numpy as np
    import matplotlib.pyplot as plt

    n = 400  # random walk length
    N = 10000 # number of repeated walks

    reps = np.empty(N)

    for i in range(N):
        random_walk = [2*randint(0, 1) - 1 for _ in range(n)]
        reps[i] = abs(sum(random_walk)) / n
    
    plt.hist(reps, bins=41)
    plt.xlabel("$|S_n|/n$")
    plt.show()

And here are the results.
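
As a small extra check (an addition of mine, reusing the variables from the script above), you can compare the empirical tail probability with the bound for a particular r, say r = 0.1. With n = 400 the bound is 2 exp(−2) ≈ 0.27, and the simulated proportion should come out well under that.

    r = 0.1
    empirical = np.mean(reps >= r)    # fraction of walks with |S_n|/n >= r
    bound = 2*np.exp(-n*r**2/2)       # the bound above, about 0.27 here
    print(empirical, bound)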

Nearly all the area in a high-dimensional sphere is near the equator

Nearly all the area of a high-dimensional sphere is near the equator.  And by symmetry, it doesn’t matter which equator you take. Draw any great circle and nearly all of the area will be near that circle.  This is the canonical example of “concentration of measure.”

What exactly do we mean by “nearly all the area” and “near the equator”? You get to decide. Pick your standard of “nearly all the area,” say 99%, and your definition of “near the equator,” say within 5 degrees. Then it’s always possible to take the dimension high enough that your standards are met. The more demanding your standard, the higher the dimension will need to be, but it’s always possible to pick the dimension high enough.

This result is hard to imagine. Maybe a simulation will help make it more believable.

In the simulation below, we take as our “north pole” the point (1, 0, 0, 0, …, 0). We could pick any unit vector, but this choice is convenient. Our equator is the set of points orthogonal to the pole, i.e. that have first coordinate equal to zero. We draw points randomly from the sphere, compute their latitude (i.e. angle from the equator), and make a histogram of the results.

The area of our planet isn’t particularly concentrated near the equator.

But as we increase the dimension, we see more and more of the simulation points are near the equator.

Here’s the code that produced the graphs.

from scipy.stats import norm
from math import sqrt, pi, acos, degrees
import matplotlib.pyplot as plt

def pt_on_sphere(n):
    # Return random point on unit sphere in R^n.
    # Generate n standard normals and normalize length.
    x = norm.rvs(0, 1, n)
    length = sqrt(sum(x**2))
    return x/length

def latitude(x):
    # Latitude relative to plane with first coordinate zero.
    angle_to_pole = acos(x[0]) # in radians
    latitude_from_equator = 0.5*pi - angle_to_pole
    return degrees( latitude_from_equator )

N = 1000 # number of samples

for n in [3, 30, 300, 3000]: # dimension of R^n
    
    latitudes = [latitude(pt_on_sphere(n)) for _ in range(N)]
    plt.hist(latitudes, bins=int(sqrt(N)))
    plt.xlabel("Latitude in degrees from equator")
    plt.title("Sphere in dimension {}".format(n))
    plt.xlim((-90, 90))
    plt.show()

Not only is most of the area near the equator, the amount of area outside a band around the equator decreases very rapidly as you move away from the band. You can see that from the histograms above. They look like a normal (Gaussian) distribution, and in fact we can make that more precise.

If A is a band around the equator containing at least half the area, then the proportion of the area a distance r or greater from A is bounded by exp( -(n-1)r² ). And in fact, this holds for any set A containing at least half the area; it doesn’t have to be a band around the equator, just any set of large measure.

Related post: Willie Sutton and the multivariate normal distribution

DIEHARDER random number generator test results for PCG and MWC

A few days ago I wrote about testing the PCG random number generator using the DIEHARDER test suite. In this post I’ll go into a little more background on this random number generator test suite. I’ll also show that like M. E. O’Neill’s PCG (“permuted congruential generator”), George Marsaglia’s MWC (“multiply with carry”) generator does quite well.

This is not to say that MWC is the best generator for every purpose, but any shortcomings of MWC are not apparent from DIEHARDER. The PCG family of generators, for example, is apparently superior to MWC, but you couldn’t necessarily conclude that from these tests.

Unless your application demands more of a random number generator than these tests do, MWC is probably adequate for your application. I wouldn’t recommend it for cryptography or for high-dimensional integration by darts, but it would be fine for many common applications.

DIEHARDER test suite

George Marsaglia developed the DIEHARD battery of tests in 1995. Physics professor Robert G. Brown later refined and extended Marsaglia’s original test suite to create the DIEHARDER suite. (The name of Marsaglia’s battery of tests was a pun on the Diehard car battery. Brown continued the pun tradition by naming his suite after Die Harder, the sequel to the movie Die Hard.) The DIEHARDER suite is open source. It is designed to be at least as rigorous as the original DIEHARD suite and in some cases more rigorous.

There are 31 distinct kinds of tests in the DIEHARDER suite, but some of these are run multiple times. In total, 114 tests are run.

Marsaglia’s MWC

The main strength of Marsaglia’s MWC algorithm is that it is very short. The heart of the code is only three lines:

    // m_z and m_w are the two 32-bit halves of the generator's state
    m_z = 36969 * (m_z & 65535) + (m_z >> 16);
    m_w = 18000 * (m_w & 65535) + (m_w >> 16);
    return (m_z << 16) + m_w;

You can find a full implementation of a random number generator class based on MWC here.

The heart of PCG is also very short, but a bit more mysterious.

    uint64_t oldstate = rng->state;   // save the state before advancing it
    rng->state = oldstate * 6364136223846793005ULL + (rng->inc | 1);
    uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
    uint32_t rot = oldstate >> 59u;
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));

(These are the 64-bit state versions of MWC and PCG. Both have versions based on larger state.)

Because these generators require little code, they’d be relatively easy to step into with a debugger, compared to other RNGs such as Mersenne Twister that require more code and more state.

Test results

Out of the 114 DIEHARDER tests run on MWC, all but three returned a pass, and those three returned a weak pass.

A few weak passes are to be expected. The difference between pass, weak pass, and failure is whether a p-value falls below a certain threshold. Theory says that ideally p-values would be uniformly distributed, and so one would fall outside the region for a strong pass now and then.

Rather than counting strong and weak passes, let’s look at the p-values themselves. We’d expect these to be uniformly distributed. Let’s see if they are.
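
Here's roughly the sort of script that could pull the p-value column out of a DIEHARDER report and histogram it, assuming the usual pipe-delimited report format; the file name is just a placeholder.

import matplotlib.pyplot as plt

def pvalues(report_file):
    # Pull the p-value column out of a pipe-delimited DIEHARDER report.
    ps = []
    with open(report_file) as f:
        for line in f:
            fields = line.split("|")
            if len(fields) == 6 and fields[4].strip().replace(".", "").isdigit():
                ps.append(float(fields[4]))
    return ps

plt.hist(pvalues("mwc_dieharder.txt"), bins=20)   # placeholder file name
plt.xlabel("p-value")
plt.show()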

Here are the p-values reported by the DIEHARDER tests for MWC:

Histogram of p-values for MWC

Here are the corresponding values for PCG:

Histogram of p-values for PCG

Neither set of test results has more small p-values than we’d expect. This is normally what we’re concerned about. Too many small p-values would indicate that the generated samples are showing behavior that would be rare for truly random input.

But both sets of test results have a surprising number of large p-values. Not sure what to make of that. I suspect it says more about the DIEHARDER test suite than the random number generators being tested.

Update: I went back to look at some results from Mersenne Twister to see if this pattern of large p-values persists there. It does, and in fact the p-values are even more biased toward the high end for Mersenne Twister.

Histogram of Mersenne Twister p-values

One thing I noticed is that the large p-values are disproportionately coming from some of the same tests each time. In particular, the repetitions of the sts_serial test have an unexpectedly high number of large p-values for each generator.

The chaos game and the Sierpinski triangle

The chaos game is played as follows. Pick a starting point at random. Then at each subsequent step, pick a triangle vertex at random and move half way from the current position to that vertex.

The result looks like a fractal called the Sierpinski triangle or Sierpinski gasket.

Here’s an example:

Unbiased chaos game results

If the random number generation is biased, the resulting triangle will show it. In the image below, the lower left corner was chosen with probability 1/2, the top with probability 1/3, and the right corner with probability 1/6.

Biased chaos game results

Update: Here’s an animated version that lets you watch the process in action.

animated gif

Here’s Python code to play the chaos game yourself.

from numpy import sqrt, zeros
import matplotlib.pyplot as plt
from random import random, randint

def midpoint(p, q):
    return (0.5*(p[0] + q[0]), 0.5*(p[1] + q[1]))

# Three corners of an equilateral triangle
corner = [(0, 0), (0.5, sqrt(3)/2), (1, 0)]

N = 1000
x = zeros(N)
y = zeros(N)

x[0] = random()
y[0] = random()
for i in range(1, N):
    k = randint(0, 2) # random triangle vertex
    x[i], y[i] = midpoint( corner[k], (x[i-1], y[i-1]) )
    
plt.scatter(x, y)
plt.show()
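
To produce the biased version shown earlier, you could swap out the uniform vertex choice for a weighted one. For example, with random.choices (Python 3.6 and later), weights proportional to 3, 2, 1 give the probabilities 1/2, 1/3, 1/6 used above:

from random import choices

# Inside the loop, replace k = randint(0, 2) with a weighted choice:
k = choices([0, 1, 2], weights=[3, 2, 1])[0]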


Update 2: Peter Norvig posted some Python code with variations on the game presented here, generalizing a triangle to other shapes. If you try the analogous procedure with a square, you simply get a square filled with random dots.

However, you can get what you might expect, the square analog of the Sierpinski triangle, the product of a Cantor set with itself, if you make a couple modifications. First, pick a side at random, not a corner. Second, move 1/3 of the way toward the chosen side, not 1/2 way.

Here’s what I got with these changes:

Chaos game for a square

Source: Chaos and Fractals

Testing the PCG random number generator

M. E. O’Neill’s PCG family of random number generators looks very promising. It appears to have excellent statistical and cryptographic properties. And it takes remarkably little code to implement. (PCG stands for Permuted Congruential Generator.)

The journal article announcing PCG gives the results of testing it with the TestU01 test suite. I wanted to try it out by testing it with the DIEHARDER test suite (Robert G. Brown’s extension of George Marsaglia’s DIEHARD test suite) and the NIST Statistical Test Suite. I used what the generator’s website calls the “minimal C implementation.”

The preprint of the journal article is dated 2015 but apparently hasn’t been published yet.

Update: See the very informative note by the author of PCG in the comments below.

For the NIST test suite, I generated 10,000,000 bits and divided them into 10 streams.

For the DIEHARDER test suite, I generated 800,000,000 unsigned 32-bit integers. (DIEHARDER requires a lot of random numbers as input.)

For both test suites I used the seed (state) 20170707105851 and sequence constant (inc) 42.

The PCG generator did well on all the NIST tests. For every test, at least 9 out of 10 streams passed. The test authors say you should expect at least 8 out of 10 streams to pass.

Here’s an excerpt from the results. You can find the full results here.

--------------------------------------------------------
 C1  C2  C3 ...  C10  P-VALUE  PROPORTION  STATISTICAL TEST
--------------------------------------------------------
  2   0   2        0  0.213309     10/10   Frequency
  0   0   1        3  0.534146     10/10   BlockFrequency
  3   0   0        0  0.350485     10/10   CumulativeSums
  1   1   0        2  0.350485     10/10   CumulativeSums
  0   2   2        1  0.911413     10/10   Runs
  0   0   1        1  0.534146     10/10   LongestRun
  0   1   2        0  0.739918     10/10   Rank
  0   4   0        0  0.122325     10/10   FFT
  1   0   0        1  0.000439     10/10   NonOverlappingTemplate
  ...              
  2   1   0        0  0.350485      9/10   NonOverlappingTemplate
  0   2   1        0  0.739918     10/10   OverlappingTemplate
  1   1   0        2  0.911413     10/10   Universal
  1   1   0        0  0.017912     10/10   ApproximateEntropy
  1   0   1        1     ----       3/4    RandomExcursions
  ...              
  0   0   0        1     ----       4/4    RandomExcursions
  2   0   0        0     ----       4/4    RandomExcursionsVariant
  ...              
  0   0   3        0     ----       4/4    RandomExcursionsVariant
  1   2   3        0  0.350485      9/10   Serial
  1   1   1        0  0.739918     10/10   Serial
  1   2   0        0  0.911413     10/10   LinearComplexity

...

The DIEHARDER suite has 31 kinds of tests, some of which are run many times, making a total of 114 tests. Out of the 114 tests, two returned a weak pass for the PCG input and all the rest passed. A few weak passes are to be expected from running so many tests and so this isn’t a strike against the generator. In fact, it might be suspicious if no tests returned a weak pass.

Here’s an edited version of the results. The full results are here.

#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.46682782|  PASSED
      diehard_operm5|   0|   1000000|     100|0.83602120|  PASSED
  diehard_rank_32x32|   0|     40000|     100|0.11092547|  PASSED
    diehard_rank_6x8|   0|    100000|     100|0.78938803|  PASSED
   diehard_bitstream|   0|   2097152|     100|0.81624396|  PASSED
        diehard_opso|   0|   2097152|     100|0.95589325|  PASSED
        diehard_oqso|   0|   2097152|     100|0.86171368|  PASSED
         diehard_dna|   0|   2097152|     100|0.24812341|  PASSED
diehard_count_1s_str|   0|    256000|     100|0.75417270|  PASSED
diehard_count_1s_byt|   0|    256000|     100|0.25725000|  PASSED
 diehard_parking_lot|   0|     12000|     100|0.59288414|  PASSED
    diehard_2dsphere|   2|      8000|     100|0.79652706|  PASSED
    diehard_3dsphere|   3|      4000|     100|0.14978100|  PASSED
     diehard_squeeze|   0|    100000|     100|0.35356584|  PASSED
        diehard_sums|   0|       100|     100|0.04522121|  PASSED
        diehard_runs|   0|    100000|     100|0.39739835|  PASSED
        diehard_runs|   0|    100000|     100|0.99128296|  PASSED
       diehard_craps|   0|    200000|     100|0.64934221|  PASSED
       diehard_craps|   0|    200000|     100|0.27352733|  PASSED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.10570816|  PASSED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.00267789|   WEAK
         sts_monobit|   1|    100000|     100|0.98166534|  PASSED
            sts_runs|   2|    100000|     100|0.05017630|  PASSED
          sts_serial|   1|    100000|     100|0.95153782|  PASSED
...
          sts_serial|  16|    100000|     100|0.59342390|  PASSED
         rgb_bitdist|   1|    100000|     100|0.50763759|  PASSED
...
         rgb_bitdist|  12|    100000|     100|0.98576422|  PASSED
rgb_minimum_distance|   2|     10000|    1000|0.23378443|  PASSED
...
rgb_minimum_distance|   5|     10000|    1000|0.13215367|  PASSED
    rgb_permutations|   2|    100000|     100|0.54142546|  PASSED
...
    rgb_permutations|   5|    100000|     100|0.96040216|  PASSED
      rgb_lagged_sum|   0|   1000000|     100|0.66587166|  PASSED
...
      rgb_lagged_sum|  31|   1000000|     100|0.00183752|   WEAK
      rgb_lagged_sum|  32|   1000000|     100|0.13582393|  PASSED
     rgb_kstest_test|   0|     10000|    1000|0.74708548|  PASSED
     dab_bytedistrib|   0|  51200000|       1|0.30789191|  PASSED
             dab_dct| 256|     50000|       1|0.89665788|  PASSED
        dab_filltree|  32|  15000000|       1|0.67278231|  PASSED
        dab_filltree|  32|  15000000|       1|0.35348003|  PASSED
       dab_filltree2|   0|   5000000|       1|0.18749029|  PASSED
       dab_filltree2|   1|   5000000|       1|0.92600020|  PASSED

Simple random number generator does surprisingly well

I was running the NIST statistical test suite recently. I wanted an example of a random number generator where the tests failed, and so I used a simple generator, a linear congruence generator. But to my surprise, the generator passed nearly all the tests, even though some more sophisticated generators failed some of the same tests.

This post will implement a couple of the simplest tests in Python and show that the generator does surprisingly well.

The linear congruential generator used here starts with an arbitrary seed, then at each step produces a new number by multiplying the previous number by a constant and taking the remainder by 2³¹ – 1. The multiplier constant was chosen to be one of the multipliers recommended in [1].

We’ll need a couple math functions:

from math import sqrt, log

and we need to define the constants for our generator.

# Linear congruence generator (LCG) constants
z = 20170705   # seed
a = 742938285  # multiplier
e = 31         # will need this later
m = 2**e -1    # modulus

Next we form a long string of 0’s and 1’s using our generator.

# Number of random numbers to generate
N = 100000     

# Format to print bits, padding with 0's on the left if needed
formatstr = "0" + str(e) + "b"

bit_string = ""
for _ in range(N):
    z = a*z % m # LCG
    bit_string += format(z, formatstr)

Next we run a couple tests. First, we count the number of 1’s in our string of bits. We expect about half the bits to be 1’s. We can quantify “about” as within two standard deviations.

def count_ones(string):
    ones = 0
    for i in range(len(string)):
        if string[i] == '1':
            ones += 1
    return ones

ones = count_ones(bit_string)
expected = e*N/2
sd = sqrt(0.25*e*N)   # there are e*N bits in all
print( "Number of 1's: {}".format(ones) )
print( "Expected: {} to {}".format(expected - 2*sd, expected + 2*sd) )

The results are nothing unusual:

Number of 1's: 1550199
Expected: 1548239.3 to 1551760.7

Next we look at the length of the longest run of 1’s. I’ve written before about the probability of long runs and the code below uses a couple results from that post.

def runs(string):
    max_run = 0
    current_run = 0
    for i in range(len(string)):
        if string[i] == '1':
            current_run += 1
        else:
            max_run = max(max_run, current_run)
            current_run = 0
    return max_run

runlength = runs(bit_string)
expected = -log(0.5*e*N)/log(0.5)
sd = 1/log(2)
print( "Run length: {}".format(runlength) )
print( "Expected: {} to {}".format(expected - 2*sd, expected + 2*sd) )

Again the results are nothing unusual:

Run length: 19
Expected: 17.7 to 23.4

Simple random number generators are adequate for many uses. Some applications, such as high dimensional integration and cryptography, require more sophisticated generators, but sometimes it’s convenient and sufficient to use something simple. For example, code using the LCG generator above would be easier to debug than code using the Mersenne Twister. The entire state of the LCG is a single number, whereas the Mersenne Twister maintains an internal state of 312 numbers.

One obvious limitation of the LCG used here is that it couldn’t possibly produce more than 2³¹ – 1 values before repeating itself. Since the state only depends on the last value, every time it comes to a given output, the next output will be whatever the next output was the previous time. In fact, [1] shows that it does produce 2³¹ – 2 values before cycling. If the multiplier were not chosen carefully it could have a shorter period.

So our LCG has a period of about two billion values. That’s a lot if you’re writing a little game, for example. But it’s not enough for many scientific applications.

* * *

[1] George S. Fishman and Louis R. Moore III. An exhaustive analysis of multiplicative congruential random number generators with modulus 2³¹ – 1. SIAM Journal on Scientific and Statistical Computing, Vol. 7, No. 1, January 1986.