Collatz conjecture skepticism

The Collatz conjecture asks whether the following procedure always terminates at 1. Take any positive integer n. If it’s odd, multiply it by 3 and add 1. Otherwise, divide it by 2. For obvious reasons the Collatz conjecture is also known as the 3n + 1 conjecture. It has been computationally verified that the Collatz conjecture is true for n less than 2^60.

Alex Kontorovich posted a long thread on Twitter last week explaining why he thinks the Collatz conjecture might be false. His post came a few days after Tao’s paper was posted giving new partial results toward proving the Collatz conjecture is true.

Tao cites earlier work by Kontorovich. The two authors don’t contradict each other. Both are interested in the Collatz problem and have produced very technical papers studying the problem from different angles. See Kontorovich’s papers from 2006 and 2009.

Kontorovich’s 2009 paper looks at both the 3n + 1 problem and the 5n + 1 problem, the latter simply replacing the “3” in the Collatz conjecture with a “5”. The 5n + 1 problem has cycles, such as {13, 33, 83, 208, 104, 52, 26}. It is conjectured that the sequence produced by the 5n + 1 problem diverges for almost all inputs.


Detecting typos with the four color theorem

In my previous post on VIN numbers, I commented that if a check sum has to be one of 11 characters, it cannot detect all possible changes to a string from an alphabet of 33 characters. The number of possible check sum characters must be at least as large as the number of possible characters in the string.

Now suppose you wanted to create a check sum for text typed on a computer keyboard. You want to detect any change where a single key was wrongly typed by using an adjacent key.

You don’t need many characters for the check sum because you’re not trying to detect arbitrary changes, such as typing H for A on a QWERTY keyboard. You’re only trying to detect, for example, if someone typed Q, W, S, or Z for A. In fact you would only need one of five characters for the check sum.

Here’s how to construct the check sum. Think of the keys of the keyboard as a map, say by drawing boundaries through the spaces between the keys. By the four color theorem, you can assign the numbers 0, 1, 2, and 3 to each key so that no two adjacent keys have the same number. Concatenate all these digits and interpret it as a base 4 number. Then take the remainder when the number is divided by 5. That’s your check sum. As proved here, this will detect any typo that hits an adjacent key. It will also detect transpositions of adjacent keys.
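Here’s a minimal sketch of the computation in Python. The coloring below covers only a handful of keys and was assigned by hand for illustration; a real application would color the whole keyboard, which the four color theorem guarantees is possible.

```python
# Hypothetical coloring of a few QWERTY keys with 0-3 so that no two adjacent keys share a number.
color = {"A": 0, "Q": 1, "W": 2, "S": 1, "Z": 2, "E": 0, "D": 3, "X": 0}

def keyboard_checksum(text):
    # Interpret the key colors as the digits of a base 4 number and reduce mod 5.
    n = 0
    for ch in text.upper():
        n = 4 * n + color[ch]
    return n % 5

print(keyboard_checksum("ASDE"))   # check sum of the intended text
print(keyboard_checksum("QSDE"))   # typo: Q typed for A hits an adjacent key, so the check sum changes
```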

Note that this doesn’t assume anything about the particular keyboard. You could have any number of keys, and the keys could have any shape. You could even define “adjacent” in some non-geometrical way as long as your adjacency graph is planar.

Vehicle Identification Number (VIN) check sum

VIN number on an old car

A VIN (vehicle identification number) is a string of 17 characters that uniquely identifies a car or motorcycle. These numbers are used around the world and have three standardized formats: one for North America, one for the EU, and one for the rest of the world.

Letters that resemble digits

The characters used in a VIN are digits and capital letters. The letters I, O, and Q are not used to avoid confusion with the numerals 0, 1, and 9. So if you’re not sure whether a character is a digit or a letter, it’s probably a digit.

It would have been better to exclude S than Q. A lower case q looks sorta like a 9, but VINs use capital letters, and an S looks like a 5.

Check sum

The various parts of a VIN have particular meanings, as documented in the Wikipedia article on VINs. I want to focus on just the check sum, a character whose purpose is to help detect errors in the other characters.

Of the three standards for VINs, only the North American one requires a check sum. The check sum is in the middle of the VIN, the 9th character.

Algorithm

The scheme for computing the check sum is both complicated and weak. The end result is either a digit or an X. There are 33 possibilities for each character (10 digits + 23 letters) and 11 possibilities for a check sum, so the check sum cannot possibly detect all changes to even a single character.

The check sum is computed by first converting all letters to digits, computing a weighted sum of the 17 digits, and taking the remainder by 11. The weights for the 17 characters are

8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2

I don’t see any reason for these weights other than that adjacent weights are different, which is enough to detect transpositions of adjacent characters (after letters have been converted to digits). Maybe the process was deliberately complicated in an attempt to provide a little security by obscurity.
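Here’s a sketch of the computation in Python. The letter-to-digit table is the standard VIN transliteration (more on where it comes from below), and the example VIN is a commonly used worked example whose check digit is X.

```python
# Standard VIN transliteration: each letter maps to the last digit of its EBCDIC code.
values = {c: int(c) for c in "0123456789"}
values.update(zip("ABCDEFGH", range(1, 9)))   # A-H -> 1-8
values.update(zip("JKLMN", range(1, 6)))      # J-N -> 1-5
values.update({"P": 7, "R": 9})
values.update(zip("STUVWXYZ", range(2, 10)))  # S-Z -> 2-9

weights = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def check_digit(vin):
    total = sum(values[c] * w for c, w in zip(vin, weights))
    r = total % 11
    return "X" if r == 10 else str(r)

vin = "1M8GDM9AXKP042788"          # a commonly cited example VIN; its 9th character is the check digit
print(check_digit(vin), vin[8])    # both should be X
```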

Historical quirk

There’s an interesting historical quirk in how the letters are converted to digits: each letter is mapped to the last digit of its EBCDIC code.

EBCDIC?! Why not ASCII? Both EBCDIC and ASCII go back to 1963. VINs date back to 1954 in the US but were standardized in 1981. Presumably the check sum algorithm using EBCDIC digits became a de facto standard before ASCII was ubiquitous.

A better check sum

Any error detection scheme that uses 11 characters to detect changes in 33 characters is necessarily weak.

A much better approach would be to use a slight variation on the check sum algorithm Douglas Crockford recommended for base 32 encoding described here. Crockford says to take a string of characters from an alphabet of 32 characters, interpret it as a base 32 number, and take the remainder by 37 as the check sum. The same algorithm would work for an alphabet of 33 characters. All that matters is that the number of possible characters is less than 37.

Since the check sum is a number between 0 and 36 inclusive, you need 37 characters to represent it. Crockford recommended using the symbols *, ~, $, =, and U for extra symbols in his base 32 system. His system didn’t use U, and VIN numbers do. But we only need four more characters, so we could use *, ~, $, and =.

The drawback to this system is that it requires four new symbols. The advantage is that any change to a single character would be detected, as would any transposition of adjacent characters. This is proved here.
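Here’s a sketch of what that variation might look like in Python. The ordering of the 33-character alphabet and the choice of the four extra check symbols are my own assumptions for illustration.

```python
alphabet = "0123456789ABCDEFGHJKLMNPRSTUVWXYZ"   # the 33 VIN characters: no I, O, or Q
check_symbols = alphabet + "*~$="                # 37 possible check sum characters

def check_char(s):
    # Interpret s as a base 33 number and take the remainder by 37.
    n = 0
    for c in s:
        n = len(alphabet) * n + alphabet.index(c)
    return check_symbols[n % 37]

print(check_char("1M8GDM9AKP042788"))   # check character for a 16-character VIN body
```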


Mickey Mouse, Batman, and conformal mapping

A conformal map between two regions in the plane preserves angles [1]. If two curves meet at a given angle in the domain, their images will meet at the same angle in the range. Two subsets of the plane are conformally equivalent if there is a conformal map between them.

The Riemann mapping theorem says that any simply connected proper open subset of the plane is conformally equivalent to the interior of the unit disk. Simply connected means that the set has no holes [2]. Proper means that the set is not the entire plane [3].

Two regions that are conformally equivalent to the disk are conformally equivalent to each other. That means if you have a region like a Mickey Mouse silhouette

Mickey Mouse

and another region like the Batman logo

Batman logo

then there’s a conformal map between them. If you specify a point in the domain and where it goes in the range, and you specify its derivative at that point, then the conformal map is unique.

Now this only applies to open sets, so the boundaries of the figures are not included. The boundaries of the figures above clearly have different angles, so it wouldn’t be possible to extend the conformal map to the boundaries. But move in from the boundary by any positive distance and the map will be conformal.

Conformal maps are useful transformations in applications because they’re very nice functions to work with. In theory, you could write out a function that maps Mickey Mouse to Batman and one that goes the other way around. These functions would be difficult if not impossible to write down analytically, but there is software to find conformal maps numerically. For simple geometric shapes there are reference books that give exact conformal maps.

What about holes?

The Riemann mapping theorem applies to regions with no holes. What if we take two regions that do have a hole in them? To keep things simple, we will look at two annular regions, that is, each is the region between two circles. For i = 1 and 2 let Ai be the region between two circles centered at the origin, with inner radius ri and outer radius Ri. Are these regions conformally equivalent?

Clearly they are if R1/r1 = R2/r2, i.e. if one annulus is a magnification of the other. In that case the conformal map is given by mapping z to (r2/r1) z.

Here’s a big surprise: this is the only way the two annuli can be conformally equivalent! That is, if R1/r1 does not equal R2/r2 then there is no conformal map between the two regions.

So suppose you have two rings, both with inner radius 1. One has outer radius 2 and the other has outer radius 3. Then these rings are not conformally equivalent. They look a lot more like each other than do Mickey Mouse and Batman, but the latter are conformally equivalent and the former are not.

Almost any two regions without holes are conformally equivalent, but it’s much harder for two regions with holes to be conformally equivalent.


[1] Conformal maps also preserve orientation. So if you move clockwise from the first curve to the second in the domain, you move clockwise from the image of the first to the image of the second by the same angle.

A function of a complex variable is conformal if and only if it is holomorphic, i.e. analytic on the interior of its domain, with non-zero derivative. Some authors leave out the orientation preserving part of the definition; under their weaker definition a conformal map could also be the conjugate of a holomorphic function.

[2] A rigorous way to say that a region has no holes is to say its fundamental group is trivial. Any loop inside the region is homotopic to a point, i.e. you can continuously shrink it to a point without leaving the region.

[3] Liouville’s theorem says an analytic function that is bounded on the entire plane must be constant. That’s why the Riemann mapping theorem says a region must be a proper subset of the plane, not the whole plane.

Stone-Weierstrass on a disk

A couple weeks ago I wrote about a sort of paradox, that Weierstrass’ approximation theorem could seem to contradict Morera’s theorem. Weierstrass says that the uniform limit of polynomials can be an arbitrary continuous function, and so may have sharp creases. But Morera’s theorem says that the uniform limit of polynomials is analytic and thus very smooth.

Both are true, but they involve limits in different spaces. Weierstrass’ theorem applies to convergence over a compact interval of the real line, and Morera’s theorem applies to convergence over compact sets in the complex plane. The uniform limit of polynomials is better behaved when it has to be uniform in two dimensions rather than just one.

This post is a sort of variation on the post mentioned above, again comparing convergence over the real line and convergence in the complex plane, this time looking at what happens when you conjugate variables [1].

There’s an abstract version of Weierstrass’ theorem due to Marshall Stone known as the Stone-Weierstrass theorem. It generalizes the real interval of Weierstrass’ theorem to any compact Hausdorff space [2]. The compact Hausdorff space we care about for this post is the unit disk in the complex plane.

There are two versions of the Stone-Weierstrass theorem, one for real-valued functions and one for complex-valued functions. I’ll state the theorems below for those who are interested, but my primary interest here is a corollary to the complex Stone-Weierstrass theorem. It says that any continuous complex-valued function on the unit disk can be approximated as closely as you like with polynomials in z and the conjugate of z with complex coefficients, i.e. by polynomials of the form

p(z, \bar{z})

When you throw in conjugates, things change a lot. The uniform limit of polynomials in z alone must be analytic, very well behaved. But the uniform limit of polynomials in z and z conjugate is merely continuous. It can have all kinds of sharp edges etc.

Conjugation opens up a lot of new possibilities, for better or for worse. As I wrote about here, an analytic function can only do two things to a tiny disk:  stretch it or rotate it. It cannot flip it over, as conjugation does.

By adding or subtracting a variable and its conjugate, you can separate out the real and imaginary parts. The two parts are then no longer inextricably linked, and this allows much more general functions. The magic of complex analysis, the set of theorems that seem too good to be true, depends on the real and imaginary parts being tightly coupled.
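Explicitly, the identities behind that remark are

\mathrm{Re}(z) = \frac{z + \bar{z}}{2}, \qquad \mathrm{Im}(z) = \frac{z - \bar{z}}{2i}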

***

Now for those who are interested, the statement of the Stone-Weierstrass theorems.

Let X be a compact Hausdorff space. The set of real or complex valued functions on X forms an algebra. That is, it is closed under taking linear combinations and products of its elements.

The real Stone-Weierstrass theorem says a subalgebra of the continuous real-valued functions on X is dense if it contains a non-zero constant function and if it separates points. The latter condition means that for any two points, you can find functions in the subalgebra that take on different values on the two points.

If we take the interval [0, 1] as our Hausdorff space, the Stone-Weierstrass theorem says that the subalgebra of polynomials is dense.

The complex Stone-Weierstrass theorem says a subalgebra of the continuous complex-valued functions on X is dense if it contains a non-zero constant function, separates points, and is closed under taking conjugates. The statement above about polynomials in z and z conjugate follows immediately.

***

[1] For a complex variable z of the form a + bi where a and b are real numbers and i is the imaginary unit, the conjugate is a – bi.

[2] A Hausdorff space is a general topological space. All it requires is that for any two distinct points in the space, you can find disjoint open sets that each contain one of the points.

In many cases, a Hausdorff space is the most general setting where you can work without running into difficulties, especially if it is compact. You can often prove something under much stronger assumptions, then reuse the proof to prove the same result in a (compact) Hausdorff space.

See this diagram of 34 kinds of topological spaces, moving from strongest assumptions at the top to weakest at the bottom. Hausdorff is almost on bottom. The only thing weaker on that diagram is T1, also called a Fréchet space.

In a T1 space, you can separate points, but not simultaneously. That is, given two points p1 and p2, you can find an open set U1 around p1 that does not contain p2, and you can find an open set U2 around p2 that does not contain p1. But you cannot necessarily find these sets in such a way that U1 and U2 are disjoint. If you could always choose these open sets to be disjoint, you’d have a Hausdorff space.

Here’s an example of a space that is T1 but not Hausdorff. Let X be an infinite set with the cofinite topology. That is, open sets are those sets whose complements are finite. The complement of {p1} is an open set containing p2 and vice versa. But you cannot find disjoint open sets separating two points because no two nonempty open sets are disjoint: the intersection of any pair of nonempty open sets must contain an infinite number of points.

Distribution of zip code population

There are three schools of thought regarding power laws: the naive, the enthusiasts, and the skeptics. Of course there are more than three schools of thought, but there are three I want to talk about.

The naive haven’t heard of power laws or don’t know much about them. They probably tend to expect things to be more uniformly distributed than they really are.

The enthusiasts have read books about fractals, networks, power laws, etc. and see power laws everywhere.

The skeptics say the prevalence of power laws is overestimated.

Within the skeptics are the pedants. They’ll point out that nothing exactly follows a power law over the entirety of its range, which is true but unhelpful. Nothing follows any theoretical distribution exactly and everywhere. This is true of the normal distribution as much as it is true of power laws, but that doesn’t mean that normal distributions and power laws aren’t useful models.

Guessing the distribution

If you asked a power law enthusiast how zip code populations are distributed, their first guess would of course be a power law. They might suppose that 80% of the US population lives in 20% of the zip codes, following Pareto’s 80-20 rule.

The enthusiasts wouldn’t be too far off. They would do better than the uniformitarians who might expect that close to 20% of the population lives in 20% of the zip codes.

It turns out that about 70% of Americans live in the most populous 20% of zip codes. To include 80% of the population, you have to include the top 27% of zip codes.
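Here’s a rough sketch of how one might compute such figures with pandas. The file name and column name are placeholders for whatever form your ZCTA population extract takes.

```python
import pandas as pd

# Hypothetical file of 2010 ZCTA populations, one row per ZCTA.
pops = pd.read_csv("zcta_population.csv")["population"].sort_values(ascending=False)

top20 = pops.head(int(0.2 * len(pops)))
print(top20.sum() / pops.sum())      # fraction of people living in the most populous 20% of ZCTAs

cumulative = pops.cumsum() / pops.sum()
k = (cumulative < 0.8).sum() + 1     # number of ZCTAs needed to reach 80% of the population
print(k / len(pops))                 # fraction of ZCTAs that represents
```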

But a power law implies more than just the 80-20 rule. It implies that something like the 80-20 rule holds on multiple scales, and that’s not true of zip codes.

Looking at plots

The signature of a power law is a straight line on a log-log plot. Power laws never hold exactly and everywhere, but a lot of things approximately follow a power law over a useful range, typically in the middle.

What do we get if we make a log-log plot of zip code population?

log-log plot of US zip code population by rank

This looks more like a squircle than a straight line.

If we zoom in on just part of the plot, say the largest 2,000 zip codes [1], we get something that has a flat spot, but the plot bows outward, and continues to bow outward as we zoom in on different parts of it.

zooming in on zip code population distribution

Why is the distribution of zip code populations not a power law? One reason is that zip codes are artificial. They were designed to make it easier to distribute mail, and so there was a deliberate effort to make them somewhat uniform in population. The population of the largest zip code is only 12 times the average.

You’re more likely to see power laws in something organic like city populations. The largest city in the US is nearly 1,400 times larger than the average city [2].

Another reason zip code populations do not follow a power law is that zip codes are somewhat of a geographical designation. I say “somewhat” because the relationship between zip codes and geography is complicated. See [1].

But because there’s at least some attempt to accommodate geography, and because the US population is very thin in large regions of the country, there are many sparsely populated zip codes, even when you roll them up from 5 digits to just 3 digits, and that’s why the curve drops very sharply at the end.


[1] This post is based on ZCTAs (Zip Code Tabulation Areas) according to the 2010 census. ZCTAs are not exactly zip codes for reasons explained here.

[2] It depends on how you define “city,” but the average local jurisdiction population in the United States is around 6,200 and the population of New York City is around 8,600,000, which is 1,400 times larger.

Landau kernel

The previous post was about the trick Lebesgue used to construct a sequence of polynomials converging to |x| on the interval [-1, 1]. This was the main step in his proof of the Weierstrass approximation theorem.

Before that, I wrote a post on Bernstein’s proof that used his eponymous polynomials to prove Weierstrass’ theorem. This is my favorite proof because it’s an example of using results from probability to prove a statement that has nothing to do with randomness.

This morning I’ll present one more way to prove the approximation theorem, this one due to Landau.

The Landau kernel is defined as

K_n(u) = (1 - u^2)^n

Denote its integral by

k_n = \int_{-1}^1 K_n(u)\, du

Let f(x) be any continuous function on [-1, 1]. Then the convolution of the normalized Landau kernel with f gives a sequence of polynomial approximations that converge uniformly to f. By “normalized” I mean dividing the kernel by its integral so that it integrates to 1.

For each n,

\frac{1}{k_n}\int_{-1}^1 K_n(t-x)\, f(t)\, dt

is a polynomial in x of degree 2n, and as n goes to infinity this converges uniformly to f(x).
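Here is a quick numerical sketch of the convergence. I use a test function with a corner that vanishes near ±1, so that the kernel argument t − x stays in [−1, 1] wherever f is nonzero, and I approximate the integrals with simple Riemann sums.

```python
import numpy as np

def landau_approx(f, x, n, m=4001):
    # (1/k_n) * integral over [-1, 1] of K_n(t - x) f(t) dt, with K_n(u) = (1 - u^2)^n,
    # approximated by a Riemann sum on a grid of m points.
    t = np.linspace(-1, 1, m)
    dt = t[1] - t[0]
    K = lambda u: (1 - u**2) ** n
    k_n = np.sum(K(t)) * dt
    return np.array([np.sum(K(t - xi) * f(t)) * dt for xi in x]) / k_n

f = lambda t: np.maximum(0.0, 0.5 - np.abs(t))   # continuous, corner at 0, zero near the endpoints
x = np.linspace(-0.5, 0.5, 101)
for n in (10, 100, 1000):
    print(n, np.max(np.abs(landau_approx(f, x, n) - f(x))))  # the max error shrinks as n grows
```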

Connections

There are a few connections I’d like to mention. First, the normalized Landau kernel is essentially a beta distribution density, just scaled to live on [-1, 1] rather than [0, 1].

And as with Bernstein’s proof of the Weierstrass approximation theorem, you could use probability to prove Landau’s result. Namely, you could use the fact that for two independent random variables X and Y, the PDF of their sum is the convolution of their PDFs.

The normalizing constants kn have a simple closed form in terms of double factorials:

\frac{k_n}{2} = \frac{(2n)!!}{(2n+1)!!}

I don’t know which Landau is responsible for the Landau kernel. I’ve written before about Edmund Landau and his Big O notation, and I wrote about Lev Landau and his license plate game. Edmund was a mathematician, so it makes sense that he might be the one to come up with another proof of Weierstrass’ theorem. Lev was a physicist, and I could imagine he would be interested in the Landau kernel as an approximation to the delta function.

If you know which of these Landaus, or maybe another, is behind the Landau kernel, please let me know.

Update: Someone sent me this paper which implies Edmund Landau is the one we’re looking for.

Lebesgue’s proof of Weierstrass’ theorem

A couple weeks ago I wrote about the Weierstrass approximation theorem, the theorem that says every continuous function on a closed finite interval can be approximated as closely as you like by a polynomial.

The post mentioned above uses a proof by Bernstein. And in that post I used the absolute value function as an example. Not only is |x| an example, you could go the other way around and use it as a step in the proof. That is, there is a proof of the Weierstrass approximation theorem that starts by proving the special case of |x| and then uses that result to build a proof for the general case.

There have been many proofs of Weierstrass’ theorem, and recently I ran across a proof due to Lebesgue. Here I’ll show how Lebesgue constructed a sequence of polynomials approximating |x|. It’s like pulling a rabbit out of a hat.

The starting point is the binomial theorem. If x and y are real numbers with |x| > |y| and r is any real number, then

(x+y)^r = \sum_{k=0}^\infty {r \choose k} x^{r-k} y^k.

Now apply the theorem substituting 1 for x and x² – 1 for y above and you get

|x| = (1 + (x^2 - 1))^{1/2} =\sum_{k=0}^\infty {1/2 \choose k} (x^2-1)^k

The partial sums of the right hand side are a sequence of polynomials converging to |x| on the interval [-1, 1].
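Here’s a small numerical check of that convergence. The generalized binomial coefficients C(1/2, k) are computed with the usual recurrence C(r, k+1) = C(r, k)(r − k)/(k + 1).

```python
import numpy as np

def lebesgue_partial_sum(x, N):
    # Partial sum of sum_k C(1/2, k) (x^2 - 1)^k, which converges to |x| on [-1, 1].
    u = x**2 - 1
    total = np.zeros_like(x, dtype=float)
    coeff = 1.0                            # C(1/2, 0)
    term = np.ones_like(x, dtype=float)    # u^0
    for k in range(N + 1):
        total += coeff * term
        coeff *= (0.5 - k) / (k + 1)       # C(1/2, k+1) from C(1/2, k)
        term *= u
    return total

x = np.linspace(-1, 1, 201)
for N in (5, 50, 500):
    # The maximum error shrinks, slowly, as more terms are added.
    print(N, np.max(np.abs(lebesgue_partial_sum(x, N) - np.abs(x))))
```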

***

If you’re puzzled by the binomial coefficient with a top number that isn’t a positive integer, see the general definition of binomial coefficients. The top number can even be complex, and indeed the binomial theorem holds for complex r.

You might also be puzzled by the binomial theorem being an infinite sum. Surely if r is a positive integer we should get the more familiar binomial theorem which is a finite sum. And indeed we do. The general definition of binomial coefficients insures that if r is a positive integer, all the binomial coefficients with k > r are zero.

Proving that a choice was made in good faith

How can you prove that a choice was made in good faith? For example, if your company selects a cohort of people for random drug testing, how can you convince those who were chosen that they weren’t chosen deliberately? Would a judge find your explanation persuasive? This is something I’ve helped companies with.

It may be impossible to prove that a choice was not deliberate, but you can show good faith by providing evidence that the choice was deliberate, but by a different criterion than the one in question.

I’ll give four examples, three positive and one negative.

Cliff RNG

My previous three blog posts looked at different aspects of the Cliff random number generator. The generator needs a seed between 0 and 1 to start. Suppose I chose 0.8121086949937715 as my seed. On the one hand, that’s a number with no apparent special features. But you might ask “Hey, why that number?” and you’d be right. I show in the first post in the series how that number was chosen to make the generator start off producing duplicate output.

In the next two posts in the series, I chose π – 3 as my seed. That’s a recognizable number and obviously a deliberate choice. But it has no apparent connection to the random number generator, and so it’s reasonable to assume that the seed wasn’t chosen to make the generator look good or bad.

SHA-2

The SHA-2 cryptographic hash function uses seventy-two 32-bit numbers for its initial state that needed to be “random” in some sense. But if the values were actually chosen at random, critics would suspect that the values were chosen to provide a back door. And maybe there is a clever way to pick the initial state that provides a non-obvious exploitable weakness.

The designers of SHA-2 chose the fractional parts of the square roots of the first eight primes to fill one set of constants, and the fractional parts of the cube roots of the first sixty-four primes to fill another. See code here.
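As a quick sanity check, here’s a sketch that reconstructs the square-root constants (the initial hash values of SHA-256) using exact integer arithmetic, so floating point precision isn’t an issue.

```python
from math import isqrt

# First eight primes; SHA-256's initial hash values are the first 32 bits of the
# fractional parts of their square roots.
primes = [2, 3, 5, 7, 11, 13, 17, 19]

def frac_sqrt_bits(p, bits=32):
    # floor(2^bits * sqrt(p)) computed exactly, keeping only the fractional-part bits.
    return isqrt(p << (2 * bits)) & ((1 << bits) - 1)

print([hex(frac_sqrt_bits(p)) for p in primes])
# expected to start with 0x6a09e667, 0xbb67ae85, ...
```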

The initial state is definitely not random. Someone looking at the state would eventually discover where it came from. So the choice was obviously deliberate, but apparently not designed by any cryptographic criterion.

Curve 25519

Daniel Bernstein’s elliptic curve Curve25519 is widely trusted in part because Bernstein made his design choices transparent. The curve is

y² = x³ + 486662x² + x

over the finite field with 2^255 – 19 elements, hence the name.

2^255 – 19 is the largest prime less than 2^255, and being close to 2^255 makes the method efficient to implement. The coefficient 486662 is less obvious. But Bernstein explains in his paper that he took the three smallest possible values of this parameter that met the explicit design criteria, and then rejected two of them on objective grounds described at the bottom of the paper.
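If you want to double-check the claim about the prime yourself, here’s a quick sketch, assuming SymPy is installed.

```python
from sympy import isprime, prevprime

p = 2**255 - 19
print(isprime(p))               # True
print(prevprime(2**255) == p)   # True if 2^255 - 19 is indeed the largest prime below 2^255
```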

NIST P-384

The design of elliptic curve NIST P-384 is not as transparent as that of Curve25519, which has led to speculation that NIST may have designed the curve to have a back door.

The curve has Weierstrass form

y² = x³ – 3x + b

over the finite field with p elements where

p = 2^384 – 2^128 – 2^96 + 2^32 – 1.

As with Curve25519, the choice of field size was motivated by efficiency; the pattern of powers of 2 enables some tricks for efficient implementation. Also, there are objective reasons why the linear coefficient is -3. But the last coefficient b is the 384-bit number

27580193559959705877849011840389048093056905856361568521428707301988689241309860865136260764883745107765439761230575

which has raised some eyebrows. NIST says the value was chosen at random, though not directly. More specifically, NIST says it first generated a certain 160-bit random number, then applied a common key stretching algorithm to obtain b above. Researchers are divided over whether they believe this. See more in this post.

Conclusion

Sometimes you can’t prove that a choice wasn’t deliberate. In that case the best you can do is show that the choice was deliberate, but by an innocent criterion, one unrelated to the matter at hand.

I tried to do this in the Cliff RNG blog posts by using π as my seed. I couldn’t actually use π because the seed had to be between 0 and 1, but there’s an obvious way to take π and produce a number between 0 and 1.

The designers of SHA-2 took a similar route. Just as π is a natural choice for a real number, primes are natural choices for integers. They couldn’t use integers directly, but they made complicated patterns from simple integers in a natural way by taking roots.

Daniel Bernstein gained the cryptography community’s trust by making his design criteria transparent. Given his criteria, the design was almost inevitable.

NIST was not as persuasive as Daniel Bernstein. They claim to have chosen a 160-bit number at random, and they may very well have. But there’s no way to know whether they generated a lot of 160-bit seeds until they found one that resulted in a constant term that has some special property. They may have chosen their seed in good faith, but they have not been completely persuasive.

Sometimes it’s not enough to act in good faith; you have to make a persuasive case that you acted in good faith.

Detecting a short period in an RNG

The last couple posts have been looking at the Cliff random number generator. I introduce the generator here and look at its fixed points. These turn out to be less of a problem in practice than in theory.

Yesterday I posted about testing the generator with the DIEHARDER test suite, the successor to George Marsaglia’s DIEHARD battery of RNG tests.

This morning I discovered something about the Cliff RNG which led to discovering something about DIEHARDER. The latter is more important: I don’t think anyone is using the Cliff RNG for serious work, but people are definitely using DIEHARDER.

The Cliff RNG has a short period, and yet many of the DIEHARDER tests passed. However, several of the tests failed as well, and perhaps these tests failed due to the short period, in particular rgb_lagged_sum. But at least some tests can pass despite a short period.

Finding the period

Since the Cliff RNG maintains internal state as a floating point number and outputs integers, the period is a bit subtle.

The state of the Cliff RNG is a floating point number between 0 and 1, and so it has 2^53 possible values. (See Anatomy of a floating point number.) That means the maximum possible period is 2^53. We use the internal state x to create 32-bit integers n by multiplying x by 2^32 and truncating to an integer value. The maximum period could conceivably be larger than 2^32 because an output value n could repeat but correspond to two different x‘s. The actual period, at least in my experiment, was between 2^22 and 2^23.

I seeded the Cliff RNG with π – 3 (why?) and found that starting with index i = 3,075,302, the output values repeat with period p = 6,705,743. So there was a burn-in period before the state entered a cycle, but it would repeat that cycle forever. Not only are the output values repeating, the state values x repeat. (Otherwise it would be possible for the integer outputs to cycle for a while then break out.)
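Here’s a sketch of how one might find the burn-in and period, assuming the Cliff RNG update x ↦ |100 ln x| mod 1 as I read it from the earlier posts. It stores every state it has seen, which takes a few gigabytes of RAM for a cycle this long; Brent’s cycle-finding algorithm would do the same job in constant memory.

```python
import math

def cliff_next(x):
    # Cliff RNG update as described in the earlier posts: x -> |100 ln x| mod 1
    return abs(100.0 * math.log(x)) % 1.0

def find_cycle(seed, max_iters=20_000_000):
    # Map each state to the index where it was first seen; the first repeated
    # state gives the burn-in length and the period.
    seen = {}
    x = seed
    for i in range(max_iters):
        if x in seen:
            return seen[x], i - seen[x]   # (burn-in, period)
        seen[x] = i
        x = cliff_next(x)
    return None

# The post reports a burn-in of 3,075,302 and a period of 6,705,743 for this seed.
print(find_cycle(math.pi - 3))
```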

It’s possible that starting with other seeds, the generator has other cycle lengths, longer or shorter. But I stumbled across one cycle of period 6,705,743.

Testing RNGs

I wrote a chapter for O’Reilly’s book Beautiful Testing in which I discuss How to test a random number generator. More specifically, how to test a non-uniform random number generator. That is, starting with a trusted uniform random number generator, transform the values to have some other probability distribution. This is the most common scenario. Few developers write their own core RNG, but it’s fairly common to have to use a core RNG to create a custom distribution.

If you do want to test a uniform random number generator, as I do in this post, there are test suites like DIEHARDER. This is one of the oldest and best known test suites. There are newer and more rigorous suites, like TestU01 that I blog about here. There’s also the NIST statistical test suite that I write about in the same post.

These tests are a little challenging to build and use. I’ve had clients ask me to run these tests for them and help them interpret the results. If you’d like for me to do that for you, let’s talk.