# Evolution of random number generators

The previous post showed that the bits of prime powers look kinda chaotic. When you plot them, they form a triangular shape because the size of the numbers is growing. The numbers are growing geometrically, so their binary representations are growing linearly. Here’s the plot for powers of 5: You can crop the triangle so that you just see a rectangular portion of it, things look less regular. If we look at powers of p modulo a power of 2, say 232, then we’re chopping off the left side of the triangle.

Suppose we don’t start with 1, but start with some other number, call it a seed, and multiply by 5 every time. Would we get a chaotic pattern? Yes, and we have just invented the congruential random number generator. These RNGs have the form

xn+1 = a xn mod m

The RNG we implicitly described above would have a = 5 and m = 232. This is not a good choice of a and m, but it’s on to something. We can use different values of multiplier and modulus to make a decent RNG.

The RANDU random number generator, commonly used in the 1960’s, used a = 65539 and m = 231. This turned out to have notoriously bad statistical properties. We can see this with a small modification of the code used to produce the plot above.

```    def out(s):
s = s.replace('1', chr(0x2588))
s = s.replace('0', ' ')
print(s)

a, m = 65539, 2**31
x = 7
for i in range(32):
x = a * x % m
s = format(x, '32b')
out(s)
```

Here’s the plot. The pattern on the right side looks like a roll of film. This shows that the lowest order bits are very regular. You can also see that we need to start with a large seed: starting with 7 created the large triangular holes at the top of the image. There are other patterns in the image that are hard to describe, but you get the sense something is not right, especially when you compare this image to a better alternative.

A much better choice of parameters is multiplier a = 48271 and modulus m = 231 -1. These are the parameters in the MINSTD random number generator. Here’s a plot of its bits, also starting with seed 7. The MINSTD generator is much better than RANDU, and adequate for some applications, but far from state of the art. You could do better by using a much larger multiplier and a modulus on the order of  264 or 2128.

You could do even better by adding a permutation step to your congruential generator. That’s the idea behind the PCG (permuted congruential generator) family of random number generators. These generators pass all the standard statistical tests with flying colors.

There are many other approaches to random number generation, but this post shows one thread of historical development, how hypothetically someone could go from noticing that prime powers have chaotic bit patterns to inventing PCG.

# Extracting independent random bits from dependent sources

Sometimes you have a poor quality source of randomness and you want to refine it into a better source. You might, for example, want to generate cryptographic keys from a hardware source that is more likely to produce 1’s than 0’s. Or maybe your source of bits is dependent, more likely to produce a 1 when it has just produced a 1.

## Von Neumann’s method

If the bits are biased but independent, a simple algorithm by John von Neumann will produce unbiased bits. Assume each bit of input is a 1 with probability p and a 0 otherwise. As long as 0 < p < 1 and the bits are independent, it doesn’t matter what p is .

Von Neumann’s algorithm works on consecutive pairs bits. If the pair is 00 or 11, don’t output anything. When you see 10 output a 1, and when you see 01 output a 0. This will output a 0 or a 1 with equal probability.

## Manuel Blum’s method

Manuel Blum built on von Neumann’s algorithm to extract independent bits from dependent sources. So von Neumann’s algorithm can remove bias, but not dependence. Blum’s algorithm can remove bias and dependence, if the dependence takes the form of a Markov chain.

Blum’s assumes each state in the Markov chain can be exited two ways, labeled 0 and 1. You could focus on a single state and use von Neumann’s algorithm, returning 1 when the state exits by the 1 path the first time and 0 the second time, and return 0 when it does the opposite. But if there are many states, this method is inefficient since it produces nothing when the chain is in all the other states.

Blum basically applies the same method to all states, not just one state. But there’s a catch! His paper shows that the naive implementation of this idea is wrong. He gives a pseudo-theorem and a pseudo-proof that the naive algorithm is correct before showing that it is not. Then he shows that the correct algorithm really is correct.

This is excellent exposition. It would have been easier to say nothing of the naive algorithm, or simply mention in passing that the most obvious approach is flawed. But instead he went to the effort of fully developing the wrong algorithm (with a warning up-front that it will be shown to be wrong).

## Related posts

 It matters for efficiency. The probability that a pair of input bits will produce an output bit is 2p(1 – p), so the probability is maximized when p = 0.5. If p = 0.999, for example, the method will still work correctly, but you’ll have to generate 500 pairs of input on average to produce each output bit.

# ChaCha RNG with fewer rounds

ChaCha is a CSPRING, a cryptographically secure pseudorandom number generator. When used in cryptography, ChaCha typically carries out 20 rounds of its internal scrambling process. Google’s Adiantum encryption system uses ChaCha with 12 rounds.

The runtime for ChaCha is proportional to the number of rounds, so you don’t want to do more rounds than necessary if you’re concerned with speed. Adiantum was developed for mobile devices and so Google wanted to reduce the number of rounds while maintaining a margin of cryptographic safety.

Cryptographer Jean-Philippe Aumasson suggests 8 rounds of ChaCha is plenty. He reports that there is a known attack on ChaCha with 6 rounds that requires on the order of 2116 operations, but that ChaCha with 5 rounds is definitely breakable, requiring on the order of only 216 operations.

Three rounds of ChaCha are not enough , but four rounds are enough to satisfy DIEHARDER, PractRand, and TestU01 . This illustrates the gap between adequate statistical quality and adequate cryptographic quality: Four rounds of ChaCha are apparently enough to produce the former but five rounds are not enough to produce the latter.

There doesn’t seem to be any reason to use ChaCha with four rounds. If you don’t need cryptographic security, then there are faster RNGs you could use. If you do need security, four rounds are not enough.

ChaCha with six rounds seems like a good compromise if you want an RNG that is fast enough for general use and that that also has reasonably good cryptographic quality. If you want more safety margin for cryptographic quality, you might want to go up to eight rounds.

## What a difference one round makes

One thing I find interesting about random number generation and block encryption is that a single round of obfuscation can make a huge difference. ChaCha(3) fails statistical tests but ChaCha(4) is fine. ChaCha(5) is easily broken but ChaCha(6) is not.

## Interesting failures

Often random number generators are either good or bad; they pass the standard test suites or they clearly fail. ChaCha(3) is interesting in that it is somewhere in between. As the results in the footnotes show, ChaCha(3) is an intermediate case. DIEHARDER hints at problems, but small crush thinks everything is fine. The full crush battery however does find problems.

The decisive failure of the Fourier tests is understandable: low-quality generators often fail spectral tests. But the results of the simple poker test are harder to understand. What is it about simulating poker that makes ChaCha(3) fail spectacularly? And in both cases, one more round of ChaCha fixes the problems.

## Related posts

 ChaCha(3) passes DIEHARDER, though five of the tests passed with a “weak” pass. Passes TestU01 small crush but fails full crush:

```========= Summary results of Crush =========

Version:          TestU01 1.2.3
Generator:        32-bit stdin
Number of statistics:  144
Total CPU time:   00:39:05.05
The following tests gave p-values outside [0.001, 0.9990]:
(eps  means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):

Test                          p-value
----------------------------------------------
25  SimpPoker, d = 64                eps
26  SimpPoker, d = 64                eps
27  CouponCollector, d = 4          7.7e-4
28  CouponCollector, d = 4          7.4e-4
29  CouponCollector, d = 16          eps
33  Gap, r = 0                      2.7e-8
51  WeightDistrib, r = 0             eps
52  WeightDistrib, r = 8           5.2e-15
53  WeightDistrib, r = 16           2.1e-5
55  SumCollector                     eps
69  RandomWalk1 H (L = 10000)       1.1e-4
74  Fourier3, r = 0               1.3e-144
75  Fourier3, r = 20               6.9e-44
80  HammingWeight2, r = 0           2.8e-6
83  HammingCorr, L = 300           3.2e-10
84  HammingCorr, L = 1200          6.1e-13
----------------------------------------------
All other tests were passed
```

ChaCha(3) also fails PractRand decisively.

```RNG_test using PractRand version 0.94
RNG = chacha(3), seed = 0x7221236f
test set = core, folding = standard (32 bit)

rng=chacha(3), seed=0x7221236f
length= 128 megabytes (2^27 bytes), time= 2.0 seconds
Test Name                         Raw       Processed     Evaluation
[Low1/32]BCFN(2+0,13-6,T)         R= +31.0  p =  4.7e-11   VERY SUSPICIOUS
[Low1/32]BCFN(2+1,13-6,T)         R= +27.4  p =  8.3e-10   VERY SUSPICIOUS
[Low1/32]BCFN(2+2,13-6,T)         R= +52.7  p =  1.8e-18    FAIL !
[Low1/32]BCFN(2+3,13-6,T)         R= +47.6  p =  9.5e-17    FAIL !
[Low1/32]BCFN(2+4,13-7,T)         R= +26.1  p =  1.7e-8   very suspicious
[Low1/32]DC6-9x1Bytes-1           R= +26.3  p =  1.3e-14    FAIL !
[Low1/32]FPF-14+6/16:all          R=  +8.6  p =  2.2e-7   very suspicious
...and 147 test result(s) without anomalies
```

 ChaCha(4) passed TestU01 small crush and full crush. It passed PractRand using up to 512 GB.

# Inverse congruence RNG

Linear congruence random number generators have the form

xn+1 = a xn + b mod p

Inverse congruence generators have the form

xn+1 = a xn-1 + b mod p

were x-1 means the modular inverse of x, i.e. the value y such that xy = 1 mod p. It is possible that x = 0 and so it doesn’t have an inverse. In that case the generator returns b.

Linear congruence generators are quite common. Inverse congruence generators are much less common.

Inverse congruence generators were first proposed by J. Eichenauer and J. Lehn in 1986. (The authors call them inversive congruential generators; I find “inverse congruence” easier to say than “inversive congruential.”)

An advantage of such generators is that they have none of the lattice structure that can be troublesome for linear congruence generators. This could be useful, for example, in high-dimensional Monte Carlo integration.

These generators are slow, especially in Python. They could be implemented more efficiently in C, and would be fast enough for applications like Monte Carlo integration where random number generation isn’t typically the bottleneck.

The parameters a, b, and p have to satisfy certain algebraic properties. The parameters I’ll use in the post were taken from . With these parameters the generator has maximal period p.

The generator naturally returns integers between 0 and p. If that’s what you want, then the state and the random output are the same.

If you want to generate uniformly distributed real numbers, replace x above with the RNG state, and let x be the state divided by p. Since p has 63 bits and a floating point number has 53 significant bits, all the bits of the result should be good.

Here’s a Python implementation for uniform floating point values. We pass in x and the state each time to avoid global variables.

```    from sympy import mod_inverse

p = 2**63 - 25
a = 5520335699031059059
b = 2752743153957480735

def icg(pair):
x, state = pair
if state == 0:
return (b/p, b)
else:
state = (a*mod_inverse(state, p) + b)%p
return (state/p, state)
```

If you’re running Python 3.8 or later, you can replace

`    mod_inverse(state, p)`

with

`    pow(state, -1, p)`

and not import `sympy`.

Here’s an example of calling `icg` to print 10 floating point values.

```    x, state = 1, 1
for _ in range(10):
(x, state) = icg((x, state))
print(x)
```

If we want to generate random bits rather than floating point numbers, we have to work a little harder. If we just take the top 32 bits, our results will be slightly biased since our output is uniformly distributed between 0 and p, not between 0 and a power of 2. This might not be a problem, depending on the application, since p is so near a power of 2.

To prevent this bias, I used a acceptance-rejection sampling method, trying again when the method generates a value above the largest multiple of 232 less than p. The code should nearly always accept.

```    T = 2**32
m = p - p%T

def icg32(pair):
x, state = pair
if state == 0:
return (b % T, b)
state = (a*mod_inverse(state, p) + b)%p
while state > m:  # rare
state = (a*mod_inverse(state, p) + b)%p
return (state % T, state)
```

You could replace the last line above with

`    return state >> (63-32)`

which should be more efficient.

The generator passes the four most commonly used test suites

• DIEHARDER
• NIST STS
• PractRand
• TestU01

whether you reduce the state mod 32 or take the top 32 bits of the state. I ran the “small crush” battery of the TestU01 test suite and all tests passed. I didn’t take the time to run the larger “crush” and “big crush” batteries because they require two and three orders of magnitude longer.

## More on random number generation

 Jürgen Eichenauer-Herrmann. Inversive Congruential Pseudorandom Numbers: A Tutorial. International Statistical Review, Vol. 60, No. 2, pp. 167–176.

# TestU01 small crush test suite In recent posts I’ve written about using RNG test suites on the output of the μRNG entropy extractor. This is probably the last post in the series. I’ve looked at NIST STS, PractRand, and DIEHARDER before. In this post I’ll be looking at TestU01.

TestU01 includes three batteries of tests: Small Crush, Crush, and Big Crush. The entropy extractor failed the smallest of the three, so I didn’t go on to the larger suites. Small Crush isn’t small; it used over 22 billion 32-bit samples as input, about 0.84 GB of data. Crush uses two orders of magnitude more data, and Big Crush uses another order of magnitude more data than Crush.

SmallCrush consists of 10 tests:

• `smarsa_BirthdaySpacings`
• `sknuth_Collision`
• `sknuth_Gap`
• `sknuth_SimpPoker`
• `sknuth_CouponCollector`
• `sknuth_MaxOft`
• `svaria_WeightDistrib`
• `smarsa_MatrixRank`
• `sstring_HammingIndep`
• `swalk_RandomWalk1`

The test names begin with `s`, followed by a prefix indicating the origin of the test. For example, `knuth` refers to Donald Knuth’s tests in volume 2 of TAOCP and `marsa` refers to George Marsaglia. The remainder of the name is more descriptive, such as `SimpPoker` for Knuth’s simple poker test.

The output of the entropy extractor failed four of the tests, failure being defined as producing a p-value less than 10-300. The other tests passed without issue, meaning they returned p-values in the range [0.001, 0.999].

Recall from earlier posts that μRNG entropy extractor takes three possibly biased bit streams and produces an unbiased bit stream, provided each of the input streams has min-entropy of at least 1/3. I produced biased streams by taking the bitwise OR of two consecutive values, producing a stream with probability 0.75 of being a 1 and probability 0.25 of being a 0. The result passed all STS and DIEHARDER tests, but failed some PractRand and Test01 SmallCrush tests. This is consistent with the generally held opinion that STS and DIEHARDER are relatively weak tests and PractRand and TestU01 are more rigorous tests.

I applied the entropy extractor to PCG without creating a biased stream, and the result passed PractRand and TestIU01 SmallCrush. Presumably it would have passed STS and DIEHARDER as well. This confirms that the extractor does no harm to a high-quality stream of pseudorandom bits. It largely removes the bias from biased streams, enough to pass the easier two test suites but not enough to pass the two more demanding test suites.

# DIEHARDER test suite

The first well-known suite of tests for random number generators was George Marsalia’s DIEHARD battery. The name was a pun on DieHard car batteries. Robert G. Brown took over the DIEHARD test suite and called it DIEHARDER, presumably a pun on the Bruce Willis movie.

I’ve written lately about an entropy extractor that creates a stream of unbiased bits from streams of biased bits. The output passes the NIST STS tests but fails the PractRand tests decisively. I was curious to see how it did on DIEHARDER, whether the results would be more like STS or like PractRand.

Turns out they’re more like STS. All tests pass, with the only minor anomaly being that one test out of 114 tests had a “weak” pass.

It’s remarkable that random number generators are even possible. They’re deterministic programs that produce output that for many practical purposes behaves as if it’s random. The entropy extractor convincingly imitates random output as far as the DIEHARDER tests are concerned, and as far as the STS tests are concerned, and even 2/3 of the PractRand tests. The remaining 1/3 of the PractRand tests, however, are not impressed.

It would be too simplistic to say that an RNG is “bad” because it fails a particular test. It would be even worse to assume and RNG is “good” because it passes a test or suite of tests. Good and bad depend on suitability for a given tasks. But all other things being equal, you can have more confidence in a generator that passes more tests.

# Using PractRand to test an RNG

Yesterday I wrote about my experience using NIST STS to test an entropy extractor, a filtering procedure that produces unbiased bits from biased sources. This post will look at testing the same entropy extractor using the Practically Random (PractRand) test suite. The results were much worse this time, which speaks to the limitations of both the entropy extractor and the STS test suite.

## Difficulties with PractRand

I built PractRand on Linux with no difficulty. There were multiple warnings but no errors. But when I piped a file of bits into `RNG_test` I got a message saying “error reading standard input”.

A couple years ago I ran PractRand using my Mac with no problems (results here) so I switched over to the machine I’d used before. The latest version of PractRand (pre95) produces compile errors, but the version I’d used before (93) compiled successfully.

Piping bits to `RNG_test` from the command line worked when I used the program `RNG_output` included with PractRand, but piping my own file of bits to the test suite failed with the same error I saw on Linux. I applied a patch posted by Daniel Lemire and was able to pipe my file to the test program. I went back to Linux to see if I could get version 93 to work there, but no luck.

Testing bits from a file is unusual; it’s faster to generate random numbers and send them directly to the test without going to disk and back, and so naturally that’s what people will do more often. But I wanted to use bits from a file because I need to do that on another project.

## Test results

The entropy extractor failed 47 out of 151 tests using 128 MB of data. Five other tests reported anomalies ranging from “mildly suspicious” to “very suspicious.” The tests that failed had p-values of 10-16 and smaller. Most of the p-values were less than can be represented by a 64-bit floating point number. PractRand must use extended precision to report just how infinitesimally small some of the p-values are.

To make sure the test failures weren’t due to some anomaly with piping the file into `RNG_test` I repeated my procedure using output from the Melissa O’Neill’s PCG generator. All tests passed with no anomalies.

## First do no harm

What if you run the entropy extractor on a high-quality source of random bits? When I reran the experiment on three streams from PCG, the result passed all PractRand tests without any anomalies. This suggests that applying the μRNG entropy extractor improves random streams, though perhaps not as much as we’d like, and at a minimum it makes streams no worse.

# Testing entropy extractor with NIST STS

Around this time last year I wrote about the entropy extractor used in μRNG. It takes three biased random bit streams and returns an unbiased bit stream, provided each input stream as has least 1/3 of a bit of min-entropy.

I’ve had in the back of my mind that I should go back and run the output of the extractor through a standard test suite. I’ve been doing more RNG testing lately, and so while the software is fresh on my mind I wanted to go back and test the entropy extractor.

To create a biased bit stream, I first created an unbiased bit stream using the PCG random number generator, then took the bitwise OR of consecutive samples. In C notation, I created two 32-bit unsigned integers `u` and `v` and used `u|v`. The resulting bits are 1’s three out of four times since a bit is 0 only if both corresponding bits are 0’s. The min-entropy of the resulting stream is

-log2 max(0.25, 0.75) = 0.415

which is larger than 1/3, so we should be able to apply the μRNG entropy extractor. To do this, we create three bit streams. Let a, b, and c be 8-bit bytes, each from a different stream. Then we combine these into

a×b + c

and use that as one byte of output. So it takes three bytes of input to produce one byte of output, which is to be expected since we’re starting with sources that may contain only 1/3 of a bit of min-entropy per bit of output.

The multiplication and addition above are carried out in the Galois field GF(28). This means that multiplication may be like nothing you expect, and addition is XOR, i.e. bitwise exclusive OR. The multiplication is the same at that used in AES encryption.

## NIST Statistical Test Suite

There are several test suites we could use—DIEHARDER, PractRand, TestU01, etc.—and expect I’ll write more about those before long, but for this post we’ll focus on the NIST Statistical Test Suite or STS. (Update: the extractor fails hard on PractRand.) The STS suite includes the following 15 tests.

• Frequency (monobit)
• Frequency test within a block
• Runs test
• Test for the longest run of ones in a block
• Binary matrix rank test
• Discrete Fourier Transform (spectral) test
• Non-overlapping template matching test
• Overlapping template matching test
• Maurer’s universal statistical test
• Linear complexity test
• Serial test
• Approximate entropy test
• Cumulative sums (cumsum) test
• Random excursions test
• Random excursions variant test

See NIST Special Publication 800-22: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications.

When we run the input of PCG alone through STS, it passes with flying colors. If we run the output of PCG after OR’ing pairs of unsigned integers together, the test fails spectacularly, spewing underflow warnings. But if we take three streams of biased bits that would each fail and keep the extracted output, it passes all the tests, just as with the original output of PCG.

In each case I tested one million 32-bit unsigned integers. In the biased case I sampled three million integers

Note that in our example we know the amount of bias because we deliberately created the bias. You might not know the bias, or equivalently the min-entropy, in general. As long as the min-entropy is greater than 1/3, the entropy extractor works.

# Detecting a short period in an RNG

The last couple posts have been looking at the Cliff random number generator. I introduce the generator here and look at its fixed points. These turn out to be less of a problem in practice than in theory.

Yesterday I posted about testing the generator with the DIEHARDER test suite, the successor to George Marsaglia’s DIEHARD battery of RNG tests. I don’t think anyone is using the Cliff RNG for serious work, but people are definitely using DIEHARDER.

The Cliff RNG has a short period, and yet many of the DIEHARDER tests passed. However, several of the tests failed as well, and perhaps these tests failed due to the short period, in particular rgb_lagged_sum. But at least some tests can pass despite a short period.

## Finding the period

Since the Cliff RNG maintains internal state as a floating point number and outputs integers, the period is a bit subtle.

The state of the Cliff RNG is a floating point number between 0 and 1, and so it has 253 possible values. (See Anatomy of a floating point number.) That means the maximum possible period is 253. We use the internal state x to create 32-bit integers n by multiplying x by 232 and truncating to an integer value. The maximum period could conceivably be larger than 232 because an output value n could repeat but correspond to two different x‘s. The actual period, at least in my experiment, was between 222 and 223.

I seeded the Cliff RNG with π – 3 (why?) and found that starting with index i = 3,075,302, the output values repeat with period p = 6,705,743. So there was a burn-in period before the state entered a cycle, but it would repeat that cycle forever. Not only are the output values repeating, the state values x repeat. (Otherwise it would be possible for the integer outputs to cycle for a while then break out.)

It’s possible that starting with other seeds, the generator has other cycle lengths, longer or shorter. But I stumbled across one cycle of period 6,705,743.

## Testing RNGs

I wrote a chapter for O’Reilly’s book Beautiful Testing in which I discuss How to test a random number generator. More specifically, now to test a non-uniform random number generator. That is, starting with a trusted uniform random number generator, transform the values to have some other probability distribution. This is the most common scenario. Few developers write their own core RNG, but it’s fairly common to have to use a core RNG to create a custom distribution.

If you do want to test a uniform random number generator, as I do in this post, there are test suites like DIEHARDER. This is one of the oldest and best known test suites. There are newer and more rigorous suites, like TestU01 that I blog about here. There’s also the NIST statistical test suite that I write about in the same post.

These tests are challenging to build and use. I run these tests for clients and help them interpret the results, and you’d like for me to do that for you, let’s talk.

# Testing Cliff RNG with DIEHARDER

My previous post introduced the Cliff random number generator. The post showed how to find starting seeds where the generator will start out by producing approximately equal numbers. Despite this flaw, the generator works well by some criteria.

I produced a file of s billion 32-bit integers by multiplying the output values, which were floating point numbers between 0 and 1, by 232 and truncating to integer. Then I ran the DIEHARDER random number generator test suite.

The results were interesting. Before running the tests, I thought the tests would nearly all pass or nearly all fail, more likely the latter. But what happened was that many tests passed and some failed hard .

Here’s a list of the tests that passed:

• diehard_birthdays
• diehard_rank_32x32
• diehard_rank_6x8
• diehard_bitstream
• diehard_oqso
• diehard_dna
• diehard_count_1s_str
• diehard_count_1s_byt
• diehard_runs
• sts_monobit
• sts_serial
• rgb_bitdist
• rgb_kstest_test
• dab_dct
• dab_filltree2
• dab_monobit2

The tests that failed were:

• diehard_parking_lot
• diehard_2sphere
• diehard_3sphere
• diehard_squeeze
• diehard_craps
• marsaglia_tsang_gcd
• rgb_lagged_sum
• dab_bytedistrib

I’ve left out a few test results that were ambiguous as well as tests that were described as “Suspect” and “Do not use” on the DIEHARDER website.

The site I mentioned in the previous post where I ran across this generator said that it passed a spherical generation test. I assume the implementation of that test was less demanding that the version included in DIEHARD. But the generator does well by other tests.

The lagged sum test tests for autocorrelation. There’s a good reason for this: the generator has a short period, as I discuss here. The lagged sum test fails because the output has perfect autocorrelation at a lag equal to the period.

This generator demonstrates how passing a few RNG tests can be misleading. Or to look at it a different way, it shows how a generator can be useful for some tasks and terrible for other tasks.

## More on testing RNGs

 By “failed hard” I mean the test return a p-value of zero. The p-value couldn’t actually be zero, but it was close enough that it the displayed value was exactly zero.