“Reproducible” and “randomized” don’t seem to go together. If something was unpredictable the first time, shouldn’t it be unpredictable if you start over and run it again? As is often the case, we want incompatible things.
But the two can be reconciled. Why would we want a randomized controlled trial (RCT) to be random, and why would we want it to be reproducible?
One purpose of randomization in experiments is the hope of scattering complicating factors evenly between groups. For example, one way to test two drugs on 1,000 people would be to gather 1,000 people and give the first drug to all the men and the second to all the women. But maybe a person’s sex has something to do with how the drug acts. If we randomize between two groups, it’s likely that about the same number of men and women will end up in each group.
The example of sex as a factor is oversimplified because there’s reason to suspect a priori that sex might make a difference in how a drug performs. The bigger problem is that factors we can’t anticipate or control may matter, and we’d like them scattered evenly between the two treatment groups. If we knew what the factors were, we could ensure that they’re evenly split between the groups. The hope is that randomization will do that for us with factors we’re unaware of. For this purpose we don’t need a process that is “truly random,” whatever that means, but a process that matches our expectations of how randomness should behave. So a pseudorandom number generator (PRNG) is fine. No need, for example, to randomize using some physical source of randomness like radioactive decay.
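To make the balancing idea concrete, here is a small sketch (the numbers and field names are hypothetical) that shuffles 1,000 subjects into two arms with a seeded, and therefore reproducible, PRNG and counts how the sexes fall:

```python
import random

# Hypothetical illustration: 1,000 subjects, 480 of them men, randomized
# into two arms of 500 with a seeded (hence reproducible) PRNG.
random.seed(2023)

subjects = [{"id": i, "sex": "M" if i < 480 else "F"} for i in range(1000)]
random.shuffle(subjects)
arm_a, arm_b = subjects[:500], subjects[500:]

men_a = sum(s["sex"] == "M" for s in arm_a)
men_b = sum(s["sex"] == "M" for s in arm_b)
# men_a and men_b will each land near 240, though not exactly equal
```

The counts won’t match exactly, but randomization makes a badly lopsided split unlikely, and that is all we asked of it.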
Another purpose of randomization is for the assignments to be unpredictable. We want a physician, for example, to enroll patients in a clinical trial without knowing what treatment they will receive. Otherwise there could be a bias, presumably unconscious, against enrolling patients with poor prognosis if the physicians know the next treatment will be the one they hope or believe is better. Note here that the randomization only has to be unpredictable from the perspective of the people participating in and conducting the trial. The assignments could be predictable, in principle, by someone not involved in the study.
And why would you want randomization assignments to be reproducible? One reason would be to test whether randomization software is working correctly. Another might be to satisfy a regulatory agency or some other oversight group. Still another reason might be to defend your randomization in a lawsuit. A physical random number generator, such as using the time down to the millisecond at which the randomization is conducted, would achieve random assignments and unpredictability, but not reproducibility.
Computer algorithms for generating random numbers (technically pseudo-random numbers) can achieve reproducibility, practically random allocation, and unpredictability. The randomization outcomes are predictable, and hence reproducible, to someone with access to the random number generator and its state, but unpredictable in practice to those involved in the trial. The internal state of the random number generator has to be saved between assignments and passed back into the randomization software each time.
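As a sketch of that state-passing mechanism, here is how it might look with Python’s standard generator. Python’s default generator is a Mersenne Twister, so this particular state object is large, but the save/restore pattern is the same for any PRNG; the function name and the 50/50 allocation are illustrative assumptions, not the article’s actual software.

```python
import random

def assign_next(state):
    """Restore the saved state, make one assignment, return the new state.
    Illustrative sketch; the 50/50 two-arm allocation is an assumption."""
    rng = random.Random()
    rng.setstate(state)
    arm = "A" if rng.random() < 0.5 else "B"
    return arm, rng.getstate()

state = random.Random(42).getstate()  # persisted between enrollments
arm1, state = assign_next(state)      # first patient enrolls
arm2, state = assign_next(state)      # second patient, perhaps weeks later
```

Because the state is saved after every call, the entire assignment sequence can be replayed later from the original seed.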
Random number generators such as the Mersenne Twister have good statistical properties, but they also carry a large amount of state. The random number generator described here has very small state, 64 bits, so storing and returning the state is simple. If you needed to generate a trillion random samples, the Mersenne Twister would be preferable, but since RCTs have far fewer than a trillion subjects, the RNG in the article is perfectly fine. I have run the DieHarder random number generator quality tests on this generator and it performs quite well.
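The article doesn’t reproduce the generator itself, but as an example of what a 64-bit-state PRNG looks like, here is Marsaglia’s xorshift64 (not necessarily the generator the article refers to; the seed below is Marsaglia’s own example value):

```python
MASK = (1 << 64) - 1  # keep arithmetic within 64 bits

def xorshift64(state):
    """One step of Marsaglia's xorshift64. The new state doubles as the
    output, and the entire state is a single nonzero 64-bit integer."""
    state ^= (state << 13) & MASK
    state ^= state >> 7
    state ^= (state << 17) & MASK
    return state

state = xorshift64(88172645463325252)  # advance one step from a seed
```

Persisting the generator between assignments then means storing one 64-bit integer, rather than the roughly 2.5 KB of Mersenne Twister state.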
Need help with randomized trials? Let’s talk.
7 thoughts on “Reproducible randomized controlled trials”
Thanks for the article. It raises some interesting points.
But I disagree with the statement, “A physical random number generator, such as using the time down to the millisecond at which the randomization is conducted would achieve random assignments and unpredictability, but not reproducibility.”
If you record the random sequence, you can reproduce it.
Isn’t the sole underlying reason for reproducibility to be evidence that whatever insights were made generalize to the overall population?
An oft-overlooked reason to want reproducibility of (pseudo-)random number streams is that there are useful variance reduction techniques that can be applied if you are able to produce positively or negatively correlated pseudorandom sequences. This is most often used in the context of monte carlo simulations, but can be applied to other estimation problems as well.
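A sketch of one such technique, antithetic variates, applied to estimating E[exp(U)] for U uniform on (0, 1), whose exact value is e − 1 (the function names and sample sizes are made up for illustration):

```python
import math
import random

def plain_mc(n, seed):
    """Ordinary Monte Carlo estimate of E[exp(U)]."""
    rng = random.Random(seed)
    return sum(math.exp(rng.random()) for _ in range(n)) / n

def antithetic_mc(n, seed):
    """Pair each draw u with 1 - u. Since exp is monotone, the paired
    outputs are negatively correlated, so their average has lower
    variance than two independent draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n // 2):
        u = rng.random()
        total += math.exp(u) + math.exp(1.0 - u)
    return total / (2 * (n // 2))

exact = math.e - 1  # ≈ 1.71828
```

For the same number of function evaluations, the antithetic estimate typically lands much closer to e − 1 than the plain one, and reproducible seeding is what makes such paired comparisons possible at all.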
I’ve never understood the fixation on “physical randomness”. It seems highly likely to me that any physical apparatus is going to have inadvertent induced nonrandomness, while a good PRN will be both well-understood and (probably) more ‘random’ in the ways we care about.
dod: Here I’m talking about reproducing the randomization sequence, something I’ve had to do before on clinical trials. There’s another sort of reproducibility that is more like what you’re talking about, repeating the experiment to see whether the results can be independently confirmed.
David: I agree re physical randomness. It has to be measured, and there’s probably more bias in the measurement than in a PRNG. I think the objection is philosophical rather than practical. People want, or think they want, something that is “really random.” But in practice you have more tangible goals.
One thing I think left out of all this is SAMPLE SIZE… Randomization is only likely to adequately ‘scatter complicating factors’ if the sample size is large enough to do so. In complicated studies (like many clinical ones) the variables, especially unrecognized ones, are soooo many I suspect sample sizes are generally way too small.
Reproducibility is far more difficult to achieve than simply having the random seed. One also must save precisely the data that the random generator was applied to, which is harder than it looks.
Individuals come in and are excluded over time. If the generator was applied to a batch, in what order were individuals presented and associated with the RNG output? This may depend on the internal representation in the database, which will sometimes change over time, even if the set of individuals does not, and the old state is lost. How about calls to the RNG that were part of a workflow which somehow failed and did not result in an assignment, with the software silently recovering with another call to the RNG?
Extremely careful snapshotting of *everything* is required to have any hope of reproducibility.
(hit Post too soon)
I found it much more practical to have assignment be a function of the individual, as in:
Treatment <- (hash(individual_id) %% 100) < treat_percentage
This still has issues, but it's very "random" and very deterministic at the same time.
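A Python sketch of this approach (SHA-256 stands in for the commenter’s hash(); the ID format and 50% threshold are made up):

```python
import hashlib

def assign(individual_id, treat_percentage=50):
    """Deterministic assignment as a pure function of the subject's ID.
    SHA-256 stands in for a generic hash; True means treatment arm."""
    digest = hashlib.sha256(str(individual_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < treat_percentage

# The same ID always maps to the same arm, regardless of enrollment
# order, database representation, or failed-and-retried workflows.
```

This sidesteps the state-snapshotting problem entirely: reproducing an assignment requires only the ID and the threshold, not a replay of the generator’s history.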