When people ask for a random sequence, they’re often disappointed with what they get.
Random sequences clump more than most folks expect. For graphical applications, quasi-random sequences may be more appropriate. These sequences are “more random than random” in the sense that they behave more like what some folks expect from randomness. They jitter around like a random sequence, but they don’t clump as much.
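As a rough illustration, here is a sketch in Python of one of the simplest quasi-random constructions, the base-2 van der Corput sequence. The particular sequence and function name are just one example of the idea, not a recommendation.

    def van_der_corput(n, base=2):
        """Return the n-th element of the base-b van der Corput sequence in [0, 1)."""
        q, denom = 0.0, 1.0
        while n > 0:
            denom *= base
            n, rem = divmod(n, base)
            q += rem / denom
        return q

    # The first few points spread out evenly instead of clumping:
    print([round(van_der_corput(i), 4) for i in range(1, 9)])
    # 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625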
Researchers conducting clinical trials are dismayed when a randomized trial puts several patients in a row on the same treatment. They want to assign patients one at a time to one of two treatments with equal probability, but they also want the allocation to work out evenly. This is like saying you want to flip a coin 100 times, and you also want to get exactly 50 heads and 50 tails. You can’t guarantee both, but there are effective compromises.
One approach is to randomize in blocks. For example, you could randomize in blocks of 10 patients by taking a sequence of 5 A's and 5 B's and randomly permuting the 10 letters. This guarantees that the allocations will be balanced, but some outcomes will be predictable. At a minimum, the last assignment in each block is always predictable: you assign whatever is left. Assignments could be even more predictable: if you give n A's in a row in a block of 2n, you know the last n assignments will be all B's.
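Here is a minimal sketch of the block idea in Python, using random.shuffle; the function name and the choice of three blocks are just for illustration.

    import random

    def permuted_block(n_blocks, block=("A",) * 5 + ("B",) * 5):
        """Permuted-block randomization: each block of 10 holds exactly 5 A's and 5 B's."""
        assignments = []
        for _ in range(n_blocks):
            b = list(block)
            random.shuffle(b)   # random order within the block
            assignments.extend(b)
        return assignments

    print("".join(permuted_block(3)))   # e.g. ABBABAABBABABBAAABAB...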
Another approach is to “encourage” balance rather than enforce it. When you’ve given more A's than B's you could increase the probability of assigning a B. The greater the imbalance, the more heavily you bias the randomization probability in favor of the treatment that has been assigned less. This is a sort of compromise between equal randomization and block randomization. All assignments are random, though some assignments may be more predictable than others. Large imbalances are less likely than with equal randomization, but more likely than with block randomization. You can tune how aggressively the method responds to imbalances in order to make the method more like equal randomization or more like block randomization.
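A minimal sketch of one way to encourage balance, assuming a simple linear bias rule; the particular bias function, its strength parameter, and the clamping are arbitrary choices for illustration, not a prescription.

    import random

    def biased_coin_assignments(n, strength=0.1):
        """Assign A or B one patient at a time, nudging the probability toward the
        under-represented arm. 'strength' controls how aggressively imbalance is corrected."""
        counts = {"A": 0, "B": 0}
        out = []
        for _ in range(n):
            imbalance = counts["A"] - counts["B"]
            p_a = min(max(0.5 - strength * imbalance, 0.05), 0.95)  # clamp away from 0 and 1
            arm = "A" if random.random() < p_a else "B"
            counts[arm] += 1
            out.append(arm)
        return out

    print("".join(biased_coin_assignments(20)))

Setting strength to zero gives equal randomization; making it large pushes the behavior toward block-like balance.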
No approach to randomization will satisfy everyone because there are conflicting requirements. Randomization is a dilemma to be managed rather than a problem to be solved.
Isn’t a more fundamental problem the method by which random numbers are generated in the first place? This is the digital age, and we don’t look to the quantum for our randomness. We look to a nice and tidy computer program, what we call pseudo-random, for the sake of simplicity and ease, and by doing so compromise the very nature of randomness. It isn’t random to begin with – not truly. And this is how we derive our assignments, is it not?
I’m only suggesting that we first need true randomness; we need to solve the more fundamental issue of how we get random numbers before we worry about this “clumping” issue. We may find that the clumping issue becomes moot, dissolves right away, if our numbers were truly random to begin with. I know that probably sounds mystical, and perhaps I misunderstand, but frankly we don’t understand the nature of randomness in the first place. So how can we base anything on it? How can we assume randomness from what we already know and can admit to be pseudo-randomness?
But then again, what is randomness? Is random truly “random”, as we would like to believe it is? Or is random merely the inability of a human being to predict, because the system is too complex, there are far too many variables to measure, and human brains are too slow to carry out the necessary computations? Because if the latter is true, then any Newtonian system is random if sufficiently unpredictable. The physics associated with the roll of a die is pretty straightforward, and yet it is one of the most common examples of what we deem to be true randomness.
And it just dawned on me. If randomness were merely the state of being outside of human ability to predict in a timely fashion, then, for all intents and purposes, pseudo-random IS random in every sense of the word, albeit perhaps to a lesser degree.
Well, first, in reply to Cogito above, I’ll just tidy up what he said: there may exist a random function R(x) that captures what was conceptualized in the paragraphs above.
Second, regarding your post, the first example (blocks of 10): I believe that is just a step function unless the end time/number of patients/rate of patients is unknown. If some or all of those factors are unknown, then it is a Poisson distribution (not random).
The second example you discuss is a Gaussian distribution (not random again). There, with varying weights, well, that is just a Gaussian measure concept…
Anyhow, so neither is random, but why do you need randomness for clinical tests? If you are treating a disease and want to know E[x] or conditional probabilities of drug treatment given disease, why do you need randomness? Can you not generate relevant probabilities for outcomes relative to a known E[x] (expectation of disease, i.e. 1:1000 people have this disease)? What kinds of probabilities are being inferred in clinical studies?
You have a really low calibre of commenters, but they seem to think they are intelligent! (myself included)
Quality varies, but often the comments are very good. I’ve learned a lot from the comments.
There are some excellent cartoons on the subject of “technically random, yet psychologically unsatisfying”.
Dilbert: http://dilbert.com/strips/comic/2001-10-25/
XKCD: http://xkcd.com/221/
The second method you mention is, interestingly, used in some computer games. A character has a chance of landing a “critical hit” with each attack. In a player-versus-player game, a fight being decided by a lucky sequence of critical hits is often seen as undesirable from the designer’s perspective, so this method is applied in order to reduce the probability of extreme outcomes.
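Something like the following Python sketch, assuming a rule where the crit chance rises after every miss and resets on a crit; the increment value is made up for illustration and isn’t tuned to preserve the average rate.

    import random

    def simulate_crits(attacks, base_chance=0.1, increment=0.1):
        """Each miss raises the crit chance; a crit resets it, so long streaks
        of crits (or long droughts) become much rarer than with independent rolls."""
        chance, hits = base_chance, []
        for _ in range(attacks):
            crit = random.random() < chance
            hits.append(crit)
            chance = base_chance if crit else min(chance + increment, 1.0)
        return hits

    print(sum(simulate_crits(1000)))   # fewer extreme streaks than independent 10% rolls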
I have lately been working on Monte Carlo integration for computer graphics, and yes, this “purely random” method converges poorly to the ground truth due to the clumping you mentioned. One slightly better method is “stratified sampling”, which reduces the variance over the purely random method by splitting the domain into equal-size sub-domains (strata) and taking random samples within each stratum. You can also reduce variance with importance sampling, which distributes samples based on a probability density function (PDF) and is often fed with a Hammersley quasi-random sequence for faster convergence.
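For example, a quick sketch comparing plain and stratified Monte Carlo in Python; the integrand and sample count are arbitrary choices for illustration.

    import random, math

    def plain_mc(f, n):
        """Plain Monte Carlo estimate of the integral of f over [0, 1]."""
        return sum(f(random.random()) for _ in range(n)) / n

    def stratified_mc(f, n):
        """Stratified estimate: one uniform sample inside each of n equal-width strata."""
        return sum(f((i + random.random()) / n) for i in range(n)) / n

    f = math.sin   # integral of sin on [0, 1] is 1 - cos(1) ≈ 0.4597
    print(plain_mc(f, 100), stratified_mc(f, 100), 1 - math.cos(1))

Over repeated runs the stratified estimate stays much closer to the true value, because no stratum can be over- or under-sampled.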
Cogito asked for true random sequences.
That used to be done using, effectively, valve noise, for ERNIE in the UK. But it’s quite hard to guarantee randomness with a physical system. First, what we mostly want is a random number in a range (usually 0,1), not just any number (you could ask for a normal distribution, but the extremes would be under-represented – the observable universe is finite and the normal distribution isn’t). So we’d have to constrain most physical ‘random’ systems – which tends to mess the distribution up or result in non-random censoring. Second, we usually ask for a uniform random distribution in (0,1); it’s hard to find any physical system at all that does that.
Finally, though, for most experimental work we need only ‘random enough’, and that is not very demanding. By contrast, the best algorithms for (pseudo) random number generation do very well indeed when tested for randomness and have very long repeat lengths. Check out the repeat length for things like the extended Wichmann-Hill, for example, or the most recent Mersenne Twisters. They have no trouble generating a sample for a 200-patient clinical trial.
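For instance, CPython’s random module is a Mersenne Twister (period 2^19937 − 1), and drawing 200 assignments from it is trivial; the seed below is arbitrary.

    import random

    rng = random.Random(42)   # CPython's random module uses the Mersenne Twister
    assignments = [rng.choice("AB") for _ in range(200)]
    print("".join(assignments))
    print(assignments.count("A"), assignments.count("B"))   # close to 100/100, rarely exact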
So we really are in a world where the most important question is about representativeness more than randomisation.
I think that Cogito misses John’s excellent point, which is that when we express a desire for “random”, we can easily overlook additional requirements for the resulting sequence, ascribing quality to the randomness, rather than to the sampling.
Suppose I have a classroom of 30 kids and want to randomly assign them to one of two teams. One random sequence might have 7 on team A and 23 on team B. Focusing on “fixing” the RNG is the wrong thing to do here. It makes no difference whether the generator is pseudo-random, or if it comes from a quantum source — the true problem is that the additional requirement of team balance was overlooked.
In this case, one may make the mistake of stating that a sequence that generates balanced teams is not “truly random”, but again, that’s not necessarily true. There is certainly a set of all possible balanced teams, and it is possible to randomly choose from that set. If your model of the problem is that you must iterate over the students and randomly select their team, then it’s not the random number generator that’s the problem, it’s your model of the solution space, and the way you’ve included undesirable answers in the space of possible candidates.
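For example, here is a sketch of sampling uniformly from the set of balanced splits, rather than flipping a coin per student; the roster names are placeholders.

    import random

    students = [f"student_{i}" for i in range(30)]   # hypothetical class roster
    random.shuffle(students)                          # a uniformly random ordering
    team_a, team_b = students[:15], students[15:]     # every balanced 15/15 split is equally likely

The result is still random in every meaningful sense; we have simply chosen a solution space that excludes the unbalanced outcomes we never wanted.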
Similarly for jittered sampling in computer graphics. Jittered sampling (a random but limited deviation from a grid of pixels) is no less random than allowing complete freedom in the sample points: it’s just the model that is different.
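A sketch of what that looks like; the grid dimensions are arbitrary.

    import random

    def jittered_samples(nx, ny):
        """One random sample inside each cell of an nx-by-ny grid over the unit square."""
        return [((i + random.random()) / nx, (j + random.random()) / ny)
                for i in range(nx) for j in range(ny)]

    points = jittered_samples(4, 4)   # 16 points, spread out but still random within each cell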
It reminds me of a story I read a long time ago. A teacher challenged her class: I will leave the room, and you shall make 100 tosses with a coin and also make up a random sequence of heads and tails; write both sequences on the blackboard, and then I will come back and guess which is which. She would always guess correctly. The secret was that she would look for the longest stretch of heads or tails: that would be the genuine random sequence!
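Her heuristic is easy to code up; something like this sketch. In 100 genuine fair tosses a run of six or more is quite likely, while made-up sequences rarely contain one.

    def longest_run(seq):
        """Length of the longest stretch of identical symbols, e.g. the longest run of H or T."""
        best = run = 1
        for prev, cur in zip(seq, seq[1:]):
            run = run + 1 if cur == prev else 1
            best = max(best, run)
        return best

    print(longest_run("HHTHTTTTTHTH"))   # 5 -- real coin flips contain surprisingly long runs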