When the rule for stopping an experiment depends on the data in the experiment, the results could be biased if the stopping rule isn’t taken into account in the analysis.
For example, suppose Alice wants to convince Bob that π has a greater proportion of even digits than odd digits.
Alice: I’ll show you that π has more even digits than odd digits by looking at the first N digits. How big would you like N to be?
Bob: At least 1,000. Of course more data is always better.
Alice: Right. And how many more even than odd digits would you find convincing?
Bob: If there are at least 10 more evens than odds, I’ll believe you.
Alice: OK. If you look at the first 2589 digits, there are 13 more even digits than odd digits.
Now if Alice wanted to convince Bob that there are more odd digits, she could do that too. If you look at the first 2077 digits, 13 more are odd than even.
No matter what two numbers Bob gives, Alice can find a sample size that will give the result she wants. Here’s Alice’s Python code.
    from mpmath import mp
    import numpy as np

    N = 3000
    mp.dps = N+2
    digits = str(mp.pi)[2:]

    parity = np.ones(N, dtype=int)
    for i in range(N):
        if digits[i] in ['1', '3', '5', '7', '9']:
            parity[i] = -1

    excess = parity.cumsum()
    print(excess[-1])
    print(np.where(excess == 13))
    print(np.where(excess == -13))
N is a guess at how far out she might have to look. If it doesn’t work, she increases it and runs the code again.
parity contains a 1 in positions where the digits of π (after the decimal point) are even and a -1 where they are odd. The cumulative sum shows how many more even than odd digits there have been up to a given point, a negative number meaning there have been more odd digits.
Alice thought that stopping when there are exactly 10 more of the parity she wants would look suspicious, so she looked for places where the difference was 13.
Here are the results:
    [ 126, 128, 134, …, 536, 2588, … 2726]
    [ 772, 778, 780, …, 886, 2076, … 2994]
There’s one minor gotcha. The array excess is indexed from zero, so Alice reports 2589 rather than 2588 because the 2589th digit has index 2588.
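Alice’s search can be wrapped in a small helper. This is a sketch, not Alice’s actual code: the function name first_sample_size and its arguments are made up for illustration, and it takes the digits as a plain string so that it needs only the standard library. It returns a sample size counted from 1, which sidesteps the zero-indexing gotcha.

```python
from itertools import accumulate

def first_sample_size(digits, n_min, target):
    """Smallest n >= n_min where (evens - odds) among the first n digits
    equals target, or None if no such n exists in the digits supplied.

    Hypothetical helper for illustration; digits is a string of 0-9.
    """
    # +1 for an even digit, -1 for an odd digit
    steps = (1 if int(d) % 2 == 0 else -1 for d in digits)
    # accumulate gives the running excess; enumerate from 1 gives sample sizes
    for n, excess in enumerate(accumulate(steps), start=1):
        if n >= n_min and excess == target:
            return n
    return None
```

With digits taken from str(mp.pi)[2:] as in the code above, Bob’s “at least 1,000” corresponds to calling first_sample_size(digits, 1000, 13) for evens and first_sample_size(digits, 1000, -13) for odds. If the call returns None, Alice needs more digits.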
Bob’s mistake was that he specified a minimum sample size. By saying “at least 1,000” he gave Alice the freedom to pick the sample size to get the result she wanted. If he had specified an exact sample size, there probably would be either more even digits or more odd digits, but there couldn’t be both. And if he were sophisticated enough, he could pick an excess value that would be unlikely given that sample size.
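To get a feel for the difference between “exactly N” and “at least N”, here is a rough sketch under a simplifying assumption: digit parities are treated as independent fair coin flips. That is a modeling assumption for illustration, not a claim about π. The function names fixed_n_prob and at_least_n_prob, and the cap of 3,000 digits on Alice’s search, are invented for this example.

```python
from math import comb
import random

def fixed_n_prob(n, k):
    """P(evens - odds >= k) among exactly n fair, independent digits."""
    # evens - odds = 2*evens - n, so the event is evens >= ceil((n + k) / 2)
    min_evens = max(0, (n + k + 1) // 2)
    return sum(comb(n, e) for e in range(min_evens, n + 1)) / 2**n

def at_least_n_prob(n_min, n_max, k, trials=1000, seed=0):
    """Monte Carlo estimate of the chance that the excess is >= k
    at some sample size between n_min and n_max (inclusive)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        excess = 0
        for n in range(1, n_max + 1):
            excess += 1 if rng.random() < 0.5 else -1
            if n >= n_min and excess >= k:
                hits += 1   # Alice could stop here and report success
                break
    return hits / trials

print(fixed_n_prob(1000, 10))           # Bob fixes N = 1000
print(at_least_n_prob(1000, 3000, 10))  # Alice picks any N from 1000 to 3000
```

Under this model, an excess of at least 10 in exactly 1,000 digits happens more than a third of the time by chance alone, so Bob’s threshold was weak to begin with. Letting Alice choose where to stop can only raise her chances, and the simulated estimate should come out noticeably higher than the fixed-N value.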
This does not contradict the likelihood principle; it says that informative stopping rules should be incorporated into the likelihood function.