When the rule for stopping an experiment depends on the data in the experiment, the results could be biased if the stopping rule isn’t taken into account in the analysis [1].

For example, suppose Alice wants to convince Bob that π has a greater proportion of even digits than odd digits.

Alice: I’ll show you that π has more even digits than odd digits by looking at the first *N* digits. How big would you like *N* to be?

Bob: At least 1,000. Of course more data is always better.

Alice: Right. And how many more even than odd digits would you find convincing?

Bob: If there are at least 10 more evens than odds, I’ll believe you.

Alice: OK. If you look at the first 2589 digits, there are 13 more even digits than odd digits.

Now if Alice wanted to convince Bob that there are more odd digits, she could do that too. If you look at the first 2077 digits, 13 more are odd than even.
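Alice’s success isn’t special to π. The sketch below is a rough check. It models the digits as independent fair coin flips, which is an assumption and not a known property of π. On the same simulated sequences it compares two rules. The first requires an excess of at least 13 at exactly n = 1,000. The second is Alice’s: succeed if the excess equals 13 at *any* n between 1,000 and 3,000. The function `simulate` and its defaults are illustrative, not taken from Alice’s script.

```python
import numpy as np

def simulate(target=13, min_n=1000, max_n=3000, trials=1000, seed=1):
    """Compare a fixed-sample-size rule with an optional-stopping rule on
    simulated digit sequences (assumed fair coin flips: +1 even, -1 odd)."""
    rng = np.random.default_rng(seed)
    steps = rng.integers(0, 2, size=(trials, max_n)) * 2 - 1  # values in {-1, +1}
    excess = steps.cumsum(axis=1)  # running (#even - #odd) for each sequence

    # Fixed rule: look at exactly min_n digits and require excess >= target.
    fixed = np.mean(excess[:, min_n - 1] >= target)

    # Optional stopping: succeed if excess == target at ANY n in [min_n, max_n].
    flexible = np.mean((excess[:, min_n - 1:] == target).any(axis=1))
    return fixed, flexible

fixed, flexible = simulate()
print(f"fixed n: {fixed:.2f}, optional stopping: {flexible:.2f}")
```

Both rules see identical sequences, so any gap between the two rates comes from the stopping rule alone. The optional-stopping rate should come out noticeably higher.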

No matter what two numbers Bob gives, Alice can find a sample size that will give the result she wants. Here’s Alice’s Python code.

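```python
from mpmath import mp
import numpy as np

N = 3000
mp.dps = N + 2                # working precision: a couple of digits beyond N
digits = str(mp.pi)[2:]       # drop the leading "3." and keep the decimals

# +1 where the digit is even, -1 where it is odd
parity = np.ones(N, dtype=int)
for i in range(N):
    if digits[i] in ['1', '3', '5', '7', '9']:
        parity[i] = -1

# excess[i] = (#even - #odd) among the first i+1 digits
excess = parity.cumsum()
print(excess[-1])
print(np.where(excess == 13))
print(np.where(excess == -13))
```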

The number `N` is a guess at how far out she might have to look. If it doesn’t work, she increases it and runs the code again.

The array `parity` contains a 1 in positions where the digits of π (after the decimal point) are even and a -1 where they are odd. The cumulative sum shows how many more even than odd digits there have been up to a given point. A negative number means there have been more odd digits.
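Here is a small check of the parity and cumulative-sum idea on just the first ten decimal digits of π, 1415926535. The digits are hard-coded rather than computed with mpmath. The sketch uses `np.where` instead of the explicit loop and produces the same ±1 array.

```python
import numpy as np

# First ten decimal digits of pi, hard-coded for illustration.
digits = "1415926535"

# +1 for an even digit, -1 for an odd digit (vectorized form of the loop).
parity = np.where(np.array([int(d) for d in digits]) % 2 == 0, 1, -1)

# excess[i] = (#even - #odd) among the first i+1 digits.
excess = parity.cumsum()
print(excess.tolist())  # [-1, 0, -1, -2, -3, -2, -1, -2, -3, -4]
```

The final value of -4 reflects 3 even digits (4, 2, 6) against 7 odd ones.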

Alice thought that stopping when there are exactly 10 more of the parity she wants would look suspicious, so she looked for places where the difference was 13.

Here are the results:

```
[ 126,  128,  134, …,  536, 2588, …, 2726]
[ 772,  778,  780, …,  886, 2076, …, 2994]
```

There’s one minor gotcha. The array `excess` is indexed from zero, so Alice reports 2589 rather than 2588, because the 2589th digit has index 2588.
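Alice’s stopping rule, including the off-by-one conversion, fits in a small function. The name `alice_stop` is hypothetical. It returns the smallest sample size n ≥ `min_n`, counted in digits, at which the running excess equals `target`. The usage line runs it on a toy array rather than on the real digits of π.

```python
import numpy as np

def alice_stop(excess, target, min_n):
    """Hypothetical helper: smallest sample size n >= min_n (a 1-based count
    of digits) where the running excess equals target, or None if none."""
    # Zero-based indices where the running excess hits the target.
    hits = np.flatnonzero(np.asarray(excess) == target)
    # Index i covers the first i+1 digits, so add 1 to get a sample size.
    sizes = hits + 1
    sizes = sizes[sizes >= min_n]
    return int(sizes[0]) if sizes.size else None

toy = np.array([-1, 0, -1, -2, -3, -2, -1, -2, -3, -4])
print(alice_stop(toy, -2, 5))  # 6
```

Applied to the `excess` array from Alice’s script, `alice_stop(excess, 13, 1000)` and `alice_stop(excess, -13, 1000)` should return 2589 and 2077. Those match the indices 2588 and 2076 in the output above.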

Bob’s mistake was specifying a minimum sample size. By saying “at least 1,000” he gave Alice the freedom to pick whatever sample size produced the result she wanted. If he had specified an exact sample size, there would probably be either more even digits or more odd digits, but there couldn’t be both. And if he were sophisticated enough, he could pick an excess value that would be unlikely given that sample size.
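Bob’s better protocol can be made concrete. The sketch below fixes n in advance and computes the exact binomial probability that the excess reaches at least k. It assumes each digit is independently even or odd with probability 1/2, which is a model and not a proven property of π. It then searches for the smallest k that would be unlikely at a one-sided 5% level. Both function names are hypothetical.

```python
from math import comb

def tail_prob(n, k):
    """P(excess >= k) after exactly n digits, assuming each digit is
    independently even or odd with probability 1/2 (a modeling assumption)."""
    # excess = evens - odds = 2*evens - n, so excess >= k  <=>  evens >= (n + k) / 2
    min_evens = max(-(-(n + k) // 2), 0)  # ceiling of (n + k) / 2, clamped at 0
    favorable = sum(comb(n, e) for e in range(min_evens, n + 1))
    return favorable / 2**n               # exact integer ratio, converted to float

def smallest_convincing_excess(n, alpha=0.05):
    """Hypothetical helper: smallest k with P(excess >= k) <= alpha."""
    k = 0
    while tail_prob(n, k) > alpha:
        k += 1
    return k

print(tail_prob(1000, 10))               # chance of "10 more evens" at n = 1000
print(smallest_convincing_excess(1000))  # a threshold unlikely to occur by chance
```

Exact integer arithmetic avoids a normal approximation. Under this model, with n = 1,000, the chance of at least 10 more evens is a bit under 0.4. That makes Bob’s threshold weak evidence. The 5% threshold lands in the low 50s.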

## Related posts

- Stopping trials of ineffective drugs sooner
- Finding coffee in pi
- Balancing profit and learning in A/B testing

[1] This does not contradict the likelihood principle; it says that informative stopping rules should be incorporated into the likelihood function.

Very apt

Hypothesis: All odds are prime.

1, 3, 5, 7, 9 (experimental error), 11, 13… proven.

It’s amusing to see this principle abused in TV adverts in the UK. During a commercial, in the small print, it’s common to see statistics based on an unusual sample size. E.g. “85% of people agreed this beauty cream makes your skin look younger (93 people gave their opinion)”.

Why 93? Because if they had added another 7 people, the percentage would no doubt be less!