The basic question that A/B testing answers is which of two options is better, such as which of two page designs customers prefer. You may run with whichever option seems better, even if it is only slightly better.
But sometimes simply asking whether A is better than B, or vice versa, is not the right question. You might want to know whether B is substantially better than A. For example, if option A is the status quo, you might want to know whether B is enough better than A that it is worth the implementation cost.
Maybe it’s unclear whether B is better than A, but it is clear that B is not much better. In that case it may not be worthwhile to resolve which of A and B is actually better because the difference is small.
For example, suppose B needs to be 40% better than A before it’s worth the cost of moving to B. And suppose it looks like B is somewhere between 10% worse and 20% better. There’s a lot of uncertainty regarding whether B is truly an improvement, but there’s less uncertainty that it is an insufficient improvement.
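To see how this plays out numerically, here is a minimal sketch in Python, assuming (purely for illustration) that our uncertainty about B's relative improvement over A is summarized by a normal distribution spanning roughly −10% to +20%. The probability that B is better at all is far from settled, while the probability that B clears the 40% bar is negligible.

    from scipy.stats import norm

    # A minimal sketch. Assume, purely for illustration, that uncertainty
    # about B's relative improvement over A is summarized by a normal
    # distribution with mean 5% and standard deviation 7.5%, so roughly
    # 95% of the probability mass lies between -10% and +20%.
    improvement = norm(loc=0.05, scale=0.075)

    print(f"P(B better than A at all): {improvement.sf(0.0):.2f}")   # about 0.75
    print(f"P(B at least 40% better):  {improvement.sf(0.40):.1e}")  # about 1.5e-06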
Early stopping
When is it possible to stop an experiment early? It's always possible. The better question is what it costs to stop an experiment early. This article looks at that question in some detail.
In short, the earlier you stop, the less certain your conclusions are, and this statement can be quantified. Maybe there’s a great deal of uncertainty as to whether A is better than B, but little doubt that the difference between the options is not worth pursuing. In that case you might save money by deciding to stop testing. Or you may save time by immediately going on to testing something else rather than waiting for results that very likely won’t change your decision.
Bayesian formulation
You can evaluate the effects of early stopping from either a frequentist or a Bayesian perspective. Here we take the Bayesian perspective because it is simpler and easier to understand.
The performance of each of two options, A and B, is uncertain, and we represent our knowledge of the performance of the two options by random variables θA and θB.
We start with prior probability distributions on θA and θB and update these distributions according to Bayes’ rule as we accrue data.
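For concreteness, suppose each option's performance is a conversion rate. A common modeling choice, and an assumption on our part rather than anything required by the formulation, is a beta prior on each rate; Bayes' rule then updates it in closed form as successes and failures accrue. A minimal sketch in Python, with made-up counts:

    from scipy.stats import beta

    # Uniform Beta(1, 1) priors on theta_A and theta_B (an assumption).
    a0, b0 = 1, 1

    # Hypothetical data: conversions and trials for each option.
    conv_A, n_A = 120, 1000
    conv_B, n_B = 140, 1000

    # Conjugate update: add successes to the first beta parameter
    # and failures to the second.
    posterior_A = beta(a0 + conv_A, b0 + n_A - conv_A)
    posterior_B = beta(a0 + conv_B, b0 + n_B - conv_B)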
In a simple A/B test, we stop when the probability
Prob(θA > θB)
is either sufficiently large or sufficiently small. But if we want to know whether either of the options is an improvement by an amount δ, we look at the probabilities
Prob(θA > θB + δ)
and
Prob(θB > θA + δ).
If one of these probabilities is sufficiently large we can stop and declare a winner. If both are sufficiently small, we can stop and say it’s unlikely there is a winner by a wide enough margin.
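Continuing the sketch above, these probabilities are easy to estimate by drawing samples from the two posteriors. The margin δ and the decision thresholds below are placeholder values; in practice they would be chosen to fit the business problem.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Posterior draws for theta_A and theta_B from the beta posteriors above.
    theta_A = posterior_A.rvs(n, random_state=rng)
    theta_B = posterior_B.rvs(n, random_state=rng)

    delta = 0.02  # required margin of improvement (placeholder value)

    p_A_wins = np.mean(theta_A > theta_B + delta)
    p_B_wins = np.mean(theta_B > theta_A + delta)

    if max(p_A_wins, p_B_wins) > 0.95:    # one option wins by the margin
        print("Stop: declare a winner.")
    elif max(p_A_wins, p_B_wins) < 0.05:  # neither is likely to win by the margin
        print("Stop: no winner by a wide enough margin.")
    else:
        print("Keep testing.")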
Help with testing
We can help you design experiments, including A/B tests, customized to your business needs. We assist clients with statistical matters, such as quantifying uncertainty, determining sample sizes, and designing early stopping rules. We also assist clients with the logistics of carrying out experiments, such as working around the biases and limitations of web analytics and complying with applicable privacy regulations.