One problem with A/B testing is that your results may depend on the order of your tests.
Suppose you’re testing three options: X, Y, and Z. Let’s say you have three market segments, equal in size, each with the following preferences.
Segment 1: X > Y > Z.
Segment 2: Y > Z > X.
Segment 3: Z > X > Y.
Now suppose you test X against Y in an A/B test, then test the winner against Z. Segments 1 and 3 prefer X to Y, so X wins the first round of testing. Now you compare X to Z. Segments 2 and 3 prefer Z to X, so Z wins round 2 and is the overall winner.
Now let’s run the tests again in a different order. First we test Y against Z. Segments 1 and 2 will go for Y. Then in the next round, Y against X, segments 1 and 3 prefer X, so X is the overall winner. So one way of running the tests results in Z winning, and another way results in X winning.
Can we arrange our tests so that Y wins? Yes, by testing X against Z first. Z wins the first round, and Y wins in the second round.
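The three orderings can be checked with a short simulation. This is a sketch under the assumptions of the example (three equal-size segments; the segment labels and function names are my own):

```python
from itertools import permutations

# Each segment's ranking, most preferred first (equal segment sizes assumed).
segments = {
    "seg1": ["X", "Y", "Z"],
    "seg2": ["Y", "Z", "X"],
    "seg3": ["Z", "X", "Y"],
}

def ab_winner(a, b):
    """Winner of a head-to-head A/B test: the option more segments rank higher."""
    votes_a = sum(r.index(a) < r.index(b) for r in segments.values())
    votes_b = len(segments) - votes_a
    return a if votes_a > votes_b else b

def sequential_winner(order):
    """Test the first two options, then test each winner against the next option."""
    champ = order[0]
    for challenger in order[1:]:
        champ = ab_winner(champ, challenger)
    return champ

for order in permutations("XYZ"):
    print(order, "->", sequential_winner(order))
```

Running this confirms the text: the order (X, Y, Z) crowns Z, (Y, Z, X) crowns X, and (X, Z, Y) crowns Y.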
The root of the problem is that group preferences are not transitive. We say that preferences are transitive if when someone prefers a to b, and they prefer b to c, then they prefer a to c. We implicitly assumed that each segment has transitive preferences. For example, when we said that the first segment’s preferences are X > Y > Z, we meant that they would rank X > Y, Y > Z, and X > Z.
Individuals (generally) have transitive preferences, but groups may not. In the example above, the market as a whole prefers X to Y, prefers Y to Z, but prefers Z to X. The segments have transitive preferences but the market does not. This is known as the Condorcet voting paradox.
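The cycle in the market's pairwise preferences can be verified directly. A minimal sketch, again assuming three equal-size segments:

```python
# Each segment's ranking, most preferred first (equal segment sizes assumed).
rankings = [["X", "Y", "Z"], ["Y", "Z", "X"], ["Z", "X", "Y"]]

def majority_prefers(a, b):
    """True if more segments rank a above b than rank b above a."""
    a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
    return a_over_b > len(rankings) - a_over_b

print(majority_prefers("X", "Y"))  # the market prefers X to Y
print(majority_prefers("Y", "Z"))  # and Y to Z
print(majority_prefers("Z", "X"))  # yet also Z to X: a cycle
```

All three calls return True, even though no individual segment holds a cyclic preference.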
This is not purely hypothetical. Our example is simplified, but it reflects a phenomenon that does happen in practice. It has been observed in voting. Constituencies in a legislature may have transitive preferences while the legislature as a whole does not. This opens the possibility of manipulating the final outcome by controlling the order in which items are voted on. In the example above, someone who knows the preferences of the groups could make any of the three outcomes the winner by picking the order of A/B comparisons.
Political scientists have looked back at congressional voting records and found instances of this happening, and can roughly determine when someone first discovered the technique of rigging sequential votes. They can also roughly point to when legislators became aware of the manipulation and learned that they sometimes need to vote against their actual preferences in one vote in order to get a better outcome at the end of the sequence of votes. (I think this was around 1940, but my memory could be wrong.) Political scientists call this sophisticated voting, as opposed to naive voting in which one always votes according to honest preferences.
The voting example is relevant to market research because it shows that intransitive group preferences really happen. But unlike in voting, customers respond honestly to A/B tests. They don’t even know that they’re part of an A/B test.
In the example above, we come away from our test believing that we have a clear winner. In both rounds of testing, the winner gets twice as many responses as the loser. The large margin in each test is misleading.
Any of the three options could be the winner, depending on the order of testing, but none of the options is any better than the others. So in the example we don't so much make a bad choice as place too much confidence in the choice we make.
But now suppose the groups are not all the same size. Suppose the three segments represent 45%, 35%, and 20% of the market respectively. We can still have any option be the final winner, depending on the order of testing. But now some test orderings lead to better outcomes than others. If we tested all three options at once in an A/B/C test, we'd learn that a plurality of the market prefers X, and we'd learn that there is no option that the market as a whole prefers.
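The weighted case can be sketched the same way. Using integer percentage weights from the example (45, 35, 20), an A/B/C test tallies each segment's first choice, while the pairwise majorities still form a cycle:

```python
# Segment weights from the example, as integer percentages.
weights = {"seg1": 45, "seg2": 35, "seg3": 20}
rankings = {"seg1": ["X", "Y", "Z"],
            "seg2": ["Y", "Z", "X"],
            "seg3": ["Z", "X", "Y"]}

# A/B/C test: each segment goes to its first choice.
first_choice_share = {}
for seg, w in weights.items():
    top = rankings[seg][0]
    first_choice_share[top] = first_choice_share.get(top, 0) + w
print(first_choice_share)  # X takes a 45% plurality

def share_preferring(a, b):
    """Percentage of the market that ranks a above b."""
    return sum(w for seg, w in weights.items()
               if rankings[seg].index(a) < rankings[seg].index(b))

# Pairwise majorities still cycle: X beats Y, Y beats Z, Z beats X.
print(share_preferring("X", "Y"))  # 65
print(share_preferring("Y", "Z"))  # 80
print(share_preferring("Z", "X"))  # 55
```

The A/B/C test surfaces both facts at once: X has the largest first-choice share, and no option is a majority winner against every alternative.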