In a dose-finding clinical trial, you have a small number of doses to test, and you hope find the one with the best response. Here “best” may mean most effective, least toxic, closest to a target toxicity, some combination of criteria, etc.

Since your goal is to find the best dose, it seems natural to compare dose-finding methods by how often they find the best dose. This is what is most often done in the clinical trial literature. But this seemingly natural criterion is actually artificial.

Suppose a trial is testing doses of 100, 200, 300, and 400 milligrams of some new drug. Suppose further that on some scale of goodness, these doses rank 0.1, 0.2, 0.5, and 0.51. (Of course these goodness scores are unknown; the point of the trial is to estimate them. But you might make up some values for simulation, pretending with half your brain that these are the true values and pretending with the other half that you don’t know what they are.)

Now suppose you’re evaluating two clinical trial designs, running simulations to see how each performs. The first design picks the 400 mg dose, the best dose, 20% of the time and picks the 300 mg dose, the second best dose, 50% of the time. The second design picks each dose with equal probability. The latter design picks the *best* dose more often, but it picks a *good* dose less often.

In this scenario, the two largest doses are essentially equally good; it hardly matters how often a method distinguishes between them. The first method picks one of the two good doses 70% of the time while the second method picks one of the two good doses only 50% of the time.

This example was exaggerated to make a point: obviously it doesn’t matter how often a method can pick the better of two very similar doses, not when it very often picks a bad dose. But there are less obvious situations that are quantitatively different but qualitatively the same.

The goal is actually to find a good dose. Finding the absolute best dose is impossible. The most you could hope for is that a method finds with high probability the best of the four *arbitrarily chosen doses* under consideration. Maybe the *best* dose is 350 mg, 843 mg, or some other dose not under consideration.

A simple way to make evaluating dose-finding methods less arbitrary would be to estimate the benefit to patients. Finding the best dose is only a matter of curiosity in itself unless you consider how that information is used. Knowing the best dose is important because you want to treat future patients as effectively as you can. (And patients in the trial itself as well, if it is an adaptive trial.)

Suppose the measure of goodness in the scenario above is probability of successful treatment and that 1,000 patients will be treated at the dose level picked by the trial. Under the first design, there’s a 20% chance that 51% of the future patients will be treated successfully, and a 50% chance that 50% will be. The expected number of successful treatments from the two best doses is 352. Under the second design, the corresponding number is 252.5.

(To simplify the example above, I didn’t say how often the first design picks each of the two lowest doses. But the first design will result in at least 382 expected successes and the second design 327.5.)

You never know how many future patients will be treated according to the outcome of a clinical trial, but there must be some implicit estimate. If this estimate is zero, the trial is not worth conducting. In the example given here, the estimate of 1,000 future patients is irrelevant: the future patient horizon cancels out in a comparison of the two methods. The patient horizon matters when you want to include the benefit to patients in the trial itself. The patient horizon serves as a way to weigh the interests of current versus future patients, an ethically difficult comparison usually left implicit.