The term overfitting usually describes fitting too complex a model to available data. But it is possible to overfit a model before there are any data.
An experimental design, such as a clinical trial, proposes some model to describe the data that will be collected. For simple, well-known designs the behavior may be known analytically. For more complex or novel designs, the behavior is evaluated via simulation.
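For instance, here is a minimal sketch of what such a simulation might look like, assuming a hypothetical two-arm trial analyzed with a two-sample t-test. The design's power is estimated by simulating the trial many times under the model the design assumes. All the numbers are illustrative, not taken from any particular trial.

```python
import numpy as np
from scipy import stats

# Hypothetical two-arm trial design: n patients per arm, analyzed with a
# two-sample t-test at alpha = 0.05. All numbers are illustrative.
n_per_arm = 50
alpha = 0.05
effect = 0.5          # assumed standardized treatment effect
n_sims = 10_000

rng = np.random.default_rng(0)
rejections = 0
for _ in range(n_sims):
    # Data generated exactly according to the design's assumed normal model
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(effect, 1.0, n_per_arm)
    res = stats.ttest_ind(treated, control)
    rejections += res.pvalue < alpha

print(f"Estimated power under the assumed model: {rejections / n_sims:.3f}")
```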
If an experimental design makes strong assumptions about the data, and is then simulated with scenarios that follow those assumptions, the design should work well. So designs must be evaluated using scenarios that do not exactly follow the model assumptions. Here lies a dilemma: how far should the scenarios deviate from the model assumptions? If they do not deviate at all, you don't have a fair evaluation. But deviating too far is unreasonable as well: no method can be expected to work well when its assumptions are flagrantly violated.
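To make the dilemma concrete, the sketch below re-runs the same hypothetical simulation but generates data from a Student t distribution rather than the normal distribution the design assumes. The degrees-of-freedom parameter controls how far each scenario deviates from the assumption; deciding which values of it to include is exactly the arbitrary choice in question.

```python
import numpy as np
from scipy import stats

# Same hypothetical design as above, but the simulated data now come from a
# Student t distribution instead of the normal the design assumes.
# df = 30 is nearly normal; df = 3 has much heavier tails.
n_per_arm, alpha, effect, n_sims = 50, 0.05, 0.5, 10_000
rng = np.random.default_rng(1)

for df in (30, 10, 3):
    rejections = 0
    for _ in range(n_sims):
        control = rng.standard_t(df, n_per_arm)
        treated = rng.standard_t(df, n_per_arm) + effect
        res = stats.ttest_ind(treated, control)
        rejections += res.pvalue < alpha
    print(f"df = {df:2d}: estimated power = {rejections / n_sims:.3f}")
```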
With complex designs, it may not be clear to what extent scenarios deviate from modeling assumptions. The method may be robust to some kinds of deviations but not to others. Simulation scenarios for complex designs are samples from a high dimensional space, and it is impossible to adequately explore a high dimensional space with a small number of points. Even if these scenarios were chosen at random—which would be an improvement over manually selecting scenarios that present a method in the best light—how do you specify a probability distribution on the scenarios? You’re back to a variation on the previous problem.
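As a sketch of what choosing scenarios at random might look like, the code below draws hypothetical scenario parameters (effect size, control response rate, dropout rate, accrual rate) from uniform ranges. The parameter names and ranges are invented for illustration; the uniform distribution is itself the arbitrary choice described above.

```python
import numpy as np

# Draw simulation scenarios at random instead of hand-picking them.
# Each scenario is a vector of hypothetical design parameters. The ranges
# and the uniform distribution are arbitrary choices: randomizing scenarios
# just moves the arbitrariness into the distribution they are drawn from.
rng = np.random.default_rng(2)

scenario_ranges = {
    "effect_size":       (0.0, 1.0),
    "control_rate":      (0.1, 0.5),
    "dropout_rate":      (0.0, 0.3),
    "accrual_per_month": (2.0, 20.0),
}

for i in range(5):
    scenario = {k: rng.uniform(lo, hi) for k, (lo, hi) in scenario_ranges.items()}
    print(f"Scenario {i + 1}: " +
          ", ".join(f"{k}={v:.2f}" for k, v in scenario.items()))
```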
Once you have the data in hand, you can try a complex model and see how well it fits. But with experimental design, the model is determined before there are any data, so there is no possibility of rejecting the model for being a poor fit. You might decide, after it's too late and the data have been collected, that the model was a poor fit. However, retrospective model criticism is complicated for adaptive experimental designs because the model influenced which data were collected.
This is especially a problem for one-of-a-kind experimental designs. When evaluating experimental designs (not the data from an experiment, but the design itself), each experiment is one data point. With only one data point, it's hard to criticize a design. This means we must rely on simulation, where it is possible to obtain many data points. However, this brings us back to the arbitrary choice of simulation scenarios, and in this case there are no empirical data against which to test the model assumptions.