Works well versus well understood

While I was looking up the Tukey quote in my earlier post, I ran across another of his quotes:

The test of a good procedure is how well it works, not how well it is understood.

At some level, it’s hard to argue against this. Statistical procedures operate on empirical data, so it makes sense that the procedures themselves be evaluated empirically.

But I question whether we really know that a statistical procedure works well if it isn’t well understood. Specifically, I’m skeptical of complex statistical methods whose only credentials are a handful of simulations. “We don’t have any theoretical results, but hey, it works well in practice. Just look at the simulations.”

Every method works well on the scenarios its author publishes, almost by definition. If the method didn’t handle a scenario well, the author would publish a different scenario. Even if the author didn’t select the most flattering scenarios, he or she may simply not have considered unflattering scenarios. The latter is particularly understandable, almost inevitable.

Simulation results would have more credibility if an adversary rather than an advocate chose the scenarios. Even so, an adversary and an advocate may share the same blind spots and not explore certain situations. Unless there’s a way to argue that a set of scenarios adequately samples the space of possible inputs, it’s hard to have a great deal of confidence in a method based on simulation results alone.
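To make this concrete, here is a minimal sketch (in Python, purely for illustration) of how a simulation study can flatter a method. Everything in it is invented for the example, not a real published study: the “method” is just the sample mean as an estimator of a distribution’s center, the flattering scenario is Gaussian data, and the unflattering scenario the author never tried is heavy-tailed Cauchy data.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(estimator, sampler, n=30, reps=5000, true_center=0.0):
    """Monte Carlo mean squared error of an estimator of the center."""
    estimates = np.array([estimator(sampler(n)) for _ in range(reps)])
    return np.mean((estimates - true_center) ** 2)

# "Flattering" scenario: well-behaved Gaussian data.
gaussian = lambda n: rng.normal(0.0, 1.0, n)
# "Unflattering" scenario the published simulations might never include: heavy tails.
cauchy = lambda n: rng.standard_cauchy(n)

for name, sampler in [("Gaussian", gaussian), ("Cauchy", cauchy)]:
    print(f"{name}: mean MSE = {mse(np.mean, sampler):.3f}, "
          f"median MSE = {mse(np.median, sampler):.3f}")
```

On the Gaussian scenario the mean looks a little better than the median, so simulations restricted to that scenario would “show the method works well.” On the Cauchy scenario the mean’s simulated error blows up while the median’s stays modest, which is exactly the kind of case an advocate’s chosen scenarios may never sample.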


10 thoughts on “Works well versus well understood”

  1. Rather than all the trouble of getting an adversary to pick scenarios, wouldn’t it be more straightforward to insist that authors use simulations to show the weaknesses of their method as well as its strengths?

    Following engineering practice, we should test methods until they break, not just test them in benign settings.

  2. Yes, of course. But I’ve also heard of editors and peer reviewers, who get to insist that papers are revised before publication. Forcing authors to show where their method breaks seems a natural part of peer review.

  3. Fred, that does seem natural, but what do you do when the author says he or she can’t find any situations in which their method breaks? Or if they say they found only one situation, which is so hopelessly pathological that it would never be observed in practice? Do you just take them at their word, knowing that even if honest they may have simply overlooked some important weakness?

    It kind of reminds me of the classic advertising gambit where a company claims that no other toothpaste was demonstrated to be superior … in their own underpowered study.

    I’d like to be the editor of a mathematical journal who gets to insist that before a proof is published, at least three counterexamples are listed where it is shown to be false! “Yes of course you’ve proved it to be true in all cases, but that isn’t good enough. We need full disclosure of all the cases where it is false before we can publish it, and you keep refusing to tell us even one!”

  4. John, I agree it wouldn’t be foolproof. But competent reviewers and editors should be able to come up with something, even if the authors can’t – and should be able to point out overlooked weaknesses.

    Indeed, many good reviewers already do this; I just want to make it journal policy that they are right to do so, and that authors should expect probing questions to come up in reviews.

    And I’m not sure I understand what sort of math journal you want to edit!

  5. Fred, I’d expect the reviewers to at least ask about potential weak spots, if not outright require them to be addressed, before publication. I just think that requiring authors to do this to themselves is unenforceable and would only keep the competent and honest authors honest, and those folks are probably already doing this themselves anyway. It would be kind of like asking people going through customs to declare anything they have which is forbidden to import. If they don’t know what is forbidden or not, they won’t declare anything, and if they know what’s forbidden but are smuggling contraband anyway they’ll lie. The people who both know what is forbidden and honestly say they have nothing would have made sure of that before they ever attempted to get into the country, whether or not they are asked.

    The only way it might make a difference is if falsely declaring that all known weaknesses have been disclosed would entail a more serious punishment than the act itself. That may cause the dishonest to reconsider, and might motivate the ignorant to search harder. But if the result would be the same whether or not a person were to lie about it, then I don’t think it would help a bit.

    One key issue in this that hasn’t been mentioned is that journals compete for good publications and are generally in pretty poor financial shape. I attended a talk by a longtime senior journal editor and academic about the issue of reproducible computing and full disclosure of statistical methods. The issue was how to address the problem when the author is either dishonest or incompetent in their statistical methods. The results they report in that case may bear no relation to the results obtained through the method they describe. A related problem is when the methods are too poorly described to be able to attempt to verify that the reported results are accurate. These are very real problems, as those who have followed the Anil Potti scandal know already. A good remedy for both of these situations, championed by Keith Baggerly and Kevin Coombes, is to get the journal editors to require that authors provide enough information and data to enable someone else to reproduce their calculations.

    So this senior editor pointed out that unless all journals decide to enforce this simultaneously, the journals which require it would make it significantly more difficult for even competent and honest authors to publish, giving authors an incentive to publish in a journal which does not require it. This could be a ruinous problem for journals which are already struggling financially.

    So I see the requirement of disclosing and addressing weaknesses as equally problematic, but much more difficult to assess whether the standard has been met or not. An author can provide, for example, R source which can easily be run to determine if the reported results are obtained. If it is difficult or impossible to run the source, or if the data have not been disclosed, then clearly the standard has not been met. But how do you tell if an author has disclosed all the weaknesses in their method or publication? Even if they disclose and discuss a few, how can you tell if they are ignoring a big one, or are simply not competent enough to even be aware of a big problem?

    So attempting to bring about such a requirement for publication will be very difficult, for the same reasons it was difficult to get all clinical trials registered and would be difficult to require reproducible statistical methods. But in addition to this, it is much harder to verify whether the standard has been met. Philosophers would argue it could never in fact be verified.

    Regarding the math journal: I was simply imagining how entertaining it would be if some blockheaded bureaucrat were put in charge of requiring that all weaknesses be disclosed in a situation where no such weaknesses could possibly exist. I imagined the bureaucrat attempting to enforce the policy by requiring that at least three weaknesses be disclosed, and stubbornly refusing to accept a paper for publication until they had that list, whether or not it was even logically possible to find any weaknesses.

    I have a strange sense of humor though. I find it equally entertaining to see HR types requiring, for example, at least five years’ experience using a technology which is only two years old (true story, and I saw the job listing). Or a title deed agency requiring ownership paperwork on a parcel of land going back three hundred years, when three hundred years ago the land, if it had legal owners, was owned by people who didn’t use paperwork. (Possibly an urban legend, but humorous nonetheless, as the seller provided documentation about the transfer of the land between European kings before it became part of the USA.)

    So I thought it would be hilarious to be the drone mindlessly applying policy in clearly inappropriate ways, requiring Pythagoras to provide at least three right triangles for which his theorem doesn’t work, for example.

  6. … one of these things where a lot of people feel
    “If only the rest of the world was educated enough to understand
    what this is about, they’d be better off.”
    And I actually kind of agree with that.
    The problem is that most of the world could actually care less.
    — James Gosling

  7. I think the three important roles for a statistician in an organization are: (1) to be a repository of knowledge and methods which others can consult; (2) to be somewhat separated from the business case and the management, in the same way that good test organizations are, so that thinking and assumptions can be challenged, even at the cost of being a somewhat dull kill-joy; and (3) to be an educator, so that, in time, others with whom they work imbibe and adopt these same ideas of critical thinking and of challenging assumptions, however attractive those assumptions might be to hold.
