My interest in the Anil Potti scandal started when my former colleagues could not reproduce the analysis in one of Potti’s papers. (Actually, they did reproduce the analysis, at great effort, in the sense of forensically determining the erroneous steps that were carried out.) Two years ago, the story was on 60 Minutes. The straw that broke the camel’s back was not bad science but résumé padding.
It looks like the story is a matter of fraud rather than sloppiness. This is unfortunate because sloppiness is much more pervasive than fraud, and this could have made a great case study of bad analysis. However, one could look at it as a case study in how good analysis (by the folks at MD Anderson) can uncover fraud.
Now there’s a new development in the Potti saga. The latest issue of The Cancer Letter contains letters by whistle-blower Bradford Perez who warned officials at Duke about problems with Potti’s research.
What led folks to dig deeply into Potti’s papers? Did they give the results a quick Benford’s Law (http://www.johndcook.com/blog/?s=benford) test, or were they just suspicious for some reason?
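For what it’s worth, the first-digit version of that test takes only a few lines. The sketch below is purely illustrative and is not how anyone actually examined Potti’s papers (per the host, the MD Anderson group re-derived the analysis step by step); you would feed it the raw numbers from a supplementary table, and a large deviation is a hint worth following up on, not proof of anything.

```python
# Minimal first-digit Benford check (illustrative sketch only).
import math
from collections import Counter

def benford_check(values):
    """Compare observed leading-digit frequencies with Benford's expected ones."""
    digits = [int(("%e" % abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("no nonzero values to check")
    counts = Counter(digits)
    n = len(digits)
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)          # Benford's expected proportion
        observed = counts.get(d, 0) / n
        print(f"digit {d}: observed {observed:.3f}  expected {expected:.3f}")
```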
I wonder if Dr. Anil Potti still has that job at the Cancer Center of North Dakota.
The real problem here is that the Duke administration, and the University as a whole, will suffer absolutely no adverse consequences as a result of their cover-up. Which, in turn, will help to guarantee that University administrations continue to act in exactly the same way in future cases.
@lens: Doctors approached the biostat department because Potti’s results looked either exciting or too good to be true. They asked my former colleagues to reproduce the results. They did not suspect fraud, only sloppy analysis.
About 25 years ago I was a programmer working with physicists to develop a new sensor that was on the ragged edge of viability. My job was to take the experimental lab work and evolve it to create production prototypes. I had to take the sensor data and push back on the EEs to create better sensor circuitry, and push back on the physicists to create better analysis algorithms in their lab.
Everything came down to my firmware generating “good enough readings, fast enough to be useful”. This was before floating-point hardware was available in low-power embedded systems, so every floating point operation had a huge cost, meaning as much work as possible was pushed into the integer domain.
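For readers who have never worked without an FPU, the toy sketch below shows the kind of thing “pushed into the integer domain” means; the 8-bit scale factor and the running-average example are my own illustration, not the original firmware.

```python
# Fixed-point sketch: store values scaled by 2**8 so averaging needs only
# integer adds and divides, no floating-point hardware.
SCALE = 1 << 8                       # 8 fractional bits

def to_fixed(x):                     # float -> scaled integer
    return int(round(x * SCALE))

def from_fixed(q):                   # scaled integer -> float (display only)
    return q / SCALE

def fixed_mean(samples_fixed):
    # Sum and divide stay entirely in integers.
    return sum(samples_fixed) // len(samples_fixed)

readings = [12.25, 12.5, 11.75]                      # placeholder sensor readings
print(from_fixed(fixed_mean([to_fixed(r) for r in readings])))   # ~12.16
```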
Given that we were on the bleeding edge of sensor physics, the processing was inherently statistical, since each individual reading had, at best, only a passing resemblance to reality. So I built a network of prototype sensors (of various hardware revisions) and ran them to a workstation to do some statistical crunching to see how well each firmware change performed. A large part of the analysis was comparing the effect of different firmware revisions against different hardware units (each had different behaviors) and their test locations (different environments).
The statistical analysis was soon beyond the engineering stats I learned while getting my Bachelor’s degree. While discussing this with a colleague, he mentioned that it looked like I was treating my sensors like participants in a clinical trial, where each had its own “initial symptoms” (hardware characteristics), the “treatment” (firmware) was being tested for efficacy, and I was trying to understand what the “responses” meant within the environment of each sensor.
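To make the analogy concrete, here is a rough sketch of how that “sensors as trial subjects” framing might be written up today as a mixed-effects model. The library choice (statsmodels), file name, and column names are placeholders of mine, not what was actually used back then.

```python
# Sketch: firmware revision plays the role of the treatment, test environment
# is a covariate, and each sensor gets a random intercept to absorb its
# unit-to-unit hardware quirks. File and column names are placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("sensor_runs.csv")   # columns: sensor_id, firmware, environment, error

model = smf.mixedlm("error ~ C(firmware) + C(environment)",
                    data=df, groups=df["sensor_id"])
result = model.fit()
print(result.summary())               # firmware coefficients ~ "treatment effects"
```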
This led me to examine the statistical methods used in small clinical trials. I had to refer back to my studies in the Design of Experiments to tease apart the relationship of the chosen statistical analysis technique to the organization of the experiment and the acquisition of data. The greatest gift to my education in this area was the trove of comprehensive survey papers, especially when they addressed outliers and ways to minimize reproducibility difficulties.
Anyhow, once my own experiments were proceeding with confidence, I decided to check my upgraded stats skills against recent clinical studies that hadn’t yet received the “survey treatment”. I chose five papers that had caused a stir in the popular press and that also had fully published data sets.
From the start, I figured I had learned something wrong, because four out of the five papers contained what seemed to me to be completely unsupportable data reduction techniques combined with inappropriate statistical analysis techniques. I lacked the time to pursue my concerns with the academic medical community (without wasting their time), so I set it aside.
What I was left with were two maxims that have stayed with me to this day:
Don’t believe any clinical paper until it has:
1. Been reproduced.
2. Been included in at least one survey paper.
Which means my heroes aren’t the original researchers, but those working to reproduce results and write survey papers.
The researchers I respect most are those who aggressively push to get their results tested by others. I did find two remarkable papers where the authors publicly lamented that the current work was based on a prior paper whose results hadn’t yet been validated by the community.
Original researchers can become my heroes too, but only once they have a solid record of validation through reproduction and surveys.
Another take-away: Survey papers are awesome. But it is important to learn how to read them efficiently, because they are often huge, contain a bazillion references, and the best nuggets of true wisdom can sometimes be buried in the footnotes. So it is vital to know when to “surf” and when to “dig”.
And if you want to sharpen your citation-hunting skills, see which papers are most cited by a collection of survey papers in a field (a meta-survey). Some papers gain a reputation for clarity of exposition and quality of analysis that is independent of their research content (including Isaac Asimov’s famous spoof paper, which had fictional content). I believe such papers are truly among the most important in a field, from which all can learn.

Whilst I understand the frustration of calling out bad science but only seeing results when someone lies on their CV, the root of the problem is the perceived complexity and opacity of research to the managers of scientific institutions.
I think the reason the résumé tipped the balance is that it is a piece of information a manager can seize upon and understand to be definitively true or false. The implication, then, is that to university management, scientists speak only so much gobbledegook. It is like listening to arguing politicians: it can be hard to know what is real when everyone speaks very confidently about something you don’t understand, yet adamantly disagrees.