by John on February 9, 2012
by John on September 12, 2011
The Economist posted an article online this weekend about the scandal over irreproducible cancer research by Anil Potti. My colleagues Keith Baggerly and Kevin Coombes have been crying foul about this since 2007. I first blogged about it in January 2008.
The story started getting wide-spread attention last summer when the Cancer Letter reported that Dr. Potti had lied on grant applications. Since then there have been articles in the popular press, and people are staring to file lawsuits.
Apparently the tipping point in the story was finding a fib on Potti’s resume. According to The Economist
He falsely claimed to have been a Rhodes Scholar in Australia (a curious claim in any case, since Rhodes scholars only attend Oxford University).
So what finally got people to pay attention was not accusations of incompetent or fraudulent science, but résumé padding. As Keith Baggerly commented,
I find it ironic that we have been yelling for three years about the science, which has the potential to be very damaging to patients, but that was not what has started things rolling.
Related posts:
Popular research areas produce more false results
Using Photoshop on research results
Highlights from Reproducible Ideas
by John on January 17, 2011
A recent article in The New Yorker gives numerous examples of scientific results fading over time. Effects that were large when first measured become smaller in subsequent studies. Firmly established facts become doubtful. It’s as if scientific laws are being gradually repealed. This phenomena is known as “the decline effect.” The full title of the article is The decline effect and the scientific method.
The article brings together many topics that have been discussed here: regression to the mean, publication bias, scientific fashion, etc. Here’s a little sample.
“… when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … After a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
This excerpt happens to be talking about “fluctuating asymmetry,” the idea that animals prefer more symmetric mates because symmetry is a proxy for good genes. (I edited out references to fluctuating asymmetry from the quote to emphasize that the remarks could equally apply to any number of topics. ) Fluctuating asymmetry was initially confirmed by numerous studies, but then the tide shifted and more studies failed to find the effect.
When such a shift happens, it would be reassuring to believe that the initial studies were simply wrong and that the new studies are right. But both the positive and negative results confirmed the prevailing view at the time they were published. There’s no reason to believe the latter studies are necessarily more reliable.
Related posts:
Why microarray study conclusions are so often wrong
Popular research areas produce more false results
Five criticisms of significance testing
When I was in college, a friend of mine told me he liked to take his code out for a walk every now and then. By that he meant recompiling and running all of his programs, say once a week. I asked him why he would want to do that. If a program compiled and ran the last time you touched it, why shouldn’t it compile and run now? He simply said I might be surprised.
Even when your source code isn’t changing, the environment around it is changing. That’s why your code can break without anyone touching it. Peter Deutsch made this observation in the context of networks in his Eight Fallacies of Distributed Computing.
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Kevin Kelly made the same observation in the context of data storage. Because data formats change and physical media decay, you’ve got to keep copying your data to save it. He coined the term movage to describe the active process of preserving data.
The only way to archive digital information is to keep it moving. I call this movage instead of storage. Proper movage means transferring the material to current platforms on a regular basis … anything you want moved to the future has to be given attention to keep it moving forward.
This morning I had problems running LaTeX (with Beamer) on an old presentation and that made me think a post I wrote for the Reproducible Ideas blog that I’ve since shut down. In the spirit of Kevin Kelly’s movage, I’ve kept the ideas in my old post alive by updating them here.
(The original post is archived on the Reproducible Research Ideas blog. When I gave the ReproducibleResearch.org domain to the folks running ReproducibleResearch.net they loaded my old posts into their new blog.)
Related links
Maintenance costs
Preserving (the memory of) documents
Highlights from Reproducible Ideas
Reproducible Research editorial by Sergey Fomel and Jon Claerbout
by John on February 19, 2010
The more active a research area is, the less reliable its results are.
John Ioannidis suggested popular areas of research publish a greater proportion of false results in his paper Why most published research findings are false. Of course popular areas produce more results, and so they will naturally produce more false results. But Ioannidis is saying that they also produce a greater proportion of false results.
Now Thomas Pfeiffer and Robert Hoffmann have produced empirical support for Ioannidis’s theory in the paper Large-Scale Assessment of the Effect of Popularity on the Reliability of Research. Pfeiffer and Hoffmann review two reasons why popular areas have more false results.
First, in highly competitive fields there might be stronger incentives to ‘‘manufacture’’ positive results by, for example, modifying data or statistical tests until formal statistical significance is obtained. This leads to inflated error rates for individual findings: actual error probabilities are larger than those given in the publications. … The second effect results from multiple independent testing of the same hypotheses by competing research groups. The more often a hypothesis is tested, the more likely a positive result is obtained and published even if the hypothesis is false.
In other words,
- In a popular area there’s more temptation to fiddle with the data or analysis until you get what you expect.
- The more people who test an idea, the more likely someone is going to find data in support of it by chance.
The authors produce evidence of the two effects above in the context of papers written about protein interactions in yeast. They conclude that “The second effect is about 10 times larger than the first one.”
Related posts:
Why microarray conclusions are so often wrong
Using Photoshop on experimental results
Irreproducible analysis
Make up your own rules of probability
by John on January 10, 2010
Last week .NET Rocks mentioned a good idea in passing: start a screencast tool like Camtasia before you do a software install. Michael Learned, told the story of a client that asked him to take screen shots of every step in the installation of Microsoft’s Team Foundation Server. Carl Franklin commented “What a great idea to throw Camtasia on there and record the whole process.”
It would be better if the installation process were scripted and not just recorded, but sometimes that’s not practical. Sometimes clicking a few buttons is absolutely necessary or at least far easier than writing a script. And even if you think your entire process is automated with a script, a screencast might be a good idea. It could record little steps you have to do in order to run your script, details that are easily forgotten.
Another way to use this idea would be to have one person do a practice install on a test server while recording the process. Then another person could document and script the process by studying the video. This would be helpful when the person who knows how to do the installation lacks either the verbal skills to explain the process or the scripting skills to automate it.
Related posts:
Rotating programmers
Automated software builds
Programming the last mile
by John on January 6, 2009
by John on October 1, 2008
I just posted an article on my other blog, Reproducible Ideas, called Musical chairs and reproducibility drills. The post is about rotating programmers, in classes and in professional software development. The post ends with some thoughts on having a build master and rotating that position.
by John on August 29, 2008
Yesterday I added a blog to the ReproducibleResearch.org web site. You can visit the site here or subscribe via RSS.
I’d like a couple people to join me in writing this blog, and I would greatly appreciate suggestions, guest posts, etc. If you’re interested, please send a note to contribute at the domain name.
Greg Wilson pointed out an article in The Chronicle of Higher Education about scientists using Photoshop to manipulate the graphs of their results. The article has this to say about The Journal of Cell Biology.
So far the journal’s editors have identified 250 papers with questionable figures. Out of those, 25 were rejected because the editors determined the alterations affected the data’s interpretation.
This immediately raises suspicions of fraud which is, of course. However, I’m more concerned about carelessness than fraud. As Goethe once said,
…misunderstandings and neglect create more confusion in this world than trickery and malice. At any rate, the last two are certainly much less frequent.
Even if researchers had innocent motivations for manipulating their graphs, they’ve made it impossible for someone else to reproduce their results and have cast doubts on their integrity.
I started a new web site this week, http://www.reproducibleresearch.org, to promote reproducible research.
I’d like to see this become a community site. Depending on how much interest the site stirs up, I may add a blog, a Wiki, etc. For now, if you’d like to contribute, send me articles or links and I’ll add them to the site. You can send email to “contribute” at the domain name.
Greg Wilson gave a great interview on the IT Conversations podcast recently. He says the emphasis on HPC draws time and energy away from quality concerns, and may not even help scientists get their results faster. While some problems definitely require HPC, most could be solved faster by developing software in the simpler environment of a single PC and waiting longer for it to run.
I’ve written here about reproducibility problems in statistics and in general software development. Apparently there are similar problems in every area of scientific computing. For example, Wilson quotes a survey of computational economics articles that found that 70% of the results could not be reproduced a year after publication. I doubt that computational economics is worse than other fields.
Wilson says he wants to make raise the reproducibility expectations in computational research closer to those common in physical research. I admire his efforts, but it’s a sad commentary that reproducibility standards are lower in computational science than in physical science.