Irreproducible research on 60 Minutes

If your research cannot be reproduced, you might end up on 60 Minutes. Two days ago the news show ran a story about irreproducible research at Duke. You can find the video clip here.

I believe the 60 Minutes piece was somewhat misleading. It focused on data manipulation and implied that the controversial results followed from the manipulated data. As Keith Baggerly explains here, that is not the case. The conclusions do not follow from the (erroneous) data. The analysis itself was irreproducible. That discovery started the whole saga.

Update: Here’s some footage that 60 Minutes recorded but did not include on Sunday. “The systems we have in academia, especially with something this complicated, shield sloppy science and fraud.”

Related posts:

Bad science is tolerable, résumé padding is not

Using Photoshop on experimental results

Running Python and R inside Emacs

Emacs org-mode lets you manage blocks of source code inside a text file. You can execute these blocks and have the output display in your text file. Or you could export the file, say to HTML or PDF, and show the code and/or the results of executing the code.

Here I’ll show some of the most basic possibilities. For much more information, see orgmode.org. And for the use of org-mode in research, see A Multi-Language Computing Environment for Literate Programming and Reproducible Research.
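
For a flavor of what this looks like, here is a minimal sketch of an org file containing a Python source block. The header arguments are just one reasonable choice, and you may need to enable Python via org-babel-load-languages first. Pressing C-c C-c inside the block executes it and inserts the output below it:

    * A tiny example

    #+begin_src python :results output :exports both
    # Executed with C-c C-c; the output appears in a #+RESULTS: block below
    data = [3, 1, 4, 1, 5, 9]
    print(sum(data) / len(data))
    #+end_src

    #+RESULTS:
    : 3.8333333333333335

Exporting the file to HTML or PDF then includes both the block and its results, thanks to the :exports both header argument.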


Bad science is tolerable, résumé padding is not

The Economist posted an article online this weekend about the scandal over irreproducible cancer research by Anil Potti. My colleagues Keith Baggerly and Kevin Coombes have been crying foul about this since 2007. I first blogged about it in January 2008.

The story started getting widespread attention last summer when the Cancer Letter reported that Dr. Potti had lied on grant applications. Since then there have been articles in the popular press, and people are starting to file lawsuits.

Apparently the tipping point in the story was finding a fib on Potti’s résumé. According to The Economist,

He falsely claimed to have been a Rhodes Scholar in Australia (a curious claim in any case, since Rhodes scholars only attend Oxford University).

So what finally got people to pay attention was not accusations of incompetent or fraudulent science, but résumé padding. As Keith Baggerly commented,

I find it ironic that we have been yelling for three years about the science, which has the potential to be very damaging to patients, but that was not what has started things rolling.

Related posts:

Popular research areas produce more false results
Using Photoshop on research results
Highlights from Reproducible Ideas

Scientific results fading over time

A recent article in The New Yorker gives numerous examples of scientific results fading over time. Effects that were large when first measured become smaller in subsequent studies. Firmly established facts become doubtful. It’s as if scientific laws are being gradually repealed. This phenomenon is known as “the decline effect.” The full title of the article is The decline effect and the scientific method.

The article brings together many topics that have been discussed here: regression to the mean, publication bias, scientific fashion, etc. Here’s a little sample.

“… when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … After a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.

This excerpt happens to be talking about “fluctuating asymmetry,” the idea that animals prefer more symmetric mates because symmetry is a proxy for good genes. (I edited out references to fluctuating asymmetry from the quote to emphasize that the remarks could equally apply to any number of topics.) Fluctuating asymmetry was initially confirmed by numerous studies, but then the tide shifted and more studies failed to find the effect.

When such a shift happens, it would be reassuring to believe that the initial studies were simply wrong and that the new studies are right. But both the positive and negative results confirmed the prevailing view at the time they were published. There’s no reason to believe the latter studies are necessarily more reliable.
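
As a toy illustration of how publication bias and regression to the mean by themselves can produce a decline effect, here is a small simulation. The true effect size, noise level, and publication cutoff are arbitrary assumptions, not estimates from any real study:

    # Simulate studies of a fixed, modest effect. Only impressive initial
    # results get "published"; follow-up studies are not filtered.
    import random
    import statistics

    random.seed(1)
    true_effect, noise, cutoff = 0.2, 1.0, 1.5

    first_estimates, replications = [], []
    for _ in range(100_000):
        first = random.gauss(true_effect, noise)
        if first > cutoff:  # publication filter on the initial study
            first_estimates.append(first)
            replications.append(random.gauss(true_effect, noise))

    print("published initial estimates average:", round(statistics.mean(first_estimates), 2))
    print("replication estimates average:", round(statistics.mean(replications), 2))

The published initial estimates average near 2.0 while the replications average near the true value of 0.2, so the effect appears to shrink even though nothing about it changed.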

Related posts:

Why microarray study conclusions are so often wrong
Popular research areas produce more false results
Five criticisms of significance testing

Taking your code for a walk

When I was in college, a friend of mine told me he liked to take his code out for a walk every now and then. By that he meant recompiling and running all of his programs, say once a week. I asked him why he would want to do that. If a program compiled and ran the last time you touched it, why shouldn’t it compile and run now? He simply said I might be surprised.
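
Here is a minimal sketch of what such a walk might look like, assuming a hypothetical list of project directories and the commands that rebuild or run them. Something like this could run weekly from cron or a task scheduler:

    # Rebuild and run each project, and report anything that no longer works.
    # The project paths and commands below are made up; substitute your own.
    import os
    import subprocess

    projects = {
        "~/projects/simulation": ["make"],
        "~/projects/paper-figures": ["python", "make_figures.py"],
    }

    for path, command in projects.items():
        path = os.path.expanduser(path)
        result = subprocess.run(command, cwd=path, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else "BROKEN"
        print(f"{status:7} {path}")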

Even when your source code isn’t changing, the environment around it is changing. That’s why your code can break without anyone touching it. Peter Deutsch made this observation in the context of networks in his Eight Fallacies of Distributed Computing.

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn’t change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

Kevin Kelly made the same observation in the context of data storage. Because data formats change and physical media decay, you’ve got to keep copying your data to save it. He coined the term movage to describe the active process of preserving data.

The only way to archive digital information is to keep it moving. I call this movage instead of storage. Proper movage means transferring the material to current platforms on a regular basis … anything you want moved to the future has to be given attention to keep it moving forward.

This morning I had problems running LaTeX (with Beamer) on an old presentation, and that made me think of a post I wrote for the Reproducible Ideas blog, which I’ve since shut down. In the spirit of Kevin Kelly’s movage, I’ve kept the ideas in my old post alive by updating them here.

Related links

Maintenance costs
Preserving (the memory of) documents
Why does software need to be maintained?

Popular research areas produce more false results

The more active a research area is, the less reliable its results are.

In his paper Why most published research findings are false, John Ioannidis suggested that popular areas of research publish a greater proportion of false results. Of course popular areas produce more results, and so they will naturally produce more false results. But Ioannidis is saying that they also produce a greater proportion of false results.

Now Thomas Pfeiffer and Robert Hoffmann have produced empirical support for Ioannidis’s theory in the paper Large-Scale Assessment of the Effect of Popularity on the Reliability of Research. Pfeiffer and Hoffmann review two reasons why popular areas have more false results.

First, in highly competitive fields there might be stronger incentives to “manufacture” positive results by, for example, modifying data or statistical tests until formal statistical significance is obtained. This leads to inflated error rates for individual findings: actual error probabilities are larger than those given in the publications. … The second effect results from multiple independent testing of the same hypotheses by competing research groups. The more often a hypothesis is tested, the more likely a positive result is obtained and published even if the hypothesis is false.

In other words,

  1. In a popular area there’s more temptation to fiddle with the data or analysis until you get what you expect.
  2. The more people who test an idea, the more likely someone is going to find data in support of it by chance.

The authors produce evidence of the two effects above in the context of papers written about protein interactions in yeast. They conclude that “The second effect is about 10 times larger than the first one.”
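
The second effect is easy to quantify in the simplest case: if k independent groups each test a hypothesis that is in fact false, using a 5% significance level, the chance that at least one of them obtains a publishable positive result is 1 - (1 - 0.05)^k. A quick sketch, with purely illustrative group counts:

    # Chance that at least one of k independent tests of a false hypothesis
    # reaches significance at level alpha: 1 - (1 - alpha)**k
    alpha = 0.05
    for k in (1, 5, 10, 20):
        p = 1 - (1 - alpha) ** k
        print(f"{k:2d} groups: {p:.0%} chance of at least one positive result")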

Related posts:

Why microarray conclusions are so often wrong
Using Photoshop on experimental results
Irreproducible analysis
Make up your own rules of probability

Camtasia as a software deployment tool

Last week .NET Rocks mentioned a good idea in passing: start a screencast tool like Camtasia before you do a software install. Michael Learned told the story of a client that asked him to take screen shots of every step in the installation of Microsoft’s Team Foundation Server. Carl Franklin commented, “What a great idea to throw Camtasia on there and record the whole process.”

It would be better if the installation process were scripted and not just recorded, but sometimes that’s not practical. Sometimes clicking a few buttons is absolutely necessary or at least far easier than writing a script. And even if you think your entire process is automated with a script, a screencast might be a good idea. It could record little steps you have to do in order to run your script, details that are easily forgotten.

Another way to use this idea would be to have one person do a practice install on a test server while recording the process. Then another person could document and script the process by studying the video. This would be helpful when the person who knows how to do the installation lacks either the verbal skills to explain the process or the scripting skills to automate it.

Related posts:

Rotating programmers
Automated software builds
Programming the last mile

Highlights from Reproducible Ideas

Here are some of my favorite posts from the Reproducible Ideas blog.

Three reasons to distrust microarray results
Provenance in art and science
Forensic bioinformatics (continued)
Preserving (the memory of) documents
Programming is understanding
Musical chairs and reproducibility drills
Taking your code out for a walk

The most popular and most controversial was the first in the list, reasons to distrust microarray results.

The emphasis shifts from science to software development as you go down the list, though science and software are intertwined throughout the posts.

Blogging about reproducible research

I’m in the process of folding ReproducibleResearch.org into the new ReproducibleResearch.net site. I will be giving the .org domain name to the folks now running the .net site. (See the announcement for a little more information.)

As part of this process, I’m winding down the blog that I started last July as part of the ReproducibleResearch.org site. I plan to keep the links to my old posts valid, but I do not know whether the new site will have a new blog. I wrote about reproducible research on this blog before starting the ReproducibleResearch.org site, and I will go back to writing about reproducible research here. (See reproducibility in the tag cloud.)

I wanted to point out an article by Steve Eddins posted this morning: Reproducible research in signal processing. His article comments on the article by Patrick Vandewalle, Jelena Kovačević, and Martin Vetterli announced recently on ReproducibleResearch.org.

Readers interested in reproducible research may also want to take a look at the Science in the open blog.

Related posts:

Irreproducible analysis
Using Photoshop on experimental results
Highlights from Reproducible Ideas

Two posts on reproducibility: art and oceanography

On my other blog, Reproducible Ideas, I wrote two short posts this morning about reproducibility.

The first post is a pointer to an interview with Roger Barga about Trident, a workflow system for reproducible oceanographic research using Microsoft’s workflow framework.

The second post highlights a paragraph from the interview explaining the idea of provenance in art and scientific research.

Using Photoshop on experimental results

Greg Wilson pointed out an article in The Chronicle of Higher Education about scientists using Photoshop to manipulate the graphs of their results. The article has this to say about The Journal of Cell Biology.

So far the journal’s editors have identified 250 papers with questionable figures. Out of those, 25 were rejected because the editors determined the alterations affected the data’s interpretation.

This immediately raises suspicions of fraud, which is, of course, a possibility. However, I’m more concerned about carelessness than fraud. As Goethe once said,

…misunderstandings and neglect create more confusion in this world than trickery and malice. At any rate, the last two are certainly much less frequent. 

Even if researchers had innocent motivations for manipulating their graphs, they’ve made it impossible for someone else to reproduce their results and have cast doubts on their integrity.

ReproducibleResearch.org

I started a new web site this week, http://www.reproducibleresearch.org, to promote reproducible research.

I’d like to see this become a community site. Depending on how much interest the site stirs up, I may add a blog, a Wiki, etc. For now, if you’d like to contribute, send me articles or links and I’ll add them to the site. You can send email to “contribute” at the domain name.