Reproducible Analysis

Imagine someone making one of the following requests of you. If any of them gives you a sinking feeling that the task will be more difficult than it should be, you see the need for reproducible analysis.

  1. I just read this interesting paper. Can you perform the same analysis on my data?
  2. Remember that microarray analysis you did six months ago? We ran a few more arrays. Can you add them to the project and repeat the same analysis?
  3. The statistical analyst who looked at the data I generated previously is no longer available. Can you get someone else to analyze my new data set using the same methods (and thus producing a report I can expect to understand)?
  4. Please write/edit the methods sections for the abstract/paper/grant proposal I am submitting based on the analysis you did several months ago.

The scenarios above come from the presentation “Sweave: First Steps Toward Reproducible Analyses” by Kevin Coombes. The presentation motivates the need for reproducible analysis and introduces Sweave, a tool designed to make statistical analyses easier to reproduce. The talk itself was created from an Sweave document. The source is available at the same link as the talk.

Example of irreproducible analysis

“Microarrays: retracing steps” by Kevin Coombes, Jing Wang, and Keith Baggerly in Nature Medicine, November 2007, pp. 1276-1277. The authors report their experience trying to reconstruct the analysis behind a previous article in the same journal. They recount some of the errors they believe must have occurred and explain why that article's conclusions are unsupported.

Articles and other resources

See for a list of articles and other resources related to reproducible research.