The purpose of Sweave is to make statistical analyses reproducible. However, technical difficulties can keep the Sweave document from being itself reproducible. Here are some things that could go wrong.
- Session contamination
- Missing R packages
- Missing LaTeX packages
- Missing or modified data
- Referenced files and paths
These notes are based on my experience running Sweave on Windows XP with R version 2.4.1 and MiKTeX version 2.5.
Session contamination
When you launch Sweave from an R session, Sweave inherits your session state: any packages that are loaded, any variables that are defined, and so on. This makes it easy to mislead yourself into thinking that an Sweave file is self-contained, when in fact it implicitly depends on session state.
You can protect yourself by running Sweave in batch mode or starting a new session before running Sweave. Here’s a batch file to run Sweave from a Windows command line. Assuming R and pdflatex are in your path, you can save this code to a file, say sw.bat, and process a file foo.Rnw with the command sw foo.
R.exe -e "Sweave('%1.Rnw')"
pdflatex.exe %1.tex
This will protect you from writing Sweave files that appear to be self-contained but implicitly depend on session state.
However, it’s still possible that someone else receiving your Sweave file could start Sweave with a session state that interferes with your file. One precaution would be to include the R commands sessionInfo() and ls() at the top of your file. That wouldn’t prevent contamination, but it would make it obvious.
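As a rough illustration of that precaution, the first code chunk of an Sweave file might contain something like the following. This is only a sketch: the variable name leftovers and the wording of the warning are made up for the example.

```r
# Names of objects already defined in the global workspace.
# In a clean session this is character(0).
leftovers <- ls(envir = globalenv())

if (length(leftovers) > 0) {
  # Make inherited state visible in the output rather than silently using it.
  cat("Workspace was not empty when Sweave started:",
      paste(leftovers, collapse = ", "), "\n")
}

# Records the R version, platform, and attached packages in the document.
print(sessionInfo())
```

Anyone reading the resulting document can then see whether stray objects or unexpected packages were present when it was built.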
A more aggressive approach would be to include something like the command rm(list=ls()) at the top of the file to clean out the environment. That would be effective, but might upset the person running Sweave. You could also put the rm command at the bottom of your file to clean out the changes that Sweave makes to the environment that launched it.
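A gentler variant of the cleanup at the bottom of the file, not described above but in the same spirit, is to remember what existed at the start and remove only what was added since. The sketch below uses a scratch environment (env) so that it can be run anywhere without touching your workspace; in an Sweave file the same calls would presumably be made without the envir argument. All object names here are hypothetical.

```r
# Scratch environment standing in for the workspace that launched Sweave.
env <- new.env()
assign("preexisting", 42, envir = env)   # something the user already had

# Top of the file: remember what was there before the analysis ran.
before <- ls(envir = env)

# Body of the file: the analysis creates new objects.
assign("x", 1:10, envir = env)
assign("fit", "a fitted model", envir = env)

# Bottom of the file: remove only the objects created since the snapshot.
added <- setdiff(ls(envir = env), before)
rm(list = added, envir = env)
```

After the cleanup, env contains only the object that was there to begin with, so the person running the file loses nothing of their own.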
Missing R packages
The Sweave package itself may be missing from R, although it is now a standard part of the R distribution. (At least as of version 2.4.1, possibly further back.)
If you’re missing a package named foo, you will get an error message saying
Error in library(foo) : there is no package called 'foo'.
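Rather than waiting for that error partway through a long run, an Sweave file could check for its packages up front. Here is a minimal sketch; the helper name find_missing and the package list are invented for the example.

```r
# Return the names in `pkgs` that are not installed in any library
# on the current library path.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical list of packages the document depends on.
needed <- c("stats", "utils")

missing_pkgs <- find_missing(needed)
if (length(missing_pkgs) > 0) {
  stop("Please install these packages first: ",
       paste(missing_pkgs, collapse = ", "))
}
```

This reports every missing package at once instead of stopping at the first failing library() call.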
Installing a package depends on where it came from. Standard R packages can be installed from CRAN. These are trivial to install from the R user interface by clicking on the Packages menu.
BioConductor packages can also be installed from the Packages menu if you go to “Select Repositories” and add BioConductor to the list.
Also, BioConductor packages can be installed from the command line by typing
source('https://bioconductor.org/biocLite.R'); biocLite('foo')
Do not install BioConductor packages by going to the BioConductor site and manually downloading files. For one, the search feature is buggy (as of February 2007). For another, the automatic installation options make sure the right version is downloaded along with its dependencies.
Other packages, such as OOMPA for example, are distributed as binary libraries. To install such libraries, first save the binary files to your local disk. Then from the Packages menu select “Install package(s) from local zip files…”. Browse to the location where the downloaded files were saved and control-click on each package you wish to install.
Missing LaTeX packages
MiKTeX version 2.5 does an excellent job of automatically installing LaTeX packages as needed. Installing packages with earlier versions was much more work.
Missing or modified data
One way to reduce the problem of missing data would be to have a version control system or at least a standard file system location for data sets. The Sweave file could extract the data from version control as its first step.
Modifications to data can be detected by a checksum. An Sweave document could assert the checksum of the data file before doing any further processing.
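One way such a check might look uses md5sum from the tools package that ships with R. This is a sketch under assumptions: the helper check_data and the sample file are invented for the example, and in a real document the expected checksum would be a string recorded when the data set was frozen rather than computed on the spot.

```r
# Stop with an error unless the file exists and its MD5 checksum matches.
check_data <- function(path, expected) {
  if (!file.exists(path)) {
    stop("Data file not found: ", path)
  }
  actual <- unname(tools::md5sum(path))
  if (!identical(actual, expected)) {
    stop("Checksum mismatch for ", path,
         ": expected ", expected, ", got ", actual)
  }
  invisible(TRUE)
}

# Demo with a throwaway file so the example is self-contained.
data_file <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,3.14", "2,2.72"), data_file)

# Stand-in for a checksum recorded earlier and pasted into the document.
expected_md5 <- unname(tools::md5sum(data_file))

check_data(data_file, expected_md5)   # passes silently
```

Placed in the first chunk, a call like this stops processing before any analysis runs on missing or altered data.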
Referenced files and paths
Complex LaTeX files often reference external files, assuming a given location for these files. Absolute paths are not portable, unless everyone organizes their local hard disk the same way. Such standardization might work within an organization, with effort, but will not work if you want to share files with the world at large. With relative paths, you can zip up your main file and all its dependencies, and anyone who unzips the bundle will have the file structure they need.
Unfortunately, Sweave hard-codes the absolute path to its style files in its LaTeX output. The path to the style files depends on where R was installed on the local system. This means that while an Sweave document may be portable, the LaTeX file it produces is not. If you receive a LaTeX document produced by Sweave on someone else’s computer, you will need to edit the path inside the \usepackage statement so that it points to the Sweave style file location on your computer. If you want to give your LaTeX file to someone who does not have R, you can delete the reference to Sweave and paste in the contents of the Sweave.sty file.
If R is installed under “Program Files”, you will not be able to run LaTeX even on your own Sweave output without modification. Since the path to your R installation contains spaces, Sweave will insert a DOS-mangled path that LaTeX will choke on. Installing R in a location without spaces in the path, something like C:\bin\R, avoids the problem.
Compiling to LaTeX
Once everything is set up, you can compile an Sweave file to LaTeX by calling Sweave from R with the source file as an argument, for example Sweave("C:/temp/foo.Rnw"). This will produce a LaTeX file foo.tex, but the file may not appear where you expect. Rather than creating the file in the same directory as its source, R writes it to R’s current working directory.
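One possible workaround is to switch to the source file's directory for the duration of the call, so that the output lands next to its source. The wrapper below is only a sketch: the name sweave_in_place is made up, and the demo builds a tiny throwaway .Rnw file so the example can be run as-is.

```r
# Run Sweave with the working directory temporarily set to the source
# file's directory, then restore the previous working directory.
sweave_in_place <- function(rnw) {
  old_dir <- setwd(dirname(rnw))
  on.exit(setwd(old_dir))
  Sweave(basename(rnw))
  # Path where the .tex file is expected to have been written.
  file.path(dirname(rnw), sub("\\.[RrSs]nw$", ".tex", basename(rnw)))
}

# Demo: a minimal Sweave source file in a scratch directory.
demo_dir <- file.path(tempdir(), "sweave_demo")
dir.create(demo_dir, showWarnings = FALSE)
rnw_file <- file.path(demo_dir, "foo.Rnw")
writeLines(c("\\documentclass{article}",
             "\\begin{document}",
             "<<>>=",
             "1 + 1",
             "@",
             "\\end{document}"),
           rnw_file)

tex_file <- sweave_in_place(rnw_file)
```

Because the working directory is restored by on.exit, the calling session is left where it started even if Sweave fails partway through.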