Bundled versus unbundled version history

The other day I said to a colleague that an advantage of LaTeX over Microsoft Word is that it’s easy to version LaTeX files because they’re just plain text. My colleague had the opposite view. He said that LaTeX was impossible to version because its files are just plain text. How could we interpret the same facts so differently?

I was thinking about checking files in and out of a version control system. With a text file, the version control system can tell you exactly how two versions differ. But with something like a Word document, the system will give an unhelpful message like “binary files differ.”
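To make the contrast concrete, here is a minimal sketch in Python using the standard library’s difflib. The file names, the sample LaTeX lines, and the byte strings standing in for a Word document are made up for illustration, and `binary_report` is a hypothetical helper that only mimics the kind of message a line-based tool gives for opaque content.

```python
import difflib

# Two hypothetical revisions of a small LaTeX file, one line per list entry.
old = [
    "\\section{Introduction}\n",
    "Version control is useful.\n",
    "It works best with plain text.\n",
]
new = [
    "\\section{Introduction}\n",
    "Version control is very useful.\n",
    "It works best with plain text.\n",
]


def text_diff(a, b):
    """Return a unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(
            a, b, fromfile="paper.tex (old)", tofile="paper.tex (new)"
        )
    )


def binary_report(a: bytes, b: bytes) -> str:
    """Mimic what a line-based tool can say about opaque binary content."""
    return "identical" if a == b else "binary files differ"


if __name__ == "__main__":
    # The text diff pinpoints the changed line.
    print("".join(text_diff(old, new)))
    # The byte strings stand in for two saved versions of a binary document.
    print(binary_report(b"PK\x03\x04 version 1", b"PK\x03\x04 version 2"))
```

The text diff shows exactly which line changed, while the binary comparison can only report that something changed.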

My colleague was thinking about using the change tracking features of Microsoft Word. He’s accustomed to seeing documents in isolation, such as a file attachment in an email. In that setting, a plain text file has no version history, but a Word document may.

I assumed version information would be external to the document. He assumed the version information would be bundled with the document. My view is typical of software developers. His is typical of everyone else.

These two approaches are analogous to functional programming versus object-oriented programming. Version control systems have a functional view of files. The versioning functionality is unbundled from the file content, in part because the content (typically source code files) could be used by many different applications. Word provides a sort of object-oriented versioning system, bundling versioning functionality with the data.

As with functional versus object-oriented programming, there’s no “right” way to solve this problem, only approaches that work better in different contexts. I much prefer using a version control system to track changes to my files, but that approach won’t fly with people who don’t share a common version control system or don’t use version control at all.


11 thoughts on “Bundled versus unbundled version history”

  1. I recently moved my dissertation onto GitHub to get up to speed with versioning LaTeX source, and to play with branching chapters into papers. So far, though, I’ve never had the chance to collaborate with other LaTeX users because Word is so ubiquitous in my area. The possibilities of the git/LaTeX combination are truly fascinating.

  2. I have a question regarding version controlling and LaTeX. I typically write continuously and only utilize a new line when I have to (new paragraph, etc). Version control systems look at differences in lines, so when I diff two versions, I might not be able to spot a single change immediately in a given paragraph. How do you write your LaTeX files so that diff’ing can be more informative? I use Emacs. Thanks.

  3. I write prose and math differently. In prose, I also will typically only have a newline when I break a paragraph. But I format math like source code: generous white space, nested levels of indentation, etc. So diffing math sections is easy.

    For prose, you might try a better diff tool. I like SourceGear DiffMerge. You might find a third-party diff easier to use than your version control system’s diff.

  4. When I write LaTeX, I tend to manually end all lines (prose and math) around the 80th column to keep everything manageable, and I put two line breaks after most paragraphs.

    I’ve run into the same problem as Vinh and John: when diffing line-by-line you get huge diffs even when only a single word has changed. Apparently, some tools exist to cope with this: e.g. wdiff. I haven’t set this up for my version control system, but if you happen to use git, you can find a short guide on how to do so for tex files.

    With regard to versioning info internal in the files or external, I too find it more convenient to keep the version control outside of my files. Putting everything inside the files is risky, as it puts too much faith in a single piece of software. If your versioning information is stored within the file, you depend on the vendor to provide the right (exporting) features should you want to switch to another tool or extract a ‘checkout’ of some latest version. Most regular version control systems have these features and can cooperate with each other to some extent (e.g. using git, you can collaborate on files inside a Subversion repository; similar things are bound to exist for other tools).

  5. SteveBrooklineMA

    My versioning for LaTeX: paper.tex, paper.tex.old, paper.tex.ol2, paper.tex.ol3…

    Tracking changes in Word is nice for collaboration. In my experience, this has little to do with maintaining the ability to revert to old versions. Mostly it is for seeing what your collaborators have worked on, and for merging your changes with someone else’s in those cases where you have both made changes to the same document. When the document gets too messy, you can “accept all changes” and start fresh. This discards the ability to revert to earlier versions, but often nobody cares about that.

    Has anybody else worked on a LaTeX document with someone who uses Word but doesn’t have/know LaTeX? He edits in Word, putting in pidgin LaTeX along with text. He sends it back to you, you save it as text .tex, edit, send back. Not as bad as it sounds.

  6. I have used the following macro for some time now to get revision control information in all of my documents. This version also makes a glossary entry for generating an “effective sections” section of the document.

    \newcommand{\rcsdocrev}[1]{%
    % Calling sequence: \rcsdocrev{$Header$}
    \footnotetext{\tt#1 $\tt$}
    \glossary{\tt#1 $\tt$}}

  7. I have often used latexdiff to create a .tex file showing deletions and additions, although it has problems with some commands, e.g. TikZ code, which then need some manual repair.

  8. Vinh Nguyen:

    I can only talk about Git, as I haven’t used any of the rest that much. When I want to see changed words instead of entire paragraphs, I use the --color-words option.

  9. Please don’t spread the idea that “Object Orientation” means “bundling functionality with data”. I know this is a very widespread misconception (because most regular OO systems out there are accidentally built that way) but this is still wrong.

    OO is about two things: reusability through inheritance (wrt data) and genericity through polymorphism (wrt behavior). Some people even claim that OO is about only one thing: removing conditionals :-)

    But nothing says that these things need to be bundled together. Some lesser-known (but more powerful) OO systems like CLOS completely decouple methods (generic functions) from classes.

  10. It seems possible, as you’ve put it, to bundle the LaTeX version control information; see, for example, http://stackoverflow.com/questions/888347/track-changes-svn-latex, but I don’t think it is possible to unbundle the VC information from Word. And, I agree, Word’s track changes feature is terrible, particularly when many author/editors are making/suggesting changes to the document at once. I think SVN and git offer real advantages over passing Word docs around by email.
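Several of the comments above run into the same problem: a line-based diff flags a whole paragraph when only one word has changed. As a rough illustration of what word-level tools do, here is a minimal sketch in Python using the standard library’s difflib. The function name `word_diff` and the sample sentences are made up, and the `[-...-]` / `{+...+}` markers are only loosely modeled on wdiff-style output; this is not how wdiff or git implement it.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Compare two paragraphs word by word.

    Deleted words are wrapped as [-word-] and inserted words as {+word+}.
    Whitespace (including line breaks) is normalized to single spaces.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("the quick brown fox", "the quick red fox"))
    # prints: the quick [-brown-] {+red+} fox
```

Because the comparison splits on whitespace, re-wrapping a paragraph at a different column produces no differences, which is the behavior you want for prose that is written as one long line per paragraph.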
