People best understand computer programs in a different order than compilers do. This is a key idea of literate programming, and one that distinguishes literate programs from heavily commented programs.
Traditional source code, no matter how heavily commented, is presented in the order dictated by the compiler. The computer is the primary audience. Literate programming is more humanistic in the sense that the primary audience is a human. The computer has to go to extra effort to arrange the code for its needs. As Donald Knuth describes it in his book on literate programming,
The practitioner of literate programming … strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that nicely reinforce each other. [emphasis added]
There are two steps in processing literate programs: weaving and tangling. You take files containing prose and code, and weave them into documentation and tangle them into source code. Tools like Sweave and Pweave focus on the weave process, as their names imply. The weave side of literate programming has gotten the most attention.
A half-hearted approach to literate programming doesn’t require much of a tangle process. A well-commented program has no tangle step at all. A *weave document that follows the order of the source code has a trivial tangle step: save the code to its own file, manually or automatically, but don’t rearrange it. But a full-fledged literate program may make the tangle program work harder, rearranging code fragments from human-friendly to compiler-friendly order.
Careful explanation vs. unit tests
The most obvious feature of literate programming is that it requires careful explanation. Here’s more from the paragraph I quoted above, filling in the part I left out.
The practitioner of literate programming can be regarded as an essayist, whose main concern is with explanation and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible …
The discipline of explaining every piece of code leads to better code. It serves a similar purpose to writing unit tests. I saw somewhere—I can’t remember where now— that Knuth hates unit testing and sees it as redundant effort. Presumably this is because unit testing and literate programming overlap. Unit tests are a kind of commentary on code, explaining how it used, exploring its limitations, etc.
Knuth understands that literate programming doesn’t replace the need for testing, just unit testing. He explained somewhere—again I forget where—that he would test TeX by spending days at a time trying fiendishly to break it.
My misunderstanding and experience
When I read Knuth’s book, I could see the value of carefully explaining code. What I didn’t appreciate was the value of presenting code in a different order than the order of source code.
I’m working on a project now where a few lines of code may require a few paragraphs of explanation. That’s what got me thinking about literate programming. My intention was to write my documentation in the same order as the code. It took a while to realize I had stumbled on an ideal application of literate programming: a complicated algorithm that needs to be explained carefully, both in order to hand over to the client and to reduce the chances of errors. The best order to understand this algorithm is definitely not top-down going through the code.
Why literate programming has not taken off
I think I understand better now why literate programming hasn’t gained much of an audience. I used to think that it was because developers hate writing prose. That’s part of it. Most programmers I’ve worked with would much rather write a hundred lines of unit tests than write one complete sentence.
But that’s not the whole story. There are quite a few programmers who are willing and able to write prose. Why don’t more of them use literate programming?
I think part of the reason is that having a non-trivial tangle process is a barrier to adoption. A programmer can decide to start writing more extensive comments, gradually working up to essay-like explanations. But it’s one thing to say “I’m going to heavily comment my C code” and quite another to say “I’m not going to write C per se any more. I’m going to write CWEB files that compile to C.” Even if a programmer wants to write CWEB in secret, just checking in the tangled C files, other programmers will edit these C files and the changes won’t be reflected in the CWEB source. Also, the output of tangle is less readable than ordinary code. The programmer secretly using CWEB as a preprocessor would appear to be writing undocumented code.
Tricky code benefits from a literate presentation, but routine code does not benefit so much. You either have to have two ways of writing code—straight source for routine code and literate programs for the tricky parts—or impose the overhead of literate programming everywhere. Most code is mundane and repetitive, challenging because of its volume rather than its cleverness. Knuth doesn’t write this kind of code. He only writes code that benefits from a literate presentation.
To write a good literate program, not only do you need to be able to program, and need to be willing and able to write good prose, on top of that you need to have a good sense for story telling, arranging the code for the benefit of other readers. If this is done poorly, the result is harder to understand than traditional programs.
I may use literate programming more now that I’m starting to understand it, at least for my own benefit and hopefully for the benefit of clients. I usually deliver algorithms or libraries, not large applications, and so it wouldn’t be too much work to create two versions of my results. I could create a literate program, then weave a report, and manually edit the tangled code into a traditional form convenient for the client.