Posts Tagged ‘Knuth’

Reusable code vs. re-editable code

Saturday, May 3rd, 2008

In a recent interview, Donald Knuth made this comment about reusable code.I also must confess to a strong bias against the fashion for reusable code.

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.

Knuth didn’t elaborate on what he means by “re-editable” code, but I assume he means code that is easy to maintain. The best chance most code has at reuse is remaining useful in its original project over multiple versions, so maybe we’d get more reuse if we focused more on maintainability.

I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.

Tricky code

Monday, April 7th, 2008

I found the following comment inside the source code for TeX in the preface to a function creating Roman numeral representations:

Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given.

Even with Knuth does not leave a comment, he leaves a comment!

Selective use of technology

Monday, January 21st, 2008

In his book The Nine, Jeffrey Toobin gives a few details of Supreme Court Justice David Souter’s decidedly low-tech life. Souter has no cell phone or voice mail. He does not use email. He was given a television once but never turned it on. He moves his chair around his office throughout the day so he can read by natural light. Toobin says Souter lives something like an eighteenth century gentleman.

I find it refreshing to read of someone like Justice Souter with such independence of mind that he chooses not to use much of the technology that our world takes for granted. I also find it encouraging to see examples of people who do not reject electronics entirely but selectively decide how and whether to use them.

I once had the opportunity to see a talk by the father of computer typography, Donald Knuth. Much to my surprise, his slides were handwritten. The author of TeX didn’t see the need to use TeX for his slides. While he cares about the fine details of how math looks in print, he apparently didn’t feel it was worth the effort to typeset his notes for an informal presentation.

I also think of the late computer scientist Edsgar Dijkstra who wrote with pen and ink, even when writing computer programs.

If you’re reading legal briefs by sunlight, your thoughts will not be exactly the same as they would be if you were reading by fluorescent light. If you’re writing computer programs by hand, you’re not going to think the same way you would if you are pecking on a computer keyboard. (And if you do use a computer, your thinking is subtlety different depending on what program you use.) Technology effects the way you think. The effect is not uniformly better or worse, but it is certainly real.

To shake up your thinking, try going low-tech for a day.

Literate programming and statistics

Tuesday, January 15th, 2008

Sweave, mentioned in my previous post, is a tool for literate programming. Donald Knuth invented literate programming and gives this description of the technique in his book by the same name:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Knuth says the quality of his code when up dramatically when he started using literate programming. When he published the source code for TeX as a literate program and a book, he was so confident in the quality of the code that he offered cash rewards for bug reports, doubling the amount of the reward with each edition. In one edition, he goes so far as to say “I believe that the final bug in TeX was discovered and removed on November 27, 1985.” Even though TeX is a large program, this was not an idle boast. A few errors were discovered after 1985, but only after generations of Stanford students studied the source code carefully and multitudes of users around the world put TeX through its paces.

While literate programming is a fantastic idea, it has failed to gain a substantial following. And yet Sweave might catch on even though literate programming in general has not.

In most software development, documentation is an after thought. When push comes to shove, developers are rewarded for putting buttons on a screen, not for writing documentation. Software documentation can be extremely valuable, but it’s most valuable to someone other than the author. And the benefit of the documentation may only be realized years after it was written.

But statisticians are rewarded for writing documents. In a statistical analysis, the document is the deliverable. The benefits of literate programming for a statistician are more personal and more immediate. Statistical analyses are often re-run, with just enough time between runs for the previous work to be completely flushed from term memory. Data is corrected or augmented, papers come back from review with requests for changes, etc. Statisticians have more self-interest in making their work reproducible than do programmers.

Patrick McPhee gives this analysis for why literate programming has not caught on.

Without wanting to be elitist, the thing that will prevent literate programming from becoming a mainstream method is that it requires thought and discipline. The mainstream is established by people who want fast results while using roughly the same methods that everyone else seems to be using, and literate programming is never going to have that kind of appeal. This doesn’t take away from its usefulness as an approach.

But statisticians are more free to make individual technology choices than programmers are. Programmers typically work in large teams and have to use the same tools as their colleagues. Statisticians often work alone. And since they deliver documents rather than code, statisticians are free to use use Sweave without their colleagues’ knowledge or consent. I doubt whether a large portion of statisticians will ever be attracted to literate programming, but technological minorities can thrive more easily in statistics than in mainstream software development.

Complementary validation

Thursday, January 10th, 2008

Edsgar Dijkstra quipped that software testing can only prove the existence of bugs, not the absense of bugs. His research focused on formal techniques for proving the correctness of software, with the implicit assumption that proofs are infallible. But proofs are written by humans, just as software is, and are also subject to error. Donald Knuth had this in mind when he said “Beware of bugs in the above code; I have only proved it correct, not tried it.” The way to make progress is to shift from thinking about the possibility of error to thinking about the probability of error.

Testing software cannot prove the impossibility of bugs, but it can increase your confidence that there are no bugs, or at least lower your estimate of the probability of running into a bug. And while proofs can contain errors, they’re generally less error-prone than source code. (See a recent discussion by Mark Dominus about how reliable proofs have been.) At any rate, people tend to make different kinds of errors when proving theorems than when writing software. If software passes tests and has a formal proof of correctness, it’s more likely to be correct. And if theoretical results are accompanied by numerical demonstrations, they’re more believable.

Leslie Lamport wrote an article entitled How to Write a Proof where he addresses the problem of errors in proofs and recommends a pattern of writing proofs which increases the probability of the proof being valid. Interestingly, his proofs resemble programs. And while Lamport is urging people to make proofs more like programs, the literate programming folks are urging us to write programs that are more like prose. Both are advocating complementary modes of validation, adding machine-like validation to prosaic proofs and adding prosaic explanations to machine instructions.