One practical application of functional programming

Arguments in favor of functional programming are often unconvincing. For example, the most common argument is that functional programming makes it easier to “reason about your code.” That’s true to some extent. All other things being equal, it’s easier to understand a function if all its inputs and outputs are explicit. But all other things are not equal. In order to make one function easier to understand, you may have to make something else harder to understand.

Here’s an argument from Brian Beckman for using a functional style of programming in a particular circumstance that I find persuasive. The immediate context is Kalman filtering, but it applies to a broad range of mathematical computation.

By writing a Kalman filter as a functional fold, we can test code in friendly environments and then deploy identical code with confidence in unfriendly environments. In friendly environments, data are deterministic, static, and present in memory. In unfriendly, real-world environments, data are unpredictable, dynamic, and arrive asynchronously.

If you write the guts of your mathematical processing as a function to be folded over your data, you can isolate the implementation of your algorithm from the things that make code hardest to test, i.e. the data source. You can test your code in a well-controlled “friendly” test environment and deploy exactly the same code into production, i.e. an “unfriendly” environment.
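
Here’s a minimal sketch of the idea in Python. The accumulator and the running mean are my own illustration, not Brian’s code; the point is that exactly the same update function is folded over an in-memory list in testing and over a live stream in production.

    from functools import reduce

    def update(state, x):
        # Accumulator: all inputs and outputs are explicit.
        # A running mean stands in for a real filter step.
        count, mean = state
        return (count + 1, mean + (x - mean) / (count + 1))

    # Friendly environment: deterministic data, present in memory.
    assert reduce(update, [1.0, 2.0, 3.0], (0, 0.0)) == (3, 2.0)

    # Unfriendly environment: the same function folded over data
    # arriving from outside, one observation at a time.
    def run(stream, state=(0, 0.0)):
        for x in stream:
            state = update(state, x)
        return state

Because update never touches the data source, it can be tested exhaustively in isolation.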

Brian continues:

The flexibility to deploy exactly the code that was tested is especially important for numerical code like filters. Detecting, diagnosing and correcting numerical issues without repeatable data sequences is impractical. Once code is hardened, it can be critical to deploy exactly the same code, to the binary level, in production, because of numerical brittleness. Functional form makes it easy to test and deploy exactly the same code because it minimizes the coupling between code and environment.

I ran into this early on when developing clinical trial methods first for simulation, then for production. Someone would ask whether we were using the same code in production as in simulation.

“Yes we are.”

“Exactly the same code?”

“Well, essentially.”

“Essentially” was not good enough. We got to where we would use the exact same binary code for simulation and production, but something analogous to Kalman folding would have gotten us there sooner, and would have made it easier to enforce this separation between numerical code and its environment across applications.

Why is it important to use the exact same binary code in test and production, not just a recompile of the same source code? Brian explains:

Numerical issues can substantially complicate code, and being able to move exactly the same code, without even recompiling, between testing and deployment can make the difference to a successful application. We have seen many cases where differences in compiler flags, let alone differences in architectures, even between different versions of the same CPU family, introduce enough differences in the generated code to cause qualitative differences in the output. A filter that behaved well in the lab can fail in practice.

Emphasis added, here and in the first quote above.

Note that this post gives an argument for a functional style of programming, not necessarily for the use of functional programming languages. Whether the numerical core or the application that uses it would best be written in a functional language is a separate discussion.

Building software the right way

Yesterday a friend told me about a software project whose owners said “We’re going to do this the right way.” I told him I have two opposite reactions when I hear that:

  • Ooh, that sounds like fun!
  • Run away!

I’ve been on several projects where the sponsors have identified some aspect of the status quo that clearly needs improving. Working on a project that recognizes these problems and is willing to address them sounds like fun. But then the project runs to the opposite extreme and creates something worse.

For example, most software development is sloppy. So a project reacts against this and becomes so formal that nobody can get any work done. In those settings I like to say “Hold off on buying a tux. Just start by tucking your shirt tail in. Maybe that’s as formal as you need to be.”

I’d be more optimistic about a project with more modest goals, say one that wants to move 50% of the way toward an ideal, rather than wanting to jump from one end of a spectrum to the other. Or even better, a project that has identified a direction they want to move in, and thinks in terms of experimenting to find their optimal position along that direction.

Unnatural language processing

Larry Wall, creator of the Perl programming language, created a custom degree plan in college, an interdisciplinary course of study in natural and artificial languages, i.e. linguistics and programming languages. Many of the features of Perl were designed as an attempt to apply natural language principles to the design of an artificial language.

I’ve been thinking of a different connection between natural and artificial languages, namely using natural language processing (NLP) to reverse engineer source code.

The source code of a computer program is text, but not a text. That is, it consists of plain text files, but it’s not a text in the sense that Paradise Lost or an email is a text. The most efficient way to parse a programming language is as a programming language. Treating it as an English text will lose vital structure and wrongly impose a foreign structure.

But what if you have two computer programs? That’s the problem I’ve been thinking about. I have code in two very different programming languages, and I’d like to know how functions in one code base relate to those in the other. The connections are not ones that a compiler could find. The connections are more psychological than algorithmic. I’d like to reverse engineer, for example, which function in language A a developer had in mind when he wrote a function in language B.

Both code bases are written in programming languages, but the function names are approximately natural language. If a pair of functions have the same name in both languages, and that name is not generic, then there’s a good chance they’re related. And if the names are similar, maybe they’re related.

I’ve done this sort of thing informally forever. I imagine most programmers do something like this from time to time. But only recently have I needed to do this on such a large scale that proceeding informally was not an option. I wrote a script to automate some of the work by looking for fuzzy matches between function names in both languages. This was far from perfect, but it reduced the amount of sleuthing necessary to line up the two sets of source code.
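
My script isn’t public, but here’s a hypothetical sketch of the approach using only the Python standard library. The function names at the bottom are made up for illustration.

    import re
    from difflib import SequenceMatcher

    def normalize(name):
        # Reduce camelCase, snake_case, ALLCAPS, and hyphens to one form.
        name = re.sub(r"([a-z])([A-Z])", r"\1_\2", name)
        return name.lower().replace("-", "_")

    def similarity(a, b):
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    def candidate_matches(names_a, names_b, threshold=0.7):
        # Pairs of function names that look related across the code bases.
        return [(a, b) for a in names_a for b in names_b
                if similarity(a, b) >= threshold]

    # Hypothetical names from a Fortran code base and a Python one.
    print(candidate_matches(["CALCDOSE", "SIMTRIAL"],
                            ["calc_dose", "simulate_trial", "plot_results"]))

A list like this still needs a human to vet it, but it narrows the search considerably.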

Around a year ago I had to infer which parts of an old Fortran program corresponded to different functions in a Python program. I also had to infer how some poorly written articles mapped to either set of source code. I did all this informally, but I wonder now whether NLP might have sped up my detective work.

Another situation where natural language processing could be helpful in software engineering is determining code authorship. Again this is something most programmers have probably done informally, saying things like “I bet Bill wrote this part of the code because it looks like his style” or “Looks like Pat left her fingerprints here.” This could be formalized using NLP techniques, and I imagine it has been. Just as Frederick Mosteller and colleagues did a statistical analysis of The Federalist Papers to determine who wrote which paper, I’m sure there have been similar analyses to try to find out who wrote what code, say for legal reasons.
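
As a toy illustration of what a stylistic fingerprint might look like (entirely hypothetical, not a real attribution method), you could count simple style markers per file and compare the counts against code of known authorship:

    import re
    from collections import Counter

    def style_features(source):
        # Crude stylistic fingerprint of one source file.
        return Counter(
            snake_case=len(re.findall(r"\b[a-z]+(?:_[a-z0-9]+)+\b", source)),
            camel_case=len(re.findall(r"\b[a-z]+[A-Z]\w*\b", source)),
            tabs=source.count("\t"),
            long_lines=sum(len(line) > 80 for line in source.splitlines()),
        )

A serious analysis would use many more features and a proper statistical model, as Mosteller did, but the raw material is the same: countable habits an author may not even be aware of.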

Maybe this already has a name, but I like “unnatural language processing” for the application of natural language processing to unnatural (i.e. programming) languages. I’ve done a lot of ad hoc unnatural language processing, and I’m curious how much of it I could automate in the future.

Improving on the Unix shell

Yesterday I ran across Askar Safin’s blog post The Collapse of the UNIX Philosophy. Two quotes from the post stood out. One was from Rob Pike about the Unix ideal of little tools that each do one job:

Those days are dead and gone and the eulogy was delivered by Perl.

The other was a line from James Hague:

… if you romanticize Unix, if you view it as a thing of perfection, then you lose your ability to imagine better alternatives and become blind to potentially dramatic shifts in thinking.

This brings up something I’ve long wondered about: What did the Unix shell get right that has made it so hard to improve on? It has some truly awful quirks, and yet people keep coming back to it. Alternatives that seem more rational don’t work so well in practice. Maybe it’s just inertia, but I don’t think so. There are other technologies from the 1970s that had inertia behind them but have been replaced. The Unix shell got something so right that it’s worth tolerating the flaws. Maybe some of the flaws aren’t even flaws but features that serve some purpose that isn’t obvious.

(By the way, when I say “the Unix shell” I have in mind similar environments as well, such as the Windows command line.)

On a related note, I’ve wondered why programming languages and shells work so differently. We want different things from a programming language and from a shell or REPL. Attempts to bring a programming language and shell closer together sound great, but they inevitably run into obstacles. At some point, we have different expectations of languages and shells and don’t want the two to be too similar.

Anthony Scopatz and I discussed this in an interview a while back in the context of xonsh, “a Python-powered, cross-platform, Unix-gazing shell language and command prompt.” While writing this post I went back to reread Anthony’s comments and appreciate them more now than I did then.

Maybe the Unix shell is near a local optimum. It’s hard to make much improvement without making big changes. As Anthony said, “you quickly end up where many traditional computer science people are not willing to go.”

Related post: What’s your backplane?

Big data and the law

Excerpt from the new book Big Data of Complex Networks:

Big Data and data protection law provide for a number of mutual conflicts: from the perspective of Big Data analytics, a strict application of data protection law as we know it today would set an immediate end to most Big Data applications. From the perspective of the law, Big Data is either a big threat … or a major challenge for international and national lawmakers to adapt today’s data protection laws to the latest technological and economic developments.

Emphasis added.

The author of the chapter on legal matters is Swiss and writes primarily in a European context, though all countries face similar problems.

I’m not a lawyer, though I sometimes work with lawyers as a technical expert, and sometimes help companies with the statistical aspects of HIPAA law. But as a layman the observation above sounds reasonable to me, that strict application of the law could bring many applications to a halt, for better and for worse.

In my opinion the regulations around HIPAA and de-identification are mostly reasonable. The things it prohibits mostly should be prohibited. And it has a common sense provision in the form of expert determination. If your data uses fall outside the regulation’s specific recommendations but don’t endanger privacy, an expert can certify that this is the case.

Branch cuts and Common Lisp

“Nearly everything is really interesting if you go into it deeply enough.” — Richard Feynman

If you thumb through Guy Steele’s book Common Lisp: The Language, 2nd Edition, you might be surprised how much space is devoted to defining familiar functions: square root, log, arcsine, etc. He gives some history of how these functions were first defined in Lisp and then refined by the ANSI (X3J13) standardization committee in 1989.

There are three sources of complexity:

  1. Complex number arguments
  2. Multi-valued functions
  3. +0 and -0

Complex arguments

The functions under discussion are defined for either real or complex inputs. This does not complicate things much in itself. Defining some functions for complex arguments, such as the exp function, is simple. The complication comes from the interaction of complex arguments with multi-valued functions and floating point representations of zero.

Multi-valued functions

The tricky functions to define are inverse functions, functions where we have to make a choice of range.

Real multi-valued functions

Let’s restrict our attention to real numbers for a moment. How do you define the square root of a positive number x? There are two solutions to the equation y² = x, and √x is defined to be the positive solution.

What about the arcsine of x? This is the number whose sine is x. Except there is no “the” number: there are infinitely many numbers whose sine is x, so we have to make a choice. It seems natural to choose values in an interval symmetric about 0, so we take the arcsine of x to be the number between -π/2 and π/2 whose sine is x.

Now what about arctangent? As with arcsine, we have to make a choice because for any x there are infinitely many numbers y whose tangent is x. Again it’s convenient to define the range to be in an interval symmetric about zero, so we define the arctangent of x to be the number y between -π/2 and π/2 whose tangent is x. But now we have a subtle complication with tangent we didn’t have with sine because tangent is unbounded. How do we want to define the tangent of a vertical angle? Should we call it ∞ or -∞? What do we want to return if someone asks for the arctangent of ±∞? Should we return π/2 or -π/2?
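
Floating point libraries have to commit to answers to these questions. Python’s math module, for example, picks the limiting values:

    from math import atan, inf

    print(atan(inf))    # 1.5707963267948966, i.e. π/2
    print(atan(-inf))   # -1.5707963267948966, i.e. -π/2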

Complex multi-valued functions

The discussion shows there are some minor complications in defining inverse functions on the real line. Things get more complicated when working in the complex plane. To take the square root example, it’s easy to say we’re going to define square root so that the square root of a positive number is another positive number. Fine. But which solution to z² = w should we take to be the square root of a complex number w, such as 2 + 3i or -5 + 17i?

Or consider logarithms. For positive numbers x there is only one real number y such that exp(y) = x. But what if we take a negative value of x such as -1? There’s no real number whose exponential is -1, but there is a complex number. In fact, there are infinitely many complex numbers whose exponential is -1. Which one should we choose?

Floating point representations of zero

A little-known feature of floating point arithmetic (specified by the IEEE 754 standard) is that there are two kinds of zero: +0 and -0. This sounds bizarre at first, but there are good reasons for this, which I explain here. But defining functions to work properly with two kinds of zero takes a lot of work. This was the main reason the ANSI Common Lisp committee had to revise their definitions of several transcendental functions. If a function has a branch cut discontinuity along the real axis, for example, you want your definition to be continuous as you approach x + 0i from above and as you approach x - 0i from below.
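
Here’s a quick illustration of why the signed zeros matter, using Python’s cmath module, which follows similar conventions. The complex square root has a branch cut along the negative real axis, and the sign of the zero imaginary part determines which side of the cut you are on.

    from cmath import sqrt

    print(sqrt(complex(-4.0, 0.0)))    # 2j: approaching the cut from above
    print(sqrt(complex(-4.0, -0.0)))   # -2j: approaching from below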

The Common Lisp solution

I’ll cut to the chase and present the solution the X3J13 committee came up with. For a discussion of the changes this required and the detailed justifications, see Guy Steele’s book.

The first step is to carefully define the two-argument arctangent function (atan y x) for all 16 combinations of y and x being positive, negative, +0, or -0. Then other functions are defined as follows.

  1. Define phase in terms of atan.
  2. Define complex abs in terms of real sqrt.
  3. Define complex log in terms of phase, complex abs, and real log.
  4. Define complex sqrt in terms of complex log.
  5. Define everything else in terms of the functions above.

The actual implementations may not follow this sequence, but they have to produce results consistent with this sequence.

The phase of z is defined as the arctangent with arguments the imaginary and real parts of z.

The complex log of z is defined as log |z| + i phase(z).

Square root of z is defined as exp( log(z) / 2 ).
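
Here’s a sketch of steps 1 through 4 in Python (my illustration, not the committee’s code), building everything out of real functions and checking the result against the cmath library:

    import cmath
    from math import atan2, cos, exp, log, sin, sqrt

    def phase(z):
        # 1. Phase in terms of the two-argument arctangent.
        return atan2(z.imag, z.real)

    def cabs(z):
        # 2. Complex abs in terms of real sqrt.
        return sqrt(z.real**2 + z.imag**2)

    def clog(z):
        # 3. Complex log: log|z| + i phase(z).
        return complex(log(cabs(z)), phase(z))

    def csqrt(z):
        # 4. Complex sqrt: exp(log(z)/2), with exp expanded in real parts.
        w = clog(z) / 2
        return complex(exp(w.real) * cos(w.imag), exp(w.real) * sin(w.imag))

    z = -5 + 17j
    print(csqrt(z))        # agrees with the next line up to rounding
    print(cmath.sqrt(z))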

The inverses of circular and hyperbolic functions are defined as follows.

\begin{aligned}
\arcsin z &= -i \log \left( iz + \sqrt{1 - z^2} \right) \\
\arccos z &= -i \log \left( z + i \sqrt{1 - z^2} \right) \\
\operatorname{arcsinh}\, z &= \log \left( z + \sqrt{1 + z^2} \right) \\
\operatorname{arccosh}\, z &= 2 \log \left( \sqrt{(z+1)/2} + \sqrt{(z-1)/2} \right) \\
\operatorname{arctanh}\, z &= \log \left( (1+z) \sqrt{1/(1 - z^2)} \right)
\end{aligned}

Note that there are many ways to define these functions that seem to be equivalent, and are equivalent in some region. Getting the branch cuts right is what makes this complicated.

Ultra-reliable software

From a NASA page advocating formal methods:

We are very good at building complex software systems that work 95% of the time. But we do not know how to build complex software systems that are ultra-reliably safe (i.e. P_f < 10^-7/hour).

Emphasis added.

Developing medium-reliability and high-reliability software are almost entirely different professions. Using typical software development procedures on systems that must be ultra-reliable would invite disaster. But using extremely cautious development methods on systems that can afford to fail relatively often would be an economic disaster.

Related post: Formal validation methods let you explore the corners

 

Technological allegiances

I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from one to the other. In the words of Stephen Sondheim, “Is it always or, and never and?”

Rivalries seem sillier to outsiders the more similar the two options are. And yet this makes sense. I’ve forgotten the psychological term for this, but it has a name: Similar things compete for space in your brain more than things that are dissimilar. For example, studying French can make it harder to spell English words. (Does literature have two t’s in French and one in English or is it the other way around?) But studying Chinese doesn’t impair English orthography.

It’s been said that academic politics are so vicious because the stakes are so small [1]. Similarly, there are fierce technological loyalties because the differences with competing technologies are so small, small enough to cause confusion. My favorite example: I can’t keep straight which languages use else if, elif, elseif, … in branching.

If you have to learn two similar technologies, it may be easier to devote yourself exclusively to one, then to the other, then use both and learn to keep them straight.

Related post: Ford-Chevy arguments in technology

[1] I first heard this attributed to Henry Kissinger, but there’s no agreement on who first said it. Several people have said similar things.

 

Gentle introduction to R

The R language is closely tied to statistics. Its ancestor was named S, because it was a language for Statistics. The open source descendant could have been named ‘T’, but its creators chose to call it ‘R.’

Most people learn R as they learn statistics: Here’s a statistical concept, and here’s how you can compute it in R. Statisticians aren’t that interested in the R language itself but see it as connective tissue between commands that are their primary interest.

This works for statisticians, but it makes the language hard for non-statisticians to approach. Years ago I managed a group of programmers who supported statisticians. At the time, there were no books for learning R without concurrently learning statistics. This created quite a barrier to entry for programmers whose immediate concern was not the statistical content of an R program.

Now there are more books on R, and some are more approachable to non-statisticians. The most accessible one I’ve seen so far is Learning Base R by Lawrence Leemis. It gets into statistical applications of R—that is ultimately why anyone is interested in R—but it doesn’t start there. The first 40% or so of the book is devoted to basic language features, things you’re supposed to pick up by osmosis from a book focused more on statistics than on R per se. This is the book I wish I could have handed my programmers who had to pick up R.

Efficiency of C# on Linux

This week I attended Mads Torgersen’s talk Why you should take another look at C#. Afterward I asked him about the efficiency of C# on Linux. When I last looked into it, it wasn’t good. A few years ago I asked someone on my team to try running some C# software on Linux using Mono. The code worked correctly, but it ran several times slower than on Windows.

Mads said that now with .NET Core, C# code runs about as fast on Linux as on Windows. Maybe a few percent slower on Linux. Scott Hanselman joined the conversation and explained that with .NET Core, the same code runs on every platform. The Mono project duplicated much of the functionality of the .NET framework, but it was an entirely independent implementation and not as efficient.

I had assumed the difference was due to compiler optimizations, or lack thereof, but Scott and Mads said that the difference was mostly in the implementation of the library code. There are some compiler optimizations that are better on the Windows side, so C# might run a little faster on Windows, but the performance is essentially the same on both platforms.

I could recommend C# to a client running Linux if there’s a 5% performance penalty, but a 500% performance penalty is a show-stopper. Now I’d consider using C# on a project where I need more performance than Python or R, but want something easier to work with than C++.

Years ago I developed software with the Microsoft stack, but I moved away from Microsoft tools when I quit doing the kind of software development the tools are geared for. So I don’t write C# much any more. It’s been about a year since I had a project where I needed to write C# code. But I like the C# language. You can tell that a lot of thought has gone into the design, and the tool support is great. Now that the performance is better on Linux I’d consider using it for numerical simulations.

Humble Lisp programmers

Maybe from the headline you were expecting a blank post? No, that’s not where I’m going.

Yesterday I was on Amazon.com and noticed that nearly all the books they recommended for me were either about Lisp or mountain climbing. I thought this was odd, and mentioned it on Twitter. Carl Vogel had a witty reply: “I guess they weren’t sure whether you want to figuratively or literally look down on everyone.”

The stereotypical Lisp programmer does look down on everyone. But this is based on a tiny, and perhaps unrepresentative, sample: people writing about Lisp, compared to the much larger number of people writing in Lisp.

Lisp has been around for over 50 years and shows no signs of going away. There are a lot of people writing Lisp in obscurity. Kate Gregory said something similar about C++ developers, calling them the dark matter of programmers because there are a lot of them but they don’t make much of a splash. They’re quietly doing their job, not speaking at conferences or writing much about their language.

I imagine there are a lot of humble Lisp programmers. It takes some humility to commit to an older technology, especially given the pervasive neomania of the programmer community. It also takes some humility to work on projects that have been around for years or that are deep within the infrastructure of more visible projects, which is where I expect a lot of Lisp lives.

You can do very clever things in Lisp, but you don’t have to. As Ed Post famously said, “The determined Real Programmer can write FORTRAN programs in any language.” There must be a lot of code out there that writes (f x) instead of f(x) but otherwise isn’t that different from FORTRAN.

Kalman filters and functional programming

A few weeks ago I started a series of posts on various things you could do with a functional fold. In the first post I mentioned that the idea came from a paper by Brian Beckman on Kalman filters and folds:

This post was inspired by a paper by Brian Beckman (in progress) that shows how a Kalman filter can be implemented as a fold. From a Bayesian perspective, the thing that makes the Kalman filter work is that a certain multivariate normal model has a conjugate prior. This post shows that conjugate models more generally can be implemented as folds over the data. That’s interesting, but what does it buy you? Brian’s paper discusses this in detail, but one advantage is that it completely separates the accumulator function from the data, so the former can be tested in isolation.
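
Here’s a minimal sketch of a conjugate model as a fold, in Python; the example is mine, not from the paper. For a normal likelihood with known variance and a normal prior, the accumulator maps a prior and one observation to a posterior, and folding it over the data gives the posterior given all the data.

    from functools import reduce

    def update(prior, x, obs_var=1.0):
        # Conjugate update: normal prior (mean, variance), normal likelihood.
        m, v = prior
        k = v / (v + obs_var)    # weight given to the new observation
        return (m + k * (x - m), v * (1 - k))

    data = [2.1, 1.9, 2.4, 2.2]
    print(reduce(update, data, (0.0, 100.0)))    # start from a vague prior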

At the time Brian was working on one big paper in private. This has since been split into several papers and they’re now public.

See also: Kalman filters, particle filters, and tracking

 

Formal methods let you explore the corners

I heard someone say the other day that the advantage of formal software validation methods is that they let you explore the corners, cases where intuition doesn’t naturally take you.

This made me think of corners in the geometric sense. If you have a sphere in a box in high dimensions, nearly all the volume is in the corners, i.e. outside the sphere. This is more than a metaphor. You can think of software options geometrically, with each independent choice corresponding to a dimension. Paths through a piece of software that are individually rare may account for nearly all use when considered together.

With a circle inside a square, about 78.5% of the area is inside the circle. With a ball sitting inside a 3-D box, 52.4% of the volume is inside the ball. As the dimension increases, the proportion of volume inside the sphere rapidly decreases. For a 10-dimensional sphere sitting in a 10-dimensional box, only 0.25% of the volume is in the sphere. Said another way, 99.75% of the volume is in the corners.

When you go up to 100 dimensions, the proportion of volume inside the sphere is about 2 parts in 10^70, a 1 followed by 70 zeros [1]. If 100 dimensions sounds like pure fantasy, think about a piece of software with more than 100 features. Those feature combinations multiply like geometric dimensions [2].

Here’s a little Python code you could use to see how much volume is in a sphere as a function of dimension.

    from scipy.special import gamma
    from math import pi

    def unit_sphere_volume(n):
        # Volume of the unit ball in n dimensions: pi^(n/2) / Gamma(n/2 + 1)
        return pi**(0.5*n)/gamma(0.5*n + 1)

    def unit_cube_volume(n):
        # Volume of the circumscribing cube, which has side 2
        return 2**n

    def ratio(n):
        # Fraction of the cube's volume that lies inside the ball
        return unit_sphere_volume(n) / unit_cube_volume(n)

    print([ratio(n) for n in range(1, 20)])

* * *

[1] There are names for such extremely large numbers. These names are hardly ever used—scientific notation is much more practical—but they’re fun to say. 10^70 is ten duovigintillion in American nomenclature, ten undecilliard in European.

[2] Geometric dimensions are perfectly independent, but software feature combinations are not. In terms of logic, some combinations may not be possible. Or in terms of probability, the probability of exploring some paths is conditional on the probability of exploring other paths. Even so, there are inconceivably many paths through any large software system. And in large-scale operations, events that should “never happen” happen regularly.

Literate programming: presenting code in human order

Presentation order

People best understand computer programs in a different order than compilers do. This is a key idea of literate programming, and one that distinguishes literate programs from heavily commented programs.

Traditional source code, no matter how heavily commented, is presented in the order dictated by the compiler. The computer is the primary audience. Literate programming is more humanistic in the sense that the primary audience is a human. The computer has to go to extra effort to arrange the code for its needs. As Donald Knuth describes it in his book on literate programming,

The practitioner of literate programming … strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that nicely reinforce each other. [emphasis added]

There are two steps in processing literate programs: weaving and tangling. You take files containing prose and code, and weave them into documentation and tangle them into source code. Tools like Sweave and Pweave focus on the weave process, as their names imply. The weave side of literate programming has gotten the most attention.

A half-hearted approach to literate programming doesn’t require much of a tangle process. A well-commented program has no tangle step at all. A *weave document that follows the order of the source code has a trivial tangle step: save the code to its own file, manually or automatically, but don’t rearrange it. But a full-fledged literate program may make the tangle program work harder, rearranging code fragments from human-friendly to compiler-friendly order.
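
For example, here’s what a fragment of a noweb-style literate program might look like (a sketch; the chunk names and helper functions are hypothetical). The prose presents the interesting chunk first, and the tangle step assembles the chunks in the order the compiler needs:

    The heart of the algorithm is the collision test, so we
    explain it first.

    <<test for collision>>=
    if distance(a, b) < a.radius + b.radius:
        resolve(a, b)
    @

    Only afterward do we present the surrounding loop. When the
    file is tangled, the reference below is replaced by the chunk
    defined above.

    <<main loop>>=
    for a, b in pairs(objects):
        <<test for collision>>
    @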

Careful explanation vs. unit tests

The most obvious feature of literate programming is that it requires careful explanation. Here’s more from the paragraph I quoted above, filling in the part I left out.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with explanation and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible …

The discipline of explaining every piece of code leads to better code. It serves a similar purpose to writing unit tests. I saw somewhere—I can’t remember where now—that Knuth hates unit testing and sees it as redundant effort. Presumably this is because unit testing and literate programming overlap. Unit tests are a kind of commentary on code, explaining how it is used, exploring its limitations, etc.

Knuth understands that literate programming doesn’t replace the need for testing, just unit testing. He explained somewhere—again I forget where—that he would test TeX by spending days at a time trying fiendishly to break it.

My misunderstanding and experience

When I read Knuth’s book, I could see the value of carefully explaining code. What I didn’t appreciate was the value of presenting code in a different order than the order of source code.

I’m working on a project now where a few lines of code may require a few paragraphs of explanation. That’s what got me thinking about literate programming. My intention was to write my documentation in the same order as the code. It took a while to realize I had stumbled on an ideal application of literate programming: a complicated algorithm that needs to be explained carefully, both in order to hand over to the client and to reduce the chances of errors. The best order to understand this algorithm is definitely not top-down going through the code.

Why literate programming has not taken off

I think I understand better now why literate programming hasn’t gained much of an audience. I used to think that it was because developers hate writing prose. That’s part of it. Most programmers I’ve worked with would much rather write a hundred lines of unit tests than write one complete sentence.

But that’s not the whole story. There are quite a few programmers who are willing and able to write prose. Why don’t more of them use literate programming?

I think part of the reason is that having a non-trivial tangle process is a barrier to adoption. A programmer can decide to start writing more extensive comments, gradually working up to essay-like explanations. But it’s one thing to say “I’m going to heavily comment my C code” and quite another to say “I’m not going to write C per se any more. I’m going to write CWEB files that compile to C.” Even if a programmer wants to write CWEB in secret, just checking in the tangled C files, other programmers will edit these C files and the changes won’t be reflected in the CWEB source. Also, the output of tangle is less readable than ordinary code. The programmer secretly using CWEB as a preprocessor would appear to be writing undocumented code.

Tricky code benefits from a literate presentation, but routine code does not benefit so much. You either have to have two ways of writing code—straight source for routine code and literate programs for the tricky parts—or impose the overhead of literate programming everywhere. Most code is mundane and repetitive, challenging because of its volume rather than its cleverness. Knuth doesn’t write this kind of code. He only writes code that benefits from a literate presentation.

To write a good literate program you need to be able to program, you need to be willing and able to write good prose, and on top of that you need a good sense for storytelling, arranging the code for the benefit of other readers. If this is done poorly, the result is harder to understand than a traditional program.

I may use literate programming more now that I’m starting to understand it, at least for my own benefit and hopefully for the benefit of clients. I usually deliver algorithms or libraries, not large applications, and so it wouldn’t be too much work to create two versions of my results. I could create a literate program, then weave a report, and manually edit the tangled code into a traditional form convenient for the client.