Broken windows theory and programming

The broken windows theory says that cracking down on petty crime reduces more serious crime. The name comes from the explanation that if a building has a few broken windows, it invites vandals to break more windows and eventually burn down the building. Turned around, this suggests that punishing vandalism could lead to a reduction in violent crime. Rudy Giuliani is perhaps the most visible proponent of the theory.  His first initiative as mayor of New York was to go after turnstile jumpers and squeegee men as a way of reducing crime in the city. Crime rates dropped dramatically during his tenure.

In the book Pragmatic Thinking and Learning, Andy Hunt applies the broken windows theory to software development.

Known problems (such as bugs in code, bad process in an organization, poor interfaces, or lame management) that are uncorrected have a debilitating, viral effect that ends up causing even more damage.

I’ll add a couple of my pet peeves to Andy Hunt’s list.

The first is compiler warnings. I can’t understand why some programmers are totally comfortable with their code having dozens of compiler warnings. They’ll say “Oh yeah, I know about that. It’s not a problem.” But then when a warning shows up that is trying to tell them something important, the message gets lost in the noise. My advice: Just fix the code. In very exceptional situations, explicitly turn off the warning.
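
To make that concrete, here’s a small Java sketch (the legacy API in it is made up). The first method makes the warning go away by fixing the code; the second shows the rare case where you silence a warning explicitly and narrowly, with a note explaining why, instead of letting it sit in the build output.

    import java.util.ArrayList;
    import java.util.List;

    class WarningExample {

        // Preferred: fix the code. Using a parameterized type removes the
        // "unchecked" warning entirely, so it never clutters the build output.
        static List<String> namesFixed() {
            List<String> names = new ArrayList<>();
            names.add("Alice");
            return names;
        }

        // Rare exception: a legacy API forces a raw type on us. Suppress the
        // warning explicitly, as narrowly as possible, and say why it is safe.
        @SuppressWarnings("unchecked")
        static List<String> namesFromLegacyApi(Object rawListFromLegacyCode) {
            // Safe only because the (hypothetical) legacy API documents that it
            // always returns a List of String.
            return (List<String>) rawListFromLegacyCode;
        }
    }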

The second is similar. Many programmers blithely ignore run-time exceptions that are written to an event log. As with compiler warnings, they justify that these exceptions are not really a problem. My advice: If it’s not really a problem, then don’t log it. Otherwise, fix it.
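
For example, here’s a hypothetical Java sketch of the two honest options: if the situation is genuinely expected, handle it quietly and don’t log it; if it isn’t, let the exception propagate so someone has to fix it.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    class ConfigReader {
        private static final Logger LOG = Logger.getLogger("app");

        // The anti-pattern: swallow the exception, log it, carry on. The event
        // log fills with "known" errors and real problems get lost in the noise.
        static String readNoisy(Path path) {
            try {
                return Files.readString(path);
            } catch (IOException e) {
                LOG.log(Level.SEVERE, "ignored; happens all the time", e);
                return "";
            }
        }

        // Better: if a missing file is genuinely expected, handle it quietly and
        // log nothing; anything unexpected propagates loudly so it gets fixed.
        static String read(Path path) throws IOException {
            if (!Files.exists(path)) {
                return "";                     // expected case, not an error
            }
            return Files.readString(path);     // unexpected failures propagate
        }
    }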

Michael Feathers on refactoring

Michael Feathers wrote my favorite book on unit testing: Working Effectively with Legacy Code (ISBN 0131177052). Some books on unit testing just give abstract platitudes. Feathers’ book wrestles with the hard, messy problem of retrofitting unit tests to existing code.

The .NET Rocks podcast had an interview with Michael Feathers recently. The whole interview is worth listening to, but here I’ll just recap a couple things he said about refactoring that I thought were insightful. First, most people agree that you need to have unit tests in place before you can do much refactoring. The unit tests give you the confidence to refactor without worrying that you’ll break something in the process and not know that you broke it. But Feathers adds that you might have to do some light refactoring before you can put the unit tests in place to allow more aggressive refactoring.
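
One example of that kind of preliminary, light refactoring, in the spirit of the dependency-breaking moves Feathers describes (the classes below are made up): pull a hard-wired dependency out behind an interface so a test can substitute a fake.

    // Before refactoring, totalPay() called the database directly, something like
    //
    //     double hours = Database.queryHours(employeeId);   // hard-wired dependency
    //
    // which made the class impossible to unit test without a live connection.
    // A light, low-risk refactoring routes that call through an interface, so a
    // test can substitute a fake before any more aggressive refactoring begins.

    interface HoursSource {
        double hoursFor(int employeeId);
    }

    class PayrollCalculator {
        private final HoursSource hours;

        PayrollCalculator(HoursSource hours) {
            this.hours = hours;
        }

        double totalPay(int employeeId) {
            return hours.hoursFor(employeeId) * 25.0;   // flat hourly rate, kept simple
        }
    }

    // Now the logic can be exercised with an in-memory fake, no database required.
    class PayrollCalculatorTest {
        public static void main(String[] args) {
            PayrollCalculator calc = new PayrollCalculator(id -> 40.0);
            System.out.println(calc.totalPay(1) == 1000.0 ? "pass" : "fail");
        }
    }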

The second thing he mentioned about refactoring was the technique called “scratch refactoring.” With this approach, you refactor quickly, without worrying about whether you are introducing bugs, just to see where you want to go. Then you throw those changes away completely and refactor carefully. Sometimes you need that kind of dry run to see what patterns emerge before you know where you are headed.

Both of these observations are ways to break out of a chicken-and-egg cycle: you need unit tests before you can refactor with confidence, but you may need to refactor before you can write the unit tests.

Errors in math papers not a big deal?

Daniel Lemire wrote a blog post this morning that ties together a couple of themes previously discussed here.

Most published math papers contain errors, and yet there have been surprisingly few “major screw-ups” as defined by Mark Dominus. Daniel Lemire’s post quotes Doron Zeilberger on why these frequent errors are often benign.

Most mathematical papers are leaves in the web of knowledge, that no one reads, or will ever use to prove something else. The results that are used again and again are mostly lemmas, that while a priori non-trivial, once known, their proof is transparent. (Zeilberger’s Opinion 91)

Those papers that are “branches” rather than “leaves” receive more scrutiny and are more likely to be correct.

Zeilberger says lemmas get reused more than theorems. This dovetails with Mandelbrot’s observation mentioned a few weeks ago.

Many creative minds overrate their most baroque works, and underrate the simple ones. When history reverses such judgments, prolific writers come to be best remembered as authors of “lemmas,” of propositions they had felt “too simple” in themselves and had to be published solely as preludes to forgotten theorems.

There are obvious analogies to software.  Software that many people use has fewer bugs than software that few people use, just as theorems that people build on have fewer bugs than “leaves in the web of knowledge.” Useful subroutines and libraries are more likely to be reused than complete programs. And as Donald Knuth pointed out, re-editable code is better than black-box reusable code.

Everybody knows that software has bugs, but not everyone realizes how buggy theorems are. Bugs in software are more obvious because paper doesn’t abort. Proofs and programs are complementary forms of validation. Attempting to prove the correctness of an algorithm certainly reduces the chances of a bug, but proofs are fallible as well. Again quoting Knuth, he once said “Beware of bugs in the above code; I have only proved it correct, not tried it.” Not only can programs benefit from being more proof-like, proofs can benefit from being more program-like.

Why 90% solutions may beat 100% solutions

I’ve never written a line of Ruby, but I find Ruby on Rails fascinating. From all reports, the Rails framework lets you develop a website much faster than you could using other tools, provided you can live with its limitations. Rails emphasizes consistency and simplicity, deliberately leaving out support for some contingencies.

I listened to an interview last night with Ruby developer Glenn Vanderburg. Here’s an excerpt that I found insightful.

In the Java world, the APIs and libraries … tend to be extremely thorough in trying to solve the entire problem that they are addressing and [are] somewhat complicated and difficult to use. Rails, in particular, takes exactly the opposite philosophy … Rails tries to solve the 90% of the problem that everybody has and that can be solved with 10% of the code. And it punts on that last 10%. And I think that’s the right decision, because the most complicated, odd, corner cases of these problems tend to be the things that can be solved by the team in a specific and rather simple way for one application. But if you try to solve them in a completely general way that everybody can use, it leads to these really complicated APIs and complicated underpinnings as well.

The point is not to pick on Java. I believe similar remarks apply to Microsoft’s libraries, or the libraries of any organization under pressure to be all things to all people. The Ruby on Rails community is a small, voluntary association that can turn away people who don’t like their way of doing things.

At first it sounds unprofessional to develop a software library that does anything less than a thorough solution to the problem it addresses. And in some contexts that is true, though every library has to leave something out. But in other contexts, it makes sense to leave out the edge cases that users can easily handle in their own situation. What is an edge case to a library developer may be bread and butter to a particular set of users. (Of course the library provider should document explicitly just what part of the problem their code does and does not solve.)

Suppose that for some problem you really can write the code that is sufficient for 90% of the user base with 10% of the effort of solving the entire problem. That means a full solution is 10 times more expensive to build than a 90% solution.

Now think about quality. The full solution will have far more bugs. For starters, the extra code required for the full solution will have a higher density of bugs because it deals with trickier problems. Furthermore, it will have far fewer users per line of code — only 10% of the community cares about it in the first place, and of that 10%, they all care about different portions. With fewer users per line of code, this extra code will have more unreported bugs. And when users do report bugs in this code, the bugs will be a lower priority to fix because they impact fewer people.
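
To put made-up numbers on that: say the 90% solution is 10,000 lines with one lingering bug per thousand lines, about 10 bugs. The full solution adds roughly 90,000 more lines, and if that extra code ends up with ten times the lingering bug density (trickier problems, far fewer users to find and report the bugs), it contributes on the order of 900 bugs, about a hundred times as many as the core.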

So in this hypothetical example, the full solution costs an order of magnitude more to develop and has maybe two orders of magnitude more bugs.

Programmers aren’t reading programming books

In the interview with Charles Petzold I mentioned in my previous post, Petzold talks about the sharp decline in programming book sales. At one time, nearly every Windows programmer owned a copy of Petzold’s first book, especially in its earlier editions. But he said that now only 4,000 people have purchased his recent 3D programming book.

Programming book sales have plummeted, not because there is any less to learn, but because there is too much to learn. Developers don’t want to take the time to thoroughly learn any technology they suspect will become obsolete in a couple of years, especially if it’s only one of many technologies they have to use. So they plunge ahead using tools they have never systematically studied. And when they get stuck, they Google for help and hope someone else has blogged about their specific problem.

Companies have cut back on training at the same time that they’re expecting more from software. So programmers do the best they can. They jump in and write code without really understanding what they’re doing. They guess and see what works. And when things don’t work, they Google for help. It’s the most effective thing to do in the short term. In the longer term it piles up technical debt that leads to a quality disaster or a maintenance quagmire.

Writes large correct programs

I had a conversation yesterday with someone who said he needed to hire a computer scientist. I replied that actually he needed to hire someone who could program, and that not all computer scientists could program. He disagreed, but I stood by my statement. I’ve known too many people with computer science degrees, even advanced degrees, who were ineffective software developers. Of course I’ve also known people with computer science degrees, especially advanced degrees, who were terrific software developers. The most I’ll say is that programming ability is positively correlated with computer science achievement.

The conversation turned to what it means to say someone can program. My proposed definition was someone who could write large programs that have a high probability of being correct. Joel Spolsky wrote a good book last year called Smart and Gets Things Done about recruiting great programmers. I agree with looking for someone who is “smart and gets things done,” but “writes large correct programs” may be easier to explain. The two ideas overlap a great deal.

People who are not professional programmers often don’t realize how the difficulty of writing software increases with size. Many people who wrote 100-line programs in college imagine that they could write 1,000-line programs if they worked at it 10 times longer. Or even worse, they imagine they could write 10,000-line programs if they worked 100 times longer. It doesn’t work that way. Most people who can write a 100-line program could never finish a 10,000-line program no matter how long they worked on it. They would simply drown in complexity.  One of the marks of a professional programmer is knowing how to organize software so that the complexity remains manageable as the size increases.  Even among professionals there are large differences in ability. The programmers who can effectively manage 100,000-line projects are in a different league than those who can manage 10,000-line projects.

(When I talk about a program that is so many lines long, I mean a program that needs to be about that long. It’s no achievement to write 1,000 lines of code for a problem that would be reasonable to solve in 10.)

Writing large buggy programs is hard. To say a program is buggy is to imply that it is at least of sufficient quality to approximate what it’s supposed to do much of the time. For example, you wouldn’t say that Notepad is a buggy web browser. A program has got to display web pages at least occasionally to be called a buggy browser.

Writing large correct programs is much harder. It’s even impossible, depending on what you mean by “large” and “correct.” No large program is completely bug-free, but some large programs have a very small probability of failure. The best programmers can think of a dozen ways to solve any problem, and they choose the way they believe has the best chance of being implemented correctly. Or they choose the way that is most likely to make an error obvious if it does occur. They know that software needs to be tested and they design their software to make it easier to test.
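
Here’s a small, made-up Java illustration of making an error obvious: fail fast at the point where the bad value appears rather than letting it quietly corrupt state downstream.

    class Account {
        // Failing fast makes an error obvious at the point where the bad value
        // appears, instead of letting it silently corrupt state downstream.
        static double withdraw(double balance, double amount) {
            if (amount <= 0) {
                throw new IllegalArgumentException("amount must be positive: " + amount);
            }
            if (amount > balance) {
                throw new IllegalStateException("insufficient funds: " + amount + " > " + balance);
            }
            return balance - amount;
        }
    }

Because the method is a pure function of its inputs, it is also trivial to unit test: feed in values and check the result, or check that the right exception is thrown.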

If you ask an amateur whether their program is correct, they are likely to be offended. They’ll tell you that of course it’s correct because they were careful when they wrote it. If you ask a professional the same question, they may tell you that their program probably has bugs, but then go on to tell you how they’ve tested it and what logging facilities are in place to help debug errors when they show up later.

You do pay for what you don’t use

Modern operating systems are huge, and their size comes at a cost. When I worry out loud about the size of operating systems (or applications, or programming languages) I often get the response “What do you care? If you don’t like the new features, just don’t use them.” The objection seems to be that you don’t pay for what you don’t use. But you do. Every feature comes at some cost. Every feature is a potential source of instability. Every feature takes up developer resources and computer resources. Often the extra cost is worth it for the extra benefit, but not always. And costs can be more subtle than benefits.

Suppose a developer has a great idea for a new feature. He’s so excited that he puts in voluntary overtime to develop his feature, so the cost of his extra contribution is zero. Or is it? Not unless his enthusiasm spills over to everyone else involved so that they volunteer overtime as well. The testers, tech writers, and others who now have more work to do because of this feature are unlikely to be as excited as the developer.  What was a labor of love for the developer is just plain labor for everyone else. So the new feature now takes a little time away from everything else that needs to be documented, tested, and otherwise managed, diluting overall quality.

This post was prompted by a discussion with Codewiz in the comments to his post about his woes recovering from operating system problems. Along the way he mentioned a remarkably stable FreeBSD server he had and attributed its stability to the fact that he never installed any GUI on the box. Lest anyone think that only the Unix world would create a minimalist operating system, take a look at Windows Server Core. Microsoft also realizes that the features that aren’t there can’t cause problems.

Unit test boundaries

Phil Haack has a great article on unit test boundaries. A unit test must not touch the file system, interact with a database, or communicate across a network. Tests that break these rules are necessary, but they’re not unit tests. With some hard thought, the code with external interactions can be isolated and reduced. This applies to both production and test code.
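
Here’s a sketch of what that isolation might look like in Java (the names are made up). The logic under test depends on a small interface; the production implementation touches the file system, while the unit test substitutes an in-memory fake and never crosses the boundary.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // The logic under test depends on a small abstraction, not on the file system.
    interface SettingsSource {
        String read(String key);
    }

    // Production implementation: touches the file system, so it sits outside
    // the unit test boundary and gets covered by integration tests instead.
    class FileSettingsSource implements SettingsSource {
        private final Path dir;

        FileSettingsSource(Path dir) {
            this.dir = dir;
        }

        public String read(String key) {
            try {
                return Files.readString(dir.resolve(key + ".txt")).trim();
            } catch (IOException e) {
                throw new RuntimeException("cannot read setting " + key, e);
            }
        }
    }

    class Greeter {
        private final SettingsSource settings;

        Greeter(SettingsSource settings) {
            this.settings = settings;
        }

        String greeting() {
            return "Hello, " + settings.read("user") + "!";
        }
    }

    // The unit test never leaves memory: no files, no database, no network.
    class GreeterTest {
        public static void main(String[] args) {
            Greeter greeter = new Greeter(key -> "world");   // in-memory fake
            System.out.println(greeter.greeting().equals("Hello, world!") ? "pass" : "fail");
        }
    }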

As with most practices related to test-driven development, the primary benefit of unit test boundaries is the improvement in the design of the code being tested. If your unit test boundaries are hard to enforce, your production code may have architectural boundary problems. Refactoring the production code to make it easier to test will make the code better.