Posts tagged as:

Quality

Maybe NASA could use some buggy software

by John on October 8, 2009

In Coders at Work, Peter Norvig quotes NASA administrator Dan Goldin as saying

We’ve got to do the better, faster, cheaper. These space missions cost too much. It’d be better to run more missions and some of them would fail but overall we’d still get more done for the same amount of money.

NASA has extremely rigorous processes for writing software. They supposedly develop bug-free code; I doubt that’s true, though I’m sure they do have exceptionally low bug rates. But this quality comes at a high price. Rumor has it that space shuttle software costs $1,500 per line to develop. When asked about the price tag, Norvig said “I don’t know if it’s optimal. I think they might be better off with buggy software.” At some point it’s certainly not optimal. If it doubles the price of a project to increase your probability of a successful mission from 98% to 99%, it’s not worth it; you’re better off running two missions with a 98% chance of success each.
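The arithmetic behind that last claim can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and it assumes the two cheaper missions fit in the same budget and succeed or fail independently of each other.

```python
def expected_successes(missions: int, p_success: float) -> float:
    """Expected number of successful missions when each succeeds with probability p_success."""
    return missions * p_success


def prob_at_least_one(missions: int, p_success: float) -> float:
    """Probability that at least one of several independent missions succeeds."""
    return 1 - (1 - p_success) ** missions


# Same hypothetical budget: one mission at 99%, or two missions at 98% each.
one_careful = expected_successes(1, 0.99)
two_cheaper = expected_successes(2, 0.98)

print(f"one mission at 99%:  {one_careful:.2f} expected successes")
print(f"two missions at 98%: {two_cheaper:.2f} expected successes")
print(f"chance at least one of the two succeeds: {prob_at_least_one(2, 0.98):.4f}")
```

Under these assumptions the pair of cheaper missions roughly doubles the expected number of successes, and the chance that both fail is smaller than the chance that the single expensive mission fails.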

Few people understand that software quality is all about probabilities of errors. Most people think the question is whether you’d rather have bug-free software or buggy software. I’d rather have bug-free software, thank you. But bug-free is not an option. Nothing humans do is perfect. All we can do is lower the probabilities of bugs. But as the probability of bugs goes to zero, the development costs go to infinity. (Actually it’s not all about probabilities of errors. It’s also about the consequences of errors. Sending back a photo with a few incorrect pixels is not the same as crashing a probe.)

Norvig’s comment makes sense regarding unmanned missions. But what about manned missions? Since one of the possible consequences of error is loss of life, the stakes are obviously higher. But does demanding flawless software increase the probability of a safe mission? One of the consequences of demanding extremely high quality software is that some tasks are too expensive to automate and so humans have to be trained to do those tasks. But astronauts make mistakes just as programmers do. If software has a smaller probability of error than an astronaut would have for a given task, it would be safer to rely on the software.

Related post:

Software in space

{ 12 comments }

Every time your software displays an error message, you risk losing credibility with your users. If the message is grammatically incorrect, your credibility definitely goes down a notch. And if the message is unhelpful, your credibility goes down at least one more notch. The same can be said for any message, but error messages are particularly important for three reasons.

  1. Users are in a bad mood when they see error messages; this is not the time to make things worse.
  2. Programmers are sloppy with error messages because they’re almost certain the messages will never be displayed.
  3. Error conditions are unusual by their very nature, and so it’s practically impossible to discover them all by black-box testing.

The best way to find error messages is to search the source code for text potentially displayed to users. I’ve advocated this practice for years and usually I encounter indifference or resistance. And yet nearly every time I extract the user-visible text from a software project I find dozens of spelling errors, grammar errors, and incomprehensible messages.

Last year I wrote an article for CodeProject on this topic and provided a script to strip text strings from source code. See PowerShell Script for Reviewing Text Shown to Users. The script looks for all quoted text and text inside XML entities. Then it tries to filter out text strings that are not displayed to users. For example, a string with no spaces is more likely to be a file name or some other code fragment than a user message. The script produces a report that a copy editor can then review. In addition to checking spelling and grammar, an editor can judge whether a message would be comprehensible and useful.

I admit that the parsing in the script is crude. It could miss some strings, and it could filter out some strings that it should keep. But the script is very simple, less than 100 lines. And it works on multiple source code types: C++, C#, XML, VB, etc. Writing a sophisticated parser for each of those languages would be a tremendous amount of work, but a quick-and-dirty script may be 98% as effective. Since most projects review 0% of their source code text, reviewing 98% is an improvement.
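To make the idea concrete, here is a minimal Python sketch of the same approach. It is not the PowerShell script from the CodeProject article; the regular expression, the function name, and the sample source are all assumptions made for illustration, and it only handles simple double-quoted strings.

```python
import re

# Matches simple double-quoted string literals, allowing backslash escapes.
# Deliberately crude: it ignores comments, verbatim strings, and multi-line literals.
QUOTED = re.compile(r'"((?:[^"\\\n]|\\.)*)"')


def user_visible_candidates(source: str) -> list[str]:
    """Return quoted strings that look like messages rather than code fragments."""
    candidates = []
    for match in QUOTED.finditer(source):
        text = match.group(1).strip()
        # Heuristic from the post: a string with no spaces is more likely
        # a file name or other code fragment than a message shown to a user.
        if " " in text:
            candidates.append(text)
    return candidates


# Hypothetical source fragment; the misspelling "opend" is deliberate,
# the kind of thing a copy editor would catch in the report.
sample = '''
string path = "config.xml";
MessageBox.Show("The file could not be opend.");
log.Write("retry");
throw new Exception("Invalid input, please try again.");
'''

for line in user_visible_candidates(sample):
    print(line)
```

The report keeps the two sentences and drops "config.xml" and "retry". A filter this simple will sometimes be wrong in both directions, which is the trade-off described above.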

In addition to improving the text that a user sees, a text review gives some insight into a program’s structure. For example, if messages are repeated in multiple files, most likely the code has a lot of “clipboard inheritance,” code copied and pasted rather than isolated into reusable functions. A text review could also determine whether a program is concatenating strings to build SQL statements rather than calling stored procedures, possibly indicating a security vulnerability.
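As a rough sketch of the first check, assuming the user-visible strings have already been extracted per file, one could list the messages that occur in more than one file. The function name and the file names below are hypothetical.

```python
from collections import defaultdict


def repeated_messages(strings_by_file: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each message found in more than one file to the sorted list of those files."""
    files_for = defaultdict(set)
    for filename, strings in strings_by_file.items():
        for text in strings:
            files_for[text].add(filename)
    return {text: sorted(files) for text, files in files_for.items() if len(files) > 1}


# Hypothetical extraction results for three source files.
extracted = {
    "orders.cs": ["Could not connect to the server.", "Order saved."],
    "invoices.cs": ["Could not connect to the server.", "Invoice saved."],
    "reports.cs": ["Could not connect to the server."],
}

for message, files in repeated_messages(extracted).items():
    print(f"{message!r} appears in {', '.join(files)}")
```

A message that shows up in several files is only a hint of copied code, not proof, but it tells a reviewer where to look first.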

{ 6 comments }

Beautiful Testing

by John on June 2, 2009

Beautiful Testing is available for pre-order at Amazon. Proceeds from the book will go to Nothing But Nets, a project to distribute anti-malaria bed nets. I contributed a chapter on how to test random number generators.

Beautiful Testing: Leading Programmers Reveal How They Test

{ 1 comment }

Broken windows theory and programming

by John on December 31, 2008

The broken windows theory says that cracking down on petty crime reduces more serious crime. The name comes from the explanation that if a building has a few broken windows, it invites vandals to break more windows and eventually burn down the building. Turned around, this suggests that punishing vandalism could lead to a reduction in violent crime. Rudy Giuliani is perhaps the most visible proponent of the theory. His first initiative as mayor of New York was to go after turnstile jumpers and squeegeemen as a way of reducing crime in the city. Crime rates dropped dramatically during his tenure.

In the book Pragmatic Thinking and Learning, Andy Hunt applies the broken windows theory to software development.

Known problems (such as bugs in code, bad process in an organization, poor interfaces, or lame management) that are uncorrected have a debilitating, viral effect that ends up causing even more damage.

I’ll add a couple of my pet peeves to Andy Hunt’s list.

The first is compiler warnings. I can’t understand why some programmers are totally comfortable with their code having dozens of compiler warnings. They’ll say “Oh yeah, I know about that. It’s not a problem.” But then when a warning shows up that is trying to tell them something important, the message gets lost in the noise. My advice: Just fix the code. In very exceptional situations, explicitly turn off the warning.
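The post is about compiler warnings, but the same policy can be sketched with Python's run-time `warnings` module as a rough analogue: treat every warning as an error, and silence a specific, understood warning only around the code that triggers it. The function `legacy_call` is hypothetical.

```python
import warnings


def legacy_call() -> int:
    """Hypothetical function that emits a warning we have reviewed and accepted."""
    warnings.warn("legacy_call is deprecated", DeprecationWarning)
    return 42


# Treat every warning as an error so that new ones cannot hide in the noise.
warnings.simplefilter("error")

# The exceptional case: silence one specific, understood category of warning,
# and only around the call that triggers it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    value = legacy_call()

# Outside that block the strict policy is back in force.
try:
    legacy_call()
    escalated = False
except DeprecationWarning:
    escalated = True

print(value, escalated)
```

The suppression is explicit and local, so anyone reading the code can see which warning was judged harmless and where.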

The second is similar. Many programmers blithely ignore run-time exceptions that are written to an event log. As with compiler warnings, they rationalize that these exceptions are not really a problem. My advice: If it’s not really a problem, then don’t log it. Otherwise, fix it.

{ 0 comments }

Michael Feathers on refactoring

by John on December 1, 2008

Michael Feathers wrote one of my favorite books on unit testing: Working Effectively with Legacy Code. Some books on unit testing just give abstract platitudes. Feathers’s book wrestles with the hard, messy problem of retrofitting unit tests to existing code.

The .NET Rocks podcast had an interview with Michael Feathers recently. The whole interview is worth listening to, but here I’ll just recap a couple of things he said about refactoring that I thought were insightful. First, most people agree that you need to have unit tests in place before you can do much refactoring. The unit tests give you the confidence to refactor without worrying that you’ll break something in the process and not know that you broke it. But Feathers adds that you might have to do some light refactoring before you can put the unit tests in place to allow more aggressive refactoring.

The second thing he mentioned about refactoring was a technique called “scratch refactoring.” With this approach, you first refactor quickly to see where you want to go, without worrying about whether you are introducing bugs. Then you throw those changes away completely and refactor carefully. Sometimes you need a dry run to see what patterns emerge before you commit to a direction.

Both of these observations are ways to break out of a chicken-and-egg cycle, needing to refactor before you can refactor.

{ 1 comment }

Errors in math papers not a big deal?

by John on November 11, 2008

Daniel Lemire wrote a blog post this morning that ties together a couple themes previously discussed here.

Most published math papers contain errors, and yet there have been surprisingly few “major screw-ups” as defined by Mark Dominus. Daniel Lemire’s post quotes Doron Zeilberger on why these frequent errors are often benign.

Most mathematical papers are leaves in the web of knowledge, that no one reads, or will ever use to prove something else. The results that are used again and again are mostly lemmas, that while a priori non-trivial, once known, their proof is transparent. (Zeilberger’s Opinion 91)

Those papers that are “branches” rather than “leaves” receive more scrutiny and are more likely to be correct.

Zeilberger says lemmas get reused more than theorems. This dovetails with Mandelbrot’s observation mentioned a few weeks ago.

Many creative minds overrate their most baroque works, and underrate the simple ones. When history reverses such judgments, prolific writers come to be best remembered as authors of “lemmas,” of propositions they had felt “too simple” in themselves and had to be published solely as preludes to forgotten theorems.

There are obvious analogies to software.  Software that many people use has fewer bugs than software that few people use, just as theorems that people build on have fewer bugs than “leaves in the web of knowledge.” Useful subroutines and libraries are more likely to be reused than complete programs. And as Donald Knuth pointed out, re-editable code is better than black-box reusable code.

Everybody knows that software has bugs, but not everyone realizes how buggy theorems are. Bugs in software are more obvious because paper doesn’t abort. Proofs and programs are complementary forms of validation. Attempting to prove the correctness of an algorithm certainly reduces the chances of a bug, but proofs are fallible as well. As Knuth once said, “Beware of bugs in the above code; I have only proved it correct, not tried it.” Not only can programs benefit from being more proof-like, proofs can benefit from being more program-like.

{ 2 comments }

Why 90% solutions may beat 100% solutions

by John on November 7, 2008

I’ve never written a line of Ruby, but I find Ruby on Rails fascinating. From all reports, the Rails framework lets you develop a web site much faster than you could using other tools, provided you can live with its limitations. Rails emphasizes consistency and simplicity, deliberately leaving out support for some contingencies.

I listened to an interview last night with Ruby developer Glenn Vanderburg. Here’s an excerpt that I found insightful.

In the Java world, the APIs and libraries … tend to be extremely thorough in trying to solve the entire problem that they are addressing and [are] somewhat complicated and difficult to use. Rails, in particular, takes exactly the opposite philosophy … Rails tries to solve the 90% of the problem that everybody has and that can be solved with 10% of the code. And it punts on that last 10%. And I think that’s the right decision, because the most complicated, odd, corner cases of these problems tend to be the things that can be solved by the team in a specific and rather simple way for one application. But if you try to solve them in a completely general way that everybody can use, it leads to these really complicated APIs and complicated underpinnings as well.

The point is not to pick on Java. I believe similar remarks apply to Microsoft’s libraries, or the libraries of any organization under pressure to be all things to all people. The Ruby on Rails community is a small, voluntary association that can turn away people who don’t like their way of doing things.

At first it sounds unprofessional to develop a software library that offers anything less than a thorough solution to the problem it addresses. And in some contexts that is true, though every library has to leave something out. But in other contexts, it makes sense to leave out the edge cases that users can easily handle in their particular context. What is an edge case to a library developer may be bread and butter to a particular set of users. (Of course the library provider should document explicitly just what part of the problem their code does and does not solve.)

Suppose that for some problem you really can write the code that is sufficient for 90% of the user base with 10% of the effort of solving the entire problem. That means a full solution is 10 times more expensive to build than a 90% solution.

Now think about quality. The full solution will have far more bugs. For starters, the extra code required for the full solution will have a higher density of bugs because it deals with trickier problems. Furthermore, it will have far fewer users per line of code — only 10% of the community cares about it in the first place, and of that 10%, they all care about different portions. With fewer users per line of code, this extra code will have more unreported bugs. And when users do report bugs in this code, the bugs will be a lower priority to fix because they impact fewer people.

So in this hypothetical example, the full solution costs an order of magnitude more to develop and has maybe two orders of magnitude more bugs.

{ 8 comments }

Programmers aren’t reading programming books

by John on September 23, 2008

In the interview with Charles Petzold I mentioned in my previous post, Petzold talks about the sharp decline in programming book sales. At one time, nearly every Windows programmer owned a copy of Petzold’s first book, especially in its earlier editions. But he said that now only 4,000 people have purchased his recent 3D programming book.

Programming book sales have plummeted, not because there is any less to learn, but because there is too much to learn. Developers don’t want to take the time to thoroughly learn any technology they suspect will become obsolete in a couple of years, especially if it’s only one of many technologies they have to use. So they plunge ahead using tools they have never systematically studied. And when they get stuck, they Google for help and hope someone else has blogged about their specific problem.

Companies have cut back on training at the same time that they’re expecting more from software. So programmers do the best they can. They jump in and write code without really understanding what they’re doing. They guess and see what works. And when things don’t work, they Google for help. It’s the most effective thing to do in the short term. In the longer term it piles up technical debt that leads to a quality disaster or a maintenance quagmire.

{ 3 comments }

Writes large correct programs

by John on September 19, 2008

I had a conversation yesterday with someone who said he needed to hire a computer scientist. I replied that actually he needed to hire someone who could program, and that not all computer scientists could program. He disagreed, but I stood by my statement. I’ve known too many people with computer science degrees, even advanced degrees, who were ineffective software developers. Of course I’ve also known people with computer science degrees, especially advanced degrees, who were terrific software developers. The most I’ll say is that programming ability is positively correlated with computer science achievement.

The conversation turned to what it means to say someone can program.  My proposed definition was someone who could write large programs that have a high probability of being correct.  Joel Spolsky wrote a good book last year called Smart and Gets Things Done about recruiting great programmers.  I agree with looking for someone who is “smart and gets things done,” but “writes large correct programs” may be easier to explain. The two ideas overlap a great deal.

People who are not professional programmers often don’t realize how the difficulty of writing software increases with size.  Many people who wrote 100-line programs in college imagine that they could write 1,000-line programs if they worked at it 10 times longer.  Or even worse, they imagine they could write 10,000-line programs if they worked 100 times longer. It doesn’t work that way.  Most people who can write a 100-line program could never finish a 10,000-line program no matter how long they worked on it.  They would simply drown in complexity.  One of the marks of a professional programmer is knowing how to organize software so that the complexity remains manageable as the size increases.  Even among professionals there are large differences in ability.  The programmers who can effectively manage 100,000-line projects are in a different league than those who can manage 10,000-line projects.

(When I talk about a program that is so many lines long, I mean a program that needs to be about that long. It’s no achievement to write 1,000 lines of code for a problem that would be reasonable to solve in 10.)

Writing large buggy programs is hard. To say a program is buggy is to imply that it is at least of sufficient quality to approximate what it’s supposed to do much of the time. For example, you wouldn’t say that Notepad is a buggy web browser. A program has got to display web pages at least occasionally to be called a buggy browser.

Writing large correct programs is much harder. It’s even impossible, depending on what you mean by “large” and “correct.” No large program is completely bug-free, but some large programs have a very small probability of failure. The best programmers can think of a dozen ways to solve any problem, and they choose the way they believe has the best chance of being implemented correctly. Or they choose the way that is most likely to make an error obvious if it does occur. They know that software needs to be tested and they design their software to make it easier to test.

If you ask an amateur whether their program is correct, they are likely to be offended.  They’ll tell you that of course it’s correct because they were careful when they wrote it.  If you ask a professional the same question, they may tell you that their program probably has bugs, but then go on to tell you how they’ve tested it and what logging facilities are in place to help debug errors when they show up later.

{ 17 comments }

You do pay for what you don’t use

by John on September 1, 2008

Modern operating systems are huge, and their size comes at a cost. When I worry out loud about the size of operating systems (or applications, or programming languages) I often get the response “What do you care? If you don’t like the new features, just don’t use them.” The objection seems to be that you don’t pay for what you don’t use. But you do. Every feature comes at some cost. Every feature is a potential source of instability. Every feature takes up developer resources and computer resources. Often the extra cost is worth it for the extra benefit, but not always. And costs can be more subtle than benefits.

Suppose a developer has a great idea for a new feature. He’s so excited that he puts in voluntary overtime to develop his feature, so the cost of his extra contribution is zero. Or is it? Not unless his enthusiasm spills over to everyone else involved so that they volunteer overtime as well. The testers, tech writers, and others who now have more work to do because of this feature are unlikely to be as excited as the developer.  What was a labor of love for the developer is just plain labor for everyone else. So the new feature now takes a little time away from everything else that needs to be documented, tested, and otherwise managed, diluting overall quality.

This post was prompted by a discussion with Codewiz in the comments to his post about his woes recovering from operating system problems. Along the way he mentioned a remarkably stable FreeBSD server he had and attributed its stability to the fact that he never installed any GUI on the box. Lest anyone think that only the Unix world would create a minimalist operating system, take a look at Windows Server Core. Microsoft also realizes that the features that aren’t there can’t cause problems.

{ 3 comments }

New blog on reproducible research

by John on July 24, 2008

Yesterday I added a blog to the ReproducibleResearch.org web site. You can visit the site here or subscribe via RSS.

I’d like a couple people to join me in writing this blog, and I would greatly appreciate suggestions, guest posts, etc. If you’re interested, please send a note to contribute at the domain name.

{ 0 comments }

Unit test boundaries

by John on July 23, 2008

Phil Haack has a great article on unit test boundaries. A unit test must not touch the file system, interact with a database, or communicate across a network. Tests that break these rules are necessary, but they’re not unit tests. With some hard thought, the code with external interactions can be isolated and reduced. This applies to both production and test code.

As with most practices related to test-driven development, the primary benefit of unit test boundaries is the improvement in the design of the code being tested. If your unit test boundaries are hard to enforce, your production code may have architectural boundary problems. Refactoring the production code to make it easier to test will make the code better.
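Here is a small Python sketch of what that isolation might look like. The example is hypothetical and not taken from Phil Haack’s article: the logic is written against any iterable of lines, and a thin wrapper is the only code that touches the file system.

```python
import io
from typing import Iterable


def count_error_lines(lines: Iterable[str]) -> int:
    """Pure logic: count lines that start with 'ERROR'. No file system access."""
    return sum(1 for line in lines if line.startswith("ERROR"))


def count_errors_in_file(path: str) -> int:
    """Thin boundary layer: the only function here that touches the file system."""
    with open(path, encoding="utf-8") as handle:
        return count_error_lines(handle)


def test_count_error_lines() -> None:
    # An in-memory stream stands in for a file, so the test stays inside the boundary.
    fake_log = io.StringIO("ERROR disk full\nINFO ok\nERROR timeout\n")
    assert count_error_lines(fake_log) == 2


test_count_error_lines()
```

The wrapper `count_errors_in_file` still needs a test that reads a real file, but that test is an integration test, and there is very little left in the wrapper to get wrong.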

{ 0 comments }

Scaling the number of projects

by John on July 16, 2008

Software engineers typically use the term “horizontal scalability” to mean throwing servers at a problem. A web site scales horizontally if you can handle increasing traffic simply by adding more servers to a server farm. I think of horizontal scalability differently: scaling as the number of projects increases, rather than as the performance demands on a single project increase. My biggest challenges have come from managing lots of small projects, more projects than developers.

I’ve seen countless books and articles about how to scale a single project, but I don’t remember ever seeing anything written about scaling the number of projects. It sounds easy to manage independent projects: if the projects are for different clients and they have different developers, just let each one go their own way. But there are two problems. One is a single developer maintaining an accumulation of his or her own projects, and the other is the ability (or more important, the inability) of peers to maintain each other’s projects. Projects that were independent during development become dependent in maintenance because they are maintained at the same time by the same people. Consistency across projects didn’t seem necessary during development, but then in maintenance you look back and wish there had been more consistency.

Maintenance becomes a tractor pull. Robert Martin describes a software tractor pull in his essay The Tortoise and the Hare:

Have you ever been to a tractor pull? Imagine a huge arena filled with mud and churned up soil. Huge tractors tie themselves up to devices of torture and try to pull them across the arena. The devices get harder to pull the farther they go. They are inclined planes with wheels on the rear and a wide shoe at the front that sits squarely on the ground. There is a huge weight at the rear that is attached to a mechanism that drags the weight up the inclined plane and over the shoe as the wheels turn. This steadily increases the weight over the shoe until the friction overcomes the ability of the tractor.

Writing software is like a tractor pull. You start out fast without a lot of friction. Productivity is high, and you get a lot done. But the more you write the harder it gets to write more. The weight is being dragged up over the shoe. The more you write the more the mess builds. Productivity slows. Overtime increases. Teams grow larger. More and more code is piled up over the shoe, and the development team grinds to a halt unable to pull the huge mass of code any farther through the mud.

Robert Martin had in mind a single project slowing down over time, but I believe his analogy applies even better to maintenance of multiple projects.

To scale your number of projects you’ve got to enforce consistency before there’s an immediate need for it. But there you face several dangers. Enforcing apparently unnecessary consistency could make you appear arbitrary and damage morale. And you’ll make some wrong decisions. You’ve got to have a lot of experience to predict what sort of policies you’ll wish in the future that you had enforced. These issues are challenging when scaling a single project, but they are more challenging when scaling across smaller projects because you don’t get feedback as quickly. On a single large project, you may feel the pain of a bad decision quickly, but with multiple small projects you may not feel the pain until much later.

Quality is critical when scaling the number of projects. Each project needs to be better than seems necessary. When you look at a single project in isolation, maybe it’s acceptable to have one bug report a month. But then when you have an accumulation of such projects, you’ll get bug reports every day. And the cost per bug fix goes up over time because developers can most easily fix bugs in the code freshest in their minds. Fixing a bug in an old project that no one wants to think about anymore will be unpleasant and expensive.
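The accumulation is simple arithmetic, sketched here in Python. The portfolio size of 30 projects and the 30-day month are assumptions chosen for illustration.

```python
def days_between_reports(projects: int,
                         reports_per_project_per_month: float,
                         days_per_month: float = 30.0) -> float:
    """Average number of days between bug reports across a whole portfolio."""
    total_per_month = projects * reports_per_project_per_month
    return days_per_month / total_per_month


# One project at one report a month, versus a hypothetical portfolio of 30.
print(days_between_reports(1, 1.0))
print(days_between_reports(30, 1.0))
```

A rate that means one interruption a month for a single project means roughly one a day once 30 such projects are in maintenance.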

Scaling your number of projects requires more discipline than scaling a single project because feedback takes longer. Although scaling single projects gets far more attention, I suspect a lot of people are struggling with scaling their number of projects.

{ 2 comments }

Quantity and quality

by John on July 3, 2008

Here’s a quote from a recent blog post from Tom Peters:

You will be remembered in the long haul for the quality of your work, not the quantity of your work—the quantity part is just your defective ego talking—no one evaluates Picasso based on the number of paintings he churned out.

{ 2 comments }

Wine, Beer, and Statistics

by John on June 27, 2008

William Gosset discovered the t-distribution while working for the Guinness brewing company. Because his employer prevented employees from publishing papers, Gosset published his research under the pseudonym Student. That’s why his distribution is often called Student’s t-distribution.

This story is fairly well known. It often appears in the footnotes of statistics textbooks. However, I don’t think many people realize why it’s not surprising that fundamental statistical research should come from a brewery, and why we don’t hear of statistical research coming out of wineries.

Beer makers pride themselves on consistency while wine makers pride themselves on variety. That’s why you’ll never hear beer fans talk about a “good year” the way wine connoisseurs do. Because they value consistency, beer makers invest more in extensive statistical quality control than wine makers do.

{ 9 comments }