Dogfooding

Dogfooding refers companies using their own software. According to Wikipedia,

In 1988, Microsoft manager Paul Maritz sent Brian Valentine, test manager for Microsoft LAN Manager, an email titled “Eating our own Dogfood”, challenging him to increase internal usage of the company’s product. From there, the usage of the term spread through the company.

Dogfooding is a great idea, but it’s no substitute for usability testing. I get the impression that some products, if they’re tested at all, are tested by developers intimately familiar with how they’re intended to be used.

If your company is developing consumer software, it’s not dogfooding if just the developers use it. It’s dogfooding when people in sales and accounting use it. But that’s still no substitute for getting people outside the company to use it.

Dogfooding doesn’t just apply to software development. Whenever I buy something with inscrutable assembly instructions, I wonder why the manufacturer didn’t pay a couple people off the street to put the thing together on camera.

Defensible software

It’s not enough for software to be correct. It has to be defensible.

I’m not thinking of defending against malicious hackers. I’m thinking about defending against sincere critics. I can’t count how many times someone was absolutely convinced that software I had a hand in was wrong when it in fact it was performing as designed.

In order to defend software, you have to understand what it does. Not just one little piece of it, but the whole system. You need to understand it better than the people who commissioned it: the presumed errors may stem from unforeseen consequences of the specification.

Related post: The buck stops with the programmer

Hunt down bad error messages

My printer is unable to clean 51. It won’t work, and all it says is “Unable to Clean 51.”

Here’s my suggestion for finding such useless error messages in a code review: Write a script to extract all string literals from your source code, then read over the output. The beauty of this approach is that the reviewer sees the text without the context of the surrounding code, just as the user does. Ideally someone who is not a programmer would review the output strings. I would hope someone who ran across “Unable to Clean 51” would flag that as something that doesn’t make sense.

It’s not hard to write a script to pull out string literals. If you’d like, you could use the script that accompanies my Code Project article PowerShell Script for Reviewing Text Shown to Users. That script tries to filter out strings that are not user output, such as file paths. It’s a pretty crude script, attempting to handle source code written in several languages. It will have a few false positives, and a few false negatives, but it works quite well for a short script. (Code Project’s “browse source” tab doesn’t work for PowerShell source. You’ll have to download the code to read it.)

Whenever I recommend this script, I run into a few objections that I’ll address below.

Q: Wouldn’t it be better if programmers put all user output text in string tables rather than putting user output text in the main body of their code?

A: Yes, but that takes more effort, and it’s not how most programmers work. And even if you have a policy to put all output strings in a resource table, you’d need something like this script to enforce the policy.

Q: Wouldn’t it be better to have a real parser extract the strings rather than doing regular expression guesswork?

A: Sure.

Q: Wouldn’t it be better to use a spell checker that’s built into your IDE?

A: No. The purpose of the review is not just to check spelling. You also want to catch grammar errors and unhelpful messages.

Q: Wouldn’t it be better to do a complete code review?

A: Yes and no. Complete code reviews are great for overall code quality. But they take a lot of effort and don’t happen often. A review of just the extracted strings takes far, far less time. Also, viewing the output strings out of context is better for catching unhelpful or ungrammatical messages.

Q: Isn’t this simplistic? Doesn’t every company do something like this?

A: If they do, why do I continually find spelling errors, grammatical errors, and unhelpful messages in the software I use?

How things break

Venkatesh Rao wrote a blog post today Stress Failures versus Decay Failures. It reminded me of three other resources I recommend on how things break. The first is about how things literally break. For example, why the steel in the Titanic was brittle.

The other two are about how complex systems break.

Bugs, features, and risk

All software has bugs. Someone has estimated that production code has about one bug per 100 lines. Of course there’s some variation in this number. Some software is a lot worse, and some is a little better.

But bugs-per-line-of-code is not very useful for assessing risk. The risk of a bug is the probability of running into it multiplied by its impact. Some lines of code are far more likely to execute than others, and some bugs are far more consequential than others.

Devoting equal effort to testing all lines of code would be wasteful. You’re not going to find all the bugs anyway, so you should concentrate on the parts of the code that are most likely to run and that would produce the greatest harm if they were wrong.

However, here’s a complication. The probability of running into a bug can change over time as people use the software in new ways. For whatever reason people to want to use features that had not been exercised before. When they do so, they’re likely to uncover new bugs.

(This helps explain why everyone thinks his preferred software is more reliable than others. When you’re a typical user, you tread the well-tested paths. You also learn, often subconsciously, to avoid buggy paths. When you bring your expectations from an old piece of software to a new one, you’re more likely to uncover bugs.)

Even though usage patterns change, they don’t change arbitrarily. It’s still the case that some code is far more likely than other code to execute.

Good software developers think ahead. They solve more than they’re asked to solve. They think “I’m going to go ahead and include this other case while I’m at it in case they need it later.” They’re heroes when it turns out their guesses about future needs were correct.

But there’s a downside to this initiative. You pay for what you don’t use. Every speculative feature either has to be tested, incurring more expense up front, or delivered untested, incurring more risk. This suggests its better to disable unused features.

You cannot avoid speculation entirely. Writing maintainable software requires speculating well, anticipating and preparing for change. Good software developers place good bets, and these tend to be small bets, going to a little extra effort to make software much more flexible. As with bugs, you have to consider probabilities and consequences: how likely is this part of the software to change, and how much effort will it take to prepare for that change?

Developers learn from experience what aspects of software are likely to change and they prepare for that change. But then they get angry at a rookie who wastes a lot of time developing some unnecessary feature. They may not realize that the rookie is doing the same thing they are, but with a less informed idea of what’s likely to be needed in the future.

Disputes between developers often involve hidden assumptions about probabilities. Whether some aspect of the software is responsible preparation for maintenance or wasteful gold plating depends on your idea of what’s likely to happen in the future.

Related: Why programmers write unneeded code

Software exoskeletons

There’s a major divide between the way scientists and programmers view the software they write.

Scientists see their software as a kind of exoskeleton, an extension of themselves. Think Dr. Octopus. The software may do heavy lifting, but the scientists remain actively involved in its use. The software is a tool, not a self-contained product.

Spiderman versus Dr. Ock

Programmers see their software as something they will hand over to someone else, more like building a robot than an exoskeleton. Programmers believe it’s their job to encapsulate intelligence in software. If users have to depend on programmers after the software is written, the programmers didn’t finish their job.

I work with scientists and programmers, often bridging the gaps between the two cultures. One point of tension is defining when a project is done. To a scientist, the software is done when they get what they want out of it, such as a table of numbers for a paper. Professional programmers give more thought to reproducibility, maintainability, and correctness. Scientists think programmers are anal retentive. Programmers think scientists are cowboys.

Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision. Scientists need to understand that prototype code may need a complete rewrite before it can be used in production.

The real tension comes when a piece of research software is suddenly expected to be ready for production. The scientist will say “the code has already been written” and can’t imagine it would take much work, if any, to prepare the software for its new responsibilities. They don’t understand how hard it is for an engineer to turn an exoskeleton into a self-sufficient robot.

More software development posts

Pilots and pair programming

From Outliers by Malcolm Gladwell:

In commercial airlines, captains and first officers split the flying duties equally. But historically, crashes have been far more likely to happen when the captain is in the “flying seat.” At first this seems to make no sense, since the captain is almost always the pilot with the most experience. … Planes are safer when the least experienced pilot is flying, because it means the second pilot isn’t going to be afraid to speak up.

The context of this excerpt is an examination of airplane crashes in which the copilot was aware of the pilot’s errors but did not speak up assertively.

I wonder whether an analogous result holds for pair programming. Do more bugs slip into the code when the more experienced programmer has the keyboard? The German aerospace company DLR thinks so. The company pairs junior and senior programmers. The junior programmer writes all the code while the senior programmer watches.

Related posts

How to test a random number generator

Last year I wrote a chapter for O’Reilly’s book Beautiful Testing (ISBN 0596159811). The publisher gave each of us permission to post our chapters online, and so here is Chapter 10: How to test a random number generator.

Update: The chapter linked to above describes how to test transformations of a trusted uniform random number generator. For example, maybe your programming language provides a way to generate numbers between 0 and 1 uniformly, and you have written code to transform that into a normal distribution. That’s the most common case. Few people write their own core RNG; most bootstrap the core RNG into other kinds of RNG.

If you need to test a uniform random number generator, I can help with that.

Buggy code is biased code

As bad as corporate software may be, academic software is usually worse. I’ve worked in industry and in academia and have seen first-hand how much lower the quality bar is in academia. And I’m not the only one who has noticed this.

Why does this matter? Because buggy code is biased code. Bugs that cause the software to give unwanted results are more likely to be noticed and fixed. Bugs that cause software to produce expected results are more likely to remain in place.

If your software simulates some complex phenomena, you don’t know what it’s supposed to do; that’s why you’re simulating. Errors are easier to spot in consumer software. A climate model needs a higher level of quality assurance than a word processor because bugs in the latter are more obvious. Genomic analysis may contain egregious errors and no one ever know, but a bug in an MP3 player is sure to annoy users.

You have to test simulation software carefully. You have to test special cases and individual components to have any confidence in the final output. You can’t just look at the results and say “Yeah, that’s about what I expected.”

Related posts

Acknowledging problems versus solving problems

People want their problems acknowledged more than they want them solved, at least at first. That’s one of the points from Thomas Limoncelli’s book Time Management for System Administrators.

Suppose two system administrators get an email about similar problems. The first starts working on the problem right away and replies to the email a couple hours later saying the problem is fixed. The second replies immediately to say he understands the problem and will resolve it first thing tomorrow. The second system administrator will be more popular.

Of course people want their problems solved, and sooner is better than later. But first they want to know someone is listening. Sometimes that’s all they want.

Related posts