All software has bugs. Someone has estimated that production code has about one bug per 100 lines. Of course there’s some variation in this number. Some software is a lot worse, and some is a little better.
But bugs-per-line-of-code is not very useful for assessing risk. The risk of a bug is the probability of running into it multiplied by its impact. Some lines of code are far more likely to execute than others, and some bugs are far more consequential than others.
Devoting equal effort to testing all lines of code would be wasteful. You’re not going to find all the bugs anyway, so you should concentrate on the parts of the code that are most likely to run and that would produce the greatest harm if they were wrong.
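To make the arithmetic concrete, here is a minimal sketch of ranking code paths by risk, taken as probability times impact. The path names and all the numbers are hypothetical, made up only for illustration; in practice both estimates would be rough guesses.

```python
from dataclasses import dataclass


@dataclass
class CodePath:
    name: str
    probability: float  # estimated chance of hitting a bug on this path
    impact: float       # estimated cost if that happens, in arbitrary units

    @property
    def risk(self) -> float:
        # Risk as defined above: probability multiplied by impact.
        return self.probability * self.impact


def rank_by_risk(paths):
    """Return the paths sorted from highest to lowest risk."""
    return sorted(paths, key=lambda p: p.risk, reverse=True)


# Hypothetical estimates, for illustration only.
paths = [
    CodePath("login", probability=0.9, impact=50),
    CodePath("export_report", probability=0.2, impact=10),
    CodePath("billing_rollover", probability=0.05, impact=2000),
]

for p in rank_by_risk(paths):
    print(f"{p.name}: {p.risk:.1f}")
```

With these made-up numbers the rarely executed billing path comes out on top, because its impact outweighs its low probability. That is the sense in which counting bugs per line says little about where testing effort should go.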
However, here’s a complication. The probability of running into a bug can change over time as people use the software in new ways. For whatever reason, people want to use features that had not been exercised before. When they do so, they’re likely to uncover new bugs.
(This helps explain why everyone thinks his preferred software is more reliable than others. When you’re a typical user, you tread the well-tested paths. You also learn, often subconsciously, to avoid buggy paths. When you bring your expectations from an old piece of software to a new one, you’re more likely to uncover bugs.)
Even though usage patterns change, they don’t change arbitrarily. It’s still the case that some code is far more likely than other code to execute.
Good software developers think ahead. They solve more than they’re asked to solve. They think “I’m going to go ahead and include this other case while I’m at it in case they need it later.” They’re heroes when it turns out their guesses about future needs were correct.
But there’s a downside to this initiative. You pay for what you don’t use. Every speculative feature either has to be tested, incurring more expense up front, or delivered untested, incurring more risk. This suggests it’s better to disable unused features.
You cannot avoid speculation entirely. Writing maintainable software requires speculating well, anticipating and preparing for change. Good software developers place good bets, and these tend to be small bets, going to a little extra effort to make software much more flexible. As with bugs, you have to consider probabilities and consequences: how likely is this part of the software to change, and how much effort will it take to prepare for that change?
Developers learn from experience what aspects of software are likely to change and they prepare for that change. But then they get angry at a rookie who wastes a lot of time developing some unnecessary feature. They may not realize that the rookie is doing the same thing they are, but with a less informed idea of what’s likely to be needed in the future.
Disputes between developers often involve hidden assumptions about probabilities. Whether some aspect of the software is responsible preparation for maintenance or wasteful gold plating depends on your idea of what’s likely to happen in the future.
Related: Why programmers write unneeded code
I work in the biotech industry, making lab tools (think cheap versions of the stuff on CSI).
And software usability is always an issue.
My take is that, still, spending money on software is a lower priority than spending money on hardware or electronics; I guess the thought is that you can sell a piece of working hardware, even if the code is crappy, but you can’t sell good code without hardware.
So the most complex (in terms of paths the customer can follow) thing gets the least attention.
We try to deal with this by putting the minimal feature set into the code; we add only the stuff we really, really need.
I think another factor is that the people who really control the money – CEOs – rarely actually USE (as real users) the product, so they have no idea how good or bad the code is; they see gross margins and quarterly increase/decrease in sales, and that is what they know.
Also, code is $ (really!!), programmers are $$, managing them is $$, testing the code is $$; in any real business, the operating principle is: if it isn’t affecting sales, it isn’t a problem.
I think there is some value in distinguishing between speculative features and corrective measures that make software more usable. They are both additional code lines, and they will both incur the penalties of risk or testing that you note above. In my mind, the distinction is that corrective measures make software more useful for all users, whereas speculative features benefit only a few and are therefore harder to justify.
For example, suppose you have a program that can take input from a flat file. Initially you support ASCII tab-delimited and CSV file inputs. A programmer adding speculative features might ask, “What kind of file support would I need in the future?” and add Unicode support. A programmer adding corrective features might ask, “What could go wrong with the import?” and add code to support corrections to bad rows during the import, rather than simply quitting with a message that the file has bad data.
I think the latter improves the usability of the program for everyone, whereas the former, though a nice bit of forethought, doesn’t add as much overall utility. I’m almost always in favor of proactive corrective measures; I’m much less a fan of new features until they are needed.
/ejt
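The corrective measure described in the comment above, setting bad rows aside for correction instead of quitting, might look something like this. It is only a sketch: the function name import_rows, the fixed field count, and the sample data are assumptions for illustration, not anything from the commenter’s program.

```python
import csv
import io


def import_rows(text, expected_fields, delimiter=","):
    """Parse delimited text, collecting bad rows instead of aborting.

    Returns (good_rows, bad_rows). Each entry of bad_rows is a
    (line_number, row, reason) tuple that a user could correct later.
    """
    good, bad = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != expected_fields:
            reason = f"expected {expected_fields} fields, got {len(row)}"
            bad.append((reader.line_num, row, reason))
        else:
            good.append(row)
    return good, bad


# Hypothetical input: the row on line 3 is missing a field.
sample = "id,name,score\n1,Ada,90\n2,Grace\n3,Linus,85\n"
good, bad = import_rows(sample, expected_fields=3)
print(len(good), "rows imported;", len(bad), "rows need correction")
for line_number, row, reason in bad:
    print(f"line {line_number}: {reason}: {row}")
```

The only check here is the field count. A real importer would validate more, but the shape is the same: keep going, and report what needs fixing at the end.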
Not all programming is the same. Just as you use different construction techniques depending on whether you are building a birdhouse, a single-family home, or a 50-floor skyscraper, so too do you use different programming techniques to build a throwaway test program or a giant multi-user, multi-site software package. Unfortunately, when the small prototype program works well (for its small, single purpose), it often gets put into use as a piece of production software. As new requirements cause the scope of the software to expand, the program fails to scale up because it was never built on the right foundation.
Those John terms good software developers are often those with experience, who have seen this progression before and are therefore more likely to design code with scalability in mind right from the beginning. That is what makes them “good.”
Jim: Scalability is one form of anticipating change. So you have to consider how likely it is that you will need to scale and how much effort your preparation takes.
John,
As background to my comment, I’m a fan of your blog and frequent reader.
This is probably my all-time favorite post of yours, because (a) I strongly agree with the points you make and (b) your insights hit close to home for me personally. After working as a corporate lawyer and in IT consulting, I chose to change careers and focus on the software testing industry by founding a software testing tool company.
One of the reasons I like your blog posts (generally) is that you’ve got similar interests, including: software development, math, business, and statistics.
Which leads me to the following personal assertion and request for your reaction to it. (Apologies if this diverges into a new topic not explicitly raised in your post).
General Assertion: The software testing field would be far more efficient and effective overall if it heavily borrowed from lessons learned in applied statistics, particularly the lessons manufacturers, advertisers and others have learned over decades about how to design experiments intelligently through Design of Experiments methods.
Specific Assertion: Design of Experiments methods, when used to create software tests, maximize coverage in a minimal number of test cases. When this is done (or at least when it is done well), the efficiency and effectiveness of software testing increase dramatically compared to when testers select and document tests by hand. Evidence: see, e.g., http://hexawise.com/combinatorial-and-pairwise-testing This is because DoE-based tests (such as pairwise tests) minimize wasteful repetition and allow testers to zero in on a small set of unusually powerful tests. More information about these methods is available at, e.g., combinatorialtesting.com
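As a rough illustration of the pairwise idea mentioned above, the sketch below greedily picks test cases until every pair of parameter values appears together in at least one test. It is a naive greedy selection over the full product, not the algorithm used by any particular tool, and the parameter names and values are hypothetical.

```python
from itertools import combinations, product


def pairs_of(test, names):
    """All (parameter, value) pairs exercised together by one test case."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}


def pairwise_tests(parameters):
    """Greedily choose test cases until every value pair is covered.

    `parameters` maps a parameter name to its list of possible values.
    This enumerates the full product, so it suits only small inputs.
    """
    names = list(parameters)
    candidates = [dict(zip(names, values))
                  for values in product(*parameters.values())]
    uncovered = set().union(*(pairs_of(t, names) for t in candidates))
    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda t: len(pairs_of(t, names) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best, names)
    return chosen


# Hypothetical configuration space: 3 * 3 * 2 = 18 exhaustive combinations.
params = {
    "browser": ["chrome", "firefox", "safari"],
    "os": ["windows", "macos", "linux"],
    "locale": ["en", "de"],
}
tests = pairwise_tests(params)
print(len(tests), "pairwise tests instead of 18 exhaustive ones")
for t in tests:
    print(t)
```

Covering every browser/os pair alone takes at least nine tests, so nine is the floor for this example. The savings over exhaustive testing grow quickly as parameters are added, which is the repetition-trimming effect the comment describes.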
Extremely Low Awareness of these Testing Methods: Based on my experience of talking to hundreds of testers and their managers at Fortune 500 firms over the last 5+ years, very few testers (e.g., fewer than 5%) have heard about these powerful testing methods. As a result, they routinely write test cases that inadvertently contain a great deal of wasteful repetition and many gaps in coverage.
Questions / Request for Your Reaction: First, as someone with an impressive understanding of math, statistics, software development, and software testing, have you had experiences (successful or otherwise) of using DoE-based methods to select/prioritize which test cases should be executed? Second, do you have any predictions about whether these DoE-based methods will gradually become more and more popular in the software testing field to the point where they become a critically important part of many/most large testing projects executed by Fortune 500 firms (as DoE-based methods have become a critically important part of many/most large experimental projects in manufacturing, advertising, agriculture, etc.)?
“Disputes between developers often involve hidden assumptions about probabilities.”
Very true. Which means that the way that software is developed depends on how the individuals involved resolve those disputes. True for much of everything we create though.
@Edward Trudeau: re. your example with reading a file. Nope. A programmer prone to speculative generality would immediately add code to import from JSON, XML and whatnot. A programmer who knows that speculative generality is expensive (you pay for what you don’t use) will just decouple the import via a generic file import interface, and pack the code to import from various formats inside distinct implementations of that interface, so that adding formats later on touches very little in the application. The rest is only proper exception handling (if you’re not in C++, where exceptions are more or less useless). An importer using Unicode instead of ASCII would be just another variety of importer, maybe derived from the ASCII version.
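The decoupling described in this comment could be sketched roughly as follows. All class and function names (Importer, DelimitedAsciiImporter, DelimitedUtf8Importer, IMPORTERS, load) are hypothetical, chosen only to illustrate the shape of the design.

```python
import csv
import io
from abc import ABC, abstractmethod


class Importer(ABC):
    """Generic import interface; the application depends only on this."""

    @abstractmethod
    def read(self, data: bytes) -> list[list[str]]:
        """Return the rows contained in `data`."""


class DelimitedAsciiImporter(Importer):
    encoding = "ascii"

    def __init__(self, delimiter: str = ","):
        self.delimiter = delimiter

    def read(self, data: bytes) -> list[list[str]]:
        text = data.decode(self.encoding)
        reader = csv.reader(io.StringIO(text), delimiter=self.delimiter)
        return [row for row in reader if row]


class DelimitedUtf8Importer(DelimitedAsciiImporter):
    """Another variety of importer, derived from the ASCII version."""
    encoding = "utf-8"


# Registry keyed by a hypothetical format name. Supporting a new format
# means adding one class and one entry here, nothing else.
IMPORTERS = {
    "csv": DelimitedAsciiImporter(","),
    "tsv": DelimitedAsciiImporter("\t"),
    "csv-utf8": DelimitedUtf8Importer(","),
}


def load(data: bytes, fmt: str) -> list[list[str]]:
    """Application entry point: look up an importer and handle its errors."""
    try:
        return IMPORTERS[fmt].read(data)
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    except UnicodeDecodeError as exc:
        raise ValueError(f"file is not valid {fmt} data: {exc}") from exc


print(load(b"a,b\n1,2\n", "csv"))
print(load("caf\u00e9,1\n".encode("utf-8"), "csv-utf8"))
```

The speculative part is limited to the interface and the registry, which are cheap. Importers for JSON or XML are not written until someone actually needs them.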
@John Cook: even worse. Einstein (AFAIK) coined the term “thought experiment.” What I’ve discovered is that most programmers, many very well prepared and very efficient engineers among them, almost never do experiments in their minds. How can you expect to get somewhere if you don’t think about it in detail and build a pretty detailed idea of where you want to go before you start walking? Software engineers seem not to learn this in school.
Out of curiosity, more than 10 years ago I started keeping track of the code I write, and what of it I actually test. Ever since, when I have to go fix a bug, I can clearly see whether the bug is in tested or untested code. Nearly all bugs are in untested code (mine or others’). I know, duh, but we do our due diligence testing like everybody else. We all debug our code; it works perfectly before we check it in. We do testing of the entire application. We test the typical usage cases. We test fringe cases. Sometimes we even test fringe cases that a customer is very unlikely to run into. Even with all that, most bugs are still in untested code.
A crash is about the greatest harm you can do. In my observation, untested code either works perfectly, or crashes the app, there’s almost no in between. So, if one does as you suggest, they will be leaving lots of crashes for their customers to experience. Exhaustively testing every line of code is difficult and costly, but the closer to this ideal you can afford to get, the happier your customers will be. Happy customers = brand loyalty = repeat purchases = low cost of sales = more profits = everybody’s happy.
patbob: Another option is to delete code you haven’t tested. Instead of saying “We don’t think this software will ever be used, so we’re not going to test it.” you could say “We don’t think this software will ever be used, so we’re removing it.” Then if customers clamor for the missing functionality, you put it back in, test it thoroughly, and release it.
“Disputes between developers often involve hidden assumptions about probabilities.”
Nails it! One easy way to spot it is to look for the words “well, what if...?” or “tomorrow if we...”. The disputes produce good code, but only when an experienced programmer is the one making the argument. Novices try to anticipate 10 future scenarios and make the code flexible for all of them; the expert just chooses the most likely scenario, and it often pays off.