Rewarding complexity

Clay Shirky wrote an insightful article recently entitled The Collapse of Complex Business Models. The last line of the article contains the observation

… when the ecosystem stops rewarding complexity, it is the people who figure out how to work simply in the present, rather than the people who mastered the complexities of the past, who get to say what happens in the future.

It’s interesting to think how ecosystems reward complexity or simplicity.

Academia certainly rewards complexity. Coming up with ever more complex models is the safest road to tenure and fame. Simplification is hard work and isn’t good for your paper count.

Political pundits are rewarded for complex analysis, though politicians are rewarded for oversimplification.

The software market has rewarded complexity, but that may be changing.

Related posts:

Maintenance costs

No engineered structure is designed to be built and then neglected or ignored. — Henry Petroski

The quote above comes from Henry Petroski’s recent interview on Tech Nation. In the same interview, Petroski says that a common rule of thumb is that maintenance costs about 4% of construction cost per year. For a structure as old as the Golden Gate Bridge (completed in 1937), for example, that’s a lot of 4%’s.

Golden Gate Bridge

Painting the bridge has cost far more than building it. The bridge is painted continuously: as soon as the painters reach the end of the bridge, they turn around and start over. The engineers who designed the bridge knew this would happen. When you build something out of steel and put it outside, it will need to be painted. It was all part of the design.

Image credit: Wikipedia

Related links:

Two kinds of software challenges
Do you really want to be indispensable?
Upcoming Y2K-like problems
The Essential Engineer, Henry Petroski’s new book

Software sins of omission

The Book of Common Prayer contains the confession

… we have left undone those things which we ought to have done, and we have done those things which we ought not to have done.

The things left undone are called sins of omission; things which ought not to have been done are called sins of commission.

In software testing and debugging, we focus on sins of commission, code that was implemented incorrectly. But according to Robert Glass, the majority of bugs are sins of omission. In Frequently Forgotten Fundamental Facts about Software Engineering Glass says

Roughly 35 percent of software defects emerge from missing logic paths, and another 40 percent are from the execution of a unique combination of logic paths.

If these figures are correct, three out of four software bugs are sins of omission, errors due to things left undone. These are bugs due to contingencies the developers did not think to handle. Three quarters seems like a large proportion, but it is plausible. I know I’ve written plenty of bugs that amounted to not considering enough possibilities, particularly in graphical user interface software. It’s hard to think of everything a user might do and all the ways a user might arrive at a particular place. (When I first wrote user interface applications, my reaction to a bug report would be “Why would anyone do that?!” If everyone would just use my software the way I do, everything would be OK.)

It matters whether bugs are sins of omission or sins of commission. Different kinds of bugs are caught by different means. Developers have come to appreciate the value of unit testing lately, but unit tests primarily catch sins of commission. If you didn’t think to program something in the first place, you’re not likely to think to write a test for it. Complete test coverage could only find 25% of a projects bugs if you assume 75% of the bugs come from code that no one thought to write.

The best way to spot sins of omission is a fresh pair of eyes. As Glass says

Rigorous reviews are more effective, and more cost effective, than any other error-removal strategy, including testing. But they cannot and should not replace testing.

One way to combine the benefits of unit testing and code reviews would be to have different people write the unit tests and the production code.

Related posts:

Counterfeit coins and rare diseases

Here’s a puzzle I saw a long time ago that came to mind recently.

You have a bag of 27 coins. One of these coins is counterfeit and the rest are genuine. The genuine coins all weigh exactly the same, but the counterfeit coin is lighter. You have a simple balance. How can you find the counterfeit coin while using the scale the least number of times?

The surprising answer is that the false coin can be found while only using the scales only three times. Here’s how. Put nine coins on each side of the balance. If one side is lighter, the counterfeit is on that side; otherwise, it is one of the nine not on the scales. Now that you’ve narrowed it down to nine coins, apply the same idea recursively by putting three of the suspect coins on each side of the balance. The false coin is now either on the lighter side if the scales do not balance or one of the three remaining coins if the scales do balance. Now apply the same idea one last time to find which of the remaining three coins is the counterfeit. In general, you can find one counterfeit in 3n coins by using the scales n times.

The counterfeit coin problem came to mind when I was picking out homework problems for a probability class and ran into the following (problem 4.56 here):

A large number, N = mk, of people are subject to a blood test. This can be administered in two ways.

  1. Each person can be tested separately. In this case N tests are required.
  2. The blood samples of k people can be pooled and analyzed together. If the test is negative, this one test suffices for k people. If the test is positive, each of the k people must be tested separately, and, in all, k+1 test are required for the k people.

Suppose each person being tested has the disease with probability p. If the disease is rare, i.e. p is sufficiently small, the second approach will be more efficient. Consider the extremes. If p = 0, the first approach takes mk tests and the second approach takes only m tests. At the other extreme, if p = 1, the first approach still takes mk tests but the second approach now takes m(k+1) tests.

The homework problem asks for the expected number of tests used with each approach as a function of p for fixed k. Alternatively, you could assume that you always use the second method but need to find the optimal value of k. (This includes the possibility that k=1, which is equivalent to using the first method.)

I’d be curious to know whether these algorithms have names. I suspect computer scientists have given the coin testing algorithm a name. I also suspect the idea of pooling blood samples has several names, possibly one name when it is used in literally testing blood samples and other names when the same idea is applied to analogous testing problems.

* * *

For daily posts on probability, follow @ProbFact on Twitter.

ProbFact twitter icon

How to test a random number generator

Random number generators are challenging to test.

  • The output is supposed to be unpredictable, so how do you know when the generator working correctly?
  • Your tests will fail occasionally, but how do you decide whether they’re failing too often?
  • What kinds of errors are most common when writing random number generation software?

These are some of the questions I address in Chapter 10 of Beautiful Testing.

Beautiful Testing: Leading Professionals Reveal How They Test

The book is now in stock at Amazon. It is supposed to be in book stores by Friday. All profits from Beautiful Testing go to Nothing But Nets, a project to distribute anti-malarial bed nets.

Related: Need help with randomization?


Reviewing catch blocks

Here’s an interesting exercise. If you’re writing code in a language like C# or C++ that has catch statements, write a script to report all catch blocks. You might be surprised at what you find. Some questions to ask:

  • Do catch blocks swallow exceptions and thus mask problems?
  • Is information lost by catching an exception and throwing a new one?
  • Are exceptions logged appropriately?
  • Are notification messages grammatically correct and helpful?

Here’s a PowerShell script that will report all catch statements plus the five lines following the catch statement.

Related post: Finding embarrassing and unhelpful error messages

Maybe NASA could use some buggy software

In Coders at Work, Peter Norvig quotes NASA administrator Don Goldin saying

We’ve got to do the better, faster, cheaper. These space missions cost too much. It’d be better to run more missions and some of them would fail but overall we’d still get more done for the same amount of money.

NASA has extremely rigorous processes for writing software. They supposedly develop bug-free code; I doubt that’s true, thought I’m sure they do have exceptionally low bug rates. But this quality comes at a high price. Rumor has it that space shuttle software costs $1,500 per line to develop. When asked about the price tag, Norvig said “I don’t know if it’s optimal. I think they might be better off with buggy software.” At some point it’s certainly not optimal. If it doubles the price of a project to increase your probability of a successful mission from 98% to 99%, it’s not worth it; you’re better off running two missions with a 98% chance of success each.

Few people understand that software quality is all about probabilities of errors. Most people think the question is whether you’d rather have bug-free software or buggy software. I’d rather have bug-free software, thank you. But bug-free is not an option. Nothing humans do is perfect. All we can do is lower the probabilities of bugs. But as the probability of bugs goes to zero, the development costs go to infinity. (Actually it’s not all about probabilities of errors. It’s also about the consequences of errors. Sending back a photo with a few incorrect pixels is not the same as crashing a probe.)

Norvig’s comment makes sense regarding unmanned missions. But what about manned missions? Since one of the possible consequences of error is loss of life, the stakes are obviously higher. But does demanding flawless software increase the probability of a safe mission? One of the consequences of demanding extremely high quality software is that some tasks are too expensive to automate and so humans have to be trained to do those tasks. But astronauts make mistakes just as programmers do. If software has a smaller probability of error than an astronaut would have for a given task, it would be safer to rely on the software.

Related post: Software in space

Finding embarrassing and unhelpful error messages

Every time your software displays an error message, you risk losing credibility with your users. If the message is grammatically incorrect, your credibility definitely goes down a notch. And if the message is unhelpful, your credibility goes down at least one more notch. The same can be said for any message, but error messages are particularly important for three reasons.

  1. Users are in a bad mood when they see error messages; this is not the time to make things worse.
  2. Programmers are sloppy with error messages because they’re almost certain the messages will never be displayed.
  3. Error conditions are unusual by their very nature, and so it’s practically impossible to discover them all by black-box testing.

The best way to find error messages is to search the source code for text potentially displayed to users. I’ve advocated this practice for years and usually I encounter indifference or resistance. And yet nearly every time I extract the user-visible text from a software project I find dozens of spelling errors, grammar errors, and incomprehensible messages.

Last year I wrote an article for CodeProject on this topic and provided a script to strip text strings from source code. See PowerShell Script for Reviewing Text Shown to Users. The script looks for all quoted text and text inside XML entities. Then it tries to filter out text strings that are not displayed to users. For example, a string with no spaces is more likely to be a file name or some other code fragment than a user message. The script produces a report that a copy editor can then review. In addition to checking spelling and grammar, an editor can judge whether a message would be comprehensible and useful.

I admit that the parsing in the script is crude. It could miss some strings, and it could filter out some strings that it should keep. But the script is very simple, less than 100 lines. And it works on multiple source code types: C++, C#, XML, VB, etc. Writing a sophisticated parser for each of those languages would be a tremendous amount of work, but a quick-and-dirty script may be 98% as effective. Since most projects review 0% of their source code text, reviewing 98% is an improvement.

In addition to improving the text that a user sees, a text review gives some insight into a program’s structure. For example, if messages are repeated in multiple files, most likely the code has a lot of “clipboard inheritance,” code copied and pasted rather than isolated into reusable functions. A text review could also determine whether a program is concatenating strings to build SQL statements rather than calling stored procedures, possibly indicating a security vulnerability.

Broken windows theory and programming

The broken windows theory says that cracking down on petty crime reduces more serious crime. The name comes from the explanation that if a building has a few broken windows, it invites vandals to break more windows and eventually burn down the building. Turned around, this suggests that punishing vandalism could lead to a reduction in violent crime. Rudy Giuliani is perhaps the most visible proponent of the theory.  His first initiative as mayor of New York was to go after turnstile jumpers and squeegeemen as a way of reducing crime in city. Crime rates dropped dramatically during his tenure.

In the book Pragmatic Thinking and Learning, Andy Hunt applies the broken windows theory to software development.

Known problems (such as bugs in code, bad process in an organization, poor interfaces, or lame management) that are uncorrected have a debilitating, viral effect that ends up causing even more damage.

I’ll add a couple of my pet peeves to Andy Hunt’s list.

The first is compiler warnings. I can’t understand why some programmers are totally comfortable with their code having dozens of compiler warnings. They’ll say “Oh yeah, I know about that. It’s not a problem.” But then when a warning shows up that is trying to tell them something important, the message gets lost in the noise. My advice: Just fix the code. In very exceptional situations, explicitly turn off the warning.

The second is similar. Many programmers blithely ignore run-time exceptions that are written to an event log. As with compile warnings, they justify that these exceptions are not really a problem. My advice: If it’s not really a problem, then don’t log it. Otherwise, fix it.

Michael Feathers on refactoring

Michael Feathers wrote my favorite book on unit testing: Working Effectively with Legacy Code (ISBN 0131177052). Some books on unit testing just give abstract platitudes. Feather’s book wrestles with the hard, messy problem of retrofitting unit tests to existing code.

The .NET Rocks podcast had an interview with Michael Feathers recently. The whole interview is worth listening to, but here I’ll just recap a couple things he said about refactoring that I thought were insightful. First, most people agree that you need to have unit tests in place before you can do much refactoring. The unit tests give you the confidence to refactor without worrying that you’ll break something in the process and not know that you broke it. But Feathers adds that you might have to do some light refactoring before you can put the unit tests in place to allow more aggressive refactoring.

The second thing he mentioned about refactoring was the technique called “scratch refactoring.” With this approach, you refactor quickly without worrying about whether you are introducing bugs in order to see where you want to go. But then you completely throw away those changes and refactor carefully. Sometimes you need to do a dry run first to see what patterns emerge and determine where you want to go.

Both of these observations are ways to break out of a chicken-and-egg cycle, needing to refactor before you can refactor.

Errors in math papers not a big deal?

Daniel Lemire wrote a blog post this morning that ties together a couple themes previously discussed here.

Most published math papers contain errors, and yet there have been surprisingly few “major screw-ups” as defined by Mark Dominus. Daniel Lemire’s post quotes Doron Zeilberger on why these frequent errors are often benign.

Most mathematical papers are leaves in the web of knowledge, that no one reads, or will ever use to prove something else. The results that are used again and again are mostly lemmas, that while a priori non-trivial, once known, their proof is transparent. (Zeilberger’s Opinion 91)

Those papers that are “branches” rather than “leaves” receive more scrutiny and are more likely to be correct.

Zeilberger says lemmas get reused more than theorems. This dovetails with Mandelbrot’s observation mentioned a few weeks ago.

Many creative minds overrate their most baroque works, and underrate the simple ones. When history reverses such judgments, prolific writers come to be best remembered as authors of “lemmas,” of propositions they had felt “too simple” in themselves and had to be published solely as preludes to forgotten theorems.

There are obvious analogies to software.  Software that many people use has fewer bugs than software that few people use, just as theorems that people build on have fewer bugs than “leaves in the web of knowledge.” Useful subroutines and libraries are more likely to be reused than complete programs. And as Donald Knuth pointed out, re-editable code is better than black-box reusable code.

Everybody knows that software has bugs, but not everyone realizes how buggy theorems are. Bugs in software are more obvious because paper doesn’t abort. Proofs and programs are complementary forms of validation. Attempting to prove the correctness of an algorithm certainly reduces the chances of a bug, but proofs are fallible as well. Again quoting Knuth, he once said “Beware of bugs in the above code; I have only proved it correct, not tried it.” Not only can programs benefit from being more proof-like, proofs can benefit from being more program-like.

Why 90% solutions may beat 100% solutions

I’ve never written a line of Ruby, but I find Ruby on Rails fascinating. From all reports, the Rails framework lets you develop a web site much faster than you could using other tools, provided you can live with its limitations. Rails emphasizes consistency and simplicity, deliberately leaving out support for some contingencies.

I listened to an interview last night with Ruby developer Glenn Vanderburg. Here’s an excerpt that I found insightful.

In the Java world, the APIs and libraries … tend to be extremely thorough in trying to solve the entire problem that they are addressing and [are] somewhat complicated and difficult to use. Rails, in particular, takes exactly the opposite philosophy … Rails tries to solve the 90% of the problem that everybody has and that can be solved with 10% of the code. And it punts on that last 10%. And I think that’s the right decision, because the most complicated, odd, corner cases of these problems tend to be the things that can be solved by the team in a specific and rather simple way for one application. But if you try to solve them in a completely general way that everybody can use, it leads to these really complicated APIs and complicated underpinnings as well.

The point is not to pick on Java. I believe similar remarks apply to Microsoft’s libraries, or the libraries of any organization under pressure to be all things to all people. The Ruby on Rails community is a small, voluntary association that can turn away people who don’t like their way of doing things.

At first it sounds unprofessional to develop a software library does anything less than a thorough solution to the problem it addresses. And in some contexts that is true, though every library has to leave something out. But in other contexts, it makes sense to leave out the edge cases that users can easily handle in their particular context. What is an edge case to a library developer may be bread and butter to a particular set of users. (Of course the library provider should document explicitly just what part of the problem their code does and does not solve.)

Suppose that for some problem you really can write the code that is sufficient for 90% of the user base with 10% of the effort of solving the entire problem. That means a full solution is 10 times more expensive to build than a 90% solution.

Now think about quality. The full solution will have far more bugs. For starters, the extra code required for the full solution will have a higher density of bugs because it deals with trickier problems. Furthermore, it will have far fewer users per line of code — only 10% of the community cares about it in the first place, and of that 10%, they all care about different portions. With fewer users per line of code, this extra code will have more unreported bugs. And when users do report bugs in this code, the bugs will be a lower priority to fix because they impact fewer people.

So in this hypothetical example, the full solution costs an order of magnitude more to develop and has maybe two orders of magnitude more bugs.

Programmers aren’t reading programming books

In the interview with Charles Petzold I mentioned in my previous post, Petzold talks about the sharp decline in programming book sales. At one time, nearly every Windows programmer owned a copy of Petzold’s first book, especially in its earlier editions. But he said that now only 4,000 people have purchased his recent 3D programming book.

Programming book sales have plummeted, not because there is any less to learn, but because there is too much to learn. Developers don’t want to take the time to thoroughly learn any technology they suspect will become obsolete in a couple years, especially if its only one of many technologies they have to use. So they plunge ahead using tools they have never systematically studied. And when they get stuck, they Google for help and hope someone else has blogged about their specific problem.

Companies have cut back on training at the same time that they’re expecting more from software. So programmers do the best they can. They jump in and write code without really understanding what they’re doing. They guess and see what works. And when things don’t work, they Google for help. It’s the most effective thing to do in the short term. In the longer term it piles up technical debt that leads to a quality disaster or a maintenance quagmire.