Prevent errors or fix errors

The other day I was driving by our veterinarian’s office and saw that the marquee said something like “Prevention is less expensive than treatment.” That’s sometimes true, but certainly not always.

This evening I ran across a couple of lines from Ed Catmull that are more accurate than the vet’s quote.

Do not fall for the illusion that by preventing errors, you won’t have errors to fix. The truth is, the cost of preventing errors is often far greater than the cost of fixing them.

From Creativity, Inc.

15 thoughts on “Prevent errors or fix errors”

  1. That’s very thought-provoking. I can’t think of an objective way to test the theory. But in my experience I would perhaps say that the number of bugs you *find* is a reasonable proxy for the number of bugs there *are* – whether or not you find them. So it’s better to develop software that seems to naturally have fewer bugs – where bugs are prevented by solid design – than to develop software that seems to be generally buggy but where you eliminate all the bugs you can find. The latter might result in software that works very well for all the tested states, but works unpredictably when deployed, where the sheer number of people using it exposes it to unusual states (and where it’s particularly difficult to debug). I’m not sure if this is the kind of “error” you were referring to, but a bug is one kind of software-related error.

  2. A pithy version of this I heard somewhere (maybe in Bossypants?): “An ounce of cure is worth a pound of prevention”. Very important advice. :)

  3. Context is everything. An ounce of prevention can be worth a pound of cure, or vice versa.

    In software development, it has become more economical to fix some errors than to prevent them. Compare

    • waterfall development
    • pessimistic database locking
    • ACID transactions
    • centralized version control

    and

    • agile development
    • optimistic database locking
    • eventual consistency
    • decentralized version control

    In general, the latter list has become more popular, though again, context is everything. Sometimes the items in the first list are preferable.
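
    A rough sketch of the optimistic style above, assuming a toy in-memory store (the record type, version counter, and helper functions are invented for illustration): rather than holding a lock to prevent a conflicting write, the conflict is allowed to happen, detected by a version check, and fixed by retrying.

    ```go
    package main

    import (
        "errors"
        "fmt"
        "sync"
    )

    // record is a toy row with a version counter, as used in optimistic locking.
    type record struct {
        value   string
        version int
    }

    var (
        mu  sync.Mutex // stands in for the store's internal atomicity
        row = record{value: "initial", version: 1}
    )

    var errConflict = errors.New("version conflict")

    // read returns a snapshot of the row.
    func read() record {
        mu.Lock()
        defer mu.Unlock()
        return row
    }

    // write succeeds only if nobody has bumped the version since our read.
    func write(newValue string, expectedVersion int) error {
        mu.Lock()
        defer mu.Unlock()
        if row.version != expectedVersion {
            return errConflict // someone else got there first
        }
        row = record{value: newValue, version: expectedVersion + 1}
        return nil
    }

    func main() {
        // Optimistic update loop: no lock is held between read and write.
        // Conflicts are not prevented; they are detected and fixed by retrying.
        for {
            snap := read()
            if err := write(snap.value+", updated", snap.version); err == nil {
                break
            }
        }
        fmt.Printf("%+v\n", read())
    }
    ```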

  4. In between prevention and fixing, there is detection of errors. Even if you cannot prevent errors, it is important to be able to detect them before they do too much damage. This is done through testing, alpha and beta releases, a large user base, embracing error reports, etc. This applies to health services too.

  5. I believe that is the approach taken with systems written in Erlang — let it crash. Defensive coding is laborious. Take Go as an example; it’s very verbose, with 3 in 4 lines of code dealing with errors (I am exaggerating slightly, but it felt like a lot when I wrote Go). Erlang, OTOH, says to write your code in the obvious way and let the supervisor process deal with any exceptional errors that arise. That makes for much more readable code.
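
    For anyone who has not written Go, the defensive pattern described above looks roughly like this (a toy sketch; readConfig and the file name are made up): every call returns an error value, and the caller is expected to check it explicitly.

    ```go
    package main

    import (
        "fmt"
        "os"
    )

    // readConfig is a hypothetical helper. The point is the boilerplate:
    // each step returns an error that the caller must check.
    func readConfig(path string) (string, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return "", fmt.Errorf("reading %s: %w", path, err)
        }
        return string(data), nil
    }

    func main() {
        cfg, err := readConfig("example.conf")
        if err != nil {
            fmt.Fprintln(os.Stderr, "error:", err)
            os.Exit(1)
        }
        fmt.Println("loaded config:", cfg)
    }
    ```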

  6. @Henrik: I agree that detection is very important. It seems to me that our ability to detect errors improves over a career much more than our ability to prevent errors. I see this especially in math. We all make mistakes, and pros may make nearly as many mistakes as rookies. But pros can say “That can’t be right because …” and catch errors that rookies can’t.

    @Nathan: Erlang is a great example of a system being more robust by allowing components to fail. Programmed cell death is an example in biology.

    Verbose code meant to prevent one kind of error may increase the chance of another kind of error. It’s all trade-offs.

  7. I remember an interview with a consultant who advised companies on how to maintain their expensive factory equipment. He said that one of the most difficult pieces of advice for his clients to follow was “wait til it breaks, then fix it” even when it was the best advice possible.

  8. A perfect example of this in a technical sense is speculative execution at the processor level. If you can’t predict a branch ahead of time, sometimes you can just execute both paths simultaneously, then use the results from the path that actually does get taken while discarding the unneeded results.
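
    A toy software analogue of that idea (real speculative execution happens in hardware; the two expensive* functions below are stand-ins): start both computations before the condition is known, then keep only the result from the path that is actually taken.

    ```go
    package main

    import "fmt"

    // Stand-ins for work whose result we might or might not need.
    func expensiveTrue() int  { return 40 + 2 }
    func expensiveFalse() int { return 6 * 7 }

    func main() {
        // Start both computations before the "branch" is resolved.
        trueCh := make(chan int, 1)
        falseCh := make(chan int, 1)
        go func() { trueCh <- expensiveTrue() }()
        go func() { falseCh <- expensiveFalse() }()

        // The condition is resolved later, like a slow-to-resolve branch.
        condition := len("speculate") > 5

        // Use the result from the path actually taken; discard the other.
        var result int
        if condition {
            result = <-trueCh
        } else {
            result = <-falseCh
        }
        fmt.Println("result:", result)
    }
    ```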

  9. Who pays each cost? I worked on embedded systems where my company was responsible for the cost of fixing errors. Prevention was much cheaper than fixing in the field, or even in final assembly (“Anything is cheaper than rework” overstates it, but not by much). Contrast that with an industry where a company can actually charge customers for fixing errors that the company caused.

  10. Consider this quote from John Wanamaker: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” Contrast that with the errors you fix vs. the errors you don’t fix.

  11. The key point is the cost of the error itself, beyond the cost of fixing it. If the cost is life and limb, then it’s worth the extra cost to prevent the error.

    And of course, everything must be considered stochastically. You can’t know ahead of time whether an error will occur, what its cost will be, or what the cost of preventing or fixing it would be. You can only estimate a probability distribution for each.
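
    To make the stochastic view concrete, here is a small simulation (every probability and cost below is invented purely for illustration): draw whether the error occurs and what fixing it costs, then compare the average cost of always paying for prevention against the average cost of fixing failures as they happen.

    ```go
    package main

    import (
        "fmt"
        "math/rand"
    )

    const (
        trials         = 1_000_000
        preventionCost = 10.0  // paid every time, error or not (assumed)
        errorProb      = 0.05  // chance the error occurs if not prevented (assumed)
        meanFixCost    = 120.0 // mean cost of fixing an error after the fact (assumed)
    )

    func main() {
        rng := rand.New(rand.NewSource(1))

        var totalPrevent, totalFix float64
        for i := 0; i < trials; i++ {
            // Strategy A: always pay up front to prevent the error.
            totalPrevent += preventionCost

            // Strategy B: pay a random repair cost only when the error occurs.
            if rng.Float64() < errorProb {
                totalFix += rng.ExpFloat64() * meanFixCost
            }
        }

        fmt.Printf("average cost per unit, prevent: %6.2f\n", totalPrevent/trials)
        fmt.Printf("average cost per unit, fix:     %6.2f\n", totalFix/trials)
    }
    ```

    With these made-up numbers the fix-it-later strategy comes out cheaper on average, but nudging errorProb or meanFixCost up flips the conclusion, which is the point about estimating distributions rather than knowing costs in advance.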

  12. I think I’ve spent the last ten years wrestling with this problem. I am guessing that your frame of reference is code, but I think this is just as much a problem with designing and manufacturing physical objects as logical ones.

    I design medical devices. This seems like a straightforward case where you want to prevent as many errors as possible rather than waiting until a product breaks. However, it can be seven to ten years of development before you ever implant a device in a human being. Is preventing as many errors as you can the best possible approach, or do you want to fail often and fail early, long before you even think about marketing a product? Is some kind of hybrid approach best? If you try a hybrid, when do you switch over?

    In my career, I’ve done it both ways, with several different mixtures of each extreme. I’m not sure I know the best answer to this question in any context, let alone my own, but in some ways this is the critical question in my field. Time to market matters, and patient outcomes also matter. Can I maximize them both, or is there an unavoidable tradeoff involved?

    As John mentioned, there are tradeoffs here, but it is often not truly clear what kind of tradeoff you are making.

  13. The medical device world puts the question in an interesting light.
    If a device at a given state of development would save more lives than it would take, then delaying release to make it better does net harm. But responsibility to those who would have been saved seems very different from responsibility to those who are harmed.

  14. @Lens: That trade-off is what clinical trials are all about. It’s complicated. I spent years thinking about it and never came to very satisfying conclusions.

    For one thing, you’re not talking about definitely harming a few people now to definitely benefit more people in the future. You’re talking about exposing trial subjects to much greater uncertainty in exchange for maybe improving the probability of success for future patients.

    There are many levels of uncertainty. A treatment might not be better. It might be better but a trial doesn’t conclude it’s better. It might be better, accepted as better, but never used. It might be better for the demographic in the clinical trial but not in the wider public. Or maybe it would be much more effective in the general public, but not for the clinical trial demographic, so it never gets approval.

    One thing that makes clinical trials easier to accept ethically is that the patients in a clinical trial — at least in oncology, where I worked — generally do better than patients not on a trial, even if they are assigned to a control arm, because they get more attention.

  15. The test would be not repairing a bridge, letting it fall down, and then rebuilding it, vs. repairing the bridge.
