Maintenance costs

No engineered structure is designed to be built and then neglected or ignored. — Henry Petroski

The quote above comes from Henry Petroski’s recent interview on Tech Nation. In the same interview, Petroski says that a common rule of thumb is that maintenance costs about 4% of construction cost per year. For a structure as old as the Golden Gate Bridge (completed in 1937), for example, that’s a lot of 4%’s.
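To put rough numbers on that rule of thumb, here is a minimal sketch. It assumes a flat 4% of the original construction cost each year, with no adjustment for inflation, and it assumes about 73 years of service (1937 to 2010). The year 2010 and the function name are illustrative assumptions, not figures from the interview.

```python
def cumulative_maintenance(rate_per_year: float, years: int) -> float:
    """Total maintenance spending as a multiple of the original construction cost."""
    return rate_per_year * years


# Assumption: roughly 73 years in service (completed 1937, evaluated in 2010).
years_in_service = 2010 - 1937
multiple = cumulative_maintenance(0.04, years_in_service)
print(f"Maintenance over {years_in_service} years: {multiple:.2f}x construction cost")
```

Under these assumptions the accumulated maintenance comes to nearly three times the cost of construction.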

Golden Gate Bridge

Painting the bridge has cost far more than building it. The bridge is painted continuously: as soon as the painters reach the end of the bridge, they turn around and start over. The engineers who designed the bridge knew this would happen. When you build something out of steel and put it outside, it will need to be painted. It was all part of the design.

Image credit: Wikipedia

Related links:

Two kinds of software challenges
Do you really want to be indispensable?
Upcoming Y2K-like problems
The Essential Engineer, Henry Petroski’s new book

Estimating the chances of something that hasn’t happened yet

Suppose you’re proofreading a book. If you’ve read 20 pages and found 7 typos, you might reasonably estimate that the chances of a page having a typo are 7/20. But what if you’ve read 20 pages and found no typos? Are you willing to conclude that the chances of a page having a typo are 0/20, i.e. the book has absolutely no typos?

To take another example, suppose you are testing children for perfect pitch. You’ve tested 100 children so far and haven’t found any with perfect pitch. Do you conclude that children don’t have perfect pitch? You know that some do because you’ve heard of instances before. Your data suggest that perfect pitch in children is at least rare. But how rare?

The rule of three gives a quick and dirty way to estimate these kinds of probabilities. It says that if you’ve tested N cases and haven’t found what you’re looking for, a reasonable estimate is that the probability is less than 3/N. So in our proofreading example, if you haven’t found any typos in 20 pages, you could estimate that the probability of a page having a typo is less than 15%. In the perfect pitch example, you could conclude that fewer than 3% of children have perfect pitch.

Note that the rule of three says that your probability estimate goes down in inverse proportion to the number of cases you’ve studied. If you’d read 200 pages without finding a typo, your estimate would drop from 15% to 1.5%. But it doesn’t suddenly drop to zero. I imagine most people would harbor a suspicion that there may be typos even though they haven’t seen any in the first few pages. But at some point they might say “I’ve read so many pages without finding any errors, there must not be any.” The situation is a little different with the perfect pitch example, however, because you may know before you start that the probability cannot be zero.

If the sight of math makes you squeamish, you might want to stop reading now. Just remember that if you haven’t seen something happen in N observations, a good estimate is that the chances of it happening are less than 3/N.

What makes the rule of three work? Suppose the probability of what you’re looking for is p. If we want a 95% confidence interval, we want to find the largest p so that the probability of no successes out of n trials is 0.05, i.e. we want to solve (1-p)^n = 0.05 for p. Taking natural logs of both sides, n log(1-p) = log(0.05) ≈ -3. Since log(1-p) is approximately -p for small values of p, we have p ≈ 3/n.
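As a quick check on the approximation, the following sketch compares the exact solution of (1-p)^n = 0.05 with the 3/n shortcut. The function names are my own, chosen for illustration.

```python
def exact_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Solve (1 - p)**n = 1 - confidence for p."""
    return 1 - (1 - confidence) ** (1 / n)


def rule_of_three(n: int) -> float:
    """The quick approximation 3/n."""
    return 3 / n


for n in (20, 100, 200, 1000):
    print(f"n = {n:5d}  exact = {exact_upper_bound(n):.4f}  3/n = {rule_of_three(n):.4f}")
```

The shortcut is always slightly larger than the exact bound, so it errs on the conservative side, and the two agree more closely as n grows.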

The derivation above gives the frequentist perspective. I’ll now give the Bayesian derivation of the same result. Then you can say “p is probably less than 3/N” with a clear conscience, since Bayesians are allowed to make such statements.

Suppose you start with a uniform prior on p. The posterior distribution on p after having seen 0 successes and N failures has a beta(1, N+1) distribution. If you calculate the posterior probability of p being less than 3/N you get an expression that approaches 1 – exp(-3) as N gets large, and 1 – exp(-3) ≈ 0.95.
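The Bayesian claim can be checked numerically without any special libraries, because a beta(1, b) distribution has the closed-form CDF 1 - (1-x)^b. The sketch below uses that formula; the function name is hypothetical, and it assumes N is at least 3 so that 3/N is a valid probability.

```python
import math


def posterior_prob_below_3_over_n(n: int) -> float:
    """P(p < 3/n) under a beta(1, n + 1) posterior, using its closed-form CDF."""
    if n < 3:
        raise ValueError("n must be at least 3 so that 3/n <= 1")
    return 1 - (1 - 3 / n) ** (n + 1)


limit = 1 - math.exp(-3)
for n in (20, 100, 1000, 10**6):
    print(f"N = {n:8d}  P(p < 3/N) = {posterior_prob_below_3_over_n(n):.5f}")
print(f"limit 1 - exp(-3) = {limit:.5f}")
```

For small N the posterior probability is a little above 0.95, and it settles toward 1 - exp(-3) as N increases.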

Update: Italian translation of this post.



The secret to understanding recursion

Recursion is the process of solving a problem in terms of smaller versions of the same problem. Since the problem gets smaller each time, the process eventually terminates in a problem (the “base case”) that can be solved directly. Be sure of three things:

  1. The problem gets smaller each time.
  2. You include a solution for the base case.
  3. Each case is handled correctly.

That’s really all there is to it.
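Here is a small sketch of those three checks applied to one function. The example, raising a number to a non-negative integer power, is my own choice for illustration and does not come from Graham’s book.

```python
def power(base: float, exponent: int) -> float:
    """Compute base**exponent recursively for a non-negative integer exponent."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    # 2. Base case: anything to the zeroth power is 1.
    if exponent == 0:
        return 1
    # 1. The problem gets smaller: exponent // 2 is strictly less than exponent.
    half = power(base, exponent // 2)
    # 3. Each case is handled correctly: even and odd exponents.
    if exponent % 2 == 0:
        return half * half
    return half * half * base


print(power(2, 10))
print(power(3, 5))
```

To convince yourself the function is right, you only need to check the three numbered comments. You never need to trace the chain of calls.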

But what about visualizing how the code runs? You don’t have to. And that’s the point: sometimes a recursive solution is easier precisely because you don’t have to understand in detail how it executes. Graham explains in his Lisp book why tracing the code execution is unnecessary if not harmful.

Students learning about recursion are sometimes encouraged to trace all the invocations of a recursive function on a piece of paper. … This exercise could be misleading: a programmer defining a recursive function usually does not think explicitly about the sequence of invocations that results from calling it. … How do you visualize all those invocations? You don’t have to.

* * *

For a daily dose of computer science and related topics, follow @CompSciFact on Twitter.


I promise I’m not trying to learn anything

Medical experiments come under greater scrutiny than ordinary medical practice. There are good reasons for such precautions, but this leads to a sort of paradox. As Frederick Mosteller observed:

We have a strange double standard now. As long as a physician treats a patient intending to cure, the treatment is admissible. When the object is to find out whether the treatment has value, the physician is immediately subject to many constraints.

If a physician has two treatment options, A and B, he can assign either treatment as long as he believes that one is best. But if he admits that he doesn’t know which is better and says he wants to treat some patients each way in order to get a better idea how they compare, then he has to propose a study and go through a long review process.

I agree with Mosteller that we have a strange double standard, that a doctor is free to do what he wants as long as he doesn’t try to learn anything. On the other hand, review boards reduce the chances that patients will be asked to participate in ill-conceived experiments by looking for possible conflicts of interest, weaknesses in statistical design, etc. And such precautions are more necessary in experimental medicine than in more routine medicine. Still, there is more uncertainty in medicine than we may like to admit, and the line between “experimental” and “routine” can be fuzzy.


Confusing familiar with simple

Is Spanish simpler than Chinese? Most English speakers would think so, though that may not be true. Spanish is more familiar than Chinese if you’re an English speaker, but that does not mean the language is objectively simpler. In fact, linguists have a theory that all human languages are about equally complex, though they allocate their complexity in different areas. For example, Chinese has a complex tonal system, but I’ve been told its grammar is relatively simple.

We often confuse familiar with simple. Rich Hickey makes this observation in the context of programming languages, though the principle applies much more generally.

I think programmers have become inured to incidental complexity, in particular by confusing familiar or concise with simple. And when they encounter complexity, they consider it a challenge to overcome, rather than an obstacle to remove. Overcoming complexity isn’t work, it’s waste.

In some sense the familiar is simple. Familiar things have less perceived complexity, and sometimes perceived complexity is all that counts. But perceived complexity is personal. We can forget that familiarity clouds our judgment about complexity. We may recommend something familiar but complex to someone else who finds it unfamiliar and complex. Teachers have to keep in mind what students find complex. Programmers have to keep in mind what users find complex. Doctors have to keep in mind what patients find complex.

However, familiarity and perceived complexity can be deceiving even though no one else is involved. You may find something familiar and not realize how much effort you’re devoting to fighting its complexity. It’s easy to assume that things must be as complex as they are. I didn’t realize how complex clarinet was until I learned to play saxophone. I didn’t realize how complex C++ was until I had some experience with other programming languages. I didn’t realize how complex some desktop software was until I tried online alternatives.

The complexity of the familiar may not be apparent until you look closer. Nothing could be more familiar than the experience that the sun and planets go around the earth. That is a simple explanation until you look at orbits more carefully. Then you start introducing epicycles on top of epicycles to preserve the earth-centric model. You may find that what you thought was simple was only familiar, and that what you dismissed as more complex was only less familiar.

Ford-Chevy arguments in tech

If you’ve never heard a Ford-Chevy argument, you may find it hard to believe that such things exist. People actually get into arguments, sometimes violent arguments, over which trucks are better, Ford or Chevy.

More generally, a Ford-Chevy argument is an emotionally charged debate over the merits of two similar things with each side fiercely loyal to its position. These arguments look silly to outsiders but are serious to insiders. We all have our Ford-Chevy topics.

Have you ever gotten into a Mac versus PC argument? Emacs versus vi? Your favorite programming language versus some inferior language? How about your profession versus some rival profession? Your favorite sports team versus a competitor?

Thomas Gideon recently recorded a podcast on software tools. Gideon gives a good explanation for why we have technical Ford-Chevy arguments.

The time needed to gain mastery over a single deep tool usually precludes being able to learn anything else in that category. Pointing out feature differences, ones that may paint your chosen tool in an unflattering light, can make you defensive without realizing it. … how much effort you put in is being called into question, and to a degree, if only subconsciously, your intelligence or judgment may also be questioned by implication.

When you’ve made a large investment in time or money, you don’t want to hear someone question that investment. You may feel that your intelligence or judgment is being called into question. You may fear that you’ve picked the wrong tool but don’t have the time or energy to learn an alternative.

I’d like to think I’m above Ford-Chevy arguments, but I’m not. I would never get into a literal Ford-Chevy argument because I don’t care about trucks. But I could easily fall into a Ford-Chevy–type argument about something I care more about.

It’s no surprise that emotional factors influence our choice of music or clothes. But it is surprising how much emotional factors influence even highly technical decisions. For example, people often choose statistical methods for emotional reasons, though they would never admit it. Once we make a decision, we come up with rational justifications after the fact. This applies to choosing a computer or a statistical method just as much as it applies to choosing a truck or pair of shoes.

Read a few of the over 6,500 comments on the video to get a taste of a real Ford-Chevy argument.

Related post: Doing good work with bad tools

Four mechanical devices better than their newer counterparts

Here are four mechanical devices I prefer to their modern counterparts.

French press. It makes better coffee than a typical coffee machine. Also, a French press works without electricity. Next time a hurricane comes through Houston and knocks out our power, I can still make my coffee.

Reel mower. I had gasoline-powered lawn mowers until last year. Sometimes they’d start, sometimes they wouldn’t. My reel mower always starts. And it’s quiet.

Rake. I had a leaf blower once. It was obnoxiously loud and a nuisance to my neighbors. I much prefer raking leaves even though it takes longer.

Pencil sharpener. With four children, we sharpen a fair number of pencils. We have owned a couple electric pencil sharpeners. They were noisy, hard to use, and soon wore out. Our mechanical pencil sharpener is cheaper and far more reliable.

I’m no Luddite, but I firmly believe that newer isn’t necessarily better.


Kiss me, I might be Irish!

Happy St. Patrick’s Day everyone.

They say everyone’s Irish on St. Patrick’s Day. I’ve heard that I actually am part Irish (as well as Scottish, German, Cherokee, …). In any case, it’s nice of the Irish to share their holiday with the rest of the world.

St. Patrick’s Day 2007 in Seoul, Korea. Image credit: here via Wikipedia



Emacs is a text editor with ambitions to be an operating system. I do not use Emacs, though I once did, and I still find it intriguing. I’d like to find something similar that acts more like a Windows program.

GNU Emacs began in 1984 and has been in constant development ever since. The current version is 23.1. How many applications from 1984 are still in widespread use today? The only other one that comes to mind is TeX.

I used Emacs in graduate school and for a few years after that. I was fairly fluent with Emacs, though I never customized it much. I intended to learn Emacs Lisp and all that, but it never happened.

When I started developing Windows software I used Emacs at first, but the benefits of Visual Studio soon persuaded me to give up my old editor. It was much easier to go with the flow.

I’ve revisited Emacs a couple times over the years. I still have some of the keystrokes burned into my memory. I use it on Linux now and then, but I mostly work on Windows, and my experience using Emacs on Windows has been frustrating to say the least. Tasks that are trivial in any Windows application, such as printing and spell checking, are surprisingly difficult to set up in Emacs. I’m sure it is possible to resolve these problems, though I never did.

The problems with printing and spell checking are part of the larger issue that Emacs is so idiosyncratic. It behaves nothing like a typical Windows program. Some people may say that’s a good thing. But it makes life more complicated if you switch between Emacs and more conventional Windows software.

Emacs is no more a typical Mac application than it is a typical Windows application. And yet my impression is that this is less of a problem for Mac users. I’d like to understand whether this is true and if so why.

One of the things I liked about Emacs was the way you could “live” there. An expert Emacs user might work inside Emacs all day, using it as an editor, debugger, shell, file system explorer, email program, etc. Steve Yegge is such an expert. When he blogged about his move from Windows to Mac, he said the main reason for the switch was that he prefers the appearance of the fonts on a Mac. Changing operating systems was not a big deal for Yegge because he didn’t really live in Windows before, nor does he live in OS X now. He lives in Emacs. He concluded his essay by saying

So I’ll keep using my Macs. They’re all just plumbing for Emacs, anyway. And now my plumbing has nicer fonts.

Living inside Emacs comes at a price. Part of that price is writing lots of Emacs Lisp to glue things together. Another part of that price is the commitment to practicing using Emacs. As Yegge says elsewhere

… you need to make a serious, lifelong commitment to Emacs in order to master it. … So it’s not an editor for the faint of heart …

Yikes! I’m not ready to make a serious, lifelong commitment to a piece of software. To my wife? Yes. To my text editor? No.

One of the best features of Emacs is that it has custom “modes” for various kinds of files. Instead of using a separate program for editing every kind of file, Emacs users use one program with different modes. As soon as a new file type comes out, say for a new programming language, someone will post an Emacs mode for that new language.

I’d like to find an editor on Windows that is analogous to Emacs. By that I mostly have in mind a powerful, highly configurable editor with support for many file types. I’d want it to behave like a Windows application, not a foreign transplant, and integrate well with .NET.

There was a project to create such an editor, nicknamed Emacs.NET. It was announced in late 2007. It sounds like the project is still alive, but it doesn’t seem all that promising. [Update: maybe it’s dead.]

I’ve looked at a few Windows editors that claim to be highly configurable but are not well documented. So if such an editor is configurable, it’s configurable for the person who wrote it or possibly for anyone else willing to study the source code.

Any suggestions for a general purpose Windows editor? For starters, I’d be pleased to find something that’s good at editing LaTeX and HTML.

Update (2 April 2010): I’ve decided to give Emacs another try.

Related post: This post started out as an update to my earlier post One program to rule them all.

Adding simplicity

Simplicity is costly. You have to give up something to achieve it. You can’t just add it on top. William Bridges illustrates this in his book The Way of Transition where he describes his moving out to the country.

… I had been infatuated with Thoreau’s Walden and its story of living a basic life, close to nature. The heart of that undertaking, he had written, was to simplify your life. … In retrospect, I can see that although I thought that this was what I was doing, I was really just trying to add simplicity to my life. In addition to all the old things I had been doing … Of course, my life grew more and more complicated in the process.

A simplification has to remove or replace something else. You can’t just add on simplicity.

There may be an exception to this. Sometimes you can add a few missing pieces to make something more symmetric. In that case, the additions simplify the whole. (Mendeleev did something like this when he drew his periodic table.) Even then, I suppose you could say you’re removing the asymmetry. In any case, achieving simplicity usually requires more subtraction than addition.


A sort of command line for your browser

Quix is a sort of command line for web browsers. It’s a bookmarklet, a piece of JavaScript you save like a bookmark. When you launch Quix, it opens a small dialog that lets you enter brief commands for common browser tasks. For example, the gs command does a Google search within the domain of the current page.

You can install Quix by dragging it to your bookmark menu. However, if you want to use Quix to make it easier to use your browser without a mouse, you don’t want to have to click the Quix bookmark to get started. You can integrate Quix with your browser to be able to launch Quix from the keyboard.

For example, if you’re using Firefox on Windows, you can drag the Quix bookmarklet to your bookmarks toolbar. Next right-click on the bookmarklet, select “Properties”, and set “q” as the keyword. Then you can launch Quix by typing Ctrl-L q.

You can find directions here for integrating Quix with Chrome, IE, Firefox, Opera, and Safari.


Does gaining weight make you taller?

In his autobiography, The Pleasures of Statistics, Frederick Mosteller gives an amusing example of why observational studies are no substitute for doing experiments.

We are all familiar with the idea that we can estimate height in male adults from their weight. … But not one of us believes that adding 20 pounds by eating and minimizing exercise will add an inch to our height.

The problem is not simply that the direction of causality is backward; it’s that we cannot use a static description to predict what will happen if we change something.

Although regression situations may give one the illusion of finding out what would happen if we changed something, in the absence of an experiment they merely offer guesses.

He summarizes his point by quoting George Box:

To find out what happens to a system when you interfere with it, you have to interfere with it (and not just passively observe it).

Remember this next time you hear claims such as every dollar spent on X saves so many dollars spent on Y. Or every minute spent exercising increases your life expectancy by so many minutes. Or every time you do some activity you increase or decrease your risk of cancer by so much. First of all, these kinds of statements are linear extrapolations on situations that are not linear. Second, they may be observations that do not describe what will happen when you change something. They may be no more true than the idea that gaining weight makes you taller.

Here’s an example of how observation and intervention differ. Lottery winners often go bankrupt within a couple years of receiving their prize. If you suddenly make someone a millionaire, they’re not a typical millionaire.


Numerator-only data

I learned a useful new phrase today: numerator-only data. This is data without anything to compare it to, no denominator. I ran across the term in Frederick Mosteller’s autobiography. He illustrates the problem with the following old joke.

“Why do the white horses eat more than the black horses?”
“Don’t know. Why?”
“Because we have ten times as many white horses as black horses.”

Numerator-only data is data that leaves you asking “compared to what?” If I tell you the NASDAQ stock index closed at 2368 today, is that good or bad? The number by itself means nothing. Is that up or down compared to last week? Last year? If I tell you, for example, that the record high value was 5047, that gives you a denominator to compare it to.
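As a small illustration, the sketch below turns the closing value into a ratio by supplying the record high as a denominator. The two numbers are the ones quoted above; the variable names are my own.

```python
# Values quoted in the text: the closing value and the record high.
close_today = 2368
record_high = 5047

# The close on its own is numerator-only data; dividing by the record
# high gives it a point of comparison.
fraction_of_record = close_today / record_high
print(f"The close is {fraction_of_record:.1%} of the record high")
```

The record high is only one possible denominator. Last week’s or last year’s close would answer a different “compared to what?” question.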

Yahoo translation fail

Allen from the Wave Behind blog translated my blog post Just in case versus just in time into Chinese. I appreciate that Allen went to the trouble of doing the translation. I can’t read Chinese, but people who can told me he did a good job.

Mark Biek pointed out the quality of the Google and Yahoo translations from Chinese back into English. The Google translation is awkward but understandable. The Yahoo translation, however, is a total failure. First of all, the translation is illegible in Firefox:

Using Internet Explorer 8, the text is legible, but it doesn’t make sense:

The two screen shots focus on different parts of the text. I chose a swatch near the top of the Firefox version where the text was most illegible. I chose the IE8 swatch to showcase the phrase “the smelly spicy jiao raccoon dog” that Mark had pointed out.