Houston power outage and the 80-20 rule

Houston is in the midst of the largest power outage repair project in history. After Hurricane Ike passed through, about 2.5 million customers were without electricity. Now I hear that they’re down to half a million customers without power.

Let’s suppose the 80-20 rule applies to the repair effort. This seems reasonable since CenterPoint Energy understandably started with the easiest repairs. So with power restored to 2 million customers, they’ve completed 80% of their task. The 80-20 rule would predict that they have expended 20% of their effort. So if it took 10 days to restore the first 80% of customers, it will take another 40 days before they get to the last customer.
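
To spell out the arithmetic: the remaining 20% of customers corresponds to the remaining 80% of the effort, so

\text{remaining time} = 10 \text{ days} \times \frac{0.80}{0.20} = 40 \text{ days}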

This is not meant to be a precise estimate of the work that remains, only back-of-the-napkin speculation. But I do imagine a lot of work remains even though the repairs are in some sense 80% complete. Nor is this meant as a criticism of the heroic efforts of thousands of repairmen from around the US and Canada. I hope it increases appreciation for their efforts when progress, measured by the percentage of customers restored, inevitably slows down.

Programmers aren’t reading programming books

In the interview with Charles Petzold I mentioned in my previous post, Petzold talks about the sharp decline in programming book sales. At one time, nearly every Windows programmer owned a copy of one of the earlier editions of Petzold’s first book. But he said that now only 4,000 people have purchased his recent 3D programming book.

Programming book sales have plummeted, not because there is any less to learn, but because there is too much to learn. Developers don’t want to take the time to thoroughly learn any technology they suspect will become obsolete in a couple of years, especially if it’s only one of many technologies they have to use. So they plunge ahead using tools they have never systematically studied. And when they get stuck, they Google for help and hope someone else has blogged about their specific problem.

Companies have cut back on training at the same time that they’re expecting more from software. So programmers do the best they can. They jump in and write code without really understanding what they’re doing. They guess and see what works. And when things don’t work, they Google for help. It’s the most effective thing to do in the short term. In the longer term it piles up technical debt that leads to a quality disaster or a maintenance quagmire.

Free C# book

Charles Petzold is a highly respected author in Windows programming circles. For years, his book was THE reference for Win32 API programming. I knew he had since written several books on .NET programming but I didn’t realize until I listened to an interview with Petzold that he has a .NET book that he gives away on his website.

.NET Book Zero: What the C or C++ Programmer Needs to Know About C# and the .NET Framework

How to compute standard deviation accurately

The most convenient way to compute sample variance by hand may not work in a program. Sample variance is given by

\sigma^2 = \frac{1}{n(n-1)}\left(n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2\right)

If you compute the two summations and then carry out the subtraction above, you might be OK. Or you might have a large loss of precision. You might even get a negative result, though in theory the quantity above cannot be negative. If you want the standard deviation rather than the variance, you may be in for an unpleasant surprise when you take the square root.
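
To make the hazard concrete, here is a small illustration (my own example, not taken from the notes linked below): three values near one billion whose sample variance is exactly 1. The two sums agree in so many leading digits that the subtraction throws away nearly all the information, and on typical hardware the computed result is far from 1 and may even be zero or negative.

#include <cstdio>

// Illustration only: the one-pass textbook formula applied to three values
// near 10^9. The true sample variance is exactly 1, but the two accumulated
// sums agree in their leading digits and the subtraction loses the answer.
int main() {
    const int n = 3;
    double x[] = {1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0};

    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < n; ++i) {
        sum   += x[i];
        sumsq += x[i] * x[i];
    }

    double variance = (n * sumsq - sum * sum) / (n * (n - 1.0));
    std::printf("one-pass sample variance: %g (exact value: 1)\n", variance);
    return 0;
}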

There is a simple but non-obvious way to compute sample variance that has excellent numerical properties. The algorithm was first published back in 1962 but is not as well known as it should be. Here are some notes explaining the algorithm, along with C++ code implementing it.

Accurately computing running variance

The algorithm has the added advantage that it keeps a running account of the mean and variance as data are entered sequentially.
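
Here is a sketch of that running update, my own rendering of the 1962 method (usually credited to B. P. Welford) rather than the code linked above; the class and method names are just for illustration.

#include <cmath>
#include <cstdio>

// Running mean and variance via the 1962 (Welford-style) update.
// Each new value adjusts the running mean and a running sum of squared
// deviations, so no large, nearly equal quantities are ever subtracted.
class RunningVariance {
public:
    void add(double x) {
        ++count;
        double delta = x - mean;
        mean += delta / count;        // update the running mean
        m2 += delta * (x - mean);     // update the sum of squared deviations
    }
    double variance() const {         // sample variance; needs count >= 2
        return count > 1 ? m2 / (count - 1) : 0.0;
    }
    double standardDeviation() const { return std::sqrt(variance()); }
    double runningMean() const { return mean; }
private:
    long count = 0;
    double mean = 0.0;
    double m2 = 0.0;
};

int main() {
    RunningVariance stats;
    double data[] = {1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0};
    for (double x : data) stats.add(x);
    std::printf("mean = %.1f, sample variance = %g\n",
                stats.runningMean(), stats.variance());
    return 0;
}

Fed the same three values near one billion as in the earlier example, this computes the sample variance as 1, as it should.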

Four uncommon but handy math notations

Here are some of my favorite notations that are not commonly used.

The first is Richard Stanley’s notation for counting the number of ways to select k objects from a set of n objects with replacement. This is similar to the problem solved by binomial coefficients, but not the same since binomial coefficients count the number of possible selections without replacement. Stanley’s symbol is

\left( {n \choose k} \right)

I like this symbol for two reasons. First, it’s good to have a notation, any notation, for a concept that comes up fairly often. Second, it’s appropriate for this symbol to resemble the binomial coefficient symbol. See selecting with replacement for more on Stanley’s symbol, how to think about it, and how to compute it.
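
For reference, the symbol can be evaluated with an ordinary binomial coefficient:

\left( {n \choose k} \right) = {n + k - 1 \choose k}

For example, there are six ways to pick two scoops of ice cream from three flavors when repeats are allowed:

\left( {3 \choose 2} \right) = {4 \choose 2} = 6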

Next is Kenneth Iverson’s notation for indicator functions. Iverson’s idea was to put a Boolean condition in square brackets to indicate the function that is 1 when that condition is true and 0 otherwise. For example, [x > y] is the function f(x, y) such that f equals 1 when x is greater than y and equals 0 for all other arguments. This notation saves ink and makes it easier to concentrate on the substance of an expression. For more on Iverson’s notation, see Concrete Mathematics.
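
One standard use, in the spirit of Concrete Mathematics, is to pull the condition on a sum into the summand so that the index can range freely:

\sum_{1 \le k \le n} f(k) = \sum_k f(k) \, [1 \le k \le n]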

Another notation from Concrete Mathematics is the use of a perpendicular symbol to denote that two integers are relatively prime. For example, m ⊥ n would indicate that m and n are relatively prime. The more common way to denote this would be to say gcd(m, n) = 1. The perpendicular symbol is nice because perpendicular lines have no component of direction in common, just as relatively prime numbers have no prime factors in common.

Finally, multi-index notation is a handy way to make multivariable theorems easier to remember. For example, with this notation, Taylor series in several variables look similar to Taylor series in one variable.
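
For example, if α is a multi-index of non-negative integers, with α! the product of the factorials of its components, h^α the corresponding product of powers, and D^α the corresponding mixed partial derivative, then the Taylor series of a smooth function f about x reads

f(x + h) = \sum_{\alpha} \frac{D^\alpha f(x)}{\alpha!} \, h^\alpha

which looks just like the one-variable series.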

Related link: Stanley’s twelvefold way

Robert’s rules of order and Galveston flooding

I found out recently that Henry Martyn Robert of Robert’s Rules of Order fame was also a civil engineer. After the devastating hurricane of 1900, Robert was part of the effort to raise the level of Galveston Island and build a seawall. As much damage as Hurricane Ike did to Galveston, it would have been far worse without the efforts of Robert and others over a century ago.

For more information, see Engines of Our Ingenuity Episode 1099.

Writes large correct programs

I had a conversation yesterday with someone who said he needed to hire a computer scientist. I replied that actually he needed to hire someone who could program, and that not all computer scientists could program. He disagreed, but I stood by my statement. I’ve known too many people with computer science degrees, even advanced degrees, who were ineffective software developers. Of course I’ve also known people with computer science degrees, especially advanced degrees, who were terrific software developers. The most I’ll say is that programming ability is positively correlated with computer science achievement.

The conversation turned to what it means to say someone can program. My proposed definition was someone who could write large programs that have a high probability of being correct. Joel Spolsky wrote a good book last year called Smart and Gets Things Done about recruiting great programmers. I agree with looking for someone who is “smart and gets things done,” but “writes large correct programs” may be easier to explain. The two ideas overlap a great deal.

People who are not professional programmers often don’t realize how the difficulty of writing software increases with size. Many people who wrote 100-line programs in college imagine that they could write 1,000-line programs if they worked at it 10 times longer. Or even worse, they imagine they could write 10,000-line programs if they worked 100 times longer. It doesn’t work that way. Most people who can write a 100-line program could never finish a 10,000-line program no matter how long they worked on it. They would simply drown in complexity.  One of the marks of a professional programmer is knowing how to organize software so that the complexity remains manageable as the size increases.  Even among professionals there are large differences in ability. The programmers who can effectively manage 100,000-line projects are in a different league than those who can manage 10,000-line projects.

(When I talk about a program that is so many lines long, I mean a program that needs to be about that long. It’s no achievement to write 1,000 lines of code for a problem that would be reasonable to solve in 10.)

Writing large buggy programs is hard. To say a program is buggy is to imply that it is at least of sufficient quality to approximate what it’s supposed to do much of the time. For example, you wouldn’t say that Notepad is a buggy web browser. A program has got to display web pages at least occasionally to be called a buggy browser.

Writing large correct programs is much harder. It’s even impossible, depending on what you mean by “large” and “correct.” No large program is completely bug-free, but some large programs have a very small probability of failure. The best programmers can think of a dozen ways to solve any problem, and they choose the way they believe has the best chance of being implemented correctly. Or they choose the way that is most likely to make an error obvious if it does occur. They know that software needs to be tested and they design their software to make it easier to test.

If you ask an amateur whether their program is correct, they are likely to be offended. They’ll tell you that of course it’s correct because they were careful when they wrote it. If you ask a professional the same question, they may tell you that their program probably has bugs, but then go on to tell you how they’ve tested it and what logging facilities are in place to help debug errors when they show up later.

Desktop applications, cloud computing, and hurricanes

There’s been much debate about the relative merits of desktop applications versus Internet-based applications. Both styles have their advantages, but the hybrid of both is less reliable than either separately.

Hurricane Ike knocked out our electricity for a couple days. Once the power came back on, I could use any software installed on my PC. We don’t have an Internet connection yet, but I’ve been able to check email etc. from other computers. I can use my PC, I can access the Internet, but I can’t do both at the same time. That means, for example, I can’t add podcasts to my iPod. Also, my back-up software cannot run. Applications that require local software and an Internet connection are the least reliable. I can work locally and move files around with a flash drive, or I can work “in the cloud,” but I cannot work in mid-air.

The trend lately has been toward cloud computing, entrusting your data and applications to anonymous servers somewhere out on the Internet. This can be a smart move. Companies like Amazon and Google have more sophisticated contingency plans than most consumers: redundant power and network connections, data centers in multiple geographic locations, etc. But there are always trade-offs. Think about an analogous situation with utility lines.

There has also been a trend toward underground utility lines. And recent experience shows what a good move that can be. Essentially the only places in Houston that had power immediately after the hurricane passed through were Downtown and the Galleria, two areas with underground utilities. Everyone with above-ground utilities was in the dark. But there’s more to consider. When Tropical Storm Allison came through Houston a few years ago, the underground utilities flooded and Downtown was without electricity while areas with old-fashioned power lines were OK.

Above-ground and underground power lines both have their advantages, and overall it seems the latter is better. The most vulnerable position would be to depend on both above-ground and underground utilities, analogous to depending on both a particular PC and an Internet connection.