Why SQL failed

SQL (Structured Query Language) has been enormously successful as a tool for programmers. However, it completely failed at its intended purpose: a way for non-programmers to ask questions of a database.

I don’t think any other syntax would have been more successful. SQL syntax can be a little awkward, but that’s not the reason the general public doesn’t write SQL queries. The barrier to general use is that few people can form a moderately complex query precisely. Doing so requires thinking in terms of sets and logic. The difficulty is intrinsic to the problem; it’s not a problem of syntax.

I think about SQL every time I hear someone say that programmers are going to go away and that users will write their own software. And learning SQL is a pretty small task compared to learning everything else you need to know to develop software.

I also think about SQL when I hear someone talk about writing the ultimate report generator. Such a generator would be equivalent to SQL, and so it would take a programmer to use it. A good report generator is useful precisely because it is specialized to some context and therefore easier to use.

Update: I don’t want to imply that you need to be a professional programmer to understand SQL or that you need years of training. But you do need an aptitude for programming. Someone with an aptitude for programming might catch on to SQL quickly, but someone without that aptitude will need a combination of incentive, patience, and training before they’ll ever use SQL. If Google’s home page required users to enter Boolean expressions, the company would have gone bankrupt long ago.

Related post: Why there will always be programmers

Google Reader and HTML lists

Yesterday I wrote a post about how to start numbering a list in HTML at some point other than 1. Mark Reid and Thomas Guest pointed out that my example did not show up correctly in Google Reader.

Here’s how the list shows up when I browse directly to the post using Firefox 3 on Windows XP.

browser screen shot

But here’s what the same list looks like when I look at the post inside Google Reader.

Google Reader screen shot

I don’t understand the reason for the difference. In fact, I don’t understand in general why posts often look different in Google Reader. For example, the screenshots above are centered when you visit the blog directly, but left-aligned in Google Reader. Also, the space between the images and the text is removed.

Do other RSS readers similarly mangle HTML? Any suggestions on how to fix the problem?

Update: Changed the way images are centered per Thomas Guest’s suggestion.

Book review: Expert Python Programming

I often find it strange how some programming books arbitrarily classify content as either “beginner” or “advanced.” The advanced material may not be advanced at all. Maybe by “advanced” the author means “the stuff I didn’t learn right away.” However, Expert Python Programming by Tarek Ziadé is objectively an advanced book. I had heard good things about the book on the Python 411 podcast and had it on my list of books to buy when the book’s publisher gave me a copy to review.

Expert Python Programming book cover

Expert Python Programming assumes the reader has a solid knowledge of Python. In some ways, it is the Python counterpart to Scott Meyers’ Effective C++ series, giving advice about recommended practice. But Ziadé’s Python book is broader than Meyers’ C++ books because Expert Python Programming goes beyond best practices for language use. The later chapters discuss how to package and distribute Python applications, software life cycle management, documentation, test-driven development, optimization, and design patterns. I’d like to see more books follow this pattern, covering these important topics in the context of a particular language or tool set. I was particularly pleased to see 27 pages devoted to documentation.

True to its name, Expert Python Programming is not for beginners; I’d recommend Wesley Chun’s Core Python for a first book. But I’d heartily recommend Expert Python Programming as a second Python book. The book has an impressive range of practical material, including numerous links to even more resources.

Starting number for HTML lists

I recently found out how to make an HTML list start numbering somewhere other than at 1. This is handy when you have a list interrupted by some text and want to continue the original numbering without starting over. I’ve only been using HTML for 15 years. Maybe one of these days I’ll really learn it.

In the <ol> tag, add the attribute start="7", for example, to make the list start numbering at 7. The start attribute can be any integer, even a negative one.

For example, the seven dwarfs are

  1. Dopey
  2. Grumpy
  3. Doc
  4. Happy
  5. Bashful
  6. Sneezy

and last but not least

  7. Sleepy.
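
Here’s a sketch of the markup behind the example above (the exact HTML in the post may differ slightly):

    <ol>
      <li>Dopey</li>
      <li>Grumpy</li>
      <li>Doc</li>
      <li>Happy</li>
      <li>Bashful</li>
      <li>Sneezy</li>
    </ol>
    <p>and last but not least</p>
    <ol start="7">
      <li>Sleepy.</li>
    </ol>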

Update: As pointed out in the comments below, the example in this post may not render correctly in your reader. See this post for a discussion of the problem.

What is a confidence interval?

At first glance, a confidence interval seems simple. If we say [3, 4] is a 95% confidence interval for a parameter θ, then there’s a 95% chance that θ is between 3 and 4. That explanation is not correct, but it works better in practice than in theory.

If you’re a Bayesian, the explanation above is correct if you change the terminology from “confidence” interval to “credible” interval. But if you’re a frequentist, you can’t make probability statements about parameters.

Confidence intervals take some delicate explanation. I took a look at Andrew Gelman and Deborah Nolan’s book Teaching Statistics: a bag of tricks to see what they had to say about teaching confidence intervals. They begin their section on the topic by saying “Confidence intervals are complicated …” That made me feel better: folks with far more experience teaching statistics also find confidence intervals challenging to explain. And according to The Lady Tasting Tea, confidence intervals were controversial when they were first introduced.

From a frequentist perspective, confidence intervals are random and parameters are not, exactly the opposite of what everyone naturally thinks. You can’t talk about the probability that θ is in an interval because θ isn’t random. But in that case, what good is a confidence interval? As L. J. Savage once said,

The only use I know for a confidence interval is to have confidence in it.

In practice, people don’t go too wrong using the popular but technically incorrect notion of a confidence interval. Frequentist confidence intervals often approximate Bayesian credible intervals; the frequentist approach is more useful in practice than in theory.
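
Here’s one concrete case, sketched in Python: for a normal mean with known variance, the frequentist 95% confidence interval coincides exactly with the Bayesian 95% credible interval under a flat prior. The data and parameter values below are made up for illustration.

    import numpy as np
    from scipy import stats

    # Simulated data: 20 draws from a normal distribution with unknown
    # mean and, for simplicity, known standard deviation sigma = 1.
    rng = np.random.default_rng(0)
    sigma, n = 1.0, 20
    data = rng.normal(loc=3.5, scale=sigma, size=n)
    xbar = data.mean()
    se = sigma / np.sqrt(n)

    # Frequentist 95% confidence interval for the mean.
    z = stats.norm.ppf(0.975)
    print((xbar - z * se, xbar + z * se))

    # Bayesian 95% credible interval under a flat prior: the posterior for
    # the mean is normal(xbar, se), so the interval comes out the same.
    print(stats.norm.interval(0.95, loc=xbar, scale=se))

With an informative prior the two intervals would differ, though often not by much.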

It’s interesting to see a sort of détente between frequentist and Bayesian statisticians. Some frequentists say that the Bayesian interpretation of statistics is nonsense, but the methods these crazy Bayesians come up with often have good frequentist properties. And some Bayesians say that frequentist methods, such as confidence intervals, are useful because their results often approximate Bayesian results.

Four reasons we don’t apply the 80/20 rule

Why can’t we make more use of the 80/20 rule? I’ll review what the 80/20 rule is, explain how it can be powerful, then give four reasons why we don’t take advantage of it.

What is the 80/20 rule?

The 80/20 rule is amazing when you first learn about it. It says that efforts and results are often very unevenly distributed. You’ll get 80% of your results from the first 20% of your efforts. For example, maybe your top 20% of customers will provide 80% of your profit. Or when you’re debugging software, often 80% of the bugs will be in 20% of the code. Once you become aware of it, you’ll see 80/20 examples everywhere.

There’s nothing magical about the numbers 80 and 20. The general principle applies if 93% of your results come from 22% of your efforts. The numbers don’t have to add to 100. The principle is just that outcomes are unevenly distributed, more unevenly distributed than you may think.

Applying the 80/20 rule

Applications of the 80/20 rule are everywhere. For example, if you want to learn a foreign language, you don’t buy a dictionary, start learning words on page 1, and work your way to the end. Some words are used far more often than others, so you’ll be able to use the language much sooner if you learn the vocabulary roughly in descending order of frequency.
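
As a toy illustration, a frequency count is all it takes to find that ordering in Python; the one-line corpus here is obviously a stand-in for a large sample of real text.

    from collections import Counter

    # Stand-in corpus; in practice you'd use a large sample of real text.
    text = "the cat sat on the mat and the dog sat by the door"
    freq = Counter(text.lower().split())

    # Study vocabulary in descending order of frequency.
    for word, count in freq.most_common(5):
        print(word, count)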

Software optimizations can be extreme examples of the 80/20 rule. Sometimes 98% of a program’s time is spent executing just five lines of code. Finding those five lines and tuning them is far more effective than randomly tweaking things here and there in hopes that the changes improve performance.
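
A profiler is the tool for finding those lines. Here’s a minimal sketch using Python’s standard cProfile module; hot_spot is a hypothetical stand-in for the expensive part of a real program.

    import cProfile
    import pstats

    def hot_spot():
        # Stand-in for the few lines where the program spends its time.
        return sum(i * i for i in range(10**6))

    def main():
        for _ in range(10):
            hot_spot()

    cProfile.run("main()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)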

Why don’t we apply the 80/20 rule?

If the 80/20 rule is so powerful, why don’t we use it more often? Why don’t we concentrate our efforts where we’re likely to see the best results? Here are four reasons.

  1. We don’t look for 80/20 payoffs. We don’t see 80/20 rules because we don’t think to look for them.
  2. We’re not clear about criteria for success. You can’t concentrate your efforts on the 20% with the biggest returns until you’re clear on how you measure returns.
  3. We’re unclear how inputs relate to outputs. It may be hard to predict what the most productive activities will be.
  4. We enjoy less productive activities more than more productive ones. We concentrate on what’s fun rather than what’s effective.

If you address these issues in order, you might get stuck on the third one. It can be hard to know what is most productive. Our intuition in this area is usually wrong. For example, maybe the most effective thing to do is very simple, but we overlook it because we think the answer must be more complicated. Or maybe we confuse what we need to do with what we want to do. Collecting data is the best way to find out what really works. The results are often surprising.

Sometimes the world changes and we’re stuck doing what used to be most effective. For example, some of the most persistent ideas about the “right” way to develop software come from studies done forty years ago. It’s not enough to collect data one time.

More on Pareto and power laws

Bike shed arguments

bike in front of a shed

C. Northcote Parkinson observed that it is easier for a committee to approve a nuclear power plant than a bicycle shed. Nuclear power plants are complex, and no one on a committee presumes to understand every detail. Committee members must rely on the judgment of others. But everyone understands bicycle sheds. Also, questions such as what color to paint the bike shed don’t have objective answers. And so bike sheds provoke long discussions. The term bike shed argument has come to mean a lengthy, unproductive discussion over a minor issue.

In statistics, utility functions provoke bike shed arguments. Most statisticians agree that decision theory is a good idea, but it is hardly ever used in practice because applying decision theory to any specific problem invites bike shed arguments over utility functions.

Update: See Parkinson’s Law for more on Parkinson and his book that coined the term “bike shed argument.”

Finite differences

If f(x) is a function on integers, the forward difference operator is defined by

\Delta f(x) = f(x+1) - f(x)

For example, say f(x) = x^2. The forward difference of the sequence of squares 1, 4, 9, 16, … is the sequence of odd numbers 3, 5, 7, …
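
Here’s that example in a few lines of Python, just to make the definition concrete:

    def delta(f):
        # Forward difference operator: (delta f)(x) = f(x+1) - f(x)
        return lambda x: f(x + 1) - f(x)

    squares = lambda x: x**2
    print([squares(x) for x in range(1, 6)])         # [1, 4, 9, 16, 25]
    print([delta(squares)(x) for x in range(1, 5)])  # [3, 5, 7, 9]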

There are many identities for the forward difference operator that resemble analogous formulas for derivatives. For example, the forward difference operator has its own product rule, quotient rule, etc. These rules are called the calculus of finite differences. The finite results are often much easier to prove than their continuous counterparts.

The calculus of finite differences makes it possible to solve some discrete problems systematically, analogous to the way one would solve continuous problems with more familiar differential calculus. For example, there is a “summation by parts” technique for computing sums analogous to integration by parts for integrals.
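
For the record, one common form of summation by parts, which follows by summing the product rule below over x = a, …, b−1, is

\sum_{x=a}^{b-1} f(x) \Delta g(x) = f(x)g(x)\Big|_a^b - \sum_{x=a}^{b-1} g(x+1) \Delta f(x)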

The product rule for forward differences looks a little odd:

\Delta(f(x)g(x)) = g(x) \Delta f(x) + f(x+1) \Delta g(x)

The left hand side is symmetric in f and g though the right side is not. There is also a symmetric version:

\Delta(f(x)g(x)) = g(x) \Delta f(x) + f(x) \Delta g(x) + \Delta f(x) \Delta g(x)

Here is the quotient rule for forward differences.

\Delta \frac{f(x)}{g(x)} = \frac{g(x) \Delta f(x) - f(x) \Delta g(x)}{g(x)g(x+1)}
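
Both rules are easy to verify with a computer algebra system. Here’s a sketch using SymPy, treating f and g as undefined functions:

    import sympy as sp

    x = sp.symbols('x')
    f, g = sp.Function('f'), sp.Function('g')

    def delta(h):
        # Forward difference: (delta h)(x) = h(x+1) - h(x)
        return h.subs(x, x + 1) - h

    # Product rule: Delta(f g) = g Delta f + f(x+1) Delta g
    lhs = delta(f(x) * g(x))
    rhs = g(x) * delta(f(x)) + f(x + 1) * delta(g(x))
    print(sp.simplify(lhs - rhs))  # 0

    # Quotient rule: Delta(f/g) = (g Delta f - f Delta g) / (g(x) g(x+1))
    lhs = delta(f(x) / g(x))
    rhs = (g(x) * delta(f(x)) - f(x) * delta(g(x))) / (g(x) * g(x + 1))
    print(sp.simplify(lhs - rhs))  # 0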

One of the first things you learn in calculus is how to take the derivative of powers of x: the derivative of x^n is n x^{n-1}. There is an analogous formula in the calculus of finite differences, but with a different kind of power of x. For positive integers n, define the nth falling power of x by

x^{(n)} \equiv x(x-1)(x-2)\cdots(x-n+1)

Then

\Delta x^{(n)} = nx^{(n-1)}
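
For example, with n = 3, Δ[x(x−1)(x−2)] = (x+1)x(x−1) − x(x−1)(x−2) = 3x(x−1), which is 3x^{(2)}. SymPy can check the pattern for any particular n using its built-in falling factorial:

    import sympy as sp

    x, n = sp.symbols('x'), 4
    lhs = sp.ff(x + 1, n) - sp.ff(x, n)  # Delta of the falling power x^(n)
    rhs = n * sp.ff(x, n - 1)
    print(sp.simplify(sp.expand_func(lhs - rhs)))  # 0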

Falling powers can be generalized to non-integer exponents by defining

x^{(n)} \equiv \frac{\Gamma(x+1)}{\Gamma(x-n+1)}

The formula for finite difference of falling powers given above remains valid when using the more general definition of falling powers. Falling powers arise in many areas: generating functions, power series solutions to differential equations, hypergeometric functions, etc.
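
Here’s a quick numerical spot check of that claim at a non-integer exponent, using the gamma-function definition (the point values below are arbitrary):

    from math import gamma

    def falling(x, n):
        # Generalized falling power via the gamma function
        return gamma(x + 1) / gamma(x - n + 1)

    x, n = 5.3, 2.5
    print(falling(x + 1, n) - falling(x, n))  # Delta x^(n)
    print(n * falling(x, n - 1))              # n x^(n-1); should agree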

The function 2^x is its own forward difference, i.e.

\Delta 2^x = 2^x

analogous to the fact that e^x is its own derivative.

Here are a couple more identities showing a connection between the gamma function and finite differences. First,

\Delta \Gamma(x) = (x-1)\Gamma(x)

Also,

\Delta \log \Gamma(x) = \log(x)
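
Both identities follow from the recurrence Γ(x+1) = xΓ(x): subtracting Γ(x) gives the first, and taking logs of the recurrence gives the second. A numerical spot check at an arbitrary point:

    from math import gamma, lgamma, log

    x = 4.7
    print(gamma(x + 1) - gamma(x), (x - 1) * gamma(x))  # should agree
    print(lgamma(x + 1) - lgamma(x), log(x))            # should agree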

To read more about the calculus of finite differences, see Concrete Mathematics.