Interpreting statistics

From Matt Briggs:

I challenge you to find me in any published statistical analysis, outside of an introductory textbook, a confidence interval given the correct interpretation. If you can find even one instance where the [frequentist] confidence interval is not interpreted as a [Bayesian] credible interval, then I will eat your hat.

Most statistical analysis is carried out by people who do not interpret their results correctly. They carry out frequentist procedures and then give the results a Bayesian interpretation. This is not simply a violation of an academic taboo. It means that people generally underestimate the uncertainty in their conclusions.
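Here’s a minimal sketch of the distinction, using made-up binomial data (7 successes in 20 trials) and assuming SciPy is available. The two intervals often nearly coincide numerically, which is exactly why the interpretations get conflated.

    from scipy import stats

    k, n = 7, 20          # made-up data: 7 successes in 20 trials
    p_hat = k / n

    # Frequentist 95% confidence interval (normal approximation).
    # The claim is about the procedure: 95% of intervals built this
    # way, over many repeated experiments, would cover the true p.
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

    # Bayesian 95% credible interval with a flat Beta(1, 1) prior.
    # The claim is about the parameter: given the data and prior,
    # the probability that p lies in this interval is 95%.
    posterior = stats.beta(k + 1, n - k + 1)
    cred = (posterior.ppf(0.025), posterior.ppf(0.975))

    print("confidence:", ci)
    print("credible:  ", cred)

Only the second interval licenses a statement like “there is a 95% chance the true proportion lies in this range,” yet that is how the first is routinely described.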

More statistical posts

Bugs, features, and risk

All software has bugs. A commonly cited estimate is that production code has about one bug per 100 lines. Of course there’s some variation in this number. Some software is a lot worse, and some is a little better.

But bugs-per-line-of-code is not very useful for assessing risk. The risk of a bug is the probability of running into it multiplied by its impact. Some lines of code are far more likely to execute than others, and some bugs are far more consequential than others.

Devoting equal effort to testing all lines of code would be wasteful. You’re not going to find all the bugs anyway, so you should concentrate on the parts of the code that are most likely to run and that would produce the greatest harm if they were wrong.
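As a toy illustration of that calculation, here is a sketch in Python; the paths, probabilities, and impact scores are invented for the example.

    # Expected harm = probability of execution * cost of failure.
    # All numbers here are made up.
    paths = [
        ("login flow",        0.90, 100),
        ("report export",     0.05,  40),
        ("admin bulk delete", 0.01, 500),
    ]

    # Spend testing effort on the riskiest paths first.
    for name, p, impact in sorted(paths, key=lambda t: -t[1] * t[2]):
        print(f"{name}: expected harm = {p * impact:.1f}")

The rarely-run bulk delete outranks the more frequently used export because its consequences are so much worse.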

However, here’s a complication. The probability of running into a bug can change over time as people use the software in new ways. For whatever reason, people want to use features that had not been exercised before. When they do so, they’re likely to uncover new bugs.

(This helps explain why everyone thinks his preferred software is more reliable than others. When you’re a typical user, you tread the well-tested paths. You also learn, often subconsciously, to avoid buggy paths. When you bring your expectations from an old piece of software to a new one, you’re more likely to uncover bugs.)

Even though usage patterns change, they don’t change arbitrarily. It’s still the case that some code is far more likely than other code to execute.

Good software developers think ahead. They solve more than they’re asked to solve. They think “I’m going to go ahead and include this other case while I’m at it in case they need it later.” They’re heroes when it turns out their guesses about future needs were correct.

But there’s a downside to this initiative. You pay for what you don’t use. Every speculative feature either has to be tested, incurring more expense up front, or delivered untested, incurring more risk. This suggests it’s better to disable unused features.

You cannot avoid speculation entirely. Writing maintainable software requires speculating well, anticipating and preparing for change. Good software developers place good bets, and these tend to be small bets, going to a little extra effort to make software much more flexible. As with bugs, you have to consider probabilities and consequences: how likely is this part of the software to change, and how much effort will it take to prepare for that change?

Developers learn from experience what aspects of software are likely to change and they prepare for that change. But then they get angry at a rookie who wastes a lot of time developing some unnecessary feature. They may not realize that the rookie is doing the same thing they are, but with a less informed idea of what’s likely to be needed in the future.

Disputes between developers often involve hidden assumptions about probabilities. Whether some aspect of the software is responsible preparation for maintenance or wasteful gold plating depends on your idea of what’s likely to happen in the future.

Related: Why programmers write unneeded code

Imploding my old office building

I used to have an office in this building that was imploded on Sunday.

Update: Video taken down. Sorry.

You can hear someone on the video say “Are we looking at the right building?” just before the building starts to collapse.

More on the implosion from the Houston Chronicle.

Customizing conventional wisdom

From Solitude and Leadership by William Deresiewicz:

I find for myself that my first thought is never my best thought. My first thought is always someone else’s; it’s always what I’ve already heard about the subject, always the conventional wisdom. It’s only by concentrating, sticking to the question, being patient, letting all the parts of my mind come into play, that I arrive at an original idea. By giving my brain a chance to make associations, draw connections, take me by surprise. And often even that idea doesn’t turn out to be very good. I need time to think about it, too, to make mistakes and recognize them, to make false starts and correct them, to outlast my impulses, to defeat my desire to declare the job done and move on to the next thing.

Conventional wisdom summarizes the experience of many people. As a result, it’s often a good starting point. But like a blurred photo, it has gone through a sort of averaging process, losing resolution along the way. It takes hard work to decide how, or even whether, conventional wisdom applies to your particular circumstances.

Bureaucracies are infuriating because they cannot deliberate on particulars the way Deresiewicz recommends. In order to scale up, they develop procedures that work well under common scenarios.

The context of Deresiewicz’s advice is a speech he gave at West Point. His audience will spend their careers in one of the largest and most bureaucratic organizations in the world. Deresiewicz is aware of this irony and gives advice for how to be a deep thinker while working within a bureaucracy.

Related posts

Holographic code

In a hologram, information about each small area of the image is scattered throughout the hologram. You can’t say this little area of the hologram corresponds to this little area of the image. At least that’s what I’ve heard; I don’t really know how holograms work.

I thought about holograms the other day when someone was describing some source code with deeply nested templates. He told me “You can’t just read it. You can only step through the code with a debugger.” I’ve run into similar code. The execution sequence of the code at run time is almost unrelated to the sequence of lines in the source code. The run time behavior is scattered through the source code like image information in a hologram.

Holographic code is an advanced anti-pattern. It’s more likely to result from good practice taken to an extreme than from bad practice.

Somewhere along the way, programmers learn the “DRY” principle: Don’t Repeat Yourself. This is good advice, within reason. But if you wring every bit of redundancy out of your code, you end up with something like Huffman encoded source. In fact, DRY is very much a compression algorithm. In moderation, it makes code easier to maintain. But carried too far, it makes reading your code like reading a zip file. Sometimes a little redundancy makes code much easier to read and maintain.

Code is like wine: a little dryness is good, but too much is bitter or sour.
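To make this concrete, here is a contrived Python sketch; the function names and keys are invented for the example. The registry version wrings out the repetition but scatters the behavior; the plain version repeats a little structure and reads straight through.

    # The behavior of handle() is spread across a registry, a
    # decorator, and string keys; you can't read it top to bottom.
    HANDLERS = {}

    def register(key):
        def wrap(f):
            HANDLERS[key] = f
            return f
        return wrap

    @register("new")
    def create(item):
        return f"created {item}"

    @register("del")
    def delete(item):
        return f"deleted {item}"

    def handle(action, item):
        return HANDLERS[action](item)   # which code runs? grep to find out

    # The redundant version repeats a little structure, but a reader
    # can follow it without a debugger.
    def handle_plain(action, item):
        if action == "new":
            return create(item)
        if action == "del":
            return delete(item)
        raise ValueError(action)

A two-entry registry is harmless; the trouble comes when layers of this indirection compound and the execution path exists only at run time.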

Note that functional-style code can be holographic just like conventional code. A pure function is self-contained in the sense that everything the function needs to know comes in as arguments, i.e. there is no dependence on external state. But that doesn’t mean that everything the programmer needs to know is in one contiguous chunk of code. If you have to jump all over your code base to understand what’s going on anywhere, you have holographic code, regardless of what style it was written in. However, I imagine functional programs would usually be less holographic.

Related post: Baklava code

Pax Romana

From A History of the English Speaking Peoples by Winston Churchill:

In our own fevered, changing, and precarious age, where all is in flux and nothing is accepted, we must survey with respect a period when, with only three hundred thousand soldiers, widespread peace in the entire known world was maintained from generation to generation, and when the first pristine impulse of Christianity lifted men’s souls to the contemplation of new and larger harmonies beyond the ordered world around them.

Variable-length patents

Alex Tabarrok brings up an interesting question: Why should all patents have the same length?

Pharmaceuticals are really the classic case of where the [ratio of] innovation-to-imitation costs are extraordinarily high. It costs about a billion dollars to create a new pharmaceutical. The first pill costs a billion dollars; the second pill costs 50 cents. So, that’s a classic case where imitation costs really are low. That’s the best case for patents, in a field like that.

But my question is: Why does every innovation deserve or require the same 20-year patent? Why do we have a system which gives a one billion dollar pharmaceutical—where there’s $1 billion in research and development costs—we give that a 20-year patent and one-click shopping gets the same 20-year patent? That makes no sense whatsoever.

So, what I suggest is a more flexible system. I’d like to have a 20-year patent, maybe a 15-year patent, maybe a 3-year patent. Something like that. And then we could say: You want to apply for a 3-year patent? We are going to get this through the system quickly; we won’t look at it so much. … You want a 20-year patent, though, you’d better show us that you really are deserving and put some costs in there.

Source: EconTalk

I don’t like software patents, though I don’t see them going away. But it might be possible to pass legislation to reduce the length of software patents.

See also this post about the tragedy of the anti-commons. The tragedy of the commons is the overuse of a resource nobody owns. The tragedy of the anti-commons is the under-use of a resource that too many people own.

Building a DVD player requires using hundreds of patented inventions. No company could ever build a DVD player if it had to negotiate with all patent holders and obtain their unanimous consent. … Fortunately, the owners of the patents used in building DVD players have formed a single entity authorized to negotiate on their behalf. But if you’re creating something new that does not have an organized group of patent holders, there are real problems.

Stigler’s law and Avogadro’s number

Stigler’s law says that no scientific discovery is named after its original discoverer. Stigler attributed his law to Robert Merton, acknowledging that Stigler’s law obeys Stigler’s law.

Avogadro’s number may be an example of Stigler’s law, depending on your perspective. An episode of Engines of Our Ingenuity on Josef Loschmidt explains.

The Italian, Romano Amedeo Carlo Avogadro, had suggested [in 1811] that all gases have the same number of molecules in a given volume. Loschmidt figured out [in 1865] how many molecules that would be.

You could argue that Avogadro’s constant should be named after Loschmidt, and some use the symbol L for the constant in honor of Loschmidt. Jean Perrin came up with more accurate estimates and proposed in 1909 that the constant should be named after Avogadro. Loschmidt made several important contributions to science that are now known by others’ names.

As I’d mentioned in an earlier post, there are some fun coincidences with Avogadro’s number (checked numerically below).

  1. NA is approximately 24! (i.e., 24 factorial).
  2. The mass of the earth is approximately 10 NA kilograms.
  3. The number of stars in the observable universe is roughly 0.5 NA.
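A quick check in Python shows how close these coincidences are. The Earth-mass and star-count figures are rough published estimates, used here only for comparison.

    import math

    N_A = 6.02214076e23   # Avogadro's number, exact by definition since 2019

    print(math.factorial(24) / N_A)   # ~1.03: 24! is within about 3% of N_A
    print(5.97e24 / (10 * N_A))       # Earth's mass vs 10 N_A: ~0.99
    print(3e23 / (0.5 * N_A))         # rough star count vs 0.5 N_A: ~1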

double.Epsilon != DBL_EPSILON

Here’s a pitfall in C# that keeps coming up. C# has a constant double.Epsilon that programmers coming from C naturally assume is the same as C’s DBL_EPSILON. It’s not. In fact, the former is hundreds of orders of magnitude smaller.

C#’s double.Epsilon is the closest floating point number to 0. C’s DBL_EPSILON is the distance between 1 and the closest floating point number greater than 1. Loosely speaking, DBL_EPSILON is the smallest positive floating point number x such that 1 + x != 1, often called “machine epsilon.” (Strictly, because of rounding, some values a bit smaller than DBL_EPSILON also satisfy 1 + x != 1, which is why the distance definition is the precise one.)

Typically double.Epsilon is on the order of 10^-324 and DBL_EPSILON is on the order of 10^-16. (These values could potentially change depending on the platform, but they hardly ever do.)
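Python exposes analogous values, which makes the difference easy to demonstrate (math.ulp requires Python 3.9 or later):

    import math
    import sys

    # Analog of C's DBL_EPSILON: the gap between 1.0 and the next double.
    eps = sys.float_info.epsilon
    print(eps)                    # 2.220446049250313e-16
    print(1.0 + eps != 1.0)       # True
    print(1.0 + eps / 2 == 1.0)   # True: half the gap rounds away

    # Analog of C#'s double.Epsilon: the smallest positive (subnormal) double.
    tiny = math.ulp(0.0)          # 5e-324
    print(tiny / 2 == 0.0)        # True: nothing between it and zero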

C# has no constant corresponding to DBL_EPSILON. This is unfortunate, since this constant appears frequently in numerical software. Why? Because it tells you, for example, when to stop adding terms of a series.

If DBL_EPSILON is on the order of 10^-16, that means that if you add two numbers that differ by more than 16 orders of magnitude, the sum doesn’t change. If you’re summing a decreasing series of numbers, say in order to evaluate a Taylor approximation, you might as well stop once the next term is 16 orders of magnitude smaller than the sum. If you keep going past that point, you’ll burn CPU cycles but you won’t change your answer.
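For example, here is a sketch of summing the Taylor series for exp(x), using Python’s counterpart of DBL_EPSILON as the stopping test:

    import sys

    def exp_taylor(x):
        """Sum the Taylor series for exp(x), stopping when the next
        term is too small to change the running total."""
        eps = sys.float_info.epsilon   # counterpart of C's DBL_EPSILON
        total = term = 1.0
        n = 1
        while abs(term) > eps * abs(total):
            term *= x / n              # next term: x^n / n!
            total += term
            n += 1
        return total

    print(exp_taylor(1.0))             # 2.718281828459045

(For large negative x the terms alternate in sign and cancel badly, so real libraries compute exp differently; the point here is only the stopping criterion.)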

DBL_EPSILON is almost always about 10^-16. But by giving it a name, you avoid having 10^-16 as a mysterious constant throughout code. And if your code should ever move to an environment with different floating point resolution, your code will correctly adjust to the new platform.

Related links

Just what do you mean by ‘scale’?

“Fancy algorithms are slow when n is small, and n is usually small.” — Rob Pike

Someone might object that Rob Pike’s observation is irrelevant. Everything is fast when the problem size n is small, so design your code to be efficient for large n and don’t worry about small n. But it’s not that simple.

Suppose you have two sorting algorithms, Simple Sort and Fancy Sort. Simple Sort is more efficient for lists with fewer than 50 elements and Fancy Sort is more efficient for lists with more than 50 elements.

You could say that Fancy Sort scales better. What if n is a billion? Fancy Sort could be a lot faster.

But there’s another way a problem could scale. Instead of sorting longer lists, you could sort more lists. What if you have a billion lists of size 40 to sort?

People toss around the term “scaling,” assuming everyone has the same notion of scaling. But projects could scale along different dimensions. Whether Simple Sort or Fancy Sort scales better depends on how the problem scales.
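Here’s a rough benchmark sketch of the two dimensions, with insertion sort standing in for Simple Sort and a plain merge sort for Fancy Sort. Both are written in pure Python so the comparison is apples to apples; the actual crossover point varies by machine and implementation.

    import random
    import timeit

    def insertion_sort(a):
        # O(n^2), but very little overhead per element.
        a = list(a)
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a

    def merge_sort(a):
        # O(n log n), but pays for recursion and merging.
        if len(a) <= 1:
            return list(a)
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    one_long = [random.random() for _ in range(5_000)]
    many_short = [[random.random() for _ in range(40)] for _ in range(5_000)]

    # Dimension 1: one long list. The O(n log n) algorithm wins easily.
    print(timeit.timeit(lambda: merge_sort(one_long), number=1))
    print(timeit.timeit(lambda: insertion_sort(one_long), number=1))

    # Dimension 2: many short lists. Low overhead may win instead.
    print(timeit.timeit(lambda: [insertion_sort(s) for s in many_short], number=1))
    print(timeit.timeit(lambda: [merge_sort(s) for s in many_short], number=1))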

The sorting example just has two dimensions: the length of each list and the number of lists. Software trade-offs are often much more complex. The more dimensions a problem has, the more opportunities there are for competing solutions to each claim that it scales better.

More posts on scaling