John Tukey and Aristotle

I just ran across a quote from Aristotle that seemed right in line with the quotes from John Tukey I posted the other day.

It is the mark of an educated man to look for precision in each class of things just so far as the nature of the subject admits.

I think Tukey and Aristotle may have gotten along well.

I believe Tukey said “There is no point in being precise when you don’t know what you’re talking about.” I’m going from memory, and that quote may not be verbatim. (I did a Google search on “john tukey quotes” and came up with maybe 20 pages that have the exact same three quotes from Tukey. I can’t imagine that 20 independent editors came up with the same three quotes. It’s not as if the man only said three memorable lines. I imagine there’s a great deal of copying going on.)

Here are a couple quotes from Tukey that Aristotle may have appreciated.

Finding the question is often more important than finding the answer.

The test of a good procedure is how well it works, not how well it is understood.

I have mixed feelings about the second quote. Sometimes you do have to use things that work well even if you don’t understand why. For example, no one completely understands how anesthesia works. But Tukey was speaking in the context of statistical methods, and there I do see some virtue in using what you understand well even when something you don’t understand appears to work better. Maybe the poorly understood technique only appears to do better on a handful of examples and could fail on your data. But I believe Tukey was referring to techniques that many people have used successfully on a wide variety of problems even though the theoretical foundations haven’t been completely explored.

Getting started with IronPython

I’ve just started experimenting with IronPython, Microsoft’s implementation of Python built on .NET. You can download IronPython from here. I installed it from the zip file on one computer and from the MSI on another. I highly recommend the latter.

Installing from the zip file

The CodePlex download page has three files:

  • IronPython.msi
  • IronPython-2.0.1-Bin.zip
  • IronPython-2.0.1-Src.zip

My first thought was that I wasn’t interested in compiling IronPython from source, so I’d just download the bin file since it was smaller. I downloaded it, unzipped it, and copied it over to my C:\bin directory. (I have a habit of installing languages in C:\bin to placate software that assumes paths don’t contain spaces. For example, if you install R in the default C:\Program Files location, some add-ons will break.) The typical command line “hello world” program worked just fine. The example from the readme file on how to pop up a window using WinForms worked fine as well. But my attempt to use a standard library by typing import urllib didn’t work. The standard Python modules are not in the search path by default. The tutorial that comes with IronPython explains how to fix this. I added the following two lines to C:\bin\IronPython-2.0.1\Lib\site.py and then was able to use standard modules like urllib.

import sys
sys.path.append(r"C:\bin\Python25\Lib")

I had a non-ferrous version of Python installed already in C:\bin\Python25 so I just reused those files. The tutorial explains where to get the standard library files if IronPython is the first Python you install.
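
If you’d rather not append a path that might not exist on a given machine, here is a slightly more defensive sketch of the same idea. The helper name add_library_path is my own invention for illustration, not something from the IronPython tutorial, and the directory is just the one from my setup.

```python
import os
import sys

def add_library_path(lib_dir, search_path=None):
    """Append lib_dir to the module search path if the directory exists
    and is not already listed. Returns True if the path was added."""
    if search_path is None:
        search_path = sys.path
    if os.path.isdir(lib_dir) and lib_dir not in search_path:
        search_path.append(lib_dir)
        return True
    return False

# Directory from my own setup; adjust to wherever your standard library lives.
add_library_path(r"C:\bin\Python25\Lib")
```

The function returns False and leaves the path alone when the directory is missing, so a stale entry never ends up in sys.path.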

Installing from the MSI file

On a different computer, I downloaded the MSI file and ran it. This was a much nicer experience. The installer has a check box to run NGen on the .NET code in IronPython. I checked this box assuming it would make IronPython run faster in the future.

The standard modules worked immediately with no configuration on my part. The installer created a sophisticated site.py file that builds the path on start-up. Presumably this site.py file will add new modules to my path as I install things in the future.

Recap of the Robert Martin/Joel Spolsky brouhaha

Here’s a timeline of the controversy between Robert Martin and Joel Spolsky.

  1. Robert “Uncle Bob” Martin speaks about SOLID design principles on Scott Hanselman’s podcast #145.
  2. Joel Spolsky and Jeff Atwood say some negative things about Uncle Bob and his SOLID principles on their StackOverflow podcast #38 (transcript).
  3. Uncle Bob reacts on his blog.
  4. Joel and Jeff have Uncle Bob as a guest on their podcast #41 (transcript) and Joel apologizes for getting too personal.
  5. Scott Hanselman has Uncle Bob back on his podcast #150.

The discussion got a little heated in the middle, but it all seems to have settled down.

In Scott’s latest podcast, Uncle Bob reviews the whole controversy briefly but spends most of his time talking about professionalism and how to train software developers.

The data may not contain the answer

Mark Reid sent me a link to a couple quotes by John Tukey that I had not seen before. First,

To statisticians, hubris should mean the kind of pride that fosters an inflated idea of one’s powers and thereby keeps one from being more than marginally helpful to others. … The feeling of “Give me (or more likely even, give my assistant) the data, and I will tell you what the real answer is!” is one we must all fight against again and again, and yet again.

Also,

The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

Largest and second largest cities

Mark Dominus has posted an interesting article that looks at the populations of the largest and second largest cities by state. The largest city in Illinois (Chicago) is about 25 times as large as the next largest city in the state (Peoria). Toward the other end of the scale, the Dallas-Ft. Worth metropolitan area is only about 10% larger than the Houston metropolitan area where I live.

Warning: If you’re from Rhode Island, don’t read Mark’s article unless you have a good sense of humor.

Gerald Weinberg’s law of twins

In his book Secrets of Consulting, Gerald Weinberg tells the story of a woman who had several pairs of twins. Someone asked her if she and her husband got twins every time. She replied no, most of the time they got nothing at all. Just as intimacy doesn’t usually result in one child, much less two, most efforts in business don’t produce any significant results. Weinberg summarizes this observation in Weinberg’s Law of Twins:

Most of the time, for most of the world, no matter how hard people work at it, nothing of any significance happens.

Later he turns this around and states the principle more positively in Weinberg’s Law of Twins, Inverted:

Some of the time, in some places, significant change happens — especially when people aren’t working hard at it.

Related post: Four reasons we don’t apply the 80/20 rule.

Ignorance doesn’t change reality: statistics pitfall

Here’s an easy error to fall into in statistics. Suppose I have n samples from a normal(μ, σ²) distribution, say n = 16, and σ is unknown. What is the distribution of the average of the samples? A common mistake is to say Student-t: if σ is known, the sample mean has a normal distribution, otherwise it has a t distribution.

But that’s wrong. Your ignorance of σ does not change the distribution of the data. There’s no spooky quantum effect that changes the data based on your knowledge. A linear combination of independent normal random variables is another normal random variable, so the sample mean has a normal distribution, whether or not you know its variance. Your knowledge or ignorance of σ doesn’t change the distribution of the data; it changes what you’re likely to want to do regarding the data. When the variance is unknown, you use procedures involving the sample variance rather than the distribution variance. This doesn’t change the distribution of the data but it changes the distribution you construct (implicitly) in your analysis of the data.
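
Here is a minimal simulation sketch of the distinction, using only the Python standard library. The values of μ, σ, the number of trials, and the function name are arbitrary choices of mine for illustration. It draws many samples of size n = 16 and standardizes each sample mean two ways: with the true σ (which gives a standard normal statistic) and with the sample standard deviation s (which gives a t statistic with 15 degrees of freedom). The cutoff 2.131 is approximately the 97.5th percentile of that t distribution.

```python
import math
import random

def simulate_tail_rates(n=16, mu=3.0, sigma=2.0, trials=20000,
                        cutoff=2.131, seed=0):
    """Return (z_rate, t_rate): the fraction of trials in which each
    statistic exceeds the cutoff in absolute value."""
    rng = random.Random(seed)
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = math.fsum(xs) / n
        # Sample standard deviation, with n - 1 in the denominator.
        s = math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / (n - 1))
        z = (mean - mu) / (sigma / math.sqrt(n))  # standardized with the true sigma
        t = (mean - mu) / (s / math.sqrt(n))      # standardized with the estimate s
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / trials, t_hits / trials

z_rate, t_rate = simulate_tail_rates()
print(z_rate, t_rate)
```

The same sample means go into both statistics. The first rate should come out near 0.033, as expected for a normal distribution, and the second near 0.05, as expected for a t distribution. The data didn’t change; only the statistic built from it did.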