Broken windows theory and programming

The broken windows theory says that cracking down on petty crime reduces more serious crime. The name comes from the explanation that if a building has a few broken windows, it invites vandals to break more windows and eventually burn down the building. Turned around, this suggests that punishing vandalism could lead to a reduction in violent crime. Rudy Giuliani is perhaps the most visible proponent of the theory. His first initiative as mayor of New York was to go after turnstile jumpers and squeegeemen as a way of reducing crime in the city. Crime rates dropped dramatically during his tenure.

In the book Pragmatic Thinking and Learning, Andy Hunt applies the broken windows theory to software development.

Known problems (such as bugs in code, bad process in an organization, poor interfaces, or lame management) that are uncorrected have a debilitating, viral effect that ends up causing even more damage.

I’ll add a couple of my pet peeves to Andy Hunt’s list.

The first is compiler warnings. I can’t understand why some programmers are totally comfortable with their code having dozens of compiler warnings. They’ll say “Oh yeah, I know about that. It’s not a problem.” But then when a warning shows up that is trying to tell them something important, the message gets lost in the noise. My advice: Just fix the code. In very exceptional situations, explicitly turn off the warning.

The second is similar. Many programmers blithely ignore run-time exceptions that are written to an event log. As with compiler warnings, they insist that these exceptions are not really a problem. My advice: If it’s not really a problem, then don’t log it. Otherwise, fix it.

Are men better than women at chess?

The most recent 60-Second Science podcast discusses the abilities of men and women in playing chess. One can argue that men are better than women at playing chess because all world champions have been men. However, that only suggests that the best men are better than the best women. It is possible that the distribution of chess ability is identical for men and women. Since more men than women play chess, the best men are the best of a larger population.

I looked at this exact issue in an earlier post on Olympic performance. That post asks what to expect if men and women had equal ability in a sport that more men chose to compete in. The same considerations apply to country sizes. If two countries have equal ability at a sport, the larger country is likely to field a better team. The best performers from a larger group are typically better than the best performers from a smaller group. This post looks at how to quantify this observation using order statistics.

The podcast mentioned above says that the difference in male and female championship performance “can be almost entirely explained by statistics.” I assume this means that an order statistic model with identical distributions fits the data well.
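To make the order statistic argument concrete, here is a minimal simulation sketch. It assumes ability follows a standard normal distribution in both groups, and the group sizes of 100 and 1000, the number of trials, and the seed are arbitrary choices for illustration, not figures from the podcast or the earlier post.

```python
import random

def mean_best(group_size, trials, rng):
    """Average, over many trials, of the largest draw in a standard normal sample."""
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(group_size))
    return total / trials

rng = random.Random(2008)  # fixed seed so the run is repeatable

# Both groups draw from the same distribution; only the group size differs.
small_group = mean_best(100, 500, rng)
large_group = mean_best(1000, 500, rng)

print(small_group, large_group)
```

The best member of the larger group comes out ahead on average even though the two groups have identical ability distributions.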

Top 10 posts of 2008

This blog started in January 2008, so the best posts of the year are also the best posts of all time!

Here’s a list of a couple of the most popular posts on this site in each of five categories.

Business and management

Medieval project management
Peter Drucker and abandoning projects


Getting to the bottom of things
Simple legacy

Software development

Experienced programmers and lines of code
Programmers aren’t reading programming books


Jenga mathematics
How to compute binomial coefficients


Wine, Beer, and Statistics
Why microarray study conclusions are so often wrong

Early evidence-based medicine

In the 1840’s, Ignaz Semmelweis, an assistant professor in the maternity ward of Vienna General Hospital, demonstrated that mortality rates dropped from 12 percent to 2 percent when doctors washed their hands between seeing patients.

His colleagues resisted his findings for a couple of reasons. First, they didn’t want to wash their hands so often. Second, Semmelweis had demonstrated association but did not give an underlying cause. (This was a few years before, and led to, the discovery of the germ theory.) He was fired, had a nervous breakdown, and died in a mental hospital at age 47. (Reference: Super Crunchers)

We know now Semmelweis was right and his colleagues wrong. It’s tempting to think that people in the 1840’s were either ignorant or lazy and that we’re different now. But human nature hasn’t changed. If someone asked you to do something you didn’t want to do and couldn’t explain exactly why you should do it, would you listen? You would naturally be skeptical, and it’s a good thing, since most published research results are false.

One thing that has changed since 1840 is the level of sophistication in interpreting data.  Semmelweis could argue today that his results warrant consideration despite the lack of a causal explanation, based on the strength of his data. Such an argument could be evaluated more readily now that we have widely accepted ways of measuring the strength of evidence. On the other hand, even the best statistical evidence does not necessarily cause people to change their behavior.

This New York Times editorial is a typical apologetic for evidence-based medicine. Let’s base medical decisions on evidence! But of course medicine does base decisions on evidence. The question is how medicine should use evidence, and this question is far more complex than it first appears.

Related: Adaptive clinical trial design

My favorite Christmas carol

A few years ago I noticed the words to Hark, the Herald Angels Sing as if I’d never heard the song before. Since then I’ve decided that it is my favorite carol because of its rich language and deep theology. Here are the words from the second verse that jumped out at me the first time I really listened to the carol.

Veiled in flesh the Godhead see,
Hail the incarnate Deity!
Pleased as man with man to dwell,
Jesus, our Emmanuel.

I often prefer the second and third verses of famous hymns. They may be no better than first verses, but they are less familiar and more likely to grab my attention.

Merry Christmas everyone.

Small advantages show up in the extremes

I’ve been reading Malcolm Gladwell’s book Outliers: The Story of Success. One of the examples he gives early in his book studies the best Canadian hockey players. A disproportionate number of the top players were born in the first quarter of the year.

The eligibility cutoff for age-class hockey league assignments is January 1. Those with birthdays early in the year will be older when they are first eligible to play for a given age group. On average, these children will be larger and more skilled than those born later in the year. Being a few months older is initially an advantage, but it would seem that it should wear off over time. It doesn’t. Those who had an age advantage, coupled with talent, developed a little more confidence and received a little more attention than those who did not. The advantages of extra confidence and attention carried on after the direct advantage of age disappeared.

I wrote a post a while back that looks at this sort of situation in some mathematical detail. Suppose the abilities of two groups are normally distributed with the same variance but the mean of one group is shifted just slightly. (The post I referred to looks at male and female Olympic athletes, but we could as easily think about Canadian hockey players born in December and January.) The further you go out in the extremes, the more significant that shift becomes.

For another example, think of how heights are distributed. Men are taller than women on average, but it’s not unheard of for the tallest person in a small group to be a woman. However, as the group gets larger, the odds that the tallest person in the group is male increase exponentially. As it turns out, average heights of men and women differ by about six inches. But even if average heights differed by the slightest amount, the odds in favor of the tallest person in a group being male would still increase exponentially as the group size increases.
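Here is a small sketch of how a slight shift in the mean matters more in the extremes. It assumes two normal distributions with standard deviation 1 whose means differ by 0.1, a hypothetical value chosen only for illustration, and compares the chances of each group exceeding a threshold as the threshold moves further out.

```python
from math import erfc, sqrt

def upper_tail(x):
    """P(Z > x) for a standard normal random variable Z."""
    return 0.5 * erfc(x / sqrt(2.0))

def tail_ratio(threshold, shift):
    """P(X > threshold) / P(Y > threshold) for X ~ N(shift, 1) and Y ~ N(0, 1)."""
    return upper_tail(threshold - shift) / upper_tail(threshold)

shift = 0.1  # hypothetical small difference in means, in standard deviations

# Thresholds measured in standard deviations above the unshifted mean.
ratios = [tail_ratio(t, shift) for t in (0, 1, 2, 3, 4)]
print(ratios)
```

The ratio starts only slightly above 1 and keeps growing as the threshold increases, so the shifted group is increasingly over-represented the further out you look.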

Debasing the word "technology"

It bugs me to hear people say “technology” when they really mean “computer technology”, as if drug design, for example, isn’t technology. But now I’ve noticed some folks are even more narrow in their use of the term. They use “technology” to mean essentially blogs and podcasts.

So if you design satellites, program supercomputers, or clone sheep, but don’t read blogs and listen to podcasts, you’re just out of it. Someone should tell the Rice University nano technology group that they should change their name since they’re not really into technology. Unless of course they blog or podcast about their work.

Negative space in operating systems

Unix advocates often say Unix is great because it has all these powerful tools. And yet practically every Unix tool has been ported to Windows. So why not just run Unix tools on Windows so that you have access to both tool sets? Sounds reasonable, but hardly anyone does that. People either use Unix tools on Unix or Windows tools on Windows.

Part of the reason is compatibility. Not binary compatibility, but cultural compatibility. There’s a mental tax for shifting modes of thinking as you switch tools.

I think the reason why few people use Unix tools on Windows is a sort of negative space. Artists use the term negative space to discuss the importance of what is not in a work of art, such as the white space around a figure or the silence framing a melody.

Similarly, part of what makes an operating system culture is what is not there. You don’t have to worry about what’s not there. And not worrying about something frees up brain capacity to think about something else. Having too many options can be paralyzing. I think that even though people say they like Unix for what is there, they actually value what is not there.

* * *

For daily tips on using Unix, follow @UnixToolTip on Twitter.


Partial function application in Python

My previous post showed how it is possible to do partial function application in C++ by using function objects. Here I’ll show how much simpler this can be done in Python.

As before, we want to evaluate a function of one variable and three parameters: f(x; a, b, c) = 1000a + 100b + 10c + x. Here’s an example of how we could do this in Python using the functools module.

from functools import partial

def f(a, b, c, x):
    return 1000*a + 100*b + 10*c + x

g = partial(f, 3, 1, 4)

print(g(5))

The code will print 3145.

The function f above is so simple that it may be hard to imagine why you would want to do such a thing. The earlier post gives a more realistic application, using partial function application in numerical integration and root-finding.
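As a rough illustration of that kind of use, here is a sketch that fixes the parameters of a function with partial and hands the result to a root finder. The names quadratic and bisect are hypothetical helpers written for this example, not code from the earlier post. Because x comes first in quadratic, the parameters are bound by keyword rather than by position.

```python
from functools import partial

def quadratic(x, a, b, c):
    """A function of x with parameters a, b, and c."""
    return a*x*x + b*x + c

def bisect(func, lo, hi, tol=1e-12):
    """Hypothetical helper: find a root of func in [lo, hi] by bisection.

    Assumes func(lo) and func(hi) have opposite signs.
    """
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            # Same sign as the left end, so the root lies to the right of mid.
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Bind the parameters; g is now a function of x alone.
g = partial(quadratic, a=1, b=0, c=-2)

root = bisect(g, 0.0, 2.0)
print(root)
```

The root finder only needs to know about functions of one variable. It never sees a, b, or c.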

Functional programming in C++ with function objects

Here’s something I do all the time. I have a function of one variable and several parameters. I implement it as a function object in C++ so I can pass it on to code that does something with functions of one variable, such as integration or optimization. I’ll give a trivial example and then show the most recent real problem I’ve worked on.

Say I have a function f(x; a, b, c) = 1000a + 100b + 10c + x. In a sense this is simply a function of four variables. But the connotation of using a semicolon rather than a comma after the x is that I think of x as being a variable and I think of a, b, and c as parameters. So f is a function of one variable that depends on three constants. (A “parameter” is a “constant” that can change!)

I create a C++ function object with two methods. One method is a constructor that takes the function parameters as arguments and saves them to member variables. The other method is an overload of the parenthesis method. That’s what makes the class a function object. By overloading the parenthesis method, I can call an instance of the class as if it were a function. Here’s some code.

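class FunctionObject
{
public:
	// The constructor saves the function parameters to member variables.
	FunctionObject(double a, double b, double c)
	{
		m_a = a;
		m_b = b;
		m_c = c;
	}

	// Overloading operator() lets an instance be called like a function.
	double operator()(double x) const
	{
		return 1000*m_a + 100*m_b + 10*m_c + x;
	}

private:
	double m_a;
	double m_b;
	double m_c;
};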

So maybe I instantiate an instance of this function object and pass it to a function that finds the maximum value over an interval [a, b]. The code might look like this.

FunctionObject f(3, 1, 4);
double maximum = Maximize(f, a, b);

Here’s a more realistic example. A few days ago I needed to solve this problem. Given user input parameters λ, σ, n, and ξ, find b such that the following holds.

\int_0^1 \frac{1}{\sqrt{2}\nu} \Phi\left(\frac{\lambda \sqrt{2\nu n}}{\sqrt{\sigma^2(1 - 2\nu) + bn}}\right) \, d\nu = \xi

The function Φ above is the CDF of a standard normal random variable, defined here.

To solve this problem, I wrote a function object to evaluate the left side of the equation above. It takes λ, σ, and n as constructor arguments and takes b as an argument to operator(). Then I passed the function object to a root-finding method to solve for the value of b that makes the function value equal ξ. But my function is defined in terms of an integral, so I needed to write another function object first that returns the integrand. Then I passed that function object to this numerical integration routine. So I had to write two function objects to solve this problem.

There are several advantages to function objects over functions. For example, I would typically do parameter validation in the constructor. Quite often I also do some expensive calculations in the constructor and cache the results so that each call to operator() is then more efficient. Maybe I want to keep track of how often the function is called, so I put in some sort of odometer method that increments a counter with each call.

Unfortunately there’s a fair amount of code to write in order to implement even the simplest function. This effort hardly matters in production code; so many other things take more time. But it is annoying when doing some quick exploration. The next post shows how this can be done much more easily in Python. The Python approach would be much easier for small problems, but it doesn’t have the advantages mentioned above such as caching expensive calculations in a constructor.
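For comparison, the advantages listed above can be sketched in Python as well, at the cost of writing a class again. This is not the partial-based approach from the next post. It is a rough Python analog of the C++ function object, using a class with a __call__ method. The validation rule and the calls attribute are hypothetical details added for illustration.

```python
from math import isfinite

class FunctionObject:
    """Python analog of the C++ function object for f(x; a, b, c)."""

    def __init__(self, a, b, c):
        # Validate parameters once, up front (this particular rule is hypothetical).
        if not all(isfinite(p) for p in (a, b, c)):
            raise ValueError("parameters must be finite numbers")
        # Cache the part of the result that does not depend on x.
        self._offset = 1000*a + 100*b + 10*c
        # Odometer: counts how many times the function has been evaluated.
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self._offset + x

f = FunctionObject(3, 1, 4)
results = [f(x) for x in (5, 6)]
print(results, f.calls)
```

Here the cached value stands in for an expensive calculation. In this toy case it saves almost nothing, but the pattern is the same.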

The probability that Shakespeare wrote a play

Some people object to asking about the probability that Shakespeare wrote this or that play. One objection is that someone has already written the play, either Shakespeare or someone else. If Shakespeare wrote it, then the probability is one that he did. Otherwise the probability is zero. By this reasoning, no one can make probability statements about anything that has already happened. Another objection is that probability only applies to random processes. We cannot apply probability to questions about document authorship because documents are not random.

I just ran across a blog post by Ted Dunning that weighs in on this question. He writes

The statement “It cannot be probability …” is essentially a tautology. It should read, “We cannot use the word probability to describe our state of knowledge because we have implicitly accepted the assumption that probability cannot be used to describe our state of knowledge”.

He goes on to explain that if we think about statements of knowledge in terms of probabilities, we get a consistent system, so we might as well reason as if it’s OK to use probability theory.

The uncertainty about the authorship of the play does not exist in history — an omniscient historian would know who wrote it. Nor does it exist in nature — the play was not created by a random process. The uncertainty is in our heads. We don’t know who wrote it. But if we use numbers to represent our uncertainty, and we agree to certain common-sense axioms about how this should be done, we inevitably get probability theory.

As E. T. Jaynes once put it, “probabilities do not describe reality — only our information about reality.”
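As a toy illustration of using numbers to represent a state of knowledge, here is a sketch of a single Bayes' rule update. All of the numbers are made up for illustration. They are not estimates about any actual play, and the function name posterior is hypothetical.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: updated probability of a hypothesis after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_false)

# Hypothetical numbers: start undecided, then observe a stylistic feature
# assumed to be twice as likely if the hypothesis is true as if it is false.
belief = 0.5
belief = posterior(belief, 0.8, 0.4)
print(belief)
```

Nothing about the play changes during this calculation. Only the number describing what we know about it changes.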

* * *

For daily posts on probability, follow @ProbFact on Twitter.
