Michael Brecker

When I was in college, my saxophone teacher recommended I study Michael Brecker. I enjoyed his music, especially his recordings with Steps Ahead, but for some reason I quit listening to Brecker sometime after college. Then earlier this year I bought Brecker’s last album Pilgrimage after reading a glowing review.

Brecker recorded Pilgrimage as he was dying of leukemia, but there’s nothing morbid about the album. It’s upbeat, complex, and beautiful. Brecker spent his final days pursuing his art surrounded by friends.

Unit test boundaries

Phil Haack has a great article on unit test boundaries. A unit test must not touch the file system, interact with a database, or communicate across a network. Tests that break these rules are necessary, but they’re not unit tests. With some hard thought, the code with external interactions can be isolated and reduced. This applies to both production and test code.

As with most practices related to test-driven development, the primary benefit of unit test boundaries is the improvement in the design of the code being tested. If your unit test boundaries are hard to enforce, your production code may have architectural boundary problems. Refactoring the production code to make it easier to test will make the code better.
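The idea of isolating external interactions can be sketched with a small hypothetical example (the function names here are illustrative, not from Haack’s article): keep the logic pure, and push the file-system touch into a thin wrapper that a unit test never needs to call.

```python
# Hypothetical sketch: push the external interaction (here, the file
# system) to a thin edge so the core logic stays unit-testable.

def count_words(text):
    """Pure logic: no I/O, so a unit test can call it directly."""
    return len(text.split())

def count_words_in_file(path):
    """Thin wrapper that holds the file-system interaction."""
    with open(path) as f:
        return count_words(f.read())

# A unit test stays inside the boundary:
assert count_words("red green blue") == 3
```

Only the wrapper crosses the boundary, and it is so thin that little is lost by leaving it to integration tests.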

Three ways of tuning an adaptively randomized trial

Yesterday I gave a presentation on designing clinical trials using adaptive randomization software developed at M. D. Anderson Cancer Center. The heart of the presentation is summarized in the following diagram.

Diagram of three methods of tuning adaptively randomized trial designs

(A slightly larger and clearer version of the diagram is available here.)

Traditional randomized trials use equal randomization (ER). In a two-arm trial, each treatment is given with probability 1/2. Simple adaptive randomization (SAR) calculates the probability that a treatment is the better treatment given the data seen so far and randomizes to that treatment with that probability. For example, if it looks like there’s an 80% chance that Treatment B is better, patients will be randomized to Treatment B with probability 0.80. Myopic optimization (MO) gives each patient what appears to be the best treatment given the available data with no randomization.

Myopic optimization is ethically appealing, but has terrible statistical properties. Equal randomization has good statistical properties, but will put the same number of patients on each treatment, regardless of the evidence that one treatment is better. Simple adaptive randomization is a compromise position, retaining much of the power of equal randomization while also treating more patients on the better treatment on average.

The adaptive randomization software provides three ways of compromising between the operating characteristics of ER and SAR.

  1. Begin the trial with a burn-in period of equal randomization followed by simple adaptive randomization.
  2. Use simple adaptive randomization, except if the randomization probability drops below a certain threshold, substitute that minimum value.
  3. Raise the simple adaptive randomization probability to a power between 0 and 1 to obtain a new randomization probability.

Each of these three approaches reduces to ER at one extreme and SAR at the other. In between the extremes, each produces a design with operating characteristics somewhere between those of ER and SAR.

In the first approach, if the burn-in period is the entire trial, you simply have an ER trial. If there is no burn-in period, you have an SAR trial. In between you could have a burn-in period equal to some percentage of the total trial between 0 and 100%. A burn-in period of 20% is typical.

In the second approach, you could specify the minimum randomization probability as 0.5, negating the adaptive randomization and yielding ER. At the other extreme, you could set the minimum randomization probability to 0, yielding SAR. In between you could specify some non-zero randomization probability such as 0.10.

In the third approach, a power of zero yields ER. A power of 1 yields SAR. Unlike the other two approaches, this approach could yield designs approaching MO by using powers larger than 1. This is the most general approach since it can produce a continuum of designs with characteristics ranging from ER to MO. For more on this approach, see Understanding the exponential tuning parameter in adaptively randomized trials.
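The three tuning methods can be sketched as functions that map the SAR probability to the probability actually used. This is a rough illustration, not the software’s implementation: the function and parameter names are hypothetical, and the power method uses the common normalization p^k / (p^k + (1−p)^k) so the result remains a probability.

```python
# Sketch of the three tuning methods for a two-arm trial, where p_sar
# is the simple-adaptive-randomization probability for one arm.
# All names and default values are illustrative.

def tuned_probability(p_sar, method, patient, n_total,
                      burn_in_fraction=0.2, floor=0.10, power=0.5):
    if method == "burn_in":
        # Equal randomization during the burn-in period, SAR afterward.
        return 0.5 if patient < burn_in_fraction * n_total else p_sar
    if method == "floor":
        # Keep the randomization probability away from 0 and 1.
        return min(max(p_sar, floor), 1.0 - floor)
    if method == "power":
        # power = 0 gives ER, power = 1 gives SAR, power > 1 approaches MO.
        a = p_sar ** power
        b = (1.0 - p_sar) ** power
        return a / (a + b)
    raise ValueError(method)

# If the data suggest an 80% chance that one arm is better:
print(tuned_probability(0.80, "power", 0, 100, power=0.5))  # ~0.667
```

With a power of 0.5, the 0.80 SAR probability is pulled back to about 0.667, between ER’s 0.5 and SAR’s 0.8, illustrating the compromise.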

So with three methods to choose from, which one do you use? I did some simulations to address this question. I expected that all three methods would perform about the same. However, this is not what I found. To read more, see Comparing methods of tuning adaptive randomized trials.

Update: The ideas in this post and the technical report mentioned above have been further developed in this paper.

Related: Adaptive clinical trial design

Why so few electronic medical records

Computerworld has a good article on why electronic medical records are so slow to appear. Many people I’ve talked to believe that medical data is just harder to work with than other kinds of data. They see the barriers to electronic medical records as primarily technical. That’s hard to swallow when nearly every other sector of the economy has electronic records. As the Computerworld article says, we’ve had the technology to pull this off for 30 years. There are more plausible economic explanations for why EMRs are uncommon. In a nutshell, the party that pays to develop an EMR is not the party that reaps most of the financial benefit, so there’s little incentive to move forward.

Why heights are not normally distributed

In my previous post, I speculated on why heights are normally distributed, that is, why their statistical distribution is very nearly Gaussian. In this post I want to point out where the normal model breaks down. I’ll look closely at an example from Elementary Statistics by Mario Triola.

At the beginning of the chapter, we noted that the United States Army requires that women’s heights be between 58 and 80 inches. Find the percentage of women satisfying that requirement. Again assume that women have heights that are normally distributed with a mean of 63.6 inches and a standard deviation of 2.5 inches.

The book gives a solution of 98.7%. That’s probably a fairly realistic result, though maybe not to three significant figures. My quibble is with one of the details along the way to the solution, not the final solution itself.

A height of 80 inches is 6.56 standard deviations above the mean. The probability of a normal random variable taking on a value that far from its mean is between 2 and 3 out of 100 billion. Since there are about 7 billion people on our planet, and fewer than half of these are adult women, this says it would be unlikely ever to find a woman 80 inches (6′ 8″) tall. But there are many women that tall or taller. The world record is 91 inches (7′ 7″), or about 11 standard deviations from the mean. If heights really were normally distributed, the probability of such a height would be 1.9 × 10^−28, or about 2 chances in 10,000,000,000,000,000,000,000,000,000. The fit is even worse in the lower tail of the distribution. The world’s shortest woman is 25.5 inches tall, about 15 standard deviations below the mean.
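These tail probabilities can be checked in a few lines. The sketch below uses the complementary error function, which avoids the loss of precision that computing 1 − Φ(z) directly would cause this far out in the tail.

```python
from math import erfc, sqrt

def normal_tail(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * erfc(z / sqrt(2.0))

mean, sd = 63.6, 2.5                  # Triola's figures, in inches
print(normal_tail((80 - mean) / sd))  # z = 6.56: about 2.7e-11
print(normal_tail(11.0))              # 11 standard deviations: about 1.9e-28
```

The first value is indeed between 2 and 3 out of 100 billion, and the second matches the 1.9 × 10^−28 quoted above.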

The normal distribution describes heights remarkably well near the mean, even a couple standard deviations on either side of the mean. But in the extremes, such as six standard deviations out, the model doesn’t fit well. The absolute error is small: the normal model predicts that women 80 inches tall or taller are uncommon, and indeed they are. But they are not nearly as uncommon as the model suggests. The relative error in the model when predicting extreme values is enormous.

The normal model often doesn’t fit well in the extremes. It often underestimates the probability of rare events. The Black Swan gives numerous examples of rare events that were not as rare as a normal distribution would predict. What might account for this poor fit?

Well, why should we expect a normal distribution to fit well in the first place? Because of the central limit theorem. This theorem says roughly that if you average a large number of independent random variables, the result has an approximately normal distribution. But there are many ways the assumptions of this theorem could fail to hold: the random variables might not be independent, they might not be identically distributed, they might have thick tails, etc. And even when the assumptions of the central limit theorem do apply, the theorem only guarantees that the absolute error in the normal approximation goes to zero. It says nothing about the relative error. That may be why the normal model accurately predicts what percentage of women are eligible to serve in the US Army but does not accurately predict how many women are over 6′ 8″ tall.

* * *

For daily posts on probability, follow @ProbFact on Twitter.


Why heights are normally distributed

The canonical example of the normal distribution given in textbooks is human heights. Measure the heights of a large sample of adult men and the numbers will follow a normal (Gaussian) distribution. The heights of women also follow a normal distribution. What textbooks never discuss is why heights should be normally distributed.

Why should heights be normally distributed? If height were a simple genetic characteristic, there would be two possibilities: short and tall, like Mendel’s peas that were either wrinkled or smooth but never semi-wrinkled. But height is not a simple characteristic. There are numerous genetic and environmental factors that influence height. When there are many independent factors that contribute to some phenomena, the end result may follow a Gaussian distribution due to the central limit theorem.
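The central limit effect is easy to see in a simulation. The toy model below is purely illustrative, not a genetic model: it treats a height deviation as the sum of 100 small, independent, uniformly distributed contributions, and the sum comes out nearly normal.

```python
import random
import statistics

# Toy model (not a genetic model): a height deviation as the sum of
# 100 small independent contributions, each uniform on [-0.5, 0.5].
random.seed(42)

def simulated_height(n_factors=100):
    return sum(random.uniform(-0.5, 0.5) for _ in range(n_factors))

sample = [simulated_height() for _ in range(10_000)]

# By the central limit theorem the sum is approximately normal with
# mean 0 and standard deviation sqrt(100/12), about 2.89.
print(statistics.mean(sample), statistics.stdev(sample))
```

A histogram of `sample` would show the familiar bell curve, even though each individual contribution is flat, not bell-shaped.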

The normal distribution is a remarkably good model of heights for some purposes. It may be more interesting to look at where the model breaks down. See my next post, why heights are not normally distributed.

Update: See Distribution of adult heights


Normal approximation errors

Many well-known probability distributions converge to the normal distribution as some parameter or other increases. In a sense this is not very interesting: all roads lead to Rome. But though the destination is the same, the paths to it are varied and more interesting.

I’ve posted notes on how the error in the normal approximation varies for the beta, gamma, and Student t distributions.

The animation below shows the error in the normal approximation to the gamma distribution as the shape parameter grows from 3 to 23. See the gamma notes for more details.

Animation of error in normal approximation to gamma as shape parameter increases

(If the image above is not animated in your browser, visit the gamma notes page where the image should display correctly in all browsers.)
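As a rough illustration of how the error shrinks as the shape parameter grows, here is a sketch comparing CDFs rather than whatever measure the notes use. For integer shape n, the gamma(n, 1) CDF has a closed form, so no special libraries are needed; the normal comparison uses the same mean n and variance n.

```python
from math import exp, erf, sqrt

def gamma_cdf(x, n):
    """CDF of gamma(n, 1) for integer shape n: 1 - e^{-x} * sum_{k<n} x^k/k!."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - exp(-x) * total

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def max_error(n, points=2000):
    """Largest gap between the gamma(n,1) CDF and its normal approximation."""
    grid = (4.0 * n * i / points for i in range(1, points))
    return max(abs(gamma_cdf(x, n) - normal_cdf(x, n, sqrt(n))) for x in grid)

for n in (3, 13, 23):
    print(n, max_error(n))  # the error shrinks as the shape grows
```

Since a gamma(n, 1) random variable is a sum of n independent exponentials, this is the central limit theorem at work: the skewness, and with it the approximation error, decays as n grows.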


Random number generation in C++ TR1

The C++ Standard Library Technical Report 1 (TR1) includes a specification for random number generation classes.

The Boost library has supported TR1 for a while. Microsoft released a feature pack for Visual Studio 2008 in April that includes support for most of TR1. (They left out support for mathematical special functions.) Dinkumware sells a complete TR1 implementation. And gcc included support for TR1 in version 4.3 released in May. (According to the gcc status page the latest version supports most of TR1 except regular expressions. I’ve been able to get some TR1 features to work using gcc 4.3.1 but have not been able to get random number generation to work yet.)

I’ve posted a set of notes that explain how to use the C++ TR1 random number generation classes in Visual Studio 2008. The notes include sample code and point out a few gotchas. They also explain how to use the C++ TR1 classes to generate from distributions not directly supported by the TR1.

Related: Need help with randomization?

Scaling the number of projects

Software engineers typically use the term “horizontal scalability” to mean throwing servers at a problem. A web site scales horizontally if you can handle increasing traffic simply by adding more servers to a server farm. But I also think of horizontal scalability as how well you scale as the number of projects increases, rather than as the performance demands on a single project increase. My biggest challenges have come from managing lots of small projects, more projects than developers.

I’ve seen countless books and articles about how to scale a single project, but I don’t remember ever seeing anything written about scaling the number of projects. It sounds easy to manage independent projects: if the projects are for different clients and they have different developers, just let each one go their own way. But there are two problems. One is a single developer maintaining an accumulation of his or her own projects, and the other is the ability (or more important, the inability) of peers to maintain each other’s projects. Projects that were independent during development become dependent in maintenance because they are maintained at the same time by the same people. Consistency across projects didn’t seem necessary during development, but then in maintenance you look back and wish there had been more consistency.

Maintenance becomes a tractor pull. Robert Martin describes a software tractor pull in his essay The Tortoise and the Hare:

Have you ever been to a tractor pull? Imagine a huge arena filled with mud and churned up soil. Huge tractors tie themselves up to devices of torture and try to pull them across the arena. The devices get harder to pull the farther they go. They are inclined planes with wheels on the rear and a wide shoe at the front that sits squarely on the ground. There is a huge weight at the rear that is attached to a mechanism that drags the weight up the inclined plane and over the shoe as the wheels turn. This steadily increases the weight over the shoe until the friction overcomes the ability of the tractor.

Writing software is like a tractor pull. You start out fast without a lot of friction. Productivity is high, and you get a lot done. But the more you write the harder it gets to write more. The weight is being dragged up over the shoe. The more you write the more the mess builds. Productivity slows. Overtime increases. Teams grow larger. More and more code is piled up over the shoe, and the development team grinds to a halt unable to pull the huge mass of code any farther through the mud.

Robert Martin had in mind a single project slowing down over time, but I believe his analogy applies even better to maintenance of multiple projects.

To scale your number of projects you’ve got to enforce consistency before there’s an immediate need for it. But there you face several dangers. Enforcing apparently unnecessary consistency could make you appear arbitrary and damage morale. And you’ll make some wrong decisions. You’ve got to have a lot of experience to predict what sort of policies you’ll wish in the future that you had enforced. These issues are challenging when scaling a single project, but they are even more challenging when scaling across many smaller projects because you don’t get feedback as quickly. On a single large project, you may feel the pain of a bad decision quickly, but with multiple small projects you may not feel the pain until much later.

Quality is critical when scaling the number of projects. Each project needs to be better than seems necessary. When you look at a single project in isolation, maybe it’s acceptable to have one bug report a month. But then when you have an accumulation of such projects, you’ll get bug reports every day. And the cost per bug fix goes up over time because developers can most easily fix bugs in the code freshest in their minds. Fixing a bug in an old project that no one wants to think about anymore will be unpleasant and expensive.

Scaling your number of projects requires more discipline than scaling a single project because feedback takes longer. Although scaling single projects gets far more attention, I suspect a lot of people are struggling with scaling their number of projects.

Computer processes, human processes, and scalability

Jeff Atwood had a good post recently about database normalization and denormalization. A secondary theme of his post is scalability, how well software performs as inputs increase. A lot of software developers worry too much about scalability, or they worry about the wrong kind of scalability.

In my career, scalability of computer processes has usually not been the biggest problem, even though I’ve done a lot of scientific computing. I’ve more often run into problems with the scalability of human processes. When I use the phrase “this isn’t going to scale,” I usually mean something like “You’re not going to be able to remember all that” or “We’re going to go crazy if we do a few more projects this way.” 

Getting to the bottom of things

In the article Neo-Amish Drop Outs, Kevin Kelly shares a quote from Donald Knuth explaining why he (Knuth) seldom reads email.

Rather than trying to stay on top of things, I am trying to get to the bottom of things.

Getting to the bottom of things — questioning assumptions, investigating causes, making connections — requires a different state of mind than staying on top of things. Deep thought is difficult when you’re frequently interrupted. It’s just as difficult when you anticipate being interrupted even if the interruption never comes.

We don’t task switch nearly as well as we think we do. We think we can switch instantly between tasks, when in reality it takes at least 15 minutes to recover our thoughts, and that’s if we were doing something relatively simple. With more complex tasks, it takes longer.

When I began to understand this a few years ago, I asked a colleague how long it takes her to recover from an interruption. She said three days. I thought she was exaggerating, but now I appreciate that it really can take a few days to get into a hard problem.

Accented letters in HTML, TeX, and MS Word

I frequently need to look up how to add diacritical marks to letters in HTML, TeX, and Microsoft Word, though not quite frequently enough to commit the information to my long-term memory. So today I wrote up a set of notes on adding accents for future reference. Here’s a chart summarizing the notes.

  Accent       HTML entity       TeX         MS Word
  grave        grave             \`          CTRL + `
  acute        acute             \'          CTRL + '
  circumflex   circ              \^          CTRL + ^
  tilde        tilde             \~          CTRL + SHIFT + ~
  umlaut       uml               \"          CTRL + SHIFT + :
  cedilla      cedil             \c          CTRL + ,
  æ, Æ         aelig, AElig      \ae, \AE    CTRL + SHIFT + & + a or A
  ø, Ø         oslash, Oslash    \o, \O      CTRL + / + o or O
  å, Å         aring, Aring      \aa, \AA    CTRL + SHIFT + @ + a or A

The notes go into more details about how accents function in each environment and what limitations each has. For example, LaTeX will let you combine any accent with any letter, but MS Word and HTML only support letter/accent combinations that are common in spoken languages.

* * *

For daily tips on LaTeX and typography, follow @TeXtip on Twitter.


Was Einstein an atheist?

From time to time people speculate whether Einstein was an atheist. Richard Dawkins, for example, said in his book The God Delusion that Einstein was an atheist. However, Einstein addressed this point directly:

I am not an atheist, and I don’t think I can call myself a pantheist.

This quote comes from There Is a God by Anthony Flew. Flew in turn credits Max Jammer’s book Einstein and Religion, page 44.

For sixty years Anthony Flew was an apologist for atheism. Four years ago he announced that he had changed his mind. Last year he published There Is a God, an account of how he first became an atheist and of how decades later he reversed his position.