God is in the details

Some say “The devil is in the details,” meaning solutions break down when you examine them closely enough. Some say “God is in the details,” meaning opportunities for discovery and creativity come from digging into the details. Both are true, but the latter is more interesting.

I posted something along these lines a few weeks ago, Six quotes on digging deep. In that post I quote Richard Feynman

… nearly everything is really interesting if you go into it deeply enough …

I thought about this again last night when I ran across a post by Andrew Gelman entitled God is in every leaf of every tree. He has a similar quote from Feynman.

No problem is too small or too trivial if we really do something about it.

From there he links to a post where he describes what he calls the paradox of importance. Sometimes we can do our most creative work on the least important problems. The important problems often demand quick solutions, so we fall back on familiar methods.

Everything in this post applies equally well to creativity in other fields: graphic design, music composition, literature, etc.  However, Gelman is talking about creativity specifically in the context of statistics. Statistics is a prime example of something that appears dull from the outside but becomes fascinating in the details. A course in statistics can be mind-numbingly dull when the emphasis is on rote application of black-box procedures. Looking inside the boxes is more interesting, and designing the boxes is most interesting.

Related post: Simple legacy

Dose-finding: why start at the lowest dose?

You’ve got a new drug and it’s time to test it on patients. How much of the drug do you give? That’s the question dose-finding trials attempt to answer.

The typical dose-finding procedure starts by selecting a small number of dose levels, say four or five. The trial begins by giving the lowest dose to the first few patients, and there is some procedure for deciding when to try higher doses. Convention says it is unethical to start at any dose other than the lowest dose. I will give several reasons to question this convention.

Suppose you want to run a clinical trial to test the following four doses of Agent X: 10 mg, 20 mg, 30 mg, 50 mg. You want to start with 20 mg. Your trial goes for statistical review and the reviewer says your trial is unethical because you are not starting at the lowest dose. You revise your protocol saying you only want to test three doses: 20 mg, 30 mg, and 50 mg. Now suddenly it is perfectly ethical to start with a dose of 20 mg because it is the lowest dose.

The more difficult but more important question is whether a dose of 20 mg of Agent X is medically reasonable. The first patient in the trial does not care whether higher or lower doses will be tested later. He only cares about the one dose he’s about to receive. So rather than asking “Why are you starting at dose 2?” reviewers should ask “How did you come up with this list of doses to test?”

A variation of the start-at-the-lowest-dose rule is the rule to always start at “dose 1”. Suppose you revise the original protocol to say dose 1 is 20 mg, dose 2 is 30 mg, and dose 3 is 50 mg. The protocol also includes a “dose -1” of 10 mg. You explain that you do not intend to give dose -1, but have included it as a fallback in case the lowest dose (i.e. 20 mg) turns out to be too toxic. Now because you call 20 mg “dose 1” it is ethical to begin with that dose. You could even begin with 30 mg if you were to label the two smaller doses “dose -2” and “dose -1.” With this reasoning, it is ethical to start at any dose, as long as you call it “dose 1.” This approach is justified only if the label “dose 1” carries the implicit endorsement of an expert that it is a medically reasonable starting dose.

Part of the justification for starting at the lowest dose is that the earliest dose-finding methods would only search in one direction. This explains why some people still speak of “dose escalation” rather than “dose-finding.” More modern dose-finding methods can explore up and down a dose range.

The primary reason for starting at the lowest dose is fear of toxicity. But when treating life-threatening diseases, one could as easily justify starting at the highest dose for fear of under-treatment. (Some trials do just that.) Depending on the context, it could be reasonable to start at the lowest, highest, or any dose in between.

The idea of first selecting a range of doses and then deciding where to start exploring seems backward. It makes more sense to first pick the starting dose, then decide what other doses to consider.

Related: Adaptive clinical trial design

Feasibility studies

Jeff Atwood gives a summary of Facts and Fallacies of Software Engineering by Robert Glass on his blog. I was struck by point #14:

The answer to a feasibility study is almost always “yes”.

I hadn’t thought about that before, but it certainly rings true. I can’t think of an exception.

Some say about half of all large software projects fail, and presumably many of these failures passed a feasibility study. Why can’t we predict whether a project stands a good chance of succeeding? Are committees sincerely overly optimistic, or do they recognize doomed projects but tell the sponsor what the sponsor wants to hear?

Related post: Engineering statistics

Innovation IV

John Tukey said

efficiency = statistical efficiency × usage.

I don’t know the context of this quote, but here’s what I think Tukey meant. The usefulness of a statistical method depends not just on the method’s abstract virtues, but also on how often the method can be used and how often in fact it is used. This ties in with Michael Schrage’s comment that innovation is not what innovators do but what customers adopt.

Linear interpolator

I added a form to my web site yesterday that does linear interpolation. If you enter (x1, y1) and (x2, y2), it will predict x3 given y3, or vice versa, by fitting a straight line to the first two points. It’s a simple calculation, but it comes up just often enough that it would be handy to have a page to do it.
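The calculation behind such a form can be sketched in a few lines. (The function names here are my own, not the ones the page uses.)

```python
def interpolate_y(x1, y1, x2, y2, x3):
    """Predict y3 at x3 by fitting a line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x3 - x1)

def interpolate_x(x1, y1, x2, y2, y3):
    """The inverse direction: predict x3 given y3 on the same line."""
    slope = (y2 - y1) / (x2 - x1)
    return x1 + (y3 - y1) / slope
```

For example, the line through (0, 0) and (2, 4) gives `interpolate_y(0, 0, 2, 4, 1) == 2.0`.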

Innovation III

In his book Diffusion of Innovations, Everett Rogers lists five factors that determine the rate of adoption of an innovation.

  1. Relative advantage. This is not limited to objective improvements but also includes factors such as social prestige.
  2. Compatibility with existing systems and values.
  3. Complexity, especially perceived complexity.
  4. Trialability: how easily someone can try out the innovation without making a commitment.
  5. Observability: whether the advantages of the innovation are visible.

Innovators are often criticized for maintaining compatibility, for not making a larger break from the past. After Bjarne Stroustrup invented the C++ programming language, many people said he should have sacrificed compatibility with C in order to make C++ a better language. However, had he done so, C++ would not have become popular enough to gain the critics’ attention. As Stroustrup said in an interview, “There are just two kinds of languages: the ones everybody complains about and the ones nobody uses.”

Innovation II

In 1601, an English sea captain did a controlled experiment to test whether lemon juice could prevent scurvy.  He had four ships, three control and one experimental.  The experimental group got three teaspoons of lemon juice a day while the control group received none. No one in the experimental group developed scurvy while 110 out of 278 in the control group died of scurvy. Nevertheless, citrus juice was not fully adopted to prevent scurvy until 1865.

Overwhelming evidence of superiority is not sufficient to drive innovation.

Source: Diffusion of Innovations

Innovation I

Innovation is not the same as invention. According to Peter Denning,

An innovation is a transformation of practice in a community. It is not the same as the invention of a new idea or object. The real work of innovation is in the transformation of practice. … Many innovations were preceded or enabled by inventions; but many innovations occurred without a significant invention.

Michael Schrage makes a similar point.

I want to see the biographies and the sociologies of the great customers and clients of innovation. Forget for awhile about the Samuel Morses, Thomas Edisons, the Robert Fultons and James Watts of industrial revolution fame. Don’t look to them to figure out what innovation is, because innovation is not what innovators do but what customers adopt.

Innovation in the sense of Denning and Schrage is harder than invention. Most inventions don’t lead to innovations.

The simplest view of the history of invention is that Morse invented the telegraph, Fulton the steamboat, etc. A sophomoric view is that men like Morse and Fulton don’t deserve so much credit because they only improved on and popularized the inventions of others. A more mature view is that Morse and Fulton do indeed deserve the credit they receive. All inventors build on the work of predecessors, and popularizing an invention (i.e. encouraging innovation) requires persistent hard work and creativity.

The simplest thing that might work

Ward Cunningham’s design advice is to try the simplest thing that might work. If that doesn’t work, try the next simplest thing that might work. Note the word “might.”

We all like simplicity in theory, and we may think we’re following Cunningham’s advice when we’re not. Instead, we try the simplest thing that we’re pretty sure will work. Solutions usually get more complex as they’re fleshed out, so we miss out on simple solutions by starting from an idea that is too complex to begin with.

Once you have a simple idea that might work, you have to protect it. Simple solutions are magnets for complexity. People immediately suggest “improvements.” As design guru Donald Norman says, “The hardest part of design … is keeping features out.”

Multiple comparisons

Multiple comparisons present a conundrum in classical statistics. The options seem to be:

  1. do nothing and tolerate a high false positive rate
  2. be extremely conservative and tolerate a high false negative rate
  3. do something ad hoc between the extremes

A new paper by Andrew Gelman, Jennifer Hill, and Masanao Yajima opens with “The problem of multiple comparisons can disappear when viewed from a Bayesian perspective.” I would clarify that the resolution comes not from the Bayesian perspective per se but from the Bayesian hierarchical perspective.

See this blog post for a link to the article “Why we (usually) don’t have to worry about multiple comparisons” and to a presentation by the same title.
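A toy sketch of the hierarchical idea: rather than testing each group’s effect in isolation, a hierarchical model shrinks noisy group estimates toward the grand mean, which is what tames extreme comparisons. (The numbers and code below are my own illustration, not from the paper; the shrinkage factor assumes a normal model with known standard error sigma and between-group spread tau.)

```python
# Observed effect estimates for eight groups.
group_means = [1.2, 0.3, -0.5, 2.1, 0.0, -1.4, 0.8, 0.4]
sigma = 1.0  # assumed standard error of each estimate
tau = 0.5    # assumed between-group standard deviation

mu = sum(group_means) / len(group_means)    # grand mean
shrink = sigma**2 / (sigma**2 + tau**2)     # how far to pull toward mu
pooled = [shrink * mu + (1 - shrink) * y for y in group_means]
```

The extreme estimates are pulled in the most, so fewer of them clear any given threshold, and the correction emerges from the model rather than from an ad hoc adjustment.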


What's better about small companies?

Popular business writers often say flat organizations are better than hierarchical organizations, and small businesses are better than big businesses. By “better” they usually mean more creative, nimble, fun, and ultimately profitable. But they don’t often try to explain why small and flat is better than big and hierarchical. They support their argument with examples of big sluggish companies and small agile companies, but that’s as far as they go.

Paul Graham posted a new essay called You Weren’t Meant to Have a Boss in which he also argues for small and flat over big and hierarchical. However, his line of reasoning is fresh. I haven’t decided what I think of his points, but as usual his writing is creative and thought-provoking.

Update: See Jeff Atwood’s comments, Paul Graham’s Participatory Narcissism.

Simple unit tests

After you’ve read a few books or articles on unit testing, the advice becomes repetitive. But today I heard someone who had a few new things to say. Gerard Meszaros made these points in an interview on the OOPSLA 2007 podcast, Episode 11.

Test code should be much simpler than production code for three reasons.

  1. Unit tests should not contain branching logic. Each test should test one path through the production code. If a unit test has branching logic, it’s doing too much, attempting to test more than one path.
  2. Unit tests are the safety net for changes to production code, but there is no such safety net for the tests themselves. Therefore tests should be written simply the first time rather than simplified later through refactoring.
  3. Unit tests are not subject to the same constraints as production code. They can be slow, and they only have to work in isolation. Brute force is more acceptable in tests than in production code.

(Meszaros made points 1 and 2 directly. Point 3 is my interpolation.)
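Point 1, for example, might look like this in practice. (The function and tests are my own illustration, not Meszaros’s.) Each test exercises exactly one path through the production code, so the tests themselves need no branching.

```python
import unittest

def shipping_cost(weight_kg):
    """Production code under test: flat rate plus a per-kilogram charge."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return 5.0 + 2.0 * weight_kg

class ShippingCostTests(unittest.TestCase):
    # One straight-line test per path -- no if statements in the tests.
    def test_positive_weight(self):
        self.assertEqual(shipping_cost(2), 9.0)

    def test_nonpositive_weight_rejected(self):
        with self.assertRaises(ValueError):
            shipping_cost(0)
```

If a test needed an if statement to decide what to assert, that would be a sign it is covering two paths and should be split in two.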

A well-tested project will have at least as much test code as production code. The immediate conclusion too many people draw is that therefore unit testing doubles the cost of a project.  One reason this is not true is that test code is easier to write than production code for the reasons listed above. Or rather, test code can be easier to write, if the project uses test-driven development. Retrofitting tests to code that wasn’t designed to be testable is hard work indeed.

Plausible reasoning

If Socrates is probably a man, he’s probably mortal.

How do you extend classical logic to reason with uncertain propositions, such as the statement above? Suppose we agree to represent degrees of plausibility with real numbers, larger numbers indicating greater plausibility. If we also agree to a few axioms to quantify what we mean by consistency and common sense, there is a unique system that satisfies the axioms. The derivation is tedious and not well suited to a blog posting, so I’ll cut to the chase: given certain axioms, the inevitable system for plausible reasoning is probability theory.
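Cutting slightly less to the chase: the rules the derivation arrives at are the familiar sum and product rules, stated here with C denoting background information, as in Jaynes.

```latex
P(A \mid C) + P(\bar{A} \mid C) = 1
\qquad \text{(sum rule)}

P(AB \mid C) = P(A \mid C)\, P(B \mid AC)
\qquad \text{(product rule)}
```

Any system of plausible reasoning satisfying the axioms is equivalent to these two rules, up to a monotonic rescaling of the plausibility values.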

There are two important implications of this result. First, it is possible to develop probability theory with no reference to sets. This renders much of the controversy about the interpretation of probability moot. Instead of arguing about what a probability can and cannot represent, one could concede the point. “We won’t use probabilities to represent uncertain information. We’ll use ‘plausibilities’ instead, derived from rules of common sense reasoning. And by the way, the resulting theory is identical to probability theory.”

The other important implication is that all other systems of plausible reasoning — fuzzy logic, neural networks, artificial intelligence, etc. — must either lead to the same conclusions as probability theory, or violate one of the axioms used to derive probability theory.

See the first two chapters of Probability Theory by E. T. Jaynes (ISBN 0521592712) for a full development. It’s interesting to note that the seminal paper in this area came out over 60 years ago. (Richard Cox, “Probability, frequency, and reasonable expectation”, 1946.)

Second homes

Designing good surveys is hard work. Andrew Gelman posted an example of unintended consequences in survey design yesterday. A survey question asked “How many people do you know who have a second home?” Apparently some respondents thought the question was asking about folks who own vacation homes while others thought the question referred to immigrants.