Small batch sizes

Don Reinertsen gave a great keynote address at YOW 2012 entitled The Practical Science of Batch Size. I recommend watching the video when it’s posted, probably in January. In the meantime, I want to relate one small illustration from the talk. It’s a parable of why agile methods can save money.

Someone generates a two-digit number at random, each digit drawn uniformly from 0 through 9. You can pay $2 for a chance to guess the number, and if you are correct, you win $200. It’s a fair bet: both sides have an expected net return of zero.

Now let’s change the rules. Same reward, but now you pay $1 to guess the first digit and you find out whether you were right. If you were, you then have the option of spending another $1 to guess the second digit. Now you have a winning bet.

Your chances of winning are the same as before: there’s a 1% chance that you’ll win $200. But the cost of playing has gone down. There’s a 90% chance you’ll miss the first digit and spend only $1, and a 10% chance you’ll get it right and spend $2 to guess the second digit. Your expected return is still $2, but your expected cost is only $1.10, so on average you make $0.90 every time you play the game.
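
Here’s a quick way to sanity-check that arithmetic: a minimal Python simulation of both versions of the game as described above (the function names are mine, just for illustration).

    import random

    def play_both_digits_at_once():
        """Pay $2 to guess both digits at once; win $200 if the guess matches."""
        target = [random.randint(0, 9), random.randint(0, 9)]
        guess = [random.randint(0, 9), random.randint(0, 9)]
        return (200 if guess == target else 0) - 2

    def play_one_digit_at_a_time():
        """Pay $1 per digit, only paying for the second after the first is confirmed."""
        target = [random.randint(0, 9), random.randint(0, 9)]
        cost = 1
        if random.randint(0, 9) == target[0]:    # first digit correct?
            cost += 1                            # pay to guess the second digit
            if random.randint(0, 9) == target[1]:
                return 200 - cost
        return -cost

    n = 1_000_000
    print(sum(play_both_digits_at_once() for _ in range(n)) / n)   # roughly 0.00
    print(sum(play_one_digit_at_a_time() for _ in range(n)) / n)   # roughly 0.90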

Don argues that we often come out ahead by doing things in smaller batches: committing less money at a time, taking on smaller projects, etc. That’s not to say smaller batches are always better. As batch size decreases, holding costs decrease but transaction costs increase, and you want to minimize the sum of the two. But we more often err on the side of making batches too large.

In the example above, the batch size is how much of the number we want to guess at one time: both digits or just one. By guessing the digits one at a time, we reduce our holding cost. And in this example, there are no transaction costs: you can guess half the digits for half the cost. But if there were some additional cost to playing the game — say you had to pay a tax to make a guess, the same tax whether you guess one or two digits — then it may or may not be optimal to guess the digits one at a time, depending on the size of the tax.
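
To make that trade-off concrete, here’s a small Python sketch (my own model, not from the talk) that treats the tax as being paid once per round of guessing, so splitting the guess into two rounds can incur it twice.

    def profit_both_digits_at_once(tax):
        # One round: pay $2 plus the tax; win $200 with probability 1/100.
        return 0.01 * 200 - (2 + tax)

    def profit_one_digit_at_a_time(tax):
        # The first round always happens ($1 plus tax); the second round happens
        # only in the 10% of cases where the first digit was right.
        expected_cost = (1 + tax) + 0.1 * (1 + tax)
        return 0.01 * 200 - expected_cost

    for tax in [0, 1, 5, 9, 10]:
        print(tax, profit_both_digits_at_once(tax), profit_one_digit_at_a_time(tax))

Under this model the digit-at-a-time strategy stays ahead until the tax reaches $9; beyond that, the extra transaction costs more than the early feedback is worth.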

Books by Don Reinertsen

Eight fallacies of declarative computing

Erik Meijer listed eight fallacies of declarative programming in his keynote address at YOW in Melbourne this morning:

  1. Exceptions do not exist.
  2. Statistics are precise.
  3. Memory is infinite.
  4. There are no side-effects.
  5. Schema don’t change.
  6. There is one developer.
  7. Compilation time is free.
  8. The language is homogeneous.

To put these in some context, Erik made several points about declarative programming in his talk. First, “declarative” is relative. For example, if you’re an assembly programmer, C looks declarative, but if you program in some higher-level language, C looks procedural. Then he argued that SQL is not as declarative as people say and that in some ways SQL is quite procedural. Finally, the fallacies listed above correspond to things that can cause a declarative abstraction to leak.

(The videos of the YOW presentations should be available in January. I haven’t heard anyone say, but I imagine the slides from the presentations will be available sooner, maybe in a few days.)

Winston Churchill, Bessie Braddock, and Python

Last night I was talking with someone about the pros and cons of various programming languages and frameworks for data analysis. One of the pros of Python is its elegance. The primary con is that it can be slow.

The conversation reminded me of an apocryphal exchange between Winston Churchill and Bessie Braddock.

Braddock: Winston, you are drunk.

Churchill: Yes I am. And you, Bessie, are ugly. But I shall be sober in the morning, and you will still be ugly.

Python can be slow, though there are ways to improve its performance. But ugly code is just ugly, and there’s nothing you can do about it.

Quantum superposition of malice and stupidity

Last night, several of us at YOW were discussing professional secrets: inaccuracies and omissions that are corrected via apprenticeship but rarely in writing. We were arguing over whether these secrets were the result of conspiracy or laziness. Do people deliberately conceal information to keep the uninitiated from really knowing what’s going on, or do they wave their hands because being precise takes too much energy?

I argued for the latter, a sort of variation on Hanlon’s razor: Never attribute to malice that which is adequately explained by stupidity. In this case, I didn’t want to attribute to conspiracy what could adequately be explained by laziness. Sins of omission are more common than sins of commission.

Brian Beckman’s comment on Hanlon’s razor was that there is a sort of quantum superposition of malice and stupidity. That is, you have some indeterminate mixture of malice and stupidity (or in the context of our conversation, conspiracy and laziness) that leads to the same results. This closely resembles Grey’s law that any sufficiently advanced incompetence is indistinguishable from malice. Being a physicist, Brian used a physical metaphor. He commented later that it may be possible in retrospect to determine whether some action was malicious or stupid, collapsing a sort of wave function.

Related post: Hanlon’s razor and corporations

Water signs

There are strange signs about water usage all over Melbourne. For example:

Should I be worried? The typography implies I should be. But unless you’re combining your own hydrogen and oxygen atoms, isn’t all water recycled?

Here’s another one.

Again, the typography implies this is a dire warning. Rainwater in use! Beware! But rainwater is usually in use. It waters plants, cleans streets, etc. It’s very useful.

From what I gather, the intention of the signs is to convey something like this:

Don’t be upset with us during a drought because you see we have thriving plants or a beautiful lawn. We’re not using municipally treated water. We’re using rainwater we’ve captured, or gray water, etc.

Approximation relating lg, ln, and log10

My previous post about logarithms has generated far more discussion than I expected. One valuable comment cites Donald Knuth’s TAOCP. While looking up the reference, I stumbled on this curiosity:

lg x ≈ ln x + log10 x.

In words, log base 2 is approximately natural log plus log base 10. It’s a pleasant coincidence that there’s a simple relationship between the three most commonly used logarithms.

Knuth credits the approximation to R. W. Hamming and notes that the relative error is less than 1%. In fact, since lg x, ln x, and log10 x are all constant multiples of one another, it’s easy to show that the relative error is exactly equal to

1 – (1 + 1/ln 10) ln 2 ≈ 0.0058

for all x.
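
Here’s a quick numerical check in Python: it shows the relative error comes out the same for several values of x and matches the closed-form constant above.

    import math

    # Relative error of the approximation lg x ≈ ln x + log10 x
    def rel_error(x):
        return (math.log2(x) - (math.log(x) + math.log10(x))) / math.log2(x)

    for x in [2.0, 10.0, 0.5, 1e6]:
        print(x, rel_error(x))                        # same value every time

    print(1 - (1 + 1 / math.log(10)) * math.log(2))   # ≈ 0.0058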

Related post: The most interesting logs in the world