College courses often begin by trying to weaken your confidence in common sense. For example, a psychology course might start by presenting optical illusions to show that there are limits to your ability to perceive the world accurately. I’ve seen at least one physics textbook that also starts with optical illusions to emphasize the need for measurement. Optical illusions, however, take considerable skill to create. The fact that they are so contrived illustrates that your perception of the world is actually pretty good in ordinary circumstances.
For several years I’ve thought about the interplay of statistics and common sense. Probability is more abstract than physical properties like length or color, and so common sense is more often misguided in the context of probability than in visual perception. In probability and statistics, the analogs of optical illusions are usually called paradoxes: the St. Petersburg paradox, Simpson’s paradox, Lindley’s paradox, etc. These paradoxes show that common sense can be seriously wrong, without having to consider contrived examples. Instances of Simpson’s paradox, for example, pop up regularly in applications.
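To make Simpson’s paradox concrete, here is a minimal sketch in Python. The counts are illustrative numbers chosen for this example, not data from this post, and the names (`data`, `rate`, `overall_rate`) are hypothetical. Treatment A has the higher success rate in each subgroup, yet treatment B has the higher rate once the subgroups are pooled.

```python
# Illustrative (successes, trials) counts, chosen for this example only.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate for one cell of the table."""
    return successes / trials

def overall_rate(groups):
    """Success rate after pooling all subgroups for one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return total_successes / total_trials

# Within each subgroup, A does better than B.
for subgroup in ("small", "large"):
    a = rate(*data["A"][subgroup])
    b = rate(*data["B"][subgroup])
    print(f"{subgroup}: A = {a:.3f}, B = {b:.3f}")

# Pooled over subgroups, the ordering reverses.
overall_a = overall_rate(data["A"])
overall_b = overall_rate(data["B"])
print(f"overall: A = {overall_a:.3f}, B = {overall_b:.3f}")
```

The reversal comes from the unequal subgroup sizes: most of A’s trials fall in the harder subgroup and most of B’s in the easier one.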
Some physicists say that you should always have an order-of-magnitude idea of what a result will be before you calculate it. This implies a belief that such estimates are usually possible, and that they provide a sanity check for calculations. And that’s true in physics, at least in mechanics. In probability, however, it is quite common for even an expert’s intuition to be way off. Calculations are more likely to find errors in common sense than the other way around.
Nevertheless, common sense is vitally important in statistics. Attempts to minimize the need for common sense can lead to nonsense. You need common sense to formulate a statistical model and to interpret inferences from that model. Statistics is a layer of exact calculation sandwiched between necessarily subjective formulation and interpretation. Even though common sense can go badly wrong with probability, it can also do quite well in some contexts. Common sense is necessary to map probability theory to applications and to evaluate how well that map works.
The other day I was driving by our veterinarian’s office and saw that the marquee said something like “Prevention is less expensive than treatment.” That’s sometimes true, but certainly not always.
This evening I ran across a couple lines from Ed Catmull that are more accurate than the vet’s quote.
Do not fall for the illusion that by preventing errors, you won’t have errors to fix. The truth is, the cost of preventing errors is often far greater than the cost of fixing them.
From Creativity, Inc.
Let x_n be a sequence of non-negative numbers. Then the sum of their running geometric means is bounded by e times their sum. In symbols,

\[ \sum_{n=1}^\infty (x_1 x_2 \cdots x_n)^{1/n} \le e \sum_{n=1}^\infty x_n \]
The inequality is strict unless all the x’s are zero, and the constant e on the right side is optimal. Torsten Carleman proved this theorem in 1923.
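As a rough numerical sanity check, the sketch below compares the two sides of the inequality, the sum of running geometric means against e times the plain sum, for a finite prefix of a sequence. The function name `carleman_sides` and the test sequence x_n = 1/n are choices made for this illustration. A finite check cannot prove the theorem; it only shows the bound holding on one example.

```python
import math

def carleman_sides(xs):
    """Return (sum of running geometric means, e * sum) for non-negative xs."""
    lhs = 0.0
    log_product = 0.0   # running sum of logs, avoids overflow in the product
    seen_zero = False   # once a zero appears, every later geometric mean is 0
    for n, x in enumerate(xs, start=1):
        if x == 0:
            seen_zero = True
        else:
            log_product += math.log(x)
        if not seen_zero:
            lhs += math.exp(log_product / n)
    rhs = math.e * sum(xs)
    return lhs, rhs

# Example sequence chosen for illustration: x_n = 1/n for n = 1..2000.
xs = [1.0 / n for n in range(1, 2001)]
lhs, rhs = carleman_sides(xs)
print(f"sum of geometric means = {lhs:.4f}")
print(f"e * sum                = {rhs:.4f}")
print("inequality holds:", lhs < rhs)
```

Working with logarithms rather than the raw product keeps the computation stable when the terms are very small or very large.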
Watching the news gives you an inverted sense of risk.
We fear bad things that we’ve seen on the news because they make a powerful emotional impression. But the things rare enough to be newsworthy are precisely the things we should not fear. Conversely, the risks we should be concerned about are the ones that happen too frequently to make the news.
I asked on Twitter today “What steep learning curves do you wish you’d climbed sooner?” Here’s a summary of the replies:
John Tukey said that the best thing about being a statistician is that you get to play in everyone’s backyard. This morning I got to play in IsoTherapeutics’ backyard. The most photogenic thing on the tour they gave me was their box for working with highly radioactive material with robotic arms. (There was nothing hot inside at the time.)
At some point in the past, computer time was more valuable than human time. The balance changed long ago. While everyone agrees that human time is more costly than computer time, it’s hard to appreciate just how much more costly.
You can rent time on a virtual machine for around $0.05 per CPU-hour. You could pay more or less depending on whether you choose on-demand or reserved pricing, Linux or Windows, etc.
Suppose the total cost of hiring someone (salary, benefits, office space, equipment, insurance liability, etc.) is twice their wage. This implies that a minimum wage worker in the US costs about as much as 300 CPUs.
This also implies that programmer time is three orders of magnitude more costly than CPU time. It’s hard to imagine such a difference. If you think, for example, that it’s worth minutes of programmer time to save hours of CPU time, you’re grossly under-valuing programmer time. It’s worth seconds of programmer time to save hours of CPU time.
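The arithmetic behind these ratios can be sketched in a few lines of Python. The $0.05 per CPU-hour price and the factor of two for total employment cost come from the text above. The $7.25 federal minimum wage and the $50 per hour programmer wage are assumptions added for this illustration, and the function name `cpus_per_person` is hypothetical.

```python
CPU_HOUR_COST = 0.05     # dollars per CPU-hour (from the text)
OVERHEAD_FACTOR = 2.0    # total cost of a hire as a multiple of wage (from the text)
MIN_WAGE = 7.25          # assumption: US federal minimum wage, dollars per hour
PROGRAMMER_WAGE = 50.0   # assumption: hypothetical programmer wage, dollars per hour

def cpus_per_person(hourly_wage, overhead=OVERHEAD_FACTOR, cpu_cost=CPU_HOUR_COST):
    """Number of CPUs you could rent for the hourly cost of one person."""
    return hourly_wage * overhead / cpu_cost

min_wage_ratio = cpus_per_person(MIN_WAGE)
programmer_ratio = cpus_per_person(PROGRAMMER_WAGE)

# Seconds of programmer time that cost the same as one CPU-hour.
seconds_per_cpu_hour = 3600 / programmer_ratio

print(f"minimum wage worker ~ {min_wage_ratio:.0f} CPUs")
print(f"programmer          ~ {programmer_ratio:.0f} CPUs")
print(f"one CPU-hour        ~ {seconds_per_cpu_hour:.1f} seconds of programmer time")
```

Under these assumptions the minimum wage ratio comes to about 290, which rounds to the 300 quoted above, and the programmer ratio is in the low thousands, hence three orders of magnitude.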
I’ve seen exhortations to think like Leonardo da Vinci or Albert Einstein, but these leave me cold. I can’t imagine thinking like either of these men. But here are a few famous people I could imagine emulating when trying to solve a problem:
What would Donald Knuth do? Do a depth-first search on all technologies that might be relevant, and write a series of large, beautiful, well-written books about it all.
What would Alexander Grothendieck do? Develop a new field of mathematics that solves the problem as a trivial special case.
What would Richard Stallman do? Create a text editor so powerful that, although it doesn’t solve your problem, it does allow you to solve your problem by writing a macro and a few lines of Lisp.
What would Larry Wall do? Bang randomly on the keyboard and save the results to a file. Then write a language in which the file is a program that solves your problem.
What would you add to the list?
Last year I worked with Hitachi Data Systems to evaluate the trade-offs of replication and erasure coding as ways to increase data storage reliability while minimizing costs. This led to a white paper that has just been published:
Compare Cost and Performance of Replication and Erasure Coding
Hitachi Review Vol. 63 (July 2014)
John D. Cook
Ab de Kwant
We shape our tools and then our tools shape us. — John M. Culkin
Discussions about technology choices seldom consider who we become by using a tool. Different tools encourage different ways of thinking. Over time, different tools lead to different habits of mind.
Three cheers for Brent Yorgey! He’s finishing up his dissertation, and he’s posting drafts online, including a GitHub repo of the source.
Cheer 1: He’s not being secretive, fearing that someone will scoop his results. There have been a few instances of one academic scooping another’s research, but these are rare and probably not worth worrying about. Besides, a public GitHub repo is a pretty good way to prove your priority.
Cheer 2: Rather than being afraid someone will find an error, he’s inviting a world-wide audience to look for errors.
Cheer 3: He’s writing a dissertation that someone might actually want to read! That’s not the fastest route to a degree. It’s even actively discouraged in some circles. But it’s generous and great experience.