Mathematical tools such as Bayesian analysis and differential equations allow you to combine your intuition and data to make better decisions. With mathematical models you can answer questions that would be expensive or impossible to address directly.
For over twenty years, I have created and implemented mathematical models to solve problems in business, science, and engineering. Some areas of application include risk assessment, adaptive clinical trial design, computer hardware reliability, and software optimization.
Projects can founder because no one knows each of the pieces well enough to bring them all together. The people working on different aspects of the project, such as engineers and managers, don't speak the same language and so their deliverables don't fit together.
My role on projects has often been to be the interpreter and integrator. I bring different areas of math together to solve problems. I bring math and software together to implement solutions. And I bring people together by interpreting between scientists, software developers, and business leaders.
Technical skills are often wasted because vital information is not meaningfully conveyed to decision makers. I can speak the native language of scientists and engineers, and also communicate technical information to a wider, non-technical audience.
For example, I have helped lawyers understand and convey probability. I have helped salesmen understand what scientific articles are saying about their product and how they can convey that information to customers. I have helped businesses understand and mitigate risks.
College courses often begin by trying to weaken your confidence in common sense. For example, a psychology course might start by presenting optical illusions to show that there are limits to your ability to perceive the world accurately. I’ve seen at least one physics textbook that also starts with optical illusions to emphasize the need for measurement. Optical illusions, however, take considerable skill to create. The fact that they are so contrived illustrates that your perception of the world is actually pretty good in ordinary circumstances.
For several years I’ve thought about the interplay of statistics and common sense. Probability is more abstract than physical properties like length or color, and so common sense is more often misguided in the context of probability than in visual perception. In probability and statistics, the analogs of optical illusions are usually called paradoxes: St. Petersburg paradox, Simpson’s paradox, Lindley’s paradox, etc. These paradoxes show that common sense can be seriously wrong, without having to consider contrived examples. Instances of Simpson’s paradox, for example, pop up regularly in application.
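To make Simpson's paradox concrete, here is a minimal sketch in Python. The counts are an assumption on my part, patterned on a widely quoted treatment comparison rather than taken from anything above, and the labels "A", "B", "small", and "large" are purely illustrative. Treatment A has the higher success rate within each subgroup, yet treatment B has the higher rate once the subgroups are pooled.

```python
# Illustrative counts (an assumption, not data from this post):
# each entry is (successes, patients) for one treatment in one subgroup.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Success rate for a single (successes, patients) pair."""
    return successes / patients


def pooled_rate(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients


for group in ("small", "large"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group:>6}: A = {a:.3f}, B = {b:.3f}")

print(f"pooled: A = {pooled_rate(results['A']):.3f}, "
      f"B = {pooled_rate(results['B']):.3f}")
```

The reversal comes from the unequal group sizes: most of A's patients are in the harder "large" group, while most of B's are in the easier "small" group.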
Some physicists say that you should always have an order-of-magnitude idea of what a result will be before you calculate it. This implies a belief that such estimates are usually possible, and that they provide a sanity check for calculations. And that’s true in physics, at least in mechanics. In probability, however, it is quite common for even an expert’s intuition to be way off. Calculations are more likely to find errors in common sense than the other way around.
Nevertheless, common sense is vitally important in statistics. Attempts to minimize the need for common sense can lead to nonsense. You need common sense to formulate a statistical model and to interpret inferences from that model. Statistics is a layer of exact calculation sandwiched between necessarily subjective formulation and interpretation. Even though common sense can go badly wrong with probability, it can also do quite well in some contexts. Common sense is necessary to map probability theory to applications and to evaluate how well that map works.
The other day I was driving by our veterinarian’s office and saw that the marquee said something like “Prevention is less expensive than treatment.” That’s sometimes true, but certainly not always.
This evening I ran across a couple lines from Ed Catmull that are more accurate than the vet’s quote.
Do not fall for the illusion that by preventing errors, you won’t have errors to fix. The truth is, the cost of preventing errors is often far greater than the cost of fixing them.
From Creativity, Inc.
Let x_n be a sequence of non-negative numbers. Then the sum of their running geometric means is bounded by e times their sum. In symbols,
\[ \sum_{n=1}^\infty \left( x_1 x_2 \cdots x_n \right)^{1/n} \le e \sum_{n=1}^\infty x_n. \]
The inequality is strict unless all the x’s are zero, and the constant e on the right side is optimal. Torsten Carleman proved this theorem in 1923.
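The inequality is easy to check numerically on a finite sequence, which can be thought of as an infinite sequence padded with zeros. The following is a minimal sketch, not a proof. The function name `running_geometric_means` and the harmonic test sequence are my own choices for illustration. Working with logarithms avoids overflow or underflow in the running product.

```python
import math


def running_geometric_means(xs):
    """Return [(x_1 * ... * x_n)^(1/n) for n = 1..len(xs)].

    Assumes every x is non-negative. Once a zero appears, every later
    running product is zero, so the later geometric means are zero too.
    """
    means = []
    log_sum = 0.0
    seen_zero = False
    for n, x in enumerate(xs, start=1):
        if x == 0:
            seen_zero = True
        if seen_zero:
            means.append(0.0)
        else:
            log_sum += math.log(x)
            means.append(math.exp(log_sum / n))
    return means


# Hypothetical test sequence: x_n = 1/n for n = 1, ..., 2000.
xs = [1.0 / n for n in range(1, 2001)]
lhs = sum(running_geometric_means(xs))
rhs = math.e * sum(xs)
print(f"sum of geometric means = {lhs:.4f}")
print(f"e * sum                = {rhs:.4f}")
print(f"ratio                  = {lhs / rhs:.4f}")
```

For this sequence the n-th geometric mean is 1/(n!)^(1/n), which is a bit less than e/n, so the left side stays below the right side term by term.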
Watching the news gives you an inverted sense of risk.
We fear bad things that we’ve seen on the news because they make a powerful emotional impression. But the things rare enough to be newsworthy are precisely the things we should not fear. Conversely, the risks we should be concerned about are the ones that happen too frequently to make the news.
I asked on Twitter today “What steep learning curves do you wish you’d climbed sooner?” Here’s a summary of the replies:
About three years ago JD Long said
I like the term “Data Scientist” for now. I expect that term will be meaningless in 5 years.
Sounds about right.
John Tukey said that the best thing about being a statistician is that you get to play in everyone’s backyard. This morning I got to play in IsoTherapeutics’ backyard. The most photogenic thing on the tour they gave me was their box for working with highly radioactive material with robotic arms. (There was nothing hot inside at the time.)
At some point in the past, computer time was more valuable than human time. The balance changed long ago. While everyone agrees that human time is more costly than computer time, it’s hard to appreciate just how much more costly.
You can rent time on a virtual machine for around $0.05 per CPU-hour. You could pay more or less depending on on-demand vs reserved, Linux vs Windows, etc.
Suppose the total cost of hiring someone — salary, benefits, office space, equipment, insurance liability, etc. — is twice their wage. This implies that a minimum wage worker in the US costs as much as 300 CPUs.
This also implies that programmer time is three orders of magnitude more costly than CPU time. It’s hard to imagine such a difference. If you think, for example, that it’s worth minutes of programmer time to save hours of CPU time, you’re grossly under-valuing programmer time. It’s worth seconds of programmer time to save hours of CPU time.
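The arithmetic behind these ratios can be sketched in a few lines of Python. The $0.05 per CPU-hour price and the factor of two for total employment cost come from the text above. The $7.25 federal minimum wage and the $50 per hour programmer wage are assumptions I've supplied for illustration, so adjust them to taste.

```python
CPU_HOUR_COST = 0.05         # dollars per CPU-hour, from the text
OVERHEAD_MULTIPLIER = 2.0    # total cost = twice the wage, from the text
FEDERAL_MINIMUM_WAGE = 7.25  # dollars per hour (assumption)
PROGRAMMER_WAGE = 50.0       # dollars per hour (hypothetical)


def cpus_per_person(hourly_wage):
    """Number of CPUs you could rent for the total hourly cost of one person."""
    return hourly_wage * OVERHEAD_MULTIPLIER / CPU_HOUR_COST


def break_even_seconds(cpu_hours_saved, hourly_wage):
    """Seconds of a person's time that cost the same as the CPU-hours saved."""
    return cpu_hours_saved * 3600 / cpus_per_person(hourly_wage)


print(f"Minimum wage worker ~ {cpus_per_person(FEDERAL_MINIMUM_WAGE):.0f} CPUs")
print(f"Programmer          ~ {cpus_per_person(PROGRAMMER_WAGE):.0f} CPUs")
print(f"One CPU-hour is worth about "
      f"{break_even_seconds(1, PROGRAMMER_WAGE):.1f} seconds of programmer time")
```

Under these assumptions the minimum wage figure comes out near 300 CPUs and the programmer figure is in the thousands, which is where the "seconds to save hours" rule of thumb comes from.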
I will be giving a talk “Bayesian statistics as a way to integrate intuition and data” at KeenCon, September 11, 2014 in San Francisco.
Update: Use promo code KeenCon-JohnCook to get 75% off registration.
I’ve seen exhortations to think like Leonardo da Vinci or Albert Einstein, but these leave me cold. I can’t imagine thinking like either of these men. But here are a few famous people I could imagine emulating when trying to solve a problem:
What would Donald Knuth do? Do a depth-first search on all technologies that might be relevant, and write a series of large, beautiful, well-written books about it all.
What would Alexander Grothendieck do? Develop a new field of mathematics that solves the problem as a trivial special case.
What would Richard Stallman do? Create a text editor so powerful that, although it doesn’t solve your problem, it does allow you to solve your problem by writing a macro and a few lines of Lisp.
What would Larry Wall do? Bang randomly on the keyboard and save the results to a file. Then write a language in which the file is a program that solves your problem.
What would you add to the list?
Last year I worked with Hitachi Data Systems to evaluate the trade-offs of replication and erasure coding as ways to increase data storage reliability while minimizing costs. This led to a white paper that has just been published:
Compare Cost and Performance of Replication and Erasure Coding
Hitachi Review Vol. 63 (July 2014)
John D. Cook
Ab de Kwant
We shape our tools and then our tools shape us. — John M. Culkin
Discussions about technology choices seldom consider who we become by using a tool. Different tools encourage different ways of thinking. Over time, different tools lead to different habits of mind.
Three cheers for Brent Yorgey! He’s finishing up his dissertation, and he’s posting drafts online, including a GitHub repo of the source.
Cheer 1: He’s not being secretive, fearing that someone will scoop his results. There have been a few instances of one academic scooping another’s research, but these are rare and probably not worth worrying about. Besides, a public GitHub repo is a pretty good way to prove your priority.
Cheer 2: Rather than being afraid someone will find an error, he’s inviting a world-wide audience to look for errors.
Cheer 3: He’s writing a dissertation that someone might actually want to read! That’s not the fastest route to a degree. It’s even actively discouraged in some circles. But it’s generous and great experience.