The value of typing code

Tommy Nicholas recently wrote a blog post advocating typing rather than copying-and-pasting code samples. I thought this was the most interesting paragraph from the post:

When Hunter S. Thompson was working as a copy boy at Time Magazine in 1959, he spent his spare time typing out the entire Great Gatsby by F. Scott Fitzgerald and A Farewell to Arms by Ernest Hemingway in order to better understand what it feels like to write a great book. To be able to feel the author’s turns in logic and storytelling weren’t possible from reading the books alone, you had to feel what it feels like to actually create the thing. And so I have found it to be with coding.

Joe Armstrong had similar advice:

Forget about the tools … buy a decent book and type in the programs by hand. One at a time thinking as you go. After 30 years you will get the hang of this and be a good programmer.

Typing code may be like riding a bicycle. I’m surprised how much more detail I see the first time I ride my bicycle over a road I’ve driven on, mostly because I’m moving slower but also because there’s an additional muscular dimension to the experience.

Another advantage to typing example code is that you’ll make mistakes, just as you will in real work. This will give you the opportunity to see what happens and to learn debugging, even though you may not appreciate the opportunity.

The Lindy effect

The longer a technology has been around, the longer it’s likely to stay around. This is a consequence of the Lindy effect. Nassim Taleb describes this effect in Antifragile but doesn’t provide much mathematical detail. Here I’ll fill in some detail.

Taleb, following Mandelbrot, says that the lifetimes of intellectual artifacts follow a power law distribution. So assume the survival time of a particular technology is a random variable X with a Pareto distribution. That is, X has a probability density of the form

f(t) = c/t^(c+1)

for t ≥ 1 and for some c > 0. This is called a power law because the density is proportional to a power of t.

If c > 1, the expected value of X exists and equals c/(c-1). The conditional expectation of X, given that X has survived at least to time k, is ck/(c-1). The expected additional life is therefore ck/(c-1) − k = k/(c-1), which is proportional to the amount of life seen so far. The proportionality constant 1/(c-1) depends on the power c that controls the thickness of the tails: the closer c is to 1, the heavier the tail and the larger the constant. If c = 2, the proportionality constant is 1, i.e. the expected additional life equals the life seen so far.
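Here's a sketch of the calculation behind those two statements, using the density above:

    P(X > k) = ∫_k^∞ c/t^(c+1) dt = 1/k^c

    E( X | X > k ) = k^c ∫_k^∞ t · c/t^(c+1) dt = k^c ∫_k^∞ c/t^c dt = k^c · ck^(1-c)/(c-1) = ck/(c-1)

Setting k = 1 recovers the unconditional mean c/(c-1), since the distribution is supported on t ≥ 1.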

Note that this derivation computed E( X | X > k ), i.e. it only conditions on knowing that X > k. If you have additional information, such as evidence that a technology is in decline, then you need to condition on that information. But if all you know is that a technology has survived a certain amount of time, you can estimate that it will survive about that much longer.
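Here's a quick simulation sketch of that conditional expectation; the parameter values, sample size, and seed below are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(20130101)  # arbitrary seed
    c, k = 3.0, 5.0

    # Draw Pareto(c) samples on [1, inf) by inverse transform:
    # if U ~ Uniform(0, 1), then U**(-1/c) has density c/t^(c+1) for t >= 1.
    x = rng.uniform(size=1_000_000) ** (-1.0 / c)

    survivors = x[x > k]
    print("empirical E( X | X > k ):", survivors.mean())   # about 7.5
    print("theoretical ck/(c-1):    ", c * k / (c - 1))    # 7.5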

So technologies have different survival patterns from people or atoms. The older a person is, the fewer expected years he has left, because human lifetimes follow a thin-tailed distribution. Atomic decay follows a medium-tailed exponential distribution: the expected additional time to decay is independent of how long an atom has been around. But technologies follow a thick-tailed distribution, so the expected remaining life grows with age.

Another way to look at this is to say that human survival times have an increasing hazard function and atoms have a constant hazard function. The hazard function for a Pareto distribution is c/t and so decreases with time.
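The decreasing hazard is a one-line consequence of the density above: the hazard function is the density divided by the survival function, so

    h(t) = f(t) / P(X > t) = (c/t^(c+1)) / (1/t^c) = c/t.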

Update: Beethoven, Beatles, and Beyoncé: more on the Lindy effect

Small batch sizes II

A few days ago I wrote about an example from a presentation by Don Reinertsen on the benefits of small batch sizes. Nassim Taleb brings up similar ideas in Antifragile. He opens one chapter with the following rabbinical story.

A king, angry at his son, swore that he would crush him with a large stone. After he calmed down, he realized he was in trouble, as a king who breaks his oath is unfit to rule. His sage advisor came up with a solution. Have the stone cut into very small pebbles, and have the mischievous son pelted with them.

The harm done by being hit with a stone is a nonlinear function of the stone’s size. A stone half the size does less than half the harm. Cutting the stone into pebbles makes it harmless.
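To make the nonlinearity concrete, suppose purely for illustration that harm grows like the square of a stone's mass. A stone of mass m then does harm proportional to m², while ten pebbles of mass m/10 do a total of 10 × (m/10)² = m²/10, one tenth of the harm. The squared-harm assumption is only a stand-in; the argument needs nothing more than convexity of the harm function.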

Related post: Appropriate scale

Most popular pages

Here are the most popular pages on my website that are not blog posts.

Programming language notes:

Probability and statistics:

Books:

Miscellaneous:

Two views of modernity

Here are a couple descriptions of modernity that I’ve run across lately and found interesting.

First, from Eva Brann:

Now what is actually meant by “modern times?” The term cannot just mean “contemporary” because all times are con-temporary with themselves. Modern is a Latin word which means “just now.” Modern times are the times which are in a special way “just now!” Modernity is just-nowness, up-to-date-ness.

… We live differently in our time from the way those who came before us lived in theirs. For instance, when we speak of something or even someone as being “up to date” we are implying that what time it is, is significant, that time marches, or races, on by itself, and we have the task of keeping up with it. Our time is not a comfortable natural niche within the cycle of centuries, but a fast sliding rug being pulled out from under us.

Furthermore, we have a sense of the extraordinariness of our times … Modernity itself is, apparently, a way of charging the Now with special significance.

Second, from Nassim Taleb:

Modernity corresponds to the systematic extraction of humans from their randomness-laden ecology. … It is rather the spirit of an age marked by rationalization (naive rationalism), the idea that society is understandable, hence must be designed, by humans. With it was born statistical theory, hence the beastly bell curve. So was linear science. So was the notion of “efficiency” — or optimization.

Modernity is a Procrustean bed, good or bad — a reduction of humans to what appears to be efficient and useful. Some aspects of it work: Procrustean beds are not all negative reductions. Some may be beneficial, though these are rare.

Water spirals in the southern hemisphere

Urban legend has it that the earth’s rotation causes toilets to drain in the opposite direction in the southern hemisphere. Several people have asked me whether this is true. No: the direction the water swirls in a toilet is determined by the angle of the jets that direct water into the bowl.

What about bathtubs? Do they drain in opposite directions in the two hemispheres? When you pull the plug in a bathtub, there are no jets directing the flow, yet the water still drains clockwise about half the time in either hemisphere. Here’s an explanation from my post Two myths I learned in college:

The Coriolis effect does explain why cyclones rotate one way in the northern hemisphere and the opposite way in the southern hemisphere. The rotation of the earth influences the rotation of large bodies of fluid, like weather systems. However, a bathtub would need to be maybe a thousand miles in diameter before the Coriolis effect would determine how it drains. Bathtubs drain clockwise and counterclockwise in both hemispheres. Random forces such as sound in the air have more influence than the Coriolis effect on such small bodies of water.

Three new Python books

This post reviews three Python books that have come out recently:

SciPy and NumPy (ISBN 1449305466) by Eli Bressert is the smallest book I’ve seen from O’Reilly, aside from books in their pocket guide series. The SciPy and NumPy libraries are huge, and it can be hard to know where to start. This book gives a good, brisk overview. In addition to SciPy and NumPy, it also gives a brief introduction to SciKit, in particular scikit-learn for machine learning and scikit-image for image processing.

(Eli told me that he is working on supplementary material for the book. Everyone who bought the book electronically will automatically receive the new material when it is available.)

Python for Kids (ISBN 1593274076) by Jason R. Briggs is an introduction to programming aimed at kids. It starts with an introduction to Python and moves to developing a simple game. It seems to me that kids would find the book interesting. It’s about seven times longer than the SciPy and NumPy book. It moves at a slow pace, has many illustrations, and has a casual tone.

NumPy Cookbook by Ivan Idris contains around 70 small recipes, about three pages each. Many of these are about NumPy itself, but the book covers much more than its title would imply. Out of 10 chapters, four are strictly about NumPy. The first chapter of the book is about IPython. Another chapter is about “connecting NumPy with the rest of the world,” i.e. interfacing with Java, R, Matlab, and cloud services. Two chapters are devoted to profiling, debugging, and optimizing performance. There is a chapter on quality assurance (static analysis, unit testing, and documentation). And the final chapter is about Scikits and Pandas.

Sleeper theorems

I’m using the term “sleeper” here for a theorem that is far more important than it seems, something that you may not appreciate for years after you first see it.

Bayes’ theorem

The first such theorem that comes to mind is Bayes’ theorem. I remember being unsettled by this theorem when I took my first probability course. I found it easy to prove but hard to understand. I couldn’t decide whether it was trivial or profound. Then years later I found myself using Bayes’ theorem routinely.

The key insight of Bayes’ theorem is that it gives you a way to turn probabilities around. That is, it lets you compute the probability of A given B from the probability of B given A. That may not seem so important, but it’s vital in application. It’s often easy to compute the probability of data given an hypothesis, but what we need is the probability of an hypothesis given data. Those unfamiliar with Bayes’ theorem often get these probabilities backward.
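Here’s a toy numerical illustration of turning probabilities around; the prevalence, sensitivity, and false positive rate below are made-up numbers, not anything from a particular application.

    # Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
    p_condition = 0.01               # P(A): prevalence of a condition
    p_pos_given_condition = 0.95     # P(B | A): test sensitivity
    p_pos_given_no_condition = 0.10  # false positive rate

    # Total probability of a positive test, P(B)
    p_pos = (p_pos_given_condition * p_condition
             + p_pos_given_no_condition * (1 - p_condition))

    # The quantity we actually want: P(A | B)
    p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
    print(p_condition_given_pos)     # about 0.09, nowhere near 0.95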

Jensen’s inequality

Another sleeper theorem is Jensen’s inequality: if φ is a convex function and X is a random variable, then φ( E(X) ) ≤ E( φ(X) ). In words, φ evaluated at the expected value of X is no greater than the expected value of φ of X. Like Bayes’ theorem, it’s a way of turning things around. If the convex function φ represents your gain from some investment, Jensen’s inequality says that randomness is good for you; variability in X works to your advantage on average. But if φ is concave, variability works against you.

Sam Savage’s book The Flaw of Averages is all about the difference between φ( E(X) ) and E( φ(X) ). When φ is linear, they’re equal. But in general they’re different and there’s not much you can say about the relation of the two. However, when φ is convex or concave, you can say what the direction of the difference is.
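A short simulation makes the gap visible. The convex function exp and the normal random variable below are arbitrary illustrative choices, not anything from either book.

    import numpy as np

    rng = np.random.default_rng(12345)   # arbitrary seed
    x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

    phi = np.exp   # a convex function

    print("phi( E(X) ):", phi(x.mean()))   # about exp(0) = 1
    print("E( phi(X) ):", phi(x).mean())   # about exp(1/2), roughly 1.65

    # Jensen's inequality for convex phi: phi( E(X) ) <= E( phi(X) )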

I’ve just started reading Nassim Taleb’s new book Antifragile, and it seems to be an extended meditation on Jensen’s inequality. Systems with concave returns are fragile; they are harmed by variability. Systems with convex returns are antifragile; they benefit from variability.

Other examples

What are some more examples of sleeper theorems?

Related posts

Top down, bottom up

Toward the end of his presentation Don’t fear the Monad, Brian Beckman makes an interesting observation. He says that early in the history of programming, languages split into two categories: those that start from the machine and add layers of abstraction, and those that start from mathematics and work their way down to the machine.

These two branches are roughly the descendants of Fortran and Lisp respectively. Or more theoretically, these are descendants of the Turing machine and the lambda calculus. By this classification, you could call C# a bottom-up language and Haskell a top-down language.

Programmers tend to write software in the opposite direction of their language’s history. That is, people using bottom-up languages tend to write their software top-down. And people using top-down languages tend to write their software bottom-up.

Programmers using bottom-up languages tend to approach software development analytically, breaking problems into smaller and smaller pieces. Programmers using top-down languages tend to build software synthetically. Lisp programmers in particular are fond of saying they grow the language up toward the problem being solved.

You can write software bottom-up in a bottom-up language or top-down in a top-down language. Some people do. Also, the ideas of bottom-up and top-down are not absolutes. Software development (and language design) is some mixture of both approaches. Still, as a sweeping generalization, I’d say that people tend to develop software top-down in bottom-up languages and bottom-up in top-down languages.

The main idea of Brian Beckman’s video is that algebraic structures like monoids and monads inspire programming models designed to make composition easier. This could explain why top-down languages enable or even encourage bottom-up development. It’s not as clear to me why bottom-up languages lead to top-down development.
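As a toy sketch of the idea, not something from the video: strings under concatenation form a monoid (an associative operation with an identity element), and associativity is exactly what lets you combine pieces in any grouping and get the same result.

    from functools import reduce

    # Strings form a monoid: identity "" and associative operation +.
    parts = ["Hello", ", ", "world", "!"]
    print(reduce(lambda a, b: a + b, parts, ""))   # "Hello, world!"

    # Because + is associative, the pieces may be grouped any way you like,
    # which is what makes this kind of composition easy to reason about.
    left = reduce(lambda a, b: a + b, parts[:2], "")
    right = reduce(lambda a, b: a + b, parts[2:], "")
    print(left + right)                            # same result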

Related post: Functional in the small, OO in the large