Laws of large numbers and small numbers

In case my previous note on the law of small numbers confused anyone, I’ll compare it to the law of large numbers.

The law of large numbers is a mathematical theorem; the law of small numbers is an observation about human psychology.

The name “law of large numbers” is a standard term applied to a theorem about the convergence of random variables. (OK, actually two theorems. Like nuclear forces, the laws of large numbers come in a strong and a weak form.)
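For reference, here is one common way to state the two forms. The notation is mine, not from the earlier note: I am assuming X_1, X_2, ... are independent, identically distributed random variables with finite mean \mu, and \bar{X}_n is the average of the first n of them.

```latex
% Weak law: the sample mean converges to \mu in probability.
\text{For every } \varepsilon > 0: \quad
\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0

% Strong law: the sample mean converges to \mu almost surely.
P\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1
```

The strong form implies the weak form, which is where the names come from.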

The name “law of small numbers” is a pun, and I don’t believe the term is commonly used. Too bad. It’s a convenient label for a common phenomenon.

Related post: Law of medium numbers

The law of small numbers

The book Judgment under uncertainty analyzes common fallacies in how people estimate probabilities. The book asserts that no one has good intuition about probability. Statisticians do better than the general public, not because their intuition is much better, but because they know not to trust their intuition; they know they need to rely on calculations.

One of the common fallacies listed in the book is the law of small numbers. In general, people grossly underestimate the variability in small samples. This phenomenon comes up all the time. It’s good to know someone has given it a name.
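To get a feel for how much small samples vary, here is a minimal sketch that computes the exact chance that n flips of a fair coin come out at least 70% heads or at least 70% tails. The coin example, the 70% cutoff, and the function name `prob_lopsided` are my own choices for illustration, not something from the book.

```python
from math import comb

def prob_lopsided(n, pct=70):
    """Exact probability that n fair coin flips give at least pct% heads
    or at least pct% tails. Integer arithmetic avoids rounding at the cutoff."""
    favorable = sum(
        comb(n, k)
        for k in range(n + 1)
        if 100 * k >= pct * n or 100 * k <= (100 - pct) * n
    )
    return favorable / 2**n

for n in (10, 20, 50, 100):
    print(n, prob_lopsided(n))
```

With 10 flips a split of 70/30 or worse happens about 34% of the time, which is much more often than most people would guess. With 100 flips it is rare.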


The excitement of not knowing what you’re doing

The following is a quote from Edsger Dijkstra.

I think it was in 1970 that I gave my first talk in a foreign country on the design of programs that you could actually control and prove were correct. … The talk fell completely on its face. … The programmers didn’t like the idea at all because it deprived them of the intellectual excitement of not quite understanding what they were doing. They liked the challenge of chasing the bugs.

Taken from Out of their Minds: The Lives and Discoveries of 15 Great Computer Scientists. Emphasis added.

C# verbatim strings vs. PowerShell here-strings

C# verbatim strings and PowerShell here-strings have just enough in common to be confusing. The differences are summarized here.

| C# verbatim strings | PowerShell here-strings |
| --- | --- |
| May contain line breaks | Must contain line breaks |
| Only double quote variety | Single and double quote varieties |
| Begins with @" | Begins with @" (or @') plus a line break |
| Ends with " | Ends with a line break followed by "@ (or '@) |
| Cannot contain un-escaped double quotes | May contain quotes |
| Turns off C# escape sequences | @' turns off PowerShell escape sequences but @" does not |

Selection bias and bombers

[Image: B-17 Flying Fortress]

During WWII, statistician Abraham Wald was asked to help the British decide where to add armor to their bombers. After analyzing the records, he recommended adding more armor to the places where there was no damage!

This seems backward at first, but Wald realized his data came from bombers that survived. That is, the British were only able to analyze the bombers that returned to England; those that were shot down over enemy territory were not part of their sample. These bombers’ wounds showed where they could afford to be hit. Said another way, the undamaged areas on the survivors showed where the lost planes must have been hit because the planes hit in those areas did not return from their missions.

Wald assumed that the bullets were fired randomly, that no one could accurately aim for a particular part of the bomber. Instead they aimed in the general direction of the plane and sometimes got lucky. So, for example, if Wald saw that more bombers in his sample had bullet holes in the middle of the wings, he did not conclude that Nazis liked to aim for the middle of wings. He assumed that there must have been about as many bombers with bullet holes in every other part of the plane but that those with holes elsewhere were not part of his sample because they had been shot down.
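Here is a small simulation of the effect, as a sketch only. The section names and loss probabilities are hypothetical numbers I made up for illustration; they are not Wald's data. I also simplify by giving every plane exactly one hit, landing uniformly at random, in keeping with the assumption that no one could aim at a particular part of the bomber.

```python
import random
from collections import Counter

# Hypothetical chance that a hit in each section brings the plane down.
# These values are illustrative assumptions, not historical figures.
LOSS_PROBABILITY = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.2, "wings": 0.1}

def simulate(num_planes=100_000, seed=0):
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    all_hits = Counter()       # hits on every plane, including those lost
    survivor_hits = Counter()  # hits visible on planes that made it home
    for _ in range(num_planes):
        section = rng.choice(sections)  # hits land uniformly at random
        all_hits[section] += 1
        if rng.random() >= LOSS_PROBABILITY[section]:
            survivor_hits[section] += 1  # plane returned and can be inspected
    return all_hits, survivor_hits

all_hits, survivor_hits = simulate()
for section in LOSS_PROBABILITY:
    print(f"{section:9s} hit: {all_hits[section]:6d}  "
          f"seen on survivors: {survivor_hits[section]:6d}")
```

Every section is hit about equally often, but the returning planes show few engine hits and many wing hits. Someone looking only at the survivors would be tempted to armor the wings.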

Repairing tumors

Imagine this conversation with your doctor:

Your poor tumor. It has a chaotic blood supply. Parts of it get too much blood, other parts too little. We’re going to give you a drug to improve your tumor’s blood supply, making it healthier.

Before you run screaming from your doctor’s office, see if there’s a copy of the January 2008 issue of Scientific American in the waiting room. If there is, read the article Taming Vessels to Treat Cancer by Rakesh Jain.

Just as the cells in a tumor are abnormal and growing out of control, so are the blood vessels that feed the tumor. This lack of proper infrastructure inhibits the tumor’s growth, but it also makes it difficult to deliver chemotherapy to the tumor. This led to the radical idea of making tumors healthier in preparation for killing them.

So how would you go about improving a tumor’s circulatory system? By administering a drug that was designed to attack tumor vessels!

A new class of cancer drugs, antiangiogenic agents, has been designed to attack tumors by cutting off their blood supply. These agents haven’t been a complete success. Experience with one such agent, Avastin, shows that while it shuts down some of the blood vessels in tumors, it may make the remaining tumor vessels healthier. That’s bad news if you’re treating patients with Avastin alone. But when used in combination with chemotherapy, it’s just what people like Dr. Jain were looking for: a way to normalize the blood flow in a tumor in order to make it more vulnerable to chemotherapy.

More information, including videos, is available at the website of Dr. Jain’s lab.

Related: Adaptive clinical trial design

Interesting is better than perfect

Seth Godin has an interesting blog post today called The problem with perfect. Companies with a reputation for perfect service are only remarkable when they disappoint. Being interesting is a more viable business strategy than being perfect.

Thick tails

Bart Kosko in his book Noise argues that thick-tailed probability distributions such as the Cauchy distribution are common in nature. This is the opposite of what I was taught in college. I remember being told that the Cauchy distribution, a distribution with no mean or variance, is a mathematical curiosity more useful for constructing academic counterexamples than for modeling the real world. Kosko disagrees. He writes

… all too many scientists simply do not know that there are infinitely many different types of bell curves. So they do not look for these bell curves and thus they do not statistically test for them. The deeper problem stems from the pedagogical fact that thick-tailed bell curves get little or no attention in the basic probability texts that we still use to train scientists and engineers. Statistics books for medicine and the social sciences tend to be even worse.

We see thin-tailed distributions everywhere because we don’t think to look for anything else. If we see samples drawn from a thick-tailed distribution, we may throw out the “outliers” before we analyze the data, and then a thin-tailed model fits just fine.

How do you decide what’s an outlier? Two options. You could use your intuition and discard samples that “obviously” don’t belong, or you could use a formal test. But your intuition may implicitly be informed by experience with thin-tailed distributions, and your formal test may also circularly depend on the assumption of a thin-tailed model.
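To make “thick-tailed” concrete, here is a short sketch comparing how much probability a standard normal and a standard Cauchy distribution put beyond k units from the center. The comparison is my own illustration, not Kosko's, and matching the two distributions at their standard scale is an assumption; other ways of matching scales would change the numbers but not the overall picture.

```python
from math import atan, erfc, pi, sqrt

def normal_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return erfc(k / sqrt(2))

def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy X."""
    return 1 - (2 / pi) * atan(k)

for k in (1, 3, 5, 10):
    print(f"k={k:2d}  normal: {normal_tail(k):.2e}  Cauchy: {cauchy_tail(k):.2e}")
```

Under the normal model a value beyond 10 is effectively impossible, so it looks like an obvious outlier. Under the Cauchy model such values show up in more than 6% of samples, so throwing them out throws away a real part of the distribution.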

Quick TeX to graphic utility

Here’s a website where you can type in some TeX code, click a button, and get back a GIF with a transparent background. Handy for pasting equations into HTML.

http://www.artofproblemsolving.com/LaTeX/AoPS_L_TeXer.php

For example:

[Image: Gaussian integral rendered from TeX]

Update (December 2014): The site mentioned in this post doesn’t seem to exist any more.

Coping with exponential growth

Everything is supposedly growing exponentially. But when most people say “exponential,” they don’t mean what they say. They mean “fast.” Exponential growth can indeed be fast. Or it can be slow. Excruciatingly slow.

If you earn a million dollars a day, your wealth is growing quickly, but not exponentially. And if you have $100 in the bank earning 3% compound interest, your money is growing slowly, but it is growing exponentially.

Linear growth is a constant amount of increase per unit of time. Exponential growth is a constant percentage increase per unit of time. If you buy a pack of baseball cards every Friday, the size of your baseball card collection will grow linearly. But if you breed rabbits with no restriction, the size of your bunny herd will grow exponentially.

It matters a great deal whether you’re growing linearly or exponentially.

When you start a new venture it may truly grow exponentially. Growth may be determined by word of mouth, which is exponential (at first). The number of new people who hear each month depends on the number of people who talk, and hearers become talkers. But that process can be infuriatingly slow when it’s just getting started. If the number of visitors to your website is growing 5% per month, that’s great in the long term, but disappointing at first when it means going from 40 visitors one month to 42 the next.
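A quick sketch of how slow that start is, using the 40 visitors and 5% monthly growth from the example. The function name `months_to_reach` and the targets are mine, and I am assuming the growth rate stays constant, which real traffic will not do.

```python
def months_to_reach(target, start=40.0, monthly_growth=0.05):
    """Count the months of compounding needed to go from start to at least target."""
    months = 0
    visitors = start
    while visitors < target:
        visitors *= 1 + monthly_growth
        months += 1
    return months

for target in (80, 1_000, 100_000):
    print(f"{target:7d} visitors after {months_to_reach(target)} months")
```

Doubling from 40 to 80 visitors takes over a year, and reaching 1,000 takes five and a half years. The same 5% that feels like nothing at the start is what eventually produces the big numbers.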

How do you live on an exponential curve? You need extraordinary patience. While any exponential curve will eventually pass any linear curve, it may take a long time. If you’re making barely perceptible but compounding progress, be encouraged that you’re on the right curve. Eventually you’ll have all the growth you can handle. Realize that you may be having a harder time initially because you’re on the exponential curve rather than the linear curve.
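For an extreme illustration, take the two examples from earlier: $100 earning 3% compound interest versus an income of a million dollars a day. This sketch finds the first year in which the bank account is ahead. I am assuming annual compounding, 365 days per year, and that nothing is ever spent; the function name is hypothetical.

```python
def first_year_exponential_wins(principal=100.0, rate=0.03, income_per_year=365e6):
    """Return the first year >= 1 in which compound growth exceeds linear income.

    The search starts at year 1 because at year 0 the linear total is zero.
    """
    year = 1
    while principal * (1 + rate) ** year <= income_per_year * year:
        year += 1
    return year

print(first_year_exponential_wins())
```

The exponential curve does win, but only after several centuries. That is the kind of patience the word “eventually” can hide.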

How do you know whether you’re on an exponential curve? This is not as easy as it sounds. Because of random noise, it may be hard to tell from a small amount of data whether growth is linear or exponential, or even to tell growth from stagnation. Eventually the numbers will tell you. But until enough data come in to reveal what’s going on, look at the root causes of your growth. If you’re growing because customers are referring customers, that’s a recipe for exponential growth. If you’re growing because you’re working more hours, that’s linear growth.

Nothing grows exponentially forever. Word of mouth slows down when the message reaches saturation, when the talkers run into fewer people who haven’t heard. Rabbit farms slow down when they can’t feed all the rabbits. Most of the things we call exponential growth are more accurately logistic growth: exponential growth slows to linear growth, then linear growth begins to plateau.
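The word-of-mouth story can be sketched as a simple discrete model: each month the number of new hearers is proportional to the number of people already talking, scaled down by the fraction of the population that has not heard yet. The population size, starting count, and rate below are hypothetical values picked for illustration.

```python
def word_of_mouth(population=10_000, initial=10.0, rate=0.5, months=60):
    """Discrete logistic growth: talkers recruit hearers among those not yet reached."""
    heard = initial
    history = [heard]
    for _ in range(months):
        unreached_fraction = 1 - heard / population
        heard += rate * heard * unreached_fraction
        history.append(heard)
    return history

history = word_of_mouth()
for month in (0, 5, 10, 15, 20, 25, 30):
    gain = history[month + 1] - history[month]
    print(f"month {month:2d}: heard {history[month]:8.0f}, new this month {gain:7.0f}")
```

Early on the count grows by nearly 50% per month. The monthly gain peaks when about half the population has heard, then shrinks toward zero as the message saturates.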

How do you live on a logistic curve? Realize that initial exponential growth doesn’t last. Watch the numbers. They’ll tell you when you’ve gone from approximately exponential to approximately linear. Understand the mechanisms that turn exponential into logistic growth in your context.