How do you justify that distribution?

Someone asked me yesterday how people justify probability distribution assumptions. Sometimes the most mystifying assumption is the first one: “Assume X is normally distributed …” Here are a few answers.

  1. Sometimes distribution assumptions are not justified.
  2. Sometimes distributions can be derived from fundamental principles. For example, there are axioms that uniquely specify a Poisson distribution.
  3. Sometimes distributions are justified on theoretical grounds. For example, large samples and the central limit theorem together may justify assuming that something is normally distributed.
  4. Often the choice of distribution is somewhat arbitrary, chosen by intuition or for convenience, and then empirically shown to work well enough.
  5. Sometimes a distribution can be a bad fit and still work well, depending on what you’re asking of it.

The last point is particularly interesting. It’s not hard to imagine that a poor fit would produce poor results. It’s surprising when a poor fit produces good results. Here’s an example of the latter.

Suppose you are testing a new drug and hoping that it improves how long patients live. You want to stop the clinical trial early if it looks like patients are living no longer than they would have on standard treatment. There is a Bayesian method for monitoring such experiments that assumes survival times have an exponential distribution. But survival times are not exponentially distributed, not even close.

The method works well because of the question being asked. The method is not being asked to accurately model the distribution of survival times for patients in the trial. It is only being asked to determine whether a trial should continue or stop, and it does a good job of doing so. Simulations show that the method makes the right decision with high probability, even when the actual survival times are not exponentially distributed.
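To make that concrete, here is a minimal sketch of the idea — my toy illustration, not the actual monitoring method used in such trials. It puts a conjugate gamma prior on the rate of an assumed exponential model and asks only whether the posterior probability that mean survival beats the historical mean justifies continuing. The data fed to it are Weibull, not exponential.

```python
import math
import random

def post_prob_mean_exceeds(times, m0, prior_shape=1, prior_rate=1.0):
    """P(mean survival > m0 | data) under an exponential model with a
    Gamma(prior_shape, prior_rate) prior on the rate. This equals
    P(rate < 1/m0); the gamma CDF has a closed form for integer shape."""
    shape = prior_shape + len(times)      # posterior shape (kept integer)
    rate = prior_rate + sum(times)        # posterior rate
    x = rate / m0                         # gamma CDF argument, scaled
    term, tail = math.exp(-x), 0.0
    for k in range(shape):                # tail = P(Poisson(x) <= shape - 1)
        tail += term
        term *= x / (k + 1)
    return 1.0 - tail                     # = P(rate <= 1/m0)

def weibull_sample(n, k, scale, rng):
    # Inverse-transform sampling; any k != 1 gives a non-exponential shape.
    return [scale * (-math.log(1.0 - rng.random())) ** (1.0 / k) for _ in range(n)]

rng = random.Random(42)
m0 = 10.0                                  # historical mean survival, standard treatment

good = weibull_sample(50, 2.0, 20.0, rng)  # true mean about 17.7: drug helps
bad = weibull_sample(50, 2.0, 6.0, rng)    # true mean about 5.3: drug doesn't

for name, data in [("promising arm", good), ("futile arm", bad)]:
    p = post_prob_mean_exceeds(data, m0)
    print(f"{name}: P(mean > {m0}) = {p:.4f} -> {'continue' if p > 0.05 else 'stop'}")
```

Even though the exponential model is badly wrong about the shape of the survival curve, the continue/stop decision comes out right, which is all the method is being asked to do.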

Accuracy versus perceived accuracy

Commercial weather forecasters need to be accurate, but they also need to be perceived as being accurate, and sometimes the latter trumps the former.

For instance, the for-profit weather forecasters rarely predict exactly a 50% chance of rain, which might seem wishy-washy and indecisive to customers. Instead, they’ll flip a coin and round up to 60, or down to 40, even though this makes the forecasts both less accurate and less honest.

Forecasters also exaggerate small chances of rain, such as reporting 20% when they predict 5%.

People notice one type of mistake—the failure to predict rain—more than another kind, false alarms. If it rains when it isn’t supposed to, they curse the weatherman for ruining their picnic, whereas an unexpectedly sunny day is taken as a serendipitous bonus.

From The Signal and the Noise. The book gets some of its data from Eric Floehr of ForecastWatch. Read my interview with Eric here.

Robustness of simple rules

In his speech The dog and the Frisbee, Andrew Haldane argues that simple models often outperform complex models in complex situations. He cites as examples predicting sporting events, diagnosing heart attacks, locating serial criminals, picking stocks, and understanding spending patterns. The gist of his argument is this:

Complex environments often instead call for simple decision rules. That is because these rules are more robust to ignorance.

And yet behind every complex set of rules is a paper showing that it outperforms simple rules, under conditions of its author’s choosing. That is, the person proposing the complex model picks the scenarios for comparison. Unfortunately, the world throws at us scenarios not of our choosing. Simpler methods may perform better when model assumptions are violated. And model assumptions are always violated, at least to some extent.
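Haldane's point can be reproduced in miniature. In this sketch — my toy example, not one from the speech — the world really is a simple noisy line. The complex model, a degree-5 polynomial, wins on the training sample the modeler chose, just like the papers behind complex rule sets, but loses badly once the comparison includes scenarios outside that sample.

```python
import random

def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations
    (fine at this toy size; not numerically robust in general)."""
    m = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):                  # Gaussian elimination, partial pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):          # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))) / A[i][i]
    return coef

def peval(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))

true_f = lambda x: 1.0 + 2.0 * x          # the world is actually simple
rng = random.Random(0)
grid = [i / 20 for i in range(41)]        # x from 0 to 2: includes unseen scenarios

train_err = {1: 0.0, 5: 0.0}
world_err = {1: 0.0, 5: 0.0}
trials = 200
for _ in range(trials):
    xs = [rng.random() for _ in range(8)]            # training x confined to [0, 1]
    ys = [true_f(x) + rng.gauss(0, 1) for x in xs]   # noisy observations
    for deg in (1, 5):
        c = polyfit(xs, ys, deg)
        train_err[deg] += sum((peval(c, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        world_err[deg] += sum((peval(c, x) - true_f(x)) ** 2 for x in grid) / len(grid)

for deg in (1, 5):
    print(f"degree {deg}: train error {train_err[deg] / trials:.2f}, "
          f"error against the real world {world_err[deg] / trials:.2f}")
```

The complex model always looks better on the data it was tuned to, and much worse once the world supplies the test cases.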

Working to change the world

I recently read that Google co-founder Sergey Brin asked an audience whether they were working to change the world. He said that for 99.9999% of humanity, the answer is no.

I really dislike that question. It invites arrogance. Say yes and you’re one in a million. You’re a better person than the vast majority of humanity.

Focusing on doing enormous good can make us feel justified in neglecting small acts of goodness. Many have professed a love for Humanity and shown contempt for individual humans. “I’m trying to end poverty, cure cancer, and make the world safe for democracy; I shouldn’t be held to the same petty standards as those who are wasting their lives.”

To paraphrase Thomas Sowell, we should judge people by their means, not their ends, because most people don’t achieve their ends and all we’re left with is their means [1].

In context, Brin implies that only grand technological innovation is worthwhile, obviously a rather narrow perspective. Did Anne Frank make the world a better place by keeping a diary? I think so.

The opposite of a technologist might be a medieval literature professor. If you wanted to “change the world” the last thing you’d do might be to choose a career in medieval scholarship. And yet two of the most influential people of the 20th century—C. S. Lewis and J. R. R. Tolkien—were medieval literature professors.

It’s very hard to know what kind of impact you’re going to have in the world. The surest way to do great good is to focus first on doing good.

Related post: Here’s to the sane ones

* * *

[1] I think Thomas Sowell said something like this in the context of organizations rather than individuals, but I can’t find the quote.

The paper is too big

In response to the question “Why are default LaTeX margins so big?” Paul Stanley answers:

It’s not that the margins are too wide. It’s that the paper is too big!

This sounds flippant, but he gives a compelling argument that paper really is too big for how it is now used.

As is surely by now well-known, the real question is the size of the text block. That is a really important factor in legibility. As others have noted, the optimum line length is broadly somewhere between 60 characters and 75 characters.

Given reasonable sizes of font which are comfortable for reading at the distance we want to read at (roughly 9 to 12 point), there are only so many line lengths that make sense. If you take a book off your shelf, especially a book that you would actually read for a prolonged period of time, and compare it to a LaTeX document in one of the standard classes, you’ll probably notice that the line length is pretty similar.

The real problem is with paper size. As it happens, we have ended up with paper sizes that were never designed or adapted for printing with 10-12 point proportionally spaced type. They were designed for handwriting (which is usually much bigger) or for typewriters. Typewriters produced 10 or 12 characters per inch: so on (say) 8.5 inch wide paper, with 1 inch margins, you had 6.5 inches of type, giving … around 65 to 78 characters: in other words something pretty close to ideal. But if you type in a standard proportionally spaced font (worse, in Times—which is rather condensed because it was designed to be used in narrow columns) at 12 point, you will get about 90 to 100 characters in the line.
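The character counts in that paragraph are easy to check. In the sketch below the typewriter figures are exact, since fixed-pitch type has a definite character width; the 5-point average character width for 12-point Times is my rough assumption (Times is condensed, so its average advance is well under half the point size).

```python
POINTS_PER_INCH = 72
line_inches = 8.5 - 2 * 1.0   # 8.5 inch paper with 1 inch margins each side

def chars_per_line(avg_char_width_pt):
    return line_inches * POINTS_PER_INCH / avg_char_width_pt

# Typewriters: fixed pitch, so the "average width" is exact.
print(round(chars_per_line(72 / 10)))   # 10 cpi -> 65 characters
print(round(chars_per_line(72 / 12)))   # 12 cpi -> 78 characters
# 12 pt Times, assuming ~5 pt average advance width (my rough figure)
print(round(chars_per_line(5.0)))       # -> ~94 characters, past the 60-75 optimum
```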

He then gives six suggestions for what to do about this. You can see his answer for a full explanation. Here I’ll just summarize his points.

  1. Use smaller paper.
  2. Use long lines of text but extra space between lines.
  3. Use wide margins.
  4. Use margins for notes and illustrations.
  5. Use a two column format.
  6. Use large type.

Given these options, wide margins (as in #3 and #4) sound reasonable.

Author’s note

Here’s a great disclaimer from an article on MapReduce:

I wrote this essay specifically to be controversial. The views expressed herein are more extreme than what I believe personally, written primarily for the purposes of provoking discussion. If after reading this essay you have a strong reaction, then I’ve accomplished my goal :)

ABC vs FLT

There’s been a lot of buzz lately about Shinichi Mochizuki’s proposed proof of the ABC conjecture, a conjecture in number theory named after the variables used to state it. Rather than explaining the conjecture here, I recommend a blog post by Brian Hayes.

The ABC conjecture has been compared to Fermat’s Last Theorem (FLT). Both are famous number theory problems, fairly easy to state but notoriously hard to prove. (FLT is easy to state. ABC takes a little more work, but is accessible to a patient teenager.) And both have been proved recently, assuming the ABC proof holds up. But here are three contrasts between ABC and FLT.

  1. FLT was proposed in 1637, proved in 1995. ABC was proposed in 1985, possibly proved in 2012.
  2. The conclusion of FLT is not that important, but the proof is very important. The conclusion of ABC is important, and nobody knows about the proof yet.
  3. The FLT proof established deep connections between widely known areas of math. The proof of ABC is comparatively self-contained, relying on a new, specialized area of math that few understand.

I’m not an expert in number theory, not by a long shot, but I don’t believe many proofs cite Fermat’s Last Theorem per se. Instead, there are proofs that depend on the more abstract results that Wiles proved, results that imply FLT as a corollary. And long before Wiles, a tremendous amount of math was motivated by attempts to prove FLT.

The ABC conjecture is a more technical statement, as its name might imply. The conjecture itself has wide-ranging applications, if it is true. The proof may also be important, but apparently nobody knows yet. The proof of FLT brought together a lot of existing machinery, but the ABC proof created a lot of new machinery that few understand.

True versus Publishable

This weekend John Myles White and I discussed true versus publishable results in the comments to an earlier post. Methods that make stronger modeling assumptions lead to more statistical confidence, but less actual confidence. That is, they are more likely to produce positive results, but less likely to produce correct results.

JDC: If some scientists were more candid, they’d say “I don’t care whether my results are true, I care whether they’re publishable. So I need my p-value less than 0.05. Make whatever strong assumptions you have to.”

JMW: My sense of statistical education in the sciences is basically Upton Sinclair’s view of the Gilded Age: “It is difficult to get a man to understand something when his salary depends upon his not understanding it.”

Perhaps I should have said that scientists know that their conclusions are true. They just need the statistics to confirm what they know.
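A toy simulation makes the trade-off concrete. In the sketch below — my illustrative example, not one from the discussion — the null hypothesis is true, but the data are noisier than the analyst assumes. The test that “knows” the variance (a strong, wrong assumption) reaches significance far more often than the nominal 5%; the test that estimates the variance from the data stays roughly honest.

```python
import math
import random
import statistics

rng = random.Random(1)
n, trials, crit = 50, 2000, 1.96          # crit: two-sided 5% normal critical value
true_sd = 2.0                             # the noise the world actually produces
assumed_sd = 1.0                          # the analyst's convenient assumption

strong_hits = weak_hits = 0
for _ in range(trials):
    data = [rng.gauss(0.0, true_sd) for _ in range(n)]   # the null is TRUE
    mean = statistics.fmean(data)
    # Strong assumption: variance treated as known (and wrong)
    if abs(mean) / (assumed_sd / math.sqrt(n)) > crit:
        strong_hits += 1
    # Weaker assumption: variance estimated from the data
    # (the normal critical value is a fine approximation to the t at n = 50)
    if abs(mean) / (statistics.stdev(data) / math.sqrt(n)) > crit:
        weak_hits += 1

print(f"false positive rate, strong assumptions: {strong_hits / trials:.3f}")
print(f"false positive rate, weak assumptions:   {weak_hits / trials:.3f}")
```

Stronger assumptions buy more “significant” findings, and more wrong ones.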

Brian Nosek talks about this theme on the EconTalk podcast. He discusses the conflict of interest between creating publishable results and trying to find out what is actually true. However, he doesn’t just grouse about the problem; he offers specific suggestions for how to improve scientific publishing.

Related post: More theoretical power, less real power

Mental indigestion

From The Future Does Not Compute:

The critical law at work here is that whatever I take in without having fully digested it — whatever I receive in less than full consciousness — does not therefore lose its ability to act on me. It simply acts from beyond the margin of my awareness. … To open myself inattentively to a chaotic world, superficially taking in “one damned thing after another,” is to guarantee a haphazard behavior controlled by that world rather than by my own, wide-awake choices.