From the category archives:

Statistics

Advantages of crude models

by John on May 25, 2011

One advantage of crude models is that we know they are crude and will not try to read too much into them. With more sophisticated models,

… there is an awful temptation to squeeze the lemon until it is dry and to present a picture of the future which through its very precision and verisimilitude carries conviction. Yet a man who uses an imaginary map, thinking it is a true one, is like to be worse off than someone with no map at all; for he will fail to inquire whenever he can, to observe every detail on his way, and to search continuously with all his senses and all his intelligence for indications of where he should go.

From Small is Beautiful by E. F. Schumacher.

Crude models are easier to implement. They may also be more robust and better descriptions of reality.

Obviously crude models are not always better. But I like to have some evidence that a complex model is worthwhile before I invest too much effort in it. And I’m well aware of forces that reward complexity for its own sake.

{ 7 comments }

Works well versus well understood

by John on May 10, 2011

While I was looking up the Tukey quote in my earlier post, I ran across another of his quotes:

The test of a good procedure is how well it works, not how well it is understood.

At some level, it’s hard to argue against this. Statistical procedures operate on empirical data, so it makes sense that the procedures themselves be evaluated empirically.

But I question whether we really know that a statistical procedure works well if it isn’t well understood. Specifically, I’m skeptical of complex statistical methods whose only credentials are a handful of simulations. “We don’t have any theoretical results, but hey, it works well in practice. Just look at the simulations.”

Every method works well on the scenarios its author publishes, almost by definition. If the method didn’t handle a scenario well, the author would publish a different scenario. Even if the author didn’t select the most flattering scenarios, he or she may simply not have considered unflattering scenarios. The latter is particularly understandable, almost inevitable.

Simulation results would have more credibility if an adversary rather than an advocate chose the scenarios. Even so, an adversary and an advocate may share the same blind spots and not explore certain situations. Unless there’s a way to argue that a set of scenarios adequately samples the space of possible inputs, it’s hard to have a great deal of confidence in a method based on simulation results alone.

Related posts:

Buggy code is biased code
Software sins of omission
Occam’s razor and Bayes’ theorem

{ 10 comments }

Move on to the next question

by John on May 9, 2011

Here’s a recent discussion from Math Overflow.

Q: I have some data points and, when I plot them on R, it looks like a normal distribution. I want to know how well my data fits the normal distribution. What kind of test should I do?

A: There’s actually a much broader question that you should be asking yourself here: does it matter whether your data really is normally distributed, or will the procedures that you’re going to perform on the data be reasonably robust in the presence of a distribution that is only approximately normal? …

The person asking the question was already satisfied that his data were approximately normal. So it was time to move on to the next question: Does what I want to do next work well for approximately normal data? (There’s no point asking whether your data is normal; it’s not. Normality is an idealization.)

We’re often tempted to add decimal places to the answer to one question instead of moving on to the next question. Maybe we don’t even realize what the next question should be. Or maybe we do know but we want to stay with the familiar. In either case, this quote from John Tukey comes to mind.

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.

Related post:

What distribution does my data have?

{ 4 comments }

Teaching Bayesian stats backward

by John on April 20, 2011

Most presentations of Bayesian statistics I’ve seen start with elementary examples of Bayes’ Theorem. And most of these use the canonical example of testing for rare diseases. But the connection between these examples and Bayesian statistics is not obvious at first. Maybe this isn’t the best approach.

What if we begin with the end in mind? Bayesian calculations produce posterior probability distributions on parameters. An effective way to teach Bayesian statistics might be to start there. Suppose we had probability distributions on our parameters. Never mind where they came from. Never mind classical objections that say you can’t do this. What if you could? If you had such distributions, what could you do with them?

For starters, point estimation and interval estimation become trivial. You could, for example, use the distribution mean as a point estimate and the range between two quantiles as an interval estimate. The distributions tell you far more than point estimates or interval estimates could; these estimates are simply summaries of the information contained in the distributions.
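Here is a minimal sketch of that idea in Python. It assumes, purely for illustration, that the posterior on a success probability happens to be a Beta(8, 4) distribution; that choice, the number of draws, and the function name are hypothetical and not taken from any real analysis. Once you have draws from the distribution, the summaries fall out almost for free.

```python
import random
import statistics

def summarize_posterior(draws, lower=0.025, upper=0.975):
    """Return (mean, (lo, hi)), where lo and hi are empirical quantiles of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    return statistics.fmean(ordered), (lo, hi)

rng = random.Random(0)
# Hypothetical posterior: Beta(8, 4). Never mind where it came from.
draws = [rng.betavariate(8, 4) for _ in range(20_000)]

point, (lo, hi) = summarize_posterior(draws)
print(f"point estimate (posterior mean): {point:.3f}")
print(f"95% interval estimate: ({lo:.3f}, {hi:.3f})")
```

The same draws could answer other questions too, such as the probability that the parameter exceeds 0.5, which is the sense in which the distribution carries more information than either summary.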

It makes logical sense to start with Bayes’ Theorem since that’s the tool used to construct posterior distributions. But I think it makes pedagogical sense to start with the posterior distribution and work backward to how one would come up with such a thing.

Bayesian statistics is so named because Bayes’ Theorem is essential to its calculations. But that’s a little like calling classical statistics “Central Limitist” statistics because it relies heavily on the Central Limit Theorem.

The key idea of Bayesian statistics is to represent all uncertainty by probability distributions. That idea can be obscured by an early emphasis on calculations.

Related posts:

Interview with David Spiegelhalter
Occam’s razor and Bayes’ theorem
Four reasons to use Bayesian inference

{ 11 comments }

Significance testing and Congress

by John on April 14, 2011

The US Supreme Court’s criticism of significance testing has been in the news lately. Here’s a criticism of significance testing involving the US Congress. Consider the following syllogism.

  1. If a person is an American, he is not a member of Congress.
  2. This person is a member of Congress.
  3. Therefore he is not American.

The initial premise is false, but the reasoning is correct if we assume the initial premise is true.

The premise that Americans are never members of Congress is clearly false. But it’s almost true! The probability of an American being a member of Congress is quite small, about 535/309,000,000. So what happens if we try to salvage the syllogism above by inserting “probably” in the initial premise and conclusion?

  1. If a person is an American, he is probably not a member of Congress.
  2. This person is a member of Congress.
  3. Therefore he is probably not American.

What went wrong? The probability is backward. We want to know the probability that someone is American given he is a member of Congress, not the probability he is a member of Congress given he is American.

Science continually uses flawed reasoning analogous to the example above. We start with a “null hypothesis,” a hypothesis we seek to disprove. If our data are highly unlikely assuming this hypothesis, we reject that hypothesis.

  1. If the null hypothesis is correct, then these data are highly unlikely.
  2. These data have occurred.
  3. Therefore, the null hypothesis is highly unlikely.

Again the probability is backward. We want to know the probability of the hypothesis given the data, not the probability of the data given the hypothesis.

We can’t reject a null hypothesis just because we’ve seen data that are rare under this hypothesis. Maybe our data are even more rare under the alternative. It is rare for an American to be in Congress, but it is even more rare for someone who is not American to be in the US Congress!
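The Congress example can be worked through with Bayes’ theorem in a few lines of Python. This is only a rough sketch: the figures for Americans and members of Congress come from the text above, while the world population of about 6.9 billion is my own ballpark assumption for 2011.

```python
# Rough figures; WORLD is an assumed ballpark value, not from the source.
AMERICANS = 309_000_000
WORLD = 6_900_000_000
CONGRESS = 535  # every member of the US Congress is American

p_american = AMERICANS / WORLD
p_congress_given_american = CONGRESS / AMERICANS
p_congress_given_not_american = 0.0

# Bayes' theorem: P(American | Congress)
numerator = p_congress_given_american * p_american
evidence = numerator + p_congress_given_not_american * (1 - p_american)
p_american_given_congress = numerator / evidence

print(f"P(Congress | American) = {p_congress_given_american:.2e}")
print(f"P(American | Congress) = {p_american_given_congress:.2f}")
```

The first probability is tiny and the second is as large as a probability can be. Knowing only that the data are rare under one hypothesis says nothing until you compare with how rare they are under the alternative.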

I found this illustration in The Earth is Round (p < 0.05) by Jacob Cohen (1994). Cohen in turn credits Pollard and Richardson (1987) in his references.

Related posts:

How insignificant is significance testing?
Five criticisms of significance testing
Most published research results are false
Classical statistics in a nutshell

{ 8 comments }

Luis Pericchi sent me a brief note commenting on the recent US Supreme Court decision involving statistical significance and medical reporting. Here is his paper, about a page and a half.

How insignificant is statistical significance? (PDF)

Related post: Significance testing and Congress

{ 1 comment }

Saved by symmetry

by John on March 31, 2011

When I solve a problem by appealing to symmetry, students’ jaws drop. They look at me as if I’d pulled a rabbit out of a hat.

I used to think of these tricks as common knowledge, but now I think they’re common knowledge in some circles (e.g. physics) and not as common in others. These tricks are simple, but not as many people as I’d thought have been trained to spot opportunities to apply them.


{ 10 comments }

A support one-liner

by John on March 15, 2011

This morning I had a fun support request related to our software. The exchange took place over email but it could have fit into a couple Twitter messages. Would that all requests could be answered so succinctly.

Question:

Do you have R code to compute P(X > Y) where X ~ gamma(ax, bx) and Y ~ gamma(ay, by)?

Response:

ineq <- function(ax, bx, ay, by) pbeta(bx/(bx+by), ay, ax)

For more on the problem and the solution, see Exact calculation of inequality probabilities.
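One way to sanity-check the one-liner is by simulation. The sketch below is in Python rather than R and uses only the standard library. It assumes bx and by are scale parameters, which is how random.gammavariate treats its second argument and is the reading under which the formula holds. The parameter values are arbitrary, and the beta CDF is approximated with a simple midpoint rule that is only reasonable when both shape parameters are at least 1.

```python
import math
import random

def beta_cdf(x, a, b, steps=20_000):
    """Midpoint-rule approximation to the Beta(a, b) CDF at x (assumes a, b >= 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * h

def ineq(ax, bx, ay, by):
    """P(X > Y) for X ~ gamma(shape ax, scale bx) and Y ~ gamma(shape ay, scale by)."""
    return beta_cdf(bx / (bx + by), ay, ax)

def ineq_mc(ax, bx, ay, by, n=200_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(rng.gammavariate(ax, bx) > rng.gammavariate(ay, by) for _ in range(n))
    return hits / n

exact = ineq(3.0, 2.0, 4.0, 1.5)
approx = ineq_mc(3.0, 2.0, 4.0, 1.5)
print(f"formula: {exact:.4f}, simulation: {approx:.4f}")
```

If the two numbers disagree by much more than Monte Carlo noise, the likely culprit is a rate-versus-scale mix-up in the parameterization.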

Related links:

Inequality Calculator software
Blog posts on random inequalities

{ 4 comments }

Absence of evidence

by John on February 22, 2011

Here’s a little saying that irritates me:

Absence of evidence is not evidence of absence.

It’s the kind of thing a Sherlock Holmes-like character might say in a detective novel. The idea is that we can’t be sure something doesn’t exist just because we haven’t seen it yet.

What bothers me is that the statement misuses the word “evidence.” The statement would be correct if we substituted “proof” for “evidence.” We can’t conclude with absolute certainty that something doesn’t exist just because we haven’t yet proved that it does. But evidence is not the same as proof.

Why do we believe that dodo birds are extinct? Because no one has seen one in three centuries. That is, there is an absence of evidence that they exist. That is tantamount to evidence that they do not exist. It’s logically possible that a dodo bird is alive and well somewhere, but there is overwhelming evidence to suggest this is not the case.

Evidence can lead to the wrong conclusion. Why did scientists believe that the coelacanth was extinct? Because no one had seen one except in fossils. The species was believed to have gone extinct 65 million years ago. But in 1938 a fisherman caught one. Absence of evidence is not proof of absence.

coelacanth, a fish once thought to be extinct

Though it is not proof, absence of evidence is unusually strong evidence due to a subtle statistical result. Compare the following two scenarios.

Scenario 1: You’ve sequenced the DNA of a large number of prostate tumors and found that not one had a particular genetic mutation. How confident can you be that prostate tumors never have this mutation?

Scenario 2: You’ve found that 40% of prostate tumors in your sample have a particular mutation. How confident can you be that 40% of all prostate tumors have this mutation?

It turns out you can have more confidence in the first scenario than the second. If you’ve tested N subjects and not found the mutation, the length of your confidence interval around zero is proportional to 1/N. But if you’ve tested N subjects and found the mutation in 40% of subjects, the length of your confidence interval around 0.40 is proportional to 1/√N. So, for example, if N = 10,000 then the former interval has length on the order of 1/10,000 while the latter interval has length on the order of 1/100. The result for the first scenario is known as the rule of three: the upper end of a 95% interval is roughly 3/N. You can find both a frequentist and a Bayesian justification of the rule here.
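To make the comparison concrete, here is a small Python sketch. It assumes 95% confidence throughout. For the zero-event case it computes the exact one-sided bound 1 − 0.05^(1/N) next to the 3/N shortcut, and for the 40% case it uses the usual normal-approximation (Wald) interval. The function names are made up for illustration.

```python
import math

def zero_event_upper_bound(n, confidence=0.95):
    """Exact upper bound on the event rate when 0 events are seen in n trials."""
    return 1 - (1 - confidence) ** (1 / n)

def rule_of_three(n):
    """Approximate 95% upper bound when 0 events are seen in n trials."""
    return 3 / n

def wald_interval_length(p_hat, n, z=1.96):
    """Length of the normal-approximation 95% interval around p_hat."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: "
          f"zero-event bound {zero_event_upper_bound(n):.2e} "
          f"(3/N = {rule_of_three(n):.2e}), "
          f"interval length at 40% {wald_interval_length(0.4, n):.2e}")
```

Multiplying N by 100 shrinks the zero-event bound by a factor of about 100 but shrinks the interval around 40% by only a factor of 10.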

Absence of evidence is unusually strong evidence that something is at least rare, though it’s not proof. Sometimes you catch a coelacanth.

Related posts:

Estimating the chances of something that hasn’t happened
Complementary validation

{ 27 comments }

Like Laplace, only more so

by John on February 17, 2011

The Laplace distribution is pointy in the middle and fat in the tails relative to the normal distribution. This post is about a probability distribution that is more pointy in the middle and fatter in the tails.


{ 8 comments }

The end of hard-edged science?

by John on February 14, 2011

Bradley Efron says that science is moving away from things like predicting sunrise times and toward predicting things like the weather. The trend is away from studying precisely predictable systems, what Efron calls “hard-edged science,” and toward studying systems “where predictability is tempered by a heavy dose of randomness.”

Hard-edged science still dominates public perceptions, but the attention of modern scientists has swung heavily toward rainfall-like subjects, the kind where random behavior plays a major role. … Deterministic Newtonian science is majestic, and the basis of modern science too, but a few hundred years of it pretty much exhausted nature’s storehouse of precisely predictable events. Subjects like biology, medicine, and economics require a more flexible scientific world view, the kind we statisticians are trained to understand.

Certainly there is increased interest in systems containing “a heavy dose of randomness” but can we really say that we have “pretty much exhausted nature’s storehouse of precisely predictable events”?

Source: Modern Science and the Bayesian-Frequentist Controversy

Related posts:

Scientific results fading over time
Occam’s razor and Bayes’ theorem
The law of medium numbers

{ 11 comments }

Interview with David Spiegelhalter

by John on February 2, 2011

Samuel Hansen interviews David Spiegelhalter on his mathematical podcast Strongly Connected Components. From the show notes:

On today’s episode of Strongly Connected Components Samuel Hansen called up the Winton Professor for the Public Understanding of Risk, as well as Senior Scientist in the MRC Biostatistics Unit, David Spiegelhalter. They discussed the true meaning of risk, the importance of the Bayesian Method, how to get a lot of citations, and even a bit about the bookies.

{ 1 comment }

When it works, it works really well

by John on January 27, 2011

Stephen Stigler [1] compares least-squares methods to the iPhone:

In the United States many consumers are entranced by the magic of the new iPhone, even though they can only use it with the AT&T system, a system noted for spotty coverage — even no receivable signal at all under some conditions. But the magic available when it does work overwhelms the very real shortcomings. Just so, least-squares will remain the tool of choice unless someone concocts a robust methodology that can perform the same magic, a step that would require the suspension of the laws of mathematics.

In other words, least-squares, like the iPhone, works so well when it does work that it’s OK that it fails miserably now and then. Maybe so, but that depends on context.

In his quote, Stigler argues that Americans feel that missing a phone call occasionally is an acceptable trade-off for the features of the iPhone. Many people would agree. But if you’re on a transplant waiting list, you might prefer more reliable coverage to a nicer phone.

It’s not enough to talk about probabilities of failure without also talking about consequences of failure. For example, the consequences of missing a phone call are greater for some people than for others.

Least-squares is a mathematically convenient way to place a cost on errors: the cost is proportional to the square of the size of the error. That’s often reasonable in application, but not always. In some applications, the cost is simply proportional to the size of the error. In other applications, it doesn’t matter how large an error is once it is above some threshold. Sometimes the cost of errors is asymmetric: over-estimating has a different cost than under-estimating by the same amount. Sometimes you’re more worried about the worst case than the average case. One size does not fit all.
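A small Python sketch shows how the choice of cost changes the answer: for the same data, the best single-number summary depends on the loss. The data set, the candidate grid, the 3-to-1 asymmetry, and the threshold of 1 are all made up for illustration, and the brute-force search is just the simplest thing that works here.

```python
def squared(err):
    return err ** 2

def absolute(err):
    return abs(err)

def asymmetric(err, over=1.0, under=3.0):
    """err = estimate - truth; under-estimating costs three times as much here."""
    return over * err if err >= 0 else -under * err

def threshold(err, limit=1.0):
    """Any error beyond the limit costs the same, no matter how large."""
    return 0.0 if abs(err) <= limit else 1.0

def best_estimate(data, loss, candidates):
    """Candidate that minimizes total loss over the data (brute-force search)."""
    return min(candidates, key=lambda c: sum(loss(c - x) for x in data))

data = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 25.0]   # one large outlier
grid = [i / 10 for i in range(0, 251)]         # candidates 0.0, 0.1, ..., 25.0

results = {}
for name, loss in [("squared", squared), ("absolute", absolute),
                   ("asymmetric", asymmetric), ("threshold", threshold)]:
    results[name] = best_estimate(data, loss, grid)
    print(f"{name:>10}: {results[name]}")
```

Squared error is pulled toward the outlier because it lands on the mean, absolute error lands on the median, and the asymmetric cost lands on an upper quantile. The threshold cost has ties in this data set, so the search simply reports the first minimizer it finds.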

[1] Stephen M. Stigler, The Changing History of Robustness, American Statistician, Vol. 64, No. 4. November 2010. (Written before Verizon announced it would be supporting the iPhone)

Related posts:

More theoretical power, less real power
Cost-benefit analysis versus benefit-only analysis

{ 7 comments }

More theoretical power, less real power

by John on January 24, 2011

Suppose you’re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision in theory while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of “in theory” changes because you have two competing theories.

When you compare the power of two methods, you’re evaluating each method’s probability of success under its own assumptions. In other words, you’re picking the method that has the better opinion of itself. Thus the more powerful method is not necessarily the method that has the better chance of leading you to a correct conclusion.

Comparing power alone is not enough. You also need to evaluate whether a method makes realistic assumptions and whether it is robust to deviations from its assumptions.
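The following Python simulation is a rough sketch of the point, not a general result. It compares a one-sample t-test with a sign test for the hypothesis that a location parameter is zero. The sample size, the shift of 0.5, the choice of standard Cauchy noise as the "unrealistic assumptions" case, and the number of repetitions are all my own illustrative assumptions.

```python
import math
import random

N = 30          # sample size
SHIFT = 0.5     # true location under the alternative
T_CRIT = 2.045  # two-sided 5% critical value for Student t with 29 df

def sign_test_cutoff(n, alpha=0.05):
    """Smallest k such that P(X >= k) <= alpha/2 for X ~ Binomial(n, 1/2)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += math.comb(n, k) / 2 ** n
        if tail > alpha / 2:
            return k + 1
    return 0

def t_test_rejects(sample):
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return abs(mean) > T_CRIT * math.sqrt(var / n)

def sign_test_rejects(sample, cutoff):
    positives = sum(x > 0 for x in sample)
    return positives >= cutoff or positives <= len(sample) - cutoff

def power(draw, reps=4000, seed=0):
    """Fraction of simulated samples in which each test rejects 'location = 0'."""
    rng = random.Random(seed)
    cutoff = sign_test_cutoff(N)
    t_hits = sign_hits = 0
    for _ in range(reps):
        sample = [SHIFT + draw(rng) for _ in range(N)]
        t_hits += t_test_rejects(sample)
        sign_hits += sign_test_rejects(sample, cutoff)
    return t_hits / reps, sign_hits / reps

def normal_noise(rng):
    return rng.gauss(0, 1)

def cauchy_noise(rng):
    # Standard Cauchy via the inverse CDF: very heavy tails.
    return math.tan(math.pi * (rng.random() - 0.5))

t_norm, sign_norm = power(normal_noise)
t_heavy, sign_heavy = power(cauchy_noise)
print(f"normal noise: t-test {t_norm:.2f}, sign test {sign_norm:.2f}")
print(f"Cauchy noise: t-test {t_heavy:.2f}, sign test {sign_heavy:.2f}")
```

Under normal noise, the t-test's own assumption, the t-test rejects more often. Under heavy-tailed noise the ranking flips, so the method with more power on paper can be the one less likely to detect a real effect.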

Related posts:

Most published research results are false

Canonical examples from robust statistics

{ 6 comments }

A couple preprints

by John on January 20, 2011

Here are a couple new preprints.

Block-adaptive randomization.
A proposed method for limiting the size of runs in a response-adaptive clinical trial.

Skeptical and optimistic robust priors for clinical trials.
Joint work with Jairo Fúquene and Luis Pericchi from University of Puerto Rico.

{ 1 comment }