From the category archives:

Statistics

Saved by symmetry

by John on March 31, 2011

When I solve a problem by appealing to symmetry, students’ jaws drop. They look at me as if I’d pulled a rabbit out of a hat.

I used think of these tricks as common knowledge, but now I think they’re common knowledge in some circles (e.g. physics) and not as common in others. These tricks are simple, but not as many people as I’d thought have been trained to spot opportunities to apply them.

[click to continue...]

{ 10 comments }

A support one-liner

by John on March 15, 2011

This morning I had a fun support request related to our software. The exchange took place over email but it could have fit into a couple Twitter messages. Would that all requests could be answered so succinctly.

Question:

Do you have R code to compute P(X > Y) where X ~ gamma(ax, bx) and Y ~ gamma(ay, by)?

Response:

ineq <- function(ax, bx, ay, by) pbeta(bx/(bx+by), ay, ax)

For more on the problem and the solution, see Exact calculation of inequality probabilities.

Related links:

Inequality Calculator software
Blog posts on random inequalities

{ 4 comments }

Absence of evidence

by John on February 22, 2011

Here’s a little saying that irritates me:

Absence of evidence is not evidence of absence.

It’s the kind of thing a Sherlock Holmes-like character might say in a detective novel. The idea is that we can’t be sure something doesn’t exist just because we haven’t seen it yet.

What bothers me is that the statement misuses the word “evidence.” The statement would be correct if we substituted “proof” for “evidence.” We can’t conclude with absolute certainty that something doesn’t exist just because we haven’t yet proved that it does. But evidence is not the same as proof.

Why do we believe that dodo birds are extinct? Because no one has seen one in three centuries. That is, there is an absence of evidence that they exist. That is tantamount to evidence that they do not exist. It’s logically possible that a dodo bird is alive and well somewhere, but there is overwhelming evidence to suggest this is not the case.

Evidence can lead to the wrong conclusion. Why did scientists believe that the coelacanth was extinct? Because no one had seen one except in fossils. The species was believed to have gone extinct 65 million years ago. But in 1938 a fisherman caught one. Absence of evidence is not proof of absence.

coelacanth, a fish once thought to be extinct

Though it is not proof, absence of evidence is unusually strong evidence due to subtle statistical result. Compare the following two scenarios.

Scenario 1: You’ve sequenced the DNA of a large number prostate tumors and found that not one had a particular genetic mutation. How confident can you be that prostate tumors never have this mutation?

Scenario 2: You’ve found that 40% of prostate tumors in your sample have a particular mutation. How confident can you be that 40% of all prostate tumors have this mutation?

It turns out you can have more confidence in the first scenario than the second. If you’ve tested N subjects and not found the mutation, the length of your confidence interval around zero is proportional to N. But if you’ve tested N subjects and found the mutation in 40% of subjects, the length of your confidence interval around 0.40 is proportional to √N. So, for example, if N = 10,000 then the former interval has length on the order of 1/10,000 while the latter interval has length on the order of 1/100. This is known as the rule of three. You can find both a frequentist and a Bayesian justification of the rule here.

Absence of evidence is unusually strong evidence that something is at least rare, though it’s not proof. Sometimes you catch a coelacanth.

Related posts:

Estimating the chances of something that hasn’t happened
Complementary validation

{ 27 comments }

Like Laplace, only more so

by John on February 17, 2011

The Laplace distribution is pointy in the middle and fat in the tails relative to the normal distribution.This post is about a probability distribution that is more pointy in the middle and fatter in the tails.

[click to continue...]

{ 8 comments }

The end of hard-edged science?

by John on February 14, 2011

Bradley Efron says that science is moving away from things like predicting sunrise times and toward predicting things like the weather. The trend is away from studying precisely predictable systems, what Efron calls “hard-edged science,” and toward studying systems “where predictability is tempered by a heavy dose of randomness.”

Hard-edged science still dominates public perceptions, but the attention of modern scientists has swung heavily toward rainfall-like subjects, the kind where random behavior plays a major role. … Deterministic Newtonian science is majestic, and the basis of modern science too, but a few hundred years of it pretty much exhausted nature’s storehouse of precisely predictable events. Subjects like biology, medicine, and economics require a more flexible scientific world view, the kind we statisticians are trained to understand.

Certainly there is increased interest in systems containing “a heavy dose of randomness” but can we really say that we have “pretty much exhausted nature’s storehouse of precisely predictable effects”?

Source: Modern Science and the Bayesian-Frequentist Controversy

Related posts:

Scientific results fading over time
Occam’s razor and Bayes’ theorem
The law of medium numbers

{ 11 comments }

Interview with David Spiegelhalter

by John on February 2, 2011

Samuel Hansen interviews David Spiegelhalter on his mathematical podcast Strongly Connected Components. From the show notes:

On today’s episode of Strongly Connected Components Samuel Hansen called up the Winton Professor for the Public Understanding of Risk, as well as Senior Scientist in the MRC Biostatistics Unit, David Spiegelhalter. They discussed the true meaning of risk, the importance of the Bayesian Method, how to get a lot of citations, and even a bit about the bookies.

{ 1 comment }

When it works, it works really well

by John on January 27, 2011

Stephen Stigler [1] compares least-squares methods to the iPhone:

In the United States many consumers are entranced by the magic of the new iPhone, even though they can only use it with the AT&T system, a system noted for spotty coverage — even no receivable signal at all under some conditions. But the magic available when it does work overwhelms the very real shortcomings. Just so, least-squares will remain the tool of choice unless someone concocts a robust methodology that can perform the same magic, a step that would require the suspension of the laws of mathematics.

In other words, least-squares, like the iPhone, works so well when it does work that it’s OK that it fails miserably now and then. Maybe so, but that depends on context.

In his quote, Stigler argues that Americans feel that missing a phone call occasionally is an acceptable trade-off for the features of the iPhone. Many people would agree. But if you’re If you’re on a transplant waiting list, you might prefer more reliable coverage to a nicer phone.

It’s not enough to talk about probabilities of failure without also talking about consequences of failure. For example, the consequences of missing a phone call are greater for some people than for others.

Least-squares is a mathematically convenient way to place a cost on errors: the cost is proportional to the square of the size of the error. That’s often reasonable in application, but not always. In some applications, the cost is simply proportional to the size of error. In other applications, it doesn’t matter how large an error is once it above some threshold. Sometimes the cost of errors is asymmetric: over-estimating has a different cost than under-estimating by the same amount. Sometimes you’re more worried about the worst case than the average case. One size does not fit all.

[1] Stephen M. Stigler, The Changing History of Robustness, American Statistician, Vol. 64, No. 4. November 2010. (Written before Verizon announced it would be supporting the iPhone)

Related posts:

More theoretical power, less real power
Cost-benefit analysis versus benefit-only analysis

{ 7 comments }

More theoretical power, less real power

by John on January 24, 2011

Suppose you’re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision in theory while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of “in theory” changes because you have two competing theories.

When you compare the power of two methods, you’re evaluating each method’s probability of success under its own assumptions. In other words, you’re picking the method that has the better opinion of itself. Thus the more powerful method is not necessarily the method that has the better chance of leading you to a correct conclusion.

Comparing power alone is not enough. You also need to evaluate whether a method makes realistic assumptions and whether it is robust to deviations from its assumptions.

Related posts:

Most published research results are false

Canonical examples from robust statistics

{ 6 comments }

A couple preprints

by John on January 20, 2011

Here are a couple new preprints.

Block-adaptive randomization.
A proposed method for limiting the size of runs in a response-adaptive clinical trial.

Skeptical and optimistic robust priors for clinical trials.
Joint work with Jairo Fúquene and Luis Pericchi from University of Puerto Rico.

{ 1 comment }

Fitting an elephant

by John on January 18, 2011

“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” — John von Neumann

Related post: Occam’s razor and Bayes’ theorem

{ 2 comments }

Scientific results fading over time

by John on January 17, 2011

A recent article in The New Yorker gives numerous examples of scientific results fading over time. Effects that were large when first measured become smaller in subsequent studies. Firmly established facts become doubtful. It’s as if scientific laws are being gradually repealed. This phenomena is known as “the decline effect.” The full title of the article is The decline effect and the scientific method.

The article brings together many topics that have been discussed here: regression to the mean, publication bias, scientific fashion, etc. Here’s a little sample.

“… when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … After a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.

This excerpt happens to be talking about “fluctuating asymmetry,” the idea that animals prefer more symmetric mates because symmetry is a proxy for good genes. (I edited out references to fluctuating asymmetry from the quote to emphasize that the remarks could equally apply to any number of topics. ) Fluctuating asymmetry was initially confirmed by numerous studies, but then the tide shifted and more studies failed to find the effect.

When such a shift happens, it would be reassuring to believe that the initial studies were simply wrong and that the new studies are right. But both the positive and negative results confirmed the prevailing view at the time they were published. There’s no reason to believe the latter studies are necessarily more reliable.

Related posts:

Why microarray study conclusions are so often wrong
Popular research areas produce more false results
Five criticisms of significance testing

{ 6 comments }

Occam’s razor and Bayes’ theorem

by John on January 12, 2011

Occam’s razor says that if two models fit equally well, the simpler model is likely to be a better description of reality. Why should that be?

A paper by Jim Berger suggests a Bayesian justification of Occam’s razor: simpler hypotheses have higher posterior probabilities when they fit well.

A simple model makes sharper predictions than a more complex model. For example, consider fitting a linear model and a cubic model. The cubic model is more general and fits more data. The linear model is more restrictive and hence easier to falsify. But when the linear and cubic models both fit, Bayes’ theorem “rewards” the linear model for making a bolder prediction. See Berger’s paper for a details and examples.

From the conclusion of the paper:

Ockham’s razor, far from being merely an ad hoc principle, can under many practical situations in science be justified as a consequence of Bayesian inference. Bayesian analysis can shed new light on what the notion of “simplest” hypothesis consistent with the data actually means.

Related links:

How loud is the evidence?
Blog posts on Bayesian statistics

{ 15 comments }

Bayesian methods at the end

by John on December 16, 2010

I was looking at the preface of an old statistic book and read this:

The Bayesian techniques occur at the end of each chapter; therefore they can be omitted if time does not permit their inclusion.

This approach is typical. Many textbooks present frequentist statistics with a little Bayesian statistics at the end of each section or at the end of the book.

There are a couple ways to look at that. One is simply that Bayesian methods are optional. They must not be that important or they’d get more space. The author even recommends dropping them if pressed for time.

Another way to look at this is that Bayesian statistics must be simpler than frequentist statistics since the Bayesian approach to each task requires fewer pages.

Related posts:

Musicians, drunks, and Oliver Cromwell
What is a confidence interval?
Classical statistics in a nutshell

{ 11 comments }

Big data is not enough

by John on December 15, 2010

Given enough data, correct answers jump out at you, right?

In some ways I think that scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart. You can narrow down your questions; but enormous data sets often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way.

Bradley Efron, quoted in Significance. Emphasis added.

Related posts:

Predicting height from genes
The data may not contain the answer

{ 13 comments }

How to test a random number generator

by John on December 6, 2010

Last year I wrote a chapter for O’Reilly’s book Beautiful Testing. The publisher gave each of us permission to post our chapters electronically, and so here is Chapter 10: How to test a random number generator.

Beautiful Testing: Leading Professionals Reveal How They Test

{ 19 comments }