Posts tagged as:

Probability and Statistics

A support one-liner

by John on March 15, 2011

This morning I had a fun support request related to our software. The exchange took place over email but it could have fit into a couple Twitter messages. Would that all requests could be answered so succinctly.

Question:

Do you have R code to compute P(X > Y) where X ~ gamma(ax, bx) and Y ~ gamma(ay, by)?

Response:

ineq <- function(ax, bx, ay, by) pbeta(bx/(bx+by), ay, ax)

For more on the problem and the solution, see Exact calculation of inequality probabilities.
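
As a sanity check on the one-liner, here is a minimal Monte Carlo sketch in Python. It assumes that bx and by are scale parameters rather than rates, which is how Python's random.gammavariate reads its second argument. It uses the special case ax = ay = 1, where both variables are exponential and pbeta(bx/(bx+by), 1, 1) reduces to bx/(bx+by). The function name simulate_ineq, the sample size, and the parameter values are illustrative and were not part of the original exchange.

```python
import random

def simulate_ineq(ax, bx, ay, by, n=100_000, seed=0):
    """Estimate P(X > Y) by simulation for X ~ gamma(ax, bx), Y ~ gamma(ay, by).

    Assumption: bx and by are scale parameters, matching random.gammavariate.
    """
    rng = random.Random(seed)
    wins = sum(rng.gammavariate(ax, bx) > rng.gammavariate(ay, by) for _ in range(n))
    return wins / n

# With shape 1 both variables are exponential, and the beta(1, 1) CDF is the
# identity on [0, 1], so the exact answer is bx / (bx + by).
bx, by = 2.0, 3.0
estimate = simulate_ineq(1.0, bx, 1.0, by)
exact = bx / (bx + by)
print(f"simulated: {estimate:.3f}, exact: {exact:.3f}")
```

The simulated and exact values should agree to about two decimal places at this sample size. For general shape parameters the exact value needs a beta CDF, which the Python standard library does not provide.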

Related links:

Inequality Calculator software
Blog posts on random inequalities


Absence of evidence

by John on February 22, 2011

Here’s a little saying that irritates me:

Absence of evidence is not evidence of absence.

It’s the kind of thing a Sherlock Holmes-like character might say in a detective novel. The idea is that we can’t be sure something doesn’t exist just because we haven’t seen it yet.

What bothers me is that the statement misuses the word “evidence.” The statement would be correct if we substituted “proof” for “evidence.” We can’t conclude with absolute certainty that something doesn’t exist just because we haven’t yet proved that it does. But evidence is not the same as proof.

Why do we believe that dodo birds are extinct? Because no one has seen one in three centuries. That is, there is an absence of evidence that they exist. That is tantamount to evidence that they do not exist. It’s logically possible that a dodo bird is alive and well somewhere, but there is overwhelming evidence to suggest this is not the case.

Evidence can lead to the wrong conclusion. Why did scientists believe that the coelacanth was extinct? Because no one had seen one except in fossils. The species was believed to have gone extinct 65 million years ago. But in 1938 a fisherman caught one. Absence of evidence is not proof of absence.

coelacanth, a fish once thought to be extinct

Though it is not proof, absence of evidence is unusually strong evidence due to a subtle statistical result. Compare the following two scenarios.

Scenario 1: You’ve sequenced the DNA of a large number of prostate tumors and found that not one had a particular genetic mutation. How confident can you be that prostate tumors never have this mutation?

Scenario 2: You’ve found that 40% of prostate tumors in your sample have a particular mutation. How confident can you be that 40% of all prostate tumors have this mutation?

It turns out you can have more confidence in the first scenario than the second. If you’ve tested N subjects and not found the mutation, the length of your confidence interval around zero is proportional to 1/N. But if you’ve tested N subjects and found the mutation in 40% of subjects, the length of your confidence interval around 0.40 is proportional to 1/√N. So, for example, if N = 10,000 then the former interval has length on the order of 1/10,000 while the latter interval has length on the order of 1/100. The zero-event case is known as the rule of three: the 95% upper bound on the rate is approximately 3/N. You can find both a frequentist and a Bayesian justification of the rule here.
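
The two scaling rates can be checked numerically. The sketch below is only an illustration. It assumes a 95% confidence level, uses the exact one-sided bound for the zero-event case, and uses the normal-approximation (Wald) interval for the 40% case. The function names are made up for this example.

```python
import math

def zero_event_upper_bound(n, confidence=0.95):
    """Exact one-sided upper bound on p after seeing 0 events in n trials.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

def wald_interval_width(p_hat, n, z=1.96):
    """Width of the normal-approximation (Wald) interval around p_hat."""
    return 2.0 * z * math.sqrt(p_hat * (1.0 - p_hat) / n)

for n in (100, 10_000, 1_000_000):
    print(n, zero_event_upper_bound(n), 3.0 / n, wald_interval_width(0.4, n))
```

Each hundredfold increase in N shrinks the zero-event bound by a factor of about 100 but shrinks the interval around 0.40 by a factor of only 10.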

Absence of evidence is unusually strong evidence that something is at least rare, though it’s not proof. Sometimes you catch a coelacanth.

Related posts:

Estimating the chances of something that hasn’t happened
Complementary validation


Like Laplace, only more so

by John on February 17, 2011

The Laplace distribution is pointy in the middle and fat in the tails relative to the normal distribution. This post is about a probability distribution that is more pointy in the middle and fatter in the tails.



The end of hard-edged science?

by John on February 14, 2011

Bradley Efron says that science is moving away from things like predicting sunrise times and toward predicting things like the weather. The trend is away from studying precisely predictable systems, what Efron calls “hard-edged science,” and toward studying systems “where predictability is tempered by a heavy dose of randomness.”

Hard-edged science still dominates public perceptions, but the attention of modern scientists has swung heavily toward rainfall-like subjects, the kind where random behavior plays a major role. … Deterministic Newtonian science is majestic, and the basis of modern science too, but a few hundred years of it pretty much exhausted nature’s storehouse of precisely predictable events. Subjects like biology, medicine, and economics require a more flexible scientific world view, the kind we statisticians are trained to understand.

Certainly there is increased interest in systems containing “a heavy dose of randomness” but can we really say that we have “pretty much exhausted nature’s storehouse of precisely predictable events”?

Source: Modern Science and the Bayesian-Frequentist Controversy

Related posts:

Scientific results fading over time
Occam’s razor and Bayes’ theorem
The law of medium numbers


Interview with David Spiegelhalter

by John on February 2, 2011

Samuel Hansen interviews David Spiegelhalter on his mathematical podcast Strongly Connected Components. From the show notes:

On today’s episode of Strongly Connected Components Samuel Hansen called up the Winton Professor for the Public Understanding of Risk, as well as Senior Scientist in the MRC Biostatistics Unit, David Spiegelhalter. They discussed the true meaning of risk, the importance of the Bayesian Method, how to get a lot of citations, and even a bit about the bookies.


When it works, it works really well

by John on January 27, 2011

Stephen Stigler [1] compares least-squares methods to the iPhone:

In the United States many consumers are entranced by the magic of the new iPhone, even though they can only use it with the AT&T system, a system noted for spotty coverage — even no receivable signal at all under some conditions. But the magic available when it does work overwhelms the very real shortcomings. Just so, least-squares will remain the tool of choice unless someone concocts a robust methodology that can perform the same magic, a step that would require the suspension of the laws of mathematics.

In other words, least-squares, like the iPhone, works so well when it does work that it’s OK that it fails miserably now and then. Maybe so, but that depends on context.

In his quote, Stigler argues that Americans feel that missing a phone call occasionally is an acceptable trade-off for the features of the iPhone. Many people would agree. But if you’re on a transplant waiting list, you might prefer more reliable coverage to a nicer phone.

It’s not enough to talk about probabilities of failure without also talking about consequences of failure. For example, the consequences of missing a phone call are greater for some people than for others.

Least-squares is a mathematically convenient way to place a cost on errors: the cost is proportional to the square of the size of the error. That’s often reasonable in application, but not always. In some applications, the cost is simply proportional to the size of the error. In other applications, it doesn’t matter how large an error is once it is above some threshold. Sometimes the cost of errors is asymmetric: over-estimating has a different cost than under-estimating by the same amount. Sometimes you’re more worried about the worst case than the average case. One size does not fit all.
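
To make the alternatives concrete, here is a small sketch of four cost functions applied to the same made-up data. The data, the threshold of 5, and the 3-to-1 asymmetry are arbitrary choices for illustration. The only facts it relies on are standard ones: the mean minimizes total squared cost and the median minimizes total absolute cost.

```python
import statistics

def squared_loss(error):
    return error ** 2

def absolute_loss(error):
    return abs(error)

def threshold_loss(error, threshold=5.0):
    # Any error beyond the threshold costs the same; smaller errors are free.
    return 1.0 if abs(error) > threshold else 0.0

def asymmetric_loss(error, over=1.0, under=3.0):
    # error = estimate - truth; under-estimates cost three times as much here.
    return over * error if error >= 0 else under * -error

def total_cost(loss, estimate, data):
    return sum(loss(estimate - x) for x in data)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up values with one outlier
mean, median = statistics.mean(data), statistics.median(data)

for loss in (squared_loss, absolute_loss, threshold_loss, asymmetric_loss):
    print(f"{loss.__name__:16s} mean: {total_cost(loss, mean, data):8.1f} "
          f"median: {total_cost(loss, median, data):8.1f}")
```

Which single estimate looks best depends on the cost function: the mean wins under squared loss, while the median wins under absolute loss because the outlier no longer dominates.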

[1] Stephen M. Stigler, The Changing History of Robustness, American Statistician, Vol. 64, No. 4. November 2010. (Written before Verizon announced it would be supporting the iPhone.)

Related posts:

More theoretical power, less real power
Cost-benefit analysis versus benefit-only analysis


More theoretical power, less real power

by John on January 24, 2011

Suppose you’re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision in theory while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of “in theory” changes because you have two competing theories.

When you compare the power of two methods, you’re evaluating each method’s probability of success under its own assumptions. In other words, you’re picking the method that has the better opinion of itself. Thus the more powerful method is not necessarily the method that has the better chance of leading you to a correct conclusion.

Comparing power alone is not enough. You also need to evaluate whether a method makes realistic assumptions and whether it is robust to deviations from its assumptions.
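
A small simulation can illustrate the point. The sketch below is an assumption-laden example, not an analysis from the post: it compares a one-sample t-test with a sign test, using 30 observations and a true shift of 0.5, first on normal data (where the t-test's assumptions hold) and then on Cauchy data (where they fail badly). The tests, sample size, shift, and distributions are all choices made for illustration.

```python
import math
import random
import statistics

N = 30            # sample size per simulated study
SHIFT = 0.5       # true location under the alternative
T_CRIT = 2.045    # two-sided 5% critical value of Student t, 29 degrees of freedom
SIGN_CRIT = 21    # reject if >= 21 or <= 9 of 30 signs are positive (about 4% level)

def t_test_rejects(sample):
    t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(len(sample)))
    return abs(t) > T_CRIT

def sign_test_rejects(sample):
    positives = sum(x > 0 for x in sample)
    return positives >= SIGN_CRIT or positives <= len(sample) - SIGN_CRIT

def normal_draw(rng):
    return rng.gauss(SHIFT, 1.0)

def cauchy_draw(rng):
    # Standard Cauchy via the inverse CDF, shifted by SHIFT.
    return SHIFT + math.tan(math.pi * (rng.random() - 0.5))

def power(test, draw, reps=2000, seed=0):
    """Fraction of simulated studies in which the test rejects the null."""
    rng = random.Random(seed)
    rejections = sum(test([draw(rng) for _ in range(N)]) for _ in range(reps))
    return rejections / reps

results = {}
for name, draw in (("normal", normal_draw), ("cauchy", cauchy_draw)):
    results[name] = (power(t_test_rejects, draw), power(sign_test_rejects, draw))
    print(name, "t-test:", results[name][0], "sign test:", results[name][1])
```

On normal data the t-test should reject more often than the sign test. On Cauchy data the ranking should reverse, because a few extreme observations inflate the standard deviation and swamp the shift.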

Related posts:

Most published research results are false
Canonical examples from robust statistics


A couple preprints

by John on January 20, 2011

Here are a couple new preprints.

Block-adaptive randomization.
A proposed method for limiting the size of runs in a response-adaptive clinical trial.

Skeptical and optimistic robust priors for clinical trials.
Joint work with Jairo Fúquene and Luis Pericchi from University of Puerto Rico.


Fitting an elephant

by John on January 18, 2011

“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” — John von Neumann

Related post: Occam’s razor and Bayes’ theorem


Scientific results fading over time

by John on January 17, 2011

A recent article in The New Yorker gives numerous examples of scientific results fading over time. Effects that were large when first measured become smaller in subsequent studies. Firmly established facts become doubtful. It’s as if scientific laws are being gradually repealed. This phenomenon is known as “the decline effect.” The full title of the article is The decline effect and the scientific method.

The article brings together many topics that have been discussed here: regression to the mean, publication bias, scientific fashion, etc. Here’s a little sample.

“… when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … After a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.

This excerpt happens to be talking about “fluctuating asymmetry,” the idea that animals prefer more symmetric mates because symmetry is a proxy for good genes. (I edited out references to fluctuating asymmetry from the quote to emphasize that the remarks could equally apply to any number of topics.) Fluctuating asymmetry was initially confirmed by numerous studies, but then the tide shifted and more studies failed to find the effect.

When such a shift happens, it would be reassuring to believe that the initial studies were simply wrong and that the new studies are right. But both the positive and negative results confirmed the prevailing view at the time they were published. There’s no reason to believe the latter studies are necessarily more reliable.

Related posts:

Why microarray study conclusions are so often wrong
Popular research areas produce more false results
Five criticisms of significance testing


Bayesian methods at the end

by John on December 16, 2010

I was looking at the preface of an old statistics book and read this:

The Bayesian techniques occur at the end of each chapter; therefore they can be omitted if time does not permit their inclusion.

This approach is typical. Many textbooks present frequentist statistics with a little Bayesian statistics at the end of each section or at the end of the book.

There are a couple ways to look at that. One is simply that Bayesian methods are optional. They must not be that important or they’d get more space. The author even recommends dropping them if pressed for time.

Another way to look at this is that Bayesian statistics must be simpler than frequentist statistics since the Bayesian approach to each task requires fewer pages.

Related posts:

Musicians, drunks, and Oliver Cromwell
What is a confidence interval?
Classical statistics in a nutshell


Big data is not enough

by John on December 15, 2010

Given enough data, correct answers jump out at you, right?

In some ways I think that scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart. You can narrow down your questions; but enormous data sets often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way.

Bradley Efron, quoted in Significance. Emphasis added.

Related posts:

Predicting height from genes
The data may not contain the answer


How to test a random number generator

by John on December 6, 2010

Last year I wrote a chapter for O’Reilly’s book Beautiful Testing. The publisher gave each of us permission to post our chapters electronically, and so here is Chapter 10: How to test a random number generator.

Beautiful Testing: Leading Professionals Reveal How They Test


New Twitter account: StatFact

by John on November 30, 2010

I’m starting a new daily tip account on Twitter. @StatFact will post one statement from statistics per day, drawing from Bayesian and frequentist statistics. Like my other daily tip accounts, StatFact will post Monday through Friday on a regular schedule with a few unscheduled tweets sprinkled in occasionally.

I’m using a product sign as the symbol for StatFact.

\prod

I thought the product sign might suggest a likelihood function. The most obvious symbol for a statistics account would be a bell curve, but that’s been overused.

If you’re interested in StatFact, here are some things you could do.

  • Follow StatFact on Twitter.
  • Tell friends about StatFact.
  • Suggest topics, or even better, specific tweets.
  • Propose a better icon.
  • Let me know if I say anything ambiguous or wrong.

To find out about my other daily tip accounts, please see the FAQ post.


Bias and consistency

by John on November 1, 2010

Suppose you have two ways to estimate something you’re interested in. One is biased and one is unbiased. Surely the unbiased method is better, right? Not necessarily. Statistical bias is not as bad as it sounds.

Under ideal conditions, an unbiased estimator gives the correct answer on average, but each particular estimate may be ridiculous. Suppose you ask me to estimate how many dwarfs were in Snow White and the Seven Dwarfs. If I alternately guess 100 and -272, each guess will be wildly wrong. But if 75% of the time I guess 100 and 25% of the time guess -272, my average guess will be 7 and so my estimates will be unbiased. But if half the time I guess 8 and half the time I guess 7, my average guess will be 7.5 and my process will be biased. However, each estimate will be more accurate.
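
The arithmetic behind the dwarf example can be worked out exactly. The sketch below computes the bias and the root mean squared error of the two guessing schemes described above. The helper name bias_and_rmse is invented for this illustration, and root mean squared error is used here as one reasonable measure of accuracy.

```python
import math

TRUE_VALUE = 7  # dwarfs in Snow White and the Seven Dwarfs

def bias_and_rmse(guesses):
    """guesses maps each possible guess to the probability of making it."""
    mean_guess = sum(g * p for g, p in guesses.items())
    mse = sum(p * (g - TRUE_VALUE) ** 2 for g, p in guesses.items())
    return mean_guess - TRUE_VALUE, math.sqrt(mse)

unbiased = {100: 0.75, -272: 0.25}
biased = {8: 0.5, 7: 0.5}

for name, guesses in (("unbiased", unbiased), ("biased", biased)):
    bias, rmse = bias_and_rmse(guesses)
    print(f"{name}: bias = {bias:.2f}, root mean squared error = {rmse:.2f}")
```

The unbiased scheme has zero bias but a typical error in the hundreds, while the biased scheme is off by half a dwarf on average and never by more than one.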

Consistency is a different condition from unbiasedness, and neither implies the other. Consistency says that if you feed your method enough data generated from your assumed model, your estimates will converge to the correct value.
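
As an illustration of consistency, the sketch below uses a standard textbook case rather than anything from the post: the variance estimate that divides by n instead of n - 1. It is biased low by a factor of (n - 1)/n, yet it converges to the true variance as n grows. The sample sizes and the standard normal model are arbitrary choices.

```python
import random

def variance_divide_by_n(sample):
    """Variance estimate that divides by n; biased low by a factor (n - 1) / n."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)

rng = random.Random(0)
estimates = {}
for n in (5, 50, 5_000, 100_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
    estimates[n] = variance_divide_by_n(sample)
    print(n, round(estimates[n], 4))
```

The estimates for small n can be far from 1, while the estimate for the largest n should be close to it, which is what consistency promises when the assumed model is correct.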

But if your model is not exactly correct (and it never is) will you get a reasonably good result? It’s possible for an inconsistent method to provide good results in practice and it’s possible that a consistent method may not.

In his blog post on cross validation, Rob Hyndman mentions a paper that shows one validation method is consistent and another is not. Rob concludes

Frankly, I don’t consider this is a very important result as there is never a true model. In reality, every model is wrong, so consistency is not really an interesting property.

In the context of his post, Rob argues that the most important test of a statistical method is how well it predicts future data. Some people have commented that this comes down too hard on consistency. But we’re talking about a blog post, and blogs don’t use the same kind of carefully qualified language that formal papers do. Perhaps in a more formal setting Rob might argue that a gross failure of consistency gives one reason to suspect a method won’t predict well, but a lack of complete consistency shouldn’t remove a method from consideration. Such language may be inoffensive, but it lacks the verve of his original statement.

Too often bias and consistency are seen as all-or-nothing properties. In theoretical statistics, one typically asks whether a method is biased, not how biased it is. The same is true of consistency. Bias and consistency are only two criteria by which methods can be evaluated. A small amount of bias or inconsistency may be an acceptable trade-off in exchange for better performance by other criteria such as efficiency or robustness.

Related posts:

The Titanic Effect
What distribution does my data have?
