Posts tagged as:

Probability and Statistics

Interview with David Spiegelhalter

by John on February 2, 2011

Samuel Hansen interviews David Spiegelhalter on his mathematical podcast Strongly Connected Components. From the show notes:

On today’s episode of Strongly Connected Components Samuel Hansen called up the Winton Professor for the Public Understanding of Risk, as well as Senior Scientist in the MRC Biostatistics Unit, David Spiegelhalter. They discussed the true meaning of risk, the importance of the Bayesian Method, how to get a lot of citations, and even a bit about the bookies.


When it works, it works really well

by John on January 27, 2011

Stephen Stigler [1] compares least-squares methods to the iPhone:

In the United States many consumers are entranced by the magic of the new iPhone, even though they can only use it with the AT&T system, a system noted for spotty coverage — even no receivable signal at all under some conditions. But the magic available when it does work overwhelms the very real shortcomings. Just so, least-squares will remain the tool of choice unless someone concocts a robust methodology that can perform the same magic, a step that would require the suspension of the laws of mathematics.

In other words, least-squares, like the iPhone, works so well when it does work that it’s OK that it fails miserably now and then. Maybe so, but that depends on context.

In his quote, Stigler argues that Americans feel that missing a phone call occasionally is an acceptable trade-off for the features of the iPhone. Many people would agree. But if you’re on a transplant waiting list, you might prefer more reliable coverage to a nicer phone.

It’s not enough to talk about probabilities of failure without also talking about consequences of failure. For example, the consequences of missing a phone call are greater for some people than for others.

Least-squares is a mathematically convenient way to place a cost on errors: the cost is proportional to the square of the size of the error. That’s often reasonable in practice, but not always. In some applications, the cost is simply proportional to the size of the error. In other applications, it doesn’t matter how large an error is once it is above some threshold. Sometimes the cost of errors is asymmetric: over-estimating has a different cost than under-estimating by the same amount. Sometimes you’re more worried about the worst case than the average case. One size does not fit all.
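To make the point concrete, here is a minimal Python sketch. The data set, the threshold, and the asymmetric weights are all made up for illustration. For each cost function it grid-searches for the single number that best summarizes the data. Squared error picks the mean, which the outlier drags far upward; absolute error picks the median; a capped cost ignores the outlier; an asymmetric cost leans toward over-estimating; and a worst-case criterion picks the midrange.

```python
def squared(e):
    return e * e

def absolute(e):
    return abs(e)

def capped(e, threshold=5.0):
    # Hypothetical threshold: every error beyond it costs the same.
    return min(abs(e), threshold)

def asymmetric(e, over=1.0, under=3.0):
    # e > 0 means an over-estimate. Hypothetical weights: under-estimating
    # costs three times as much as over-estimating by the same amount.
    return over * e if e > 0 else -under * e

def best_constant(data, loss, combine=sum):
    """Grid-search the constant c that minimizes combine(loss(c - x) for x in data)."""
    candidates = [i / 10 for i in range(0, 1001)]  # 0.0, 0.1, ..., 100.0
    return min(candidates, key=lambda c: combine(loss(c - x) for x in data))

data = [1, 2, 3, 4, 100]
print(best_constant(data, squared))               # 22.0, the mean
print(best_constant(data, absolute))              # 3.0, the median
print(best_constant(data, capped))                # between 2 and 3; the 100 barely matters
print(best_constant(data, asymmetric))            # 4.0, pushed up because under-estimates cost more
print(best_constant(data, absolute, combine=max)) # 50.5, the midrange (worst case)
```

Each choice of cost gives a different "best" answer from the same data, which is the sense in which one size does not fit all.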

[1] Stephen M. Stigler, The Changing History of Robustness, The American Statistician, Vol. 64, No. 4, November 2010. (Written before Verizon announced it would support the iPhone.)

Related posts:

More theoretical power, less real power
Cost-benefit analysis versus benefit-only analysis


More theoretical power, less real power

by John on January 24, 2011

Suppose you’re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision in theory while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of “in theory” changes because you have two competing theories.

When you compare the power of two methods, you’re evaluating each method’s probability of success under its own assumptions. In other words, you’re picking the method that has the better opinion of itself. Thus the more powerful method is not necessarily the method that has the better chance of leading you to a correct conclusion.

Comparing power alone is not enough. You also need to evaluate whether a method makes realistic assumptions and whether it is robust to deviations from its assumptions.
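A small illustration of how much an unrealistic assumption can cost, using a hypothetical setup: a two-sided z-test of whether a mean is zero that assumes the data have standard deviation 1, when the true standard deviation is 2. Judged by its own assumptions, the test makes false positives 5% of the time. Judged against the true distribution, it makes them about a third of the time. This sketch looks at the error rate rather than power, but the trap is the same: a method's self-assessment is only as good as its assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def actual_false_positive_rate(assumed_sigma, true_sigma, z_crit=1.959964):
    """Two-sided z-test of 'mean = 0' that plugs in assumed_sigma as the known
    standard deviation. When the null hypothesis is true, the computed z statistic
    is really normal with standard deviation true_sigma / assumed_sigma, not 1."""
    scale = true_sigma / assumed_sigma
    return 2 * (1 - normal_cdf(z_crit / scale))

print(actual_false_positive_rate(1.0, 1.0))  # about 0.05: the advertised rate
print(actual_false_positive_rate(1.0, 2.0))  # about 0.33 when sigma is really 2
```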

Related posts:

Most published research results are false

Canonical examples from robust statistics


A couple preprints

by John on January 20, 2011

Here are a couple new preprints.

Block-adaptive randomization.
A proposed method for limiting the size of runs in a response-adaptive clinical trial.

Skeptical and optimistic robust priors for clinical trials.
Joint work with Jairo Fúquene and Luis Pericchi from the University of Puerto Rico.


Fitting an elephant

by John on January 18, 2011

“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” — John von Neumann

Related post: Occam’s razor and Bayes’ theorem


Scientific results fading over time

by John on January 17, 2011

A recent article in The New Yorker gives numerous examples of scientific results fading over time. Effects that were large when first measured become smaller in subsequent studies. Firmly established facts become doubtful. It’s as if scientific laws are being gradually repealed. This phenomenon is known as “the decline effect.” The full title of the article is The decline effect and the scientific method.
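One simple mechanism that can produce a decline effect is publication bias combined with regression to the mean. The Python sketch below uses entirely hypothetical numbers: every study measures the same small true effect with noise, but only studies whose estimate is significantly positive get published. The published estimates overstate the effect, while unfiltered replications of those same studies come in much smaller. This is a sketch of one possible cause, not a claim about the particular studies in the article.

```python
import random
from statistics import mean

def simulate_publication(true_effect=0.2, se=0.5, n_studies=10_000, seed=1):
    """Hypothetical model: every study estimates the same true_effect with normal
    noise of standard deviation se. Only estimates more than 1.96 standard errors
    above zero get published; each published study is then replicated once,
    with no publication filter applied to the replication."""
    rng = random.Random(seed)
    published, replications = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)
        if estimate / se > 1.96:
            published.append(estimate)
            replications.append(rng.gauss(true_effect, se))
    return mean(published), mean(replications)

first, later = simulate_publication()
print(f"mean published effect: {first:.2f}")  # well above the true 0.2
print(f"mean replication:      {later:.2f}")  # close to the true 0.2
```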

The article brings together many topics that have been discussed here: regression to the mean, publication bias, scientific fashion, etc. Here’s a little sample.

“… when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … After a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.

This excerpt happens to be talking about “fluctuating asymmetry,” the idea that animals prefer more symmetric mates because symmetry is a proxy for good genes. (I edited out references to fluctuating asymmetry from the quote to emphasize that the remarks could equally apply to any number of topics.) Fluctuating asymmetry was initially confirmed by numerous studies, but then the tide shifted and more studies failed to find the effect.

When such a shift happens, it would be reassuring to believe that the initial studies were simply wrong and that the new studies are right. But both the positive and negative results confirmed the prevailing view at the time they were published. There’s no reason to believe the latter studies are necessarily more reliable.

Related posts:

Why microarray study conclusions are so often wrong
Popular research areas produce more false results
Five criticisms of significance testing


Bayesian methods at the end

by John on December 16, 2010

I was looking at the preface of an old statistics book and read this:

The Bayesian techniques occur at the end of each chapter; therefore they can be omitted if time does not permit their inclusion.

This approach is typical. Many textbooks present frequentist statistics with a little Bayesian statistics at the end of each section or at the end of the book.

There are a couple ways to look at that. One is simply that Bayesian methods are optional. They must not be that important or they’d get more space. The author even recommends dropping them if pressed for time.

Another way to look at this is that Bayesian statistics must be simpler than frequentist statistics since the Bayesian approach to each task requires fewer pages.

Related posts:

Musicians, drunks, and Oliver Cromwell
What is a confidence interval?
Classical statistics in a nutshell


Big data is not enough

by John on December 15, 2010

Given enough data, correct answers jump out at you, right?

In some ways I think that scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart. You can narrow down your questions; but enormous data sets often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way.

Bradley Efron, quoted in Significance. Emphasis added.

Related posts:

Predicting height from genes
The data may not contain the answer


How to test a random number generator

by John on December 6, 2010

Last year I wrote a chapter for O’Reilly’s book Beautiful Testing. The publisher gave each of us permission to post our chapters electronically, and so here is Chapter 10: How to test a random number generator.
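The chapter is the place for details. As a rough illustration of the flavor of such tests (a standard technique, not necessarily one the chapter uses in this form), here is a chi-square goodness-of-fit check. It bins samples that should be uniform on [0, 1) and compares the bin counts to their expected values. The deliberately broken generator is hypothetical.

```python
import random

def chi_square_uniform(samples, bins=10):
    """Chi-square statistic for samples that should be uniform on [0, 1)."""
    counts = [0] * bins
    for u in samples:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(42)
good = [rng.random() for _ in range(10_000)]
bad = [0.5 * rng.random() for _ in range(10_000)]  # broken on purpose: never reaches 0.5

# With 10 bins there are 9 degrees of freedom. A statistic far above about 21.7
# (the 99th percentile of a chi-square distribution with 9 df) is suspicious.
print(chi_square_uniform(good))  # typically around 9
print(chi_square_uniform(bad))   # in the thousands: half the bins are empty
```

A single test like this only catches gross failures; a generator can pass it and still fail subtler checks.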

Beautiful Testing: Leading Professionals Reveal How They Test


New Twitter account: StatFact

by John on November 30, 2010

I’m starting a new daily tip account on Twitter. @StatFact will post one statement from statistics per day, drawing from Bayesian and frequentist statistics. Like my other daily tip accounts, StatFact will post Monday through Friday on a regular schedule with a few unscheduled tweets sprinkled in occasionally.

I’m using a product sign as the symbol for StatFact.

∏

I thought the product sign might suggest a likelihood function. The most obvious symbol for a statistics account would be a bell curve, but that’s been overused.
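The connection to likelihood: for independent observations x₁, ..., xₙ, the likelihood is the product of the individual densities, L(θ) = ∏ᵢ f(xᵢ; θ). A minimal Python sketch for normal data follows; the data and parameter values are made up. In practice one usually works with the log-likelihood, which turns the product into a sum and avoids numerical underflow for large data sets.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Likelihood of i.i.d. normal data: the product of the individual densities."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

def log_likelihood(data, mu, sigma):
    """The log turns the product into a sum, which is numerically safer."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

data = [1.2, 0.8, 1.5, 0.9]
print(likelihood(data, 1.0, 0.5))
print(math.exp(log_likelihood(data, 1.0, 0.5)))  # same value up to rounding
```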

If you’re interested in StatFact, here are some things you could do.

  • Follow StatFact on Twitter.
  • Tell friends about StatFact.
  • Suggest topics, or even better, specific tweets.
  • Propose a better icon.
  • Let me know if I say anything ambiguous or wrong.

To find out about my other daily tip accounts, please see the FAQ post.


Bias and consistency

by John on November 1, 2010

Suppose you have two ways to estimate something you’re interested in. One is biased and one is unbiased. Surely the unbiased method is better, right? Not necessarily. Statistical bias is not as bad as it sounds.

Under ideal conditions, an unbiased estimator gives the correct answer on average, but each particular estimate may be ridiculous. Suppose you ask me to estimate how many dwarfs were in Snow White and the Seven Dwarfs. Suppose I always guess either 100 or -272. Each guess will be wildly wrong. But if I guess 100 three quarters of the time and -272 one quarter of the time, my average guess will be 0.75 × 100 − 0.25 × 272 = 75 − 68 = 7, and so my estimates will be unbiased. On the other hand, if half the time I guess 8 and half the time I guess 7, my average guess will be 7.5 and my process will be biased. However, each estimate will be more accurate.
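The arithmetic behind the dwarf example, as a short Python sketch. It treats each guessing strategy as a discrete distribution and measures accuracy by mean squared error, a common choice that the example itself doesn’t specify. The unbiased strategy averages exactly 7 but has a mean squared error of 25,947; the biased strategy averages 7.5 with a mean squared error of only 0.5.

```python
def mean_and_mse(outcomes, truth):
    """outcomes: list of (guess, probability) pairs describing an estimator.
    Returns the estimator's mean and its mean squared error around truth."""
    mean = sum(g * p for g, p in outcomes)
    mse = sum(p * (g - truth) ** 2 for g, p in outcomes)
    return mean, mse

truth = 7
unbiased = [(100, 0.75), (-272, 0.25)]
biased = [(8, 0.5), (7, 0.5)]

for name, estimator in [("unbiased", unbiased), ("biased", biased)]:
    m, mse = mean_and_mse(estimator, truth)
    print(f"{name}: mean {m}, bias {m - truth}, mean squared error {mse}")
# unbiased: mean 7.0, bias 0.0, mean squared error 25947.0
# biased: mean 7.5, bias 0.5, mean squared error 0.5
```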

Consistency is a different condition from unbiasedness, and neither implies the other. Consistency says that if you feed your method enough data generated from your assumed model, your estimates will converge to the correct value.
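A small sketch of an estimator that is biased but consistent, assuming the data really do come from the assumed model: here a normal distribution with variance 4, a hypothetical choice. Dividing the sum of squared deviations by n rather than n − 1 biases the variance estimate low by σ²/n, but that bias vanishes and the estimate converges to the true value as n grows.

```python
import random
from statistics import pvariance

def variance_estimates(sizes, sigma=2.0, seed=0):
    """For each sample size n, draw n normal values with standard deviation sigma
    and estimate the variance by dividing by n (statistics.pvariance).
    That estimator is biased: its expected value is (n - 1) / n * sigma**2."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        data = [rng.gauss(0.0, sigma) for _ in range(n)]
        expected_bias = -sigma ** 2 / n
        results.append((n, pvariance(data), expected_bias))
    return results

for n, estimate, bias in variance_estimates([5, 50, 500, 50_000]):
    print(f"n = {n:>6}: estimate {estimate:.3f}, expected bias {bias:.5f} (true variance 4)")
```

The guarantee depends entirely on the data coming from the assumed model.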

But if your model is not exactly correct (and it never is), will you get a reasonably good result? It’s possible for an inconsistent method to provide good results in practice, and it’s possible that a consistent method may not.

In his blog post on cross validation, Rob Hyndman mentions a paper that shows one validation method is consistent and another is not. Rob concludes

Frankly, I don’t consider this is a very important result as there is never a true model. In reality, every model is wrong, so consistency is not really an interesting property.

In the context of his post, Rob argues that the most important test of a statistical method is how well it predicts future data. Some people have commented that this comes down too hard on consistency. But we’re talking about a blog post, and blogs don’t use the same kind of carefully qualified language that formal papers do. Perhaps in a more formal setting Rob might argue that a gross failure of consistency gives one reason to suspect a method won’t predict well, but a lack of complete consistency shouldn’t remove a method from consideration. Such language may be inoffensive, but it lacks the verve of his original statement.

Too often bias and consistency are seen as all-or-nothing properties. In theoretical statistics, one typically asks whether a method is biased, not how biased it is. The same is true of consistency. Bias and consistency are only two criteria by which methods can be evaluated. A small amount of bias or inconsistency may be an acceptable trade-off in exchange for better performance by other criteria such as efficiency or robustness.

Related posts:

The Titanic Effect
What distribution does my data have?


The Titanic Effect

by John on October 18, 2010

Gerald Weinberg’s book Secrets of Consulting is filled with great aphorisms. One of these he calls the Titanic Effect:

The thought that disaster is impossible often leads to an unthinkable disaster.

If your model says disaster is extremely unlikely, the weakest link may be your model.
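The reasoning can be made explicit with the law of total probability: P(disaster) = P(model right) · P(disaster | model right) + P(model wrong) · P(disaster | model wrong). Even a small chance that the model is wrong can dominate the total. The numbers in this Python sketch are hypothetical.

```python
def disaster_probability(p_model_right, p_disaster_if_right, p_disaster_if_wrong):
    """Law of total probability over 'the model is right' vs. 'the model is wrong'."""
    return (p_model_right * p_disaster_if_right
            + (1 - p_model_right) * p_disaster_if_wrong)

# Hypothetical numbers: the model puts disaster at one in a billion, we are
# 99% sure the model is right, and if it is wrong, disaster has a 1% chance.
p = disaster_probability(0.99, 1e-9, 0.01)
print(p)  # about 1.0e-4, roughly 100,000 times the model's own estimate
```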

In The Black Swan, Nassim Taleb looks at the risks facing a casino. The biggest risks have not come from lucky gamblers. The actuaries working for casinos understand the risks of lucky customers very well and put policies into place to protect against these risks. But the actuaries didn’t account for the possibility that a tiger might maul an irreplaceable performer, costing the casino $100 million. Neither did they account for the possibility that an employee might forget to file tax paperwork or that someone might kidnap a casino owner’s daughter. No one could have foreseen these events, and that’s the point: there are always risks outside your model.

Related post:

Feasibility studies


Probability that a number is prime

by John on October 6, 2010

The fastest ways to test whether a number is prime have some small probability of being wrong. Said another way, it’s easier to tell whether a number is “probably” prime than to tell with certainty that it’s prime. This post looks briefly at algorithms for primality testing, then focuses on what exactly it means to say a number is “probably prime.”
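As a sketch of what a probabilistic primality test looks like, here is the Miller-Rabin test, a widely used example chosen here for illustration (the full post may discuss other algorithms). A False answer is a proof that n is composite. A True answer means n survived every round; for a composite n, each round with a random base passes with probability at most 1/4, so 20 rounds leave at most a 4⁻²⁰ chance of error. The trial division by small primes is a convenience added here.

```python
import random

def is_probable_prime(n, rounds=20, seed=None):
    """Miller-Rabin test. False means n is certainly composite; True means n is
    probably prime (a composite passes each round with probability at most 1/4)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.Random(seed)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

print(is_probable_prime(2**61 - 1))  # True: a Mersenne prime
print(is_probable_prime(252601))     # False: 41 * 61 * 101, a Carmichael number
```

Primes always pass, so the only possible error is calling a composite number prime.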



Probability and Statistics cheat sheet

by John on October 4, 2010

Matthias Vallentin posted a comment on my post about a math/CS cheat sheet to say that he’s been working on a probability and statistics cheat sheet. Looks great, though at 24 pages it stretches the definition of “cheat sheet” even more than the computer science cheat sheet did.

Anybody know of other cool cheat sheets?

Related links:

Diagram of probability relationships
Diagram of modes of convergence
Diagram of special function relationships


Statistical dead end

by John on September 20, 2010

I get suspicious when I hear people ask about third and fourth moments (skewness and kurtosis). I’ve heard these terms far more often from people who don’t understand statistics than from people who do.

There are two common errors people have in mind when they bring up skewness and kurtosis.

First, they implicitly believe that distributions can be boiled down to three or four numbers. Maybe they had an elementary statistics course in which everything boiled down to two moments — mean and variance — and they suspect that’s not enough, that advanced statistics extends elementary statistics by looking at third or fourth moments. “There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.” The path forward is not considering higher and higher moments.

This leads to a second and closely related problem. Interest in third and fourth moments sounds like hearkening back to the moment-matching approach to statistics. Moment matching was a simple idea for estimating distribution parameters:

  1. Set population means equal to sample means.
  2. Set population variances equal to sample variances.
  3. Solve the resulting equations for distribution parameters.

There’s more to moment matching than that, but that’s enough for this discussion. It’s a very natural approach, which is probably why it still persists. But it’s also a statistical dead end.
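As an illustration, here are the moment-matching steps applied to a gamma distribution with shape k and scale θ, chosen here only as an example. Its mean is kθ and its variance is kθ², so setting them equal to the sample mean m and sample variance v gives θ = v/m and k = m²/v. The sketch uses the divide-by-n sample variance; the simulated data are hypothetical.

```python
import random
from statistics import mean, pvariance

def gamma_moment_match(data):
    """Method-of-moments estimates (k, theta) for a gamma distribution.
    Population mean k*theta and variance k*theta**2 are set equal to the
    sample mean and the (divide-by-n) sample variance, then solved."""
    m, v = mean(data), pvariance(data)
    theta = v / m
    k = m * m / v
    return k, theta

rng = random.Random(3)
# random.gammavariate(alpha, beta) draws from a gamma with shape alpha and scale beta.
sample = [rng.gammavariate(2.0, 3.0) for _ in range(20_000)]
print(gamma_moment_match(sample))  # should land near (2.0, 3.0)
```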

Moment matching is the most convenient approach to finding estimators in some cases. However, there is another approach to statistics that has largely replaced moment matching, and that’s maximum likelihood estimation: find the parameters that make the data most likely.

Both moment matching and maximum likelihood are intuitively appealing ideas. Sometimes they lead to the same conclusions but often they do not. They competed decades ago and maximum likelihood won. One reason is that maximum likelihood estimators have better theoretical properties. Another reason is that maximum likelihood estimation provides a unified approach that isn’t thwarted by difficulties in solving algebraic equations.
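One small example of the two approaches disagreeing, using a Uniform(0, θ) distribution chosen for illustration. Moment matching sets θ/2 equal to the sample mean. Maximum likelihood picks the sample maximum, because the likelihood θ⁻ⁿ decreases in θ but is zero for any θ smaller than an observed value. With the made-up data below, the moment estimate is smaller than one of the observations, which is impossible under the model.

```python
from statistics import mean

def uniform_moment_estimate(data):
    # Uniform(0, theta) has mean theta / 2, so set theta / 2 = sample mean.
    return 2 * mean(data)

def uniform_mle(data):
    # The likelihood theta**(-n) decreases in theta but is zero unless
    # theta >= max(data), so it is maximized at the sample maximum.
    return max(data)

data = [0.1, 0.2, 0.9]
print(uniform_moment_estimate(data))  # 0.8: smaller than the observed 0.9
print(uniform_mle(data))              # 0.9
```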

There are good reasons to be concerned about higher moments (including fractional moments), though these are primarily theoretical. For example, higher moments are useful in quantifying the error in the central limit theorem. But there are not a lot of elementary applications of higher moments in contemporary statistics.
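The standard example is the Berry-Esseen theorem, which bounds the central limit theorem error using the third absolute central moment ρ = E|X − μ|³. For n i.i.d. draws, the largest gap between the distribution function of the standardized mean and the standard normal distribution function is at most Cρ/(σ³√n). The sketch below evaluates that bound for Bernoulli(p) draws. The constant C = 0.4748 is taken as an assumption from the published literature on the best known value and is worth checking before relying on it. Skewed cases (p near 0 or 1) get a much looser bound.

```python
import math

BERRY_ESSEEN_C = 0.4748  # assumed: a published upper bound on the i.i.d. constant

def berry_esseen_bound_bernoulli(p, n):
    """Bound on sup_x |P(standardized mean <= x) - Phi(x)| for n Bernoulli(p) draws.
    Uses the third absolute central moment rho = E|X - p|**3 = p*q*(q**2 + p**2)."""
    q = 1 - p
    rho = p * q * (q ** 2 + p ** 2)
    sigma = math.sqrt(p * q)
    return BERRY_ESSEEN_C * rho / (sigma ** 3 * math.sqrt(n))

print(berry_esseen_bound_bernoulli(0.5, 100))   # about 0.0475 for a fair coin
print(berry_esseen_bound_bernoulli(0.01, 100))  # about 0.47: skewness makes the bound weak
```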
