Move on to the next question

Here’s a recent discussion from MathOverflow.

Q: I have some data points and, when I plot them in R, it looks like a normal distribution. I want to know how well my data fits the normal distribution. What kind of test should I do?

A: There’s actually a much broader question that you should be asking yourself here: does it matter whether your data really is normally distributed, or will the procedures that you’re going to perform on the data be reasonably robust in the presence of a distribution that is only approximately normal? …

The person asking the question was already satisfied that his data were approximately normal. So it was time to move on to the next question: Does what I want to do next work well for approximately normal data? (There’s no point asking whether your data is normal; it’s not. Normality is an idealization.)
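Here is one way to ask that next question concretely: a small Monte Carlo sketch (my illustration, not part of the original discussion). It assumes a hypothetical setting with samples of size 20 drawn from a contaminated normal distribution, mostly N(0, 1) but with 5% of points from N(0, 3²). Data like that would look normal in a plot. The sketch then estimates how often nominal 95% confidence intervals actually cover the true value. The contamination rate, the sample size, and the two procedures compared are arbitrary choices for illustration. The critical values are the standard table values for 19 degrees of freedom.

```python
import random
import statistics

# Sample size and critical values for 19 degrees of freedom (standard tables).
N = 20
T_975_DF19 = 2.093      # t quantile for a two-sided 95% interval
CHI2_025_DF19 = 8.907   # lower 2.5% chi-square quantile
CHI2_975_DF19 = 32.852  # upper 97.5% chi-square quantile


def contaminated_normal(rng, eps, wide_sd):
    """Draw from N(0, 1) with probability 1 - eps, else from N(0, wide_sd^2)."""
    sd = wide_sd if rng.random() < eps else 1.0
    return rng.gauss(0.0, sd)


def coverage(n_reps=20000, eps=0.05, wide_sd=3.0, seed=1):
    """Estimate how often nominal 95% intervals for the mean and the variance
    contain the true values when the data are only approximately normal."""
    rng = random.Random(seed)
    true_mean = 0.0
    true_var = (1 - eps) * 1.0 + eps * wide_sd ** 2
    hits_mean = 0
    hits_var = 0
    for _ in range(n_reps):
        xs = [contaminated_normal(rng, eps, wide_sd) for _ in range(N)]
        xbar = statistics.fmean(xs)
        s2 = statistics.variance(xs)  # sample variance with n - 1 denominator

        # t interval for the mean: xbar +/- t * s / sqrt(n)
        if abs(xbar - true_mean) <= T_975_DF19 * (s2 / N) ** 0.5:
            hits_mean += 1

        # chi-square interval for the variance: [(n-1)s^2/chi2_hi, (n-1)s^2/chi2_lo]
        lower = (N - 1) * s2 / CHI2_975_DF19
        upper = (N - 1) * s2 / CHI2_025_DF19
        if lower <= true_var <= upper:
            hits_var += 1
    return hits_mean / n_reps, hits_var / n_reps


if __name__ == "__main__":
    mean_cov, var_cov = coverage()
    print(f"t interval for the mean:          {mean_cov:.3f}")
    print(f"chi-square interval for variance: {var_cov:.3f}")
```

In a run like this you would expect the t interval for the mean to stay close to its nominal 95%, while the chi-square interval for the variance falls noticeably short. The same "approximately normal" data can be perfectly adequate for one procedure and misleading for another, and no goodness-of-fit p-value will tell you which.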

We’re often tempted to add decimal places to the answer to one question instead of moving on to the next question. Maybe we don’t even realize what the next question should be. Or maybe we do know, but we want to stay with the familiar. In either case, this quote from John Tukey comes to mind:

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.

Related post: What distribution does my data have?

3 thoughts on “Move on to the next question”

Rick Wicklin

9 May 2011 at 11:58

I think Tukey would also ask “the next question.” Although he is best known in some circles for his work on exploratory data analysis, he laid the foundations for robust statistics and did important work on understanding the contaminated normal distribution, the jackknife, robust regression, and other topics in robust statistics. For details, see “John W. Tukey’s Contributions to Robust Statistics,” by Peter Huber, The Annals of Statistics, 30(6), 2002.

John

9 May 2011 at 12:01

In this case I think Tukey would say that robustness is the right problem and that an approximate answer to that question would be more valuable than an exact answer to the goodness of fit question.

Maximilian

9 May 2011 at 12:07

Now that is kind of mean:
For example, in radio communication you can determine the power of the noise from the parameter of a normal distribution (basically, the received signal is noise plus the wanted signal).
If I have data points and a general idea of how the data were created, I can test my hypothesis. Maybe it’s skewed, maybe the distribution changes over time, etc.
Of course, we don’t know what the poster’s goal was in the first place.
