Visualization, modeling, and surprises

This afternoon Hadley Wickham gave a great talk on data analysis. Here’s a paraphrase of something profound he said.

Visualization can surprise you, but it doesn’t scale well.
Modeling scales well, but it can’t surprise you.

Visualization can show you something in your data that you didn’t expect. But some things are hard to see, and visualization is a slow, human process.

Modeling might tell you something slightly unexpected, but your choice of model restricts what you’re going to find once you’ve fit it.

So you iterate. Visualization suggests a model, and then you use your model to factor out some feature of the data. Then you visualize again.
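Here is a minimal sketch of one pass through that loop, not anything from the talk itself. The data are synthetic (a straight line plus a hidden wave that I made up for illustration), the helper names `fit_line` and `text_plot` are hypothetical, and the crude text plot merely stands in for a real plotting library.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def text_plot(ys, height=9):
    """Crude text scatter plot: one column per point, largest value on the top row."""
    lo, hi = min(ys), max(ys)
    span = (hi - lo) or 1.0
    rows = [[" "] * len(ys) for _ in range(height)]
    for col, y in enumerate(ys):
        row = round((hi - y) / span * (height - 1))
        rows[row][col] = "*"
    return "\n".join("".join(r) for r in rows)


# Synthetic data (an assumption for illustration): linear trend + wave + noise.
xs = [0.2 * i for i in range(60)]
ys = [2.0 + 1.5 * x + 0.8 * math.sin(2.0 * x) + random.gauss(0.0, 0.1) for x in xs]

# Step 1: visualize. The trend dominates, which suggests a linear model.
print(text_plot(ys))

# Step 2: fit the model and factor the trend out of the data.
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Step 3: visualize again. With the trend removed, the wave is easy to see.
print(text_plot(residuals))
```

The linear fit could never have reported the wave on its own, since a line has no way to express one. It only shows up when you look at what the model left behind.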

Posted in Statistics
4 comments on “Visualization, modeling, and surprises”
  1. Have you ever looked at Structure Learning for extending models? Very cool stuff. Daphne Koller used it to find new cell protein interactions.

  2. John:

    Yes, this is what we are trying to get at in Bayesian Data Analysis. You iterate the following 3 steps: (1) model building, (2) inference conditional on the model, (3) model checking. The better you do (1) and (2), the more informative step (3) will be.

    The paradox, if there is one, is that people tend to think of steps 2 and 3 as competing: in step 2 you (temporarily) commit to a belief, whereas in step 3 you look for problems. I think these go together (really, that’s what the scientific method is all about), but I’ve found that, in many cases, people who spend a lot of time with a model don’t want to check it, while people who spend a lot of time on exploratory data analysis don’t like models at all.

  3. John says:

    Andrew: I agree, and too often step 3 is missing.

    Statistics without model checking makes me uneasy, to put it mildly. Some argue that model checking is less important in Bayesian statistics, but I don’t buy that. If anything, because Bayesian analysis makes it easier to construct complex models, there may be more need for model checking.

  4. John:

    Yup. As Bayes once said, with great power comes great responsibility.

1 Pings/Trackbacks for "Visualization, modeling, and surprises"
  1. [...] p. 80: “a reasonable strategy in what ought to be an iterative process. Sometimes one has a data-related question and then draws a graph to try to answer it. After drawing the graph a new question might suggest itself, and hence a different graph, better suited to this new question (perhaps with additional data), is drawn. This in turn suggests something else, and so on, until either the data or the grapher is exhausted. [...] My experience suggests that if you begin with a general-purpose plot there is a greater chance of finding what you had not expected.” This is my experience as well, and reminds me also of Hadley Wickham’s description of statistics as iterating between models and graphics. [...]