Blog Archives

Computing skewness and kurtosis in one pass

If you compute the standard deviation of a data set by directly implementing the definition, you’ll need to pass through the data twice: once to find the mean, then a second time to accumulate the squared differences from the mean.

Posted in Software development, Statistics
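
The two-pass problem the excerpt describes has a standard one-pass fix, Welford's algorithm, which updates the mean and the sum of squared deviations incrementally as each value arrives; skewness and kurtosis can be accumulated with higher-order terms in the same style. A minimal sketch of the variance case (function name and details are mine, not code from the post):

```python
def one_pass_stats(xs):
    """Welford-style one-pass running mean and sample variance."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both old and new mean
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance
```

Unlike the textbook shortcut of accumulating sum(x) and sum(x²), this formulation also avoids catastrophic cancellation when the variance is small relative to the mean.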

A statistical problem with “nothing to hide”

One problem with the nothing-to-hide argument is that it assumes innocent people will be exonerated certainly and effortlessly. That is, it assumes that there are no errors, or if there are, they are resolved quickly and easily. Suppose the probability

Posted in Statistics
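
To illustrate the kind of calculation the excerpt sets up: even a small per-screening error rate compounds over many independent screenings. The numbers below are made-up assumptions for illustration, not figures from the post.

```python
# Illustrative assumptions only, not numbers from the post.
p_error = 0.01   # chance one check wrongly flags an innocent person
n_checks = 100   # independent checks a person is subject to

# Probability of being wrongly flagged at least once:
# complement of never being flagged.
p_flagged = 1 - (1 - p_error) ** n_checks
```

With these made-up numbers, a 1% error rate across 100 checks leaves an innocent person with better-than-even odds of being flagged at least once.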

There are no outliers

Matt Briggs’s comment on outliers in his post Tyranny of the mean: Coontz used the word “outliers”. There are no such things. There can be mismeasured data, i.e. incorrect data, say when you tried to measure air temperature but your

Posted in Statistics

Bad normal approximation

Sometimes you can approximate a binomial distribution with a normal distribution. Under the right conditions, a Binomial(n, p) has approximately the distribution of a normal with the same mean and variance, i.e. mean np and variance np(1-p). The approximation works

Posted in Statistics
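
As a quick numeric check of the rule of thumb (my own sketch, not code from the post): for Binomial(100, 0.5) the normal approximation with matched mean and variance, plus a continuity correction, lands very close to the exact CDF. The same recipe degrades badly when p is near 0 or 1, which is the failure mode the post's title points at.

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # matched mean and variance

# P(X <= 45): exact vs. normal approximation with continuity correction
exact = binom_cdf(45, n, p)
approx = normal_cdf(45.5, mu, sigma)
```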

Moments of mixtures

I needed to compute the higher moments of a mixture distribution for a project I’m working on. I’m writing up the code here in case anyone else finds this useful. (And in case I’ll find it useful in the future.)

Posted in Python, Statistics
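
The key identity behind mixture moments is that raw moments combine linearly: if X comes from component i with weight w_i, then E[X^k] = Σ w_i E[X_i^k]. A minimal sketch of that identity (names mine, not the code from the post):

```python
def mixture_moment(weights, component_moments):
    """k-th raw moment of a mixture: E[X^k] = sum_i w_i * E[X_i^k].
    component_moments holds E[X_i^k] for each component."""
    return sum(w * m for w, m in zip(weights, component_moments))

# Example: second raw moment of the mixture 0.3*N(0,1) + 0.7*N(2,4).
# For a normal, E[X_i^2] = mu_i^2 + sigma_i^2, so components give 1.0 and 8.0.
second_moment = mixture_moment([0.3, 0.7], [1.0, 8.0])
```

Note that central moments do not combine this way; to get a mixture's variance or kurtosis you convert each component's central moments to raw moments, mix, and convert back.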

Data calls the model’s bluff

I hear a lot of people saying that simple models work better than complex models when you have enough data. For example, here’s a tweet from Giuseppe Paleologo this morning: Isn’t it ironic that almost all known results in asymptotic

Posted in Statistics

Robustness of equal weights

In Thinking, Fast and Slow, Daniel Kahneman comments on The robust beauty of improper linear models in decision making by Robyn Dawes. According to Dawes, or at least Kahneman’s summary of Dawes, simply averaging a few relevant predictors may work

Posted in Statistics
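
Dawes's "improper linear model" amounts to standardizing each predictor and averaging with equal weights, instead of fitting regression coefficients. A sketch of that recipe as I understand Kahneman's summary (function names mine):

```python
def zscore(xs):
    """Standardize a predictor to mean 0, sample standard deviation 1."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / s for x in xs]

def equal_weight_score(predictors):
    """Improper linear model: z-score each predictor, then take the
    plain average -- no fitted coefficients."""
    cols = [zscore(p) for p in predictors]
    n = len(cols[0])
    return [sum(col[i] for col in cols) / len(cols) for i in range(n)]
```

The point of Dawes's result is that the fitted weights a regression would produce often add little out-of-sample over this equal-weight average.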

Offended by conditional probability

It’s a simple rule of probability that if A makes B more likely, then B makes A more likely. That is, if the conditional probability of A given B is larger than the probability of A alone, then the conditional probability

Posted in Statistics
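
The symmetry holds because each inequality is equivalent to P(A ∩ B) > P(A)·P(B), so both hold or fail together. A quick numeric check with made-up probabilities (my illustration, not the post's example):

```python
# Toy joint distribution over events A and B -- assumed numbers.
p_ab = 0.2   # P(A and B)
p_a = 0.4    # P(A)
p_b = 0.3    # P(B)

p_a_given_b = p_ab / p_b   # P(A | B)
p_b_given_a = p_ab / p_a   # P(B | A)

# Both comparisons reduce to P(A and B) > P(A) * P(B),
# so they must agree in either direction.
agree = (p_a_given_b > p_a) == (p_b_given_a > p_b)
```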

Visualization, modeling, and surprises

This afternoon Hadley Wickham gave a great talk on data analysis. Here’s a paraphrase of something profound he said. Visualization can surprise you, but it doesn’t scale well. Modelling scales well, but it can’t surprise you. Visualization can show you

Posted in Statistics

Statistics stories wanted

Andrew Gelman is trying to collect 365 stories about life as a statistician: So here’s the plan. 365 of you write vignettes about your statistical lives. Get into the nitty gritty—tell me what you do, and why you’re doing it.

Posted in Statistics

Elementary statistics book recommendation

I’ve thought about making a personal FAQ page. If I do, one of the questions would be what elementary statistics book I recommend. Unfortunately, I don’t have an answer for that one. I haven’t seen such a book I’d recommend

Posted in Statistics

Closet Bayesian

When I was a grad student, a statistics postdoc confided to me that he was a “closet Bayesian.” This sounded absolutely bizarre. Why would someone be secretive about his preferred approach to statistics? I could not imagine someone whispering that

Posted in Statistics

Statistics blogosphere

John Johnson did an analysis of the statistics blogosphere for the Coursera Social Networking Analysis class. His blog post about the analysis lists some of the lessons he learned from the project. It also includes a link to his paper

Posted in Statistics

Random inequalities and Edgeworth approximation

I’ve written a lot about random inequalities. That’s because computers spend a lot of time computing random inequalities in the inner loop of simulations. I’m looking for ways to speed this up. Here’s my latest idea: Approximating random inequalities with

Posted in Math, Statistics
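
For context, the brute-force way to evaluate a random inequality P(X > Y) — the inner-loop cost that an analytic approximation like Edgeworth aims to avoid — is Monte Carlo sampling. A sketch (function name and the example distributions are my illustration, not the post's method):

```python
import random

def prob_greater(sample_x, sample_y, n=100_000, seed=42):
    """Monte Carlo estimate of P(X > Y): draw n independent pairs
    and count how often X exceeds Y."""
    rng = random.Random(seed)
    hits = sum(sample_x(rng) > sample_y(rng) for _ in range(n))
    return hits / n

# Example: X ~ N(1, 1), Y ~ N(0, 1).
# True value is Phi(1/sqrt(2)), about 0.76.
est = prob_greater(lambda rng: rng.gauss(1, 1),
                   lambda rng: rng.gauss(0, 1))
```

At 100,000 samples per evaluation, the appeal of replacing this loop with a closed-form approximation is clear.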

Product of normal PDFs

The product of two normal PDFs is proportional to a normal PDF. This is well known in Bayesian statistics because a normal likelihood times a normal prior gives a normal posterior. But because Bayesian applications don’t usually need to know

Posted in Math, Statistics
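
The claim can be checked numerically: the product of two normal densities is a constant times the density of a normal whose precision is the sum of the two precisions and whose mean is the precision-weighted average of the two means. A sketch of that check (my own verification, not code from the post):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

mu1, s1 = 0.0, 1.0
mu2, s2 = 3.0, 2.0

# Precision-weighted combination of the two normals.
prec = 1 / s1**2 + 1 / s2**2
s_prod = sqrt(1 / prec)
mu_prod = (mu1 / s1**2 + mu2 / s2**2) / prec

# If the product is proportional to the combined normal's pdf,
# this ratio is the same constant at every x.
xs = [-1.0, 0.5, 2.0]
ratios = [normal_pdf(x, mu1, s1) * normal_pdf(x, mu2, s2)
          / normal_pdf(x, mu_prod, s_prod) for x in xs]
```

This is exactly why a normal likelihood times a normal prior yields a normal posterior: the constant washes out on normalization.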