In applications we’d like to draw independent random samples from complicated probability distributions, often the posterior distribution on parameters in a Bayesian analysis. Most of the time this is impractical.
MCMC (Markov Chain Monte Carlo) gives us a way around this impasse. It lets us draw samples from practically any probability distribution. But there’s a catch: the samples are not independent. This lack of independence means that all the familiar theory on convergence of sums of random variables goes out the window.
There’s not much theory to guide assessing the convergence of sums of MCMC samples, but there are heuristics. One of these is effective sample size (ESS). The idea is to have a sort of “exchange rate” between dependent and independent samples. You might want to say, for example, that 1,000 samples from a certain Markov chain are worth about as much as 80 independent samples because the MCMC samples are highly correlated. Or you might want to say that 1,000 samples from a different Markov chain are worth about as much as 300 independent samples because although the MCMC samples are dependent, they’re weakly correlated.
Here’s the definition of ESS:

\[ \text{ESS} = \frac{n}{1 + 2\sum_{k=1}^{\infty} \rho(k)} \]

where n is the number of samples and ρ(k) is the correlation at lag k.
This behaves well in the extremes. If your samples are independent, your effective sample size equals the actual sample size. If the correlation at lag k decreases so slowly that the sum in the denominator diverges, your effective sample size is zero.
Any reasonable Markov chain is between the extremes. Correlations that are exactly zero at every lag are too much to hope for, but ideally the correlations die off fast enough that the sum in the denominator not only converges but also isn’t terribly large.
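To make the formula concrete, here is a small Python sketch (the function and the AR(1) example are my own illustration, not from the post) that evaluates the definition when the lag-k correlations are known, truncating the infinite sum once the terms become negligible.

```python
def ess_from_autocorrelations(n, rho, tol=1e-12, max_lag=1_000_000):
    """ESS = n / (1 + 2 * sum_{k>=1} rho(k)).

    `rho` is a function giving the lag-k correlation. The infinite sum
    is truncated once a term falls below `tol` (or at `max_lag`).
    """
    total = 0.0
    for k in range(1, max_lag + 1):
        term = rho(k)
        if abs(term) < tol:
            break
        total += term
    return n / (1 + 2 * total)

# An AR(1) chain with coefficient phi has rho(k) = phi**k, so the sum is
# phi / (1 - phi) and ESS = n * (1 - phi) / (1 + phi) in closed form.
n, phi = 1000, 0.9
print(ess_from_autocorrelations(n, lambda k: phi**k))  # ~52.6
print(n * (1 - phi) / (1 + phi))                       # 52.63...
```

In the spirit of the “exchange rate” above, 1,000 highly correlated AR(1) draws with φ = 0.9 are worth roughly 53 independent samples.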
I’m not sure who first proposed this definition of ESS. There’s a reference to it in Handbook of Markov Chain Monte Carlo where the authors cite a paper [1] in which Radford Neal mentions it. Neal cites B. D. Ripley [2].
***
[1] Markov Chain Monte Carlo in Practice: A Roundtable Discussion. Robert E. Kass, Bradley P. Carlin, Andrew Gelman and Radford M. Neal. The American Statistician. Vol. 52, No. 2 (May, 1998), pp. 93-100
[2] Stochastic Simulation, B. D. Ripley, 1987.
John:
The definition you gave for effective sample size doesn’t quite work in practice because (a) you can’t sum to infinity, and (b) it will be too optimistic for chains that haven’t mixed. We have an effective sample size estimate that addresses both these concerns. It’s in formulas (11.7) and (11.8) in Section 11.5 of BDA3.
If you don’t have a copy of BDA, it’s also in the Stan manual, on pages 353-354 of the manual for Stan version 2.14.0.
I guess I should post this on my blog…
Thanks. I don’t see (a) as a concern. In practice, the infinite sum means to sum until additional terms add little to the result. But I see how (b) could be a problem.
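As a rough illustration of that truncation (a sketch of my own, not the BDA3/Stan estimator Gelman refers to), one can estimate the lag-k correlations from the draws themselves and stop summing once additional terms contribute little:

```python
import numpy as np

def estimate_ess(x, cutoff=0.05):
    """Crude single-chain ESS estimate.

    Estimates lag-k autocorrelations from the draws `x` and sums them
    until they fall below `cutoff`, a stand-in for the infinite sum.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    var = np.dot(x, x) / n
    total = 0.0
    for k in range(1, n):
        rho_k = np.dot(x[:-k], x[k:]) / (n * var)
        if rho_k < cutoff:  # additional terms now add little
            break
        total += rho_k
    return n / (1 + 2 * total)

# Demo on a synthetic AR(1) chain with coefficient 0.9; the theoretical
# value is n * (1 - 0.9) / (1 + 0.9), about n / 19.
rng = np.random.default_rng(0)
n, phi = 10_000, 0.9
x = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()
print(estimate_ess(x))  # roughly 500-600 for n = 10,000
```

The `cutoff` heuristic and the simple autocorrelation estimator here are only for illustration; the BDA3 and Stan estimators combine information across multiple chains precisely to address point (b).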