Bob Carpenter wrote today about how Markov chains cannot thoroughly cover high-dimensional spaces, and how they do not need to. That’s kinda the point of Monte Carlo integration. If you want systematic coverage, you have to sample systematically, and that’s impractical in high dimensions.

Bob gives the example that if you want to get one integration point in each quadrant of a 20-dimensional space, you need over a million samples (2^{20} = 1,048,576 to be precise). But you may be able to get sufficiently accurate results with far fewer than a million samples.

If you wanted to be certain to have one sample in each quadrant, you could sample at (±1, ±1, ±1, …, ±1). But if for some weird reason you wanted to sample randomly *and* hit each quadrant, you have a coupon collector problem. The expected number of samples to hit each of *N* cells with uniform [1] random sampling with replacement is

*N*( log(*N*) + γ )

where γ is Euler’s constant. So if *N* = 2^{20}, the expected number of draws would be over 15 million, about 15.1 million.
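That figure is easy to check numerically. Here is a small sketch comparing the exact expectation *N·H_N* (where *H_N* is the *N*th harmonic number) with the log-plus-γ approximation above; none of this code is from the original post.

```python
import math

N = 2**20
gamma = 0.5772156649015329  # Euler–Mascheroni constant

# Exact expectation for the coupon collector problem: N * H_N,
# where H_N is the Nth harmonic number.
exact = N * sum(1.0 / k for k in range(1, N + 1))

# Asymptotic approximation from the post: N * (log N + gamma).
approx = N * (math.log(N) + gamma)

print(f"exact  = {exact:,.0f}")   # about 15.1 million
print(f"approx = {approx:,.0f}")  # within less than one draw of exact
```

The approximation error is *N·(H_N − ln N − γ) ≈ 1/2*, so for this purpose the two are indistinguishable.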

## Related posts

- The coupon collector problem and π
- Collecting a large number of coupons
- Consecutive coupon collector problem

[1] We’re assuming each cell is sampled independently with equal probability each time. Markov chain Monte Carlo is more complicated than that because the draws are *not* independent.
## Comments

I was wondering about that! I would’ve guessed it would’ve taken longer. I used to have a job as a kid sorting cases of baseball cards into complete sets, but that was 660 entries (fact checked—can’t believe I remembered the number of Topps 1975 baseball cards!). We rediscovered radix sort (sorting by 100s, then by 10s, then by 1s).

P.S. I feel like a star showing up on your blog! I have been reading it since you had a ship on the cover and I was an industrial NLP developer. It was the crowdsourcing problem for NLP that drove me to learn Bayesian stats and MCMC in the first place.

I know this is a simplified example, but whether you end up with a coupon collector problem depends on how you define “sample randomly”.

You could sample 2^{20} 20-tuples from any distribution on nonnegative values you like, and then permute the signs so that you get one in each quadrant. I would still call that “random, but hitting each quadrant”. It’s just a lower-discrepancy kind of random. And if you take 2^{24} samples instead, you can have 16 from each quadrant.
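This sign-permutation idea can be sketched in a few lines of Python. The sketch below uses a small dimension and uniform magnitudes purely for illustration; the dimension, distribution, and variable names are my choices, not anything from the comment itself.

```python
import itertools
import random

d = 4  # small dimension for illustration; the post's example uses d = 20

# One sign pattern per orthant ("quadrant" in the post's terminology).
signs = list(itertools.product([-1.0, 1.0], repeat=d))
random.shuffle(signs)  # randomize which draw lands in which orthant

points = []
for s in signs:
    # Magnitudes from any distribution on positive values; uniform (0, 1] here.
    mags = [1.0 - random.random() for _ in range(d)]
    points.append(tuple(si * m for si, m in zip(s, mags)))

# Every one of the 2^d orthants is hit exactly once.
hit = {tuple(1 if x > 0 else -1 for x in p) for p in points}
assert len(hit) == 2**d
```

Drawing the magnitudes first and assigning signs afterward is what removes the coupon collector behavior: coverage is guaranteed by construction rather than by chance.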