Bob Carpenter wrote today about how Markov chains cannot thoroughly cover high-dimensional spaces, and that they do not need to. That’s kinda the point of Monte Carlo integration. If you want systematic coverage, you can/must sample systematically, and that’s impractical in high dimensions.
Bob gives the example that if you want to get one integration point in each quadrant of a 20-dimensional space, you need a million samples. (220 to be precise.) But you may be able to get sufficiently accurate results with a lot less than a million samples.
If you wanted to be certain to have one sample in each quadrant, you could sample at (±1, ±1, ±1, …, ±1). But if for some weird reason you wanted to sample randomly and hit each quadrant, you have a coupon collector problem. The expected number of samples to hit each of N cells with uniform [1] random sampling with replacement is
N( log(N) + γ )
where γ is Euler’s constant. So if N = 220, the expected number of draws would be over 15 million.
Related posts
- The coupon collector problem and π
- Collecting a large number of coupons
- Consecutive coupon collector problem
[1] We’re assuming each cell is sampled independently with equal probability each time. Markov chain Monte Carlo is more complicated than that because the draws are not independent.
I was wondering about that! I would’ve guessed it would’ve taken longer. I used to have a job as a kid sorting cases of baseball cards into complete sets, but that was 660 entries (fact checked—can’t believe I remembered the number of Topps 1975 baseball cards!). We rediscovered radix sort (sorting by 100s, then by 10s, then by 1s).
P.S. I feel like a star showing up on your blog! I have been reading it since you had a ship on the cover and I was an industrial NLP developer. It was the crowdsourcing problem for NLP that drove me to learn Bayesian stats and MCMC in the first place.
I know this is a simplified example, but whether or not you have to have a coupon collector problem depends on how you define “sample randomly”.
You could sample 2^20 20-tuples from any nonnegative distribution you like, and then permute the signs so that you get one in each quadrant. I would still call that “random, but hitting each quadrant”. It’s just a lower-discrepancy kind of random. And if you take 2^24 samples instead, you can have 16 from each quadrant.