There are many ways to describe the distance between two probability distributions. The previous two posts looked at using the p-norm to measure the difference between the PDFs and using Kullbach-Leibler divergence. Earth mover’s distance (EMD) is yet another approach.
Imagine a probability distribution on ℝ² as a pile of dirt. Earth mover’s distance measures how different two distributions are by how much work it would take to reshape the pile of dirt representing one distribution into a pile of dirt representing the other distribution. Unlike KL divergence, earth mover’s distance is symmetric, and so it really is a distance. (EMD is a colorful name for what is more formally known as the Wasserstein metric.)
The concept of t-closeness in data privacy is based on EMD. Deidentification procedures such as k-anonymity that protect individual privacy may not protect group privacy. t-closeness measures the distribution of values of some attribute in a group and compares this distribution to that of the overall distribution using EMD.
Earth mover’s distance is difficult to compute, or even to rigorously define, when working in several dimensions, but in one dimension it is particularly simple. The 1-Wasserman distance between two probability distributions is simply the 1-norm distance between the corresponding CDFs.
There are p-Wasserstein metrics just as there are p-norms, but the case p = 1 is particularly simple and so we will focus on it for this post.
We can illustrate the univariate Wasserstein metric by returning to a problem in a recent post, namely now to optimally approximate a standard normal by a logistic distribution.
Logistic distribution example
One of the nice things about the logistic distribution is that its CDF is an elementary function. If X is a logistic distribution with mean 0 and scale s then the CDF is
The CDF of a normal distribution has no elementary form but can be written in terms of the complementary error function. If Z is a standard normal random variable, then
We get a distance of 0.05926 if we use the value of s = 0.5513 obtained from moment matching here. The optimal value is s = 0.5867, a little smaller than the optimal values of s when minimizing the 1, 2, and ∞ norms which were around 0.61.