There are many ways to describe the distance between two probability distributions. The previous two posts looked at using the *p*-norm to measure the difference between PDFs and at using Kullback-Leibler divergence. **Earth mover’s distance** (EMD) is yet another approach.

Imagine a probability distribution on ℝ² as a pile of dirt. Earth mover’s distance measures how different two distributions are by how much work it would take to reshape the pile of dirt representing one distribution into a pile of dirt representing the other distribution. Unlike KL divergence, earth mover’s distance is symmetric, and so it really is a distance. (EMD is a colorful name for what is more formally known as the **Wasserstein metric**.)

The concept of *t*-closeness in data privacy is based on EMD. Deidentification procedures such as *k*-anonymity that protect *individual* privacy may not protect *group* privacy.

*t*-closeness compares the distribution of the values of some attribute within a group to the distribution of that attribute in the overall data set, measuring the difference between the two distributions with EMD.
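As a minimal sketch of the idea, here is a *t*-closeness check for a hypothetical ordinal attribute (say, salary bands coded 1 through 5), using SciPy's `wasserstein_distance` to compute the EMD between a group's values and the overall values. The data and the threshold are made up for illustration.

```python
from scipy.stats import wasserstein_distance

# Hypothetical ordinal attribute values (e.g. salary bands 1..5)
overall = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # whole data set
group   = [3, 4, 4, 5, 5]                   # one equivalence class

# EMD between the group's distribution and the overall distribution
t = wasserstein_distance(overall, group)

# The class satisfies t-closeness if the EMD is below the chosen threshold
threshold = 1.5
satisfies = t <= threshold
```

Here the group skews toward the high salary bands, so its EMD from the overall distribution is large; whether that violates *t*-closeness depends on the threshold the data steward chooses.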

Earth mover’s distance is difficult to compute, or even to rigorously define, when working in several dimensions, but in one dimension it is particularly simple. The 1-Wasserstein distance between two probability distributions is simply the 1-norm distance between the corresponding CDFs.

There are *p*-Wasserstein metrics just as there are *p*-norms, but the case *p* = 1 is particularly simple and so we will focus on it for this post.
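As a quick sanity check on the CDF characterization, consider two normal distributions that differ only in their means. Shifting a CDF horizontally by *m* sweeps out an area of exactly |*m*| between the two CDFs, so the 1-Wasserstein distance should be |*m*|. SciPy's `wasserstein_distance` estimates this from samples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two normals differing only in mean: W1 should equal the mean shift, 0.5
rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, 200_000)
b = rng.normal(0.5, 1.0, 200_000)

# Empirical 1-Wasserstein distance between the two samples
d = wasserstein_distance(a, b)
```

With 200,000 samples the empirical estimate lands very close to the theoretical value of 0.5.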

We can illustrate the univariate Wasserstein metric by returning to a problem in a recent post, namely how to optimally approximate a standard normal by a logistic distribution.

## Logistic distribution example

One of the nice things about the logistic distribution is that its CDF is an elementary function. If *X* has a logistic distribution with mean 0 and scale *s*, then its CDF is

F(x) = 1 / (1 + exp(-x/s)).

The CDF of a normal distribution has no elementary form but can be written in terms of the complementary error function. If *Z* is a standard normal random variable, then

P(Z ≤ z) = ½ erfc(-z/√2).

If we use the value *s* = 0.5513 obtained from moment matching in the earlier post, we get a distance of 0.05926. The optimal value is *s* = 0.5867, a little smaller than the optimal values of *s* when minimizing the 1, 2, and ∞ norms, which were all around 0.61.
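The numbers above can be reproduced with a short numerical sketch: compute the 1-Wasserstein distance as the 1-norm of the difference of the two CDFs, then minimize over *s*. The helper names (`logistic_cdf`, `w1`) and the integration limits are choices made here, not anything canonical.

```python
import numpy as np
from scipy import integrate, optimize, special

def logistic_cdf(x, s):
    # CDF of the logistic distribution with mean 0 and scale s
    return 1.0 / (1.0 + np.exp(-x / s))

def normal_cdf(x):
    # Standard normal CDF via the complementary error function
    return 0.5 * special.erfc(-x / np.sqrt(2.0))

def w1(s):
    # 1-Wasserstein distance = 1-norm of the difference of the CDFs;
    # ±40 is far enough out that both tails are numerically zero
    f = lambda x: abs(logistic_cdf(x, s) - normal_cdf(x))
    val, _ = integrate.quad(f, -40.0, 40.0, limit=200)
    return val

# Minimize over s; the post reports the optimum near s = 0.5867
res = optimize.minimize_scalar(w1, bounds=(0.3, 1.0), method="bounded")
```

Evaluating `w1` at the moment-matching value *s* = 0.5513 and at the minimizer recovers the distances quoted above.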

Related is the paper [“Using image and curve registration for measuring the goodness of fit of spatial and temporal predictions”](http://www.stat.columbia.edu/~gelman/research/published/biom_60_4-13-1.pdf).

As its title indicates, that paper is about time series and spatial predictions, not densities. In general, I think densities are an overrated statistical topic; time series seem more common in applications.

In any case, you might be interested in that paper where we measure differences between curves in terms of how much they need to be perturbed to line up. We had a lot of challenges in this work, and we’ve always wanted to pursue it further, but we haven’t.