Turning K-L divergence into a metric

Kullback-Leibler divergence

Kullback-Leibler divergence is defined for two random variables X and Y by

D_{KL}(X || Y) = -\int f_X(x) \log \frac{f_Y(x)}{f_X(x)} \, dx

K-L divergence is non-negative, and it’s zero if and only if X and Y have the same distribution. But it is not a metric, for reasons explained here. For one thing, it’s not symmetric.

Jeffreys divergence

We can fix the symmetry problem by defining

J(X, Y) = D_{KL}(X || Y)  + D_{KL}(Y || X)

The J above stands for Jeffreys, for Harold Jeffreys. J is called either the symmetrized K-L divergence or Jeffreys’ divergence. It’s still a divergence, not a distance.

A distance (metric) d has to have four properties:

  1. d(x, x) = 0
  2. d(xy) > 0 if xy
  3. d(xy) = d(yx)
  4. d(xz) ≤ d(xy) + d(yz)

K-L divergence satisfies the first two properties. Jeffreys’ divergence satisfies the first three, but not the last one, the triangle inequality.

To show that J doesn’t satisfy the triangle inequality, let X, Y, and Z be Bernoulli random variables with p equal to 0.1, 0.2, and 0.3 respectively. Then the following Python code shows that the divergence from X to Y, plus the divergence from Y to Z, is less than the divergence from X to Z. This would be like saying you could get from LA to NYC faster by having a layover in Denver rather than taking a direct flight.

from math import log

kl = lambda p, q: p*log(p/q) + (1-p)*log((1-p)/(1-q))
j  = lambda p, q: kl(p, q) + kl(q, p)

a = j(0.1, 0.2)
b = j(0.2, 0.3)
c = j(0.1, 0.3)
print(a + b, c)

This prints 0.135 and 0.270.

Jensen-Shannon distance

Jensen-Shannon distance turns K-L divergence into a metric as follows. First, define the random variable M to be the average of X and Y. Then average the K-L divergence from M to each of X and Y. This defines the Jensen-Shannon divergence. It’s still not a metric, but it’s square root is, which defines the Jensen-Shannon distance.

\begin{align*} M &= (X + Y)/2 \\ \text{JSD}(X, Y) &= \tfrac{1}{2}D_{KL}(X || M) + \tfrac{1}{2}D_{KL}(Y || M) \\ d_{JS}(X, Y) &= \sqrt{\text{JSD}(X, Y)} \end{align*}

The following code gives an example of Jensen-Shannon distance satisfying the triangle inequality.

def d(p, q):
    m = 0.5*(p + q)
    jsd = 0.5*kl(p, m) + 0.5*kl(q, m) 
    return jsd**0.5

a = d(0.1, 0.2)
b = d(0.2, 0.3)
c = d(0.1, 0.3)
print(a + b, c)

This prints 0.1817 and 0.1801. Now a layover makes the trip longer.

Leave a Reply

Your email address will not be published. Required fields are marked *