Kullback-Leibler (KL) divergence between two normal random variables

The previous post looked at the best approximation to a normal density by a normal density with a different mean. Dan Piponi suggested in the comments that it would be good to look at the Kullback-Leibler (KL) divergence.

The previous post looked at the difference between two densities from an analytic perspective, solving the problem that an analyst would find natural. This post takes an information theoretic perspective. Just as p-norms are natural in analysis, KL divergence is natural in information theory.

The Kullback-Leibler divergence between two random variables X and Y is defined as

$KL(X || Y) = -\int f_X(x) \log \frac{f_Y(x)}{f_X(x)} \, dx$

There are many ways to interpret KL(X || Y), such as the average extra surprise in seeing X when you expected Y. The expectation is taken with respect to the density of X, the first argument.

Unlike the p-norm distance, the KL divergence between two normal random variables can be computed in closed form.

Let X be a normal random variable with mean μ_X and variance σ²_X and Y a normal random variable with mean μ_Y and variance σ²_Y. Then

$KL(X || Y) = \log\frac{\sigma_Y}{\sigma_X} + \frac{\sigma_X^2 + (\mu_X - \mu_Y)^2}{2\sigma_Y^2} - \frac{1}{2}$
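As a sanity check, the closed form can be compared against a direct numerical evaluation of the defining integral. The following is a minimal sketch using only the Python standard library. The function names, the midpoint rule, and the integration limits are illustrative choices, not anything from the original post.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def kl_normal(mu_x, sigma_x, mu_y, sigma_y):
    """Closed-form KL(X || Y) for X ~ N(mu_x, sigma_x^2), Y ~ N(mu_y, sigma_y^2)."""
    return (math.log(sigma_y / sigma_x)
            + (sigma_x**2 + (mu_x - mu_y)**2) / (2.0 * sigma_y**2)
            - 0.5)

def kl_numeric(mu_x, sigma_x, mu_y, sigma_y, half_width=10.0, n=20_000):
    """Midpoint-rule approximation of the integral of f_X * log(f_X / f_Y).

    Integrates over mu_x +/- half_width * sigma_x (an assumed truncation),
    where essentially all of the mass of X lies.
    """
    a = mu_x - half_width * sigma_x
    b = mu_x + half_width * sigma_x
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        log_fx = normal_logpdf(x, mu_x, sigma_x)
        log_fy = normal_logpdf(x, mu_y, sigma_y)
        # Working with log densities avoids taking log(0) in the tails.
        total += math.exp(log_fx) * (log_fx - log_fy)
    return total * h

if __name__ == "__main__":
    # Arbitrary example parameters, chosen only for illustration.
    print(kl_normal(0.0, 1.0, 1.0, 2.0), kl_numeric(0.0, 1.0, 1.0, 2.0))
```

The two printed values should agree to many decimal places, and `kl_normal` returns zero when the two distributions coincide.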

If μ_X = 0 and σ_X = 1, then for fixed μ_Y the value of σ²_Y that minimizes KL(X || Y) is

$\sigma_Y^2 = 1 + \mu_Y^2$
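Here is a short sketch of where this comes from, writing v for σ²_Y (a symbol introduced here only for convenience). With μ_X = 0 and σ_X = 1, the closed form above becomes

$KL(X || Y) = \frac{1}{2}\log v + \frac{1 + \mu_Y^2}{2v} - \frac{1}{2}$

Setting the derivative with respect to v to zero gives

$\frac{1}{2v} - \frac{1 + \mu_Y^2}{2v^2} = 0$

so v = 1 + μ_Y². The derivative is negative for smaller v and positive for larger v, so this critical point is a minimum.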

KL divergence is not symmetric, hence we say divergence rather than distance. More on that here. If we want to solve the opposite problem, minimizing KL(Y || X), the optimal value of σ²_Y is simply 1.
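The asymmetry is easy to see numerically. The sketch below takes X to be standard normal and fixes μ_Y, then does a brute-force grid search over σ²_Y for each ordering of the arguments, with the divergence from X to Y called "forward" and the reversed ordering, KL(Y || X), called "reverse". The value μ_Y = 2, the grid range, the step size, and the helper names are all arbitrary choices made for illustration.

```python
import math

def kl_normal(mu_x, sigma_x, mu_y, sigma_y):
    """Closed-form KL(X || Y) for X ~ N(mu_x, sigma_x^2), Y ~ N(mu_y, sigma_y^2)."""
    return (math.log(sigma_y / sigma_x)
            + (sigma_x**2 + (mu_x - mu_y)**2) / (2.0 * sigma_y**2)
            - 0.5)

def best_variance(objective, v_max=20.0, step=0.001):
    """Return the variance on a uniform grid in (0, v_max] that minimizes objective."""
    grid = [k * step for k in range(1, int(round(v_max / step)) + 1)]
    return min(grid, key=objective)

mu_y = 2.0  # illustrative choice of the fixed mean of Y

# Forward problem: X ~ N(0, 1) is the first argument, Y ~ N(mu_y, v) the second.
forward = best_variance(lambda v: kl_normal(0.0, 1.0, mu_y, math.sqrt(v)))

# Reverse problem: Y ~ N(mu_y, v) is the first argument, X ~ N(0, 1) the second.
reverse = best_variance(lambda v: kl_normal(mu_y, math.sqrt(v), 0.0, 1.0))

if __name__ == "__main__":
    print("forward optimum:", forward, " expected about", 1.0 + mu_y**2)
    print("reverse optimum:", reverse, " expected about", 1.0)
```

Up to the grid resolution, the forward optimum lands at 1 + μ_Y² and the reverse optimum at 1, whatever value of μ_Y is chosen within the grid's range.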