In my previous post on approximating a logistic distribution with a normal distribution I accidentally said something about approximating a normal with a normal.

Obviously the best approximation to a probability distribution is itself. As Norbert Wiener said “The best material model of a cat is another, or preferably the same, cat.”

But this made me think of the following problem. Let *f* be the density function of a standard normal random variable, i.e. one with mean zero and standard deviation 1. Let *g* be the density function of a normal random variable with mean μ > 0 and standard deviation σ.

For what value of σ does *g* best approximate *f*? Is it simply σ = 1? Does it depend on μ?

I looked at the distance between *f* and *g* in the 1, 2, and ∞ norms. In each case the optimal value of σ is not 1, and it does depend on μ. When μ is close to 0, the optimal σ is close to 1, as you’d probably expect. But for larger μ the results are surprising.

For the 1-norm and 2-norm, the optimal value of σ increases with μ and reaches a maximum of 2, then remains constant.

For the ∞-norm, the optimal value of σ increases briefly and then decreases.
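These optima are easy to explore numerically. Here’s a sketch in Python using SciPy; the function name `opt_sigma` and the search bounds are my own choices, so treat it as a starting point rather than the exact computation behind the results above.

```python
import numpy as np
from scipy import integrate, optimize
from scipy.stats import norm

def opt_sigma(mu, p):
    """Sigma minimizing the p-norm of f - g, where f is the standard
    normal density and g is a normal density with mean mu."""
    def dist(sigma):
        if p == np.inf:
            # sup norm: evaluate the difference on a fine grid covering both modes
            xs = np.linspace(-8, mu + 8, 4001)
            return np.abs(norm.pdf(xs) - norm.pdf(xs, mu, sigma)).max()
        # p-norm: integrate |f - g|^p over the real line
        val, _ = integrate.quad(
            lambda x: np.abs(norm.pdf(x) - norm.pdf(x, mu, sigma)) ** p,
            -np.inf, np.inf)
        return val ** (1 / p)
    return optimize.minimize_scalar(dist, bounds=(0.01, 10), method="bounded").x
```

For example, `opt_sigma(0.1, 2)` should come out close to 1, and plotting `opt_sigma(mu, p)` against μ for each norm traces out the behavior described above.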

Maybe it would be more appropriate to use the Kullback-Leibler divergence, and since KL divergence is asymmetric, to consider both directions.
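For KL divergence no numerics are needed: using the standard closed form for the KL divergence between two univariate normals, a little calculus shows that minimizing KL(*f* || *g*) over σ gives σ = √(1 + μ²), while minimizing KL(*g* || *f*) gives σ = 1 for every μ. Here’s a quick check; the helper `kl_normal` is my own naming.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ), standard closed form."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

mu = 1.5

# Direction KL(f || g): calculus predicts the optimal sigma is sqrt(1 + mu^2)
s_fg = minimize_scalar(lambda s: kl_normal(0, 1, mu, s),
                       bounds=(0.1, 10), method="bounded").x

# Direction KL(g || f): calculus predicts the optimal sigma is 1, for any mu
s_gf = minimize_scalar(lambda s: kl_normal(mu, s, 0, 1),
                       bounds=(0.1, 10), method="bounded").x
```

The two directions give qualitatively different answers, which makes the asymmetry concrete: approximating *f* by *g* favors a wider *g*, while the reverse direction favors unit variance no matter how far the mean is shifted.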