Sam Walters posted an elegant theorem on his Twitter account this morning. The theorem follows the pattern of an equality for linear functions generalizing to an inequality for convex functions. We’ll give a little background, state the theorem, and show an example application.
Let A be a real symmetric n×n matrix, or more generally a complex n×n Hermitian matrix, with entries aij. Note that the diagonal elements aii are real numbers even if some of the other entries are complex. (A Hermitian matrix equals its conjugate transpose, which means the elements on the diagonal equal their own conjugates.)
A general theorem says that A has n real eigenvalues, counted with multiplicity. Denote these eigenvalues λ1, λ2, …, λn.
It is well known that the sum of the diagonal elements of A equals the sum of its eigenvalues.
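As a quick sanity check, here is a sketch in NumPy that builds a random Hermitian matrix and compares its trace to the sum of its eigenvalues. The size of the matrix and the random seed are arbitrary choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random complex matrix M, symmetrized into a Hermitian matrix A = (M + M*)/2.
n = 5
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (M + M.conj().T) / 2

eigenvalues = np.linalg.eigvalsh(A)  # eigenvalues of a Hermitian matrix are real

print(np.trace(A).real)      # sum of the diagonal elements
print(eigenvalues.sum())     # sum of the eigenvalues -- the two agree
```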
We could trivially generalize this to say that for any linear function φ: R → R,

φ(a11) + φ(a22) + … + φ(ann) = φ(λ1) + φ(λ2) + … + φ(λn),

because we could pull any shifting and scaling constants out of the sum.
The theorem Sam Walters posted says that the equality above extends to an inequality if φ is convex:

φ(a11) + φ(a22) + … + φ(ann) ≤ φ(λ1) + φ(λ2) + … + φ(λn).
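Here is a small numerical illustration of the inequality, using φ(x) = x² as one convenient convex function; any convex φ would do, and the matrix is again a random Hermitian matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (M + M.conj().T) / 2     # random Hermitian matrix

def phi(x):
    return x**2              # a convex function; any convex phi works

diagonal = np.diag(A).real   # diagonal entries of a Hermitian matrix are real
eigenvalues = np.linalg.eigvalsh(A)

print(phi(diagonal).sum())    # sum of phi over the diagonal
print(phi(eigenvalues).sum()) # sum of phi over the eigenvalues -- at least as large
```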
Here’s an application of this theorem. Assume the eigenvalues of A are all positive and let φ(x) = −log(x). Then φ is convex, and

−log(a11) − log(a22) − … − log(ann) ≤ −log(λ1) − log(λ2) − … − log(λn),

and so

λ1 λ2 ⋯ λn ≤ a11 a22 ⋯ ann,

i.e. the product of the diagonals of A is an upper bound on the determinant of A, since the determinant equals the product of the eigenvalues.
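And here is a quick numerical check of that consequence. Forming A = B B* is just one convenient way to produce a Hermitian matrix with positive eigenvalues; nothing about the bound depends on that choice.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 5
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = B @ B.conj().T            # Hermitian with positive eigenvalues (almost surely)

det_A = np.linalg.det(A).real
diag_product = np.prod(np.diag(A).real)

print(det_A)          # determinant of A
print(diag_product)   # product of the diagonal elements -- an upper bound on det(A)
```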
This post illustrates two general principles:
- Linear equalities often generalize to convex inequalities.
- When you hear a new theorem about convex functions, see what it says about exp or −log.
John, do you have a reference for this theorem?
Only Sam Walters’ tweet. But it might not be hard to prove starting from the fact that A can be diagonalized with orthogonal matrices.
This is really cool! Then if any diagonal element of a full-rank matrix is 0, its determinant is negative. Powerful theorem indeed.
I don’t know about that. The derivation started by assuming all the eigenvalues are positive so we could take logs.
The proof seems mildly interesting too. It follows from two facts:
– If f is convex and B is doubly-stochastic, then sum_i f((Bx)_i) <= sum_i f(x_i).
– If we write H = U Diag(lambda) U* for unitary U, then diag(H) = (U o conj(U)) lambda. ('o' denotes Hadamard product.)
Sorry, I also meant to say that (U o conj(U)) is doubly-stochastic.
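Both facts are easy to verify numerically. Here is a sketch; the random Hermitian matrix H and its size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 5
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2

lam, U = np.linalg.eigh(H)    # H = U Diag(lam) U*

S = np.abs(U)**2              # entrywise |U|^2, i.e. U o conj(U)

print(S.sum(axis=0))          # column sums are all 1 ...
print(S.sum(axis=1))          # ... and so are row sums: S is doubly stochastic
print(np.allclose(np.diag(H).real, S @ lam))  # diag(H) = (U o conj(U)) lam
```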
The final result is Hadamard’s inequality, isn’t it?