Convex function of diagonals and eigenvalues

Sam Walters posted an elegant theorem on his Twitter account this morning. The theorem follows the pattern of an equality for linear functions generalizing to an inequality for convex functions. We’ll give a little background, state the theorem, and show an example application.

Let A be a real symmetric n×n matrix, or more generally a complex n×n Hermitian matrix, with entries aij. Note that the diagonal elements aii are real numbers even if some of the other entries are complex. (A Hermitian matrix equals its conjugate transpose, which means the elements on the diagonal equal their own conjugate.)

A general theorem says that A has n eigenvalues. Denote these eigenvalues λ1, λ2, …, λn.

It is well known that the sum of the diagonal elements of A equals the sum of its eigenvalues.

\sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i

We could trivially generalize this to say that for any linear function φ: RR,

\sum_{i=1}^n \varphi(a_{ii}) = \sum_{i=1}^n \varphi({\lambda_i})

because we could pull any shifting and scaling constants out of the sum.

The theorem Sam Walters posted says that the equality above extends to an inequality if φ is convex.

\sum_{i=1}^n \varphi(a_{ii}) \leq \sum_{i=1}^n \varphi({\lambda_i})

Here’s an application of this theorem. Assume the eigenvalues of A are all positive and let φ(x) = – log(x). Then φ is convex, and

-\sum_{i=1}^n \log(a_{ii}) \leq -\sum_{i=1}^n \log({\lambda_i})

and so

\prod_{i=1}^n a_{ii} \geq \prod_{i=1}^n \lambda_i = \det(A)

i.e. the product of the diagonals of A is an upper bound on the determinant of A.

This post illustrates two general principles:

  1. Linear equalities often generalize to convex inequalities.
  2. When you hear a new theorem about convex functions, see what it says about exp or -log.

More linear algebra posts

7 thoughts on “Convex function of diagonals and eigenvalues

  1. Only Sam Walter’s tweet. But it might not be hard to prove starting from the fact that A can be diagonalized with orthogonal matrices.

  2. I don’t know about that. The derivation started by assuming all the eigenvalues are positive so we could take logs.

  3. The proof seems mildly interesting too. It follows from two facts:

    – If f is convex and B is doubly-stochastic, then sum_i f((Bx)_i) <= sum_i f(x_i).

    – If we write H = U Diag(lambda) U* for unitary U, then diag(H) = (U o conj(U)) lambda. ('o' denotes Hadamard product.)

Comments are closed.