Optimal low-rank matrix approximation

Matrix compression

Suppose you have an m by n matrix A, where m and n are very large, that you’d like to compress. That is, you’d like to come up with an approximation of A that takes less data to describe.

For example, consider a high resolution photo that as a matrix of gray scale values. An approximation to the matrix could be a lower resolution representation that takes less space.

I heard a rumor many years ago that space probes would compress images by interpreting an image as a matrix and sending back a few eigenvalues and eigenvectors. That sounded fascinating, but what about images that aren’t square? If a matrix M is not square, then you can’t have Mx = λx for a scalar λ because the left and right sides of the equation would have different dimensions.

Truncated SVD

Approximate a rectangular matrix requires using something more general than eigenvalues and eigenvectors, and that is singular values and singular vectors.

Suppose our matrix A has the singular value decomposition

A = U \Sigma V^*

This can be written as

A = \sum_{i=1}^p \sigma_i u_i v_i^*

where the σi are the singular values and p is the number of singular values that are non-zero. The ui and vi are the ith columns of U and V respectively. Singular values are nonnegative, and listed in decreasing order.

We can form an approximation to A by truncating the sum above. Instead of taking all the singular values, and their corresponding left and right singular vectors, we only take the k largest singular values and their corresponding vectors.

A_k = \sum_{i=1}^k \sigma_i u_i v_i^*

This is called the truncated SVD.

We started by assuming A had dimensions m by n and that both were large. Storing A requires mn numbers. Storing Ak requires only k(1 + mn) numbers. This is a big savings if k is much smaller than m and n.

Eckart-Young theorem

So how good an approximation is Ak ? Turns out it is optimal, in the least squares sense. This is the content of the Eckart-Young theorem. It says that the best least squares (2-norm) approximation of A by a rank k matrix is given by Ak.

Not only that, the theorem says the 2-norm error is given by the first singular value that we didn’t use, i.e.

|| A - A_k ||_2 = \sigma_{k+1}

Related post

Singular value decomposition and pseudoinverse

One thought on “Optimal low-rank matrix approximation


    A_k is optimal not only in the LS sense, but in the sense of any unitarily invariant norm; therefore including nuclear, Schatten or Ky Fan.
    This important extension was proved by Mirsky in 1958 and published in 1968 (“Symmetric Gauge Functions and Unitarily Invariant Norms”).

Leave a Reply

Your email address will not be published. Required fields are marked *