Optimal low-rank matrix approximation

Matrix compression

Suppose you have an m by n matrix A, where m and n are very large, and you'd like to compress it. That is, you'd like to come up with an approximation of A that takes less data to describe.

For example, consider a high resolution photo represented as a matrix of gray scale values. An approximation to the matrix could be a lower resolution representation that takes less space.

I heard a rumor many years ago that space probes would compress images by interpreting an image as a matrix and sending back a few eigenvalues and eigenvectors. That sounded fascinating, but what about images that aren’t square? If a matrix M is not square, then you can’t have Mx = λx for a scalar λ because the left and right sides of the equation would have different dimensions.

Truncated SVD

Approximating a rectangular matrix requires something more general than eigenvalues and eigenvectors, namely singular values and singular vectors.

Suppose our matrix A has the singular value decomposition

A = U \Sigma V^*

This can be written as

A = \sum_{i=1}^p \sigma_i u_i v_i^*

where the σ_i are the singular values and p is the number of singular values that are non-zero. The vectors u_i and v_i are the ith columns of U and V respectively. The singular values are nonnegative and are listed in decreasing order.
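Here is a short NumPy sketch of the sum above. The small random matrix is made up purely for illustration; numpy.linalg.svd with full_matrices=False returns the reduced factors U, the singular values as a vector s, and V^* as Vh.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))           # small rectangular example

# reduced SVD: U is 5x3, s holds the singular values, Vh is V^*
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# rebuild A as the sum of rank-one matrices sigma_i u_i v_i^*
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))

print(np.allclose(A, A_sum))              # True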

We can form an approximation to A by truncating the sum above. Instead of taking all the singular values, and their corresponding left and right singular vectors, we only take the k largest singular values and their corresponding vectors.

A_k = \sum_{i=1}^k \sigma_i u_i v_i^*

This is called the truncated SVD.

We started by assuming A had dimensions m by n and that both were large. Storing A requires mn numbers. Storing A_k requires only k(1 + m + n) numbers: k singular values, k left singular vectors of length m, and k right singular vectors of length n. This is a big savings if k is much smaller than m and n.
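To make the storage comparison concrete, here is a sketch of forming A_k in NumPy. The dimensions and the rank k are made-up illustrative values.

import numpy as np

m, n, k = 1000, 600, 20
rng = np.random.default_rng(1)
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# keep only the k largest singular values and their singular vectors
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

full_storage = m * n                      # numbers needed to store A
truncated_storage = k * (1 + m + n)       # k sigmas, k u's of length m, k v's of length n
print(full_storage, truncated_storage)    # 600000 versus 32020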

Eckart-Young theorem

So how good an approximation is A_k? It turns out to be optimal in the least squares sense. This is the content of the Eckart-Young theorem, which says that the best least squares (2-norm) approximation of A by a rank k matrix is given by A_k.

Not only that, the theorem says the 2-norm error is given by the first singular value that we didn’t use, i.e.

|| A - A_k ||_2 = \sigma_{k+1}
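Here is a quick numerical check of the error formula, again with a made-up random matrix.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 30))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

err = np.linalg.norm(A - A_k, 2)          # spectral (2-) norm of the error
print(np.isclose(err, s[k]))              # s[k] is sigma_{k+1} with zero-based indexing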


One thought on “Optimal low-rank matrix approximation”

  1. GIUSEPPE A PALEOLOGO

    A_k is optimal not only in the LS sense, but in the sense of any unitarily invariant norm; therefore including nuclear, Schatten or Ky Fan.
    This important extension was proved by Mirsky in 1958 and published in 1968 (“Symmetric Gauge Functions and Unitarily Invariant Norms”).
