Suppose you have an m by n matrix A, where m and n are very large, that you’d like to compress. That is, you’d like to come up with an approximation of A that takes less data to describe.
For example, consider a high resolution photo that as a matrix of gray scale values. An approximation to the matrix could be a lower resolution representation that takes less space.
I heard a rumor many years ago that space probes would compress images by interpreting an image as a matrix and sending back a few eigenvalues and eigenvectors. That sounded fascinating, but what about images that aren’t square? If a matrix M is not square, then you can’t have Mx = λx for a scalar λ because the left and right sides of the equation would have different dimensions.
Approximate a rectangular matrix requires using something more general than eigenvalues and eigenvectors, and that is singular values and singular vectors.
Suppose our matrix A has the singular value decomposition
This can be written as
where the σi are the singular values and p is the number of singular values that are non-zero. The ui and vi are the ith columns of U and V respectively. Singular values are nonnegative, and listed in decreasing order.
We can form an approximation to A by truncating the sum above. Instead of taking all the singular values, and their corresponding left and right singular vectors, we only take the k largest singular values and their corresponding vectors.
This is called the truncated SVD.
We started by assuming A had dimensions m by n and that both were large. Storing A requires mn numbers. Storing Ak requires only k(1 + m + n) numbers. This is a big savings if k is much smaller than m and n.
So how good an approximation is Ak ? Turns out it is optimal, in the least squares sense. This is the content of the Eckart-Young theorem. It says that the best least squares (2-norm) approximation of A by a rank k matrix is given by Ak.
Not only that, the theorem says the 2-norm error is given by the first singular value that we didn’t use, i.e.
One thought on “Optimal low-rank matrix approximation”
A_k is optimal not only in the LS sense, but in the sense of any unitarily invariant norm; therefore including nuclear, Schatten or Ky Fan.
This important extension was proved by Mirsky in 1958 and published in 1968 (“Symmetric Gauge Functions and Unitarily Invariant Norms”).