Example of not inverting a matrix: optimization

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function fmin_ncg.

If a matrix A is large and sparse, it may be possible to solve Ax = b but impossible to even store the matrix A⁻¹ because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products Ax. Notice that this doesn’t mean you have to store the matrix A; you have to produce the product Ax as if you had stored the matrix A and multiplied it by x.

Very often there are physical reasons why the matrix A is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of A, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute Ax much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.

Iterative methods of solving Ax = b, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products Ax and updating x as a result. These methods might be very useful for a couple reasons.

You only have to form products of a sparse matrix and a vector.
If don’t need a very accurate solution, you may be able to stop very early.

In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximately solutions to the problem of finding search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.

The function scipy.optimize.fmin_ncg can take as an argument a function fhess that computes the Hessian matrix H of the objective function. But more importantly, it lets you provide instead a function fhess_p that computes the product of the H with a vector. You don’t have to supply the actual Hessian matrix because the fmin_ncg method doesn’t need it. It only needs a way to compute matrix-vector products Hx to find approximate Newton search directions.

For more information, see the SciPy documentation for fmin_ncg.

More: Applied linear algebra

5 thoughts on “Example of not inverting a matrix: optimization”

Rick Wicklin

8 February 2012 at 13:39

Excellent points. I have also blogged about the importance of trying to avoid computing an explicit niverse. Some people are surprised when they see a graph that compares the time taken to solve a system by computing an inverse versus the time to solve by LU factorization.
Jan Van lent

9 February 2012 at 12:51

A related issue is the accuracy of the matrix inverse method of solving linear systems. See for example the technical report [1].

[1] “How Accurate is inv(A)*b?”, Alex Druinsky, Sivan Toledo, Technical report, Tel-Aviv University, January 2012, http://arxiv.org/abs/1201.6035
Gregor Gorjanc

17 February 2012 at 17:08

What if I really need to get diagonal elements of inverse of A? In stats. these values are used to get standard errors of some linear model estimates.
Konstantin Savenkov

30 October 2012 at 10:48

Nice, thanks. Will see if that’ll speed up the ALS approach to matrix factorisation.
Max

11 January 2020 at 07:10

Gregor, are you sure you need the the individual values? It might be the trace (sum) or just the largest values. Often also rather the eigenvalues then the diagonal values. The sums are the same since basis independent so you can wlog consider the matrix in diagonalized (at least Jordan) form. The largest eigenvalues are also computed iteratively using power method or the related Lancosz method (which also doesn’t need the matrix but only the product- with-vector function and can be very fast for sparse matrices)

Comments are closed.