Example of not inverting a matrix: optimization

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary to solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion: I will explain how the Newton-Conjugate Gradient method works, as implemented in SciPy by the function fmin_ncg.

If a matrix A is large and sparse, it may be possible to solve Ax = b but impossible to even store the matrix A⁻¹ because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products Ax. Notice that this doesn’t mean you have to store the matrix A; you only have to produce the product Ax as if you had stored the matrix A and multiplied it by x.
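For example, here’s a minimal sketch (my own illustration, not taken from SciPy’s docs) using scipy.sparse.linalg.LinearOperator, which describes a matrix entirely by its action on a vector. The second-difference matrix below is just a convenient stand-in; the point is that the product Ax is produced without the matrix ever being stored.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator

    n = 1_000_000  # a dense n-by-n matrix of doubles would take about 8 TB

    def second_difference(x):
        # Apply the tridiagonal second-difference matrix to x
        # without ever allocating the matrix itself.
        y = 2.0 * x
        y[:-1] -= x[1:]
        y[1:] -= x[:-1]
        return y

    A = LinearOperator((n, n), matvec=second_difference)
    v = np.random.rand(n)
    Av = A.matvec(v)  # the product Av, computed matrix-free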

Very often there are physical reasons why the matrix A is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of A, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute Ax much faster than if you marched along the full matrix, multiplying and adding a lot of zeros.

Iterative methods of solving Ax = b, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products Ax and updating x as a result. These methods might be very useful for a couple reasons.

  1. You only have to form products of a sparse matrix and a vector.
  2. If you don’t need a very accurate solution, you may be able to stop very early, as in the sketch below.
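Here’s a small sketch of that idea, using the conjugate gradient solver in scipy.sparse.linalg on a made-up sparse tridiagonal system. The loose tolerance and the cap on iterations are what let you stop early. (Recent SciPy versions spell the tolerance argument rtol; older ones call it tol.)

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import cg

    n = 100_000
    # Only the three non-zero diagonals are stored, not n*n entries.
    A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    # Ask for a loose tolerance and cap the iteration count; each
    # iteration only needs the sparse product A @ x.
    x_approx, info = cg(A, b, rtol=1e-3, maxiter=50)

    print(info)  # 0 means the (loose) tolerance was reached
    print(np.linalg.norm(b - A @ x_approx) / np.linalg.norm(b))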

In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximate solutions to the sub-problem of computing search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.

The function scipy.optimize.fmin_ncg can take as an argument a function fhess that computes the Hessian matrix H of the objective function. But more importantly, it lets you instead provide a function fhess_p that computes the product of H with a vector. You don’t have to supply the actual Hessian matrix because the fmin_ncg method doesn’t need it. It only needs a way to compute matrix-vector products Hx to find approximate Newton search directions.
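Here’s a sketch on a toy problem of my own, minimizing the quadratic f(x) = ½xᵀHx − bᵀx, whose Hessian is the sparse matrix H. The fhess_p argument supplies Hessian-vector products, so H is only ever touched through sparse matrix-vector multiplication.

    import numpy as np
    from scipy.sparse import diags
    from scipy.optimize import fmin_ncg

    n = 1000
    H = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    def f(x):
        # Objective 0.5 x'Hx - b'x, whose minimizer satisfies Hx = b.
        return 0.5 * x @ (H @ x) - b @ x

    def grad(x):
        return H @ x - b

    def hess_p(x, p):
        # Hessian-vector product: the Hessian of f is H, so just H @ p.
        # H is only ever used through sparse matrix-vector products.
        return H @ p

    x0 = np.zeros(n)
    xmin = fmin_ncg(f, x0, fprime=grad, fhess_p=hess_p, disp=False)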

For more information, see the SciPy documentation for fmin_ncg.

4 comments on “Example of not inverting a matrix: optimization”
  1. Rick Wicklin says:

    Excellent points. I have also blogged about the importance of trying to avoid computing an explicit inverse. Some people are surprised when they see a graph that compares the time taken to solve a system by computing an inverse versus the time to solve by LU factorization.

  2. Jan Van lent says:

    A related issue is the accuracy of the matrix inverse method of solving linear systems. See for example the technical report [1].

    [1] “How Accurate is inv(A)*b?”, Alex Druinsky, Sivan Toledo, Technical report, Tel-Aviv University, January 2012, http://arxiv.org/abs/1201.6035

  3. Gregor Gorjanc says:

    What if I really need to get the diagonal elements of the inverse of A? In stats these values are used to get standard errors of some linear model estimates.

  4. Konstantin Savenkov says:

    Nice, thanks. Will see if that’ll speed up the ALS approach to matrix factorisation.
