People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form *Ax* = *b*, but in practice you almost never do this by inverting *A*. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function `fmin_ncg`

.

If a matrix *A* is large and sparse, it may be possible to solve *Ax* = *b* but impossible to even store the matrix *A*^{−1} because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products *Ax*. Notice that this doesn’t mean you have to store the matrix *A*; you have to produce the product *Ax* as if you had stored the matrix *A* and multiplied it by *x*.

Very often there are physical reasons why the matrix *A* is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of *A*, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute *Ax* much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.

Iterative methods of solving *Ax* = *b*, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products *Ax* and updating *x* as a result. These methods might be very useful for a couple reasons.

- You only have to form products of a sparse matrix and a vector.
- If don’t need a very accurate solution, you may be able to stop very early.

In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximately solutions to the problem of finding search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.

The function `scipy.optimize.fmin_ncg`

can take as an argument a function `fhess`

that computes the Hessian matrix *H* of the objective function. But more importantly, it lets you provide instead a function `fhess_p`

that computes the product of the *H* with a vector. You don’t have to supply the actual Hessian matrix because the `fmin_ncg`

method doesn’t need it. It only needs a way to compute matrix-vector products *Hx* to find approximate Newton search directions.

For more information, see the SciPy documentation for `fmin_ncg`

.

**More**: Applied linear algebra

Excellent points. I have also blogged about the importance of trying to avoid computing an explicit niverse. Some people are surprised when they see a graph that compares the time taken to solve a system by computing an inverse versus the time to solve by LU factorization.

A related issue is the accuracy of the matrix inverse method of solving linear systems. See for example the technical report [1].

[1] “How Accurate is inv(A)*b?”, Alex Druinsky, Sivan Toledo, Technical report, Tel-Aviv University, January 2012, http://arxiv.org/abs/1201.6035

What if I really need to get diagonal elements of inverse of A? In stats. these values are used to get standard errors of some linear model estimates.

Nice, thanks. Will see if that’ll speed up the ALS approach to matrix factorisation.

Gregor, are you sure you need the the individual values? It might be the trace (sum) or just the largest values. Often also rather the eigenvalues then the diagonal values. The sums are the same since basis independent so you can wlog consider the matrix in diagonalized (at least Jordan) form. The largest eigenvalues are also computed iteratively using power method or the related Lancosz method (which also doesn’t need the matrix but only the product- with-vector function and can be very fast for sparse matrices)