People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, implemented in SciPy by the function
If a matrix A is large and sparse, it may be possible to solve Ax = b but impossible to even store the matrix A-1 because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products Ax. Notice that this doesn’t mean you have to store the matrix A; you have to produce the product Ax as if you had stored the matrix A and multiplied it by x.
Very often there are physical reasons why the matrix A is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of A, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute Ax much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.
Iterative methods of solving Ax = b, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products Ax and updating x as a result. These methods might be very useful for a couple reasons.
- You only have to form products of a sparse matrix and a vector.
- If don’t need a very accurate solution, you may be able to stop very early.
In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions. So you can save time by finding only approximately solutions to the problem of finding search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.
scipy.optimize.fmin_ncg can take as an argument a function
fhess that computes the Hessian matrix H of the objective function. But more importantly, it lets you provide instead a function
fhess_p that computes the product of the H with a vector. You don’t have to supply the actual Hessian matrix because the
fmin_ncg method doesn’t need it. It only needs a way to compute matrix-vector products Hx to find approximate Newton search directions.
For more information, see the SciPy documentation for