Don’t invert that matrix

There is hardly ever a good reason to invert a matrix.

What do you do if you need to solve Ax = b where A is an n x n matrix? Isn’t the solution A-1 b? Yes, theoretically. But that doesn’t mean you need to actually find A-1. Solving the equation Ax = b is faster than finding A-1. Books might write the problem as x = A-1 b, but that doesn’t mean they expect you to calculate it that way.

What if you have to solve Ax = b for a lot of different b‘s? Surely then it’s worthwhile to find A-1. No. The first time you solve Ax = b, you factor A and save that factorization. Then when you solve for the next b, the answer comes much faster. (Factorization takes O(n3) operations. But once the matrix is factored, solving Ax = b takes only O(n2) operations. Suppose n = 1,000. This says that once you’ve solved Ax = b for one b, the equation can be solved again for a new b 1,000 times faster than the first one. Buy one get one free.)

What if, against advice, you’ve computed A-1. Now you might as well use it, right? No, you’re still better off solving Ax = b than multiplying by A-1, even if the computation of A-1 came for free. Solving the system is more numerically accurate than the performing the matrix multiplication.

It is common in applications to solve Ax = b even though there’s not enough memory to store A-1. For example, suppose n = 1,000,000 for the matrix A but A has a special sparse structure — say it’s banded — so that all but a few million entries of A are zero.  Then A can easily be stored in memory and Ax = b can be solved very quickly. But in general A-1 would be dense. That is, nearly all of the 1,000,000,000,000 entries of the matrix would be non-zero.  Storing A requires megabytes of memory, but storing A-1 would require terabytes of memory.

Related posts:

Floating point numbers are a leaky abstraction
Ten surprises from numerical linear algebra

Tagged with:
Posted in Math
77 comments on “Don’t invert that matrix
  1. John Moeller says:

    @zortharg : They went over this in comment 11. You invert a matrix and keep it, but it’s the matrix from the Cholesky decomposition, not the original covariance matrix. You should read all the comments before posting your own.

  2. Warwick Dumas says:

    John :

    Thanks. I missed your comment (!) but in the end I ended up doing exactly this, using the band-diagonal LU algorithm in Press et al (not sure to whom they credited it). This for a genuine quintic spline curve not a weighted average. This then worked in practice about 10x slower than just taking derivatives using local quadratic approx, but the improvement was worthwhile in certain respects.


  3. Samuel says:

    I havent read every comment, but I have a simple question: inverting A is not equivalent to solving A x_1=b_1,…,A x_n = b_n for b_1 = (1,0,…,0) , b_2= (0,1, ,0 ,…,0), etc. ??

  4. Dave Webb says:

    Question: What if I need to multiply something by the inverse, instead of just solving linear equations? For example, say I have a positive definite symmetric matrix Sigma. If I need to calculate a multivariate normal density with this matrix as the covariance, I need to do something like (X^T * inverse(Sigma) * X), where X is a column vector. Can I do this w/o explicitly taking the inverse?

  5. John Moeller says:

    @Dave: Yes, you can do that by taking the inverse of the Cholesky decomposition. I.e., decompose as $latex L^TL=Sigma$, solve $latex Ly=x$ for $latex y$ (which is fast because $latex L$ is triangular), and then compute $latex y^Ty$. Or you could invert $latex L$ (which is easy because it’s triangular). This is mentioned in comment 11.

  6. Its clear from the comments above that solving a system of linear equations is NOT best done by finding the inverse of A. So, then what IS the best way to solve them? I assume some of the acronymns above (eg LU) hold the answer. Sorry, I know you guys are obviously light years ahead of me, but in linear algebra I only learned the “invert the matrix” method. Could you point me to some intro sources?

  7. John says:

    The “L” and “U” in LU stand for lower-triangular and upper-triangular. It’s a way of factoring a matrix A into the product of two special matrices L and U.

    As for resources, I’d recommend any of the linear algebra books Ward Cheney has written. His latest book is very expensive, but you could look for a used copy or an earlier edition.

  8. Thanks so much for your quick response, John – appreciate your willingness to help a newbie out! I’ve checked out Ward’s books, and I guess the one titled “linear algebra theory and applications” would be a good place to start and I assume it would address the techniques you are describing. Looks like the 2nd edition just came out, so maybe I can pick the first one up at a discount, since I don’t have $160 at hand right now. :-). Until then ill keep checking out your great blog, as I attempt to make the shift from someone with a pretty good traditional background in stat and research methods to someone like you guys that actually knows how this stuff functions at a deep level – to the point of knowing things like how best to calculate standard deviation (one of your blog entries) – something I always assumed was very cut and dry! Anyway, thanks!

  9. Andrew says:

    Thanks for the advice. I now see why my matlab scrip is more stable than my C equivalent. In matlab I use b/A, but in C I explicitly calculate the inverse.

  10. raequin says:

    I came across this article because my equations for Kalman filtering call for a matrix inversion, and I’m not sure about the invertibility of the matrix sum in question. The following matrix is part of the update step:

    (HPH’ + W)^-1

    where H is my output map, P is the estimate of the error covariance, and W is the covariance matrix for the measurement noise. I gather that there’s no way to be sure of the invertibility of this sum (besides, though I guess that P and W are invertible I know that H for my case does not have full row rank, as m > n). Here is the entire equation for the state update.

    x_{k+1|k+1} = x_{k|k} + PH(HPH’ + W)^-1(y – Hx_{k+1|k})

    Anyway, that’s the background for this comment; I thought my matrix was not guaranteed to be invertible. I’m now looking for tips on how to implement this equation. Comment #28 states that “implementations [of the Kalman filter] never use the inverse,” but yet the first one I looked at does ( Other comments above mention Cholesky decomposition for multiplication tasks, so I’m guessing that might be the approach to take. Though now, after reading a Wikipedia entry, I’m guessing this can’t be the right approach since I don’t think the sum will necessarily be Hermitian, positive definite. What do I use instead of the inverse here?

  11. Warwick Dumas says:

    If the matrix is noninvertible then there is not exactly one solution.
    If it is invertible then you can use LU.
    I expect others more learned will add their comments.

  12. Matt says:

    @raequin : Cholesky decompositions, like LL^T and LDL^T, are not the only choice. For instance, consider the catalogue of matrix decompositions here (note: only LLT and LDLT have (semi)definetness requirements):

  13. raequin says:

    @Warwick There is not expected to be an exact solution to the system. We’ve been using Gauss-Newton (ordinary least squares) to compute the state. I am working on an alternative with Kalman filtering. So, I guess I still have the question of invertibility, whether or not I end up doing that or some other method (such as LU)?

  14. raequin says:

    I still will need to know what to do about the inversion, but it occurs to me that the matrix

    S = (HPH’ + W)

    is not guaranteed to be invertible even if H, P, and W are. I don’t understand how this update equation can be used then (I’m not saying it can’t, I just don’t see how one is to use an update with an inversion that may not be possible at every step).

  15. Warwick Dumas says:

    I would have thought (could be wrong) that for the sum of invertibles (with physical meanings) to be noninvertible implies some kind of special relationship between the summands. Given your physical definitions I don’t see how you’d get that. Degenerate matrices don’t occur by chance.

    If your equations do not have a solution (overdetermined), then you can get the least-squares fit with a pseudoinverse, and I guess there are then similar decompositions for that. I don’t know whether a Kalman filter should lead to insoluble equations.

    When you say m > n, I am taking it that means your observation y is an m-vector but your underlying process x is an n-vector. Correct?

  16. raequin says:

    @Matt I know that a covariance matrix is positive semidefinite, but I’m not sure about a) the characteristics of HPH’, and b) the characteristics of two positive semidefinite matrices.

    @ Warwick You are correct about my meaning for m > n. As per your comment on invertibility, I don’t know that I have a sum of invertibles. HH’ is definitely not invertible, am not sure about HPH’. The current method is to get a least-squares fit with a pseudoinverse.


    I hate to clog the comments section like this, so bear with me. The following excerpt bears directly on my application. It covers my implementation question I started off asking, but I still don’t see how the matrix sum is guaranteed to be symmetric, positive-definite. Any thoughts?

    From _Kalman Filtering; Theory and Practice Using MATLAB_, second edition, by Grewal and Andrews:

    Cholesky decomposition provides an effcient and numerically stable method for solving equations of the form AX = Y when A is a symmetric, positive-definite matrix. The modified Cholesky decomposition is even better, because it avoids taking scalar square roots. It is the recommended method for forming the term

    (HPH’ + R)^-1 H

    in the conventional Kalman filter without explicitly inverting a matrix. That is, if one
    decomposes HPH’ + R as UDU’ , then

    (UDU’)(HPH’ + R)^-1 H = H.

    It then suffices to solve

    UDU’X = H

    for X.

    I’d say that this answers the implementation question I posed. There’s still the matter of invertibility of

    HPH’ + W

    but this page is for alternatives to matrix inversion so I’ll leave off posting any more questions here. Thanks Matt and Warwick!

  17. John Moeller says:

    If P is PSD, then so is HPH’. Just consider y’HPH’y. x’Px is positive for any x, therefore it’s positive for any H’y. Since y’HPH’y is positive for any y, then HPH’ is PSD.

    The sum of two PSD matrices is always PSD (the positive semidefinite matrices form a cone in the vector space of all symmetric matrices; cones are closed under sum). So if P and W are PSD, then so is HPH’ + W. If W is not singular, or if H and P are not singular, then HPH’ + W is invertible.

  18. raequin says:

    @John Moeller Thanks, that’s clear.

  19. Warwick Dumas says:

    OK I could be wrong about this, but a back-of-envelope reckoning indicates that for a vector z,
    z^T HH^T z = sum{columns of H} [ (z.h column j)^2 ]
    So according to that, HH^T simply cannot have a nontrivial kernel, unless H itself is something strange.
    However, I could always be wrong.

    If I am correct that HH^T has to be invertible, then it follows that HPH^T is invertible.

    I don’t see physically why your stepping equation should be insoluble when m > n. In fact I think it should not be insoluble even if m < n, as the Kalman filter is just giving you the best estimate.

  20. Warwick Dumas says:

    Wow someone gave the proper answer.

  21. Yang says:

    “In fact, under reasonable assumptions on how the inverse is computed, x=inv(A)*b is as accurate as the solution computed by the best backward-stable solvers.”

  22. Iray says:

    Hi John,

    Do you know of a Python implementation of the ‘save the factors’ idea that you mentioned in your article (MATLABician is also referring to a similar .. endeavour in Matlab, see comment 14)?

    Can you tell more about the various possible direct methods of inverting matrices as the article mentioned in the comment 71 by Yang seems to tell that apart from the efficiency part, using $A^{-1}$ is not so inaccurate? Anyway, if the rest of your article is still valid, this article questions the penutltimate paragraph.

  23. Seetha Rama Raju Sanapala says:

    A noninvertible A will always give infinite solutions.

  24. h.m. says:

    Suppose you must sole a system Ax=b for a constant A (with respect to time) in a consequence of time steps, as in the unsteady finite element case. Then isn’t it more efficient to compute A-1 once for ever than solving Ax=b in each time step?

  25. John says:

    h.m. It’s more efficient to factor the matrix A once, but not to invert it. If you do invert it, you still have to multiply by that inverse. Back-substitution with the factored matrix takes fewer operations than a matrix-vector multiply.

    Another problem with inverting a matrix is that the inverse of a sparse matrix is generally dense. In finite elements and many other applications, matrices are sparse and benefit from iterative methods that are faster than Gaussian elimination.

  26. h.m. says:

    you said ‘In finite elements and many other applications, matrices are sparse and benefit from iterative methods.’ You mean iterative method for inverting or solving the system is more efficient than LU factorization in these applications?

  27. John says:

    h.m. Yes, absolutely. An LU factorization takes the same amount of time no matter the content of the matrix. But iterative methods, based on matrix-vector products, can run much faster when the matrix is sparse, i.e. largely zeros. If the zeros elements are in a structure that software can exploit, there’s no need to store them or to explicitly multiply by them.

7 Pings/Trackbacks for "Don’t invert that matrix"
  1. [...] This post was mentioned on Twitter by Daniel Lemire, fay. fay said: Don’t invert that matrix [...]

  2. [...] 20, 2010 I was going to, and then I read why I shouln’t invert a matrix.  I’ll need to re-read again this in the future, I’m sure. Posted by Matthew [...]

  3. [...] against that because it is possibly a lot more computationally intensive than solving directly. John Cook’s blog explains more about [...]

  4. [...] are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary solve linear systems of the form Ax = b, but in practice you [...]

  5. [...] was a very fun problem to solve. You can’t invert a projection matrix and even then you shouldn’t invert matrices. The solution is a simple function that, given a point inside the screen and a depth value, returns [...]

  6. [...] Finalmemte, y con respecto a la inversión de matrices, les recomiendo que lean este artículo Don’t Invert that matrix, del blog de John D. [...]

  7. [...] to design an implementation of LSPI that scales. Using the theory that it is never a good idea to invert a matrix, I’m using a solver for sparse linear systems instead of computing a (dense) pseudoinverse. [...]