There is hardly ever a good reason to invert a matrix.
What do you do if you need to solve Ax = b where A is an n x n matrix? Isn’t the solution A-1 b? Yes, theoretically. But that doesn’t mean you need to actually find A-1. Solving the equation Ax = b is faster than finding A-1. Books might write the problem as x = A-1 b, but that doesn’t mean they expect you to calculate it that way.
What if you have to solve Ax = b for a lot of different b’s? Surely then it’s worthwhile to find A-1. No. The first time you solve Ax = b, you factor A and save that factorization. Then when you solve for the next b, the answer comes much faster. (Factorization takes O(n3) operations. But once the matrix is factored, solving Ax = b takes only O(n2) operations. Suppose n = 1,000. This says that once you’ve solved Ax = b for one b, the equation can be solved again for a new b 1,000 times faster than the first one. Buy one get one free.)
What if, against advice, you’ve computed A-1. Now you might as well use it, right? No, you’re still better off solving Ax = b than multiplying by A-1, even if the computation of A-1 came for free. Solving the system is more numerically accurate than the performing the matrix multiplication.
It is common in applications to solve Ax = b even though there’s not enough memory to store A-1. For example, suppose n = 1,000,000 for the matrix A but A has a special sparse structure — say it’s banded — so that all but a few million entries of A are zero. Then A can easily be stored in memory and Ax = b can be solved very quickly. But in general A-1 would be dense. That is, nearly all of the 1,000,000,000,000 entries of the matrix would be non-zero. Storing A requires megabytes of memory, but storing A-1 would require terabytes of memory.
Related posts:
Floating point numbers are a leaky abstraction
Ten surprises from numerical linear algebra

{ 2 trackbacks }
{ 35 comments… read them below or add one }
Jason Adams 01.19.10 at 07:08
Good advice. Can you think of a case where it would be a good idea to solve A^-1?
Daniel Lemire 01.19.10 at 07:27
Great post. But you missed the best part: Ax = b may have a (unique) solution even when A is not invertible!
Anyhow, my experience has been that once you are down to inverting matrices, things have gone badly for you. Better rethink your models.
Peter Turney 01.19.10 at 07:39
If you want to calculate a projection matrix (e.g., for linear regression), you may need to do some inverting (http://en.wikipedia.org/wiki/Projection_matrix). For numerical stability, I often use the Moore-Penrose pseudoinverse (http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse) via SVD. If the original matrix is sparse, you may be able to truncate the SVD and store the component matrices (U, Sigma, V) in relatively little space. Most math packages (e.g., MATLAB) come with SVD, so it’s easy to do this.
Stuart Buck 01.19.10 at 07:54
Is it possible to do econometrics w/o inverting matrices? (That is, w/o the software doing it when using estimators that are presented theoretically as involving inversion).
John 01.19.10 at 08:12
@Stuart What would you do with the inverse once you had it? If you’re going to multiply a matrix by it, then just solve the system.
To put it another way, pretend you have a software object
AInverse. You can’t access the inverse directly, butAInversehas methods on it to let you do whatever you want:MultiplyByVector,GetDeterminant, etc. But when you look insideAInverse, there’s no inverse inside, just code that acts as if there were an inverse inside.Robert Talbert 01.19.10 at 08:23
There are some applications of matrix multiplication to cryptosystems (the classical example is the Hill cipher; more modern application is AES/Rijndael). In those settings the inverse of a matrix is actually a useful object — namely it’s the decryption key for the cipher. So it’s something that a user of the system would want to compute and have lying around.
A mitigating factor here is that the linear algebra is usually done over a finite field, so the inverse computations could take significantly less time.
Gene Harris 01.19.10 at 08:24
John, in molecular orbital theory, one of the key steps is S(-1) x E x S where E is the eigenvector matrix and S is the overlapped orbital matrix to test for an orthogonal set. I haven’t dealt with the math in years, but I notice the technique still seems to be used in the public domain(?) program MINDO. I wouldn’t have a clue how to attack this problem without inverting the matrix S.
John 01.19.10 at 09:26
@Robert: I assume that since encryption works over a finite field, the inverse is known exactly and accuracy isn’t an issue. But I wonder whether ideas from numerical linear algebra could make cryptosystems more efficient. Maybe deep down in code somewhere there is an LU factorization that you don’t see in the theory.
@Jason et al: I’m not saying there’s never a reason to invert a matrix. There may be, but I don’t think I’ve never needed to (except on a homework assignment).
@Gene: In your example, you multiply S-1 by the matrix ES. You could solve Sx = b for each column b of the matrix ES. I’m not familiar with your particular problem, but I wouldn’t be surprised if there’s some special structure you could exploit to do even better.
Roger 01.19.10 at 11:13
The only thing I can think of is getting an asymptotic covariance matrix from a likelihood optimization (i.e. for doing subsequent inference). Usually, that involves inverting the Hessian. Or maybe there’s a better way for that too?
John Moeller 01.19.10 at 11:31
I know that MATLAB has a right division and left division operations. IIRC they solve several linear systems instead of inverting and multiplying.
Michael Turmon 01.19.10 at 12:00
In the next to last paragraph (”Now you might as well use it, right?”…) there’s some advice that’s not always right IMHO. If you have to do a lot of calculations of A^{-1} b for many different b’s, the invert-and-multiply approach, although not numerically as accurate as back-substitution, will be much faster (all inner products, no divisions). You often don’t care about the ultimate in accuracy, or know A is well-conditioned anyway.
This is the typical situation in computations of Gaussian probabilities, which require x’ A^{-1} x for a positive-definite A. (You don’t use vanilla inverse in this case…you use cholesky decomposition followed by inverse, but the principle is the same.)
John 01.19.10 at 12:56
@Michael: Good point about inner products. Both matrix multiplication and back-substitution require O(n^2) operations, but with a large matrix the multiplication could be faster because it uses memory more efficiently etc.
Michael Nielsen 01.19.10 at 13:07
Computing the personalized PageRank (but not the standard PageRank) is often best done using matrix inversion.
MATLABician 01.19.10 at 13:25
If the matrix inverse is in any way going to be used for “solving” a system of linear equations, then it’s not accurate, stable, or efficient to solve the system using the inverse matrix; the system can be solved using a variety of direct (say LU, QR, SVD or Cholesky), semidirect (say, conjugate gradients), or iterative (say multigrid or overrelaxation) methods depending on the type of the problem (consistent linear system, least squares, etc.), special properties of the matrix involved (symmetric, triangular, Toeplitz, bi-diagonal, positive definite, etc.), size of the matrix, frequency with which the matrix is updated (time invariant problem, time dependent dependent problem, such as system identification/control problems, etc.), stability vs. efficiency considerations, among other things. (In MATLAB, an excellent implementation for direct methods is Tim Davis’s factorize available on FEx, Don’t let that INV go past your eyes; to solve that system, FACTORIZE!.)
There are, however, rare cases that the inverse matrix itself is needed. One such cases is calculating the standard errors of the estimates, in an identification problem, based on the inverse of the Fisher information matrix. Another is computing the coefficients of an exponential time differencing (ETD) Runge-Kutta scheme via complex contour quadrature—to avoid catastrophic cancellation due to roundoff in the operator $latex (.)^{-1}(e^{(.)} – 1)$ or its higher order variants—when the spatial discretization operator is nondiagonal. You may read more about this latter case here: Complex Magic, Part 3: Function Evaluation via Integration.
Michael Bubb 01.19.10 at 15:54
pickiness…
First line should be “hardly ever”
Michael Bubb 01.19.10 at 15:54
Nice article – I meant to add
Fasih 01.19.10 at 17:14
We use inv(Fisher information matrix) for Cramer-Rao statistical bounds on probabilistic estimators all the time. We usually look at the condition number of the matrix to make sure it’s invertible first (the SVD of a singular FIM can give us the basis of the null space, that is, the combinations of the columns of the FIM that yield zero and thus prevent inversion).
John 01.19.10 at 17:28
Once you have the inverse of the Fisher information matrix (or any other matrix) what do you do with it?
Multiply it by something? Then you could use back-substitution.
Find its determinant? You could find the determinant of the original matrix and take it’s reciprocal.
Find its eigenvalues or eigenvectors? A and A-1 have the same eigenvectors, and the eigenvalues of A-1 are the reciprocals of the eigenvalues of A.
If an application needs A-1, it must do something unusual with it because the usual operations can be carried out other ways.
Matt 01.19.10 at 17:46
In econometrics sometimes one needs so-called sandwich estimators, which involve matrix terms like A^{-1} B A^{-1}, where A and B are square (n x n) dense matrices. Is there any way to completely avoid the matrix inversion here?
John 01.19.10 at 17:49
@Matt: What do you do with the matrix A^{-1} B A^{-1} once you’ve got it? Maybe there’s a way to accomplish that task without computing A^{-1}.
Matt 01.19.10 at 17:53
@John: eventually I access the elements on the diagonals of the resulting matrix (and perform simple ops, like taking square root) — those are used directly (e.g. as standard errors).
foobar 01.19.10 at 17:53
Is there a way to quantify how much worse solving using the inverse is? I have been challenged in the past and could not provide evidence that it was really not a good idea. If I remember correctly I don’t think I could even find this “common knowledge” in Golub and van Loan, never mind a comparative analysis.
John 01.19.10 at 18:46
@Matt: That’s interesting. I’d never seen that problem. Are your A and B matrices large? Sparse?
Here’s one way you could find the ith column of A^{-1} B A^{-1} without computing A^{-1}. This approach may not be a good idea in the context of your problem. It’s just an idea.
Define x_i = A^{-1} B A^{-1} e_i where e_i is the column vector with a “1″ in the ith position and zeros everywhere else. Note that x_i is the ith column of A^{-1} B A^{-1}. Then A x_i = B (A^{-1} e_i). Compute A^{-1} e_i by solving Ax = e_i, then multiply the result on the left by B, then solve for x_i. That would let you find one column of A^{-1} B A^{-1} without having to compute the entire matrix. This would be a win if you don’t need the entire matrix, but perhaps not if you do need the entire matrix. It could also be a win if A has a special structure.
John 01.19.10 at 19:15
@foobar: I think Applied Numerical Linear Algebra by James Demmel would have the theorem you’re looking for. Unfortunately I don’t have the book with me so I can’t be sure.
Matt 01.19.10 at 19:23
@John: Thanks for the post. Unfortunately, I need all of the diagonal elements. All of the matrices involved are dense, however, in the end I’m not interested in the off-diagonal entries of the resulting matrix and would be happy to disregard them (and save some computation time!). Still, haven’t found any interesting/smart way to solve that.
Currently I’m even explicitly calculating A^{-1} (even though doing explicit inverses is a no-no from a numerical analysis’ point of view…) and pre- & postmultiplying B. I suppose I could use LU factorization to calculate (A^{-1}B) and then postmultiply by (this time explicitly obtained) A^{-1} but I’m skeptical regarding the gains in doing so. Still, I still have this subconscious feeling that I’m doing this in a sub-optimal way, especially since in the end I only need the diagonal entries…
Gene Harris 01.19.10 at 20:51
It turned out to be somewhat more complicated, after reviewing the circa 1970 FORTRAN code. The term is S-1/2CS1/2. I’m not sure if this makes any difference to your thesis/assertion. I just powered through the code way back in 1970. I did run it by Chuck Katholi, my numerical analysis professor and we made some changes to search for the largest off diagonal elements to force convergence a little faster.
Francisco Gimenez 01.20.10 at 00:20
What about using a Kalman Filter, where you need to invert potential large matrices to do updates?
MATLABician 01.20.10 at 06:08
@Francisco Gimenez
The $A^T A$ involved in Kalman filter is a block tridiagonal matrix. At every step, the lower right block of this matrix is changed. Although inverse matrix appears in the Kalman filter formulas, implementations never use the inverse, as that would be unstable and inefficient. Instead, the factorization $A^T A = LDL^T$ is updated—hence the name “square root” filter. (To be exact, $LDL^T$ approach is not used either. QR decomposition is used to orthogonalize the columns of A to achieve more stability.)
Stuart Buck 01.20.10 at 14:06
What would you do with the inverse once you had it? If you’re going to multiply a matrix by it, then just solve the system.
I wouldn’t do anything with the inverse; I’m certainly not inverting large matrices by hand. I’m just asking whether econometrics software (SAS, STATA, etc.) is actually calculating all of the matrix inversions that are seemingly required by econometric theory?
alfC 01.25.10 at 21:53
@Jason Adams, I am wondering whether it makes sense to HAVE the inverse if the matrix involved is small (let’s say 3×3 or 4×4) and you need to apply it many times.
Could it be? I don’t know. Maybe John C. knows the answer.
Eamon Nerbonne 01.28.10 at 04:08
From a completely pragmatic point of view, many software systems expect linear transformations in the form of a matrix. To be able to reuse such components without (complicated, time-consuming, possibly license-violating) modification, you’d expect that (when sufficiently accurate), an inverse would be just fine for lots of applications.
Perhaps it’s not a perfect solution, but isn’t it said that the perfect is the enemy of the good?
Not that this is really the kind of reason you were looking for, I suppose…
John 01.28.10 at 04:13
@Eamon, you give a very good reason. If you need the inverse, not for your own person use, but in order to hand other software expecting a matrix as input, then you need to invert the matrix.
Regis 02.04.10 at 05:04
Very interesting. Do you know any method to invert matrix for using in GRID computing? For example, a method that split a larger matrix in smallers ones. Each smaller matrix would be processed by a node and, at the final, deliver the results to the main node that would joins that pieces and deliver the final result (something like that – sorry my english…).
Ganesh Swami 02.04.10 at 11:47
John: Big fan of your articles, and I always learn something new. Thanks.
Frequently one comes across cases where the matrix has some special structure. A common example is matrices belonging to the orthogonal group, where the inverse is simply the transpose. You don’t even have to compute/store the transpose as libraries such as BLAS take in a parameter that specifies whether you want to use the transpose. Similarly, in numpy you can use matrix.T to create a view.
Stephen 02.16.10 at 23:09
I think Stuart and Fasih’s points are good ones, that for calculating the standard errors in statistics often requires inverting a matrix. If we want to find the least squares fit for the overdetermined system Y=XB, then B=(X’X)^-1X’Y
To calculate the standard error for our estimate of B, we need to know the diagonal elements of (X’X)^-1.
This same basic problem comes up in maximum likelihood estimation. The standard errors of the coefficient estimates come from the inverse of the Hessian matrix for the optimization problem.
http://en.wikipedia.org/wiki/Cram%C3%A9r-Rao_inequality