There is hardly ever a good reason to invert a matrix.

What do you do if you need to solve *Ax* = *b* where *A* is an *n* x *n* matrix? Isn’t the solution *A*^{-1} *b*? Yes, theoretically. But that doesn’t mean you need to actually find *A*^{-1}. Solving the equation *Ax* = *b* is faster than finding *A*^{-1}. Books might write the problem as *x* = A^{-1} *b*, but that doesn’t mean they expect you to calculate it that way.

What if you have to solve *Ax* = *b* for a lot of different *b*‘s? Surely then it’s worthwhile to find *A*^{-1}. No. The first time you solve *Ax* = *b*, you factor *A* and save that factorization. Then when you solve for the next *b*, the answer comes much faster. (Factorization takes O(*n*^{3}) operations. But once the matrix is factored, solving *Ax* = *b* takes only O(*n*^{2}) operations. Suppose* n* = 1,000. This says that once you’ve solved *Ax* = *b* for one *b*, the equation can be solved again for a new *b* 1,000 times faster than the first one. Buy one get one free.)

What if, against advice, you’ve computed *A*^{-1}. Now you might as well use it, right? No, you’re still better off solving *Ax* = *b* than multiplying by *A*^{-1}, even if the computation of *A*^{-1} came for free. Solving the system is more numerically accurate than the performing the matrix multiplication.

It is common in applications to solve *Ax* = *b* even though there’s not enough memory to store *A*^{-1}. For example, suppose *n* = 1,000,000 for the matrix *A* but *A* has a special sparse structure — say it’s banded — so that all but a few million entries of *A* are zero. Then *A* can easily be stored in memory and *Ax* = *b* can be solved very quickly. But in general *A*^{-1} would be dense. That is, nearly all of the 1,000,000,000,000 entries of the matrix would be non-zero. Storing *A* requires megabytes of memory, but storing *A*^{-1} would require terabytes of memory.

**Related post**: Applied linear algebra

I have systems of equations that fit nicely in an Ax=b matrix format and usually solve them by inverting A. They’re relatively small (also sparse but not banded and not necessarily positive definite), so this works well. However, I have to make incremental changes in the values of A (say, change two of the three non-zero values in one row) and find the corresponding x and have to do this many, many times. Can factoring help me preserve the work from previous iterations and reduce my ridiculous run times?

If you change A then you have a new problem and so you can’t reuse the old factorization.

On the other hand, if your change to A is small, say A’ = A + E where E has small norm, then a solution x to Ax = b may be approximately a solution to A’x = b. If you’re solving the latter by an iterative method, then you could give it a head start by using the old x as your starting point.

Someone recently gave me some code on reddit that solves Ax=b in fewer steps than general matrix inversion, using gaussian elimination. You can check it out here:

https://www.reddit.com/r/programming/comments/5jv6ya/incremental_least_squares_curve_fitting/dbjt9zx/

Great Post!

Here is a follow-up question: what if I have a matrix A and vector b and I need the quantity b’A^{-1}b. Of course I can get A^{-1} explicitly and compute the product. This is very unstable, even in small examples. What would be the “solving the system” formulation of this problem?

Guillaume, you could solve Ax = b, then form the inner product of x and b.

very nice! Thank you!

Hi John, thanks for the great post.

I have a question: Assuming that M and N have the same size, which would be faster?

(I). Solving (Mx = b) 100 times with the same b but different Ms,

(II). Solving (Nx = c) and (Nx = d) 100 times with the same N but different c,d.

As you pointed out, N^-1 could be calculated and used repeatedly in case (II). This way, my guess is that case (II) may be faster.

Thanks in advance.

How do you solve Ax = b if A is under-constrained? (i.e. if I need the minimum norm solution or the equivalent as the pseudo-inverse. I assume you don’t compute the pseudo-inverse according to you post)

But how do you get the standard errors for least squares without inversion?

In [A]{X}=[b], for m different right-hand sides and where T is clock cycle time on a typical AMD chip,

time would be 8/3n^3T+8mn^2T by decomposition (factor) method,

time would be 32/3n^3T+2mn^2T by finding inverse first via decomposition method and then using matrix multiplication.

For m>4*n/3, the latter is computationally more efficient (aside from the round-off issues).