There is hardly ever a good reason to invert a matrix.

What do you do if you need to solve *Ax* = *b* where *A* is an *n* x *n* matrix? Isn’t the solution *A*^{−1} *b*? Yes, theoretically. But that doesn’t mean you need to actually find *A*^{−1}. Solving the equation *Ax* = *b* is faster than finding *A*^{−1}. Books might write the problem as *x* = A^{−1} *b*, but that doesn’t mean they expect you to calculate it that way.

What if you have to solve *Ax* = *b* for a lot of different *b*‘s? Surely then it’s worthwhile to find *A*^{−1}. No. The first time you solve *Ax* = *b*, you factor *A* and save that factorization. Then when you solve for the next *b*, the answer comes much faster. (Factorization takes O(*n*^{3}) operations. But once the matrix is factored, solving *Ax* = *b* takes only O(*n*^{2}) operations. Suppose* n* = 1,000. This says that once you’ve solved *Ax* = *b* for one *b*, the equation can be solved again for a new *b* 1,000 times faster than the first one. Buy one get one free.)

What if, against advice, you’ve computed *A*^{−1}. Now you might as well use it, right? No, you’re still better off solving *Ax* = *b* than multiplying by *A*^{−1}, even if the computation of *A*^{−1} came for free. Solving the system is more numerically accurate than the performing the matrix multiplication.

It is common in applications to solve *Ax* = *b* even though there’s not enough memory to store *A*^{−1}. For example, suppose *n* = 1,000,000 for the matrix *A* but *A* has a special sparse structure — say it’s banded — so that all but a few million entries of *A* are zero. Then *A* can easily be stored in memory and *Ax* = *b* can be solved very quickly. But in general *A*^{−1} would be dense. That is, nearly all of the 1,000,000,000,000 entries of the matrix would be non-zero. Storing *A* requires megabytes of memory, but storing *A*^{−1} would require terabytes of memory.

**Related post**: Applied linear algebra

dont know if true but, substitution following the factorization would be a better approach if you have big mats due to higher precision, but in other hand if you have a bunch of vectors to invert at once, if your original matrix is well conditioned, and you dont need higher precision, due code vectorization, mul by the inverse may run faster, 1. fops for back/forward subs and matmul are booth the same o(n^2), 2. substituitions depend on last result s and vectorization works better if you dont use previous registers results on subsequent operations (letting regs rest a while for writes) also parallelization takes advantage of no dependency. 3. vectorization and paralellism also rakes advantages of the fixed format of both matrices, and since row format is the native way to loadstore data on simd regs the same should be true for doing inner products. the same for inner products. of course, all those matters should be taking in account in the code that exploits the hardware to minimize those diferences between evaluation and matmul. dont let fops fool you! how data depends on other data and how is stored and how much is streamlined is crucial on modern hardware so fops may end more in a measure of how tired youll be if hand calculus or artritis i think

What is the best way to solve Ax=b for x please?

There are many people who devote their career to answering that question.

It depends on the size and structure of A. If A is small, maybe Gaussian elimination with partial pivoting.

In practice, matrices are often large and sparse (i.e. most of the entries are zeros, and there’s an exploitable pattern to the zeros) and so you can be more efficient and able to solve larger problems by picking the best method for a particular class of problem.

Thanks for responding!

At work I have to write an algorithm in C++ that will invert a 382 by 382 matrix which is symmetric, but not necessarily positive definite (I still don’t really understand what that means). I have written a Gaussian-Jacobi elimination method, and an eigendecomposition method. They both give the right answer for small matrices, but not 382 by 382.

I am writing it to replace a NAG function which we are not allowed to use, so it’s difficult to avoid it! We can afford a O(n^3) algorithm (one or two minutes) but not a O(n^4) algorithm (several hours).

I was sneakily thinking of writing a function to solve Ax=b and then letting b be each column of the identity matrix, solving for x each time, but I suspect you might tell me that’s a bad idea!

I’d use LAPACK if at all possible. It’s a very mature piece of software. Specialists have been refining it for decades.

Even for something that seems so simple as a matrix-vector product, the LAPACK version is going to be considerably faster than the obvious implementation because they’ve thought about details like memory management for a particular processor.

Dear Tim,

If the task is still relevant, then I can suggest to use a modification of the Gaussian elimination. Unlike the Cholesky method, the matrix may not be positive definite. In addition, in this modification we have slightly reduced number of required operations, besides that it is no need to calculate square roots, which also allows you to increase speed of calculations.

The modification is described in the article “Symmetric matrix inversion using modified Gaussian elimination”, link: https://arxiv.org/abs/1504.06734.

If you have any further questions on this topic, please do not hesitate to contact me, I will be happy to help.