There is hardly ever a good reason to invert a matrix.
What do you do if you need to solve Ax = b where A is an n x n matrix? Isn’t the solution A-1 b? Yes, theoretically. But that doesn’t mean you need to actually find A-1. Solving the equation Ax = b is faster than finding A-1. Books might write the problem as x = A-1 b, but that doesn’t mean they expect you to calculate it that way.
What if you have to solve Ax = b for a lot of different b’s? Surely then it’s worthwhile to find A-1. No. The first time you solve Ax = b, you factor A and save that factorization. Then when you solve for the next b, the answer comes much faster. (Factorization takes O(n3) operations. But once the matrix is factored, solving Ax = b takes only O(n2) operations. Suppose n = 1,000. This says that once you’ve solved Ax = b for one b, the equation can be solved again for a new b 1,000 times faster than the first one. Buy one get one free.)
What if, against advice, you’ve computed A-1. Now you might as well use it, right? No, you’re still better off solving Ax = b than multiplying by A-1, even if the computation of A-1 came for free. Solving the system is more numerically accurate than the performing the matrix multiplication.
It is common in applications to solve Ax = b even though there’s not enough memory to store A-1. For example, suppose n = 1,000,000 for the matrix A but A has a special sparse structure — say it’s banded — so that all but a few million entries of A are zero. Then A can easily be stored in memory and Ax = b can be solved very quickly. But in general A-1 would be dense. That is, nearly all of the 1,000,000,000,000 entries of the matrix would be non-zero. Storing A requires megabytes of memory, but storing A-1 would require terabytes of memory.
Related posts:
Floating point numbers are a leaky abstraction
Ten surprises from numerical linear algebra


{ 4 trackbacks }
{ 70 comments… read them below or add one }
Jason Adams 01.19.10 at 07:08
Good advice. Can you think of a case where it would be a good idea to solve A^-1?
Daniel Lemire 01.19.10 at 07:27
Great post. But you missed the best part: Ax = b may have a (unique) solution even when A is not invertible!
Anyhow, my experience has been that once you are down to inverting matrices, things have gone badly for you. Better rethink your models.
Peter Turney 01.19.10 at 07:39
If you want to calculate a projection matrix (e.g., for linear regression), you may need to do some inverting (http://en.wikipedia.org/wiki/Projection_matrix). For numerical stability, I often use the Moore-Penrose pseudoinverse (http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse) via SVD. If the original matrix is sparse, you may be able to truncate the SVD and store the component matrices (U, Sigma, V) in relatively little space. Most math packages (e.g., MATLAB) come with SVD, so it’s easy to do this.
Stuart Buck 01.19.10 at 07:54
Is it possible to do econometrics w/o inverting matrices? (That is, w/o the software doing it when using estimators that are presented theoretically as involving inversion).
John 01.19.10 at 08:12
@Stuart What would you do with the inverse once you had it? If you’re going to multiply a matrix by it, then just solve the system.
To put it another way, pretend you have a software object
AInverse. You can’t access the inverse directly, butAInversehas methods on it to let you do whatever you want:MultiplyByVector,GetDeterminant, etc. But when you look insideAInverse, there’s no inverse inside, just code that acts as if there were an inverse inside.Robert Talbert 01.19.10 at 08:23
There are some applications of matrix multiplication to cryptosystems (the classical example is the Hill cipher; more modern application is AES/Rijndael). In those settings the inverse of a matrix is actually a useful object — namely it’s the decryption key for the cipher. So it’s something that a user of the system would want to compute and have lying around.
A mitigating factor here is that the linear algebra is usually done over a finite field, so the inverse computations could take significantly less time.
Gene Harris 01.19.10 at 08:24
John, in molecular orbital theory, one of the key steps is S(-1) x E x S where E is the eigenvector matrix and S is the overlapped orbital matrix to test for an orthogonal set. I haven’t dealt with the math in years, but I notice the technique still seems to be used in the public domain(?) program MINDO. I wouldn’t have a clue how to attack this problem without inverting the matrix S.
John 01.19.10 at 09:26
@Robert: I assume that since encryption works over a finite field, the inverse is known exactly and accuracy isn’t an issue. But I wonder whether ideas from numerical linear algebra could make cryptosystems more efficient. Maybe deep down in code somewhere there is an LU factorization that you don’t see in the theory.
@Jason et al: I’m not saying there’s never a reason to invert a matrix. There may be, but I don’t think I’ve never needed to (except on a homework assignment).
@Gene: In your example, you multiply S-1 by the matrix ES. You could solve Sx = b for each column b of the matrix ES. I’m not familiar with your particular problem, but I wouldn’t be surprised if there’s some special structure you could exploit to do even better.
Roger 01.19.10 at 11:13
The only thing I can think of is getting an asymptotic covariance matrix from a likelihood optimization (i.e. for doing subsequent inference). Usually, that involves inverting the Hessian. Or maybe there’s a better way for that too?
John Moeller 01.19.10 at 11:31
I know that MATLAB has a right division and left division operations. IIRC they solve several linear systems instead of inverting and multiplying.
Michael Turmon 01.19.10 at 12:00
In the next to last paragraph (”Now you might as well use it, right?”…) there’s some advice that’s not always right IMHO. If you have to do a lot of calculations of A^{-1} b for many different b’s, the invert-and-multiply approach, although not numerically as accurate as back-substitution, will be much faster (all inner products, no divisions). You often don’t care about the ultimate in accuracy, or know A is well-conditioned anyway.
This is the typical situation in computations of Gaussian probabilities, which require x’ A^{-1} x for a positive-definite A. (You don’t use vanilla inverse in this case…you use cholesky decomposition followed by inverse, but the principle is the same.)
John 01.19.10 at 12:56
@Michael: Good point about inner products. Both matrix multiplication and back-substitution require O(n^2) operations, but with a large matrix the multiplication could be faster because it uses memory more efficiently etc.
Michael Nielsen 01.19.10 at 13:07
Computing the personalized PageRank (but not the standard PageRank) is often best done using matrix inversion.
MATLABician 01.19.10 at 13:25
If the matrix inverse is in any way going to be used for “solving” a system of linear equations, then it’s not accurate, stable, or efficient to solve the system using the inverse matrix; the system can be solved using a variety of direct (say LU, QR, SVD or Cholesky), semidirect (say, conjugate gradients), or iterative (say multigrid or overrelaxation) methods depending on the type of the problem (consistent linear system, least squares, etc.), special properties of the matrix involved (symmetric, triangular, Toeplitz, bi-diagonal, positive definite, etc.), size of the matrix, frequency with which the matrix is updated (time invariant problem, time dependent dependent problem, such as system identification/control problems, etc.), stability vs. efficiency considerations, among other things. (In MATLAB, an excellent implementation for direct methods is Tim Davis’s factorize available on FEx, Don’t let that INV go past your eyes; to solve that system, FACTORIZE!.)
There are, however, rare cases that the inverse matrix itself is needed. One such cases is calculating the standard errors of the estimates, in an identification problem, based on the inverse of the Fisher information matrix. Another is computing the coefficients of an exponential time differencing (ETD) Runge-Kutta scheme via complex contour quadrature—to avoid catastrophic cancellation due to roundoff in the operator $latex (.)^{-1}(e^{(.)} – 1)$ or its higher order variants—when the spatial discretization operator is nondiagonal. You may read more about this latter case here: Complex Magic, Part 3: Function Evaluation via Integration.
Michael Bubb 01.19.10 at 15:54
pickiness…
First line should be “hardly ever”
Michael Bubb 01.19.10 at 15:54
Nice article – I meant to add
Fasih 01.19.10 at 17:14
We use inv(Fisher information matrix) for Cramer-Rao statistical bounds on probabilistic estimators all the time. We usually look at the condition number of the matrix to make sure it’s invertible first (the SVD of a singular FIM can give us the basis of the null space, that is, the combinations of the columns of the FIM that yield zero and thus prevent inversion).
John 01.19.10 at 17:28
Once you have the inverse of the Fisher information matrix (or any other matrix) what do you do with it?
Multiply it by something? Then you could use back-substitution.
Find its determinant? You could find the determinant of the original matrix and take it’s reciprocal.
Find its eigenvalues or eigenvectors? A and A-1 have the same eigenvectors, and the eigenvalues of A-1 are the reciprocals of the eigenvalues of A.
If an application needs A-1, it must do something unusual with it because the usual operations can be carried out other ways.
Matt 01.19.10 at 17:46
In econometrics sometimes one needs so-called sandwich estimators, which involve matrix terms like A^{-1} B A^{-1}, where A and B are square (n x n) dense matrices. Is there any way to completely avoid the matrix inversion here?
John 01.19.10 at 17:49
@Matt: What do you do with the matrix A^{-1} B A^{-1} once you’ve got it? Maybe there’s a way to accomplish that task without computing A^{-1}.
Matt 01.19.10 at 17:53
@John: eventually I access the elements on the diagonals of the resulting matrix (and perform simple ops, like taking square root) — those are used directly (e.g. as standard errors).
foobar 01.19.10 at 17:53
Is there a way to quantify how much worse solving using the inverse is? I have been challenged in the past and could not provide evidence that it was really not a good idea. If I remember correctly I don’t think I could even find this “common knowledge” in Golub and van Loan, never mind a comparative analysis.
John 01.19.10 at 18:46
@Matt: That’s interesting. I’d never seen that problem. Are your A and B matrices large? Sparse?
Here’s one way you could find the ith column of A^{-1} B A^{-1} without computing A^{-1}. This approach may not be a good idea in the context of your problem. It’s just an idea.
Define x_i = A^{-1} B A^{-1} e_i where e_i is the column vector with a “1″ in the ith position and zeros everywhere else. Note that x_i is the ith column of A^{-1} B A^{-1}. Then A x_i = B (A^{-1} e_i). Compute A^{-1} e_i by solving Ax = e_i, then multiply the result on the left by B, then solve for x_i. That would let you find one column of A^{-1} B A^{-1} without having to compute the entire matrix. This would be a win if you don’t need the entire matrix, but perhaps not if you do need the entire matrix. It could also be a win if A has a special structure.
John 01.19.10 at 19:15
@foobar: I think Applied Numerical Linear Algebra by James Demmel would have the theorem you’re looking for. Unfortunately I don’t have the book with me so I can’t be sure.
Matt 01.19.10 at 19:23
@John: Thanks for the post. Unfortunately, I need all of the diagonal elements. All of the matrices involved are dense, however, in the end I’m not interested in the off-diagonal entries of the resulting matrix and would be happy to disregard them (and save some computation time!). Still, haven’t found any interesting/smart way to solve that.
Currently I’m even explicitly calculating A^{-1} (even though doing explicit inverses is a no-no from a numerical analysis’ point of view…) and pre- & postmultiplying B. I suppose I could use LU factorization to calculate (A^{-1}B) and then postmultiply by (this time explicitly obtained) A^{-1} but I’m skeptical regarding the gains in doing so. Still, I still have this subconscious feeling that I’m doing this in a sub-optimal way, especially since in the end I only need the diagonal entries…
Gene Harris 01.19.10 at 20:51
It turned out to be somewhat more complicated, after reviewing the circa 1970 FORTRAN code. The term is S-1/2CS1/2. I’m not sure if this makes any difference to your thesis/assertion. I just powered through the code way back in 1970. I did run it by Chuck Katholi, my numerical analysis professor and we made some changes to search for the largest off diagonal elements to force convergence a little faster.
Francisco Gimenez 01.20.10 at 00:20
What about using a Kalman Filter, where you need to invert potential large matrices to do updates?
MATLABician 01.20.10 at 06:08
@Francisco Gimenez
The $A^T A$ involved in Kalman filter is a block tridiagonal matrix. At every step, the lower right block of this matrix is changed. Although inverse matrix appears in the Kalman filter formulas, implementations never use the inverse, as that would be unstable and inefficient. Instead, the factorization $A^T A = LDL^T$ is updated—hence the name “square root” filter. (To be exact, $LDL^T$ approach is not used either. QR decomposition is used to orthogonalize the columns of A to achieve more stability.)
Stuart Buck 01.20.10 at 14:06
What would you do with the inverse once you had it? If you’re going to multiply a matrix by it, then just solve the system.
I wouldn’t do anything with the inverse; I’m certainly not inverting large matrices by hand. I’m just asking whether econometrics software (SAS, STATA, etc.) is actually calculating all of the matrix inversions that are seemingly required by econometric theory?
alfC 01.25.10 at 21:53
@Jason Adams, I am wondering whether it makes sense to HAVE the inverse if the matrix involved is small (let’s say 3×3 or 4×4) and you need to apply it many times.
Could it be? I don’t know. Maybe John C. knows the answer.
Eamon Nerbonne 01.28.10 at 04:08
From a completely pragmatic point of view, many software systems expect linear transformations in the form of a matrix. To be able to reuse such components without (complicated, time-consuming, possibly license-violating) modification, you’d expect that (when sufficiently accurate), an inverse would be just fine for lots of applications.
Perhaps it’s not a perfect solution, but isn’t it said that the perfect is the enemy of the good?
Not that this is really the kind of reason you were looking for, I suppose…
John 01.28.10 at 04:13
@Eamon, you give a very good reason. If you need the inverse, not for your own person use, but in order to hand other software expecting a matrix as input, then you need to invert the matrix.
Regis 02.04.10 at 05:04
Very interesting. Do you know any method to invert matrix for using in GRID computing? For example, a method that split a larger matrix in smallers ones. Each smaller matrix would be processed by a node and, at the final, deliver the results to the main node that would joins that pieces and deliver the final result (something like that – sorry my english…).
Ganesh Swami 02.04.10 at 11:47
John: Big fan of your articles, and I always learn something new. Thanks.
Frequently one comes across cases where the matrix has some special structure. A common example is matrices belonging to the orthogonal group, where the inverse is simply the transpose. You don’t even have to compute/store the transpose as libraries such as BLAS take in a parameter that specifies whether you want to use the transpose. Similarly, in numpy you can use matrix.T to create a view.
Stephen 02.16.10 at 23:09
I think Stuart and Fasih’s points are good ones, that for calculating the standard errors in statistics often requires inverting a matrix. If we want to find the least squares fit for the overdetermined system Y=XB, then B=(X’X)^-1X’Y
To calculate the standard error for our estimate of B, we need to know the diagonal elements of (X’X)^-1.
This same basic problem comes up in maximum likelihood estimation. The standard errors of the coefficient estimates come from the inverse of the Hessian matrix for the optimization problem.
http://en.wikipedia.org/wiki/Cram%C3%A9r-Rao_inequality
CogitoErgoCogitoSum 04.13.10 at 13:08
Short of computer applications (which I don’t bother with) and the Markov chain, I have never found a practical use for matrices, whatsoever. Perhaps I’m not well versed enough with higher level math, or perhaps the entire subject is unnecessary. I mean, don’t we learn Linear Algebra in the 9th grade when we first solve simultaneous linear equations? Isn’t that the same thing? The same can be achieved, Ax = b → x = A-1b, using regular algebraic expressions as opposed to matrices. I’ve never really seen the point – the mathematical point, I mean – except to simplify certain concepts for those who cannot otherwise wrap their minds around it in regular algebra due to symbol-overload or whatever else overwhelms them.
CogitoErgoCogitoSum 04.13.10 at 13:12
Perhaps the subject already exists, I’m not sure, but how do you solve non-linear equations using matrices? Is it even possible? I mean, isn’t that why matrix algebra is also called linear algebra, because the equations they represent are linear? That is another reason, from my perspective, that matrices are ridiculous. They lack the power to resolve non-linear equations – which are far more complicated and practical and common in the real world, I think. Is there a non-linear equation method using linear/matrix algebra? Or is regular pen and paper algebra better in every way?
John 04.13.10 at 13:36
Nonlinear systems are routinely solved by a sequence of linear problems. Think of Newton’s method for finding roots, in just one variable. The method essentially says replace a function by its linear approximation (i.e. the line with slope given by the derivative) and see where that linear approximation crosses the x-axis. You can do the analogous thing in several variables, using a sequence of linear approximations.
MatrixWhelp 04.29.10 at 08:46
I need to transform vectors and points between two coordinate systems. The most elegant solution seemed to be to invert the transformation matrix and multiply with that. I am aware that I probably could special-case it in real life, but performance wise it is no matter also I wanted to understand matrices better and write a general solution. So I set out to find matrix inversion code to my liking. Found none that could satisfy me completely for my purpose (wanted constant time and space usage, least possible moves in memory and no loops, also calculate the determinant as a byproduct not separately – so a kind of trivially most efficient without special instructions), so I wrote one. I failed at it miserably…
If I have a transformation matrix and I want to build the transformation matrix that undoes all affine transforms without matrix inversion I am all for it at this point. Could you point me in the right direction?
My code by the way: http://pastebin.com/t5qtELDB
Matrices are stored row-major, multiplication, transpose, and scalar multiplication are trivial not included.
I think I completely failed the cofactor matrix calculation, just don’t know why… I used the fact that I can chose any row or column to expand when calculating determinants…
John 04.29.10 at 08:53
If you’re talking about 3D coordinates, you’re probably not too concerned with accuracy or efficiency. In that case, inverting the 3×3 matrices is fine and may be conceptually easier. But that approach doesn’t scale well to larger matrices.
MatrixWhelp 04.29.10 at 09:16
Thanks!
I thought the same, just wasn’t sure. I like the robustness and mathematical clearness of matrix inversion in this case.
I have to tell you if I did not write kind of an API for others to work in for myself I would rather not use matrix inversion if my life depended on it… But it needs to be simple and easily understood code (more important than performance).
I can’t ask that you tell me what is wrong with my code, but could you (or someone else) tell me if I’m right in the assumption that 4×4 matrix inversion can be done in (roughly) matrix multiplication time, and only have to use 6+6 (for the upper and lower half) 2×2 determinants to it?
Or I completely misunderstood something?
MatrixWhelp 04.30.10 at 17:29
Sorry! There is nothing wrong with that code 3 days of my life I spent trying to find out what’s wrong with it, and what I did not check, if my Matrix by Vector function is really doing it’s job as it should… Well I do understand matrices better than before…
You may delete my comments if you wish.
hari 12.12.10 at 01:43
Considering the problem Ax = b with A expressed in 2 significant decimal digits accuracy like 2.65, 0.35, 2.85, 0.49 etc say of rank 6 and x representing fractions as a column vector (0.25, 0.15, 0.51 and the likes, less than 1 and which add up to 1), one will have b accurate to 2 decimal digits only even though Ax may have 4 decimal digits from the A*x computation.
When inversion is applied to above scenario, be it by any method, x = A^-1*b, what accuracy can be expected in the model retrieved as fractions?
Can x be accurate to more than one significant decimal digit in the above case where A is expressed in 2 significant digits?
hari
Paul 12.24.10 at 12:41
A number of folks have asked for the diagonal of inv(A’A).
Using the singular value decomposition, A = U S V’, with U’U = V’V = I, S diagonal,
A'A = (USV')'USV' = VSU'USV' = VSSV'
inv(A'A) = inv(VSSV') = inv(V') inv(SS) inv(V) = V inv(SS)' V'
Since S is diagonal,
inv(A'A) = V inv(S) inv(S)' V' = (V inv(S)) (V inv(S))'
For any matrix X, diag(XX’) is the sum of the square of the rows of X or the columns of X’. With S diagonal, the inverse is just element-wise 1/S, and you can compute the diagonal entries of inv(A’A) by summing the squares of the columns of V’ and dividing the result by the corresponding diagonal entry in S.
Paul
Jens-Peter M. Zemke 01.25.11 at 14:38
In Interval Arithmetic it makes perfect sense to compute (floating point approximations) to the inverse. Needless to say that these approximations are computed using the LU decomposition or such. The best known scheme to compute an enclosure to the solution of some (nonlinear) system of (possibly degenerate) interval equations is based on the so-called Krawczyk Operator and relies on the explicit computation of an (approximation to an) inverse matrix.
Dirk Kok 01.26.11 at 04:02
Hi,
I take it the discussion here is about very large matrices? Or does it apply to small 4×4 matrices as well? For example, I have an iduction motor simulation in matab which is a 4 by 4 matrix and I was wondering if thie factorisation could speed up the simulation? And if so, where can I learn about factorisation of matrices?
Cheers,
John 01.26.11 at 06:51
I did have large matrices in mind when I wrote this. However, the same considerations apply to small matrices, though to a lesser extent.
To learn about matrix factorizations, Matrix Computations by Golub and Van Loan is a classic.
Warwick Dumas 03.06.11 at 18:15
The linear algebra I am doing is very simple: basically I want to create a spline fit to a large set of points, and I decided to try using some averaging between quintic polynomial fits to subsets of 6 points each. The fitting procedure will be repeated say 10^9 times during a program run. Reading this article motivated me not to just invert the relevant matrix. I wouldn’t want to just apply an inversion routine – performance is crucial – but I did consider inverting it by hand and applying that.
Instead I am writing LU by hand, and trying my best to exploit the special property involved. Viz, the polynomial takes the form a0 + a1(x-xmid)+a2(x-xmid)^2+..+a5(x-xmid)^5 where if the points are x0,…,x5 then xmid is the average of x2 and x3. Therefore where d corresponds to distance from xmid, one row is 1,d2,d2*d2,d2^3,d2^4,d2^5, and one row is 1,-d2,d2*d2,-d2^3,d2^4,-d2^5. And the first column is all 1’s. Other than that I’m not sure there is much to exploit.
I wouldn’t prefer to write an LU routine (a la Press et al) because of the pivoting business – trying to keep this really straightforward as well as fast.
I just wondered, since this specific problem is obviously a very common one, does anyone have a standard, numerically stable algorithm for doing it. (This reinventing the wheel is time-consuming, and unfortunately I have to go to work so can only work on this voluntary project at the weekend.)
John 03.06.11 at 18:22
Fitting splines leads to banded linear systems. These can be solved very quickly with special algorithms. Solving a banded system takes O(N) operations whereas Gaussian elimination on a dense matrix requires O(N^3) operations.
zortharg 04.08.11 at 14:39
You have assumed a priori that solving Ax=b is THE reason anyone might want to invert a matrix. What if, for example, you want to find the probability density function of a large set of correlated gaussian random variables, and you have a very large covariance matrix. You need to take the covariance matrix and do what with it? Guess! That’s right! You need to invert it! Look up multivariate gaussian on wikipedia if you don’t believe me.
John Moeller 04.08.11 at 16:15
@zortharg : They went over this in comment 11. You invert a matrix and keep it, but it’s the matrix from the Cholesky decomposition, not the original covariance matrix. You should read all the comments before posting your own.
Warwick Dumas 04.08.11 at 16:28
John :
Thanks. I missed your comment (!) but in the end I ended up doing exactly this, using the band-diagonal LU algorithm in Press et al (not sure to whom they credited it). This for a genuine quintic spline curve not a weighted average. This then worked in practice about 10x slower than just taking derivatives using local quadratic approx, but the improvement was worthwhile in certain respects.
Cheers,
Warwick
Samuel 09.13.11 at 13:03
I havent read every comment, but I have a simple question: inverting A is not equivalent to solving A x_1=b_1,…,A x_n = b_n for b_1 = (1,0,…,0) , b_2= (0,1, ,0 ,…,0), etc. ??
Dave Webb 09.13.11 at 18:10
Question: What if I need to multiply something by the inverse, instead of just solving linear equations? For example, say I have a positive definite symmetric matrix \Sigma. If I need to calculate a multivariate normal density with this matrix as the covariance, I need to do something like (X^T * inverse(\Sigma) * X), where X is a column vector. Can I do this w/o explicitly taking the inverse?
John Moeller 09.13.11 at 20:46
@Dave: Yes, you can do that by taking the inverse of the Cholesky decomposition. I.e., decompose as $latex L^TL=\Sigma$, solve $latex Ly=x$ for $latex y$ (which is fast because $latex L$ is triangular), and then compute $latex y^Ty$. Or you could invert $latex L$ (which is easy because it’s triangular). This is mentioned in comment 11.
Scott Edwards 09.19.11 at 23:13
Its clear from the comments above that solving a system of linear equations is NOT best done by finding the inverse of A. So, then what IS the best way to solve them? I assume some of the acronymns above (eg LU) hold the answer. Sorry, I know you guys are obviously light years ahead of me, but in linear algebra I only learned the “invert the matrix” method. Could you point me to some intro sources?
Thanks!
Scott
John 09.20.11 at 07:07
The “L” and “U” in LU stand for lower-triangular and upper-triangular. It’s a way of factoring a matrix A into the product of two special matrices L and U.
As for resources, I’d recommend any of the linear algebra books Ward Cheney has written. His latest book is very expensive, but you could look for a used copy or an earlier edition.
Scott Edwards 09.21.11 at 09:16
Thanks so much for your quick response, John – appreciate your willingness to help a newbie out! I’ve checked out Ward’s books, and I guess the one titled “linear algebra theory and applications” would be a good place to start and I assume it would address the techniques you are describing. Looks like the 2nd edition just came out, so maybe I can pick the first one up at a discount, since I don’t have $160 at hand right now.
. Until then ill keep checking out your great blog, as I attempt to make the shift from someone with a pretty good traditional background in stat and research methods to someone like you guys that actually knows how this stuff functions at a deep level – to the point of knowing things like how best to calculate standard deviation (one of your blog entries) – something I always assumed was very cut and dry! Anyway, thanks!
Andrew 10.03.11 at 22:14
Thanks for the advice. I now see why my matlab scrip is more stable than my C equivalent. In matlab I use b/A, but in C I explicitly calculate the inverse.
raequin 12.06.11 at 11:41
I came across this article because my equations for Kalman filtering call for a matrix inversion, and I’m not sure about the invertibility of the matrix sum in question. The following matrix is part of the update step:
(HPH’ + W)^-1
where H is my output map, P is the estimate of the error covariance, and W is the covariance matrix for the measurement noise. I gather that there’s no way to be sure of the invertibility of this sum (besides, though I guess that P and W are invertible I know that H for my case does not have full row rank, as m > n). Here is the entire equation for the state update.
x_{k+1|k+1} = x_{k|k} + PH(HPH’ + W)^-1(y – Hx_{k+1|k})
Anyway, that’s the background for this comment; I thought my matrix was not guaranteed to be invertible. I’m now looking for tips on how to implement this equation. Comment #28 states that “implementations [of the Kalman filter] never use the inverse,” but yet the first one I looked at does (http://code.google.com/p/efficient-java-matrix-library/wiki/KalmanFilterExamples). Other comments above mention Cholesky decomposition for multiplication tasks, so I’m guessing that might be the approach to take. Though now, after reading a Wikipedia entry, I’m guessing this can’t be the right approach since I don’t think the sum will necessarily be Hermitian, positive definite. What do I use instead of the inverse here?
Warwick Dumas 12.06.11 at 12:40
If the matrix is noninvertible then there is not exactly one solution.
If it is invertible then you can use LU.
I expect others more learned will add their comments.
Matt 12.06.11 at 12:47
@raequin : Cholesky decompositions, like LL^T and LDL^T, are not the only choice. For instance, consider the catalogue of matrix decompositions here (note: only LLT and LDLT have (semi)definetness requirements):
http://eigen.tuxfamily.org/dox-devel/TopicLinearAlgebraDecompositions.html
raequin 12.06.11 at 12:49
@Warwick There is not expected to be an exact solution to the system. We’ve been using Gauss-Newton (ordinary least squares) to compute the state. I am working on an alternative with Kalman filtering. So, I guess I still have the question of invertibility, whether or not I end up doing that or some other method (such as LU)?
raequin 12.06.11 at 12:59
I still will need to know what to do about the inversion, but it occurs to me that the matrix
S = (HPH’ + W)
is not guaranteed to be invertible even if H, P, and W are. I don’t understand how this update equation can be used then (I’m not saying it can’t, I just don’t see how one is to use an update with an inversion that may not be possible at every step).
Warwick Dumas 12.06.11 at 13:41
I would have thought (could be wrong) that for the sum of invertibles (with physical meanings) to be noninvertible implies some kind of special relationship between the summands. Given your physical definitions I don’t see how you’d get that. Degenerate matrices don’t occur by chance.
If your equations do not have a solution (overdetermined), then you can get the least-squares fit with a pseudoinverse, and I guess there are then similar decompositions for that. I don’t know whether a Kalman filter should lead to insoluble equations.
When you say m > n, I am taking it that means your observation y is an m-vector but your underlying process x is an n-vector. Correct?
raequin 12.06.11 at 14:23
@Matt I know that a covariance matrix is positive semidefinite, but I’m not sure about a) the characteristics of HPH’, and b) the characteristics of two positive semidefinite matrices.
@ Warwick You are correct about my meaning for m > n. As per your comment on invertibility, I don’t know that I have a sum of invertibles. HH’ is definitely not invertible, am not sure about HPH’. The current method is to get a least-squares fit with a pseudoinverse.
—–
I hate to clog the comments section like this, so bear with me. The following excerpt bears directly on my application. It covers my implementation question I started off asking, but I still don’t see how the matrix sum is guaranteed to be symmetric, positive-definite. Any thoughts?
From _Kalman Filtering; Theory and Practice Using MATLAB_, second edition, by Grewal and Andrews:
Cholesky decomposition provides an effcient and numerically stable method for solving equations of the form AX = Y when A is a symmetric, positive-definite matrix. The modified Cholesky decomposition is even better, because it avoids taking scalar square roots. It is the recommended method for forming the term
(HPH’ + R)^-1 H
in the conventional Kalman filter without explicitly inverting a matrix. That is, if one
decomposes HPH’ + R as UDU’ , then
(UDU’)(HPH’ + R)^-1 H = H.
It then suffices to solve
UDU’X = H
for X.
I’d say that this answers the implementation question I posed. There’s still the matter of invertibility of
HPH’ + W
but this page is for alternatives to matrix inversion so I’ll leave off posting any more questions here. Thanks Matt and Warwick!
John Moeller 12.06.11 at 14:59
If P is PSD, then so is HPH’. Just consider y’HPH’y. x’Px is positive for any x, therefore it’s positive for any H’y. Since y’HPH’y is positive for any y, then HPH’ is PSD.
The sum of two PSD matrices is always PSD (the positive semidefinite matrices form a cone in the vector space of all symmetric matrices; cones are closed under sum). So if P and W are PSD, then so is HPH’ + W. If W is not singular, or if H and P are not singular, then HPH’ + W is invertible.
raequin 12.06.11 at 15:01
@John Moeller Thanks, that’s clear.
Warwick Dumas 12.06.11 at 15:03
OK I could be wrong about this, but a back-of-envelope reckoning indicates that for a vector z,
z^T HH^T z = sum{columns of H} [ (z.h column j)^2 ]
So according to that, HH^T simply cannot have a nontrivial kernel, unless H itself is something strange.
However, I could always be wrong.
If I am correct that HH^T has to be invertible, then it follows that HPH^T is invertible.
I don’t see physically why your stepping equation should be insoluble when m > n. In fact I think it should not be insoluble even if m < n, as the Kalman filter is just giving you the best estimate.
Warwick Dumas 12.06.11 at 15:05
Wow someone gave the proper answer.