Unhappy programmers are all alike

A recent pair of articles from Dr. Dobb's reminded me of a famous line from Anna Karenina:

All happy families are alike, but each unhappy family is unhappy in its own way.

Paul Johnson argues that Tolstoy had it backward, that there is more variety in happy families. He argues that there are “obvious, recurring patterns in unhappy families” such as drunkenness and adultery. In any case, back to programming.

Are unhappy programmers all alike? Or happy programmers?

Andrew Binstock wrote a pair of articles asking what makes great programmers different and what makes bad programmers different.

As for great programmers, Binstock says they have wide knowledge and good judgment of how to apply that knowledge. True, but not unique to programming.

But he also says great programmers have the ability to quickly switch between levels of abstraction. Such a skill could be useful in other professions, but it’s particularly useful in programming because software development requires knowledge of many different levels of abstraction. For example, data ranges in size from single bits to terabytes, and each scale comes with its own abstractions. Not many professions regularly have to think about phenomena at multiple resolutions ranging over 12 orders of magnitude.

As for bad programmers, Binstock lists a number of common traits but boils it down to this: “In my experience, at the core of bad programming is invariably a lack of curiosity.” I’d agree that a lack of curiosity explains a lot, but it leaves a great deal of unexplained diversity in bad programming.

Making a singular matrix non-singular

Someone asked me on Twitter

Is there a trick to make a singular (non-invertible) matrix invertible?

The only response I could think of in less than 140 characters was

Depends on what you’re trying to accomplish.

Here I’ll give a longer explanation.

So, can you change a singular matrix just a little to make it non-singular? Yes, and in fact you can change it by as little as you’d like. Singular square matrices are an infinitely thin subset of the space of all square matrices, and any tiny random perturbation is almost certain to take you out of that thin subset. (“Almost certain” is actually a technical term here, meaning with probability 1.)

Adding a tiny bit of noise to a singular matrix makes it non-singular. But why did you want a non-singular matrix? Presumably to solve some system of equations. A little noise makes the system go from theoretically impossible to solve to theoretically possible to solve, but that may not be very useful in practice. It will let you compute a solution, but that solution may be meaningless.
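To make that concrete, here’s a small NumPy sketch. The matrix, right-hand side, and noise level are all made up for illustration: the perturbed system becomes solvable in principle, but the computed solution blows up.

    import numpy as np

    # A clearly singular matrix: the second row is twice the first.
    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
    b = np.array([1.0, 1.0])   # inconsistent with A, so Ax = b has no solution

    print(np.linalg.det(A))    # essentially zero; np.linalg.solve(A, b) would fail

    # Add a tiny random perturbation. With probability 1 the result is non-singular.
    rng = np.random.default_rng(0)
    A_noisy = A + 1e-12 * rng.standard_normal(A.shape)

    x = np.linalg.solve(A_noisy, b)   # now "solvable" ...
    print(x)                          # ... but the entries are enormous and meaningless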

A little noise makes the determinant go from zero to near zero. But here is an important rule in numerical computation:

If something is impossible at zero, it will be hard near zero.

“Hard” could mean difficult to do or it could mean the solution is ill-behaved.

Instead of asking whether a matrix is invertible, let’s ask how easy it is to invert. That is, let’s change a yes/no question into a question with a continuum of answers. The condition number of a matrix measures how easy the matrix is to invert. A matrix that is easy to invert has a small condition number. The harder it is to invert a matrix, the larger its condition number. A singular matrix is infinitely hard to invert, and so it has infinite condition number. A small perturbation of a singular matrix is non-singular, but the condition number will be large.

So what exactly is a condition number? And what do I mean by saying a matrix is “hard” to invert?

The condition number of a matrix is the norm of the matrix times the norm of its inverse. Defining matrix norms would take too long here, but intuitively a norm is a way of sizing up a matrix. Note that you take the norm of both the matrix and its inverse. A small change to a matrix might not change its norm much, but it might change the norm of its inverse a great deal. If that is the case, the matrix is called ill-conditioned because it has a large condition number.
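Here’s a quick NumPy check of that definition, using the 2-norm and a nearly singular matrix I made up for the occasion.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.001]])   # nearly singular, so expect a large condition number

    # The definition above: norm of the matrix times norm of its inverse.
    cond_by_hand = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

    # NumPy's built-in computes the same quantity (2-norm by default).
    print(cond_by_hand, np.linalg.cond(A))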

When I say a matrix is hard to invert, I mean it is hard to do accurately. You can use the same number of steps to invert a matrix by Gaussian elimination whether the matrix has a small condition number or a large condition number. But if the matrix has a large condition number, the result may be very inaccurate. If the condition number is large enough, the solution may be so inaccurate as to be completely useless.

You can think of the condition number as an error multiplier. When you solve the linear system Ax = b, you might expect that a little error in b would result in only a little error in x. And that’s true if A is well-conditioned. But a small change in b could result in a large change in x if A is ill-conditioned. If the condition number of A is 1000, for example, the relative error in x can be up to 1000 times greater than the relative error in b. So you could lose as many as three significant figures going from b to x.

Note that condition number isn’t limited to loss of precision due to floating point arithmetic. If you have some error in your input b, say due to measurement error, then you will have some corresponding error in your solution to Ax = b, even if you solve the system Ax = b exactly. If the matrix A is ill-conditioned, any error in b (rounding error, measurement error, statistical uncertainty, etc.) will result in a correspondingly much larger error in x.
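Here’s a sketch of that amplification, using the Hilbert matrix, a standard example of an ill-conditioned matrix. The perturbation to b stands in for any of those error sources, and its size is arbitrary.

    import numpy as np

    # The 8x8 Hilbert matrix has a condition number around 10^10.
    n = 8
    A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    print(np.linalg.cond(A))

    x_true = np.ones(n)
    b = A @ x_true

    # Perturb b by a relative amount around 10^-12 ...
    rng = np.random.default_rng(1)
    db = 1e-12 * np.linalg.norm(b) * rng.standard_normal(n)
    x_perturbed = np.linalg.solve(A, b + db)

    # ... and the relative error in x comes out many orders of magnitude larger.
    print(np.linalg.norm(x_perturbed - x_true) / np.linalg.norm(x_true))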

So back to the original question: can you make a singular matrix non-singular? Sure. In fact, it’s hard not to. Breathe on a singular matrix and it becomes non-singular. But the resulting non-singular matrix may be useless. The perturbed matrix may be theoretically invertible but practically non-invertible.

So here’s a more refined question: can you change a singular matrix into a useful non-singular matrix? Yes you can, sometimes, but the answer depends very much on the problem you’re trying to solve. This is generally known as regularization, though it goes by different names in different contexts. You use knowledge of your domain beyond the specific data at hand to guide how you change your matrix.
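As one illustration, and only a sketch with placeholder numbers, Tikhonov regularization (ridge regression, to statisticians) replaces Ax = b with a nearby problem that is well-conditioned by construction. Choosing the penalty is exactly where that domain knowledge comes in.

    import numpy as np

    # A rank-deficient matrix and a right-hand side slightly inconsistent with it.
    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
    b = np.array([1.0, 2.1])

    # Tikhonov regularization: instead of Ax = b, solve
    #     (A^T A + lam * I) x = A^T b
    # The penalty lam encodes a preference for small solutions; it has to be
    # chosen from knowledge of the problem, not from the matrix alone.
    lam = 1e-3
    x_reg = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)
    print(x_reg)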


Obsession

Obsession has come to have a positive connotation. Individuals and companies brag about being obsessed with this or that. But obsession is a pathology, and the original meaning of the word is still valid.

Obsession, according to the canons of psychology, occurs when an innocuous idea is substituted for a painful one. The victim simply avoids recognizing the thing which will hurt. [source]

Obsession with small things is a way to avoid thinking about big things. Maybe our company is sinking like a rock, but our website is valid XHTML 1.1! Maybe my relationships are falling apart, but I’ve almost got all my GTD applications configured the way I want them. Maybe my paper makes absolutely no contribution to science, but now I’ve got another publication.

Paying attention to detail because it is important is not obsession. And when people brag about being obsessed, they’re probably trying to imply that this is what they’re doing. Details matter in relation to a bigger picture. Focusing on details in isolation, hoping that the bigger picture will take care of itself, is obsession.

Running Linux on Azure

Microsoft began offering Linux virtual machine hosting on its Azure platform this week.

I created an Ubuntu 12.04 server this morning and everything worked smoothly. I found it much easier to create and connect to a virtual machine on Azure than on Amazon EC2.

Painting with Numbers

Painting with Numbers (ISBN 1118172574) is a new book of advice on making numerical presentations.

The book is very elementary. It contains no math beyond arithmetic, and it focuses almost entirely on financial data in Excel spreadsheets. But it does have useful tips. A lot of presentations would be easier to understand if the presenter had read this book. Think of it as a sort of Tufte-lite.

One of the themes in the book is a list of 18 “deadly sins” of presentation. Here are the first three.

  1. Not right-justifying a column of numbers.
  2. Basing column width or row height on the length of the caption.
  3. Using visual effects for any reason other than clarifying, distinguishing, or adding meaning to information.

I like #17: “I know most of you can’t read the numbers on this slide, but …”

Visual Basic and coyotes

David S. Platt calls VB 6 developers a silent majority. He says the vast majority of VB developers did not want the power (and associated complexity) of VB.NET, but Microsoft only heard from the vocal minority who felt otherwise. And so a large number of VB programmers cling to VB 6. As Platt puts it,

But giving Visual Basic .NET to the Visual Basic 6 community was like raising a coyote as a domestic dog, then releasing him into the woods, shouting, “Hunt for your dinner as God intended, you magnificent, wild creature!” Most of them said, “Heck with that. I’m staying on my nice warm cushion by the fire while you open me a can of Alpo.”

Small data

Big data is getting a lot of buzz lately, but small data is interesting too. In some ways it’s more interesting. Because of limit theorems, a lot of things become dull in the large that are more interesting in the small.

When working with small data sets you have to accept that you will very often draw the wrong conclusion. You just can’t have high confidence in inference drawn from a small amount of data, unless you can do magic. But you do the best you can with what you have. You have to be content with the accuracy of your method relative to the amount of data available.

For example, a clinical trial may try to find the optimal dose of some new drug by giving the drug to only 30 patients. When you have five doses to test and only 30 patients, you’re just not going to find the right dose very often. You might want to assign 6 patients to each dose, but you can’t count on that. For safety reasons, you have to start at the lowest dose and work your way up cautiously, and that usually results in uneven allocation to doses, and thus less statistical power. And you might not treat all 30 patients. You might decide—possibly incorrectly—to stop the trial early because it appears that all doses are too toxic or ineffective. (This gives a glimpse of why testing drugs on people is a harder statistical problem than testing fertilizers on crops.)

Maybe your method finds the right answer 60% of the time, hardly a satisfying performance. But if alternative methods find the right answer 50% of the time under the same circumstances, your 60% looks great by comparison.
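To put some numbers behind that, here’s a toy simulation. It is not a realistic dose-finding design: it assumes the even six-per-dose allocation you can’t count on, made-up response rates, and a naive pick-the-best-observed-dose rule. Even so, it shows how often a small sample points to the wrong dose.

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical true response rates for five doses; dose index 3 is "best".
    true_rates = np.array([0.1, 0.2, 0.3, 0.5, 0.4])
    n_per_dose = 6

    def pick_best_dose():
        responses = rng.binomial(n_per_dose, true_rates)  # observed successes per dose
        return np.argmax(responses)                       # choose the dose that looked best

    trials = 10_000
    hits = sum(pick_best_dose() == 3 for _ in range(trials))
    print(hits / trials)   # well short of certainty, even with ideal allocation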

Related post: The law of medium numbers