Installing R and Rcpp

Almost exactly a year ago, I wrote about my frustration calling C++ from R. Maybe this will become an annual event because I’m back at it.

This time my experience was more pleasant. I was able to install Rcpp on an Ubuntu virtual machine and run example 2.2 from the Rcpp FAQ once I upgraded R to the latest version. I wrote up some notes on the process of installing the latest version of R and Rcpp on Ubuntu.

I have not yet been able to run Rcpp on Windows.

Update: Thanks to Dirk Eddelbuettel for pointing out in the comments that you can install Rcpp from the shell rather than from inside R by running sudo apt-get install r-cran-rcpp. With this approach, I was able to install Rcpp without having to upgrade R first.

Simplicity is hard to sell

People say they want simple things, but they don’t. Not always.

Donald Norman argues in Living with Complexity that people really want things that are easy to use, not things that are simple. They want things that are familiar, even if the familiar is complex.

People also want all the features they can handle. If you’re competing with a product that is overwhelmingly complex, then simplicity sells. Otherwise simplicity is perceived as weakness. The more features the better. People may enjoy using simpler products, but they buy complex ones.

It’s hard to remove even vestigial features. If people are in the habit of doing something that is no longer necessary (or never did add much value), you’ll have a lot of work to do to convince them that it’s OK to let go of it.

Related post: Simple versus easy

Shell != REPL

A shell is not the same as a REPL (Read Evaluate Print Loop). They look similar, but they have deep differences.

Shells are designed for one-line commands, and they’re a little awkward when used as programming languages.

Scripting languages are designed for files of commands, and they’re a little awkward to use from a REPL.

IPython is an interesting hybrid. You could think of it as a Python REPL with shell-like features added. Eshell is another interesting compromise, a shell implemented in Emacs Lisp that also works as a Lisp REPL. These hybrids are evidence that as much as people like their programming languages, they appreciate additions to a pure language REPL.

Learn PowerShell as a shell first

When I was learning PowerShell, I thought of it as a scripting language that has a shell. But the right way to think of it may be the opposite, a shell that has a scripting language. Apparently others have followed this same change of perspective.

Don Jones and Jeffrey Hicks open their new book Learn Windows PowerShell 3 in a Month of Lunches by explaining that they first saw PowerShell as a VBScript replacement and taught it that way.

Since late 2009, however, a shift has occurred. More and more administrators who don’t have prior VBScript experience have started trying to learn the shell. All of a sudden, our old teaching patterns didn’t work very well, because we focused on scripting and programming. That’s when we realized that PowerShell isn’t really a scripting language. It’s really a command-line shell where you run command-line utilities. Like all good shells, it has scripting capabilities, but you don’t have to use them, and you certainly don’t have to start with them.

[Don Jones wrote the first edition of the book, which I haven’t read. This quote comes from the second edition, coauthored with Jeffrey Hicks.]

Other PowerShell books I’ve read felt more like programming language books than books on a shell, more like The C Programming Language, say, than The Linux Command Line. The latter seems to be a better analog for the new PowerShell 3 book: emphasis on interactive use first, programming second. At least that’s my first impression. The book isn’t finished yet (it’s part of Manning’s early access program) and I haven’t yet read all of the part that has been released.

If you’d like a chance to win a copy of Learn Windows PowerShell 3 in a Month of Lunches, please leave a comment below. I’ll draw a winner and the publisher will send you a copy of the book.

Update: I’ll draw a winner on Friday, June 29. Comments need to be posted by midnight Thursday in my time zone.


Crayons to Computers

I saw a daycare named “Crayons to Computers” recently. I assume the implication is that crayons are basic and computers are advanced. Programming a computer is more advanced than writing with crayons, but surely their clients don’t learn to program computers. They probably play video games, which take considerably less dexterity and mental skill than using crayons.

So maybe the place should be named “Computers to Crayons.” Or if they really wanted to offer advanced instruction, perhaps they could be “Crayons to Clarinets.” Learning to play clarinet takes much more work than learning to play a computer game.

Window dressing for ideological biases

From Russ Roberts on the latest EconTalk podcast:

… this is really embarrassing as a professional economist — but I’ve come to believe that there may be no examples … of where a sophisticated multivariate econometric analysis … where important policy issues are at stake, has led to a consensus. Where somebody says: Well, I guess I was wrong. Where somebody on the other side of the issue says: Your analysis, you’ve got a significant coefficient there — I’m wrong. No. They always say: You left this out, you left that out. And they’re right, of course. And then they can redo the analysis and show that in fact — and so what that means is that the tools, instead of leading to certainty and improved knowledge about the usefulness of policy interventions, are merely window dressing for ideological biases that are pre-existing in the case of the researchers.

Emphasis added.

Related post: Numerous studies have confirmed …

Methods that get used

I have a conjecture regarding statistical methods:

The probability of a method being used drops by at least a factor of 2 for every parameter that has to be determined by trial-and-error.

A method could have a dozen inputs, and if they’re all intuitively meaningful, it might be adopted. But if there is even one parameter that requires trial-and-error fiddling to set, the probability of use drops sharply. As the number of non-intuitive parameters increases, the probability of anyone other than the method’s author using the method rapidly drops to zero.

John Tukey said that the practical power of a statistical test is its statistical power times the probability that someone will use it. Therefore practical power decreases exponentially with the number of non-intuitive parameters.
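In symbols: if a method has k parameters that must be set by trial and error, the conjecture says its probability of use is cut by a factor of at least 2^k, and so its practical power, in Tukey’s sense, is cut by at least the same factor.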

Related post: Software that gets used

Relearning to type

I’m starting to feel some strain in my hands, so I’m going to take Vivek Haldar’s advice:

Act like you do have RSI, and change your set up right now to avoid it.

For one thing, I bought an ergonomic keyboard this weekend. It lets me hold my hands in a more relaxed position.

When I learned to type, I learned to use the shift key on the opposite hand of a letter to capitalize it. For example, holding down the shift key with my right hand to type an A with my left hand. But for some reason, I never developed the analogous habit for the Control and Alt keys; I only use these keys with my left hand. So while I’m getting used to a new keyboard, I’m going to try to use Control and Alt on the opposite hand of the letter they modify.

I may also try remapping another key to be the Escape key. I use Emacs, so I use the Escape key frequently. Some people recommend remapping Caps Lock to be an Escape key. I currently have the Caps Lock key mapped to be a control key. I may go back to using the default left control key and use Caps Lock as the Escape key. Or maybe I’ll remap the context menu key between the Alt and Control keys on the right side since I never use it.


Institutional mediocrity

Here are a couple quotes I’ve run across lately. Both say that attempts to filter out the low end of human thought inevitably filter out some of the high end too.

First, from Doug Gwyn:

Unix was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things.

Second, from Dave Gray:

The more idiot-proof the system, the more you will constrain behavior, forcing people to act like idiots, even when it’s against their better judgment. Even when people know there’s a better way to do something, they will often be constrained by policies and procedures that were designed to reduce variety in the system. If your system needs to solve problems that you can’t anticipate, then you have a problem, because automated systems and idiots can’t solve problems.


Root-finding with noisy functions

Suppose you have a function g(x) and you want to find x so that g(x) = 0. However, you don’t have direct access to g(x). Instead you can evaluate f(x), which is g(x) plus random noise. Typically f(x) would be an approximation of g(x) based on a Monte Carlo simulation. I have spent countless hours solving problems like this.

One difficulty is that the value of f(x) will change every time you compute it. You might hunt down x and be sure you have it bracketed between two values, only to find out you were wrong when you compute f(x) more accurately, say with more Monte Carlo steps.

Assume g(x), i.e. f(x) without noise, is monotone increasing. Assume also that f(x) is expensive to compute. Say each evaluation takes 20 minutes. The amplitude of the noise may vary with x.

Here’s a simple approach to finding where g(x) is zero. Use a bisection method, but stop when it looks like you’ve come as close as you can due to noise. If the monotonicity assumption is violated, take that as a signal that you’re within the resolution of the noise.

# Assume a < b, fa = f(a) < 0, fb = f(b) > 0.
def noisy_bisect(f, a, b, fa, fb, tolerance):
    if b-a < tolerance:
        return (a, b)
    mid = 0.5*(a+b)
    fmid = f(mid)
    if fmid < fa or fmid > fb:
        # Monotonicity violated. Reached resolution of noise.
        return (a, b)
    if fmid < 0:
        a, fa = mid, fmid
    else:
        b, fb = mid, fmid
    return noisy_bisect(f, a, b, fa, fb, tolerance)

The values of f(a) and f(b) are saved to avoid recalculation since they’re expensive to obtain.

You could experiment with this function as follows.

from scipy.stats import norm
from time import sleep

# Gaussian distribution w/ mean 0 and standard deviation 0.01
noise = norm(0, 0.01)

def f(x):
    sleep(10) # wait 10 seconds
    return x**2 - 1 + noise.rvs()

a = 0
b = 3
fa = f(a)
fb = f(b)
print(noisy_bisect(f, a, b, fa, fb, 0.0001))

Without noise, the code would iterate until the zero is bracketed in an interval of width 0.0001. However, since the noise is on the order of 0.01, the method will typically stop when the bracket is on the order of 0.01 wide, maybe a little smaller.

We could make the root-finding software a little faster by using a secant method rather than a bisection method. Instead of taking the midpoint, the secant method draws a line between (a, f(a)) and (b, f(b)) and uses the point where the line crosses the x-axis as the next guess.
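Here’s what that might look like, adapting the bisection code above (a sketch, not code from any library; noisy_secant is just an illustrative name):

# Assume a < b, fa = f(a) < 0, fb = f(b) > 0, as before.
def noisy_secant(f, a, b, fa, fb, tolerance):
    if b-a < tolerance:
        return (a, b)
    # Point where the line through (a, fa) and (b, fb) crosses the x-axis.
    # Since fa < 0 < fb, it always lies strictly between a and b.
    x = a - fa*(b-a)/(fb-fa)
    fx = f(x)
    if fx < fa or fx > fb:
        # Monotonicity violated. Reached resolution of noise.
        return (a, b)
    if fx < 0:
        a, fa = x, fx
    else:
        b, fb = x, fx
    return noisy_secant(f, a, b, fa, fb, tolerance)

One caveat: this variant (essentially the false position method) can leave one endpoint fixed for many iterations, so the bracket may shrink slowly; a practical version might fall back to a bisection step when progress stalls.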

A more sophisticated approach would be to increase the accuracy of f(x) over time. If f(x) is evaluated using N steps of a Monte Carlo simulation, the standard deviation of the noise is roughly proportional to 1/√N. So, for example, cutting the noise in half requires multiplying N by 4. You could get a rough bracket on the zero using a small value of N, then increase N as the bracket gets smaller. You could play with this by using a function like the following.

from math import sqrt

def f(x, N):
    sleep(N) # wait N seconds, simulating work proportional to N
    noise = norm(0, 0.1/sqrt(N))
    return x**2 - 1 + noise.rvs()

You can make the noise as small as you’d like, but at a price. And the price goes up faster than the noise goes down, so it pays not to reduce the noise more than necessary.
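Here is a sketch of the idea of increasing N as the bracket shrinks (the quadrupling rule and the budget max_N are made-up knobs, not part of any standard method): when the monotonicity check fails, instead of stopping, quadruple N, which roughly halves the noise, and keep going.

# Assume a < b, fa = f(a, N) < 0, fb = f(b, N) > 0,
# and that the noise in f(x, N) scales like 1/sqrt(N).
def adaptive_noisy_bisect(f, a, b, fa, fb, tolerance, N=1, max_N=64):
    if b-a < tolerance:
        return (a, b)
    mid = 0.5*(a+b)
    fmid = f(mid, N)
    if fmid < fa or fmid > fb:
        # Apparent monotonicity violation: we've hit the noise floor
        # at this N. Quadruple N (halving the noise) and retry.
        if N >= max_N:
            return (a, b)
        return adaptive_noisy_bisect(f, a, b, fa, fb, tolerance, 4*N, max_N)
    if fmid < 0:
        a, fa = mid, fmid
    else:
        b, fb = mid, fmid
    return adaptive_noisy_bisect(f, a, b, fa, fb, tolerance, N, max_N)

Note that fa and fb were computed at a smaller N, so a more careful version would recompute the endpoints after increasing N; that refinement is left out to keep the sketch short.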

Unhappy programmers are all alike

A recent pair of articles from Dr. Dobbs reminded me of a famous line from Anna Karenina:

All happy families are alike, but each unhappy family is unhappy in its own way.

Paul Johnson argues that Tolstoy had it backward, that there is more variety in happy families. He argues that there are “obvious, recurring patterns in unhappy families” such as drunkenness and adultery. In any case, back to programming.

Are unhappy programmers all alike? Or happy programmers?

Andrew Binstock wrote a pair of articles asking what makes great programmers different and what makes bad programmers different.

As for great programmers, Binstock says they have wide knowledge and good judgment of how to apply that knowledge. True, but not unique to programming.

But he also says great programmers have the ability to quickly switch between levels of abstraction. Such a skill could be useful in other professions, but it’s particularly useful in programming because software development requires knowledge of many different levels of abstraction. For example, data ranges from bits to terabytes, and each scale comes with its own abstractions. Not many professions regularly have to think about phenomena at resolutions spanning 12 orders of magnitude.

As for bad programmers, Binstock lists a number of common traits but boils it down to this: “In my experience, at the core of bad programming is invariably a lack of curiosity.” I’d agree that a lack of curiosity explains a lot, but it leaves a great deal of unexplained diversity in bad programming.

Making a singular matrix non-singular

Someone asked me on Twitter

Is there a trick to make a singular (non-invertible) matrix invertible?

The only response I could think of in less than 140 characters was

Depends on what you’re trying to accomplish.

Here I’ll give a longer explanation.

So, can you change a singular matrix just a little to make it non-singular? Yes, and in fact you can change it by as little as you’d like. Singular square matrices are an infinitely thin subset of the space of all square matrices, and any tiny random perturbation is almost certain to take you out of that thin subset. (“Almost certain” is actually a technical term here, meaning with probability 1.)

Adding a tiny bit of noise to a singular matrix makes it non-singular. But why did you want a non-singular matrix? Presumably to solve some system of equations. A little noise makes the system go from theoretically impossible to solve to theoretically possible to solve, but that may not be very useful in practice. It will let you compute a solution, but that solution may be meaningless.

A little noise makes the determinant go from zero to near zero. But here is an important rule in numerical computation:

If something is impossible at 0, it will be hard near zero.

“Hard” could mean difficult to do or it could mean the solution is ill-behaved.

Instead of asking whether a matrix is invertible, let’s ask how easy it is to invert. That is, let’s change a yes/no question into a question with a continuum of answers. The condition number of a matrix measures how easy the matrix is to invert. A matrix that is easy to invert has a small condition number. The harder it is to invert a matrix, the larger its condition number. A singular matrix is infinitely hard to invert, and so it has infinite condition number. A small perturbation of a singular matrix is non-singular, but the condition number will be large.

So what exactly is a condition number? And what do I mean by saying a matrix is “hard” to invert?

The condition number of a matrix is the norm of the matrix times the norm of its inverse. Defining matrix norms would take too long to go into here, but intuitively it is a way of sizing up a matrix. Note that you take the norm of both the matrix and its inverse. A small change to a matrix might not change its norm much, but it might change the norm of its inverse a great deal. If that is the case, the matrix is called ill-conditioned because it has a large condition number.
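To make this concrete, NumPy will compute condition numbers for you. A small illustration (the matrices are made up; numpy.linalg.cond uses the 2-norm by default):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # singular: the second row is twice the first
B = A + 1e-10*np.eye(2)      # a tiny perturbation makes it non-singular

print(np.linalg.cond(A))     # effectively infinite
print(np.linalg.cond(B))     # finite, but on the order of 10**10

B is invertible in name, but with a condition number that large it is barely usable in practice.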

When I say a matrix is hard to invert, I mean it is hard to do accurately. You can use the same number of steps to invert a matrix by Gaussian elimination whether the matrix has a small condition number or a large condition number. But if the matrix has a large condition number, the result may be very inaccurate. If the condition number is large enough, the solution may be so inaccurate as to be completely useless.

You can think of condition number as an error multiplier. When you solve the linear system Ax = b, you might expect that a little error in b would result in only a little error in x. And that’s true if A is well-conditioned. But a small change in b could result in a large change in x if A is ill-conditioned. If the condition number of A is 1000, for example, the error in x will be 1000 times greater than the error in b. So you would expect x to have three fewer significant figures than b.
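A quick numerical demonstration (the matrix and the perturbation are made up for illustration):

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])       # nearly singular
print(np.linalg.cond(A))            # on the order of 10**4

b = np.array([2.0, 2.0001])
print(np.linalg.solve(A, b))        # [1, 1]

# Perturb b in the fourth decimal place; x changes in the first.
b_noisy = b + np.array([0.0, 0.0001])
print(np.linalg.solve(A, b_noisy))  # approximately [0, 2]

A change of 0.0001 in b produced a change of about 1 in x, an amplification of roughly the condition number.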

Note that condition number isn’t limited to loss of precision due to floating point arithmetic. If you have some error in your input b, say due to measurement error, then you will have some corresponding error in your solution to Ax = b, even if you solve the system Ax = b exactly. If the matrix A is ill-conditioned, any error in b (rounding error, measurement error, statistical uncertainty, etc.) will result in a correspondingly much larger error in x.

So back to the original question: can you make a singular matrix non-singular? Sure. In fact, it’s hard not to. Breathe on a singular matrix and it becomes non-singular. But the resulting non-singular matrix may be useless. The perturbed matrix may be theoretically invertible but practically non-invertible.

So here’s a more refined question: can you change a singular matrix into a useful non-singular matrix? Yes you can, sometimes, but the answer depends very much on the problem you’re trying to solve. This is generally known as regularization, though it goes by different names in different contexts. You use knowledge of your domain beyond the specific data at hand to guide how you change your matrix.
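For example, Tikhonov (ridge) regularization replaces the exact system Ax = b with the minimization of ||Ax - b||^2 + lam*||x||^2, which has a unique solution for any lam > 0. A minimal sketch, with a made-up matrix and a made-up value of the penalty lam (in practice you’d choose it from domain knowledge):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # singular
b = np.array([1.0, 2.0])

lam = 1e-3   # regularization parameter; larger values mean more smoothing
# Minimize ||Ax - b||^2 + lam*||x||^2 by solving the regularized
# normal equations (A^T A + lam*I) x = A^T b.
x = np.linalg.solve(A.T @ A + lam*np.eye(2), A.T @ b)
print(x)     # approximately [0.2, 0.4], the minimum-norm solution

As lam shrinks to zero this approaches the minimum-norm solution, but the art is in choosing lam, and that choice comes from the domain, not from the matrix.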


Obsession

Obsession has come to have a positive connotation. Individuals and companies brag about being obsessed about this or that. But obsession is a psychosis, and the original meaning of the word is still valid.

Obsession, according to the canons of psychology, occurs when an innocuous idea is substituted for a painful one. The victim simply avoids recognizing the thing which will hurt. [source]

Obsession with small things is a way to avoid thinking about big things. Maybe our company is sinking like a rock, but our web site is valid XHTML 1.1! Maybe my relationships are falling apart, but I’ve almost got all my GTD applications configured the way I want them. Maybe my paper makes absolutely no contribution to science, but now I’ve got another publication.

Paying attention to detail because it is important is not obsession. And when people brag about being obsessed, they’re probably trying to imply that this is what they’re doing. Details matter in relation to a bigger picture. Focusing on details in isolation, hoping that the bigger picture will take care of itself, is obsession.

Running Linux on Azure

Microsoft began offering Linux virtual machine hosting on its Azure platform this week.

I created an Ubuntu 12.04 server this morning and everything worked smoothly. I found it much easier to create and connect to a virtual machine on Azure than on Amazon EC2.