Posts tagged as:

Programming

Interview with Clojure author

by John on March 8, 2010

Simple-talk has an interview with Rich Hickey, author of the programming language Clojure (pronounced “closure”). Clojure is a dialect of Lisp designed to run on top of the Java Virtual Machine. The language is also being ported to the .NET framework as Clojure CLR.

Two things stood out to me in the interview: a comparison of Lisp with C++, and a discussion of complexity.

You’ll often hear a programmer argue that language X is better than language Y.  To support their argument, they’ll say they wrote a program in Y, then wrote it in X in less time. For example, someone might argue that Ruby is better than Python because they were able to rewrite their web site using Ruby in half the time it took to write the original Python version. Such arguments are weak because you can write anything faster the second time. The first implementation required analysis and design that the second implementation can reuse entirely or at least learn from.

Rich Hickey argues that he can develop programs in Lisp faster than in C++. He offers as support that he first wrote something in Lisp and then took three times longer to rewrite it in C++. This is just a personal anecdote, not a scientific study, but it carries more weight than the usual anecdote because he’s claiming the first language was more efficient than the second.

In his discussion of incidental complexity, complexity coming from one’s tools rather than from the intrinsic complexity of the problem being solved, Hickey says

I think programmers have become inured to incidental complexity, in particular by confusing familiar or concise with simple. And when they encounter complexity, they consider it a challenge to overcome, rather than an obstacle to remove. Overcoming complexity isn’t work, it’s waste.

The phrase “confusing familiar or concise with simple” is insightful. I never appreciated the arguments about the complexity of C++ until I got a little distance from the language; C++ was so familiar I didn’t appreciate how complex it is until I had a break from writing it. Also, simple solutions are usually concise, but concise solutions may not be simple. I chuckle whenever I hear someone say a problem was simple to solve because they were able to solve it in one line — one long stream of entirely mysterious commands.

Thanks to Omar Gomez for pointing out the interview article.

Related posts:

A little simplicity goes a long way
I disagree with Torvalds about C++
Baklava code


What do you learn just in case you’ll need it in the future, and what do you learn just in time when you do need it?

In general, you learn things in school just in case you’ll need them later. Then once you get a job, you learn more things just in time when you need them.

When you learn just in time, you’re highly motivated. There’s no need to imagine whether you might apply what you’re learning since the application came first. But you can’t learn everything just in time. You have to learn some things before you can imagine using them. You need to have certain patterns in your head before you can recognize them in the wild.

Years ago someone told me that he never learned algebra and has never had a need for it. But I’ve learned algebra and use it constantly. It’s a lucky thing I was the one who learned algebra since I ended up needing it. But of course it’s not lucky. I would not have had any use for it either if I’d not learned it.

The difference between just-in-case and just-in-time is like the difference between training and trying. You can’t run a marathon by trying hard. The first person who tried that died. You have to train for it. You can’t just say that you’ll run 26 miles when you need to and do nothing until then.

Software developers prefer just-in-time learning. There’s so much out there that you aren’t going to need. You can’t learn every detail of every operating system, every programming language, every library, etc. before you do any real work. You can only remember so much arbitrary information without a specific need for it. Even if you could learn it all in the abstract, you’d be decades into your career without having produced anything. On top of that, technological information has a short shelf life, so it’s not worthwhile to learn too much that you’re not sure you have a need for.

On the other hand, you need to know what’s available, even if you’re only going to learn the details just in time. You can’t say “I need to learn about version control systems now” if you don’t even know what version control is. You need to have a survey knowledge of technology just in case. You can learn APIs just in time. But there’s a big gray area in between where it’s hard to know what is worthwhile to learn and when.

Related posts:

Software that gets used
Why programmers write unneeded code
Don’t standardize education, personalize it
Worthless technical books


Amateur software

by John on February 10, 2010

I’m growing increasingly frustrated with amateur software. Before I explain why, let me first be clear on what I do not mean by amateur.

  • Amateur does not mean low quality. Some amateur software is outstanding, and some professional software is terrible.
  • Amateur does not mean open source. Some amateur projects are open source and some are not.

I’m using “amateur software” to mean software projects developed by volunteers. I imagine most amateur software is written by professional developers. These are folks paid to write software for a company by day who then work on something else they love by night.

Open source software is not necessarily amateur software. Linux, for example, is now professional software. Around 75% of Linux kernel development is carried out by people paid to work on Linux. Some of the best software is both open source and at least partially professional.

Volunteers do what they want to do by definition. The problem is that the reverse is also true: volunteers do not do what they do not want to do. And for software developers, writing documentation usually falls in the “do not want to do” column. So does making software easy to install. So does testing in multiple environments.

When a company has an interest in a piece of software, they can pay people to do the tasks the volunteers don’t want to do. In fact, if they’re smart, they will concentrate their efforts precisely on the tasks volunteers don’t want to do. In this way even one or two paid staff can make an enormous contribution to a largely volunteer project.

Some amateur projects are highly polished. These may be small projects led by rare individuals who pay attention to details beyond pure software development. More often, these are large mature projects that have so many volunteers that they have a few who are willing to do tasks that most developers do not want to do.

Related posts:

Shallow bugs versus reported bugs
Software profitability in the middle
Hard to spend money


You can’t force people to provide metadata

by John on February 7, 2010

I ran across a long rant from Steve Yegge this evening about junior programmers. In a nutshell, Yegge says they like to play around with metadata rather than getting real work done.

Here’s an insightful observation Yegge makes along the way.

And Haskell, OCaml and their ilk … try to force people to model everything. Programmers hate that. These languages will never, ever enjoy any substantial commercial success, for the exact same reason the Semantic Web is a failure. You can’t force people to provide metadata for everything they do. They’ll hate you.

Related post:

Probability of semantic markup being correct


Little programs versus big programs

by John on February 3, 2010

From You Are Not a Gadget:

Little programs are delightful to write in isolation, but the process of maintaining large-scale software is always miserable. … Technologists wish every program behaved like a brand-new, playful little program, and will use any available psychological strategy to avoid thinking about computers realistically.

Related posts:

Writes large, correct programs
Why there will always be programmers


How to compute the soft maximum

by John on January 20, 2010

The most obvious way to compute the soft maximum can easily fail due to overflow or underflow.

The soft maximum of x and y is defined by

g(x, y) = log( exp(x) + exp(y) ).

The most obvious way to turn the definition above into C code would be

#include <math.h>  /* exp, log */

double SoftMaximum(double x, double y)
{
    return log( exp(x) + exp(y) );
}

This works for some values of x and y, but fails if x or y is large. For example, if we use this to compute the soft maximum of 1000 and 200, the result is numerical infinity. The value of exp(1000) is too big to represent in a floating point number, so it is computed as infinity. exp(200) is finite, but the sum of an infinity and a finite number is infinity. Then the log function applied to infinity returns infinity.

We have the opposite problem if we try to compute the soft maximum of -1000 and -1200. In this computation exp(-1000) and exp(-1200) both underflow to zero, and the log function returns negative infinity for the logarithm of zero.

Fortunately it’s not hard to fix the function SoftMaximum to avoid overflow and underflow. Look what happens when we shift both arguments by a constant.

log( exp(x - k) + exp(y - k) ) = log( exp(x) + exp(y) ) - k.

This says

log( exp(x) + exp(y) ) = log( exp(x - k) + exp(y - k) ) + k.
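In case the shift identity isn’t obvious, it comes from factoring exp(-k) out of the sum and using the fact that the log of a product is the sum of the logs. Spelled out step by step:

```latex
\begin{aligned}
\log\left(e^{x-k} + e^{y-k}\right)
  &= \log\left(e^{-k}\left(e^{x} + e^{y}\right)\right) \\
  &= \log\left(e^{-k}\right) + \log\left(e^{x} + e^{y}\right) \\
  &= \log\left(e^{x} + e^{y}\right) - k .
\end{aligned}
```

Nothing here depends on the choice of k, so we are free to pick whatever value is numerically convenient.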

If we pick k to be the maximum of x and y, then one of the calls to exp has argument 0 (and so it returns 1) and the other has a negative argument. This means the following code cannot overflow.

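#include <math.h>  /* exp, log, fmax, fmin */

double SoftMaximum(double x, double y)
{
	/* Shift by the larger argument so exp never sees a positive value. */
	double maximum = fmax(x, y);
	double minimum = fmin(x, y);
	return maximum + log( 1.0 + exp(minimum - maximum) );
}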

The call to exp(minimum - maximum) could possibly underflow to zero, but in that case the code returns maximum. And in that case the return value is very accurate: if maximum is much larger than minimum, then the soft maximum is essentially equal to maximum.

The equation for the soft maximum implemented above has a few advantages in addition to avoiding overflow. It makes it clear that the soft maximum is always greater than the maximum. Also, it shows that the difference between the hard maximum and the soft maximum is controlled by the spread of the arguments. The soft maximum is nearest the hard maximum when the two arguments are very different and furthest from the hard maximum when the two arguments are equal.

Thanks to Andrew Dalke for suggesting the topic of this post by his comment.

Related links:

Soft maximum
Anatomy of a floating point number
Avoiding overflow, underflow, and loss of precision


Ten surprises from numerical linear algebra

by John on January 20, 2010

Here are ten things about numerical linear algebra that you may find surprising if you’re not familiar with the field.

  1. Numerical linear algebra applies very advanced mathematics to solve problems that can be stated with high school mathematics.
  2. Practical applications often require solving enormous systems of equations, millions or even billions of variables.
  3. The heart of Google is an enormous linear algebra problem. PageRank is essentially an eigenvalue problem.
  4. The efficiency of solving very large systems of equations has benefited at least as much from advances in algorithms as from Moore’s law.
  5. Many practical problems — optimization, differential equations, signal processing, etc. — boil down to solving linear systems, even when the original problems are non-linear. Finite element software, for example, spends nearly all its time solving linear equations.
  6. A system of a million equations can sometimes be solved on an ordinary PC in under a millisecond, depending on the structure of the equations.
  7. Iterative methods, methods that in theory require an infinite number of steps to solve a problem, are often faster and more accurate than direct methods, methods that in theory produce an exact answer in a finite number of steps.
  8. There are many theorems bounding the error in solutions produced on real computers. That is, the theorems don’t just bound the error from hypothetical calculations carried out in exact arithmetic but bound the error from arithmetic as carried out in floating point arithmetic on computer hardware.
  9. It is hardly ever necessary to compute the inverse of a matrix.
  10. There is remarkably mature software for numerical linear algebra. Brilliant people have worked on this software for many years.
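As one illustration of how much structure matters (items 6 and 9), a tridiagonal system can be solved in time proportional to the number of equations, without ever forming an inverse. The sketch below uses the standard Thomas algorithm. The function name and array layout are my own assumptions, it does no pivoting and so assumes a well-behaved matrix (diagonally dominant, say), and I make no claim about how fast it runs on any particular machine. For real work, use the mature libraries mentioned in item 10.

```c
#include <stddef.h>

/* Solve a tridiagonal system A x = d in O(n) time.
     a: sub-diagonal   (a[0] is unused)
     b: main diagonal
     c: super-diagonal (c[n-1] is unused)
     d: right-hand side on entry, solution x on exit
     scratch: workspace of length n
   No pivoting is done. Returns 0 on success, -1 on a zero pivot. */
int solve_tridiagonal(size_t n, const double *a, const double *b,
                      const double *c, double *d, double *scratch)
{
    if (n == 0) return 0;

    double pivot = b[0];
    if (pivot == 0.0) return -1;
    d[0] /= pivot;

    /* Forward sweep: eliminate the sub-diagonal. */
    for (size_t i = 1; i < n; ++i) {
        scratch[i - 1] = c[i - 1] / pivot;      /* scaled super-diagonal */
        pivot = b[i] - a[i] * scratch[i - 1];
        if (pivot == 0.0) return -1;
        d[i] = (d[i] - a[i] * d[i - 1]) / pivot;
    }

    /* Back substitution, from the second-to-last row up to the first. */
    for (size_t i = n - 1; i-- > 0; ) {
        d[i] -= scratch[i] * d[i + 1];
    }
    return 0;
}
```

For example, with main diagonal (2, 2, 2), off-diagonals all equal to 1, and right-hand side (4, 8, 8), the solution is (1, 2, 3). The work grows linearly with n, whereas a general dense solve grows like n cubed.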

Related posts:

Don’t invert that matrix
Searching for John Francis
Applying PageRank to biology
Matrix cookbook
What is the cosine of a matrix?


Software sins of omission

by John on January 12, 2010

The Book of Common Prayer contains the confession

… we have left undone those things which we ought to have done, and we have done those things which we ought not to have done.

The things left undone are called sins of omission; things which ought not to have been done are called sins of commission.

In software testing and debugging, we focus on sins of commission, code that was implemented incorrectly. But according to Robert Glass, the majority of bugs are sins of omission. In Frequently Forgotten Fundamental Facts about Software Engineering Glass says

Roughly 35 percent of software defects emerge from missing logic paths, and another 40 percent are from the execution of a unique combination of logic paths.

If these figures are correct, three out of four software bugs are sins of omission, errors due to things left undone. These are bugs due to contingencies the developers did not think to handle. Three quarters seems like a large proportion, but it is plausible. I know I’ve written plenty of bugs that amounted to not considering enough possibilities, particularly in graphical user interface software. It’s hard to think of everything a user might do and all the ways a user might arrive at a particular place. (When I first wrote user interface applications, my reaction to a bug report would be “Why would anyone do that?!” If everyone would just use my software the way I do, everything would be OK.)

It matters whether bugs are sins of omission or sins of commission. Different kinds of bugs are caught by different means. Developers have come to appreciate the value of unit testing lately, but unit tests primarily catch sins of commission. If you didn’t think to program something in the first place, you’re not likely to think to write a test for it. Complete test coverage could only find 25% of a project’s bugs if you assume 75% of the bugs come from code that no one thought to write.
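To make the coverage point concrete, here is a made-up example. Both function names are hypothetical and the code is deliberately wrong. Every line of the function is exercised by the author’s checks and every check passes, yet the bug lives in a case nobody wrote.

```c
/* Hypothetical example: number of days in a month (1-12).
   Every line here can be covered by tests, but no line handles
   leap years, so a coverage tool has nothing to flag. */
int days_in_month(int month, int year)
{
    (void)year;  /* the omission: the year is never consulted */
    if (month == 2)
        return 28;
    if (month == 4 || month == 6 || month == 9 || month == 11)
        return 30;
    return 31;
}

/* The checks the author thought to write. Together they execute
   every line of days_in_month. Returns 1 if all of them pass. */
int author_tests_pass(void)
{
    return days_in_month(1, 2010) == 31
        && days_in_month(2, 2010) == 28
        && days_in_month(4, 2010) == 30;
}
```

Here author_tests_pass() returns 1 with complete line coverage, while days_in_month(2, 2012) still returns 28 for a leap year. No test fails because no test exists for the path that was never written.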

The best way to spot sins of omission is a fresh pair of eyes. As Glass says

Rigorous reviews are more effective, and more cost effective, than any other error-removal strategy, including testing. But they cannot and should not replace testing.

One way to combine the benefits of unit testing and code reviews would be to have different people write the unit tests and the production code.

Related posts:

The most subtle of the seven deadly sins
Shallow bugs versus reported bugs
Negative space in operating systems


Camtasia as a software deployment tool

by John on January 10, 2010

Last week .NET Rocks mentioned a good idea in passing: start a screencast tool like Camtasia before you do a software install. Michael Learned told the story of a client that asked him to take screen shots of every step in the installation of Microsoft’s Team Foundation Server. Carl Franklin commented “What a great idea to throw Camtasia on there and record the whole process.”

It would be better if the installation process were scripted and not just recorded, but sometimes that’s not practical. Sometimes clicking a few buttons is absolutely necessary or at least far easier than writing a script. And even if you think your entire process is automated with a script, a screencast might be a good idea. It could record little steps you have to do in order to run your script, details that are easily forgotten.

Another way to use this idea would be to have one person do a practice install on a test server while recording the process. Then another person could document and script the process by studying the video. This would be helpful when the person who knows how to do the installation lacks either the verbal skills to explain the process or the scripting skills to automate it.

Related posts:

Rotating programmers
Automated software builds
Programming the last mile


Better tools, less productivity?

by John on January 6, 2010

Can better tools make you less productive? Here’s a quote from Frequently Forgotten Fundamental Facts about Software Engineering by Robert Glass:

Most software tool and technique improvements account for about a 5- to 30-percent increase in productivity and quality. … Learning a new tool or technique actually lowers programmer productivity and product quality initially. You achieve the eventual benefit only after overcoming this learning curve.

If you’re always learning new tools, you may be less productive than if you stuck with your old tools a little longer, even if the new tools really are better. And especially if you’re a part-time developer, you may never reach the point where a new tool pays for itself before you throw it away and pick up a new one. Kathleen Dollard wrote an editorial to this effect in 2004 entitled Save The Hobbyist Programmer.

Miners know they have a significant problem when the canary they keep with them stops singing. Hobbyist/part-time programmers are our industry’s version of the canary, and they have stopped singing. People who program four to eight hours a week are being cut out of the picture because they can’t increase their skills as fast as technology changes. That’s a danger signal for the rest of us.
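A back-of-the-envelope model shows why part-time programmers feel this most. Suppose, purely for illustration, that learning a new tool costs the equivalent of L hours of lost output and that the tool afterward makes you more productive by a fraction g. Both symbols and the numbers below are my assumptions, not figures from Glass or Dollard, though g = 0.20 falls within the 5 to 30 percent range quoted above. Each hour of use recovers g hours, so the tool pays for itself after

```latex
T_{\text{break-even}} = \frac{L}{g},
\qquad \text{e.g.} \qquad
\frac{40 \text{ hours}}{0.20} = 200 \text{ hours of use}.
```

At 40 hours a week that is 5 weeks. At four to eight hours a week it is 25 to 50 weeks, by which time the next tool may have arrived.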

So what do you do? Learn quickly or change slowly. The first option is to commit to learning a new tool quickly, invest heavily in up-front training, and use the tool as much as you can before the next one comes along.  This is the favored option for ambitious programmers who want to maximize their marketability by always using the latest tools.

The second option is to develop a leapfrog strategy, letting some new things pass you by. The less time you spend per week programming, the less often you should change tools. Change occasionally, yes, but wait for big improvements.

Related posts:

Doing good work with bad tools
Fear of tech commitment
Three-hour per week language


Fear of tech commitment

by John on December 29, 2009

According to the stereotypes, men fear committing to relationships. I find that hard to relate to. But I can relate to fear of technological commitment. I don’t want to take the time to learn something well that’s going to go away in a year. Like anyone else I want to pick the best tool for the job, but sometimes I’ve invested too much time in evaluation.

In a panel discussion on whether software development has become too complex, one of the major complaints was the bewildering number of options. The implicit assumption is that one must evaluate every option. This is an emotional reaction driven by fear of missing out.

Looking back on technologies that have come and gone, the best option was never orders of magnitude better than the second best option. We expect that the choices facing us now matter a great deal, despite knowing that similar decisions in the past didn’t matter that much.

Not only are some of our choices not so important, they don’t last so long either. We act as if we’re picking the technology we’re going to use for the rest of our lives. In reality, we may be picking the technology we’re going to use for the next year.

Very often it’s not worth the deliberation to pick the “best” technology. Pick a good one and don’t look back.

Related posts:

Shallow bugs versus reported bugs
Three-hour per week language
Doing good work with bad tools


The most productive programmers are orders of magnitude more productive than average programmers. But salaries usually fall within a fairly small range in any company. Even across the entire profession, salaries don’t vary that much. If some programmers are 10x more productive than others, why aren’t they paid 10x as much?

Joel Spolsky gave a couple answers to this question in his most recent podcast. First, programmer productivity varies tremendously across the profession, but it may not vary so much within a given company. Someone who is 10x more productive than his colleagues is likely to leave, either to work with other very talented programmers or to start his own business. Second, extreme productivity may not be obvious. This post elaborates on this second reason.

How can someone be 10x more productive than his peers without being noticed? In some professions such a difference would be obvious. A salesman who sells 10x as much as his peers will be noticed, and compensated accordingly. Sales are easy to measure, and some salesmen make orders of magnitude more money than others. If a bricklayer were 10x more productive than his peers this would be obvious too, but it doesn’t happen: the best bricklayers cannot lay 10x as much brick as average bricklayers. Software output cannot be measured as easily as dollars or bricks. The best programmers do not write 10x as many lines of code and they certainly do not work 10x longer hours.

Programmers are most effective when they avoid writing code. They may realize the problem they’re being asked to solve doesn’t need to be solved, that the client doesn’t actually want what they’re asking for. They may know where to find reusable or re-editable code that solves their problem. They may cheat. But just when they are being their most productive, nobody says “Wow! You were just 100x more productive than if you’d done this the hard way. You deserve a raise.” At best they say “Good idea!” and go on.  It may take a while to realize that someone routinely comes up with such time-saving insights. Or to put it negatively, it may take a long time to realize that others are programming with sound and fury but producing nothing.

The romantic image of an über-programmer is someone who fires up Emacs, types like a machine gun, and delivers a flawless final product from scratch. A more accurate image would be someone who stares quietly into space for a few minutes and then says “Hmm. I think I’ve seen something like this before.”

Related posts:

Writes large correct programs
Experienced programmers and lines of code


Three quotes on software development

by John on November 19, 2009

Here are three quotes on software development I ran across yesterday.

From Douglas Crockford, author of JavaScript: The Good Parts:

Just because something is a standard it doesn’t mean it’s the right choice for every application (e.g. XML).

From Yukihiro Matsumoto, creator of Ruby:

An open source project is like a shark. It must keep moving, or it will die.

From Roger Sessions, CTO of ObjectWatch:

A good IT architecture is made up largely of agreements to disagree. … Bad architectures and good both contain disagreements, but the bad architectures lack agreements on how to do so.

I once worked on a project that had a proprietary file format that became more sophisticated over time until it resembled a primitive relational database. After that I resolved to use standard technologies as much as possible. I think others have had the same experience and overreacted, using standard technologies even when they are overkill. Crockford’s comment is a reminder to moderate one’s zeal for standards. Moderation in all things.

I would add to Matsumoto’s comment that it’s not only open source projects that need to keep moving or die, though they may have an extra need for movement to maintain credibility.

The way I understand Sessions’ comment is that good architecture focuses on high-level agreement rather than low-level conformity. “Let’s rewrite all our code in Java” is not a good software architecture. Nor is one that I hear more often: “Let’s move everything to Oracle.” Such low-level standardization does not guarantee a coherently organized system. Whether subsystems use the same implementation technologies is not as important as whether there is a good strategy for making the pieces fit together.

Related posts:

Enterprise software
Million dollar software technique
JavaScript: A picture is worth a thousand words


Software Archeology

by John on November 10, 2009

The most recent episode of Software Engineering Radio is Software Archeology with Dave Thomas. In his interview, Dave Thomas gives many practical tips for how to read code, especially when inheriting a project. This interview should be required listening for computer science students. They spend the majority of their time writing code while they’re in school and yet they will spend the majority of their time reading code once they get out — reading code in order to debug or extend it, and if they’re smart, reading code to learn from it.

Dave Thomas attributes one of his most unusual suggestions to Ward Cunningham. Thomas says Cunningham recommends pasting code into Microsoft Word and viewing it in a 2 point font. At this font size you cannot possibly read the code, but you can tell a great deal about the structure of the code. For example, you may spot duplicate code by recognizing a recurring shape.
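If you don’t have a word processor handy, a rough text analogue of the same trick is to replace every visible character with a block and look at the resulting outline. This is my own approximation, not Cunningham’s method, and the function name and the choice of four columns per tab are assumptions.

```c
#include <ctype.h>   /* isspace */
#include <stddef.h>  /* size_t */

/* Write a "silhouette" of one source line into dst: whitespace is kept
   as spaces (a tab becomes 4 spaces) and every other character becomes
   '#'. Stops at the end of the string or at a newline. dst_size counts
   the terminating NUL. Returns the number of characters written. */
size_t silhouette(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    if (dst_size == 0) return 0;

    for (; *src != '\0' && *src != '\n'; ++src) {
        int width = (*src == '\t') ? 4 : 1;
        char mark = isspace((unsigned char)*src) ? ' ' : '#';
        /* Output that does not fit in dst is silently dropped. */
        for (int k = 0; k < width && out + 1 < dst_size; ++k)
            dst[out++] = mark;
    }
    dst[out] = '\0';
    return out;
}
```

Reading a file line by line with fgets, passing each line through silhouette, and printing the result gives a crude picture of indentation depth and line length. Repeated blocks of code show up as repeated shapes, much as they do in the tiny font.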

I tested Ward Cunningham’s idea on a couple of source files.

Example 1:

Example 2:

Example 1 has short functions. Near the bottom of the clip something is very repetitive. Skimming through the entire file you see several of these repetitive blocks. (This is test code. The blocks are computed values and expected values for comparison.)

Example 2 looks quite different from Example 1. The image comes from one long function. (This was taken from FORTRAN code that had been programmatically translated into C++. The frequent short dashes on the left are labels for goto statements.)

Related posts:

Reviewing catch blocks
Finding bad error messages


Shallow bugs versus reported bugs

by John on October 25, 2009

The open source community has a saying: With enough eyes, all bugs are shallow. When enough people look at a piece of code, someone is going to find and fix the bugs.

A related principle is that with enough users, all bugs will be reported. When enough people use the software, someone else is going to run into the problem. Someone will report it. Someone will talk about it in an online forum. Someone will blog about it and post a work-around until the bug is fixed. This principle deserves more attention; it’s not cited as often as the shallow bugs principle.

Ideally, you want to use software with lots of eyes and lots of users. Firefox is an open source product with lots of eyes and lots of users. But more often you have to pick eyes or users. You have to choose between open but obscure software and closed but popular software. Open source projects may have more people looking at the source code, and so they have the “many eyes make shallow bugs” maxim working for them. But the user base for many open source projects is tiny compared to their commercial counterparts. The number of users to find and report bugs is small, and the number who document fixes and work-arounds is even smaller.

I’m not ideologically attached to open source or commercial software. I use both. I just want my software to work. And when it doesn’t work, I want to find a solution quickly.

Related posts:

Software profitability in the middle
Software that gets used
