Posts tagged as:

Programming

Reviewing catch blocks

by John on October 22, 2009

Here’s an interesting exercise. If you’re writing code in a language like C# or C++ that has catch statements, write a script to report all catch blocks. You might be surprised at what you find. Some questions to ask:

  • Do catch blocks swallow exceptions and thus mask problems?
  • Is information lost by catching an exception and throwing a new one?
  • Are exceptions logged appropriately?
  • Are notification messages grammatically correct and helpful?

Here’s a PowerShell script that will report all catch statements plus the five lines following the catch statement.

Related post:

Finding embarrassing and unhelpful error messages

{ 1 comment }

Opening black boxes

by John on October 14, 2009

Rookie programmers don’t know how to reuse code. They write too much original code because they either don’t know about libraries or they don’t know how to use them. And if they do reuse someone else’s code, they copy and paste it, creating maintenance problems.

The next step in professional development is learning to reuse code. Encapsulation! Black boxes! Buy, don’t build! etc.

But this emphasis on reuse and black boxes can go too far. We can be intimidated by these black boxes and afraid to open them. We can come to believe the black boxes were created by superior beings. We can spend more time inferring the behavior of the black boxes than it would take to open them up or rewrite them. Then we pile leaky abstraction on top of leaky abstraction when we treat our own code as black boxes.

Joe Armstrong said in Coders at Work

Over the years I’ve kind of made a generic mistake … to not open the black box. … It’s worthwhile seeing if the direct route is quicker than the packaged route.

Several of the programmers who were interviewed in the book made similar remarks. They contribute part of their success to being unafraid of black boxes. They gained experience and confidence by taking things apart to see how they work.

Donald Knuth once said in an interview

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. … you’ll never convince me that reusable code isn’t mostly a menace.

Knuth returns to this theme in Coders at Work.

There’s this overemphasis on reusable software where you never get to open up the box … It’s nice to have these black boxes but, almost always, if you can look inside the box you can improve it …

Well, Knuth can almost always improve any code he finds. Less talented programmers need to be more humble. But too often programmers who are talented enough to make improvements are reluctant to do so. As Yeats said in his poem The Second Coming,

The best lack all conviction, while the worst are full of passionate intensity.

In any discussion of opening black boxes, someone will bring up the analogy of cars: Not everyone needs to know how a car works inside. I would agree that drivers no longer need to understand how a car works, but automotive engineers do need to know. The problem isn’t users who don’t understand how software works, it’s software developers who don’t understand how software works.

I don’t deny that software libraries are extremely valuable. Knuth goes too far when he says reusable code is usually a menace. But I see a disturbing lack of curiosity among programmers. They are far too willing to use code they don’t understand.

Related post:

Reusable code versus re-editable code

{ 4 comments }

Maybe NASA could use some buggy software

by John on October 8, 2009

In Coders at Work, Peter Norvig quotes NASA administrator Don Goldin saying

We’ve got to do the better, faster, cheaper. These space missions cost too much. It’d be better to run more missions and some of them would fail but overall we’d still get more done for the same amount of money.

NASA has extremely rigorous processes for writing software. They supposedly develop bug-free code; I doubt that’s true, thought I’m sure they do have exceptionally low bug rates. But this quality comes at a high price. Rumor has it that space shuttle software costs $1,500 per line to develop. When asked about the price tag, Norvig said “I don’t know if it’s optimal. I think they might be better off with buggy software.” At some point it’s certainly not optimal. If it doubles the price of a project to increase your probability of a successful mission from 98% to 99%, it’s not worth it; you’re better off running two missions with a 98% chance of success each.

Few people understand that software quality is all about probabilities of errors. Most people think the question is whether you’d rather have bug-free software or buggy software. I’d rather have bug-free software, thank you. But bug-free is not an option. Nothing humans do is perfect. All we can do is lower the probabilities of bugs. But as the probability of bugs goes to zero, the development costs go to infinity. (Actually it’s not all about probabilities of errors. It’s also about the consequences of errors. Sending back a photo with a few incorrect pixels is not the same as crashing a probe.)

Norvig’s comment makes sense regarding unmanned missions. But what about manned missions? Since one of the possible consequences of error is loss of life, the stakes are obviously higher. But does demanding flawless software increase the probability of a safe mission? One of the consequences of demanding extremely high quality software is that some tasks are too expensive to automate and so humans have to be trained to do those tasks. But astronauts make mistakes just as programmers do. If software has a smaller probability of error than an astronaut would have for a given task, it would be safer to rely on the software.

Related post:

Software in space

{ 11 comments }

Every time your software displays an error message, you risk losing credibility with your users. If the message is grammatically incorrect, your credibility definitely goes down a notch. And if the message is unhelpful, your credibility goes down at least one more notch. The same can be said for any message, but error messages are particularly important for three reasons.

  1. Users are in a bad mood when they see error messages; this is not the time to make things worse.
  2. Programmers are sloppy with error messages because they’re almost certain the messages will never be displayed.
  3. Error conditions are unusual by their very nature, and so it’s practically impossible to discover them all by black-box testing.

The best way to find error messages is to search the source code for text potentially displayed to users. I’ve advocated this practice for years and usually I encounter indifference or resistance. And yet nearly every time I extract the user-visible text from a software project I find dozens of spelling errors, grammar errors, and incomprehensible messages.

Last year I wrote an article for CodeProject on this topic and provided a script to strip text strings from source code. See PowerShell Script for Reviewing Text Shown to Users. The script looks for all quoted text and text inside XML entities. Then it tries to filter out text strings that are not displayed to users. For example, a string with no spaces is more likely to be a file name or some other code fragment than a user message. The script produces a report that a copy editor can then review. In addition to checking spelling and grammar, an editor can judge whether a message would be comprehensible and useful.

I admit that the parsing in the script is crude. It could miss some strings, and it could filter out some strings that it should keep. But the script is very simple, less than 100 lines. And it works on multiple source code types: C++, C#, XML, VB, etc. Writing a sophisticated parser for each of those languages would be a tremendous amount of work, but a quick-and-dirty script may be 98% as effective. Since most projects review 0% of their source code text, reviewing 98% is an improvement.

In addition to improving the text that a user sees, a text review gives some insight into a program’s structure. For example, if messages are repeated in multiple files, most likely the code has a lot of “clipboard inheritance,” code copied and pasted rather than isolated into reusable functions. A text review could also determine whether a program is concatenating strings to build SQL statements rather than calling stored procedures, possibly indicating a security vulnerability.

{ 5 comments }

Why programmers write unneeded code

by John on October 5, 2009

Programmers write a lot of code that is never used. There are numerous reasons for this. In Peter Seibel’s book Coders at Work, Peter Norvig gives his take on why this happens.

Seibel: Why is it so tempting to solve a problem we don’t really have?

Norvig: You want to be clever and you want closure; you want to complete something and move on to something else. I think people are built to only handle a certain amount of stuff and you want to say “This is completely done; I can put it out of my mind and then I can move on.”

Sometimes software developers believe there’s a high probability that some unrequested feature will be needed in the future. In general, they over-estimate such probabilities. The acronym YAGNI — you aren’t gonna need it — is meant to remind developers of this tendency.

It’s a great feeling to say “I’ve already done that” when someone asks for a new feature. Then you’re the hero, the sage who anticipated what needed to be developed. When you write code that’s not needed, perhaps nobody notices, and you can comfort yourself that the time is still coming when the world will want your feature. The times when you guessed correctly are more vivid in your mind than the times when you’ve been wrong and so you over-estimate just how often you’ve been right.

But sometimes it’s worthwhile to solve a bigger problem than you have to. It may make sense to create a more complete solution than is currently necessary while the problem is fresh on your mind; it will be harder to pick the problem back up in the future, so if you’re ever going to write it, now’s the time.

Norvig rightfully points out the down-side of seeking closure. Maybe the last 2% is intellectually satisfying but horribly difficult and not worth the effort. I’ve erred on both sides. Years ago I often erred on the side of developing functionality that was never used. Then reading Kent Beck convinced me that YAGNI is usually true. Since then I’ve erred on the side of wishing I’d done more while a project was fresh on my mind.

Related posts:

Where does the programming effort go?
The buck stops with the programmer

{ 12 comments }

Is programming getting easier or harder?

by John on October 3, 2009

From Peter Seibel’s interview with Guy Steele in Coders at Work:

Seibel: It is easier to write software now because of advances we’ve made?

Steele: Well, it’s much easier now to write the kinds of programs we were trying to write 30 years ago. But I think our ambitions have grown tremendously. So I think programming is probably a more difficult activity than it was 30 years ago. … it’s not possible to understand everything that’s going on anymore.

Related posts:

Solo software development
Programming language fatigue
The excitement of not knowing what you’re doing

{ 7 comments }

JavaScript: A picture is worth a thousand words

by John on September 28, 2009

Here’s a photo posted by David Walsh on Twitter on yesterday.

Photos of JavaScript books, the good parts being much smaller

Related links:

Programming language subsets
I wish someone would write “R, The Good Parts”
Programming language fatigue
JavaScript: The Definitive Guide
JavaScript: The Good Parts

{ 1 comment }

Software profitability in the middle

by John on September 22, 2009

Kent Beck made an interesting observation about the limits of open source software on FLOSS Weekly around one hour into the show. These aren’t his exact words, just my summary.

Big companies like IBM will contribute to big open source projects like Apache because doing so is in their economic interest. And hobbyists will write small applications and give them away. But who is going to write medium-sized software, projects big enough to be useful but not important enough to any one company to fund? That’s where commercial software thrives.

Kent Beck attributes this argument to Paul Davis.

Beck also talked about how he tried but couldn’t pay his bills developing open source software. The hosts were a little defensive and  pointed out that many people have managed to earn money indirectly from open source software. Beck agreed but said that the indirect approach didn’t work for him. He said that he donates about 10% of his time to open source development (i.e. xUnit) but he makes his money by charging for his products and services.

Related post:

How to avoid being outsourced or open sourced

{ 2 comments }

Conservation of complexity

by John on September 16, 2009

Larry Wall said something one time to the effect that Scheme is beautiful and every Scheme program is ugly; Perl is ugly, but it lets you write beautiful programs. Of course it also lets you write ugly programs if you choose.

Scheme is an elegant, minimalist language. The syntax of the language is extremely simple; you could say it has no syntax. But this simplicity comes at a price. But because the language does so little for you, you have to write the code that might have been included in other languages. And because the language has no syntax, code written in Scheme is hard to read. As Larry Wall said

The reason many people hate programming in Lisp [the parent language of Scheme] is because every thing looks the same. I’ve said it before, and I’ll say it again: Lisp has all the visual appeal of oatmeal with fingernail clippings mixed in.

The complexity left out of Scheme is transferred to the code you write in Scheme. If you’re writing small programs, that’s fine. But if you write large programs in Scheme, you’ll either write a lot of code yourself or you’ll leverage a lot of code someone else has written in libraries.

Perl is the opposite of a minimalist language. There are shortcuts for everything. And if you master the language, you can write programs that are beautiful in that they are very concise. Perl programs can even be easy to read. Yes, Perl programs look like line noise to the uninitiated, but once you’ve learned Perl, the syntax can be helpful if used well. (I have my complaints about Perl, but I got over the syntax.)

Perl is a complicated language, but it works very well for some problems. Features that other languages would put in libraries (e.g. regular expressions, text munging) are baked directly into the Perl language. And if you depend on those features, it’s very handy to have direct support in the language.

The point of my discussion of Scheme and Perl is that the complexity has to go somewhere, either in the language, in libraries, or in application code. That doesn’t mean all languages are equal for all tasks. Some languages put the complexity where you don’t have to think about it. For example, Java simpler than C++, as long as you don’t have to understand the inner workings of the JVM. But if you do need to look inside the JVM, suddenly Java is more complex than C++. The total complexity hasn’t changed, but your subjective experience of the complexity increased.

Earlier this week I wrote a post about C and C++. My point there was similar. C is simpler than C++, but software written in C is often more complicated that software written in C++ when you compare code written by developers of similar talent. If you need the functionality of C++, and most large programs will, then you will have to write it yourself if you’re using C. And if you’re a superstar developer, that’s fine. If you’re less than a superstar, the people who inherit your code may wish that you had used a language that had this functionality built-in.

I understand the attraction to small programming languages. The ideal programming language has everything you need and nothing more. But that means the ideal language is a moving target, changing as your work changes. As your work becomes more complicated, you might be better off moving to a more complex language, pushing more of the complexity out of your application code and into the language and its environment. Or you may be able down-size your language because you no longer need the functionality of a more complex language.

Related posts:

Three-hour-a-week language
Plain Python
MIT replaces Scheme with Python
Periodic table of Perl operators
Programming language subsets

{ 0 comments }

I disagree with Linus Torvalds about C++

by John on September 15, 2009

I heard about this note from Linus Torvalds from David Wolever yesterday. Here’s Torvald’s opinion of C++.

C++ is a horrible language. It’s made more horrible by the fact that a lot of substandard programmers use it, to the point where it’s much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C.

Well, I’m nowhere near as talented a programmer as Linus Torvalds, but I totally disagree with him. If it’s easy to generate crap in a relatively high-level and type-safe language like C++, then it must be child’s play to generate crap in C. It’s not fair to compare world-class C programmers like Torvalds and his peers to average C++ programmers. Either compare the best with the best or compare the average with the average. Comparing the best with the best isn’t very interesting. I imagine gurus like Bjarne Stroustrup and Herb Sutter can write C++ as skillfully as Linus Torvalds writes C, though that is an almost pointless comparison. Comparing average programmers in each language is more important, and I don’t believe C would come out on top in such a comparison.

Torvalds talks about “STL and Boost and other total and utter crap.” A great deal of thought has gone into the STL and to Boost by some very smart people over the course of several years. Their work has been reviewed by countless peers. A typical C or C++ programmer simply will not write anything more efficient or more robust than the methods in these libraries if they decide to roll their own.

Torvalds goes on to say

In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C.

I’ve had the opposite experience. I’d say that anyone wanting to write a large C program ends up reinventing large parts of C++ and doing it poorly. The features added to C to form C++ were added for good reasons. For example, once you’ve allocated and de-allocated C structs a few times, you realize it would be good to have functions to do this allocation and de-allocation. You basically end up re-inventing C++ constructors and destructors. But you end up with something totally sui generis. There’s no compiler support for the conventions you’ve created. No one can read about your home-grown constructors and destructors in a book. And you probably have not thought about as many contingencies as the members of the C++ standards committee have thought of.

I disagree that writing projects in C keeps out inferior C++ programmers who are too lazy to write C. One could as easily argue the opposite, that C is for programmers too lazy to learn C++. Neither argument is fair, but I think there is at least as much validity to the latter as there is to the former. I think there may be a sort of bimodal distribution of C programmer talent: some of the best and some of the worst programmers use C but for different reasons.

I do not claim that C++ is perfect, but I’ve never had any desire to go back to C after I moved to C++ many years ago. I’ll grant that I’m not writing my own operating system, but neither are the vast majority of programmers. For my work, C++ is as low-level as I care to go.

Related posts:

Conservation of complexity
Two perspectives on the design of C++
C++ templates may reduce memory footprint
Porting Visual C++ code to Linux/gcc

{ 67 comments }

Termites and programmers

by John on September 1, 2009

There are more termites in the world than there are elephants. Not only that, the total mass of the world’s elephants is roughly 1/1000 the total mass of the world’s termites. The big, visible animals, the ones that first come to mind, are a small fraction of the total.

Something similar is true of software projects: the big, visible projects, the ones people write about, are a small fraction of the total. Certainly there are more small projects in the world than large projects. And I imagine more programmers in total work on small projects than on large projects. I don’t have any hard numbers on this, and I doubt anyone else does. Most hard numbers come from large, visible projects! Who is going to do a census of all the little one-man projects that go unnoticed?

This post is a continuation of a comment I made as part of the discussion following my blog post on medieval software project management. My contention there was that most projects involve one developer, have no written requirements, and no external testing. That may not be correct, but I imagine it’s closer to the truth than assuming everyone works on projects with a dozen developers, formal requirements documents, and a staff of testers.

The first books on the “right” way to develop software codified the experience gained from working on enormous federally funded software projects. For example, the recommended practice was to spend huge proportion of the total effort in up-front planning. While that made sense when coordinating the efforts of thousands of contractors in the days of punch cards, it doesn’t make as much sense now. The agile software development movement began when people realized that the world had changed and the “best practices” of a previous generation were not optimal for smaller projects and vastly superior hardware.

Agile software development has replaced the best practices of the 1960’s in many organizations. However, there is still a strong tendency to think that small projects should use the same tools and techniques as large, enterprise projects. Most books are written about medium to large projects and many developers worry unnecessarily about scaling up their projects. (”What if I get a million visitors an hour to my web site?” You should be so lucky. Worry about that after it becomes a remote possibility.) Few pundits give advice that scales down, that is, advice appropriate for small projects. I wrote about one exception in a previous post in which Rob Page suggests different methods for projects with a budget of less than $1M and projects with a larger budget.

Related posts:

Million dollar cutoff for software technique
Enterprising software
Medieval software project management

{ 0 comments }

Software that gets used

by John on July 31, 2009

I’ve been looking back at software projects that I either developed or managed. I thought about which projects produced software that is actively used and which didn’t. Here’s what the popular projects had in common. The software

  1. was developed to address existing needs, not speculation of future needs;
  2. solved a general problem; and
  3. was simple, often controversially simple.

The software used most often is a numerical library. It addresses general problems, but at the same time it is specialized to our unique needs. It has some functions you won’t find in other libraries, and it lacks some functions you’d expect to see but that we haven’t needed.

A couple of the more successful projects were re-writes of existing software that deleted maybe 90% of the original functionality. The remaining 10% was made easier to use and was tested thoroughly. No one missed the deleted functionality in one project. In the other, users requested that we add back 1% of the functionality we had deleted.

Related posts:

Simple legacy
The simplest thing that might work

{ 1 comment }

Baklava code

by John on July 27, 2009

“Spaghetti code” is a well-known phrase for software with tangled logic, especially legacy code with goto statements.

The term “lasagna code” is not nearly as common. I first heard it used to describe code with too many architectural layers. Then I found the following older reference to lasagna code with a different slant on the term.

Lasagna code is used to describe software that has a simple, understandable, and layered structure. Lasagna code, although structured, is unfortunately monolithic and not easy to modify. An attempt to change one layer conceptually simple, is often very difficult in actual practice.

Since “lasagna code” has a different usage, I propose the term “baklava code” for code with too many layers.

Baklava. Photo credit Wikipedia

Baklava is a delicious pastry make with many paper-thin layers of phyllo dough. While thin layers are fine for a pastry, thin software layers don’t add much value, especially when you have many such layers piled on each other. Each layer has to be pushed onto your mental stack as you dive into the code. Furthermore, the layers of phyllo dough are permeable, allowing the honey to soak through. But software abstractions are best when they don’t leak. When you pile layer on top of layer in software, the layers are bound to leak.

Related posts:

Leaky abstractions and hockey stick learning curves
Floating point numbers are a leaky abstraction

{ 4 comments }

Bad programmers create jobs

by John on July 22, 2009

Jeff Atwood quotes an interview with David Parnas in his most recent blog post.

Q: What is the most often-overlooked risk in software engineering?

A: Incompetent programmers. There are estimates that the number of programmers needed in the U.S. exceeds 200,000. This is entirely misleading. It is not a quantity problem; we have a quality problem. One bad programmer can easily create two new jobs a year. Hiring more bad programmers will just increase our perceived need for them. If we had more good programmers, and could easily identify them, we would need fewer, not more.

{ 2 comments }

IEEE floating point arithmetic in Python

by John on July 21, 2009

Sometimes a number is not a number. Numeric data types represent real numbers in a computer fairly well most of the time, but sometimes the abstraction leaks. The sum of two numeric types is always a numeric type, but the result might be a special bit pattern that says overflow occurred. Similarly, the ratio of two numeric types is a numeric type, but that type might be a special type that says the result is not a number.

The IEEE 754 standard dictates how floating point numbers work. I’ve talked about IEEE exceptions in C++ before. This post is the Python counterpart. Python’s floating point types are implemented in terms of C’s double type  and so the C++ notes describe what’s going on at a low level. However, Python creates a higher level abstraction for floating point numbers. (Python also has arbitrary precision integers, which we will discuss at the end of this post.)

There are two kinds of exceptional floating point values: infinities and NaNs. Infinite values are represented by inf and can be positive or negative. A NaN, not a number, is represented by nan. Let x = 10200. Then x2 will overflow because 10400 is too big to fit inside a C double. (To understand just why, see Anatomy of a floating point number.) In the following code, y will contain a positive infinity.

x = 1e200; y = x*x

If you’re running Python 3.0 and you print y, you’ll see inf. If you’re running an earlier version of Python, the result may depend on your operating system. On Windows, you’ll see 1.#INF but on Linux you’ll see inf. Now keep the previous value of y and run the following code.

z = y; z /= y

Since z = y/y, you might think z should be 1. But since y was infinite, it doesn’t work that way. There’s no meaningful way to assign a numeric value to the ratio of infinite values and so z contains a NaN. (You’d have to know “how they got there” so you could take limits.) So if you print z you’d see nan or 1.#IND depending on your version of Python and your operating system.

The way you test for inf and nan values depends on your version of Python. In Python 3.0, you can use the functions math.isinf and math.isnan respectively. Earlier versions of Python do not have these functions. However, the SciPy library has corresponding functions scipy.isinf and scipy.isnan.

What if you want to deliberately create an inf or a nan? In Python 3.0, you can use float('inf') or float('nan'). In earlier versions of Python you can use scipy.inf and scipy.nan if you have SciPy installed.

IronPython does not yet support Python 3.0, nor does it support SciPy directly. However, you can use SciPy with IronPython by using Ironclad from Resolver Systems. If you don’t need a general numerical library but just want functions like isinf and isnan you can create your own.


def isnan(x): return type(x) is float and x != x
def isinf(x): inf = 1e5000; return x == inf or x == -inf

The isnan function above looks odd. Why would x != x ever be true? According to the IEEE standard, NaNs don’t equal anything, even each other. (See comments on the function IsFinite here for more explanation.) The isinf function is really a dirty hack but it works.

To wrap things up, we should talk a little about integers in Python. Although Python floating point numbers are essentially C floating point numbers, Python integers are not C integers. Python integers have arbitrary precision, and so we can sometimes avoid problems with overflow by working with integers. For example, if we had defined x as 10**200 in the example above, x would be an integer and so would y = x*x and y would not overflow; a Python integer can hold 10400 with no problem. We’re OK as long as we keep producing integer results, but we could run into trouble if we do anything that produces a non-integer result. For example,

x = 10**200; y = (x + 0.5)*x

would cause y to be inf, and

x = 10**200; y = x*x + 0.5

would throw an OverflowError exception.

Related posts:

Floating point numbers are a leaky abstraction
Anatomy of a floating point number
Overflow and loss of precision

{ 5 comments }