The most recent episode of Software Engineering Radio is Software Archeology with Dave Thomas. In his interview, Dave Thomas gives many practical tips for how to read code, especially when inheriting a project. This interview should be required listening for computer science students. They spend the majority of their time writing code while they’re in school and yet they will spend the majority of their time reading code once they get out — reading code in order to debug or extend it, and if they’re smart, reading code to learn from it.
Dave Thomas attributes one of his most unusual suggestions to Ward Cunningham. Thomas says Cunningham recommends pasting code into Microsoft Word and viewing it in a 2 point font. At this font size you cannot possibly read the code, but you can tell a great deal about the structure of the code. For example, you may spot duplicate code by recognizing a recurring shape.
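The same idea can be roughly approximated in code. The sketch below is not Cunningham's method, only a programmatic analogue of it: it throws away the characters on each line, keeps just the indentation and length, and reports runs of lines whose shape recurs. The function names (`line_shape`, `repeated_shapes`, `silhouette`) and the default window size are my own assumptions for illustration.

```python
from collections import defaultdict


def line_shape(line, tab_width=4):
    """Reduce a line to (indent, length), ignoring which characters it holds."""
    expanded = line.rstrip().expandtabs(tab_width)
    stripped = expanded.lstrip()
    return (len(expanded) - len(stripped), len(stripped))


def repeated_shapes(source, window=4):
    """Map each window-line shape seen more than once to its 1-based start lines."""
    shapes = [line_shape(line) for line in source.splitlines()]
    seen = defaultdict(list)
    for start in range(len(shapes) - window + 1):
        block = tuple(shapes[start:start + window])
        # Skip windows made up entirely of blank lines.
        if all(length == 0 for _, length in block):
            continue
        seen[block].append(start + 1)
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


def silhouette(source, scale=4):
    """Draw each line as a bar of '#', shrunk by `scale`, so only the outline shows."""
    rows = []
    for line in source.splitlines():
        indent, length = line_shape(line)
        # -(-length // scale) is ceiling division, so short lines still show up.
        rows.append(" " * (indent // scale) + "#" * -(-length // scale))
    return "\n".join(rows)
```

Calling `repeated_shapes(open(path).read())` on a source file gives candidate line numbers to inspect by eye. A matching shape is only a hint of duplication, not proof, since unrelated lines can happen to share an indent and length.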
I tested Ward Cunningham’s idea on a couple of source files.
Example 1 has short functions. Near the bottom of the clip something is very repetitive. Skimming through the entire file you see several of these repetitive blocks. (This is test code. The blocks are computed values and expected values for comparison.)
Example 2 looks quite different from Example 1. The image comes from one long function. (This was taken from FORTRAN code that had been programmatically translated into C++. The frequent short dashes on the left are labels for
5 thoughts on “Software Archeology”
Gotos, C++, Fortran, really small fonts… you are going to give me nightmares.
Another option is to use a text editor that does this directly: for example, Sublime Text (http://www.sublimetext.com/) has a minimap view designed for just this. There are other alternatives too, such as an Emacs plugin (http://www.emacswiki.org/emacs/MiniMap) and the contour view in DrScheme.
You show a program that does not have a high enough resolution to read. The fact that you even put a program with several goto comments shows a lack of programming knowledge. Any good programmer knows that using goto statements can lead to unexpected results. Program control should be left to loops, continue, break, and switch.
Yes, the source is unreadable. Ward Cunningham’s suggestion is to use a font that is too small to read so that you’re forced to look at the shape of the code layout rather than the syntax.
I agree that gotos are generally bad practice. I chose the machine-generated C not because it’s good code but because it had a shape that contrasts with the hand-written C++ code.
10 What do you
20 goto 50
40 goto 10
60 gotos are
70 goto 30