Writing software for someone else

One of the differences between amateur and professional software development is whether you’re writing software for yourself or for someone else. It’s like the difference between keeping a journal and being a journalist.

People who have only written software for their own use have no idea how much work goes into writing software for others. You have to imagine a thousand things a user might do that you would never do. You have to decide which of these things you will accommodate, and which you will disallow. And when you decide to disallow an action, you have to decide how to do so while causing minimal irritation to the user.

GUI applications are particularly hard to write, not because it’s difficult to draw buttons and boxes on a screen, but because it’s difficult to think of all the ways a user could arrive at a particular state.

In between writing software for yourself and writing software for others is writing software for people very much like yourself. Open source software started out this way: alpha programmers writing software for alpha programmers. Since then the OSS community has gotten much better at writing software for general users.

Related post: Software exoskeletons

Why read and write tech books?

Now that we have Google, countless blogs, and Stack Overflow, why should anyone buy technical books? And why should anybody write them? Charles Petzold’s answer is that books provide a narrative in a way that the web cannot.

Books about programming have certainly become less essential over the past 15 years or so. …

The Web has demonstrated that it’s greatest strength is the accumulation of information from many sources, and providing links between related concepts. However, where the Web falls down is in presenting long narratives, and I think this is a problem. For thousands of years, human beings have learned not by accumulating facts, but by following a narrative — a story that forges a path through the forest of information rather than merely describing all the trees.

… Books — at least those that are written well — provide narratives that the Web does not. …

As I’m writing a book, my primary intent is not to regurgitate the documentation but to impose a narrative on the material. This narrative has to begin with the basics and gradually introduce more and more material with a pace that neither overwhelms nor bores the reader. A narrative is necessarily a single path, and I spend much time and effort coming up with a good one.

From An Experiment in Book Publishing.

I completely agree. I love the kind of books Petzold is talking about. And by the way, a book with a dozen authors isn’t a real book in my opinion. I’m disappointed whenever I’m browsing a library and think I’ve found a book on something, only to realize I’ve found a stack of articles bound together with no narrative.

Unix doesn't follow the Unix philosophy

The Unix philosophy is a noble idea, but even Unix doesn’t follow it too closely. The Unix philosophy, as summarized by Doug McIlroy, says

  1. Write programs that do one thing and do it well.
  2. Write programs to work together.
  3. Write programs to handle text streams, because that is a universal interface.

Here is an example from James Hague where the first point has been strained.

The UNIX ls utility seemed like a good idea at the time. It’s the poster child for the UNIX way: a small tool that does exactly one thing well. Here that thing is to display a list of filenames. But deciding exactly what filenames to display and in what format led to the addition of over 35 command-line switches. Now the man page for the BSD version of ls bears the shame of this footnote: “To maintain backward compatibility, the relationships between the many options are quite complex.”

James Hague gives this as only one small example of how programmers have allowed things to become unnecessarily complicated. He concludes

We did this. We who claim to value simplicity are the guilty party. See, all those little design decisions actually matter, and there were places where we could have stopped and said “no, don’t do this.” And even if we were lazy and didn’t do the right thing when changes were easy, before there were thousands of users, we still could have gone back and fixed things later. But we didn’t.

He’s right, to some extent. But as I argued in Where the Unix philosophy breaks down, some of the growth in complexity is understandable. It’s a lot easier to maintain an orthogonal design when your software isn’t being used. Software that gets used becomes less orthogonal and develops diagonal shortcuts.

Why does ls have dozens of tangled options? Because users, even Unix users, are not overly fond of the first two points of the Unix philosophy. They don’t want to chain little programs together. They’d rather do more with the tool at hand than put it down to pick up new tools. They do appreciate the ideal of single-purpose tools that work well together, but only in moderation.

I agree that “all those little design decisions actually matter, and there were places where we could have stopped and said ‘no, don’t do this.’” Some complexity has come from a lack of foresight or a lack of courage. But not all of it. Some of it has come from satisfying what complex humans want from their software.

Related post:

100x better approach to software?

The Book of Inkscape

When I first started using Inkscape, I read Inkscape: Guide to a Vector Drawing Program by Tavmjong Bah, 3rd edition. It’s now in its 4th edition, which I have not seen.

I received a copy of The Book of Inkscape by Dmitry Kirsanov recently, and it looks like the book I would have preferred to start with. Both books are fine introductions, but Kirsanov’s book is more my style.

Bah’s book is more inductive. It teaches you the elements of Inkscape by first taking you through a series of projects. Kirsanov’s book is organized more like a textbook or a reference. Some people would prefer Bah’s book, especially if it were their intention to work through all the exercises. I prefer Kirsanov’s book, organized more by topic than by project. It’s easier to dip in and out of as needed.

I’d like to learn Inkscape well. I could imagine going through a book slowly, carefully working all the examples, exploring side roads, etc. But that’s not realistic for me any time soon. For now, I expect I’ll learn more about Inkscape just-in-time as I need to make illustrations. And Kirsanov’s book is better suited for that.

Related posts:

Including LaTeX in an Inkscape drawing
Including an Inkscape drawing in LaTeX
Plotting functions in Inkscape

Should you walk or run in the rain?

One of the problems in X and the City, a book I mentioned the other day, is deciding whether you’ll get wetter by walking or running in the rain.

The author takes several factors into account and models the total amount of water a person absorbs as

T = \frac{Iwd}{v}\left(ct \cos\theta + l(c \sin\theta + v)\right).

This assumes a person is essentially a rectangular box of height l, width w, and thickness t. The rain is falling at an angle θ to the vertical (e.g. θ = 0 for rain coming straight down). The distance you need to walk or run is d and your speed is v. The rain is falling with speed c. The parameter I is the rain intensity, ranging from 0 for no rain to 1 for continuous flow. The book goes into greater detail, deriving the formula and estimating numerical values for the parameters.
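To make the roles of the parameters concrete, here is a minimal sketch that evaluates the formula above. The function name and the numerical values are illustrative assumptions, not the book's estimates, and the sign convention (positive θ meaning rain driving into your front) is inferred from the formula rather than quoted from the book.

```python
from math import cos, sin, radians

def water_absorbed(I, w, d, v, c, t, l, theta):
    """Evaluate T = (I w d / v) (c t cos(theta) + l (c sin(theta) + v)).

    theta is in radians, measured from the vertical. Assumption: positive
    theta means the rain is driving into your front, and c*sin(theta) + v >= 0.
    """
    top = c * t * cos(theta)            # rain landing on the top of the box
    front = l * (c * sin(theta) + v)    # rain swept up by the front of the box
    return (I * w * d / v) * (top + front)

# Illustrative values only (metres and metres per second), not the book's estimates.
body = dict(I=0.01, w=0.5, t=0.2, l=1.75, d=100.0, c=5.0)

walk = water_absorbed(v=1.5, theta=radians(30), **body)
run = water_absorbed(v=6.0, theta=radians(30), **body)
print(walk > run)  # running absorbs less water when the rain is in your face
```

With these made-up numbers the runner absorbs roughly half as much water as the walker. When the rain comes from behind (negative θ in this convention), choosing v = -c sin θ makes the second term vanish, which is the "keep pace with the rain" case.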

Conclusion?

If the rain is driving into you from the front, run as fast as you safely can. On the other hand, if the rain is coming from behind you, and you can keep pace with its horizontal speed by walking, do so!

Using SciPy with IronPython

Three years ago I wrote a post about my disappointment using SciPy with IronPython. A lot has changed since then, so I thought I’d write a short follow-up post.

To install NumPy and SciPy for use with IronPython, follow the instructions here. After installation, NumPy works as expected.

There is one small gotcha with SciPy. To use SciPy with IronPython, start ipy with the command line argument -X:Frames. Then you can use SciPy as you would from CPython. For example:

c:> ipy -X:Frames
>>> import scipy as sp
>>> sp.pi
3.141592653589793

Without the -X:Frames option you’ll get an error when you try to import scipy.

AttributeError: 'module' object has no attribute '_getframe'

According to this page,

The issue is that SciPy makes use of the CPython API for inspecting the current stack frame which IronPython doesn’t enable by default because of a small runtime performance hit. You can turn on this functionality by passing the command line argument “-X:Frames” to on the command line.
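If you want to check for this before importing SciPy, something like the following sketch might do. It assumes the missing attribute in the error message above is sys._getframe, which is the CPython hook for inspecting the current stack frame; the helper name is made up for illustration.

```python
import sys

def frames_available():
    """Return True if this interpreter exposes sys._getframe."""
    return hasattr(sys, "_getframe")

if not frames_available():
    # Assumption: under IronPython this means ipy was started without -X:Frames.
    print("Stack frames are disabled; try restarting with: ipy -X:Frames")
```

Under CPython the check always passes, so the message should only appear in an interpreter that has frame support turned off.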

The 1970s

Here’s a perspective on the 1970s I found interesting: The decade was so embarrassing that climbing out of the ’70s was a proud achievement.

The 1970s were America’s low tide. Not since the Depression had the country been so wracked with woe. Never — not even during the Depression — had American pride and self-confidence plunged deeper. But the decade was also, paradoxically, in some ways America’s finest hour. America was afflicted in the 1970s by a systemic crisis analogous to the one that struck Imperial Rome in the middle of the third century A.D. … But unlike the Romans, Americans staggered only briefly before the crisis. They took the blow. For a short time they behaved foolishly, and on one or two occasions, even disgracefully. Then they recouped. They rethought. They reinvented.

Source: How We Got Here: The 70’s: The Decade That Brought You Modern Life—For Better or Worse

Differential Equations and the City

This afternoon I got a review copy of X and the City: Modeling Aspects of Urban Life by John A. Adam. It’s a book about mathematical modeling, taking all its examples from urban life: public transportation, growth, pollution, etc. I’ve only skimmed through the book so far, but it looks like most of the applications involve differential equations. Some depend on algebra or probability.

The book looks interesting. I hope to say more about the book once I’ve had a chance to read it. The examples are all short, so it may be an easy book to read a little at a time.

I also got a review copy of The Book of Inkscape today, and I’m expecting several other books soon. It may take a while to get through these since this is a busy time for me. When it rains, it pours.

Castles and quantum mechanics

How are castles and quantum mechanics related? One connection is rook polynomials.

The rook is the chess piece that looks like a castle, and used to be called a castle. It can move vertically or horizontally, any number of spaces.

A rook polynomial is a polynomial whose coefficients give the number of ways rooks can be arranged on a chess board without attacking each other. The coefficient of x^k in the polynomial R_{m,n}(x) is the number of ways you can arrange k rooks on an m by n chessboard such that no two rooks are in the same row or column.
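The definition can be made concrete by counting placements directly. The sketch below enumerates every way to put k rooks on the board and keeps those with no shared row or column. It also compares the count with the closed form C(m,k) C(n,k) k!, which is a standard counting result rather than something stated in this post. The function names are illustrative.

```python
from itertools import combinations
from math import comb, factorial

def rook_coefficient_brute(m, n, k):
    """Count placements of k mutually non-attacking rooks on an m-by-n board."""
    squares = [(r, c) for r in range(m) for c in range(n)]
    count = 0
    for placement in combinations(squares, k):
        rows = {r for r, _ in placement}
        cols = {c for _, c in placement}
        # Non-attacking means all rows distinct and all columns distinct.
        if len(rows) == k and len(cols) == k:
            count += 1
    return count

def rook_coefficient(m, n, k):
    """Closed form: choose k rows, choose k columns, then match them up."""
    return comb(m, k) * comb(n, k) * factorial(k)

print([rook_coefficient_brute(3, 3, k) for k in range(4)])  # [1, 9, 18, 6]
```

So for a 3 by 3 board the rook polynomial is 1 + 9x + 18x^2 + 6x^3. Brute force is only practical for small boards, which is one reason a closed form is welcome.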

The rook polynomials are related to the Laguerre polynomials by

R_{m,n}(x) = n! x^n L_n^{(m-n)}(-1/x)

where L_n^{(k)}(x) is an “associated Laguerre polynomial.” These polynomials satisfy Laguerre’s differential equation

x y'' + (k+1-x) y' + n y = 0,

an equation that comes up in numerous contexts in physics. In quantum mechanics, these polynomials arise in the solution of the Schrödinger equation for the hydrogen atom.
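The relation between rook polynomials and Laguerre polynomials can be checked with exact rational arithmetic. This is a sketch under two assumptions not spelled out above: the rook counts use the closed form C(m,k) C(n,k) k!, and the associated Laguerre polynomial is computed from its standard explicit sum, with m ≥ n so that the upper index m - n is a non-negative integer.

```python
from fractions import Fraction
from math import comb, factorial

def laguerre(n, a, x):
    """Associated Laguerre polynomial L_n^(a)(x) via its explicit sum (integer a >= 0)."""
    return sum(Fraction((-1) ** i * comb(n + a, n - i), factorial(i)) * x ** i
               for i in range(n + 1))

def rook_poly(m, n, x):
    """Rook polynomial R_{m,n}(x), summing C(m,k) C(n,k) k! x^k over k."""
    return sum(comb(m, k) * comb(n, k) * factorial(k) * x ** k
               for k in range(min(m, n) + 1))

x = Fraction(2, 3)  # any nonzero rational works; this one is arbitrary
for m, n in [(3, 3), (5, 3), (8, 8)]:
    lhs = rook_poly(m, n, x)
    rhs = factorial(n) * x ** n * laguerre(n, m - n, -1 / x)
    assert lhs == rhs
```

Using Fraction rather than floats means the two sides agree exactly, so a passing assertion is not an accident of rounding.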

Related:

Relations between special functions

Mars, magic squares, and music

About a year ago I wrote about Jupiter’s magic square. Then yesterday I was listening to the New Sounds podcast that mentioned a magic square associated with Mars. I hadn’t heard of this, so I looked into it and found there were magic squares associated with each of the solar system bodies known to antiquity (i.e. Sun, Mercury, Venus, Moon, Mars, Jupiter, and Saturn).

Here is the magic square of Mars:
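The image is not reproduced here. The array below is the 5×5 arrangement commonly given for Mars among the traditional planetary squares; treat it as an assumption rather than a transcription of the image. The helper is_magic is an illustrative name, and it simply checks that every row, column, and both diagonals share one sum.

```python
# Assumed arrangement: the square commonly attributed to Mars.
mars = [
    [11, 24,  7, 20,  3],
    [ 4, 12, 25,  8, 16],
    [17,  5, 13, 21,  9],
    [10, 18,  1, 14, 22],
    [23,  6, 19,  2, 15],
]

def is_magic(square):
    """Check that all rows, columns, and both diagonals have the same sum."""
    n = len(square)
    target = sum(square[0])
    rows = all(sum(row) == target for row in square)
    cols = all(sum(square[i][j] for i in range(n)) == target for j in range(n))
    diag = sum(square[i][i] for i in range(n)) == target
    anti = sum(square[i][n - 1 - i] for i in range(n)) == target
    return rows and cols and diag and anti

print(is_magic(mars), sum(mars[0]))  # True 65
```

The common sum is 65, as it must be for any 5×5 magic square built from the numbers 1 through 25.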

The podcast featured Secret Pulse by Zack Browning. From the liner notes:

Magic squares provide structure to the music. Structure provides direction to the composer. Direction provides restrictions for the focused inspiration and interpretation of musical materials. The effect of this process? Freedom to compose.

The compositions on this CD use the 5×5 Magic Square of Mars (Secret Pulse), the 9×9 Magic Square of the Moon (Moon Thrust), and the ancient Chinese 3×3 Lo Shu Square found in the Flying Star System of Feng Shui (Hakka Fusion, String Quartet, Flying Tones, and Moon Thrust) as compositional models.  The musical structure created from these magic squares is dramatically articulated by the collision of different musical worlds …

I don’t know how the composer used these magic squares, but you can listen to the title track (Secret Pulse) on the podcast.

Related posts:

Jupiter’s magic square
A magic king’s tour
A magic knight’s tour
A knight’s random walk

Machine Learning in Action

A couple months ago I briefly reviewed Machine Learning for Hackers by Drew Conway and John Myles White. Today I’m looking at Machine Learning in Action by Peter Harrington and comparing the two books.

Both books are about the same size and cover many of the same topics. One difference between the two books is choice of programming language: ML for Hackers uses R for its examples, while ML in Action uses Python.

ML in Action doesn’t lean heavily on Python libraries. It mostly implements its algorithms from scratch, with a little help from NumPy for linear algebra, but it does not use ML libraries such as scikit-learn. It sometimes uses Matplotlib for plotting and uses Tkinter for building a simple GUI in one chapter. The final chapter introduces Hadoop and Amazon Web Services.

ML for Hackers is a little more of a general introduction to machine learning. ML in Action contains a brief introduction to machine learning in general, but quickly moves on to specific algorithms. ML for Hackers spends a good number of pages discussing data cleaning. ML in Action starts with clean data in order to spend more time on algorithms.

ML in Action takes 8 of the top 10 algorithms in machine learning (as selected by this paper) and is organized around these algorithms. (The two algorithms out of the top 10 that didn’t make it into ML in Action were PageRank, because it has been covered well elsewhere, and EM, because its explanation requires too much mathematics.) The algorithms come first in ML in Action, illustrations second. ML for Hackers puts more emphasis on its examples and reads a bit more like a story. ML in Action reads a little more like a reference book.

Criteria for a computing setup

“My setup” articles have become common. These articles list the hardware and software someone uses, usually with little explanation. The subtext is often the author’s commitment to the Apple brand or to open source, to spending money on the best stuff or to avoiding spending money on principle. I don’t find such articles interesting or useful.

Vivek Haldar has written a different kind of “my setup” article, one that emphasizes the problems he set out to solve and the reasons for the solutions he chose. Here are a couple excerpts describing his goals for preserving his data and his health.

Try to remember the oldest digital artifact that you can still retrieve, and more importantly, decode and view. Can you? How old is it? That should give you some idea of how hard and full of unknowns the problem of long-term preservation is. …

If a significant fraction of your working life is spent working with computers, and you do not yet have even the mildest RSI, you should consider yourself extremely lucky, but not immune. Act like you do have RSI, and change your set up right now to avoid it.

I thought the best part of the article was the criteria, not the solutions. It’s not that I disapprove of his choices, but I appreciate more his explanation of the rationale behind his choices. I don’t expect anybody is going to read the article and say “That’s it! I’m going to copy that setup.” I gather that in at least one detail, his choice of version control system, Vivek wouldn’t even copy his own setup if he were to start over. But people will get ideas to consider in their own circumstances.

Related post: Ford-Chevy arguments in tech

Solutions to knight's random walk

My previous post asked this question:

Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square?

There is a mathematical solution that is a little arcane, but short and exact. You could also approach the problem using simulation, which is more accessible but not exact.

The mathematical solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the edges connect legal knight moves. The general solution for the time to first return is simply 2N/k where N is the number of edges in the graph, and k is the number of edges meeting at the starting point. Amazingly, the solution hardly depends on the structure of the graph at all. It only requires that the graph is connected. In our case N = 168 and k = 2.
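The values of N and k can be checked by building the knight graph directly. This is a small sketch; the names MOVES and degree are just for illustration.

```python
# The eight knight moves as (dx, dy) offsets.
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def degree(x, y, size=8):
    """Number of legal knight moves from square (x, y) on a size-by-size board."""
    return sum(0 <= x + dx < size and 0 <= y + dy < size for dx, dy in MOVES)

# Every edge is counted once from each endpoint, so halve the degree sum.
num_edges = sum(degree(x, y) for x in range(8) for y in range(8)) // 2
k = degree(0, 0)

print(num_edges, k, 2 * num_edges / k)  # 168 2 168.0
```

So the mean return time for a knight starting in a corner is 2N/k = 168 moves.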

For a full explanation of the math, see this online book, chapter 3, page 9. Start there and work your way backward until you understand the solution.

And now for simulation. The problem says to pick a legal knight’s move at random. The most direct approach would be to find the legal moves at a given point first, then choose one of those at random. The code below achieves the same end with a different approach. It first chooses a random move, and if that move is illegal (i.e. off the board) it throws that move away and tries again.  This will select a legal move with the right probability, though perhaps that’s not obvious. It’s what’s known as an accept-reject random generator.

from random import randint

# Move a knight from (x, y) to a random new position
def new_position(x, y):

    while True:
        dx, dy = 1, 2

        # it takes three bits to determine a random knight move:
        # (1, 2) vs (2, 1), and the sign of each
        r = randint(0, 7)
        if r % 2:
            dx, dy = dy, dx
        if (r >> 1) % 2:
            dx = -dx
        if (r >> 2) % 2:
            dy = -dy

        newx, newy = x + dx, y + dy
        # If the new position is on the board, take it.
        # Otherwise try again.
        if 0 <= newx < 8 and 0 <= newy < 8:
            return (newx, newy)

# Count the number of steps in one random tour
def random_tour():
    x, y = x0, y0 = 0, 0
    count = 0
    while True:
        x, y = new_position(x, y)
        count += 1
        if x == x0 and y == y0:
            return count

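# Average the length of many random tours.
# range and print(...) are used so this runs under Python 2 or 3;
# the accumulator is named total to avoid shadowing the built-in sum.
total = 0
num_reps = 100000
for _ in range(num_reps):
    total += random_tour()
print(total / float(num_reps))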
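For comparison, here is a sketch of the more direct approach mentioned above: list the legal moves from the current square, then choose one uniformly. The function names are illustrative, and the number of repetitions is kept smaller than in the code above only so the example runs quickly.

```python
from random import choice

# The eight knight moves as (dx, dy) offsets.
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def legal_moves(x, y):
    """All squares a knight at (x, y) can reach on an 8-by-8 board."""
    return [(x + dx, y + dy) for dx, dy in MOVES
            if 0 <= x + dx < 8 and 0 <= y + dy < 8]

def new_position_direct(x, y):
    """Pick uniformly among the legal moves; no rejection step needed."""
    return choice(legal_moves(x, y))

def random_tour_direct():
    """Count the steps until the knight first returns to the corner (0, 0)."""
    x, y = 0, 0
    count = 0
    while True:
        x, y = new_position_direct(x, y)
        count += 1
        if (x, y) == (0, 0):
            return count

num_reps = 10000
mean = sum(random_tour_direct() for _ in range(num_reps)) / float(num_reps)
print(mean)
```

Both versions sample from the same distribution. The accept-reject version throws away proposals that land off the board, which leaves each legal move equally likely, so the two simulations should agree with each other and with the theorem up to sampling noise.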

A theorem is better than a simulation, but a simulation is a lot better than nothing. This problem illustrates how sometimes we think we need to simulate when we don’t. On the other hand, when you have a simulation and a theorem, you have more confidence in your solution because each validates the other.