Don’t be a technical masochist

There’s an old joke from Henny Youngman:

I told the doctor I broke my leg in two places. He told me to quit going to those places.

Sometimes tech choices are that easy: if something is too hard, stop doing it. A great deal of pain comes from using a tool outside its intended use, and often that’s avoidable.

For example, when regular expressions get too hard, I stop using regular expressions and write a little procedural code. Or when Python is too slow, I try some simple ways of speeding it up, and if that’s not good enough I switch from Python to C++. If something is too hard to do in Windows, I’ll do it in Linux, and vice versa.

Sometimes there’s not a better tool available and you just have to slog through with what you have. And sometimes you don’t have the freedom to use a better tool even though one is available. But a lot of technical pain is self-imposed. If you keep breaking your leg somewhere, stop going there.

“Conventional” is relative

I found this line from Software Foundations amusing:

… we can ask Coq to “extract,” from a Definition, a program in some other, more conventional, programming language (OCaml, Scheme, or Haskell) with a high-performance compiler.

Most programmers would hardly consider OCaml, Scheme, or Haskell “conventional” programming languages, but they are conventional relative to Coq. As the authors said, these languages are “more conventional,” not “conventional.”

I don’t mean to imply anything negative about OCaml, Scheme, or Haskell. They have their strengths—I briefly mentioned the advantages of Haskell just yesterday—but they’re odd birds from the perspective of the large majority of programmers who work in C-like languages.

Real World Haskell

I’m reading Real World Haskell because one of my clients’ projects is written in Haskell. Some would say that “real world Haskell” is an oxymoron because Haskell isn’t used in the real world, as illustrated by a recent xkcd cartoon.

It’s true that Haskell accounts for a tiny portion of the world’s commercial software and that the language is more popular in research. (There would be no need to put “real world” in the title of a book on PHP, for example. You won’t find a lot of computer science researchers using PHP for its elegance and nice theoretical properties.) But people do use Haskell on real projects, particularly when correctness is a high priority.[1] In any case, Haskell is “real world” for me since one of my clients uses it. As I wrote about before, applied is in the eye of the client.

I’m not that far into Real World Haskell yet, but so far it’s just what I was looking for. Another book I’d recommend is Graham Hutton’s Programming in Haskell. It makes a good introduction to Haskell because it’s small (184 pages) and focused on the core of the language, not so much on “real world” complications.

A very popular introduction to Haskell is Learn You a Haskell for Great Good. I have mixed feelings about that one. It explains most things clearly and the informal tone makes it easy to read, but the humor becomes annoying after a while. It also introduces some non-essential features of the language up front that could wait until later or be left out of an introductory book.

* * *

[1] Everyone would say that it’s important for their software to be correct. But in practice, correctness isn’t always the highest priority, nor should it be necessarily. As the probability of error approaches zero, the cost of development approaches infinity. You have to decide what probability of error is acceptable given the consequences of the errors.

It’s more important that the software embedded in a pacemaker be correct than the software that serves up this blog. My blog fails occasionally, but I wouldn’t spend $10,000 to cut the error rate in half. Someone writing pacemaker software would jump at the chance to reduce the probability of error so much for so little money.

On a related note, see Maybe NASA could use some buggy software.

Book review: Practical Data Analysis

Many people have drawn Venn diagrams to locate machine learning and related ideas in the intellectual landscape. Drew Conway’s diagram may have been the first. It has at least been frequently referenced.

By this classification, Hector Cuesta’s new book Practical Data Analysis is located toward the “hacking skills” corner of the diagram. No single book can cover everything, and this one emphasizes practical software knowledge more than mathematical theory or details of a particular problem domain.

The biggest strength of the book may be that it brings together in one place information on tools that are used together but whose documentation is scattered. The book is great source for sample code. The source code  is available on GitHub, though it’s more understandable in the context of the book.

Much of the book uses Python and related modules and tools including:

  • NumPy
  • mlpy
  • PIL
  • twython
  • Pandas
  • NLTK
  • IPython
  • Wakari

It also uses D3.js (with JSON, CSS, HTML, …), MongoDB (with MapReduce, Mongo Shell, PyMongo, …), and miscellaneous other tools and APIs.

There’s a lot of material here in 360 pages, making it a useful reference.

10 software tool virtues

Diomidis Spinellis gives a list of 10 software tool sins in The Tools at Hand episode of his Tools of the Trade podcast. Here are his points, but turned around. For each sin he lists, I give the opposite as a virtue.

10. Maintain API documentation with the source code.

9. Integrate unit testing in development.

8. Track bugs electronically.

7. Let the compiler do what it can do better than you.

6. Learn how to script your tools to work together.

5. Pay attention to compiler warnings and fix them.

4. Use a version control system.

3. Use tools to find definitions rather than scanning for them.

2. Use a debugger.

1. Use tools that eliminate repetitive manual editing.

I turned the original list around because I believe it’s easier to agree that the things above are good than it is to see that their lack is bad. Some items are opposites, like #5: you either pay attention to warnings or you ignore them. But some are not, like #8. Tracking bugs electronically is a good idea, but I wouldn’t call tracking bugs on paper a “sin.”

Related post: Reducing development friction comments on another podcast from Diomidis Spinellis.

Reducing development friction

Diomidis Spinellis gave an insightful list of ways to reduce software development friction in the Tools of the Trade podcast episode The Frictionless Development Environment Scorecard.

The first item on his list grabbed my attention:

Are my personal settings and preferences consistent on all the computers I’m using? Are they stored under version control? Can I install them on a new computer using a single command?

Listening to the podcast provoked me to finally sync my .emacs files on all my computers so that I now have the exact same file on all computers, maintained under version control. (Xah Lee gave me some sample code for creating the branching logic I needed for a few differences between Windows and Linux.)

Here is a small sample of questions from the podcast.

  • Are my files getting backed up? Is the backup tested, accessible, off site, in multiple media, with regularly retained copies?
  • Can I use the same editor for all my code and documentation editing tasks?
  • Can I get context-sensitive help and code completion?
  • Can I search recursively down a directory tree? Ignoring case? Only in a subset of files? With a regular expression?
  • Can I open a shell from the graphical file explorer and vice versa?
  • Can I quickly build the application I’m working on after a change? Can I test the application with a single command?
  • Can I automatically check my code for common or tricky errors? Are these checks run by default? Are they clean?
  • Does my application log its actions?
  • Is documentation for the tools and APIs I use readily available? Is it hyperlinked? Available offline?

The last question from the podcast summarizes the whole list:

Do I regularly evaluate my development environment to pinpoint and eliminate the sources of friction? Do I help my colleagues do the same?

Naming collections

When you have an array of things, do you name the array with a plural noun because it contains many things, or you name it with a singular noun because each thing it contains is singular? For example, if you have a collection of words, should you name it words or word?

Does it make any difference if you’re using some container other than an array? For example if you have a dictionary (a.k.a. map, hash, associative array, etc.) counting word frequencies, should it be count or counts?

I’ve never had a convention that I consciously follow. But I’ve often stopped to wonder which way I should name things. One approach may look right when I declare a variable and another when I use it.

Damian Conway has a reasonable suggestion in his book Perl Best Practices. (There are many things in that book that are good advice for people who never touch Perl.) He recommends using plural names for most arrays and singular names for dictionaries and arrays used like dictionaries.

Because hash entries are typically accessed individually, it makes sense for the hash itself to be named in the singular. That convention causes the individual accesses to read more naturally in the code. … On the other hand, array values are more often processed collectively … So it makes sense to name them in the plural, after the group of items they store. … If, however, an array is to be used as a random-access look-up table, name it in the singular, using the same conventions as a hash.

Tragedies and messes

Dorothy Parker said “It’s not the tragedies that kill us; it’s the messes.”

Sometime that’s how I feel about computing. I think of messes such as having to remember that arc tangent is atan in R and Python, but arctan in NumPy and a in bc. Or that C, Python, and Perl use else if, elif, and elsif respectively. Or did I switch those last two?

These trivial but innumerable messes keep us from devoting our full energy to bigger problems.

One way to reduce these messes is to use fewer tools. Then you know less to be confused about. If you only use Python, for example, then elif is just how it is. But knowing more tools is worth the added mess, up to a point. Past some point, however, new tools add more mental burden than utility. You have to find the optimal combination of tools for yourself, and that combination will change over time.

To use fewer tools, you may need to use more complex tools. Maybe you can replace a list of moderately complex but inconsistent tools with one tool that is more complex but internally consistent.