The magic / boilerplate trade-off

Phil Webb had an insightful tweet the other day.

Programming environments oscillate between boilerplate and magic. APIs tend to start out with all the wires exposed. Programming is tedious, but nothing is hidden. Development is hard, but debugging is easy. See, for example, the hello-world program for Win32.

Programmers get tired of this, and create levels of abstraction. Boilerplate is reduced and development gets easier. And if the abstraction is done well, debugging doesn’t get harder, or not much harder. But this abstraction doesn’t go far enough. Programmers feel like things should be easier still. That’s when magic comes in. Magic differs from abstraction in that it not only hides details, it’s actively misleading. Something appears to work one way, the desired way, but something quite different is going on. Development gets easier, but debugging gets much harder. If the magic gets to be too much, developers look to start over with something more transparent, and the cycle begins again.


Wizards, in the sense of code generators, are closely related to magic. Whereas magic abuses language features to create illusions, wizards generate boilerplate code and become a sort of meta language.

Webb’s comment about boilerplate vs magic came to mind this morning, not from looking at software, but at math. Over time mathematicians discover better ways to organize material, turning some theorems into definitions and vice versa. This makes things easier in the long run, but creates a barrier to entry in the short term. Math moves from boilerplate to magic, from relatively concrete and tedious to abstract but less obviously motivated. The definitions seem magical to the beginner because the applications have been delayed.

When I see this in an area I understand well, I think it’s clever. When I see it in an area I don’t know well, it feels like some sort of guild is trying to keep me out. I know that this isn’t the case—the machinery of mathematics is always created for practical reasons, not to intentionally intimidate anyone—but it can feel that way. It’s not so much that the motivation is deliberately hidden but that it is obscured. An unfortunate aspect of mathematics culture is that people are reluctant to discuss motivation in writing. It’s more likely to come out in conversation or in lectures.

Related posts:

The longer it has taken, the longer it will take

Suppose project completion time follows a Pareto (power law) distribution with parameter α. That is, for t > 1, the probability that completion time is bigger than t is t. (We start out time at t = 1 because that makes the calculations a little simpler.)

Now suppose we know that a project has lasted until t0 so far. Then the expected finish time is αt0/(α-1) and so the expected additional time is t0/(α-1). Note that both are proportional to t0. So the longer it has taken, the longer it will take. If the project is running late, you can expect the time remaining to be even more than the expected time before the project started. The finish line is moving away from you!

For example, suppose α = 2 (in applications of power laws, α is often between 1 and 3) and you’re measuring time in years. When the project starts at t = 1, it is expected to take one year, until t = 2. Now suppose you’re starting the second year and the project isn’t done. Now it’s expected to finish at t = 4, two more years. When you started, the project was supposed to take a year. One year later, it has taken a year, and should be expected to take two more years. I said “should be expected” rather than “is expected” because no one would believe such an estimate. (Ever heard of the Big Dig? Or other megaprojects?)

Note that we have computed the conditional probability given only the time it has taken so far, and no other information. If you know more, for example maybe you know that some specific pieces have been completed, then you should use that information.

This is related to the Lindy effect. The longer a cultural artifact has been around, the longer it is expected to last into the future.

* * *

For daily posts on probability, follow @ProbFact on Twitter.

ProbFact twitter icon

Dimensional analysis and types

This weekend I mentioned on Twitter that it’s spooky how well dimensional analysis catches errors. If you’re trying to calculate a number of horses, does your final result have units of horses? If it has units of cats or kilograms, something has gone wrong. This is such a simple idea, it’s remarkable that it’s worth checking. You can find a surprising number of errors simply by asking whether your answer has consistent units — e.g. did you add a length to a mass somewhere? — and whether it has the right units — if you’re computing a dollar amount, does your answer come out in dollars?

A few people replied, making the analogy with type systems for programming languages. Type systems prevent you, for example, from adding a number and a word or from trying to take the square root of an image. You could think of dimensional analysis as a subset of type theory.

I believe there’s an argument implicit in the comparison of dimensional analysis and type theory: If a minimal amount of type checking, i.e. checking units of measurement, is effective at catching errors, more type checking should catch more errors. Although I mostly agree with this argument, it leaves out cost. Stronger type checking catches more errors, but at what cost? Dimensional analysis is practically free. It can often be a literal back-of-an-envelope calculation and has great return on effort. What about type systems?

Most people would agree that some minimal amount of type checking is worth the effort. The controversy is over when it is no longer economical to add additional structure. Haskell has a much more expressive type system than less formal languages, and yet the Haskell community is looking for ways to make the type system even more expressive. Some would say we’ve yet to discover the point where additional typing isn’t worth the effort. At the opposite end of the spectrum are people who believe that no amount of explicit typing is worth it.

As with most technical controversies, the resolution depends on context. You don’t want types to get in your way when you’re writing a quick-and-dirty script. But you might greatly appreciate them in large, mission-critical projects.

It takes skill to get the most benefit from a type system. A good type system won’t automatically do anything for your code. You could actively work against the type system by writing “stringly typed” code, for example. Every function takes in a string and returns a string. You could passively work with the type system, using built-in types as intended but not creating any new types. Or you could actively work with the type system, creating new types so that more logical errors will become compiler errors.

An example of passively using a type system would be distinguishing integers and floating point numbers, say using integers for counting things, floating point numbers for measurements like temperature and mass, and strings for character data. You could argue that there aren’t many natural types in such program, and so a strong type system wouldn’t be that helpful. A more active approach would be, for example, to introduce different types for temperatures and masses. You could go much further than this, organizing data into more complex types.

All other things being equal, I like strong, static typing. But there are usually other factors that outweigh my typing preferences: availability of libraries, convenience, client requirements, etc.

Related post: Why can you multiply different units but not add them?

Learning (needlessly) hard technology

A few years ago, a friend told me he was thinking about learning a certain technology because it was really hard to use. This was not something that had to be complex to solve a complex problem, but something that was unnecessarily complex. Why would anyone do that?

His reasoning was that as a consultant, he could make good money supporting a technology that’s hard to use. My friend would have more integrity than to recommend something that he didn’t think was a good solution. Perhaps he was thinking of saying something like this to a client: “I wouldn’t recommend this technology if you were starting from scratch. But since you’re invested in it, I’ll help you with it or help you migrate to something else.”

That sounds like an unpleasant way to earn a living. It also sounds risky. If something really is unnecessarily complex, better alternatives are likely to arise, perhaps suddenly. (This assumes people are free to choose alternatives, not prohibited by law, for example.)

Learning a technology that’s complex for good reasons could be a smart and ethical move. The work is harder at lower levels of abstraction, but someone has to solve the problems others would rather not think about. And since not as many people can do that work, it should pay better and be more secure.

There are a couple dangers, however, associated with choosing a more difficult technology. One is the temptation to use it where it isn’t needed. The other is that the set of problems where it is needed may shrink over time.

Related posts:

Mixing Haskell and R

It would be hard to think of two programming languages more dissimilar than Haskell and R.

Haskell was designed to enforce programming discipline; R was designed for interactive use. Haskell emphasizes correctness; R emphasizes convenience.  Haskell emphasizes computer efficiency; R emphasizes interactive user efficiency. Haskell was written to be a proving ground for programming language theorists. R was written to be a workbench for statisticians. Very different goals lead to very different languages.

When I first heard of a project to mix Haskell and R, I was a little shocked. Could it even be done? Aside from the external differences listed above, the differences in language internals are vast. I’m very impressed that the folks at Tweag I/O were able to pull this off. Their HaskellR project lets you call R from Haskell and vice versa. (It’s primarily for Haskell calling R, though you can call Haskell functions from your R code: Haskell calling R calling Haskell. It kinda hurts your brain at first.) Because the languages are so different, some things that are hard in one are easy in the other.

I used HaskellR while it was under initial development. Our project was written in Haskell, but we wanted to access R libraries. There were a few difficulties along the way, as with any project, but these were resolved and eventually it just worked.

The success of OOP

Allen Wirfs-Brock gave the following defense of OOP a few days ago in a series of six posts on Twitter:

A young developer approached me after a conf talk and said, “You must feel really bad about the failure of object-oriented programming.” I was confused. I said, “What do you mean that object-orient programming was a failure. Why do you think that?”

He said, “OOP was supposed to fix all of our software engineering problems and it clearly hasn’t. Building software today is just as hard as it was before OOP. came along.”

“Have you ever look at the programs we were building in the early 1980s? At how limited their functionality and UIs were? OOP has been an incredible success. It enabled us to manage complexity as we grew from 100KB applications to today’s 100MB applications.”

Of course OOP hasn’t solved all software engineering problems. Neither has anything else. But OOP has been enormously successful in allowing ordinary programmers to write much larger applications. It has become so pervasive that few programmers consciously think about it; it’s simply how you write software.

I’ve written several posts poking fun at the excesses of OOP and expressing moderate enthusiasm for functional programming, but I appreciate OOP. I believe functional programming will influence object oriented programming, but not replace it.


High-dimensional integration

Numerically integrating functions of many variables almost always requires Monte Carlo integration or some variation. Numerical analysis textbooks often emphasize one-dimensional integration and, almost as a footnote, say that you can use a product scheme to bootstrap one-dimensional methods to higher dimensions. True, but impractical.

Traditional numerical integration routines work well in low dimensions. (“Low” depends on context, but let’s say less than 10 dimensions.) When you have the choice of a well-understood, deterministic method, and it works well, by all means use it. I’ve seen people who are so used to problems requiring Monte Carlo methods that they use these methods on low-dimensional problems where they’re not the most efficient (or robust) alternative.

Monte Carlo

Except for very special integrands, high dimensional integration requires either Monte Carlo integration or some variation such as quasi-Monte Carlo methods.

As I wrote about here, naive Monte Carlo integration isn’t practical for high dimensions. It’s unlikely that any of your integration points will fall in the region of interest without some sort of importance sampler. Constructing a good importance sampler requires some understanding of your particular problem. (Sorry. No one-size-fits-all black-box methods are available.)

Quasi-Monte Carlo (QMC)

Quasi-Monte Carlo methods (QMC) are interesting because they rely on something like a random sequence, but “more random” in a sense, i.e. more effective at exploring a high-dimensional space. These methods work well when an integrand has low effective dimension. The effective dimension is low when nearly all the action takes place on a lower dimensional manifold. For example, a function may ostensibly depend on 10,000 variables, while for practical purposes 12 of these are by far the most important. There are some papers by Art Owen that say, roughly, that Quasi-Monte Carlo methods work well if and only if the effective dimension is much smaller than the nominal dimension. (QMC methods are especially popular in financial applications.)

While QMC methods can be more efficient than Monte Carlo methods, it’s hard to get bounds on the integration error with these methods. Hybrid approaches, such as randomized QMC can perform remarkably well and give theoretical error bounds.

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) is a common variation on Monte Carlo integration that uses dependent random samples. In many applications, such as Bayesian statistics, you’d like to be able to create independent random samples from some probability distribution, but this is not practical. MCMC is a compromise. It produces a Markov chain of dependent samples, and asymptotically these samples have the desired distribution. The dependent nature of the samples makes it difficult to estimate error and to determine how well the integration estimates using the Markov chain have converged.

(Some people speak of the Markov chain itself converging. It doesn’t. The estimates using the Markov chain may converge. Speaking about the chain converging may just be using informal language, or it may betray a lack of understanding.)

There is little theory regarding the convergence of estimates from MCMC, aside from toy problems. An estimate can appear to have converged, while an important region of the integration problem remains unexplored. Despite its drawbacks, MCMC is the only game in town for many applications.

Click to learn more about numerical integration consulting

Algorithmic wizardry

Last week I wrote a short commentary on James Hague’s blog post Organization skills beat algorithmic wizardry. This week that post got more traffic than my server could handle. I believe it struck a chord with experienced software developers who know that the challenges they face now are not like the challenges they prepared for in school.

Although I completely agree that “algorithmic wizardry” is over-rated in general, my personal experience has been a little different. My role on projects has frequently been to supply a little bit of algorithmic wizardry. I’ve often been asked to look into a program that is taking too long to run and been able to speed it up by an order of magnitude or two by improving a numerical algorithm. (See an example here.)

James Hague says that “rarely is there some … algorithm that casts a looming shadow over everything else.” I believe he is right, though I’ve been called into projects precisely on those rare occasions when an algorithm does cast a shadow over everything else.

The most important skill in software development

Here’s an insightful paragraph from James Hague’s blog post Organization skills beat algorithmic wizardry:

When it comes to writing code, the number one most important skill is how to keep a tangle of features from collapsing under the weight of its own complexity. I’ve worked on large telecommunications systems, console games, blogging software, a bunch of personal tools, and very rarely is there some tricky data structure or algorithm that casts a looming shadow over everything else. But there’s always lots of state to keep track of, rearranging of values, handling special cases, and carefully working out how all the pieces of a system interact. To a great extent the act of coding is one of organization. Refactoring. Simplifying. Figuring out how to remove extraneous manipulations here and there.

Algorithmic wizardry is easier to teach and easier to blog about than organizational skill, so we teach and blog about it instead. A one-hour class, or a blog post, can showcase a clever algorithm. But how do you present a clever bit of organization? If you jump to the solution, it’s unimpressive. “Here’s something simple I came up with. It may not look like much, but trust me, it was really hard to realize this was all I needed to do.” Or worse, “Here’s a moderately complicated pile of code, but you should have seen how much more complicated it was before. At least now someone stands a shot of understanding it.” Ho hum. I guess you had to be there.

You can’t appreciate a feat of organization until you experience the disorganization. But it’s hard to have the patience to wrap your head around a disorganized mess that you don’t care about. Only if the disorganized mess is your responsibility, something that means more to you than a case study, can you wrap your head around it and appreciate improvements. This means that while you can learn algorithmic wizardry through homework assignments, you’re unlikely to learn organization skills unless you work on a large project you care about, most likely because you’re paid to care about it.

Related posts:

Launching missiles with Haskell

Haskell advocates are fond of saying that a Haskell function cannot launch missiles without you knowing it. Pure functions have no side effects, so they can only do what they purport to do. In a language that does not enforce functional purity, calling a function could have arbitrary side effects, including launching missiles. But this cannot happen in Haskell.

The difference between pure functional languages and traditional imperative languages is not quite that simple in practice.

Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call, and it returns the state of the world after the call. It’s unrealistic to pass a program’s entire state as an argument each time, so you’d like to pass just that state that you need to, and have a convenient way of doing so. You’d also like the compiler to verify that you’re only passing around a limited slice of the world. That’s where monads come in.

Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two arguments: the square root, and the updated log. This is a pain, and it makes function composition difficult.

Monads provide a sort of side-band for passing state around, things like our function call log. You’re still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you’re passing around a log but not arbitrary state. A function that updates a log, for example, can effect the state of the log, but it can’t do anything else. It can’t launch missiles.

Once monads get large and complicated, it’s hard to know what side effects they hide. Maybe they can launch missiles after all. You can only be sure by studying the source code. Now how do you know that calling a C function, for example, doesn’t launch missiles? You study the source code. In that sense Haskell and C aren’t entirely different.

The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.

Related post: Monads are hard because …

Information hiding

One of the basic principles of software development is information hiding. People agree that it’s desirable, but may not realize they have different ideas of what it means. And when done poorly, well-meaning attempts to make software more maintainable backfire. Leo Brodie cautions

… we should clarify. From what, or whom, are we hiding information?

[T]raditional languages … bend over backwards to ensure that modules hide internal routines and data structures from other modules. The goal is to achieve module independence (a minimum coupling). The fear seems to be that modules strive to attack each other like alien antibodies. Or else, that evil bands of marauding modules are out to clobber the precious family data structures.

This is not what we’re concerned about. The purpose of hiding information, as we mean it, is simply to minimize the effects of a possible design-change by localizing things that might change within each component.

Quote from Thinking Forth. Emphasis added.


Managing Emacs windows

When you have Emacs split into multiple windows, how do you move your cursor between windows? How do you move the windows around?

Update (27 Jan 2015): You may want to skip the post below and use ace-window.  I plan to use it rather than my solution below. Also, see Sacha Chua’s video on window management, including using ace-window.

Moving the cursor between windows

You can use C-x o to move the cursor to the “other” window. That works great when you only have two windows: C-x o toggles between them.

When you have more windows, C-x o moves to the next window in top-to-bottom, left-to-right order. So if you have five windows, for example, you have to go to the “other” window up to four times to move to each of the windows.

The cursor is in the A window. Repeatedly executing C-x o will move the cursor to C, B, and D.

There are more direct commands:

  • windmove-up
  • windmove-down
  • windmove-left
  • windmove-right

You might map these to Control or Alt plus the arrow keys to have a memorable key sequence to navigate windows. However, such keybindings may conflict with existing keybindings, particularly in org-mode.

Moving windows

You can move windows around with the buffer-move. (Technically the windows stay fixed and the buffer contents moves around.)

Super and Hyper keys

Since the various Control arrow keys and Alt arrow keys are spoken for in my configuration, I thought I’d create Super and Hyper keys so that Super-up would map to windmove-up etc.

Several articles recommend mapping the Windows key to Super and the Menu key to Hyper on Windows. The former is problematic because so many of the Windows key combinations are already used by the OS. I did get the latter to work by adding the following to the branch of my init.el that runs on Windows.

        (setq w32-pass-apps-to-system nil
              w32-apps-modifier 'hyper)
        (global-set-key (kbd "<H-up>")    'windmove-up)
        (global-set-key (kbd "<H-down>")  'windmove-down)
        (global-set-key (kbd "<H-left>")  'windmove-left)
        (global-set-key (kbd "<H-right>") 'windmove-right)  

I haven’t figured out yet how to make Super or Hyper keys work on Linux. Suggestions for setting up Super and Hyper on Linux or Windows would be much appreciated.

Striving for simplicity, arriving at complexity

This post is a riff on a line from Mathematics without Apologies, the book I quoted yesterday.

In an all too familiar trade-off, the result of striving for ultimate simplicity is intolerable complexity; to eliminate too-long proofs we find ourselves “hopelessly lost” among the too-long definitions. [emphasis added]

It’s as if there’s some sort of conservation of complexity, but not quite in the sense of a physical conservation law. Conservation of momentum, for example, means that if one part of a system loses 5 units of momentum, other parts of the system have to absorb exactly 5 units of momentum. But perceived complexity is psychological, not physical, and the accounting is not the same. By moving complexity around we might increase or decrease the overall complexity.

The opening quote suggests that complexity is an optimization problem, not an accounting problem. It also suggests that driving the complexity of one part of a system to its minimum may disproportionately increase the complexity of another part. Striving for the simplest possible proofs, for example, could make the definitions much harder to digest. There’s a similar dynamic in programming languages and programs.

Larry Wall said that Scheme is a beautiful programming language, but every Scheme program is ugly. Perl, on the other hand, is ugly, but it lets you write beautiful programs. Scheme can be simple because it requires libraries and applications to implement functionality that is part of more complex languages. I had similar thoughts about COM. It was an elegant object system that lead to hideous programs.

Scheme is a minimalist programming language, and COM is a minimalist object framework. By and large the software development community prefers complex languages and frameworks in hopes of writing smaller programs. Additional complexity in languages and frameworks isn’t felt as strongly as additional complexity in application code. (Until something breaks. Then you might have to explore parts of the language or framework that you had blissfully ignored before.)

The opening quote deals specifically with the complexity of theorems and proofs. In context, the author was saying that the price of Grothendieck’s elegant proofs was a daunting edifice of definitions. (More on that here.) Grothendieck may have taken this to extremes, but many mathematicians agree with the general approach of pushing complexity out of theorems and into definitions. Michael Spivak defends this approach in the preface to his book Calculus on Manifolds.

… the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions … There are good reasons why the theorems should all be easy and the definitions hard. As the evolution of Stokes’ theorem revealed, a single simple principle can masquerade as several difficult results; the proofs of many theorems involve merely stripping away the disguise. The definitions, on the other hand, serve a twofold purpose: they are rigorous replacements for vague notions, and machinery for elegant proofs. [emphasis added]

Mathematicians like to push complexity into definitions like software developers like to push complexity into languages and frameworks. Both strategies can make life easier on professionals while making it harder on beginners.

Related post: A little simplicity goes a long way

Regular expression resources

Continuing the series of resource posts each Wednesday, this week we have notes on regular expressions:

See also blog posts tagged regular expressions.

For daily tips on regular expressions, follow @RegexTip on Twitter.

Regex tip icon

Last week: Probability resources

Next week: Numerical computing resources