Simple example of Kleisli composition

Mars Climate Orbiter, artist conception, via NASA

When a program needs to work with different systems of units, it’s best to consistently use one system for all internal calculations and convert to another system for output if necessary. Rigidly following this convention can prevent bugs, such as the one that caused the crash of the Mars Climate Orbiter.

For example, maybe you need to work in degrees and radians. It would be sensible to do all calculations in radians, because that’s what software libraries expect, and output results in degrees, because that’s what humans expect.

Now suppose you have a function that takes in a length and doubles it, and another function takes in a length and triples it. Both functions take in length in kilometers but print the result in miles.

You would like the composition of the two functions to multiply a length by six. And as before, the composition would take in a speed in kilometers and return a speed in miles.

Here’s how we could implement this badly.

    miles_per_km = 5/8 # approx

    def double(length_km): 
        return 2*length_km*miles_per_km

    def triple(length_km): 
        return 3*length_km*miles_per_km

    length_km = 8
    d = double(length_km)
    print("Double: ", d)
    t = triple(d)
    print("Triple: ", t)

This prints

    Double: 10.0
    Triple: 18.75

The second output should be 30, not 18.5. The result is wrong because we converted from kilometers to miles twice. The correct implementation would be something like the following.

    miles_per_km = 0.6213712

    def double(length_km): 
        d = 2*length_km
        print("Double: ", d*miles_per_km)
        return d

    def triple(length_km): 
        t = 3*length_km
        print("Triple: ", t*miles_per_km)
        return t

    length_km = 8
    d = double(length_km)
    t = triple(d)

This prints the right result.

    Double: 10.0 
    Triple: 30.0

In abstract terms, we don’t want the composition of f and g to be simply gf.

We have a function f from X to Y that we think of as our core function, and a function T that translates the output. Say f doubles its input and T translates from kilometers to miles. Let f* be the function that takes X to TY, i.e. the combination of f and translation.

Now take another function g from Y to Z and define g* as the function that takes Y to TZ. We want the composition of f* and g* to be

g* ∘ f* = T ∘ g ∘ f.

In the example above, we only want to convert from kilometers to miles once. This is exactly what Kleisli composition does. (“Kleisli” rhymes with “highly.”)

Kleisli composition is conceptually simple. Once you understand what it is, you can probably think of times when it’s what you wanted but you didn’t have a name for it.

Writing code to encapsulate Kleisli composition takes some infrastructure (i.e. monads), and that’s a little complicated, but the idea of what you’re trying to achieve is not. Notice in the example above, what the functions print is not what they return; the print statements are a sort of side channel. That’s the mark of a monad.

Kleisli categories

The things we’ve been talking about are formalized in terms of Kleisli categories. You start with a category C and define another category that has the same objects as C does but has a different notion of composition, i.e. Kleisli composition.

Given a monad T on C, the Kleisli category CT has the same objects as C. An arrow f* from X to Y in CT corresponds to an arrow f from X to TY in C. In symbols,

HomCT(X, Y) = HomC(X, TY).

Mr. Kleisli’s motivation for defining his categories was to answer a more theoretical question—whether all monads arise from adjunctions—but more practically we can think of Kleisli categories as a way of formalizing a variation on function composition.

Related posts

Getting pulled back in

“Just when I thought I was out, they pull me back in.” — Michael Corleone, The Godfather, Part 3

My interest in category theory goes in cycles. Something will spark my interest in it, and I’ll dig a little further. Then I reach my abstraction tolerance and put it back on the shelf. Then sometime later something else comes up and the cycle repeats. Each time I get a little further.

A conversation with a client this morning brought me back to the top of the cycle: category theory may be helpful in solving a concrete problem they’re working on.

I’m skeptical of applied category theory that starts with categories. I’m more bullish on applications that start from the problem domain, a discussion something like this.

“Here’s a pattern that we’re trying to codify and exploit.”

“Category theory has a name for that, and it suggests you might also have this other pattern or constraint.”

“Hmm. That sounds plausible. Let me check.”

I think of category theory as a pattern description language, a way to turn vague analogies into precise statements. Starting from category theory and looking for applications is less likely to succeed.

When I left academia the first time, I got a job as a programmer. My first assignment was to make some change to an old Fortran program, and I started asking a lot of questions about context. My manager cut me off saying “You’ll never get here from there.” I had to work bottom-up, starting from the immediate problem. That lesson has stuck with me ever since.

Sometimes you do need to start from the top and work your way down, going from abstract to concrete, but less often that I imagined early in my career.

Code katas taken more literally

Karate class

Code katas are programming exercises intended to develop programming skills, analogous to the way katas develop martial art skills.

But literal katas are choreographed. They are rituals rather than problem-solving exercises. There may be an element of problem solving, such as figuring how to better execute the prescribed movements, but katas are rehearsal rather than improvisation. brings up the analogy to musical practice in the opening paragraph of the home page. But musical practice is also more ritual than problem-solving, at least for classical music. A musician might go through major and minor scales in all 12 keys, then maybe a chromatic scale over the range of the instrument, then two different whole-tone scales, etc.

A code kata would be more like a jazz musician improvising a different melody to the same chord changes every day. (Richie Cole would show off by improvising over the chord changes to Cherokee in all twelve keys. I don’t know whether this was a ritual for him or something he would pull out for performances.)

This brings up a couple questions. What would a more literal analog of katas look like for programming? Would these be useful?

I could imagine someone going through a prescribed sequence of keystrokes that exercise a set of software features that they wanted to keep top of mind, sorta like practicing penmanship by writing out a pangram.

This is admittedly a kind of an odd idea. It makes sense that the kinds of exercises programmers are interested in require problem solving rather than recall. But maybe it would appeal to some people.


Image “karate training” by Genista is licensed under CC BY-SA 2.0 .

Visualizing C operator precedence

Here’s an idea for visualizing C operator precedence. You snake your way through the diagram starting from left to right.

Operators at the same precedence level are on the same horizontal level.

Following the arrows for changing directions, you move from left-to-right through the operators that associate left-to-right and you move right-to-left through the operators that associate right-to-left.

Although this diagram is specifically for C, many languages follow the same precedence with minor exceptions. For example, all operators that Perl shares with C follow the same precedence as C.

visualization of C operator precedence

Related posts

Tool recursion

“Literature about Lisp rarely resists that narcissistic pleasure of describing Lisp in Lisp.” — Christian Queinnec, Lisp in Small Pieces


Applying software development tools to themselves has a dark side and a light side.

There’s a danger of becoming obsessed with one’s tools and never getting around to using them. If it’s your job to cut down a tree, there’s some benefit to sharpening your ax, but you can’t only sharpen your ax. At some point you hit diminishing return and it’s time to start chopping.

But there are benefits to self-referential systems, such as macros that use Lisp to generate Lisp, or writing a C compiler in C, or using Emacs to tweak Emacs. There’s a kind of consistency that results, and there can be a compound return on effort. But as with writing a recursive procedure, there has to be a base case, a point at which you stop recursing. Otherwise you go down the black hole of becoming absorbed in your tool and never using it for work.

Even though I’ve used Emacs for a long time, I’ve never understood the recursive fascination some people have with it. For example, part of the elevator pitch for Emacs is that it’s self-documenting. You can pull up help on Emacs from inside Emacs. But you can also type your questions into a search engine without having to learn an arcane help API. What’s the advantage to the former?

For one thing, using Emacs help inside Emacs works without a network connection. For another, you avoid the risk of being distracted by something you see when you’re using your web browser. But the most subtle benefit is the compound effect of self-reference. You can use the same navigation commands in the help system that you can when editing text, you can execute code snippets in place, etc.

When I hear “Isn’t it cool that you can do X in X?” my first thought is “Yeah, that sounds cool” but my second thought is “But why would you want to do that? Sounds like it could be really hard.” I’m starting to appreciate that there are sometimes long-term benefits to these sort of recursive tool uses even if they’re not optimal in the short run.

Nota bene


I was looking at the J programming language yesterday and I was amused to see that it uses “NB.” to mark the rest of a line of source code as a comment, just like # in Python or // in C++. This makes comments in J look like comments in English prose.

“NB” abbreviates the Latin phrase nota bene meaning “note well.” It’s been used to mark side notes in English for about three centuries.

Most programming languages couldn’t use “NB” or “NB.” as a comment marker because it would inconsistent with conventions for identifiers, but J’s unconventional code syntax allows it to use conventional English notation for comments.

Why J?

I was looking at J because I have a project looking at its younger sister Remora. As described in this paper,

Remora is a higher-order, rank-polymorphic array-processing programming language, in the same general class of languages as APL and J. It is intended for writing programs to be executed on parallel hardware.

J keeps the array-oriented core of APL but drops its infamous symbols. Remora syntax is even closer to the mainstream, being written like a Lisp. (Some might object that Lisp isn’t mainstream, but it sure is compared to APL or J.)

APL comment symbol

Learning about J’s comment marker made me curious what its APL counterpart was. APL had custom symbols for everything, including comments. Comments began with ⍝ (U+235D), the idea being that the symbol looked like a lamp, giving light to the poor soul mentally parsing code.


The full name for the lamp symbol is “APL FUNCTIONAL SYMBOL UP SHOE JOT.” Since this section of code is explicitly for APL symbols, why not call the symbol  COMMENT or LAMP rather than UP SHOE JOT?

I suppose the comment symbol looks like the bottom of a shoe. There’s also a symbol ⍦ (U+2366) [1] with the name “APL FUNCTIONAL SYMBOL DOWN SHOE STILE”


and so “up” and “down” must refer to the orientation of the part of the symbol that looks like ∩ and ∪. But what about “jot” and “stile”?

A jot is a small character. The name is related to the Greek letter iota (ι) and the Hebrew letter yod (י). But if ∩ and ∪ are a shoe, the “jot” is a fairly large circle. Does “jot” have some other meaning?

A “stile” is a step or a rung, as in a turnstile. I suppose the vertical bar on top of ∪ is a stile.

Related posts

[1] What is this character for in APL? Unicode includes it as an APL symbol, but it’s not included in Wikipedia’s list of APL symbols.

Monads and Macros

There are two techniques in software development that have an almost gnostic mystique about them: monads and macros.

Macro burrito

The photo is a pun [1].

Pride and pragmatism

As with everything people do, monads and macros are used with mixed motives, for pride and for pragmatism.

As for pride, monads and macros have just the right barrier to entry: high enough to keep out most programmers, but not so high as to be unsurmountable with a reasonable amount of effort. They’re challenging to learn, but not so challenging that you can’t show off what you’ve learned to a wide audience.

As for pragmatism, both monads and macros can be powerful in the right setting. They’re both a form of information hiding.

Monads let you concentrate on functions by providing a sort of side channel for information associated with those functions. For example, you may have a drawing program and want to compose a rotation and stretching. These may be ostensibly pure functions, but they also effect a log of operations that lets you undo actions. It would be burdensome to consider this log as an explicit argument passed into and returned from every operation. so you might keep this log information in a monad.

Macros let you hide the fact that your programming language doesn’t have features that you would like. Operator overloading is an example of adding the appearance of a new feature to a programming language. Macros take this much further, for better or for worse. If you think operator overloading is good because of its potential to clarify code, you’ll like macros. If you think operator overload is bad because of its potential for misuse, you definitely won’t like macros.

Mutually exclusive

Few people are excited about both monads and macros; only one person that I know comes to mind.

Monads and macros appeal to opposite urges: the urge to impose rules and the urge to get around rules. There is a time for both, a time to build structure and a time to tear structure down.

Monads are most popular in Haskell, and macros in Lisp. These are very different languages and their communities have very different values [2].

The ideal Haskell program has such rigid structure that only correct code will compile. The ideal Lisp program is so flexible that it is essentially a custom programming language.

A Haskell programmer would like to say that a program is easy to reason about because of its mathematical structure. A Lisp programmer would like to say a program is easy to read because it maps well onto the problem at hand.

Lisp enthusiast Doug Hoyte says in his book Let Over Lambda

As we now know, nobody truly understands macros.

A Haskell programmer would find this horrifying, but a Lisp programmer would consider it an acceptable price to pay or even something fun to explore.

Related posts

[1] “Giant veggie burrito” by dphiffer is licensed with CC BY 2.0. It’s a macro burrito. There are myriad monad metaphors, including famously one saying a monad is a burrito.

[2] Here I’m referring to archetypes, generalizations but not exaggerations. Of course no language community is monolithic, and an individual programmer will have different approaches to different tasks. But as a sweeping generalization, Haskell programmers value structure and Lisp programmers value flexibility.

Pareto and Pandas

This post muses about what it means to learn a software library. I’ll use Pandas as an example, but the post isn’t just about Pandas.

Suppose you say “I want to learn Pandas.” That implicitly assumes Pandas one thing, and in a sense it is. In another sense Pandas is hundreds of things.

At the top level, the pandas module (version 1.2.0) has 142 things inside.

    >>> import pandas as pd
    >>> len(dir(pd))

The two most important things inside are the Series and DataFrame objects. They each in turn contain hundreds of things.

    >>> len(dir(pd.Series))
    >>> len(dir(pd.DataFrame))

That’s evidence Pandas’ diversity. But here’s evidence of it’s unity: most of the things inside these two objects have the same names.

    >>> s = set(dir(pd.Series))
    >>> d = set(dir(pd.DataFrame))
    >>> len(s.union(d))
    >>> len(s - d)
    >>> len(d - s)

Pandas kinda has a fractal dimension, having both complexity and unity. The best way to think about it is not as one monolithic thing, or as hundreds of isolated things. It’s a coherent, but not perfectly coherent, collection of related things. This is true of all software libraries. Pandas is more coherent than most libraries because it was initially the product of one mind, that of Wes McKinney.

This has a couple implications for what it means to “learn Pandas.” Because Pandas is big, you have to explore it strategically, not exhaustively. And because Pandas is coherent, part of what it means to learn Pandas is to develop a feel for the way Pandas does things.

No one is going to learn Pandas by studying every object, every method on every object, and every argument to every method on every object. It’s too big. That’s also unnecessary.

There’s probably something like a Pareto distribution on the usefulness of features. The most commonly used features are used far, far more often than the most obscure features.

It would be interesting to do some kind of survey to see which features are actually used and how often. But I don’t think that’s practical. The easiest thing to do would be to find some large code base that heavily uses Pandas. But that’s not typical of how Pandas is used. Probably most lines of code using Pandas are scattered over millions of small scripts, much of it not in production code.

A well-designed library makes it possible to make good guesses about functionality you haven’t used. You learn the gestalt of the library. You can always look up API documentation as needed, but you can’t develop an intuition for a library just-in-time.

“Learn Pandas” is a daunting goal, and maybe an impossible goal if by “learn” you mean explore exhaustively. But “learn how to do my common tasks quickly in Pandas” and “develop a feel for how to do things in Pandas” are much smaller tasks.

Related posts


Programmers like highly expressive programming languages, but programming managers do not. I wrote about this on Twitter a few months ago.

Q: Why do people like Lisp so much?

A: Because Lisp is so expressive.

Q: Why don’t teams use Lisp much?

A: Because Lisp is so expressive.

Q: Why do programmers complain about Java?

A: Because it’s not that expressive.

Q: Why do businesses use Java?

A: Because it’s not that expressive.

A highly expressive programming language offers lots of options. This can be a good thing. It makes programming more fun, and it can lead to better code. But it can also lead to more idiosyncratic code.

A large programming language like Perl allows developers to carve out language subsets that hardly overlap. A team member has to learn not only the parts of the language he understands and wants to use, but also all the parts that his colleagues might use. And those parts that he might accidentally use.

While Perl has maximal syntax, Lisp has minimal syntax. But Lisp is also very expressive, albeit in a different way. Lisp makes it very easy to extend the language via macros. While Perl is a big language, Lisp is an extensible language. This can also lead to each programmer practically having their own language.

With great expressiveness comes great responsibility. A team using a highly expressive language needs to develop conventions for how the language will be used in order to avoid fracturing into multiple de facto languages.

But what if you’re a team of one? Now you don’t need to be as concerned how other people use your language. You still may need to care somewhat. You want to be able to grab sample code online, and you may want to share code or ask others for help. It pays not to be entirely idiosyncratic, though you’re free to wander further from the mainstream.

Even when you’re working in a team, you still may have code that only you use. If your team is producing C# code, and you secretively use a Perl script to help you find things in the code, no one needs to know. On the other hand, there’s a tendency for personal code to become production code, and so personal tools in a team environment are tricky.

But if you’re truly working by yourself, you have great freedom in your choice of tools. This can take a long time to sort out when you leave a team environment to strike out on your own. You may labor under your previous restrictions for a while before realizing they’re no longer necessary. At the same time, you may choose to stick to your old tools, not because they’re optimal for your new situation, but because it’s not worth the effort to retool.

Related posts

(Regarding the last link, think myth as in Joseph Campbell, not myth as in Myth Busters.)

From shell to system

Routine computer tasks and system programming require different tools, though I’m not entirely sure why.

Many people have thought about how inconsistent shells and system programming languages are and tried to unite them. Wouldn’t it be nice to use one language for everything? But attempts to bring system languages down to the shell, or to push shell programming up to large programs, have not been very successful.

I learned Perl in college so I wouldn’t have to learn shell programming. That’s what Perl was initially designed to be: an alternative to shell scripting. Larry Wall called Perl a “distillation of Unix culture.”

Perl is the most disliked programming language according to Stack Overflow. And yet I imagine many who complain about Perl gladly use the menagerie of quirky tools that Perl was created to unify. Bash is popular while Perl is unpopular, and yet the quirkiest parts of Perl are precisely those it shares with bash.

I expect much of the frustration with Perl comes from using it as a language for writing larger programs. Perl is very terse and expressive. These features are assets for one-liners and individual use. They are liabilities for large programs and team development.

Compared to a system programming language like Java, Perl is complex, inconsistent, and unsafe. But compared to shell scripting, Perl is simple, consistent, and safe!

Related posts