Naming Awk

The Awk programming language was named after the initials of its creators. In the preface to a book that just came out, The AWK Programing Language, Second Edition, the authors give a little background on this.

Naming a language after its creators shows a certain paucity of imagination. In our defense, we didn’t have a better idea, and by coincidence, at some point in the process we were in three adjacent offices in the order Aho, Weinberger, and Kernighan.

By the way, here’s a nice line from near the end of the book.

Realistically, if you’re going to learn only one programming language, Python is the one. But for small programs typed at the command line, Awk is hard to beat.

A small programming language

Paul Graham said “Programming languages teach you not to want what they don’t provide.” He meant that as a negative: programmers using less expressive languages don’t know what they’re missing. But you could also take that as a positive: using a simple language can teach you that you don’t need features you thought you needed.

Awk

I read the original awk book recently, published in 1988. It’s a small book for a small language. The language has grown since 1988, especially the Gnu implementation gawk, and yet from the beginning the language had a useful set of features. Most of what has been added since then is of no use to me.

How I use awk

It has been years since I’ve written an awk program that is more than one line. If something would require more than one line of awk, I probably wouldn’t use awk. I’m not morally opposed to writing longer awk programs, but awk’s sweet spot is very short programs typed at the command line.

At one point when I was saying how I like little awk programs, someone suggested I use Perl one-liners instead because then I’d have access to Perl’s much richer set of features, in particular Perl regular expressions. Along those lines, see these notes on how to write Perl one-liners to mimic sed, grep, and awk.

But when I was reading the awk book I thought about how I rarely need the the features awk doesn’t have, not for the way I use awk. If I were writing a large program, not only would I want more features, I’d want a different language.

Now my response to the suggestion to use Perl one-liners would be that the simplicity of awk helps me focus by limiting my options. Awk is a jig. In Paul Graham’s terms, awk teaches me not to want what it doesn’t provide.

Regular expressions

At first I wished awk were more expressive is in its regular expression implementation. But awk’s minimal regex syntax is consistent with the aesthetic of the rest of the language. Awk has managed to maintain its elegant simplicity by resisting calls to add minor conveniences that would complicate the language. The maintainers are right not to add the regex features I miss.

Awk does not support, for example, \d for digits. You have to type [0-9] instead. In exchange for such minor inconveniences you get a simple but adequate regular expression implementation that you could learn quickly. See notes on awk’s regex features here.

The awk book describes regular expressions in four leisurely pages. Perl regular expressions are an order of magnitude more complex, but not an order of magnitude more useful.

 

Simple example of Kleisli composition

Mars Climate Orbiter, artist conception, via NASA

When a program needs to work with different systems of units, it’s best to consistently use one system for all internal calculations and convert to another system for output if necessary. Rigidly following this convention can prevent bugs, such as the one that caused the crash of the Mars Climate Orbiter.

For example, maybe you need to work in degrees and radians. It would be sensible to do all calculations in radians, because that’s what software libraries expect, and output results in degrees, because that’s what humans expect.

Now suppose you have a function that takes in a length and doubles it, and another function takes in a length and triples it. Both functions take in length in kilometers but print the result in miles.

You would like the composition of the two functions to multiply a length by six. And as before, the composition would take in a speed in kilometers and return a speed in miles.

Here’s how we could implement this badly.

    miles_per_km = 5/8 # approx

    def double(length_km): 
        return 2*length_km*miles_per_km

    def triple(length_km): 
        return 3*length_km*miles_per_km

    length_km = 8
    d = double(length_km)
    print("Double: ", d)
    t = triple(d)
    print("Triple: ", t)

This prints

    Double: 10.0
    Triple: 18.75

The second output should be 30, not 18.5. The result is wrong because we converted from kilometers to miles twice. The correct implementation would be something like the following.

    miles_per_km = 0.6213712

    def double(length_km): 
        d = 2*length_km
        print("Double: ", d*miles_per_km)
        return d

    def triple(length_km): 
        t = 3*length_km
        print("Triple: ", t*miles_per_km)
        return t

    length_km = 8
    d = double(length_km)
    t = triple(d)

This prints the right result.

    Double: 10.0 
    Triple: 30.0

In abstract terms, we don’t want the composition of f and g to be simply gf.

We have a function f from X to Y that we think of as our core function, and a function T that translates the output. Say f doubles its input and T translates from kilometers to miles. Let f* be the function that takes X to TY, i.e. the combination of f and translation.

Now take another function g from Y to Z and define g* as the function that takes Y to TZ. We want the composition of f* and g* to be

g* ∘ f* = T ∘ g ∘ f.

In the example above, we only want to convert from kilometers to miles once. This is exactly what Kleisli composition does. (“Kleisli” rhymes with “highly.”)

Kleisli composition is conceptually simple. Once you understand what it is, you can probably think of times when it’s what you wanted but you didn’t have a name for it.

Writing code to encapsulate Kleisli composition takes some infrastructure (i.e. monads), and that’s a little complicated, but the idea of what you’re trying to achieve is not. Notice in the example above, what the functions print is not what they return; the print statements are a sort of side channel. That’s the mark of a monad.

Kleisli categories

The things we’ve been talking about are formalized in terms of Kleisli categories. You start with a category C and define another category that has the same objects as C does but has a different notion of composition, i.e. Kleisli composition.

Given a monad T on C, the Kleisli category CT has the same objects as C. An arrow f* from X to Y in CT corresponds to an arrow f from X to TY in C. In symbols,

HomCT(X, Y) = HomC(X, TY).

Mr. Kleisli’s motivation for defining his categories was to answer a more theoretical question—whether all monads arise from adjunctions—but more practically we can think of Kleisli categories as a way of formalizing a variation on function composition.

Related posts

Getting pulled back in

“Just when I thought I was out, they pull me back in.” — Michael Corleone, The Godfather, Part 3

My interest in category theory goes in cycles. Something will spark my interest in it, and I’ll dig a little further. Then I reach my abstraction tolerance and put it back on the shelf. Then sometime later something else comes up and the cycle repeats. Each time I get a little further.

A conversation with a client this morning brought me back to the top of the cycle: category theory may be helpful in solving a concrete problem they’re working on.

I’m skeptical of applied category theory that starts with categories. I’m more bullish on applications that start from the problem domain, a discussion something like this.

“Here’s a pattern that we’re trying to codify and exploit.”

“Category theory has a name for that, and it suggests you might also have this other pattern or constraint.”

“Hmm. That sounds plausible. Let me check.”

I think of category theory as a pattern description language, a way to turn vague analogies into precise statements. Starting from category theory and looking for applications is less likely to succeed.

When I left academia the first time, I got a job as a programmer. My first assignment was to make some change to an old Fortran program, and I started asking a lot of questions about context. My manager cut me off saying “You’ll never get here from there.” I had to work bottom-up, starting from the immediate problem. That lesson has stuck with me ever since.

Sometimes you do need to start from the top and work your way down, going from abstract to concrete, but less often that I imagined early in my career.

Code katas taken more literally

Karate class

Code katas are programming exercises intended to develop programming skills, analogous to the way katas develop martial art skills.

But literal katas are choreographed. They are rituals rather than problem-solving exercises. There may be an element of problem solving, such as figuring how to better execute the prescribed movements, but katas are rehearsal rather than improvisation.

CodeKata.com brings up the analogy to musical practice in the opening paragraph of the home page. But musical practice is also more ritual than problem-solving, at least for classical music. A musician might go through major and minor scales in all 12 keys, then maybe a chromatic scale over the range of the instrument, then two different whole-tone scales, etc.

A code kata would be more like a jazz musician improvising a different melody to the same chord changes every day. (Richie Cole would show off by improvising over the chord changes to Cherokee in all twelve keys. I don’t know whether this was a ritual for him or something he would pull out for performances.)

This brings up a couple questions. What would a more literal analog of katas look like for programming? Would these be useful?

I could imagine someone going through a prescribed sequence of keystrokes that exercise a set of software features that they wanted to keep top of mind, sorta like practicing penmanship by writing out a pangram.

This is admittedly a kind of an odd idea. It makes sense that the kinds of exercises programmers are interested in require problem solving rather than recall. But maybe it would appeal to some people.

***

Image “karate training” by Genista is licensed under CC BY-SA 2.0 .

Visualizing C operator precedence

Here’s an idea for visualizing C operator precedence. You snake your way through the diagram starting from left to right.

Operators at the same precedence level are on the same horizontal level.

Following the arrows for changing directions, you move from left-to-right through the operators that associate left-to-right and you move right-to-left through the operators that associate right-to-left.

Although this diagram is specifically for C, many languages follow the same precedence with minor exceptions. For example, all operators that Perl shares with C follow the same precedence as C.

visualization of C operator precedence

Related posts

Tool recursion

“Literature about Lisp rarely resists that narcissistic pleasure of describing Lisp in Lisp.” — Christian Queinnec, Lisp in Small Pieces

 

Applying software development tools to themselves has a dark side and a light side.

There’s a danger of becoming obsessed with one’s tools and never getting around to using them. If it’s your job to cut down a tree, there’s some benefit to sharpening your ax, but you can’t only sharpen your ax. At some point you hit diminishing return and it’s time to start chopping.

But there are benefits to self-referential systems, such as macros that use Lisp to generate Lisp, or writing a C compiler in C, or using Emacs to tweak Emacs. There’s a kind of consistency that results, and there can be a compound return on effort. But as with writing a recursive procedure, there has to be a base case, a point at which you stop recursing. Otherwise you go down the black hole of becoming absorbed in your tool and never using it for work.

Even though I’ve used Emacs for a long time, I’ve never understood the recursive fascination some people have with it. For example, part of the elevator pitch for Emacs is that it’s self-documenting. You can pull up help on Emacs from inside Emacs. But you can also type your questions into a search engine without having to learn an arcane help API. What’s the advantage to the former?

For one thing, using Emacs help inside Emacs works without a network connection. For another, you avoid the risk of being distracted by something you see when you’re using your web browser. But the most subtle benefit is the compound effect of self-reference. You can use the same navigation commands in the help system that you can when editing text, you can execute code snippets in place, etc.

When I hear “Isn’t it cool that you can do X in X?” my first thought is “Yeah, that sounds cool” but my second thought is “But why would you want to do that? Sounds like it could be really hard.” I’m starting to appreciate that there are sometimes long-term benefits to these sort of recursive tool uses even if they’re not optimal in the short run.

Nota bene

NB

I was looking at the J programming language yesterday and I was amused to see that it uses “NB.” to mark the rest of a line of source code as a comment, just like # in Python or // in C++. This makes comments in J look like comments in English prose.

“NB” abbreviates the Latin phrase nota bene meaning “note well.” It’s been used to mark side notes in English for about three centuries.

Most programming languages couldn’t use “NB” or “NB.” as a comment marker because it would inconsistent with conventions for identifiers, but J’s unconventional code syntax allows it to use conventional English notation for comments.

Why J?

I was looking at J because I have a project looking at its younger sister Remora. As described in this paper,

Remora is a higher-order, rank-polymorphic array-processing programming language, in the same general class of languages as APL and J. It is intended for writing programs to be executed on parallel hardware.

J keeps the array-oriented core of APL but drops its infamous symbols. Remora syntax is even closer to the mainstream, being written like a Lisp. (Some might object that Lisp isn’t mainstream, but it sure is compared to APL or J.)

APL comment symbol

Learning about J’s comment marker made me curious what its APL counterpart was. APL had custom symbols for everything, including comments. Comments began with ⍝ (U+235D), the idea being that the symbol looked like a lamp, giving light to the poor soul mentally parsing code.

U+235D APL FUNCTIONAL SYMBOL UP SHOE JOT

The full name for the lamp symbol is “APL FUNCTIONAL SYMBOL UP SHOE JOT.” Since this section of code is explicitly for APL symbols, why not call the symbol  COMMENT or LAMP rather than UP SHOE JOT?

I suppose the comment symbol looks like the bottom of a shoe. There’s also a symbol ⍦ (U+2366) [1] with the name “APL FUNCTIONAL SYMBOL DOWN SHOE STILE”

APL FUNCTIONAL SYMBOL DOWN SHOE STILE

and so “up” and “down” must refer to the orientation of the part of the symbol that looks like ∩ and ∪. But what about “jot” and “stile”?

A jot is a small character. The name is related to the Greek letter iota (ι) and the Hebrew letter yod (י). But if ∩ and ∪ are a shoe, the “jot” is a fairly large circle. Does “jot” have some other meaning?

A “stile” is a step or a rung, as in a turnstile. I suppose the vertical bar on top of ∪ is a stile.

Related posts

[1] What is this character for in APL? Unicode includes it as an APL symbol, but it’s not included in Wikipedia’s list of APL symbols.

Monads and Macros

There are two techniques in software development that have an almost gnostic mystique about them: monads and macros.

Pride and pragmatism

As with everything people do, monads and macros are used with mixed motives, for pride and for pragmatism.

As for pride, monads and macros have just the right barrier to entry: high enough to keep out most programmers, but not so high as to be unsurmountable with a reasonable amount of effort. They’re challenging to learn, but not so challenging that you can’t show off what you’ve learned to a wide audience.

As for pragmatism, both monads and macros can be powerful in the right setting. They’re both a form of information hiding.

Monads let you concentrate on functions by providing a sort of side channel for information associated with those functions. For example, you may have a drawing program and want to compose a rotation and stretching. These may be ostensibly pure functions, but they also effect a log of operations that lets you undo actions. It would be burdensome to consider this log as an explicit argument passed into and returned from every operation. so you might keep this log information in a monad.

Macros let you hide the fact that your programming language doesn’t have features that you would like. Operator overloading is an example of adding the appearance of a new feature to a programming language. Macros take this much further, for better or for worse. If you think operator overloading is good because of its potential to clarify code, you’ll like macros. If you think operator overload is bad because of its potential for misuse, you definitely won’t like macros.

Mutually exclusive

Few people are excited about both monads and macros; only one person that I know comes to mind.

Monads and macros appeal to opposite urges: the urge to impose rules and the urge to get around rules. There is a time for both, a time to build structure and a time to tear structure down.

Monads are most popular in Haskell, and macros in Lisp. These are very different languages and their communities have very different values [1].

The ideal Haskell program has such rigid structure that only correct code will compile. The ideal Lisp program is so flexible that it is essentially a custom programming language.

A Haskell programmer would like to say that a program is easy to reason about because of its mathematical structure. A Lisp programmer would like to say a program is easy to read because it maps well onto the problem at hand.

Lisp enthusiast Doug Hoyte says in his book Let Over Lambda

As we now know, nobody truly understands macros.

A Haskell programmer would find this horrifying, but a Lisp programmer would consider it an acceptable price to pay or even something fun to explore.

Related posts

[1] Here I’m referring to archetypes, generalizations but not exaggerations. Of course no language community is monolithic, and an individual programmer will have different approaches to different tasks. But as a sweeping generalization, Haskell programmers value structure and Lisp programmers value flexibility.

Pareto and Pandas

This post muses about what it means to learn a software library. I’ll use Pandas as an example, but the post isn’t just about Pandas.

Suppose you say “I want to learn Pandas.” That implicitly assumes Pandas one thing, and in a sense it is. In another sense Pandas is hundreds of things.

At the top level, the pandas module (version 1.2.0) has 142 things inside.

    >>> import pandas as pd
    >>> len(dir(pd))
    142

The two most important things inside are the Series and DataFrame objects. They each in turn contain hundreds of things.

    >>> len(dir(pd.Series))
    434
    >>> len(dir(pd.DataFrame))
    441

That’s evidence Pandas’ diversity. But here’s evidence of it’s unity: most of the things inside these two objects have the same names.

    >>> s = set(dir(pd.Series))
    >>> d = set(dir(pd.DataFrame))
    >>> len(s.union(d))
    491
    >>> len(s - d)
    50
    >>> len(d - s)
    57

Pandas kinda has a fractal dimension, having both complexity and unity. The best way to think about it is not as one monolithic thing, or as hundreds of isolated things. It’s a coherent, but not perfectly coherent, collection of related things. This is true of all software libraries. Pandas is more coherent than most libraries because it was initially the product of one mind, that of Wes McKinney.

This has a couple implications for what it means to “learn Pandas.” Because Pandas is big, you have to explore it strategically, not exhaustively. And because Pandas is coherent, part of what it means to learn Pandas is to develop a feel for the way Pandas does things.

No one is going to learn Pandas by studying every object, every method on every object, and every argument to every method on every object. It’s too big. That’s also unnecessary.

There’s probably something like a Pareto distribution on the usefulness of features. The most commonly used features are used far, far more often than the most obscure features.

It would be interesting to do some kind of survey to see which features are actually used and how often. But I don’t think that’s practical. The easiest thing to do would be to find some large code base that heavily uses Pandas. But that’s not typical of how Pandas is used. Probably most lines of code using Pandas are scattered over millions of small scripts, much of it not in production code.

A well-designed library makes it possible to make good guesses about functionality you haven’t used. You learn the gestalt of the library. You can always look up API documentation as needed, but you can’t develop an intuition for a library just-in-time.

“Learn Pandas” is a daunting goal, and maybe an impossible goal if by “learn” you mean explore exhaustively. But “learn how to do my common tasks quickly in Pandas” and “develop a feel for how to do things in Pandas” are much smaller tasks.

Related posts