Better parts, worse system

There’s a rule of systems thinking that improving part of a system can often make the system as a whole worse.

One example of this is Braess’ Paradox: adding roads can make traffic worse, and closing roads can improve traffic.
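To make the paradox concrete, here is a small worked example using a common textbook network. The numbers are assumptions chosen for illustration, not data from any real road system: 4000 drivers go from Start to End, two roads take (cars on the road)/100 minutes, and two roads take a fixed 45 minutes.

```shell
# Hypothetical network: 4000 drivers travel from Start to End.
# Roads Start->A and B->End take (cars on that road)/100 minutes.
# Roads A->End and Start->B take a fixed 45 minutes.
drivers=4000

# Without a shortcut the two routes are symmetric, so traffic splits evenly.
before=$(awk -v n="$drivers" 'BEGIN { print (n / 2) / 100 + 45 }')

# Add a free shortcut A->B. Start->A costs at most n/100 = 40 < 45 minutes,
# so every driver prefers it, and likewise B->End. Everyone ends up on
# Start->A->B->End, and both variable roads carry all the traffic.
after=$(awk -v n="$drivers" 'BEGIN { print n / 100 + 0 + n / 100 }')

echo "before shortcut: $before minutes; after shortcut: $after minutes"
```

Under these assumptions each driver's trip gets longer once the shortcut opens, even though no individual driver can do better by switching routes.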

Another example is the paradox of enrichment: Increasing the food available to an ecosystem may lead to instability, and even to extinction.

Richard Hamming put it this way in what he called the first rule of systems engineering:

If you optimize the components, you’ll probably ruin the system performance.

For more variations on this principle, see this Twitter thread.

I’ve been thinking about this lately in regard to software tools. If you’re just starting out and ask around for the best way to do this and that, you might get individual bits of good advice that add up to bad advice.

“You’re still using X? You should be using Y. Y’s better.”

When someone says Y is “better” they probably mean that it has more features. It may also be objectively better by several other criteria. But that doesn’t mean it’s better for you, at this point in time, given your experience, your temperament, and your project.

The Dreyfus model of skill acquisition says that beginners want context-free rules: Y is better than X. That’s an understandable desire, and people are all too willing to give context-free advice. But context matters. A lot.

Yesterday I needed to do a little search-and-replace text munging, join two data sets, and compute some basic statistics. I thought about how I could go off on tangents for each task, using the best tool for each task individually, and take forever to get my job done. Instead, I cobbled something together quickly at a command line, with each step being far from the most general solution.
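As a rough sketch of what such a cobbled-together session might look like, the following uses sed, join, and awk on two tiny made-up files. The file names, columns, and values are hypothetical, and nothing here is the most general way to do any of the three steps.

```shell
# Hypothetical sample data; names and contents are made up for illustration.
workdir=$(mktemp -d)
printf 'id,name\n2,Bob\n1,Alice\n3,Carol\n' > "$workdir/people.csv"
printf 'id,score\n3,70\n1,90\n2,80\n' > "$workdir/scores.csv"

# 1. Search and replace: drop the header, turn commas into spaces,
#    and sort on the key so that join will accept the file.
prep() { sed -e '1d' -e 's/,/ /g' "$1" | sort -k1,1; }
prep "$workdir/people.csv" > "$workdir/people.txt"
prep "$workdir/scores.csv" > "$workdir/scores.txt"

# 2. Join the two data sets on their first field.
join "$workdir/people.txt" "$workdir/scores.txt" > "$workdir/joined.txt"

# 3. Basic statistics: count and mean of the score column.
summary=$(awk '{ n++; s += $3 } END { printf "n=%d mean=%.1f", n, s / n }' "$workdir/joined.txt")
echo "$summary"

rm -r "$workdir"
```

This would break on quoted CSV fields or names containing spaces, which is the point: it is good enough for the data at hand and took minutes rather than hours.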

If you choose the best (i.e. biggest) tool for each task, you’ll get affirmation, or at least avoid criticism, for each choice. And you’ll likely end up with an unwieldy hodgepodge of tools, none of which you’re confident in using. If instead you think carefully about your abilities and your needs, you’ll gradually assemble a collection of tools that work together for you, making you more confident and productive.

Offline documentation

It’s simpler to search the web than to search software-specific documentation. You can just type your query into a search engine and not have to be bothered by the differences in offline documentation systems for different software. But there are a couple of disadvantages.

First, the result may not be that relevant. For example, maybe you have a question about LaTeX typesetting and you get back results about rubber. And even if the result is relevant, it might not be accurate or up-to-date.

Second, you might not always be online. You might lose your internet connection, or you might deliberately stay offline for a block of time in order to concentrate better.

A convenient way to grab online documentation for a lot of software packages is to use Dash for macOS or Zeal on Windows and Linux.

If you use a particular piece of software a lot, you probably want to learn how to use its native documentation system. It’s hard to do this for lots of different tools, hence the popularity of the generic web search, but it’s worthwhile for a small number of high priority tools.

Technological boundary layer

The top sides of your ceiling fan blades are dusty because of a boundary layer effect. When the blades spin, a thin layer of air above the blades moves with the blades. That’s why the fan doesn’t throw off the dust.

A mathematical model may have very different behavior in two large regions, with a thin region of rapid transition between the two, such as the transition between the ceiling fan blade and the circulating air in the room. That transition region is called a boundary layer. In this post I want to use the term boundary layer metaphorically.

Some information you use so frequently that you memorize it without trying. And some information you use so infrequently that it’s not worth memorizing it.

There’s not much need to deliberately memorize how software tools work because that information mostly falls into one of the two categories above. Either you use the tool so often that you remember how to use it, or you use the tool so rarely that it’s best to look up how to use it just-in-time.

But there’s a thin region between these two categories: things you use often enough that it’s annoying to keep looking up how to use them, but not so often that you remember the details. In this technological boundary layer, it might be worthwhile to deliberately memorize and periodically review how things work. Maybe this takes the form of flashcards [1] or exercises.

This boundary layer must be kept very small. If you spend more than a few minutes a day on it, you’re probably taking on too much. YMMV.

Things may move in and out of your technological boundary layer over time. Maybe you use or review something so much that it moves into long-term memory. Or maybe you use it less often and decide to let it slip into things you look up as needed.


[1] Paper or electronic? In my experience, making paper flashcards helps me memorize things, even if I don’t review them. But I’ve also used the Anki software before and found it helpful.

Just-in-case revisited

Just-in-time learning means learning something just when you need it. The alternative is just-in-case, learning something in case you need it. I discussed this in an earlier post, and today I’d like to add a little to that discussion.

There are some things you need to know (or at least be familiar with) before you have a chance to use them. Here’s a variation on that idea: some things you need to have practiced before you need them in order to overcome an effort barrier.

Suppose you tell yourself that you’ll learn to use Photoshop or GIMP when you need to. Then you need to edit a photo. Faced with the prospect of learning either of these software packages, you might decide that the photo in question looks good enough after all.

There are things that in principle you could learn just-in-time, though in practice this is not psychologically feasible. The mental “activation energy” is too high. Some things you need to practice beforehand, not because you couldn’t look them up when needed, but because they would be too daunting to learn when needed.

Related post: Bicycle skills

Simultaneous projects

I said something to my wife this evening to the effect that it’s best for employees to have one or at most two projects at a time. Two is good because you can switch off when you’re tired of one project or if you’re waiting on input. But with three or more projects you spend a lot of time task switching.

She said “But …” and I immediately knew what she was thinking. I have a lot more than two projects going on. In fact, I would have to look at my project tracker to know exactly how many projects I have going on right now. How does this reconcile with my statement that two projects is optimal?

Unless you’re doing staff augmentation contracting, consulting work is substantially different from salaried work. For one thing, projects tend to be smaller and better defined.

Also, consultants, at least in my experience, spend a lot of time waiting on clients, especially when the clients are lawyers. So you take on more work than you could handle if everyone wanted your attention at once. At least you work up to that if you can. You balance the risk of being overwhelmed against the risk of not having enough work to do.

Working for several clients in a single day is exhausting, but that’s usually not necessary. My ideal is to do work for one or two clients each day, even if I have a lot of clients who are somewhere between initial proposal and final invoice.

Make boring work harder

I was searching for something this morning and ran across several pages where someone blogged about software they wrote to help write their dissertations. It occurred to me that this is a pattern: I’ve seen a lot of writing tools that came out of someone writing a dissertation or some other book.

The blog posts leave the impression that the tools required more time to develop than they would save. This suggests that developing the tools was a form of moral compensation, procrastinating by working on something that feels like it’s making a contribution to what you ought to be doing.

Even so, developing the tools may have been a good idea. As with many things in life, it makes more sense when you ask “Compared to what?” If the realistic alternative to futzing around with scripts was to write another chapter of the dissertation, then developing the tools was not the best use of time, assuming they don’t actually save more time than they require.

But if the realistic alternative was binge watching some TV series, then writing the tools may have been a very good use of time. Any time the tools save is profit if the time that went into developing them would otherwise have been wasted.

Software developers are often criticized for developing tools rather than directly developing the code they’re paid to write. Sometimes these tools really are a good investment. But even when they’re not, they may be better than the realistic alternative. They may take time away from Facebook rather than time away from writing production code.

Another advantage to tool building, aside from getting some benefit from time that otherwise would have been wasted, is that it builds momentum. If you can’t bring yourself to face the dissertation, but you can bring yourself to write a script for writing your dissertation, you might feel more like facing the dissertation afterward.

Related post: Automate to save mental energy, not time

What sticks in your head

This morning I read an article by Dennis Felsing about his impressive/intimidating Linux desktop setup. He uses a lot of tools that are not the easiest way to get things done immediately but are long-term productivity investments.

Remembrance of syntax past

Felsing apparently is able to remember the syntax of scores of tools and programming languages. I cannot. Part of the reason is practice. I cannot remember the syntax of any software I don’t use regularly. It’s tempting to say that’s the end of the story: use it or lose it. Everybody has their set of things they use regularly and remember.

But I don’t think that’s all. I remember bits of math that I haven’t used in 30 years. Math fits in my head and sticks. Presumably software syntax sticks in the heads of people who use a lot of software tools.

There is some software syntax I can remember, however, and that’s software closely related to math. As I commented here, it was easy to come back to Mathematica and LaTeX after not using them for a few years.

Imprinting

Imprinting has something to do with this too: it’s easier to remember what we learn when we’re young. Felsing says he started using Linux in 2006, and his site says he graduated college in 2012, so presumably he was a high school or college student when he learned Linux.

When I was a student, my software world consisted primarily of Unix, Emacs, LaTeX, and Mathematica. These are all tools that I quit using for a few years, later came back to, and use today. I probably remember LaTeX and Mathematica syntax in part because I used it when I was a student. (I also think Mathematica in particular has an internal consistency that makes its syntax easier to remember.)

Picking your memory battles

I see the value in Felsing’s choice of tools. For example, the xmonad window manager. I’ve tried it, and I could imagine that it would make you more productive if you mastered it. But I don’t see myself mastering it.

I’ve learned a few tools with lots of arbitrary syntax, e.g. Emacs. But since I don’t have a prodigious memory for such things, I have to limit the number of tools I try to keep loaded in memory. Other things I load as needed, such as a language a client wants me to use that I haven’t used in a while.

Revisiting a piece of math doesn’t feel to me like revisiting a programming language. Brushing up on something from differential equations, for example, feels like pulling a book off a mental shelf. Brushing up on C# feels like driving to a storage unit, bringing back an old couch, and struggling to cram it in the door.

Middle ground

There are things you use so often that you remember their syntax without trying. And there are things you may never use again, and it’s not worth memorizing their syntax just in case. Some things are in the middle: things you don’t use often enough to naturally remember, but often enough that you’d like to deliberately remember them. Some of these are what I call bicycle skills, things that you can’t learn just-in-time. For things in this middle ground, you might try something like Anki, a flashcard program with spaced repetition.

However, this middle ground should be very narrow, at least in my experience/opinion. For the most part, if you don’t use something often enough to keep it loaded in memory, I’d say either let it go or practice using it regularly.


The hard part in becoming a command line wizard

I’ve long been impressed by shell one-liners. They seem like magical incantations. Pipe a few terse commands together, et voilà! Out pops the solution to a problem that would seem to require pages of code.

Source http://dilbert.com/strip/1995-06-24

Are these one-liners real or mythology? To some extent, they’re both. Below I’ll give a famous real example. Then I’ll argue that even though such examples do occur, they may create unrealistic expectations.

Bentley’s exercise

In 1986, Jon Bentley posted the following exercise:

Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.

Donald Knuth wrote an elegant program in response. Knuth’s program runs for 17 pages in his book Literate Programming.

Doug McIlroy responded with a shell script of his own. McIlroy’s solution is short enough to quote below [1].

    tr -cs A-Za-z '
    ' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed ${1}q

McIlroy’s response to Knuth was like Abraham Lincoln’s response to Edward Everett at Gettysburg. Lincoln’s famous address was 50x shorter than that of the orator who preceded him [2]. (Update: There’s more to the story. See [3].)

Knuth and McIlroy had very different objectives and placed different constraints on themselves, and so their solutions are not directly comparable. But McIlroy’s solution has become famous. Knuth’s solution is remembered, if at all, as the verbose program that McIlroy responded to.

The stereotype of a Unix wizard is someone who could improvise programs like the one above. Maybe McIlroy carefully thought about his program for days, looking for the most elegant solution. That would seem plausible, but in fact he says the script was “written on the spot and worked on the first try.” He said that the script was similar to one he had written a year before, but it still counts as an improvisation.
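For readers who don’t have the idiom in their fingers, here is a lightly adapted version of the same pipeline, wrapped in a function with a comment on each stage. The function name `wordfreq` and the sample sentence are my own, and this version writes the newline as `'\n'` rather than as a quoted line break, which assumes a `tr` that understands that escape.

```shell
# wordfreq K: read text on stdin, print the K most common words with counts.
wordfreq() {
    tr -cs 'A-Za-z' '\n' |   # replace each run of non-letters with one newline
    tr 'A-Z' 'a-z' |         # fold everything to lower case
    sort |                   # bring identical words together
    uniq -c |                # collapse duplicates, prefixing each with its count
    sort -rn |               # order by count, largest first
    sed "${1}q"              # quit after printing K lines
}

# Hypothetical input, just to exercise the function.
printf 'the cat and the dog and the bird\n' | wordfreq 2
```

Each stage does one small thing to a stream of lines, and none of them knows anything about the problem as a whole.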

Why can’t I write scripts like that?

McIlroy’s script was a real example of the kind of wizardry attributed to Unix adepts. Why can’t more people quickly improvise scripts like that?

The exercise that Bentley posed was the kind of problem that programmers like McIlroy solved routinely at the time. The tools he piped together were developed precisely for such problems. McIlroy didn’t see his solution as extraordinary but said “Old UNIX hands know instinctively how to solve this one in a jiffy.”

The traditional Unix toolbox is full of utilities for text manipulation. Not only are they useful, but they compose well. This composability depends not only on the tools themselves, but also the shell environment they were designed to operate in. (The latter is why some utilities don’t work as well when ported to other operating systems, even if the functionality is duplicated.)

Bentley’s exercise was clearly text-based: given a text file, produce a text file. What about problems that are not text manipulation? The trick to being productive from a command line is to turn problems into text manipulation problems. The output of a shell command is text. Programs are text. Once you get into the necessary mindset, everything is text. This may not be the most efficient approach to a given problem, but it’s a possible strategy.
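Here is one small illustration of that mindset, a sketch rather than a recommendation. The question “which kinds of files are most common in this directory tree?” is about a file system, not a text file, but `find` turns the tree into lines of text and the rest is ordinary text processing. The function name `ext_counts` and the demo files are hypothetical.

```shell
# ext_counts DIR: count files under DIR by extension, most common first.
ext_counts() {
    find "$1" -type f |                  # the file tree as text, one path per line
    sed -n 's/.*\.\([^./]*\)$/\1/p' |    # keep only the extension; skip names without one
    sort |
    uniq -c |
    sort -rn
}

# Hypothetical demo directory.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.csv" "$demo/README"
ext_counts "$demo"
rm -r "$demo"
```

This treats a hidden file such as `.profile` as having the extension `profile`, which may or may not be what you want. That kind of rough edge is typical of the approach.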

The hard part

The hard part on the path to becoming a command line wizard, or any kind of wizard, is thinking about how to apply existing tools to your particular problems. You could memorize McIlroy’s script and be prepared next time you need to report word frequencies, but applying the spirit of his script to your particular problems takes work. Reading one-liners that other people have developed for their work may be inspiring, or intimidating, but they’re no substitute for thinking hard about your particular work.

Repetition

You get faster at anything with repetition. Maybe you don’t solve any particular kind of problem often enough to be fluent at solving it. If someone can solve a problem by quickly typing a one-liner in a shell, maybe they are clever, or maybe their job is repetitive. Or maybe both: maybe they’ve found a way to make semi-repetitive tasks repetitive enough to automate. One way to become more productive is to split semi-repetitive tasks into more creative and more repetitive parts.


[1] The odd-looking line break is a quoted newline.

[2] Everett’s speech contained 13,607 words while Lincoln’s Gettysburg Address contained 272, a ratio of almost exactly 50 to 1.

[3] See Hillel Wayne’s post Donald Knuth was Framed. Here’s an excerpt:

Most of the “eight pages” aren’t because Knuth is doing LP [literate programming], but because he’s Donald Knuth:

  • One page is him setting up the problem (“what do we mean by ‘word’? What if multiple words share the same frequency?”) and one page is just the index.
  • Another page is just about working around specific Pascal issues no modern language has, like “how do we read in an integer” and “how do we identify letters when Pascal’s character set is poorly defined.”
  • Then there’s almost four pages of handrolling a hash trie.

The “eight pages” refers to the length of the original publication. I described the paper as 17 pages because that is the length in the book where I found it.

Why “work smarter, not harder” bothers me


One of my most popular posts on Twitter was an implicit criticism of the cliché “work smarter, not harder.”

I agree with the idea that you can often be more productive by stepping back and thinking about what you’re doing. I’ve written before, for example, that programmers need to spend less time in front of a computer.

But one thing I don’t like about “work smarter” is the implication that being smart eliminates the need to work hard. It’s like a form of gnosticism.

Also, “working smarter” is kind of a given. People don’t often say “I know of a smarter way to do this, but I prefer working hard at the dumb way.” [1] Instead, they’re being as smart as they know how, or at least they think they are. To suggest otherwise is to insult their intelligence.

One way to “work smarter, not harder” is to take good advice. This is different from “working smarter” in the sense of thinking alone in an empty room, waiting for a flash of insight. Maybe you’re doing what you’re doing as well as you can, but you should be doing something else. Maybe you’re cleverly doing something that doesn’t need to be done.


[1] If they do, they’re still being smart at a different level. Someone might think “Yeah, I know of a way to do this that would be more impressive. But I’m going to take a more brute-force approach that’s more likely to be correct.” Or they might think “I could imagine a faster way to do this, but I’m too tired right now to do that.” They’re still being optimal, but they’re including more factors in the objective they’re trying to optimize.

Pareto’s 80-20 rule


Pareto’s 80-20 rule says that 80% of your results often come from 20% of your effort. Maybe 80% of your profit comes from 20% of your customers, or maybe 80% of the bugs in your software are removed in the first 20% of the time you spend debugging.

The rule is named after Italian economist Vilfredo Pareto, who observed that 80% of his country’s land belonged to 20% of its population. The exact ratio of 80-20 isn’t important, though it is surprisingly common. The same principle applies whenever a large majority of effects come from a small number of causes.

The 80-20 rule, or Pareto principle, is startling the first time you hear it. It suggests you can be a lot more productive by focusing your effort where it does the most good. For example, there may be 100,000 to 1,000,000 words in English, depending on how you count them. But you could be pretty fluent in English by knowing the 1,000 most common words.

The thousand most frequently used words in any language are far more important than all the rest combined. Studying these words first makes much more sense than a uniformitarian approach, going through a dictionary in alphabetic order on the assumption that all words are equally important.
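If you’d like to check this kind of claim against a text of your own, here is a minimal sketch that reports what share of all word occurrences is accounted for by the N most common words. The function name `coverage` and the sample sentence are made up for illustration, and “word” here simply means a run of ASCII letters.

```shell
# coverage N: read text on stdin, print the percentage of all word
# occurrences accounted for by the N most frequent words.
coverage() {
    tr -cs 'A-Za-z' '\n' |       # one word per line
    tr 'A-Z' 'a-z' |             # fold to lower case
    sed '/^$/d' |                # drop any empty line left at the start
    sort | uniq -c | sort -rn |  # counts, largest first
    awk -v n="$1" '
        { total += $1; if (NR <= n) top += $1 }
        END { if (total) printf "%.1f%%\n", 100 * top / total }'
}

# Hypothetical input: the two most common words cover 5 of 8 occurrences.
printf 'the cat and the dog and the bird\n' | coverage 2
```

Run on a long text with N set to 1000, this gives a rough sense of how unevenly word usage is distributed in that text.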

I’ve thought about the Pareto principle off and on for many years. When I bring it up for discussion, people are often defensive, bringing up the same objections every time.

Objections

The most common objection is the recursive argument. If you could be more effective by focusing on the 20% that’s most important, then you should do that again: focus on the 20% of the 20% that’s most important. Apply this argument repeatedly and you can be infinitely productive with no effort.

The recursive argument takes the “80” and “20” of the 80-20 rule too literally. The point is not the exact ratios. The point is that return on effort invested is not uniformly distributed. In fact, it’s often far from uniformly distributed. I prefer the term Pareto principle to “80-20 rule” just because it does not reference particular numbers that could distract from the general principle.

Could you apply a Pareto principle recursively to English words, say by focusing on the 200 most common words? In fact you could. But that doesn’t mean that you could keep doing this repeatedly, learning only the most common word (“the”) and declaring yourself fluent in English. This doesn’t negate the fact that the importance of English words is very unevenly distributed.

Another objection is the completionist argument. It says that everything has to be done, so the fact that you get less return on some things than others doesn’t matter. For example, the letters E, T, and A appear about 100 times as often as J, Q, and Z. That doesn’t mean you could leave J, Q, and Z off your keyboard. On the other hand, it does mean that you might design a keyboard so that E, T, and A are easier to reach than J, Q, and Z. And Samuel Morse was smart to assign his shortest codes to the most frequently used letters. [1]

A final objection is the ignorance argument: we simply don’t know what the most effective 20% will be beforehand. This is a serious objection, and it should temper our optimism regarding the Pareto principle. If a salesman knew which 20% of his prospects were going to buy, he should just sell to them. But of course he doesn’t know ahead of time who those 20% will be. On the other hand, he has some idea who is likely to buy (and how much they may buy) and doesn’t approach prospects randomly.

These objections take the Pareto principle to extremes to justify disregarding it. Since you can’t repeatedly apply it indefinitely, there must be nothing to it. Or if you can’t completely eliminate the least productive work, you should treat everything equally. Or if you don’t have absolute certainty regarding what’s most important, you shouldn’t consider what’s likely to be most important.

Applications

Despite the objections above, it is true that returns on effort are often very unevenly distributed. There’s a common tendency to underestimate the variance [2]. We might have a rough idea how effective a list of possible actions would be, and maybe imagine that the most effective choice would be ten times better than the least effective choice, but in fact the ratio might be a hundred to one or even a thousand to one [3]. Somehow we mentally compress these ratios, maybe on something like a logarithmic scale.

So one key to taking advantage of the Pareto principle is simply to keep in mind that something like the Pareto principle might hold. You’re not likely to find a Pareto rule if you don’t think such rules exist.

Another key is to be honest with ourselves regarding how effective we want to be. Maybe the most effective thing to do is something we simply don’t want to do. If so, we can either make a principled decision to not do what we know to be more effective, or get over our sloth.

I mentioned ignorance above. “Uncertainty” is a more helpful word than “ignorance” here because we’re not often completely ignorant. We usually have some idea which actions are more likely to be effective. Data can help. Start by using whatever information or intuition you have, and update it as you gather data.

This could be a formal Bayesian process if you have quantifiable data. Or it could be as simple as just trying something. If it works, try it again. If not, try something different. You may be able to bootstrap this “play the winner” strategy until you have enough data to be more formal about making decisions.

***

[1] How well does Morse code symbol length correspond to frequency? I looked into that here.

[2] I have a friend who has helped me with this. He will suggest I do X, and I agree, but say I’d rather do Y. Then he will reply with something like “Sure, you could do that. But X could be a thousand times more effective. It’s up to you.” I’ve done the same for others. It’s easier to see someone else’s decisions objectively than your own.

[3] This is not an exaggeration. I’ve seen this, for example, in software optimization. Some changes might make 1,000x more of a difference than others.