Improving on the Unix shell

Yesterday I ran across Askar Safin’s blog post The Collapse of the UNIX Philosophy. Two quotes from the post stood out. One was from Rob Pike about the Unix ideal of little tools that each do one job:

Those days are dead and gone and the eulogy was delivered by Perl.

The other was a line from James Hague:

… if you romanticize Unix, if you view it as a thing of perfection, then you lose your ability to imagine better alternatives and become blind to potentially dramatic shifts in thinking.

This brings up something I’ve long wondered about: What did the Unix shell get right that has made it so hard to improve on? It has some truly awful quirks, and yet people keep coming back to it. Alternatives that seem more rational don’t work so well in practice. Maybe it’s just inertia, but I don’t think so. There are other technologies from the 1970s that had inertia behind them but have been replaced. The Unix shell got something so right that it’s worth tolerating the flaws. Maybe some of the flaws aren’t even flaws but features that serve some purpose that isn’t obvious.

(By the way, when I say “the Unix shell” I have in mind similar environments as well, such as the Windows command line.)

On a related note, I’ve wondered why programming languages and shells work so differently. We want different things from a programming language and from a shell or REPL. Attempts to bring a programming language and shell closer together sound great, but they inevitably run into obstacles. At some point, we have different expectations of languages and shells and don’t want the two to be too similar.

Anthony Scopatz and I discussed this in an interview a while back in the context of xonsh, “a Python-powered, cross-platform, Unix-gazing shell language and command prompt.” While writing this post I went back to reread Anthony’s comments and appreciate them more now than I did then.

Maybe the Unix shell is near a local optimum. It’s hard to make much improvement without making big changes. As Anthony said, “you quickly end up where many traditional computer science people are not willing to go.”

Related post: What’s your backplane?

Big data and the law


Excerpt from the new book Big Data of Complex Networks:

Big Data and data protection law provide for a number of mutual conflicts: from the perspective of Big Data analytics, a strict application of data protection law as we know it today would set an immediate end to most Big Data applications. From the perspective of the law, Big Data is either a big threat … or a major challenge for international and national lawmakers to adopt today’s data protection laws to the latest technological and economic developments.

Emphasis added.

The author of the chapter on legal matters is Swiss and writes primarily in a European context, though all countries face similar problems.

I’m not a lawyer, though I sometimes work with lawyers as a technical expert, and I sometimes help companies with the statistical aspects of HIPAA law. But as a layman, the observation above sounds reasonable to me: strict application of the law could bring many applications to a halt, for better and for worse.

In my opinion the regulations around HIPAA and de-identification are mostly reasonable. The things they prohibit mostly should be prohibited. And there is a common sense provision in the form of expert determination: if your data uses fall outside the regulation’s specific recommendations but don’t endanger privacy, you can have an expert certify that this is the case.


Branch cuts and Common Lisp

“Nearly everything is really interesting if you go into it deeply enough.” — Richard Feynman

If you thumb through Guy Steele’s book Common Lisp: The Language, 2nd Edition, you might be surprised how much space is devoted to defining familiar functions: square root, log, arcsine, etc. He gives some history of how these functions were first defined in Lisp and then refined in 1989 by the ANSI standardization committee X3J13.

There are three sources of complexity:

  1. Complex number arguments
  2. Multi-valued functions
  3. +0 and -0

Complex arguments

The functions under discussion are defined for either real or complex inputs. This does not complicate things much in itself. Defining some functions for complex arguments, such as the exp function, is simple. The complication comes from the interaction of complex arguments with multi-valued functions and floating point representations of zero.

Multi-valued functions

The tricky functions to define are inverse functions, functions where we have to make a choice of range.

Real multi-valued functions

Let’s restrict our attention to real numbers for a moment. How do you define the square root of a positive number x? There are two solutions to the equation y² = x, and √x is defined to be the positive solution.

What about the arcsine of x? This is the number whose sine is x. Except there is no “the” number. There are infinitely many numbers whose sine is x, so we have to make a choice. It seems natural to choose values in an interval symmetric about 0, so we take the arcsine of x to be the number between -π/2 and π/2 whose sine is x.

Now what about arctangent? As with arcsine, we have to make a choice because for any x there are infinitely many numbers y whose tangent is x. Again it’s convenient to define the range to be in an interval symmetric about zero, so we define the arctangent of x to be the number y between -π/2 and π/2 whose tangent is x. But now we have a subtle complication with tangent we didn’t have with sine because tangent is unbounded. How do we want to define the tangent of a vertical angle? Should we call it ∞ or -∞? What do we want to return if someone asks for the arctangent of ±∞? Should we return π/2 or -π/2?
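For a concrete reference point, here is how these questions are answered under the IEEE 754 conventions, illustrated with Python’s math module. This is just one common set of choices, shown for illustration; the Common Lisp decisions are discussed below.

    import math

    # arctangent of +/- infinity: the limiting values +/- pi/2
    print(math.atan(math.inf))     #  1.5707963..., i.e.  pi/2
    print(math.atan(-math.inf))    # -1.5707963..., i.e. -pi/2

    # the two-argument form resolves the "vertical angle" question
    # by looking at the signs of both arguments
    print(math.atan2(1.0, 0.0))    #  pi/2
    print(math.atan2(-1.0, 0.0))   # -pi/2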

Complex multi-valued functions

The discussion shows there are some minor complications in defining inverse functions on the real line. Things get more complicated when working in the complex plane. To take the square root example, it’s easy to say we’re going to define square root so that the square root of a positive number is another positive number. Fine. But which solution to z² = w should we take to be the square root of a complex number w, such as 2 + 3i or -5 + 17i?

Or consider logarithms. For positive numbers x there is only one real number y such that exp(y) = x. But what if we take a negative value of x such as -1? There’s no real number whose exponential is -1, but there is a complex number. In fact, there are infinitely many complex numbers whose exponential is -1. Which one should we choose?
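To make the non-uniqueness concrete, here is a quick illustration using Python’s cmath module, used only as a familiar stand-in; its branch cut conventions are essentially the ones Common Lisp adopted. The complex exponential hits -1 infinitely often, and log has to pick one of those preimages, the so-called principal value.

    import cmath, math

    # exp(i*pi*(2k+1)) = -1 for every integer k (up to rounding error)
    for k in (-1, 0, 1, 2):
        print(cmath.exp(1j * math.pi * (2*k + 1)))

    # the library's log picks the principal value, with imaginary part pi
    print(cmath.log(-1))           # 3.141592653589793j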

Floating point representations of zero

A little-known feature of floating point arithmetic (specified by the IEEE 754 standard) is that there are two kinds of zero: +0 and -0. This sounds bizarre at first, but there are good reasons for it, which I explain here. But defining functions to work properly with two kinds of zero takes a lot of work. This was the main reason the ANSI Common Lisp committee had to revise its definitions of several transcendental functions. If a function has a branch cut discontinuity along the real axis, for example, you want your definition to be continuous as you approach x + 0i from above and as you approach x - 0i from below.
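Here is a small demonstration of that interaction, again using Python’s cmath as a stand-in (assuming CPython on IEEE 754 hardware). The two zeros compare equal, but the sign of a zero imaginary part determines which side of the negative real axis you are on, and therefore which value sqrt and log return.

    import math, cmath

    # +0.0 and -0.0 compare equal, but the sign bit is really there
    print(0.0 == -0.0)                       # True
    print(math.copysign(1.0, -0.0))          # -1.0

    # the sign of the zero imaginary part picks a side of the branch cut;
    # complex(-4.0, -0.0) is used because the literal -4-0j loses the sign
    print(cmath.sqrt(complex(-4.0,  0.0)))   #  2j  (approaching the cut from above)
    print(cmath.sqrt(complex(-4.0, -0.0)))   # -2j  (approaching from below)
    print(cmath.log(complex(-1.0,  0.0)))    #  pi*i
    print(cmath.log(complex(-1.0, -0.0)))    # -pi*i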

The Common Lisp solution

I’ll cut to the chase and present the solution the X3J13 came up with. For a discussion of the changes this required and the detailed justifications, see Guy Steele’s book.

The first step is to carefully define the two-argument arctangent function (atan y x) for all 16 combinations of y and x being positive, negative, +0, or -0. Then other functions are defined as follows.

  1. Define phase in terms of atan.
  2. Define complex abs in terms of real sqrt.
  3. Define complex log in terms of phase, complex abs, and real log.
  4. Define complex sqrt in terms of complex log.
  5. Define everything else in terms of the functions above.

The actual implementations may not follow this sequence, but they have to produce results consistent with this sequence.

The phase of z is defined as the arctangent with arguments the imaginary and real parts of z.

The complex log of z is defined as log |z| + i phase(z).

Square root of z is defined as exp( log(z) / 2 ).
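Here is a minimal Python sketch of steps 1 through 4 of that chain (mine, not the Common Lisp source; cmath is used only to check the results): phase from the two-argument arctangent, complex absolute value from real square root, complex log from those, and complex square root from complex log.

    import math, cmath

    def phase(z):
        # step 1: phase via the two-argument arctangent of (imag, real)
        return math.atan2(z.imag, z.real)

    def cabs(z):
        # step 2: |z| via real sqrt (math.hypot would be the robust choice)
        return math.sqrt(z.real**2 + z.imag**2)

    def clog(z):
        # step 3: log z = log|z| + i * phase(z)
        return complex(math.log(cabs(z)), phase(z))

    def csqrt(z):
        # step 4: sqrt z = exp(log(z) / 2)
        return cmath.exp(clog(z) / 2)

    z = complex(-5.0, 17.0)
    print(csqrt(z), cmath.sqrt(z))   # agree up to rounding error
    print(clog(z),  cmath.log(z))

Because atan2 respects the sign of zero, this chain automatically handles the ±0 cases discussed above: csqrt(complex(-4.0, -0.0)) comes out as (approximately) -2i rather than 2i.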

The inverses of circular and hyperbolic functions are defined as follows.

\begin{align*}
\arcsin (z) &= -i \log \left( iz + \sqrt{1 - z^2} \right) \\
\arccos (z) &= -i \log \left( z + i \sqrt{1 - z^2} \right) \\
\text{arcsinh} (z) &= \log \left( z + \sqrt{1 + z^2} \right) \\
\text{arccosh} (z) &= 2 \log\left( \sqrt{(z+1)/2} + \sqrt{(z-1)/2} \right) \\
\text{arctanh} (z) &= \log \left( (1+z) \sqrt{1/(1 - z^2)}\right)
\end{align*}

Note that there are many ways to define these functions that seem to be equivalent, and are equivalent in some region. Getting the branch cuts right is what makes this complicated.
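As a spot check, the arcsin formula can be compared against a library implementation that uses essentially the same branch cut conventions (Python’s cmath here, purely for illustration); away from the branch cuts the two agree up to rounding error.

    import cmath

    def asin_via_log(z):
        # arcsin(z) = -i log(iz + sqrt(1 - z^2)), as in the formulas above
        return -1j * cmath.log(1j*z + cmath.sqrt(1 - z*z))

    for z in (0.3 + 0.4j, -2 + 1j, 1.5 - 0.25j):
        print(asin_via_log(z), cmath.asin(z))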

Ultra-reliable software

From a NASA page advocating formal methods:

We are very good at building complex software systems that work 95% of the time. But we do not know how to build complex software systems that are ultra-reliably safe (i.e. P_f < 10^-7/hour).

Emphasis added.

Developing medium-reliability and high-reliability software are almost entirely different professions. Using typical software development procedures on systems that must be ultra-reliable would invite disaster. But using extremely cautious development methods on systems that can afford to fail relatively often would be an economic disaster.

Related post: Formal validation methods let you explore the corners


Technological allegiances

I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from one to the other. In the words of Stephen Sondheim, “Is it always or, and never and?”

Rivalries seem sillier to outsiders the more similar the two options are. And yet this makes sense. I’ve forgotten the psychological term for this, but it has a name: Similar things compete for space in your brain more than things that are dissimilar. For example, studying French can make it harder to spell English words. (Does literature have two t’s in French and one in English or is it the other way around?) But studying Chinese doesn’t impair English orthography.

It’s been said that academic politics are so vicious because the stakes are so small [1]. Similarly, there are fierce technological loyalties because the differences with competing technologies are so small, small enough to cause confusion. My favorite example: I can’t keep straight which languages use else if, elif, elseif, … in branching.

If you have to learn two similar technologies, it may be easier to devote yourself exclusively to one, then to the other, then use both and learn to keep them straight.

Related post: Ford-Chevy arguments in technology

[1] I first heard this attributed to Henry Kissinger, but there’s no agreement on who first said it. Several people have said similar things.


Gentle introduction to R

The R language is closely tied to statistics. Its ancestor was named S, because it was a language for Statistics. The open source descendant could have been named ‘T’, but its creators chose to call it ‘R’.

Most people learn R as they learn statistics: Here’s a statistical concept, and here’s how you can compute it in R. Statisticians aren’t that interested in the R language itself but see it as connective tissue between commands that are their primary interest.

This works for statisticians, but it makes the language hard for non-statisticians to approach. Years ago I managed a group of programmers who supported statisticians. At the time, there were no books for learning R without concurrently learning statistics. This created quite a barrier to entry for programmers whose immediate concern was not the statistical content of an R program.

Now there are more books on R, and some are more approachable to non-statisticians. The most accessible one I’ve seen so far is Learning Base R by Lawrence Leemis. It gets into statistical applications of R—that is ultimately why anyone is interested in R—but it doesn’t start there. The first 40% or so of the book is devoted to basic language features, things you’re supposed to pick up by osmosis from a book focused more on statistics than on R per se. This is the book I wish I could have handed my programmers who had to pick up R.


Efficiency of C# on Linux

This week I attended Mads Torgersen’s talk Why you should take another look at C#. Afterward I asked him about the efficiency of C# on Linux. When I last looked into it, it wasn’t good. A few years ago I asked someone on my team to try running some C# software on Linux using Mono. The code worked correctly, but it ran several times slower than on Windows.

Mads said that now with .NET Core, C# code runs about as fast on Linux as on Windows. Maybe a few percent slower on Linux. Scott Hanselman joined the conversation and explained that with .NET Core, the same code runs on every platform. The Mono project duplicated much of the functionality of the .NET framework, but it was an entirely independent implementation and not as efficient.

I had assumed the difference was due to compiler optimizations, or the lack thereof, but Scott and Mads said the difference was mostly in the implementation of the framework code. There are some compiler optimizations that are better on the Windows side, so C# might run a little faster on Windows, but the performance is essentially the same on both platforms.

I could recommend C# to a client running Linux if there’s a 5% performance penalty, but a 500% penalty would have been a show-stopper. Now I’d consider using C# on a project where I need more performance than Python or R, but want something easier to work with than C++.

Years ago I developed software with the Microsoft stack, but I moved away from Microsoft tools when I quit doing the kind of software development the tools are geared for. So I don’t write C# much any more. It’s been about a year since I had a project where I needed to write C# code. But I like the C# language. You can tell that a lot of thought has gone into the design, and the tool support is great. Now that the performance is better on Linux I’d consider using it for numerical simulations.

Humble Lisp programmers

Maybe from the headline you were expecting a blank post? No, that’s not where I’m going.

Yesterday I was on Amazon.com and noticed that nearly all the books they recommended for me were either about Lisp or mountain climbing. I thought this was odd, and mentioned it on Twitter. Carl Vogel had a witty reply: “I guess they weren’t sure whether you want to figuratively or literally look down on everyone.”

The stereotypical Lisp programmer does look down on everyone. But this is based on a tiny, and perhaps unrepresentative, sample: the people writing about Lisp, as opposed to the much larger number of people writing in Lisp.

Lisp has been around for over 50 years and shows no signs of going away. There are a lot of people writing Lisp in obscurity. Kate Gregory said something similar about C++ developers, calling them the dark matter of programmers because there are a lot of them but they don’t make much of a splash. They’re quietly doing their job, not speaking at conferences or writing much about their language.

I imagine there are a lot of humble Lisp programmers. It takes some humility to commit to an older technology, especially given the pervasive neomania of the programmer community. It also takes some humility to work on projects that have been around for years or that are deep within the infrastructure of more visible projects, which is where I expect a lot of Lisp lives.

You can do very clever things in Lisp, but you don’t have to. As Ed Post famously said, “The determined Real Programmer can write FORTRAN programs in any language.” There must be a lot of code out there that writes (f x) instead of f(x) but otherwise isn’t that different from FORTRAN.

Kalman filters and functional programming

A few weeks ago I started a series of posts on various things you could do with a functional fold. In the first post I mentioned that the idea came from a paper by Brian Beckman on Kalman filters and folds:

This post was inspired by a paper by Brian Beckman (in progress) that shows how a Kalman filter can be implemented as a fold. From a Bayesian perspective, the thing that makes the Kalman filter work is that a certain multivariate normal model has a conjugate prior. This post shows that conjugate models more generally can be implemented as folds over the data. That’s interesting, but what does it buy you? Brian’s paper discusses this in detail, but one advantage is that it completely separates the accumulator function from the data, so the former can be tested in isolation.

At the time Brian was working on one big paper in private. This has since been split into several papers and they’re now public.
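To make the fold idea concrete, here is a minimal sketch (mine, not Brian Beckman’s code, and one-dimensional rather than the full multivariate Kalman setting): a conjugate normal model with known noise variance, updated by folding an accumulator function over the data with functools.reduce.

    from functools import reduce

    def update(post, y, s2=1.0):
        # one conjugate update: prior N(m, v) on the unknown mean,
        # observation y with known noise variance s2; returns posterior (m, v)
        m, v = post
        k = v / (v + s2)              # gain, analogous to the Kalman gain
        return (m + k * (y - m), (1 - k) * v)

    prior = (0.0, 100.0)              # diffuse prior on the unknown mean
    data = [2.1, 1.9, 2.3, 2.0, 2.2]

    posterior = reduce(update, data, prior)
    print(posterior)                  # mean near 2.1, small variance

The accumulator never touches the data stream directly, so it can be tested in isolation, which is the separation the quote above points to.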

See also: Kalman filters, particle filters, and tracking