Iterating between theory and code

Yesterday I said on Twitter “Time to see whether practice agrees with theory, moving from LaTeX to Python. Wish me luck.” I got a lot of responses to that because it describes the experience of a lot of people. Someone asked if I’d blog about this. The content is confidential, but I’ll talk about the process. It’s a common pattern.

I’m writing code based on a theoretical result in a journal article.

First the program gets absurd results. Theory pinpoints a bug in the code.

Then the code confirms my suspicions of an error in the paper.

The code also uncovers, not an error per se, but an important missing detail in the paper.

Then code and theory differ by about 1%. Uncertain whether this is theoretical approximation error or a subtle bug.

Then I try the code with different arguments, ones which theory predicts will have less approximation error, and now code and theory differ by 0.1%.

Then I have more confidence in (my understanding of) the theory and in my code.
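Here's a toy stand-in for that last step, since the real computation is confidential. Stirling's formula plays the role of the theory and exact factorials play the role of the code: the discrepancy shrinks exactly where theory says its approximation error shrinks, which is evidence that a remaining gap is approximation error rather than a bug.

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation to n!, standing in for the paper's theory."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in [5, 10, 20, 40]:
    exact = math.factorial(n)  # standing in for the program's exact answer
    err = abs(exact - stirling(n)) / exact
    print(f"n = {n:2d}  relative error = {err:.2e}")  # shrinks like 1/(12n)
```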

The most disliked programming language

According to this post from Stack Overflow, Perl is the most disliked programming language.

I have fond memories of writing Perl, though it’s been a long time since I used it. I mostly wrote scripts for file munging, the task it does best, and never had to maintain someone else’s Perl code. Under different circumstances I probably would have had less favorable memories.

Perl is a very large, expressive language. That’s a positive if you’re working alone but a negative if working with others. Individuals can carve out their favorite subsets of Perl and ignore the rest, but two people may carve out different subsets. You may personally avoid some feature, but you have to learn it anyway if your colleague uses it. Also, in a large language there’s greater chance that you’ll accidentally use a feature you didn’t intend to. For example, in Perl you might use an array in a scalar context. This works, but not as you’d expect if you didn’t intend to do it.

I suspect that people who like large languages like C++ and Common Lisp are more inclined to like Perl, while people who prefer small languages like C and Scheme have opposite inclinations.


Programming language life expectancy

The Lindy effect says that what’s been around the longest is likely to remain around the longest. It applies to creative artifacts, not living things. A puppy is likely to live longer than an elderly dog, but a book that has been in print for a century is likely to remain in print for another century.

In a previous post I go into the mathematical detail of the Lindy effect: power law distributions etc. The key fact we need for this blog post is that if something has the kind of survival distribution described by the Lindy effect, then the expected future lifetime equals the current age. For example, the 100 year old book in the opening paragraph is expected to be around for another 100 years.
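To sketch where that fact comes from, suppose, as in that post, that the lifetime T follows a power law, i.e. has a Pareto survival function with tail exponent α > 1. Then for u ≥ t,

\begin{align*}
P(T > u \mid T > t) &= \left( \frac{t}{u} \right)^\alpha \\
E[T - t \mid T > t] &= \int_t^\infty \left( \frac{t}{u} \right)^\alpha \, du = \frac{t}{\alpha - 1}
\end{align*}

so the expected future lifetime is proportional to the current age, and for α = 2 the two are equal. (Taking α = 2 is my simplification here; see the earlier post for the general discussion.)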

Note that this is all framed in terms of probability distributions. It’s not saying, for example, that everything new will go away soon. Everything was once new. Someone watching Hamlet on opening night would be right to speculate that nobody would care about Hamlet in a few years. But now that we know Hamlet has been around for four centuries and is going strong, the Lindy effect would predict that people will be performing Hamlet in the 25th century.

Note that Lindy takes nothing into account except survival to date. Someone might have been more bullish on Hamlet after opening night based on other information such as how well the play was received, but that’s beyond the scope of the Lindy effect.

If we apply the Lindy effect to programming languages, we only consider how long they’ve been around and whether they’re thriving today. You might think that Go, for example, will be around for a long time based on Google’s enormous influence, but the Lindy effect does not take such information into account.

So, if we assume the Lindy effect holds, here are the expected years when programming languages would die. (How exactly would you define the time of death for a programming language? Doesn’t really matter. This isn’t that precise or that serious.)

Language   Born   Expected death
Go         2009   2025
C#         2000   2034
Java       1995   2039
Python     1991   2043
Haskell    1990   2044
C          1972   2062
Lisp       1959   2075
Fortran    1957   2077
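The arithmetic behind the table is one line. The reference year, 2017, is inferred from the table’s own numbers:

```python
# Lindy effect: expected future lifetime equals current age, so
# expected death = reference year + age. The reference year 2017
# is inferred from the table above.
REFERENCE_YEAR = 2017

born = {"Go": 2009, "C#": 2000, "Java": 1995, "Python": 1991,
        "Haskell": 1990, "C": 1972, "Lisp": 1959, "Fortran": 1957}

for name, year in born.items():
    age = REFERENCE_YEAR - year
    print(f"{name:8} born {year}, expected death {REFERENCE_YEAR + age}")
```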

You can debate what it even means for a language to survive. For example, I’d consider Lisp to be alive and well if in the future people are programming Clojure but not Common Lisp, though others might disagree.

“We don’t know what language engineers will be coding in the year 2100. However, we do know that it will be called FORTRAN.” — C.A.R. Hoare

Building software the right way

Yesterday a friend told me about a software project whose owners said “We’re going to do this the right way.” I told him I have two opposite reactions when I hear that:

  • Ooh, that sounds like fun!
  • Run away!

I’ve been on several projects where the sponsors have identified some aspect of the status quo that clearly needs improving. Working on a project that recognizes these problems and is willing to address them sounds like fun. But then the project runs to the opposite extreme and creates something worse.

For example, most software development is sloppy. So a project reacts against this and becomes so formal that nobody can get any work done. In those settings I like to say “Hold off on buying a tux. Just start by tucking your shirt tail in. Maybe that’s as formal as you need to be.”

I’d be more optimistic about a project with more modest goals, say one that wants to move 50% of the way toward an ideal, rather than wanting to jump from one end of a spectrum to the other. Or even better, a project that has identified a direction they want to move in, and thinks in terms of experimenting to find their optimal position along that direction.

Unnatural language processing


Larry Wall, creator of the Perl programming language, created a custom degree plan in college, an interdisciplinary course of study in natural and artificial languages, i.e. linguistics and programming languages. Many of the features of Perl were designed as an attempt to apply natural language principles to the design of an artificial language.

I’ve been thinking of a different connection between natural and artificial languages, namely using natural language processing (NLP) to reverse engineer source code.

The source code of a computer program is text, but not a text. That is, it consists of plain text files, but it’s not a text in the sense that Paradise Lost or an email is a text. The most efficient way to parse a programming language is as a programming language. Treating it as an English text will lose vital structure and wrongly impose a foreign structure.

But what if you have two computer programs? That’s the problem I’ve been thinking about. I have code in two very different programming languages, and I’d like to know how functions in one code base relate to those in the other. The connections are not ones that a compiler could find. The connections are more psychological than algorithmic. I’d like to reverse engineer, for example, which function in language A a developer had in mind when he wrote a function in language B.

Both code bases are written in programming languages, but the function names are approximately natural language. If a pair of functions have the same name in both languages, and that name is not generic, then there’s a good chance they’re related. And if the names are similar, maybe they’re related.

I’ve done this sort of thing informally forever. I imagine most programmers do something like this from time to time. But only recently have I needed to do this on such a large scale that proceeding informally was not an option. I wrote a script to automate some of the work by looking for fuzzy matches between function names in both languages. This was far from perfect, but it reduced the amount of sleuthing necessary to line up the two sets of source code.
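My script isn’t public, but a minimal sketch of the idea looks something like this. The function names below are hypothetical, and difflib is in Python’s standard library:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Strip case and separator conventions so camelCase ~ snake_case."""
    return name.lower().replace("_", "")

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; 1.0 means identical after normalizing."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Hypothetical function names extracted from the two code bases.
names_a = ["computeDoseVolume", "loadPatientData", "writeReport"]
names_b = ["compute_dose_volume", "load_patient_data", "init_logging"]

for a in names_a:
    score, best = max((similarity(a, b), b) for b in names_b)
    if score > 0.8:  # threshold tuned by eye
        print(f"{a}  ~  {best}  ({score:.2f})")
```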

Around a year ago I had to infer which parts of an old Fortran program corresponded to different functions in a Python program. I also had to infer how some poorly written articles mapped to either set of source code. I did all this informally, but I wonder now whether NLP might have sped up my detective work.

Another situation where natural language processing could be helpful in software engineering is determining code authorship. Again this is something most programmers have probably done informally, saying things like “I bet Bill wrote this part of the code because it looks like his style” or “Looks like Pat left her fingerprints here.” This could be formalized using NLP techniques, and I imagine it has been. Just as Frederick Mosteller and colleagues did a statistical analysis of The Federalist Papers to determine who wrote which paper, I’m sure there have been similar analyses to try to find out who wrote what code, say for legal reasons.
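To give a flavor of how this might be formalized (the features below are my own toy inventions, not anyone’s published method), a classifier could start from counts of stylistic tics:

```python
import re

def style_features(source: str) -> dict:
    """Toy stylometric fingerprint of a source file."""
    return {
        "snake_case":  len(re.findall(r"\b[a-z]+(?:_[a-z0-9]+)+\b", source)),
        "camelCase":   len(re.findall(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b", source)),
        "tabs":        source.count("\t"),
        "blank_lines": sum(1 for line in source.splitlines() if not line.strip()),
    }
```

Feed features like these, computed on files of known authorship, to an off-the-shelf classifier and you have a crude version of the Mosteller approach for code.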

Maybe this already has a name, but I like “unnatural language processing” for the application of natural language processing to unnatural (i.e. programming) languages. I’ve done a lot of ad hoc unnatural language processing, and I’m curious how much of it I could automate in the future.


Branch cuts and Common Lisp

“Nearly everything is really interesting if you go into it deeply enough.” — Richard Feynman

If you thumb through Guy Steele’s book Common Lisp: The Language, 2nd Edition, you might be surprised how much space is devoted to defining familiar functions: square root, log, arcsine, etc. He gives some history of how these functions were first defined in Lisp and then refined by the ANSI (X3J13) standardization committee in 1989.

There are three sources of complexity:

  1. Complex number arguments
  2. Multi-valued functions
  3. +0 and -0

Complex arguments

The functions under discussion are defined for either real or complex inputs. This does not complicate things much in itself. Defining some functions for complex arguments, such as the exp function, is simple. The complication comes from the interaction of complex arguments with multi-valued functions and floating point representations of zero.

Multi-valued functions

The tricky functions to define are inverse functions, functions where we have to make a choice of range.

Real multi-valued functions

Let’s restrict our attention to real numbers for a moment. How do you define the square root of a positive number x? There are two solutions to the equation y² = x, and √x is defined to be the positive solution.

What about the arcsine of x? This is the number whose sine is x. Except there is no “the” number. There are infinitely many numbers whose sine is x, so we have to make a choice. It seems natural to choose values in an interval symmetric about 0, so we take arcsine of x to be the number between -π/2 and π/2 whose sine is x.

Now what about arctangent? As with arcsine, we have to make a choice because for any x there are infinitely many numbers y whose tangent is x. Again it’s convenient to define the range to be in an interval symmetric about zero, so we define the arctangent of x to be the number y between -π/2 and π/2 whose tangent is x. But now we have a subtle complication with tangent we didn’t have with sine because tangent is unbounded. How do we want to define the tangent of a vertical angle? Should we call it ∞ or -∞? What do we want to return if someone asks for the arctangent of ±∞? Should we return π/2 or -π/2?
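Languages have to pick answers to these questions. Python, for example, extends arctangent continuously to ±∞:

```python
import math

print(math.atan(math.inf))    # 1.570796...  = pi/2
print(math.atan(-math.inf))   # -1.570796... = -pi/2
print(math.tan(math.pi / 2))  # about 1.6e16, huge but finite, because the
                              # float pi/2 isn't exactly a vertical angle
```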

Complex multi-valued functions

The discussion shows there are some minor complications in defining inverse functions on the real line. Things get more complicated when working in the complex plane. To take the square root example, it’s easy to say we’re going to define square root so that the square root of a positive number is another positive number. Fine. But which solution to z² = w should we take to be the square root of a complex number w, such as 2 + 3i or -5 + 17i?

Or consider logarithms. For positive numbers x there is only one real number y such that exp(y) = x. But what if we take a negative value of x such as -1? There’s no real number whose exponential is -1, but there is a complex number. In fact, there are infinitely many complex numbers whose exponential is -1. Which one should we choose?
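Python’s cmath module, for example, makes the conventional choices, picking one value (the principal branch) for each:

```python
import cmath

w = -5 + 17j
r = cmath.sqrt(w)                # the chosen root; -r is the other one
print(r * r, (-r) * (-r))        # both give back approximately -5+17j

print(cmath.log(-1))             # 3.141592653589793j, i.e. i*pi ...
print(cmath.exp(3j * cmath.pi))  # ... though exp(3i*pi) is also about -1
```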

Floating point representations of zero

A little-known feature of floating point arithmetic (specified by the IEEE 754 standard) is that there are two kinds of zero: +0 and -0. This sounds bizarre at first, but there are good reasons for it, which I explain here. But defining functions to work properly with two kinds of zero takes a lot of work. This was the main reason the ANSI Common Lisp committee had to revise their definitions of several transcendental functions. If a function has a branch cut discontinuity along the real axis, for example, you want your definition to be continuous as you approach x + 0i from above and as you approach x − 0i from below.
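Python inherits IEEE 754 signed zeros, and you can watch them steer which side of a branch cut you land on:

```python
import math, cmath

print(0.0 == -0.0)                      # True: the two zeros compare equal
print(math.copysign(1, -0.0))           # -1.0: but the sign bit is there
print(math.atan2(0.0, -1.0))            # pi:  approaching -1 from above
print(math.atan2(-0.0, -1.0))           # -pi: approaching -1 from below
print(cmath.sqrt(complex(-1.0, 0.0)))   # 1j:  just above sqrt's branch cut
print(cmath.sqrt(complex(-1.0, -0.0)))  # -1j: just below it
```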

The Common Lisp solution

I’ll cut to the chase and present the solution the X3J13 committee came up with. For a discussion of the changes this required and the detailed justifications, see Guy Steele’s book.

The first step is to carefully define the two-argument arctangent function (atan y x) for all 16 combinations of y and x being positive, negative, +0, or -0. Then other functions are defined as follows.

  1. Define phase in terms of atan.
  2. Define complex abs in terms of real sqrt.
  3. Define complex log in terms of phase, complex abs, and real log.
  4. Define complex sqrt in terms of complex log.
  5. Define everything else in terms of the functions above.

The actual implementations may not follow this sequence, but they have to produce results consistent with this sequence.

The phase of z is defined as the two-argument arctangent applied to the imaginary and real parts of z.

The complex log of z is defined as log |z| + i phase(z).

Square root of z is defined as exp( log(z) / 2 ).
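Here is that chain as a Python sketch, using math for the real-valued primitives. (Actual Lisp implementations may differ, as noted above; this just traces the definitional sequence.)

```python
import math, cmath

def phase(z: complex) -> float:
    # Step 1: two-argument arctangent of the imaginary and real parts.
    return math.atan2(z.imag, z.real)

def cabs(z: complex) -> float:
    # Step 2: complex abs in terms of real sqrt.
    return math.sqrt(z.real ** 2 + z.imag ** 2)

def clog(z: complex) -> complex:
    # Step 3: complex log in terms of phase, complex abs, and real log.
    return complex(math.log(cabs(z)), phase(z))

def csqrt(z: complex) -> complex:
    # Step 4: complex sqrt in terms of complex log.
    return cmath.exp(clog(z) / 2)

print(csqrt(complex(-1.0, -0.0)))  # about -1j: continuous from below the cut
```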

The inverses of circular and hyperbolic functions are defined as follows.

\begin{align*}
\arcsin (z) &= -i \log \left( iz + \sqrt{1 - z^2} \right) \\
\arccos (z) &= -i \log \left( z + i \sqrt{1 - z^2} \right) \\
\text{arcsinh} (z) &= \log \left( z + \sqrt{1 + z^2} \right) \\
\text{arccosh} (z) &= 2 \log\left( \sqrt{(z+1)/2} + \sqrt{(z-1)/2} \right) \\
\text{arctanh} (z) &= \log \left( (1+z) \sqrt{1/(1 - z^2)} \right)
\end{align*}

Note that there are many ways to define these functions that seem to be equivalent, and are equivalent in some region. Getting the branch cuts right is what makes this complicated.

Technological allegiances

I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from one to the other. In the words of Stephen Sondheim, “Is it always or, and never and?”

Rivalries seem sillier to outsiders the more similar the two options are. And yet this makes sense. I’ve forgotten the psychological term for this, but it has a name: Similar things compete for space in your brain more than things that are dissimilar. For example, studying French can make it harder to spell English words. (Does literature have two t’s in French and one in English or is it the other way around?) But studying Chinese doesn’t impair English orthography.

It’s been said that academic politics are so vicious because the stakes are so small [1]. Similarly, there are fierce technological loyalties because the differences with competing technologies are so small, small enough to cause confusion. My favorite example: I can’t keep straight which languages use else if, elif, elseif, … in branching.

If you have to learn two similar technologies, it may be easier to devote yourself exclusively to one, then to the other, then use both and learn to keep them straight.

Related post: Ford-Chevy arguments in technology

[1] I first heard this attributed to Henry Kissinger, but there’s no agreement on who first said it. Several people have said similar things.


Efficiency of C# on Linux

This week I attended Mads Torgersen’s talk Why you should take another look at C#. Afterward I asked him about the efficiency of C# on Linux. When I last looked into it, it wasn’t good. A few years ago I asked someone on my team to try running some C# software on Linux using Mono. The code worked correctly, but it ran several times slower than on Windows.

Mads said that now with .NET Core, C# code runs about as fast on Linux as on Windows, maybe a few percent slower on Linux. Scott Hanselman joined the conversation and explained that with .NET Core, the same code runs on every platform. The Mono project duplicated much of the functionality of the .NET framework, but it was an entirely independent implementation and not as efficient.

I had assumed the difference was due to compiler optimizations, or the lack thereof, but Scott and Mads said that the difference was mostly in the implementation of the framework code. There are some compiler optimizations that are better on the Windows side, so C# might run a little faster on Windows, but the performance is essentially the same on both platforms.

I could recommend C# to a client running Linux if it comes with a 5% performance penalty, but a 500% performance penalty was a show-stopper. Now I’d consider using C# on a project where I need more performance than Python or R, but want something easier to work with than C++.

Years ago I developed software with the Microsoft stack, but I moved away from Microsoft tools when I quit doing the kind of software development the tools are geared for. So I don’t write C# much any more. It’s been about a year since I had a project where I needed to write C# code. But I like the C# language. You can tell that a lot of thought has gone into the design, and the tool support is great. Now that the performance is better on Linux I’d consider using it for numerical simulations.