Software quality: better in practice than in theory

Sir Tony Hoare

C. A. R. Hoare wrote an article “How Did Software Get So Reliable Without Proof?” in 1996 that still sounds contemporary for the most part.

In the 1980’s many believed that programs could not get much bigger unless we started using formal proof methods. The argument was that bugs are fairly common, and that each bug has the potential to bring a system down. Therefore the only way to build much larger systems was to rely on formal methods to catch bugs. And yet programs continued to get larger and formal methods never caught on. Hoare asks

Why have twenty years of pessimistic predictions been falsified?

Another twenty years later we can ask the same question. Systems have gotten far larger, and formal methods have not become common. Formal methods are used—more on that shortly—but have not become common.

Better in practice than in theory

It’s interesting that Hoare was the one to write this paper. He is best known for quicksort, a sorting algorithm that works better in practice than in theory! Quicksort is commonly used in practice even though it has terrible worst-case efficiency, because its average efficiency has optimal asymptotic order [1], and in practice it works better than other algorithms with the same asymptotic order.
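
To see the gap between worst case and average case concretely, here’s a little Python experiment (mine, not Hoare’s): a naive quicksort that always picks the first element as its pivot makes on the order of n²/2 comparisons on already-sorted input, but only on the order of n log n comparisons on shuffled input.

    import random

    def quicksort_comparisons(xs):
        # Naive quicksort with the first element as pivot, written
        # iteratively so the worst case doesn't blow Python's recursion
        # limit. Returns the number of comparisons performed.
        xs = list(xs)
        comparisons = 0
        stack = [(0, len(xs) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            pivot, i = xs[lo], lo
            for j in range(lo + 1, hi + 1):
                comparisons += 1
                if xs[j] < pivot:
                    i += 1
                    xs[i], xs[j] = xs[j], xs[i]
            xs[lo], xs[i] = xs[i], xs[lo]
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))
        return comparisons

    n = 2000
    print(quicksort_comparisons(list(range(n))))               # sorted input: roughly n²/2
    print(quicksort_comparisons(random.sample(range(n), n)))   # shuffled input: roughly n log n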

Economic considerations

It is logically possible that the smallest bug could bring down a system. And there have been examples, such as the Mars Climate Orbiter, where a single bug did in fact lead to complete failure. But this is rare. Most bugs are inconsequential.

Some will object “How can you be so blasé about bugs? A bug crashed a $300 million probe!” But what is the realistic alternative? Would spending an additional billion dollars on formal software verification have prevented the crash? Possibly, though not certainly, and the same money could send three more missions to Mars. (More along these lines here.)

It’s all a matter of economics. Formal verification is extremely tedious and expensive. The expense is worth it in some settings and not in others. The software that runs pacemakers is more critical than the software that runs a video game. For most software development, less formal methods have proved more cost effective at achieving acceptable quality: code reviews, unit tests, integration testing, etc.

Formal verification

I have some experience with formal software verification, including formal methods software used by NASA. When someone says that software has been formally verified, there’s an implicit disclaimer. It’s usually that the algorithms have been formally verified, not the implementation of those algorithms in software. Also, maybe not all the algorithms have been verified, but say 90%, the remaining 10% being too difficult to verify. In any case, formally verified software can and has failed. Formal verification greatly reduces the probability of encountering a bug, but it does not reduce the probability to zero.

There has been a small resurgence of interest in formal methods since Hoare wrote his paper. And again, it’s all about economics. Theorem proving technology has improved over the last 20 years. And software is being used in contexts where the consequences of failure are high. But for most software, the most economical way to achieve acceptable quality is not through theorem proving.

There are also degrees of formality. Full theorem proving is extraordinarily tedious. If I remember correctly, one research group said that they could formally verify about one page of a mathematics textbook per man-week. But there’s a continuum between full formality and no formality. For example, you could have formal assurance that your software satisfies certain conditions, even if you can’t formally prove that the software is completely correct. Where you want to be along this continuum of formality is again a matter of economics. It depends on the probability and consequences of errors, and the cost of reducing these probabilities.

Related posts

[1] The worst-case performance of quicksort is O(n²) but the average performance is O(n log n).

Photograph of C. A. R. Hoare by Rama, Wikimedia Commons, CC BY-SA 2.0 FR

Does computer science help you program?

The relationship between programming and computer science is hard to describe. Purists will say that computer science has nothing to do with programming, but that goes too far.

Computer science is about more than programming, but it’s all motivated by getting computers to do things. With few exceptions, students major in computer science in college with the intention of becoming programmers.

I asked on Twitter yesterday how helpful people found computer science in writing software.

In a follow up tweet I said “For this poll, basic CS would be data structures and analysis of algorithms. Advanced CS is anything after that.”

So about a quarter didn’t find computer science useful, but the rest either expected it to be useful or at least found the basic theory useful.

I suspect some of those who said they haven’t found (advanced) CS theory useful don’t know (advanced) CS theory. This isn’t a knock on them. It’s only the observation that you can’t use what you aren’t at least aware of. In fact, you may need to know something quite well before you can recognize an opportunity to use it. (More on that here.)

Many programmers are in jobs where they don’t have much need for computer science theory. I thought about making that a possible choice, something like “No, but I wish I were in a job that could use more theory.” Unfortunately Twitter survey responses have to be fairly short.

Of course this isn’t a scientific survey. (Even supposedly scientific surveys aren’t that great.) People who follow the CompSciFact twitter account have an interest in computer science. Maybe people who had strong feelings about CS, such as resentment for having to study something they didn’t want to or excitement for being able to use something they find interesting, were more likely to answer the question.

 

Proving life exists on Earth

NASA’s Galileo mission was primarily designed to explore Jupiter and its moons. In 1989, the Galileo probe started out traveling away from Jupiter in order to do a gravity assist swing around Venus. About a year later it also did a gravity assist maneuver around Earth. Carl Sagan suggested that when passing Earth, the Galileo probe should turn its sensors on Earth to look for signs of life. [1]

Artist conception of Galileo probe surveying Jupiter and its moons

Now obviously we know there’s life on Earth. But if we’re going to look for life on other planets, it’s reasonable to ask that our methods return positive results when examining the one planet we know for sure does host life. So scientists looked at the data from Galileo as if it were coming from another planet to see what patterns in the data might indicate life.

I’ve started using “looking for life on Earth” as a metaphor. I’m working on a project right now where I’m looking for a needle in a haystack, or rather another needle in a haystack: I knew that one needle existed before I got started. So I want to make sure that my search procedure at least finds the one positive result I already know exists. I explained to a colleague that we need to make sure we can at least find life on Earth.

This reminds me of simulation work. You make up a scenario and treat it as the state of nature. Then pretend you don’t know that state, and see how successful your method is at discovering that state. It’s sort of a schizophrenic way of thinking, pretending that half of your brain doesn’t know what the other half is doing.
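
Here’s the pattern in miniature (my example, not from the post): make up a “true” parameter, generate data from it while pretending not to know it, and check whether the estimation procedure rediscovers it.

    import random

    random.seed(42)
    true_p = 0.3                      # the "state of nature" we pretend not to know
    n = 10_000
    data = [random.random() < true_p for _ in range(n)]

    estimate = sum(data) / n          # does the procedure rediscover the state?
    print(true_p, estimate)           # the estimate should land near 0.3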

It also reminds me of software testing. The most trivial tests can be surprisingly effective at finding bugs. So you write a few tests to confirm that there’s life on Earth.
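
In code, the life-on-Earth check is just a smoke test against the one positive you already know about. A sketch, with entirely hypothetical names:

    def find_needles(haystack):
        # Stand-in for whatever search procedure is actually under development.
        return [item for item in haystack if item.startswith("needle")]

    def test_finds_the_known_needle():
        # The needle we already know exists must show up before we trust
        # the procedure to find new ones.
        haystack = ["hay"] * 1000 + ["needle_already_known"]
        assert "needle_already_known" in find_needles(haystack)

    test_finds_the_known_needle()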

Related posts

[1] I found out about Galileo’s Earth reconnaissance listening to the latest episode of the .NET Rocks! podcast.

Iterating between theory and code

Yesterday I said on Twitter “Time to see whether practice agrees with theory, moving from LaTeX to Python. Wish me luck.” I got a lot of responses to that because it describes the experience of a lot of people. Someone asked if I’d blog about this. The content is confidential, but I’ll talk about the process. It’s a common pattern.

I’m writing code based on a theoretical result in a journal article.

First the program produces absurd results. Theory pinpoints a bug in the code.

Then the code confirms my suspicions of an error in the paper.

The code also uncovers, not an error per se, but an important missing detail in the paper.

Then code and theory differ by about 1%. Uncertain whether this is theoretical approximation error or a subtle bug.

Then try the code for different arguments, ones which theory predicts will have less approximation error, and now code and theory differ by 0.1%.

Then I have more confidence in (my understanding of) the theory and in my code.

The most disliked programming language

According to this post from Stack Overflow, Perl is the most disliked programming language.

I have fond memories of writing Perl, though it’s been a long time since I used it. I mostly wrote scripts for file munging, the task it does best, and never had to maintain someone else’s Perl code. Under different circumstances I probably would have had less favorable memories.

Perl is a very large, expressive language. That’s a positive if you’re working alone but a negative if working with others. Individuals can carve out their favorite subsets of Perl and ignore the rest, but two people may carve out different subsets. You may personally avoid some feature, but you have to learn it anyway if your colleague uses it. Also, in a large language there’s greater chance that you’ll accidentally use a feature you didn’t intend to. For example, in Perl you might use an array in a scalar context. This works, but not as you’d expect if you didn’t intend to do it.

I suspect that people who like large languages like C++ and Common Lisp are more inclined to like Perl, while people who prefer small languages like C and Scheme have opposite inclinations.

Related posts:

Programming language life expectancy

The Lindy effect says that what’s been around the longest is likely to remain around the longest. It applies to creative artifacts, not living things. A puppy is likely to live longer than an elderly dog, but a book that has been in press for a century is likely to be in press for another century.

In a previous post I go into the mathematical detail of the Lindy effect: power law distributions etc. The key fact we need for this blog post is that if something has the kind of survival distribution described by the Lindy effect, then the expected future lifetime equals the current age. For example, the 100 year old book in the opening paragraph is expected to be around for another 100 years.

Note that this is all framed in terms of probability distributions. It’s not saying, for example, that everything new will go away soon. Everything was once new. Someone watching Hamlet on opening night would be right to speculate that nobody would care about Hamlet in a few years. But now that we know Hamlet has been around for four centuries and is going strong, the Lindy effect would predict that people will be performing Hamlet in the 25th century.

Note that Lindy takes nothing into account except survival to date. Someone might have been more bullish on Hamlet after opening night based on other information such as how well the play was received, but that’s beyond the scope of the Lindy effect.

If we apply the Lindy effect to programming languages, we only consider how long they’ve been around and whether they’re thriving today. You might think that Go, for example, will be around for a long time based on Google’s enormous influence, but the Lindy effect does not take such information into account.

So, if we assume the Lindy effect holds, here are the expected years when programming languages would die. (How exactly would you define the time of death for a programming language? Doesn’t really matter. This isn’t that precise or that serious.)

Language    Born    Expected death
Go          2009    2025
C#          2000    2034
Java        1995    2039
Python      1991    2043
Haskell     1990    2044
C           1972    2062
Lisp        1959    2075
Fortran     1957    2077
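
The arithmetic behind the table is simply expected death = observation year + current age. Here’s a minimal Python sketch of it; the observation year of 2017 is my inference from the table values, not something stated above.

    # Lindy effect: expected future lifetime equals current age, so
    # expected death year = observed year + (observed year - birth year).
    OBSERVED = 2017  # assumed year the table was compiled

    languages = {
        "Go": 2009, "C#": 2000, "Java": 1995, "Python": 1991,
        "Haskell": 1990, "C": 1972, "Lisp": 1959, "Fortran": 1957,
    }

    for name, born in languages.items():
        print(f"{name:8} born {born}, expected death {OBSERVED + (OBSERVED - born)}")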

You can debate what it even means for a language to survive. For example, I’d consider Lisp to be alive and well if people in the future are programming Clojure but not Common Lisp, though others might disagree.

“We don’t know what language engineers will be coding in in the year 2100. However, we do know that it will be called FORTRAN.” — C.A.R. Hoare

Building software the right way

Yesterday a friend told me about a software project whose owners said “We’re going to do this the right way.” I told him I have two opposite reactions when I hear that:

  • Ooh, that sounds like fun!
  • Run away!

I’ve been on several projects where the sponsors have identified some aspect of the status quo that clearly needs improving. Working on a project that recognizes these problems and is willing to address them sounds like fun. But then the project runs to the opposite extreme and creates something worse.

For example, most software development is sloppy. So a project reacts against this and becomes so formal that nobody can get any work done. In those settings I like to say “Hold off on buying a tux. Just start by tucking your shirt tail in. Maybe that’s as formal as you need to be.”

I’d be more optimistic about a project with more modest goals, say one that wants to move 50% of the way toward an ideal, rather than wanting to jump from one end of a spectrum to the other. Or even better, a project that has identified a direction they want to move in, and thinks in terms of experimenting to find their optimal position along that direction.

Unnatural language processing

Japanese Russian dictionary

Larry Wall, creator of the Perl programming language, created a custom degree plan in college, an interdisciplinary course of study in natural and artificial languages, i.e. linguistics and programming languages. Many of the features of Perl were designed as an attempt to apply natural language principles to the design of an artificial language.

I’ve been thinking of a different connection between natural and artificial languages, namely using natural language processing (NLP) to reverse engineer source code.

The source code of a computer program is text, but not a text. That is, it consists of plain text files, but it’s not a text in the sense that Paradise Lost or an email is a text. The most efficient way to parse a programming language is as a programming language. Treating it as an English text will lose vital structure, and wrongly try to impose a foreign structure.

But what if you have two computer programs? That’s the problem I’ve been thinking about. I have code in two very different programming languages, and I’d like to know how functions in one code base relate to those in the other. The connections are not ones that a compiler could find. The connections are more psychological than algorithmic. I’d like to reverse engineer, for example, which function in language A a developer had in mind when he wrote a function in language B.

Both code bases are written in programming languages, but the function names are approximately natural language. If a pair of functions have the same name in both languages, and that name is not generic, then there’s a good chance they’re related. And if the names are similar, maybe they’re related.

I’ve done this sort of thing informally forever. I imagine most programmers do something like this from time to time. But only recently have I needed to do this on such a large scale that proceeding informally was not an option. I wrote a script to automate some of the work by looking for fuzzy matches between function names in both languages. This was far from perfect, but it reduced the amount of sleuthing necessary to line up the two sets of source code.
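
As a rough sketch of that kind of script (not the actual one, and with made-up function names), Python’s difflib can do the fuzzy matching:

    from difflib import get_close_matches

    # Hypothetical function names pulled from the two code bases.
    names_a = ["compute_dose", "load_patient_data", "fit_model"]
    names_b = ["ComputeDose", "LoadPatientData", "Summarize"]

    def normalize(name):
        # Strip underscores and case so snake_case and CamelCase compare.
        return name.replace("_", "").lower()

    lookup = {normalize(b): b for b in names_b}
    for a in names_a:
        match = get_close_matches(normalize(a), list(lookup), n=1, cutoff=0.6)
        if match:
            print(a, "~", lookup[match[0]])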

Around a year ago I had to infer which parts of an old Fortran program corresponded to different functions in a Python program. I also had to infer how some poorly written articles mapped to either set of source code. I did all this informally, but I wonder now whether NLP might have sped up my detective work.

Another situation where natural language processing could be helpful in software engineering is determining code authorship. Again this is something most programmers have probably done informally, saying things like “I bet Bill wrote this part of the code because it looks like his style” or “Looks like Pat left her fingerprints here.” This could be formalized using NLP techniques, and I imagine it has been. Just as Frederick Mosteller and colleagues did a statistical analysis of The Federalist Papers to determine who wrote which paper, I’m sure there have been similar analyses to try to find out who wrote what code, say for legal reasons.

Maybe this already has a name, but I like “unnatural language processing” for the application of natural language processing to unnatural (i.e. programming) languages. I’ve done a lot of ad hoc unnatural language processing, and I’m curious how much of it I could automate in the future.

Branch cuts and Common Lisp

“Nearly everything is really interesting if you go into it deeply enough.” — Richard Feynman

If you thumb through Guy Steele’s book Common Lisp: The Language, 2nd Edition, you might be surprised how much space is devoted to defining familiar functions: square root, log, arcsine, etc. He gives some history of how these functions were first defined in Lisp and then refined by the ANSI (X3J13) standardization committee in 1989.

There are three sources of complexity:

  1. Complex number arguments
  2. Multi-valued functions
  3. +0 and -0

Complex arguments

The functions under discussion are defined for either real or complex inputs. This does not complicate things much in itself. Defining some functions for complex arguments, such as the exp function, is simple. The complication comes from the interaction of complex arguments with multi-valued functions and floating point representations of zero.

Multi-valued functions

The tricky functions to define are inverse functions, functions where we have to make a choice of range.

Real multi-valued functions

Let’s restrict our attention to real numbers for a moment. How do you define the square root of a positive number x? There are two solutions to the equation y² = x, and √x is defined to be the positive solution.

What about the arcsine of x? This is the number whose sine is x. Except there is no “the” number. There are infinitely many numbers whose sine is x, so we have to make a choice. It seems natural to choose values in an interval symmetric about 0, so we take the arcsine of x to be the number between -π/2 and π/2 whose sine is x.

Now what about arctangent? As with arcsine, we have to make a choice because for any x there are infinitely many numbers y whose tangent is x. Again it’s convenient to define the range to be in an interval symmetric about zero, so we define the arctangent of x to be the number y between -π/2 and π/2 whose tangent is x. But now we have a subtle complication with tangent we didn’t have with sine because tangent is unbounded. How do we want to define the tangent of a vertical angle? Should we call it ∞ or -∞? What do we want to return if someone asks for the arctangent of ±∞? Should we return π/2 or -π/2?
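
For what it’s worth, IEEE-style math libraries make a definite choice here. Python’s math module, for example:

    from math import atan, inf, pi

    print(atan(inf), pi/2)      # atan(+inf) is taken to be  pi/2
    print(atan(-inf), -pi/2)    # atan(-inf) is taken to be -pi/2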

Complex multi-valued functions

The discussion shows there are some minor complications in defining inverse functions on the real line. Things get more complicated when working in the complex plane. To take the square root example, it’s easy to say we’re going to define square root so that the square root of a positive number is another positive number. Fine. But which solution to z² = w should we take to be the square root of a complex number w, such as 2 + 3i or -5 + 17i?

Or consider logarithms. For positive numbers x there is only one real number y such that exp(y) = x. But what if we take a negative value of x such as -1? There’s no real number whose exponential is -1, but there is a complex number. In fact, there are infinitely many complex numbers whose exponential is -1. Which one should we choose?

Floating point representations of zero

A little known feature of floating point arithmetic (specified by the IEEE 754 standard) is that there are two kinds of zero: +0 and -0. This sounds bizarre at first, but there are good reasons for this, which I explain here. But defining functions to work properly with two kinds of zero takes a lot of work. This was the main reason the ANSI Common Lisp committee had to revise their definitions of several transcendental functions. If a function has a branch cut discontinuity along the real axis, for example, you want your definition to be continuous as you approach x + 0i from above and as you approach x - 0i from below.
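
To make the two zeros concrete, here is what an IEEE 754 two-argument arctangent does (Python’s math module, not Lisp): approaching the negative real axis from above and from below gives phases that differ by 2π.

    from math import atan2

    print(atan2( 0.0, -1.0))   #  3.141592653589793, i.e. +pi: the point -1 + 0i, just above the cut
    print(atan2(-0.0, -1.0))   # -3.141592653589793, i.e. -pi: the point -1 - 0i, just below the cut
    print(atan2( 0.0,  1.0))   #  0.0 on the positive real axis
    print(atan2(-0.0,  1.0))   # -0.0: still zero, but the sign survives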

The Common Lisp solution

I’ll cut to the chase and present the solution the X3J13 committee came up with. For a discussion of the changes this required and the detailed justifications, see Guy Steele’s book.

The first step is to carefully define the two-argument arctangent function (atan y x) for all 16 combinations of y and x being positive, negative, +0, or -0. Then other functions are defined as follows.

  1. Define phase in terms of atan.
  2. Define complex abs in terms of real sqrt.
  3. Define complex log in terms of phase, complex abs, and real log.
  4. Define complex sqrt in terms of complex log.
  5. Define everything else in terms of the functions above.

The actual implementations may not follow this sequence, but they have to produce results consistent with this sequence.

The phase of z is defined as the arctangent with arguments the imaginary and real parts of z.

The complex log of z is defined as log |z| + i phase(z).

Square root of z is defined as exp( log(z) / 2 ).
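
Here’s a minimal Python sketch of that chain of definitions, with math.atan2 and math.hypot standing in for the real-valued pieces. It follows the sequence above but ignores the signed-zero and edge-case handling that made the committee’s work hard, so treat it as an illustration rather than a reference implementation.

    from math import atan2, hypot, log, exp, cos, sin

    def phase(z):
        # Step 1: phase from the two-argument arctangent.
        return atan2(z.imag, z.real)

    def cabs(z):
        # Step 2: complex absolute value from real square roots (hypot).
        return hypot(z.real, z.imag)

    def clog(z):
        # Step 3: complex log from phase, complex abs, and real log.
        return complex(log(cabs(z)), phase(z))

    def csqrt(z):
        # Step 4: complex square root as exp(log(z) / 2).
        w = clog(z) / 2
        return exp(w.real) * complex(cos(w.imag), sin(w.imag))

    print(csqrt(complex(-5, 17)))  # compare against cmath.sqrt(-5 + 17j)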

The inverses of circular and hyperbolic functions are defined as follows.

\begin{aligned}
\arcsin z &= -i \log \left( iz + \sqrt{1 - z^2} \right) \\
\arccos z &= -i \log \left( z + i \sqrt{1 - z^2} \right) \\
\operatorname{arcsinh} z &= \log \left( z + \sqrt{1 + z^2} \right) \\
\operatorname{arccosh} z &= 2 \log \left( \sqrt{(z+1)/2} + \sqrt{(z-1)/2} \right) \\
\operatorname{arctanh} z &= \log \left( (1+z) \sqrt{1/(1 - z^2)} \right)
\end{aligned}

Note that there are many ways to define these functions that seem to be equivalent, and are equivalent in some region. Getting the branch cuts right is what makes this complicated.


 

Technological allegiances

I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from one to the other. In the words of Stephen Sondheim, “Is it always or, and never and?”

Rivalries seem sillier to outsiders the more similar the two options are. And yet this makes sense. I’ve forgotten the psychological term for this, but it has a name: Similar things compete for space in your brain more than things that are dissimilar. For example, studying French can make it harder to spell English words. (Does literature have two t’s in French and one in English or is it the other way around?) But studying Chinese doesn’t impair English orthography.

It’s been said that academic politics are so vicious because the stakes are so small [1]. Similarly, there are fierce technological loyalties because the differences with competing technologies are so small, small enough to cause confusion. My favorite example: I can’t keep straight which languages use else if, elif, elseif, … in branching.

If you have to learn two similar technologies, it may be easier to devote yourself exclusively to one, then to the other, then use both and learn to keep them straight.

Related post: Ford-Chevy arguments in technology

[1] I first heard this attributed to Henry Kissinger, but there’s no agreement on who first said it. Several people have said similar things.

 

Efficiency of C# on Linux

This week I attended Mads Torgersen’s talk Why you should take another look at C#. Afterward I asked him about the efficiency of C# on Linux. When I last looked into it, it wasn’t good. A few years ago I asked someone on my team to try running some C# software on Linux using Mono. The code worked correctly, but it ran several times slower than on Windows.

Mads said that now with .NET Core, C# code runs about as fast on Linux as on Windows. Maybe a few percent slower on Linux. Scott Hanselman joined the conversation and explained that with .NET Core, the same code runs on every platform. The Mono project duplicated much of the functionality of the .NET framework, but it was an entirely independent implementation and not as efficient.

I had assumed the difference was due to compiler optimizations or the lack thereof, but Scott and Mads said that the difference was mostly in the implementation. There are some compiler optimizations that are better on the Windows side, so C# might run a little faster on Windows, but the performance is essentially the same on both platforms.

I could recommend C# to a client running Linux if there were a 5% performance penalty, but a 500% penalty would be a show-stopper. Now I’d consider using C# on a project where I need more performance than Python or R, but want something easier to work with than C++.

Years ago I developed software with the Microsoft stack, but I moved away from Microsoft tools when I quit doing the kind of software development the tools are geared for. So I don’t write C# much any more. It’s been about a year since I had a project where I needed to write C# code. But I like the C# language. You can tell that a lot of thought has gone into the design, and the tool support is great. Now that the performance is better on Linux I’d consider using it for numerical simulations.


Humble Lisp programmers

Maybe from the headline you were expecting a blank post? No, that’s not where I’m going.

Yesterday I was on Amazon.com and noticed that nearly all the books they recommended for me were either about Lisp or mountain climbing. I thought this was odd, and mentioned it on Twitter. Carl Vogel had a witty reply: “I guess they weren’t sure whether you want to figuratively or literally look down on everyone.”

The stereotypical Lisp programmer does look down on everyone. But this is based on a tiny, and perhaps unrepresentative, sample of people writing about Lisp compared to the much larger number of people who are writing in Lisp.

Lisp has been around for over 50 years and shows no signs of going away. There are a lot of people writing Lisp in obscurity. Kate Gregory said something similar about C++ developers, calling them the dark matter of programmers because there are a lot of them but they don’t make much of a splash. They’re quietly doing their job, not speaking at conferences or writing much about their language.

I imagine there are a lot of humble Lisp programmers. It takes some humility to commit to an older technology, especially given the pervasive neomania of the programmer community. It also takes some humility to work on projects that have been around for years or that are deep within the infrastructure of more visible projects, which is where I expect a lot of Lisp lives.

You can do very clever things in Lisp, but you don’t have to. As Ed Post famously said, “The determined Real Programmer can write FORTRAN programs in any language.” There must be a lot of code out there that writes (f x) instead of f(x) but otherwise isn’t that different from FORTRAN.

Formal methods let you explore the corners

I heard someone say the other day that the advantage of formal software validation methods is that they let you explore the corners, cases where intuition doesn’t naturally take you.

This made me think of corners in the geometric sense. If you have a sphere in a box in high dimensions, nearly all the volume is in the corners, i.e. outside the sphere. This is more than a metaphor. You can think of software options geometrically, with each independent choice corresponding to a dimension. Paths through a piece of software that are individually rare may account for nearly all use when considered together.

With a circle inside a square, about 78.5% of the area is inside the circle. With a ball sitting inside a 3-D box, 52.4% of the volume is inside the ball. As the dimension increases, the proportion of volume inside the sphere rapidly decreases. For a 10-dimensional sphere sitting in a 10-dimensional box, 0.25% of the volume is in the sphere. Said another way, 99.75% of the volume is in the corners.

When you go up to 100 dimensions, the proportion of volume inside the sphere is about 2 parts in 10⁷⁰, a 1 followed by 70 zeros [1]. If 100 dimensions sounds like pure fantasy, think about a piece of software with more than 100 features. Those feature combinations multiply like geometric dimensions [2].

Here’s a little Python code you could use to see how much volume is in a sphere as a function of dimension.

    from scipy.special import gamma
    from math import pi

    def unit_sphere_volume(n):
        # Volume of the unit ball in n dimensions.
        return pi**(0.5*n)/gamma(0.5*n + 1)

    def unit_cube_volume(n):
        # Volume of the cube of side 2 that circumscribes the unit ball.
        return 2**n

    def ratio(n):
        # Fraction of the cube's volume that lies inside the ball.
        return unit_sphere_volume(n) / unit_cube_volume(n)

    print( [ratio(n) for n in range(1, 20)] )

* * *

[1] There are names for such extremely large numbers. These names are hardly ever used—scientific notation is much more practical—but they’re fun to say. 10⁷⁰ is ten duovigintillion in American nomenclature, ten undecilliard in European.

[2] Geometric dimensions are perfectly independent, but software feature combinations are not. In terms of logic, some combinations may not be possible. Or in terms of probability, the probability of exploring some paths is conditional on the probability of exploring other paths. Even so, there are inconceivably many paths through any large software system. And in large-scale operations, events that should “never happen” happen regularly.