Software that gets used

I’ve been looking back at software projects that I either developed or managed. I thought about which projects produced software that is actively used and which didn’t. Here’s what the popular projects had in common. The software

  1. was developed to address existing needs, not speculation about future needs;
  2. solved a general problem; and
  3. was simple, often controversially simple.

The software used most often is a numerical library. It addresses general problems, but at the same time it is specialized to our unique needs. It has some functions you won’t find in other libraries, and it lacks some functions you’d expect to see but that we haven’t needed.

A couple of the more successful projects were re-writes of existing software that deleted maybe 90% of the original functionality. The remaining 10% was made easier to use and was tested thoroughly. No one missed the deleted functionality in one project. In the other, users requested that we add back 1% of the functionality we had deleted.

Related posts:

Simple legacy
The simplest thing that might work

Organizational scar tissue

Here’s a quote from Jason Fried I found recently.

Policies are organizational scar tissue. They are codified overreactions to unlikely-to-happen-again situations.

Of course that’s not always true, but quite often it is. Policies can be a way of fighting the last war, defending the Maginot Line.

The entrance to Ouvrage Schoenenbourg along the Maginot Line in Alsace, public domain image from Wikipedia

When you see a stupid policy, don’t assume a stupid person created it. It may have been the decision of a very intelligent person. It probably sounded like a good idea at the time given the motivating circumstances. Maybe it was a good idea at the time. But the letter lives on after the spirit dies. You can make a game out of this. When you run into a stupid policy, try to imagine circumstances that would have motivated an intelligent person to make such a policy. The more stupid the policy, the more challenging the game.

Large organizations will accumulate stupid policies like scar tissue over time. It’s inevitable. Common sense doesn’t scale well.

The scar tissue metaphor reminds me of Michael Nielsen’s metaphor of organizational immune systems. Nielsen points to organizational immune systems as one factor in the decline of newspapers. The defense mechanisms that allowed newspapers to thrive in the past are making it difficult for them to survive now.

Carl Franklin interview

Carl Franklin is a man of many talents: talk show host, producer, software developer, musician, etc. He’s probably best known for his excellent .NET Rocks podcast and for the other podcasts he hosts and produces. I hope you enjoy the following interview with Carl.

JC: Your .NET Rocks podcast goes back further than podcasting. Did the show start on radio or was it always online?

CF: It was always online. Although I was inspired by public radio programs like Car Talk and Whad’Ya Know, I always thought the audience was too narrow for general radio. That, and I had web resources readily available.

JC: So the show was a set of downloadable MP3 files before RSS feeds came along to organize the files into a podcast?

CF: Exactly. We had a site more or less like it is now, with links to and info about the current show on the front page, and an archives page. We also had a newsletter we used to notify people of new shows.

JC: Could you say something about your podcasts, ones you host, produce, etc.?

CF: Well, .NET Rocks is a twice-weekly interview show for .NET devs. I am the host and Richard Campbell is the co-host. It’s an hour long, more or less. Topics range from low-level techie stuff to new technologies and methodologies to speculation about the future.

We also produce a weekly video screencast/interview show also about an hour long called dnrTV. Topics are hands-on practical. It’s recorded at 1024×768 so it will fit most projectors.

Hanselminutes is a 30-50 minute podcast with Scott Hanselman covering a wide variety of developer and technology topics. Also weekly.

RunAs Radio is a 30-50 minute weekly interview show on Microsoft-centric IT topics with Richard Campbell and Greg Hughes.

We also do an adult comedy podcast called Mondays. Richard Campbell and I basically spend an hour or so laughing at the stories and wit of Mark Miller and Karen Mangiacotti. NSFW but hilarious.

JC: On .NET Rocks, you’re the alpha geek programmer, but sometimes you mention your life as a musician and entrepreneur. Were you a musician first?

CF: Yes. I was singing in the Westerly Chorus from age 8. Piano since age 4. Guitar since age 10. Trumpet since age 10. Bass and drums came later. Programming didn’t come around till I was 17. I went to Berklee School of Music in 85-86 and Full Sail School of Recording Arts 86-87. Learned computers on my own. I was lucky to have many smart programmer friends who were willing to share their knowledge. That experience has shaped everything I have done since.

JC: How did you get started as a programmer?

CF: My dad bought a TRS-80 model 4 when I was a kid to do taxes and bills. I think VisiCalc was the only program he used. It had a guide to BASIC programming that I started reading. Between that and the TRSDOS manual I started writing some cool programs. Then I got a modem and was introduced to the BBS world. That was it. I was hooked on writing serial communications programs.

JC: You’ve mentioned Franklins.Net and Pwop productions on .NET Rocks. Could you describe these businesses and how you got started?

CF: Franklins.Net was started in 1999 as a training company. I taught VB6 and then VB.NET for several years. Pwop was started as a media production company to support the podcasts. Now Franklins.Net is the .NET education brand and Pwop is all about audio/video/music production.

JC: Do you have any other businesses?

CF: No.

JC: Let’s go back to your music. Who are some musicians that influenced you? Who do you like to listen to now?

CF: I was brought up on good old classic rock. On acoustic guitar I was influenced by John Fahey, Leo Kottke, Jorma Kaukonen, and the like. On electric guitar: Jeff Beck, Brian May, Peter Frampton, Eagles, Skynyrd, Duane Allman, Jerry Garcia, and more recently John Scofield, John Pisano, Lee Ritenour, and Pat Martino. Nowadays I’m on a New Orleans kick, hanging out with The Meters and Professor Longhair.

JC: Sounds like you’re active as a performer and a producer.

CF: Yes. I’ve produced music for a handful of artists and I play in local venues regularly.

JC: Your web site says you recorded a CD with your brother Jay a few years ago. Where can we find it?

CF: We will announce a website soon with our new album, and free links to our old album.

JC: Tell me about the new CD you’re working on.

CF: It’s all original but you’ll be able to hear and identify our influences easily.

JC: Anything else you want to talk about?

CF: Sounds good to me! Thanks!!!

Related post:

Best podcast intro music (Includes a couple links to Carl’s music.)

Can you predict the "20" in 80/20?

The simplest form of the 80/20 rule says that 80% of results come from only 20% of efforts. For example, maybe the top two people on a team of 10 are responsible for 80% of the team’s output. Maybe the most popular 20% of items on the menu account for 80% of a restaurant’s sales. Maybe you read 10 books on a subject but most of what you learned comes from the best two (or the first two).

The exact numbers 80 and 20 are not special. For example, one study showed that 75% of Twitter traffic comes from the most active 5% of users. That’s still an example of the 80/20 rule. The point is that a small portion of inputs are responsible for a large portion of outputs.

One criticism of the 80/20 rule is that you can only know which 20% was most effective in hindsight. A salesman could call on 100 prospects in a week and only make sales to 20. At the end of the week he could ask “Why didn’t I just call on those 20?” Of course he had to call on all 100 before he could know who the 20 were going to be. Or maybe the best 20% of your stock portfolio accounted for 80% of your growth. Why didn’t you just invest in those stocks? If you could have predicted which ones they were going to be, you would have done just that.

It’s easy to be cynical about the 80/20 rule. There are too many hucksters selling books and consulting services that boil down to saying “concentrate on what’s most productive.” Thanks. Never would have thought of that. Let me write you a check.

At one extreme is the belief that everything is equally important, or at least equally likely to be important. At the other extreme is the belief that 80/20 principles are everywhere and that it is possible to predict the “20” part. Reality lies somewhere between these extremes, but I believe it is often closer to the latter than we think. In many circumstances, acting as if everything were equally important is either idiocy or sloth.

You can improve your chances of correctly guessing which activities are going to be most productive. Nobody is going to write a book that tells you how to do this in your particular circumstances. It takes experience and hard work. But you can get better at it over time.

Related posts:

Four reasons we don’t apply the 80/20 rule
Weinberg’s law of twins

Baklava code

“Spaghetti code” is a well-known phrase for software with tangled logic, especially legacy code with goto statements.

The term “lasagna code” is not nearly as common. I first heard it used to describe code with too many architectural layers. Then I found the following older reference to lasagna code with a different slant on the term.

Lasagna code is used to describe software that has a simple, understandable, and layered structure. Lasagna code, although structured, is unfortunately monolithic and not easy to modify. An attempt to change one layer conceptually simple, is often very difficult in actual practice.

Since “lasagna code” has a different usage, I propose the term “baklava code” for code with too many layers.

Baklava. Photo credit Wikipedia

Baklava is a delicious pastry made with many paper-thin layers of phyllo dough. While thin layers are fine for a pastry, thin software layers don’t add much value, especially when you have many such layers piled on each other. Each layer has to be pushed onto your mental stack as you dive into the code. Furthermore, the layers of phyllo dough are permeable, allowing the honey to soak through. But software abstractions are best when they don’t leak. When you pile layer on top of layer in software, the layers are bound to leak.

Important because it's unimportant

Some things are important because they’re unimportant. These things are not intrinsically important, but if not handled correctly they distract from what is important.

Content is more important than spelling and grammar. But grammatical errors are a distraction. Correct spelling and grammar are important so readers will focus on the content. Typos are trivial (more on “trivial” below) but worth eliminating.

When I was in college, the computer science department deliberately used a different programming language in nearly every course. The idea was that programming language syntax is unimportant, and constantly changing syntax would cause students to focus on concepts. This had the opposite of the desired effect. Since students were always changing languages, they were always focused on syntax. It would have made more sense to say that since we don’t believe programming language syntax is important, we’re going to teach all our lower division courses using the same language. That way the syntax can become second nature and students will focus on the concepts.

Grammar, whether in spoken languages or programming languages, is trivial. It is literally trivial in the original sense of belonging to the classical trivium of grammar, logic, and rhetoric. These subjects were not the goal of classical education but the foundation of classical education. We now say something is “trivial” to indicate that it is unimportant, but in the past this meant that the thing was foundational. Calling something “trivial” meant that it was important in support of something else of greater interest.

When people call something trivial, they may be correct, but not in the sense they intended. They might mean that something is trivial in the modern sense when actually it’s trivial in the classical sense. For example, unit conversions are trivial. Just ask NASA about the Mars Climate Orbiter.

Mars Climate Orbiter NASA photo

For a day or two, make note of every time you hear something called “trivial.” Ask yourself whether it is trivial in the modern sense of being simple and unimportant or whether it could be trivial in the classical sense of being foundational.

Weekend Miscellany

Math and space exploration

A mathematician behind the moon landing
Mathematicians behind the Mars rover

PowerShell

Mastering PowerShell free e-book (567 pages)
PowerShell Day 1 free e-booklet (10 pages)

Python

Run a Python session from a browser
Getting started with SciPy (Scientific Python)

Teaching

Reflections from Gian-Carlo Rota on teaching, research, etc.
Math teachers at play blog carnival

Physics

Tape makes frosted glass clear video

Adding fonts to the PowerShell and cmd.exe consoles

The default font options for the PowerShell console are limited: raster fonts and Lucida Console. Raster fonts are the default, though Lucida Console is an improvement. In my opinion, Consolas is even better, but it’s not on the list of options.

Mastering PowerShell by Tobias Weltner explains how to expand the list of font options for the PowerShell console. The same trick increases the list of font options in the Windows command prompt cmd.exe as well. The book is free for download. See page 16 for details. However, I have two comments about the instructions it gives.

First, the book says “The name must be exactly the same as the official font name, just the way it’s stated under [registry key].” However, the Consolas font is listed in the registry as “Consolas (True Type)”. You should enter “Consolas” and leave out the parenthetical description.

Second, the book says “the new font will work only after you either log off at least once or restart your computer.” When I tried it, logging off was not sufficient; I had to reboot my computer before the font change would work.

Update: In order to make this post self-contained, I’ve added below the necessary information from Mastering PowerShell.

Run regedit.exe and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont.

Right-click in the panel on the right side and create a new string value. Name that value “0” or “00” or however many zeros you need to create a new key. That string’s value is the name of the font to add.

Update: See Necessary criteria for fonts to be available in a command window

Related posts:

Improved PowerShell prompt
A couple thoughts on typography
Better R console fonts

R Q&A

There is an organized effort to promote the StackOverflow site for questions and answers around the R programming language. It’s working: the amount of R activity on StackOverflow has greatly increased lately.

If you’re familiar with StackOverflow but not R, you might want to take a look at the R Project web site and these notes about the R language.

If you’re familiar with R but not StackOverflow, allow me to introduce you. StackOverflow is a web site for questions and answers related to programming. The site is open to all programming languages and environments, but it’s pretty strict about sticking to programming questions. (StackOverflow has two sister sites for other computing questions: ServerFault for system administration and IT issues, and Superuser for almost anything else related to computing.)

I’d like to see the R community take advantage of StackOverflow’s platform. According to Metcalfe’s law, the value of a network is proportional to the square of the number of users in the network. As more people go to StackOverflow for R Q&A, everyone gets better information and faster responses.

Related posts:

Five kinds of subscripts in R
The R book I wish someone would write
Civic duty on StackOverflow

Bad programmers create jobs

Jeff Atwood quotes an interview with David Parnas in his most recent blog post.

Q: What is the most often-overlooked risk in software engineering?

A: Incompetent programmers. There are estimates that the number of programmers needed in the U.S. exceeds 200,000. This is entirely misleading. It is not a quantity problem; we have a quality problem. One bad programmer can easily create two new jobs a year. Hiring more bad programmers will just increase our perceived need for them. If we had more good programmers, and could easily identify them, we would need fewer, not more.

IEEE floating point arithmetic in Python

Sometimes a number is not a number. Numeric data types represent real numbers in a computer fairly well most of the time, but sometimes the abstraction leaks. The sum of two numeric types is always a numeric type, but the result might be a special bit pattern that says overflow occurred. Similarly, the ratio of two numeric types is a numeric type, but the result might be a special value that says the result is not a number.

The IEEE 754 standard dictates how floating point numbers work. I’ve talked about IEEE exceptions in C++ before. This post is the Python counterpart. Python’s floating point types are implemented in terms of C’s double type, and so the C++ notes describe what’s going on at a low level. However, Python creates a higher level abstraction for floating point numbers. (Python also has arbitrary precision integers, which we will discuss at the end of this post.)

There are two kinds of exceptional floating point values: infinities and NaNs. Infinite values are represented by inf and can be positive or negative. A NaN, not a number, is represented by nan. Let x = 10^200. Then x^2 will overflow because 10^400 is too big to fit inside a C double. (To understand just why, see Anatomy of a floating point number.) In the following code, y will contain a positive infinity.

x = 1e200; y = x*x

If you’re running Python 3.0 and you print y, you’ll see inf. If you’re running an earlier version of Python, the result may depend on your operating system. On Windows, you’ll see 1.#INF but on Linux you’ll see inf. Now keep the previous value of y and run the following code.

z = y; z /= y

Since z = y/y, you might think z should be 1. But since y was infinite, it doesn’t work that way. There’s no meaningful way to assign a numeric value to the ratio of infinite values and so z contains a NaN. (You’d have to know “how they got there” so you could take limits.) So if you print z you’d see nan or 1.#IND depending on your version of Python and your operating system.

The way you test for inf and nan values depends on your version of Python. In Python 3.0, you can use the functions math.isinf and math.isnan respectively. Earlier versions of Python do not have these functions. However, the SciPy library has corresponding functions scipy.isinf and scipy.isnan.

What if you want to deliberately create an inf or a nan? In Python 3.0, you can use float('inf') or float('nan'). In earlier versions of Python you can use scipy.inf and scipy.nan if you have SciPy installed.
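On a current Python 3 interpreter, the overflow and NaN behavior described above can be checked with the standard math module. The following is a minimal sketch; the variable names pos_inf and nan are mine, chosen for illustration.

```python
import math

x = 1e200
y = x * x   # the product is too large for a C double, so y is +infinity
z = y / y   # inf/inf has no meaningful numeric value, so z is a NaN

# Deliberately constructed special values
pos_inf = float("inf")
nan = float("nan")

print(y, z)
print(math.isinf(y), math.isnan(z))
print(y == pos_inf)   # infinities of the same sign compare equal
```

Note that the overflow comes from multiplication. Other operations may report overflow differently, so this sketch only covers the two statements used in the post.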

IronPython does not yet support Python 3.0, nor does it support SciPy directly. However, you can use SciPy with IronPython by using Ironclad from Resolver Systems. If you don’t need a general numerical library but just want functions like isinf and isnan you can create your own.


def isnan(x): return type(x) is float and x != x
def isinf(x): inf = 1e5000; return x == inf or x == -inf

The isnan function above looks odd. Why would x != x ever be true? According to the IEEE standard, NaNs don’t equal anything, even each other. (See comments on the function IsFinite here for more explanation.) The isinf function is really a dirty hack but it works.
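If float('inf') is available, the hand-rolled tests can avoid the overflowing literal 1e5000. This is a sketch of one alternative, not a replacement for math.isinf and math.isnan; the module-level name INF is an assumption of mine.

```python
INF = float("inf")  # explicit infinity instead of an overflowing literal

def isnan(x):
    # A NaN is the only float value that compares unequal to itself.
    return isinstance(x, float) and x != x

def isinf(x):
    # True for positive or negative infinity, False for finite values and NaN.
    return x == INF or x == -INF

nan = float("nan")
print(nan == nan, nan != nan)
print(isnan(nan), isnan(1.0))
print(isinf(INF), isinf(-INF), isinf(1e308), isinf(nan))
```

Because a NaN compares unequal to everything, isinf returns False for a NaN without needing a special case.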

To wrap things up, we should talk a little about integers in Python. Although Python floating point numbers are essentially C floating point numbers, Python integers are not C integers. Python integers have arbitrary precision, and so we can sometimes avoid problems with overflow by working with integers. For example, if we had defined x as 10**200 in the example above, x would be an integer and so would y = x*x, and y would not overflow; a Python integer can hold 10^400 with no problem. We’re OK as long as we keep producing integer results, but we could run into trouble if we do anything that produces a non-integer result. For example,

x = 10**200; y = (x + 0.5)*x

would cause y to be inf, and

x = 10**200; y = x*x + 0.5

would throw an OverflowError exception.
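The three cases above can be put side by side. This is a small sketch for Python 3; the names a and b are mine, and b is set to None only so the failure is visible after the try block.

```python
x = 10**200          # a Python int with arbitrary precision
y = x * x            # still an int, exactly 10**400
print(len(str(y)))   # number of decimal digits in y

# x + 0.5 is a float, so the product is computed in floating point
a = (x + 0.5) * x
print(a)

# Here x*x is an int too large to convert to a float for the addition
try:
    b = x * x + 0.5
except OverflowError as err:
    b = None
    print("OverflowError:", err)
```

The difference is where the conversion to float happens: before the multiplication in the first case, after it in the second.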

Related posts:

Floating point numbers are a leaky abstraction
Anatomy of a floating point number
Overflow and loss of precision

Probability distributions in SciPy

Here are some notes on how to work with probability distributions using the SciPy numerical library for Python.

Functions related to probability distributions are located in scipy.stats. The general pattern is

scipy.stats.<distribution family>.<function>

There are 81 supported continuous distribution families and 12 discrete distribution families. Some distributions have obvious names: gamma, cauchy, t, f, etc. The only possible surprise is that all distributions begin with a lower-case letter, even those corresponding to a proper name (e.g. Cauchy). Other distribution names are less obvious: expon for the exponential, chi2 for chi-squared distribution, etc.

Each distribution supports several functions. The density and cumulative distribution functions are pdf and cdf respectively. (Discrete distributions use pmf rather than pdf.) One surprise here is that the inverse CDF function is called ppf for “percentage point function.” I’d never heard that terminology and would have expected something like “quantile.”

Example: scipy.stats.beta.cdf(0.1, 2, 3) evaluates the CDF of a beta(2, 3) random variable at 0.1.
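The relationship between a CDF and its inverse can be checked without SciPy. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 and later), which is my choice for illustration and is not part of SciPy. Its inv_cdf method plays the role that SciPy calls ppf.

```python
from statistics import NormalDist

# Normal distribution with mean 2 and standard deviation 3
dist = NormalDist(mu=2, sigma=3)

p = dist.cdf(3.5)            # probability of a value at or below 3.5
x = dist.inv_cdf(p)          # inverse CDF recovers the original point
median = dist.inv_cdf(0.5)   # the 50% quantile of a normal is its mean

print(p, x, median)
```

With SciPy installed, the corresponding calls would be scipy.stats.norm.cdf(3.5, 2, 3) and scipy.stats.norm.ppf(p, 2, 3).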

Random values are generated using rvs which takes an optional size argument. The size is set to 1 by default.

Example: scipy.stats.norm.rvs(2, 3) generates a random sample from a normal (Gaussian) random variable with mean 2 and standard deviation 3. The function call scipy.stats.norm.rvs(2, 3, size = 10) returns an array of 10 samples from the same distribution.

The command line help() facility does not document the distribution parameterizations, but the external documentation does. Most distributions are parameterized in terms of location and scale. This means, for example, that the exponential distribution is parameterized in terms of its mean, not its rate. Somewhat surprisingly, the exponential distribution has a location parameter. This means, for example, that scipy.stats.expon.pdf(x, 7) evaluates at x the PDF of an exponential distribution with location 7. This is not what I expected. I assumed there would be no location parameter and that the second argument, 7, would be the mean (scale). Instead, the location was set to 7 and the scale was left at its default value 1. Writing scipy.stats.expon.pdf(x, scale=7) would have given the expected result because the default location value is 0.
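To make the location/scale convention concrete, here is a hand-written exponential density using only the math module. The function expon_pdf is a hypothetical helper of mine, not SciPy code; it follows the location/scale form described above so the effect of positional versus keyword arguments is visible.

```python
import math

def expon_pdf(x, loc=0.0, scale=1.0):
    """Exponential density in location/scale form (illustrative helper)."""
    z = (x - loc) / scale
    # The density is zero to the left of the location parameter.
    return math.exp(-z) / scale if z >= 0 else 0.0

x = 8.0
shifted = expon_pdf(x, 7)         # second positional argument is the location
scaled = expon_pdf(x, scale=7)    # keyword argument sets the mean (scale)

print(shifted)   # density of a unit-mean exponential shifted to start at 7
print(scaled)    # density of a mean-7 exponential starting at 0
```

The two calls differ only in how the 7 is passed, yet they describe different distributions, which is the surprise noted above.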

SciPy also provides constructors for objects representing random variables.

Example: x = scipy.stats.norm(3, 1); x.cdf(2.7) returns the same value as scipy.stats.norm.cdf(2.7, 3, 1).

Constructing objects representing random variables encapsulates the differences between distributions in the constructors. For example, some distributions take more parameters than others and so their object constructors require more arguments. But once a distribution object is created, its PDF, for example, can be called with a single argument. This makes it easier to write code that takes a general distribution object as an argument.
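As a sketch of that last point, the function below accepts any object with a one-argument cdf method. The helper name interval_probability is hypothetical, and the example passes a statistics.NormalDist from the standard library so that it runs without SciPy; a SciPy object such as scipy.stats.norm(3, 1) exposes the same kind of cdf method and could be passed instead.

```python
from statistics import NormalDist

def interval_probability(dist, a, b):
    """Probability that the random variable falls between a and b.

    Only assumes dist has a cdf method taking a single argument.
    """
    return dist.cdf(b) - dist.cdf(a)

x = NormalDist(3, 1)   # mean 3, standard deviation 1
prob = interval_probability(x, 2, 4)
print(prob)            # probability of landing within one standard deviation
```

The function never needs to know how many parameters the distribution took at construction time.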

Related posts:

Numerical computing in IronPython with IronClad
Stand-alone error function erf(x)

Weekend miscellany

Science

Sports

Operating systems

Software testing

Math

  • Calculus limericks
  • Detexify Draw a character and it will tell you the TeX notation for it.
  • History of proofs of the fundamental theorem of algebra

Business

Education