Scaling the number of projects

Software engineers typically use the term “horizontal scalability” to mean throwing servers at a problem. A website scales horizontally if you can handle increasing traffic simply by adding more servers to a server farm. But I think of horizontal scalability as scaling the number of projects rather than the performance demands on a single project. My biggest challenges have come from managing lots of small projects, more projects than developers.

I’ve seen countless books and articles about how to scale a single project, but I don’t remember ever seeing anything written about scaling the number of projects. Managing independent projects sounds easy: if the projects are for different clients and have different developers, just let each one go its own way. But there are two problems. One is a single developer maintaining an accumulation of his or her own projects, and the other is the ability (or more importantly, the inability) of peers to maintain each other’s projects. Projects that were independent during development become dependent in maintenance because they are maintained at the same time by the same people. Consistency across projects didn’t seem necessary during development, but in maintenance you look back and wish there had been more consistency.

Maintenance becomes a tractor pull. Robert Martin describes a software tractor pull in his essay The Tortoise and the Hare:

Have you ever been to a tractor pull? Imagine a huge arena filled with mud and churned up soil. Huge tractors tie themselves up to devices of torture and try to pull them across the arena. The devices get harder to pull the farther they go. They are inclined planes with wheels on the rear and a wide shoe at the front that sits squarely on the ground. There is a huge weight at the rear that is attached to a mechanism that drags the weight up the inclined plane and over the shoe as the wheels turn. This steadily increases the weight over the shoe until the friction overcomes the ability of the tractor.

Writing software is like a tractor pull. You start out fast without a lot of friction. Productivity is high, and you get a lot done. But the more you write the harder it gets to write more. The weight is being dragged up over the shoe. The more you write the more the mess builds. Productivity slows. Overtime increases. Teams grow larger. More and more code is piled up over the shoe, and the development team grinds to a halt unable to pull the huge mass of code any farther through the mud.

Robert Martin had in mind a single project slowing down over time, but I believe his analogy applies even better to maintenance of multiple projects.

To scale your number of projects, you’ve got to enforce consistency before there’s an immediate need for it. But here you face several dangers. Enforcing apparently unnecessary consistency can make you appear arbitrary and damage morale. And you’ll make some wrong decisions: it takes a lot of experience to predict which policies you’ll later wish you had enforced. These issues are challenging when scaling a single project, but they are even more challenging when scaling across many small projects because you don’t get feedback as quickly. On a single large project you may feel the pain of a bad decision quickly, but with multiple small projects you may not feel the pain until much later.

Quality is critical when scaling the number of projects. Each project needs to be better than seems necessary. When you look at a single project in isolation, maybe it’s acceptable to have one bug report a month. But then when you have an accumulation of such projects, you’ll get bug reports every day. And the cost per bug fix goes up over time because developers can most easily fix bugs in the code freshest in their minds. Fixing a bug in an old project that no one wants to think about anymore will be unpleasant and expensive.

Scaling your number of projects requires more discipline than scaling a single project because feedback takes longer. Although scaling single projects gets far more attention, I suspect a lot of people are struggling with scaling their number of projects.

Quantity and quality

Here’s a quote from a recent blog post by Tom Peters:

You will be remembered in the long haul for the quality of your work, not the quantity of your work—the quantity part is just your defective ego talking—no one evaluates Picasso based on the number of paintings he churned out.

Wine, Beer, and Statistics

William Gosset discovered the t-distribution while working for the Guinness brewing company. Because his employer prevented employees from publishing papers, Gosset published his research under the pseudonym Student. That’s why his distribution is often called Student’s t-distribution.

This story is fairly well known. It often appears in the footnotes of statistics textbooks. However, I don’t think many people realize why it’s not surprising that fundamental statistical research should come from a brewery, and why we don’t hear of statistical research coming out of wineries.

Beer makers pride themselves on consistency while wine makers pride themselves on variety. That’s why you’ll never hear beer fans talk about a “good year” the way wine connoisseurs do. Because they value consistency, beer makers invest more in extensive statistical quality control than wine makers do.
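
To make “statistical quality control” concrete, here is a minimal PowerShell sketch of the sort of small-sample check Gosset’s t-distribution supports. All the measurements and the target value are made up for illustration.

    # Hypothetical quality-control check: do a handful of batch measurements
    # match the target value? All numbers are made up.
    $target  = 1.042
    $samples = 1.040, 1.043, 1.041, 1.044, 1.039

    $n    = $samples.Count
    $mean = ($samples | Measure-Object -Average).Average
    $ss   = ($samples | ForEach-Object { [math]::Pow($_ - $mean, 2) } |
             Measure-Object -Sum).Sum
    $sd   = [math]::Sqrt($ss / ($n - 1))
    $t    = ($mean - $target) / ($sd / [math]::Sqrt($n))

    # Compare |t| with the Student t critical value for n-1 = 4 degrees of
    # freedom (about 2.776 at the 5% level) to decide whether the batch drifts.
    "t = {0:N3}" -f $t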

Bugs in food and software

What is an acceptable probability of finding bug parts in a box of cereal? You can’t say zero. As the acceptable probability goes to zero, the price of a box of cereal goes to infinity. The same holds for software. Some projects, such as the software for space probes, push defect rates extremely low. And yet even these projects have to tolerate some non-zero probability of error. It’s not worthwhile to spend 10 billion dollars to prevent a bug in a billion dollar mission.

Bugs are a fact of life. We can insist that they are unacceptable or we can pretend they don’t exist, but neither approach is constructive. It’s better to focus on the probability of running into bugs and consequences of running into bugs.

Not all bugs have the same consequences. It’s distasteful to find a piece of a roach leg in your can of green beans, but it’s not the end of the world. Toxic microscopic bugs are more serious. Along the same lines, a software bug that causes incorrect hyphenation is hardly the same as a bug that causes a plane crash. To get an idea of the potential economic cost of running into a bug, and therefore the resources worth spending to detect and fix it, multiply the probability of hitting it by the cost of its consequences.
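
For example, here’s the back-of-the-envelope calculation in PowerShell. Every number is hypothetical.

    # Back-of-the-envelope expected cost of a bug. All numbers are hypothetical.
    $probabilityPerUse = 1 / 10000     # chance of hitting the bug in one use
    $costPerIncident   = 5000          # dollars lost when it happens
    $expectedUses      = 100000        # uses over the life of the software

    $expectedCost = $probabilityPerUse * $expectedUses * $costPerIncident
    "Expected cost: {0:N0} dollars" -f $expectedCost    # 50,000 dollars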

How do you estimate the probabilities of software bugs? The same way you estimate the probability of bugs in food: by conducting experiments and analyzing data. Some people find this very hard to accept. They understand that testing is necessary in the physical world, but they think software is entirely different and must be proven correct in some mathematical sense. They object that computer programs are complex systems, too complex to test. Computer programs are complex, but human bodies are far more complex, and yet we conduct tests on human subjects all the time to estimate various probabilities, such as the probability of drug toxicity.
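
Here’s a sketch of what such an estimate might look like in PowerShell. The counts are hypothetical, and the zero-failure bound uses the standard statistical rule of three.

    # Estimating the probability of hitting a bug from usage data.
    # The counts here are hypothetical.
    $sessions = 2000      # observed user sessions
    $failures = 7         # sessions that ran into a bug

    "Estimated probability per session: {0:N4}" -f ($failures / $sessions)   # 0.0035

    # With zero observed failures the estimate is not zero: the "rule of three"
    # gives an approximate 95% upper bound of 3/n on the per-session probability.
    "95% upper bound given no failures: {0:N4}" -f (3 / $sessions)           # 0.0015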

Another objection to software testing is that it can only test paths through the software that are actually taken, not all potential paths. That’s true, but the most important data when estimating the probability of running into a bug is data from people using the software under normal conditions. A bug that you never run into has no consequences.

But what about people using software in unanticipated ways? I certainly find it frustrating to uncover bugs when I use a program in an atypical way. But this is not terribly different from physical systems. Bridges may fail when they’re subject to loads they weren’t designed for. There is a difference, however. Most software is designed to permit far more uses than can be tested, whereas there’s less of a gap in physical systems between what is permissible and what is testable. Unit testing helps. If every component of a software system works correctly in isolation, it is more likely, though not certain, that the components will work correctly together in a new situation. Still, there’s no getting around the fact that the best-tested uses are the most likely to succeed.

Plane crashes, software crashes, and business crashes

I’ve run into the same theme in very different contexts lately: people ignore data from crashes.

FlowingData has an article today claiming that, contrary to popular belief, some parts of an airplane are safer than others. According to the article, pundits routinely claim that all seats are equally safe even though data show that the probability of surviving a plane crash varies from 49% in the front of the aircraft up to 69% in the rear.

Also today, Coding Horror published its second article on software crashes. See Crashing Responsibly and Twitter: How Not To Crash Responsibly. Many applications don’t collect data from crashes, and those that do don’t always make good use of it.

Finally, Scott Shane’s book The Illusions of Entrepreneurship (ISBN 0300113315) examines small business crashes. Entrepreneurs, investors, and policy makers often make decisions based on myths that are soundly refuted by data.

Barriers to good statistical software

I attended a National Cancer Institute workshop yesterday entitled “Barriers to producing well-tested, user-friendly software for cutting-edge statistical methodology.” I was pleased that everyone there realized there is a huge difference between code created for personal use and reliable software that others would willingly use. Not all statisticians appreciate the magnitude of the difference.

I was also pleased that several people at the workshop were aware of the problem of irreproducible statistical analyses. Not everyone was aware how serious or how common the problem is, but those who were aware were adamant that something needs to be done about it, such as journals requiring authors to publish the code used to analyze their data.

Publishing correct sample code

It’s infuriating to read published sample code that’s wrong. Sometimes code given in books is not even syntactically correct. I’ve wondered why publishers don’t have a way to verify that the code at least compiles, and maybe even check that it gives the stated output.

Dave Thomas said in a recent interview that his publishing company, The Pragmatic Programmers, does just that. Authors write in a logical mark-up language, and software turns that into a publishable form, compiling the code samples and inserting their output. Sample code from one of their books is more likely to work the first time you type it in than code from other publishers.
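
The interview doesn’t describe their tooling, but the core check is easy to sketch. Assuming the samples have already been extracted to C# files in a samples directory (both assumptions mine), something like this would catch samples that don’t compile:

    # Sketch: check that each extracted sample at least compiles.
    # Assumes the samples are C# files and csc.exe is on the path.
    Get-ChildItem samples\*.cs | ForEach-Object {
        csc /nologo /t:library $_.FullName
        if ($LASTEXITCODE -ne 0) { Write-Warning "Does not compile: $($_.Name)" }
    }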

Automated software builds

My first assignment as a professional programmer was to build another person’s program. I learned right away not to assume a project will build just because the author says it will. I’ve seen the same pattern repeated everywhere I’ve worked. Despite version control systems and procedures, there’s usually some detail in the developer’s head that doesn’t get codified and only the original developer can build the project easily.

The first step in making software builds reproducible is documentation. There’s got to be a document explaining how to extract the project from version control and build it. Requiring screen shots helps since developers have to rehearse their own instructions in order to produce the shots.

The second step is verification. Documentation needs to be tested, just like software. Someone who hasn’t worked on the project needs to extract the code onto a clean machine and build the project using only written instructions — no conversation with the developer allowed. Everyone thinks their code is easy to build; experience says most people are wrong.

The verifiers need to rotate. If one person serves as build master for very long, they develop the same implicit knowledge that the original programmers failed to codify.

The third step is automation. Automated instructions are explicit and testable. If automation also saves time, so much the better, but automation is worthwhile even if it does not save time. Clift Norris and I just wrote an article on CodeProject entitled Automated Extract and Build from Team System using PowerShell that helps with this third step if you’re using Visual Studio and VSTS.
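
The article goes into detail; here is only a minimal sketch of the idea, not the article’s actual script. The paths, server item, and solution name are hypothetical, and the tf.exe workspace setup is omitted for brevity.

    # Pull the project from version control into a clean directory and build it.
    $workDir  = "C:\builds\MyProject"
    $solution = "MyProject.sln"

    if (Test-Path $workDir) { Remove-Item $workDir -Recurse -Force }
    New-Item -ItemType Directory -Path $workDir | Out-Null
    Set-Location $workDir

    tf get '$/MyProject' /recursive /force      # fetch sources from Team System
    msbuild $solution /t:Rebuild /p:Configuration=Release
    if ($LASTEXITCODE -ne 0) { throw "Build failed" }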

Text reviews for software

When users find spelling and grammar errors in your software, your credibility takes a hit. But apparently very few software projects review the text their software displays. I imagine the ones that do review their text use a combination of two leaky methods: asking execution testers to take note of prose errors, and requiring that all text displayed to users be stored in a string table.

There are a couple of problems with asking execution testers to be copy editors. First, they’re not copy editors. They may not recognize a grammatical error when they see it. Second, they only see the text that their path through the software exposes. Messages displayed to the user under unusual circumstances slip through testing.

String tables are a good idea. They can be reviewed by a professional editor. (Or a translator, if your application is internationalized.) But it’s difficult to make sure that every string the user might see is in the string table. When you need to add a few quick lines of error-handling code, it’s so easy to just include the text right there in the code rather than adding an entry to the string table. After all, you say to yourself, the code’s probably not going to run anyway.

My solution was to write a script that extracts all the quoted text from a source tree so it can be reviewed separately. The script tries to only pick out strings that a user could see, filtering out, for example, code quoted inside code. Doing this perfectly would be very hard, but by tolerating a small error rate, the problem can be solved quickly in a few lines of code. I’ve used this script for years. Nearly every time I run it I discover potentially embarrassing errors.
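
The script I posted is more careful than this, but here is a minimal sketch of the core idea for C# sources. The source path is hypothetical.

    # Pull every double-quoted string literal out of a source tree for review.
    # Tolerates some false matches; the goal is a reviewable list, not a parser.
    $pattern = '"([^"\\]|\\.)*"'

    Get-ChildItem -Path C:\src\MyProject -Recurse -Include *.cs |
        Select-String -Pattern $pattern -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Value } |
        Sort-Object -Unique |
        Set-Content strings-for-review.txt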

In addition to helping with copy editing, an extract of all the string literals in a project gives an interesting perspective on the source code. For example, it could help uncover security risks such as SQL injection vulnerabilities.

I’ve posted an article on CodeProject along with the script I wrote: PowerShell Script for Reviewing Text Shown to Users.

The script on CodeProject is written for Microsoft’s PowerShell. If anyone would like a Perl version of the script, just let me know. I first wrote the script in Perl, but then moved it to PowerShell as my team was moving to PowerShell for all administrative scripting.

New spin on the cathedral and the bazaar

Eric Raymond’s famous essay The Cathedral and the Bazaar compares commercial software projects to cathedrals and open source software projects to bazaars. Cathedrals are carefully planned. Bazaars are not. The organizational structure of a bazaar emerges without deliberate coordination among its participants. The open source community has embraced the metaphor of the bazaar and the informality and spontaneity it implies.

Shmork made the following observation yesterday in the comments on a Coding Horror post discussing the difficulties of using Linux software.

Almost nobody in the Western world shops at real-life bazaars either, because they are dodgy, unsafe, and unregulated. And in the Western world, we like things to be reliable, working, safe. So cathedral it is. Even our flea markets aren’t bazaars, really, they’re just knock-off cathedrals.