Posts Tagged ‘Quality’

Quantity and quality

Thursday, July 3rd, 2008

Here’s a quote from a recent blog post from Tom Peters:

You will be remembered in the long haul for the quality of your work, not the quantity of your work—the quantity part is just your defective ego talking—no one evaluates Picasso based on the number of paintings he churned out.

Wine, Beer, and Statistics

Friday, June 27th, 2008

William Gosset discovered the t-distribution while working for the Guinness brewing company. Because his employer prevented employees from publishing papers, Gosset published his research under the pseudonym Student. That’s why his distribution is often called Student’s t-distribution.

This story is fairly well known. It often appears in the footnotes of statistics textbooks. However, I don’t think many people realize why it’s not surprising that fundamental statistical research should come from a brewery, and why we don’t hear of statistical research coming out of wineries.

Beer makers pride themselves on consistency while wine makers pride themselves on variety. That’s why you’ll never hear beer fans talk about a “good year” the way wine connoisseurs do. Because they value consistency, beer makers invest more in extensive statistical quality control than wine makers do.

Bugs in food and software

Thursday, June 19th, 2008

What is an acceptable probability of finding bug parts in a box of cereal? You can’t say zero. As the acceptable probability goes to zero, the price of a box of cereal goes to infinity. In practice, the FDA sets very small but non-zero limits on the probability of finding bug parts in food. This is unsettling at first, but there’s no rational way around it.

What is an acceptable probability of finding bugs in your software? Again, you can’t say zero. The cost increases without bound as the quality requirements increase. In my previous post, I wrote about the extraordinary quality procedures for writing software for space probes. And yet even these projects have to tolerate some non-zero probability of error. It’s not worthwhile to spend 10 billion dollars to prevent a bug in a billion dollar mission.

Bugs are a fact of life. We can insist that they are unacceptable or we can pretend they don’t exist, but neither approach is constructive. It’s better to focus on the probability of running into bugs and consequences of running into bugs.

Not all bugs have the same consequences. It’s distasteful to find a piece of a roach leg in your can of green beans, but it’s not the end of the world. Toxic microscopic bugs are more serious. Along the same lines, a software bug that causes incorrect hyphenation is hardly the same as a bug that causes a plane crash. To get an idea of the potential economic cost of running into a bug, and therefore the resources worthwhile to detect and fix it, multiply the probability by the consequences.
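The probability-times-consequences rule can be sketched in a few lines of Python. The bug names, probabilities, and dollar figures here are hypothetical, chosen only to show how a rare but expensive bug can outrank a common but cheap one.

```python
def expected_cost(probability, consequence):
    """Expected cost of a bug: probability of hitting it times its cost."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * consequence


# Hypothetical (probability, cost in dollars) pairs, for illustration only.
bugs = {
    "incorrect hyphenation": (0.05, 100.0),            # common, cheap
    "data corruption on save": (0.0001, 2_000_000.0),  # rare, expensive
}

# Rank bugs by expected cost, largest first.
ranked = sorted(bugs, key=lambda name: expected_cost(*bugs[name]), reverse=True)

for name in ranked:
    print(f"{name}: expected cost about {expected_cost(*bugs[name]):.2f}")
```

With these made-up numbers the rare bug comes first, which suggests it deserves more of the testing budget even though users will almost never see it.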

How do you estimate the probabilities of software bugs? The same way you estimate the probability of bugs in food: by conducting experiments and analyzing data. Some people find this very hard to accept. They understand that testing is necessary in the physical world, but they think software is entirely different and must be proven correct in some mathematical sense. They object that computer programs are complex systems, too complex to test. Computer programs are complex, but human bodies are far more complex, and yet we conduct tests on human subjects all the time to estimate different probabilities, such as the probabilities of drug toxicity.
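Here is one simple way such an estimate might look. Suppose the software has been run some number of times without a failure, and assume (a strong assumption) that the runs are independent and representative of normal use. Then the largest per-run failure probability p still consistent with that record at a given confidence level solves (1 - p)^n = 1 - confidence. The function name below is mine, and this is only a sketch of one analysis among many.

```python
def failure_rate_upper_bound(trials, confidence=0.95):
    """Upper confidence bound on the per-run failure probability
    after `trials` independent runs with zero observed failures.

    Solves (1 - p) ** trials = 1 - confidence for p.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


# After 1,000 clean runs the 95% bound is roughly 3/1000
# (the "rule of three" approximation).
bound = failure_rate_upper_bound(1000)
print(f"95% upper bound after 1000 clean runs: {bound:.5f}")
```

The bound shrinks roughly in proportion to 1/n, so it never reaches zero: more testing buys a smaller probability, not certainty.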

Another objection to software testing is that it can only test paths through the software that are actually taken, not all potential paths. That’s true, but the most important data when estimating the probability of running into a bug is data from people using the software under normal conditions. A bug that you never run into has no consequences.

But what about people using software in unanticipated ways? I certainly find it frustrating when I uncover bugs when I use a program in an atypical way. But this is not terribly different from physical systems. Bridges may fail when they’re subject to loads they weren’t designed for. There is a difference, however. Most software is designed to permit far more uses than can be tested, whereas there’s less of a gap in physical systems between what is permissible and what is testable. Unit testing helps. If every component of a software system works correctly in isolation, it is more likely, though not certain, that the components will work correctly together in a new situation. Still, there’s no getting around the fact that the best tested uses are the most likely to succeed.

Plane crashes, software crashes, and business crashes

Tuesday, May 20th, 2008

I’ve run into the same theme in very different contexts lately: people ignore data from crashes.

FlowingData has an article today claiming that, contrary to popular belief, some parts of an airplane are safer than others. According to the article, pundits routinely claim that all seats are equally safe even though data show that the probability of surviving a plane crash varies from 49% in the front of the aircraft up to 69% in the rear.

Also today, Coding Horror published its second article on software crashes. See Crashing Responsibly and Twitter: How Not To Crash Responsibly. Many applications don’t collect data from crashes, and those that do don’t always make good use of it.

Finally, Scott Shane’s book The Illusions of Entrepreneurship examines small business crashes. Entrepreneurs, investors, and policy makers often make decisions based on myths that are soundly refuted by data.

Barriers to good statistical software

Friday, May 16th, 2008

I attended a National Cancer Institute workshop yesterday entitled “Barriers to producing well-tested, user-friendly software for cutting-edge statistical methodology.” I was pleased that everyone there realized there is a huge difference between code created for personal use and reliable software that others would willingly use. Not all statisticians appreciate the magnitude of the difference.

I was also pleased that several people at the workshop were aware of the problem of irreproducible statistical analyses. Not everyone was aware how serious or how common the problem is, but those who were aware were adamant that something needs to be done about it, such as journals requiring authors to publish the code used to analyze their data.

Publishing correct sample code

Friday, May 9th, 2008

It’s infuriating to read published sample code that’s wrong. Sometimes code given in books is not even syntactically correct. I’ve wondered why publishers didn’t have a way to verify that the code at least compiles, and maybe even check that it gives the stated output.

Dave Thomas said in a recent interview that his publishing company, The Pragmatic Programmers, does just that. Authors write in a logical mark-up language and software turns that into a publishable form, compiling code samples and inserting the output. Sample code from one of their books is more likely to work the first time you type it in than code from other publishers.

Automated software builds

Sunday, April 20th, 2008

My first assignment as a professional programmer was to build another person’s program. I learned right away not to assume a project will build just because the author says it will. I’ve seen the same pattern repeated everywhere I’ve worked. Despite version control systems and procedures, there’s usually some detail in the developer’s head that doesn’t get codified and only the original developer can build the project easily.

The first step in making software builds reproducible is documentation. There’s got to be a document explaining how to extract the project from version control and build it. Requiring screen shots helps since developers have to rehearse their own instructions in order to produce the shots.

The second step is verification. Documentation needs to be tested, just like software. Someone who hasn’t worked on the project needs to extract the code onto a clean machine and build the project using only written instructions — no conversation with the developer allowed. Everyone thinks their code is easy to build; experience says most people are wrong.

The verifiers need to rotate. If one person serves as build master very long, they develop the same implicit knowledge that the original programmers failed to codify.

The third step is automation. Automated instructions are explicit and testable. If automation also saves time, so much the better, but automation is worthwhile even if it does not save time. Clift Norris and I just wrote an article on CodeProject entitled Automated Extract and Build from Team System using PowerShell that helps with this third step if you’re using Visual Studio and VSTS.
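To make the idea of explicit, testable instructions concrete, here is a minimal sketch in Python. It is not the PowerShell script from the CodeProject article. The step names and the placeholder commands are hypothetical; a real project would substitute its own version control and build commands.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (description, command) pair in order.

    Stops at the first command that exits with a non-zero status.
    Returns the descriptions of the steps that completed.
    """
    completed = []
    for description, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            raise RuntimeError(f"step failed: {description}")
        completed.append(description)
    return completed


# Placeholder commands standing in for a real checkout and build.
steps = [
    ("check out sources", [sys.executable, "-c", "print('checkout placeholder')"]),
    ("build", [sys.executable, "-c", "print('build placeholder')"]),
]

if __name__ == "__main__":
    run_steps(steps)
```

Because every step is written down as a command, nothing can stay in the developer’s head, and the script itself can be run on a clean machine to check that the instructions are complete.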

Text reviews for software

Friday, April 11th, 2008

When users find spelling and grammar errors in your software, your credibility takes a hit. But apparently very few software projects review the text their software displays. I imagine the ones that do review their text use a combination of two leaky methods: asking execution testers to take note of prose errors, and requiring that all text displayed to users be stored in a string table.

There are a couple of problems with asking execution testers to be copy editors. First, they’re not copy editors. They may not recognize a grammatical error when they see it. Second, they only see the text that their path through the software exposes. Messages displayed to the user under unusual circumstances slip through testing.

String tables are a good idea. They can be reviewed by a professional editor. (Or translator, if your application is internationalized.) But it’s difficult to make sure that every string the user might see is in the string table. When you need to add a few quick lines of error-handling code, it’s so easy to just include the text right there in the code rather than adding an entry to the string table. After all, you say to yourself, the code’s probably not going to run anyway.

My solution was to write a script that extracts all the quoted text from a source tree so it can be reviewed separately. The script tries to only pick out strings that a user could see, filtering out, for example, code quoted inside code. Doing this perfectly would be very hard, but by tolerating a small error rate, the problem can be solved quickly in a few lines of code. I’ve used this script for years. Nearly every time I run it I discover potentially embarrassing errors.
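For a rough idea of what such a script might look like, here is a sketch in Python. It is not the script described above: the regular expression, the two-word filter, and the list of file suffixes are my own assumptions, picked to keep the example short. Like the original approach, it tolerates a small error rate rather than parsing the source properly.

```python
import re
from pathlib import Path

# A double-quoted literal on one line, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')


def looks_like_prose(text):
    """Crude filter (an assumption): keep literals with at least two
    runs of two or more letters, which drops things like "rb" or "%d"."""
    return len(re.findall(r"[A-Za-z]{2,}", text)) >= 2


def extract_strings(source):
    """Return the quoted literals in `source` that look like user-visible text."""
    return [
        match.group(1)
        for match in STRING_LITERAL.finditer(source)
        if looks_like_prose(match.group(1))
    ]


def extract_from_tree(root, suffixes=(".cs", ".cpp", ".h")):
    """Yield (path, literal) pairs for every matching file under `root`."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for literal in extract_strings(text):
                yield path, literal


sample = 'printf("Unable to open file"); mode = "rb"; fmt = "%d";'
print(extract_strings(sample))
```

The filter will miss one-word messages and will let through some strings no user ever sees, but the output is short enough to hand to an editor.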

In addition to helping with copy editing, an extract of all the string literals in a project gives an interesting perspective on the source code. For example, it could help uncover security risks such as SQL injection vulnerabilities.

I’ve posted an article on CodeProject along with the script I wrote.

PowerShell Script for Reviewing Text Shown to Users

The script on CodeProject is written for Microsoft’s PowerShell. If anyone would like a Perl version of the script, just let me know. I first wrote the script in Perl, but then moved it to PowerShell as my team was moving to PowerShell for all administrative scripting.

New spin on the cathedral and the bazaar

Wednesday, April 2nd, 2008

Eric Raymond’s famous essay The Cathedral and the Bazaar compares commercial software projects to cathedrals and open source software projects to bazaars. Cathedrals are carefully planned. Bazaars are not. The organizational structure of a bazaar emerges without deliberate coordination of its participants. The open source community has embraced the metaphor of the bazaar and the informality and spontaneity it implies.

Shmork wrote the following observation in the comments to a Coding Horror post yesterday that discussed the difficulties of using Linux software.

Almost nobody in the Western world shops at real-life bazaars either, because they are dodgy, unsafe, and unregulated. And in the Western world, we like things to be reliable, working, safe. So cathedral it is. Even our flea markets aren’t bazaars, really, they’re just knock-off cathedrals.

Good, fast, or cheap: Can you really pick two?

Monday, March 10th, 2008

There’s a saying that clients can have good, fast, or cheap. Pick two, but then the third will be whatever it has to be based on the other two choices. You can have good and fast if you’re willing to spend a lot of money. You can have fast and cheap, but the quality will be poor. You might even be able to get good and cheap, if you’re willing to wait a long time.

A variation on this theme is the iron triangle. You draw a triangle with vertices labeled “features”, “time” and “resources.” If you make two of the sides longer, the third has to become longer too. Here goodness is defined as a feature set rather than quality, but the same principle applies.

There’s a problem with this line of reasoning: no matter what clients say, they want quality. They may say they want fast and cheap, and if you tell them you’ll sacrifice quality to deliver fast and cheap, you’ll be a hero — until you deliver. Then they want quality. As Howard Newton put it

People forget how fast you did a job, but they remember how well you did it.

Sometimes you can cut features as long as you do a good job on the features that remain, but only to a point. Clients are not going to be happy unless you meet their expectations, even if those expectations are explicitly contradicted in a contract. You can tell a client you’ll cut out frills to give them something fast and cheap, and they’ll gladly agree. But they still want their frills, or they will want them. The client may be silently disappointed. Or they may be vocally disappointed, demanding excluded features for free and complaining about your work. Eventually you learn what features to insist on including, even if a client says they can live without them.

Proofs of false statements

Wednesday, February 6th, 2008

Mark Dominus brought up an interesting question last month: have there been major screw-ups in mathematics? He defines a “major screw-up” to be a flawed proof of an incorrect statement that was accepted for a significant period of time. He excludes the case of incorrect proofs of statements that were nevertheless true.

It’s remarkable that he can even ask the question. Can you imagine someone asking with a straight face whether there have ever been major screw-ups in, say, software development? And yet it takes some hard thought to come up with examples of really big blunders in math.

No doubt there are plenty of flawed proofs of false statements in areas too obscure for anyone to care about. But in mainstream areas of math, blunders are usually uncovered very quickly. And there are examples of theorems that were essentially correct but neglected some edge case. Proofs of statements that are just plain wrong are hard to think of. But Mark Dominus came up with a few.

Yesterday he gave an example of a statement by Kurt Gödel that was flat-out wrong but accepted for over 30 years. Warning: reader discretion advised. His post is not suitable for those who get queasy at the sight of symbolic logic.

Paper doesn’t abort

Wednesday, January 30th, 2008

My daughter asked me recently what I thought about a Rube Goldberg machine she sketched for a school project. I immediately thought about how difficult it would be to implement parts of her design. I asked her if she really had to build it or just had to sketch it. When she said she didn’t really have to build it, I told her it was great.

I thought of a friend’s comment about designing code versus actually writing code: “I’ve never seen a piece of paper abort.” The hard part about writing software is that people can tell when you’re wrong.

Programming the last mile

Tuesday, January 29th, 2008

In any programming project there comes a point where the programming ends and manual processes begin. That boundary is where problems occur, particularly for reproducibility.

Before you can build a software project, there are always things you need to know in addition to having all the source code. And usually at least one of those things isn’t documented. Statistical analyses are perhaps worse. Software projects typically yield their secrets after a moderate amount of trial and error; statistical analyses may remain inscrutable forever.

The solution to reproducibility problems is to automate more of the manual steps. It is becoming more common for programmers to realize the need for one-click builds. (See Pragmatic Project Automation for a good discussion of why and how to do this. Here’s a one-page summary of the book.) Progress is slower on the statistical side, but a few people have discovered the need for reproducible analysis.

It’s all a question of how much of a problem should be solved with code. Programming has to stop at some point, but we often stop too soon. We stop when it’s easier to do the remaining steps by hand, but we’re often short-sighted in our idea of “easier”. We mean easier for me to do by hand this time. We don’t think about someone else needing to do the task, or the need for someone (maybe ourselves) to do the task repeatedly. And we don’t think of the possible debugging/reverse-engineering effort in the future.

I’ve tried to come up with a name for the discipline of including more work in the programming portion of problem solving. “Extreme programming” has already been used for something else. Maybe “turnkey programming” would do; it doesn’t have much of a ring to it, but it sorta captures the idea.

Empirical support for TDD

Saturday, January 26th, 2008

Phil Haack gives his summary of a recent study on the benefits of test-driven development (TDD). The study had two groups of students write unit tests for their programming assignments. Students assigned to the test-first group were instructed to write their unit tests before writing their production code, as required by TDD. Students assigned to the test-last group were told to write their tests after writing their production code. Students in the test-first group wrote higher quality code.

The study concluded that code quality was correlated with the number of unit tests, independent of whether the tests were written first or last. However, the test-first students wrote more tests in the same amount of time.

Note that students were assigned to test-first or test-last. Most programming studies are just surveys. The results are always questionable because professional programmers choose their own tools. So, for example, you cannot conclude from a survey that technique X makes programmers more productive than technique Y. The survey may say more about the programmers who chose each technique than about the techniques themselves.

Complementary validation

Thursday, January 10th, 2008

Edsger Dijkstra quipped that software testing can only prove the existence of bugs, not the absence of bugs. His research focused on formal techniques for proving the correctness of software, with the implicit assumption that proofs are infallible. But proofs are written by humans, just as software is, and are also subject to error. Donald Knuth had this in mind when he said “Beware of bugs in the above code; I have only proved it correct, not tried it.” The way to make progress is to shift from thinking about the possibility of error to thinking about the probability of error.

Testing software cannot prove the impossibility of bugs, but it can increase your confidence that there are no bugs, or at least lower your estimate of the probability of running into a bug. And while proofs can contain errors, they’re generally less error-prone than source code. (See a recent discussion by Mark Dominus about how reliable proofs have been.) At any rate, people tend to make different kinds of errors when proving theorems than when writing software. If software passes tests and has a formal proof of correctness, it’s more likely to be correct. And if theoretical results are accompanied by numerical demonstrations, they’re more believable.

Leslie Lamport wrote an article entitled How to Write a Proof where he addresses the problem of errors in proofs and recommends a pattern of writing proofs which increases the probability of the proof being valid. Interestingly, his proofs resemble programs. And while Lamport is urging people to make proofs more like programs, the literate programming folks are urging us to write programs that are more like prose. Both are advocating complementary modes of validation, adding machine-like validation to prosaic proofs and adding prosaic explanations to machine instructions.