As bad as corporate software may be, academic software is usually worse. I’ve worked in industry and in academia and have seen first-hand how much lower the quality bar is in academia. And I’m not the only one who has noticed this.
Why does this matter? Because buggy code is biased code. Bugs that cause the software to give unwanted results are more likely to be noticed and fixed. Bugs that cause software to produce expected results are more likely to remain in place.
If your software simulates some complex phenomenon, you don't know what it's supposed to do; that's why you're simulating. Errors are easier to spot in consumer software. A climate model needs a higher level of quality assurance than a word processor because bugs in the latter are more obvious. A genomic analysis may contain egregious errors and no one will ever know, but a bug in an MP3 player is sure to annoy users.
You have to test simulation software carefully. You have to test special cases and individual components to have any confidence in the final output. You can’t just look at the results and say “Yeah, that’s about what I expected.”
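Here is a minimal sketch of what that can look like, with a made-up trapezoid-rule integrator standing in for one component of a larger simulation: check the component against a special case whose answer is known in closed form, rather than eyeballing the final output.

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    // Wrapper so the function pointer type is unambiguous.
    static double sine(double x) { return std::sin(x); }

    // Hypothetical component of a larger simulation:
    // composite trapezoid rule for the integral of f over [a, b].
    static double trapezoid(double (*f)(double), double a, double b, int n)
    {
        double h = (b - a) / n;
        double sum = 0.5 * (f(a) + f(b));
        for (int i = 1; i < n; ++i)
            sum += f(a + i * h);
        return sum * h;
    }

    int main()
    {
        // Special case with a known closed-form answer:
        // the integral of sin(x) from 0 to pi is exactly 2.
        const double pi = 3.14159265358979323846;
        double result = trapezoid(sine, 0.0, pi, 1000);
        assert(std::fabs(result - 2.0) < 1e-5);
        std::printf("component test passed: %.8f\n", result);
        return 0;
    }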
Wouldn't this also go hand in hand with open sourcing most of the research and working together on core pieces? No matter how much you test, you will always miss bugs. If you are the only user, you have no chance of finding them, so spread the code amongst several departments/universities.
Which is funny, because if I remember correctly you even wrote a post about missed bugs and testing.
Open source development would be great, if it meant multiple eyes looking at the code.
But most academic software is written by one person, often a grad student, working in isolation. The code might be "open source" in the sense of being available for download, but nobody but the author could use it. If someone else could get it to compile, it would probably crash on any dataset other than the one it was written for.
The problem I have with Open Source projects is that you end up with code like this:

    template <class T, class Policy>
    calculated-result-type lgamma(T z, const Policy&);

instead of code like this:

    double lgamma(double x);
Computer scientists may think the former is great, but should other scientists working on small-lab projects have to? Open Source projects may end up with a gap between the people who understand and can modify the code and the people who understand the science behind the code.
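One way to narrow that gap (a sketch, assuming Boost.Math and a made-up `lab` namespace) is for the one person who is comfortable with the templates to hide them behind the plain signature, so everyone else only ever sees the second form:

    #include <boost/math/special_functions/gamma.hpp>

    namespace lab {
        // Plain double-in, double-out wrapper; the templates and policies
        // stay hidden inside the one file somebody maintains.
        double lgamma(double x)
        {
            return boost::math::lgamma(x);
        }
    }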
How true! Recently I was working on a project where we were comparing two different measurement tools in order to decide which one to purchase. Our user base gave really positive comments on one of the tools. These tools are almost always set to measure when no one is around, so we need them to return good automated results. In our analysis, it turned out that the favored tool was giving expected results even when there was nothing to be measured. Of course the users liked that tool better: it was giving them the result they wanted, even when that result was wrong!
The goal of research code, by definition, is research, not production. I don’t think it’s a quality issue per se.
Any code that someone depends on should be tested. How much you test depends on whether lives depend on the outcomes (e.g., flight control or medical device code).
I don't think researchers should be taxed with writing code that installs on a variety of operating systems, runs out of the box on different data sets, or is easy to adapt to all sorts of variants. I find it most useful if it's tight and clean and I can easily see what's going on algorithmically.
As to Steve's comment above, overgeneralization is a problem that plagues all sorts of code, especially so-called "enterprise" code and code written by someone who thinks that's the "right" way to write any piece of code. I usually find research code a bit under-generalized, in the sense of too much cut-and-paste. Cut-and-paste introduces errors, but generalizing the right way is costly in terms of effort, especially for academics who don't have a lot of experience writing reusable code libraries.
I agree that exploratory code doesn't necessarily need multiple data sets. It depends on why the code won't run on a new data set. Maybe there's something about the new data set that the code wasn't designed to handle. Fine. Or maybe the new data set exposes a bug whose effect on the original data set went unnoticed.
Suppose a program is transposing its data set by mistake, reading rows as columns and vice versa. If you pass in a square data set, your results will be wrong, but the program won’t crash. But if you have more rows than columns, the new data set may cause a crash. The crash is just a more dramatic indication of a bug that was there all along.
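Here's a rough sketch of that failure mode, with made-up code: the routine below means to compute column sums but swaps its indices, which gives quietly wrong answers on a square dataset and only blows up when rows and columns differ.

    #include <cstdio>
    #include <vector>

    // Hypothetical routine with a transposition bug: it means to compute
    // column sums but indexes data[j][i] instead of data[i][j].
    static std::vector<double> column_sums(const std::vector<std::vector<double>>& data)
    {
        std::size_t rows = data.size();
        std::size_t cols = data[0].size();
        std::vector<double> sums(cols, 0.0);
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                sums[j] += data.at(j).at(i);   // bug: should be data.at(i).at(j)
        return sums;
    }

    int main()
    {
        // Square input: no crash, just quietly wrong answers
        // (row sums come back labeled as column sums).
        std::vector<std::vector<double>> square = { {1, 2}, {3, 4} };
        std::vector<double> s = column_sums(square);
        std::printf("\"column\" sums: %g %g\n", s[0], s[1]);   // prints 3 7, should be 4 6

        // 3 rows x 2 columns: the swapped index runs past the end and the
        // program dies with std::out_of_range, so the bug finally shows itself.
        std::vector<std::vector<double>> tall = { {1, 2}, {3, 4}, {5, 6} };
        column_sums(tall);
        return 0;
    }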
One idea would be to introduce some stochastic variation into the initial dataset and look at how the code handles the uncertainty. Depending on the situation, the resulting distribution of outputs can be evaluated and compared with what the code is supposed to provide. It's no substitute for sharing the code, but it gives a wider confidence zone for the data the code is meant to simulate.
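For instance, a minimal sketch (with a made-up `simulate` function standing in for the real model): rerun the model on noisy copies of the input and look at the spread of the outputs.

    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Stand-in for the real model: any deterministic function of the input.
    static double simulate(const std::vector<double>& input)
    {
        double total = 0.0;
        for (double x : input) total += x * x;
        return std::sqrt(total);
    }

    int main()
    {
        std::vector<double> baseline = {1.0, 2.0, 3.0, 4.0};

        // Rerun the model on perturbed copies of the input.
        std::mt19937 rng(42);
        std::normal_distribution<double> noise(0.0, 0.01);
        const int trials = 1000;
        double sum = 0.0, sum_sq = 0.0;
        for (int t = 0; t < trials; ++t) {
            std::vector<double> perturbed = baseline;
            for (double& x : perturbed) x += noise(rng);
            double y = simulate(perturbed);
            sum += y;
            sum_sq += y * y;
        }
        double mean = sum / trials;
        double sd = std::sqrt(sum_sq / trials - mean * mean);

        // A reported result far outside this band is a hint that something
        // is off, either in the data or in the code.
        std::printf("output mean %.4f, std dev %.4f\n", mean, sd);
        return 0;
    }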
If there is some objective criterion for correctness, you can accept junky code. But there seem to be quite a few research results that are not much more than whatever the output of the model says.
I fully agree with the article. I am a lone (lonely?) grad student writing simulation code and have found bugs in my code as well as in others'. One cool thing about some of the buggy code is that the results are biased in such a way that they produce a new and interesting result. I might never have realized that one variable could have such an effect on the final outcome and cause the results to skew a certain way. Though bad, bugs can be informative too.
Fully agree with the article.
Also, to one of the commenters (John): not to sound like a hater, but let's be realistic about open source as well, without all the fanboy stuff.
In the majority of open source projects, code quality is poor. And if you are not a contributor and not really familiar with the code, you are facing long, long hours figuring out why your tests fail in some obscure cases.
Take, for example, a big project such as SignalR and try to figure out why hub subscriptions sometimes fail.
I think it's that more buggy code is written by those who are NOT "professional programmers" but are in some other field and do programming as only part of their job. They learn enough to get by and "get things done" but don't understand reliable code and what it takes to write it. If you point this out, they might argue, "I don't have time to do a deep dive into programming theory/philosophy/whatever, I just need to get this done."
Structured programming was "the new exciting thing" when I was in college learning FORTRAN, BASIC, and thankfully Pascal, but it eventually became so universally accepted that it's rarely mentioned. There are more recent practices that also help reduce bugs and make code clearer and easier to read (so many bugs are easier to spot just by reading the source): limiting the length of functions, limiting the depth of nested blocks, refactoring (rewriting code to comply with such limits and choosing appropriate, clear function and variable names), and so on.
At present these are "advanced" topics one may learn well after one has written 100+ line programs. I might argue they should be learned as soon as possible, in the first programming class, as soon as one learns what a function or subroutine is and is writing programs more than a dozen lines long. Don't let people fall into bad habits that need to be unlearned; you don't want them to end up like what Dijkstra said about those exposed to BASIC.
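A tiny sketch of the kind of habits I mean, using a made-up `Sample` type: a short, well-named helper instead of another level of nesting, so the logic can be checked just by reading it.

    #include <vector>

    struct Sample { double value; bool valid; };

    // A short helper with a descriptive name replaces an anonymous nested "if".
    static bool is_usable(const Sample& s)
    {
        return s.valid && s.value >= 0.0;
    }

    // Short function, shallow nesting, clear names: many bugs here would be
    // visible just by reading the source.
    static double mean_usable_value(const std::vector<Sample>& samples)
    {
        double sum = 0.0;
        int count = 0;
        for (const Sample& s : samples) {
            if (!is_usable(s))
                continue;               // early exit keeps the nesting shallow
            sum += s.value;
            ++count;
        }
        return count > 0 ? sum / count : 0.0;
    }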