Software exoskeletons

There’s a major divide between the way scientists and programmers view the software they write.

Scientists see their software as a kind of exoskeleton, an extension of themselves. Think Dr. Octopus. The software may do heavy lifting, but the scientists remain actively involved in its use. The software is a tool, not a self-contained product.

[Image: Spiderman versus Dr. Ock]

Programmers see their software as something they will hand over to someone else, more like building a robot than an exoskeleton. Programmers believe it’s their job to encapsulate intelligence in software. If users have to depend on programmers after the software is written, the programmers didn’t finish their job.

I work with scientists and programmers, often bridging the gaps between the two cultures. One point of tension is defining when a project is done. To a scientist, the software is done when they get what they want out of it, such as a table of numbers for a paper. Professional programmers give more thought to reproducibility, maintainability, and correctness. Scientists think programmers are anal retentive. Programmers think scientists are cowboys.

Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision. Scientists need to understand that prototype code may need a complete rewrite before it can be used in production.

The real tension comes when a piece of research software is suddenly expected to be ready for production. The scientist will say “the code has already been written” and can’t imagine it would take much work, if any, to prepare the software for its new responsibilities. They don’t understand how hard it is for an engineer to turn an exoskeleton into a self-sufficient robot.


47 thoughts on “Software exoskeletons”

  1. There is an opposite aspect to this too. Software developers generally expect there to be a predetermined, well-defined goal, a specific set of functionality the software will perform. They expect that the software they write will be used for what it is initially written, in more or less the way originally envisioned. So they design and implement the system to do this, and leave some architectural wiggle room to extend or modify it in line with the original intention, but explicitly make it difficult to go outside that sphere.

    Scientists who, as you say, view software as an interactive component, will inevitably want to modify the software before it’s even finished to apply it to wildly different sets of problems. They expect to be able to use it as a component ready to be jury-rigged onto whatever project they (or their new graduate student) now want to do, even if it has only a weak connection to the previous project. They don’t appreciate encapsulation and other “robust” design features since they make it harder, not easier, to reuse the code in hitherto unexpected ways.

  2. In my misspent youth as a so-so software developer, my rule of thumb was that we’d spend 90% of our effort developing the functionality (the scientist part) and the other 90% putting it into production.

    Hence The Basic Cost Estimate: (1) make a realistic estimate of the effort required. (2) Double it. (3) Change the unit of measurement to the next higher order of magnitude. So, something you can code in an hour will take 2 days; something you pencil as a $200 job will cost out at $4000; etc.
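
    A minimal sketch of that rule of thumb, assuming a simple ladder of time units (the ladder and the function name are my own illustration; the rule only says to jump to the next higher unit):

```python
# The Basic Cost Estimate: double the realistic number, then bump the unit.
TIME_UNITS = ["minutes", "hours", "days", "weeks", "months", "years"]

def basic_cost_estimate(amount, unit):
    """Double the realistic estimate, then move to the next larger time unit."""
    doubled = amount * 2
    bigger_unit = TIME_UNITS[TIME_UNITS.index(unit) + 1]
    return doubled, bigger_unit

print(basic_cost_estimate(1, "hours"))  # (2, 'days'): an hour of coding becomes two days
```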

  3. SteveBrooklineMA

    Remember the Climate-gate code that was leaked a couple of years ago? As a scientist-programmer, the code looked pretty good to me. Indeed, it was better than a lot of what I see and deal with every day. But the programmer-blogger community was appalled.

  4. Your fx picture at the top is a real-life example of that tension between the two styles of coding. In order to produce a character such as Spiderman or Dr. Oct, there will almost always be a round of experimentation with some rough code, either to find a look or to design some physical aspects.
    In the vfx world this could be anything from a standalone app to plugins for a 3d package (e.g. Houdini) to tinkering with Matlab/Mathematica. Once a movie gets going, the code will generally need its rough edges ironed out, and from there tools will be generalized and possibly retained for the next production (Spiderman 4). Even within a single fx house, code will drift back and forth between TDs (aka scientists) and a more formal software dept. It’s nice to reuse code, but audiences spend $12+ to see something new each time.

  5. >”Programmers think scientists are cowboys.”

    YEEEHAW! :-)

    A great analogy. Sometimes I write a bit of code just to confirm or reject an idea, and there is never a thought of making it useful for any other purpose.

  6. “Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision.”

    Programmers do understand that – shell pipelines?

    I think the difference is more along the lines of packaging, like Mike Anderson says above. Scientists don’t have to package; programmers do. And until you have packaged and released software for someone else to use, you know nothing about packaging and releasing software.

  7. Replace “Programmer” with “Engineer”, and your article can be reposted about nearly ANY field.

    (How’s that for production-worthy?)

  8. As a “software engineer” I think it’s important to define the scope of the work you need to do. If you are building a system that needs to withstand the test of time, it’s best to think of ways to make your software as flexible as possible. I can write something that has a specific purpose to prove or disprove an idea, but I can also write something that is easy to add new features to, test, and release to production. In grad school I had the luxury of testing and tweaking and writing as much throwaway code as I needed, but in the real world I usually don’t have that luxury. I really enjoyed just tinkering and experimenting, so I guess I can see both sides of the coin.

  9. Another interesting way to look at it is that software code is like any other language. An engineer has a different goal than a research scientist, as Dusty pointed out. Scientists’ papers aren’t necessarily meant to be used like a cookbook recipe, while a document produced by an engineer *definitely* is.

  10. Scientists might view code as a throwaway analysis tool, but perhaps they should think of it more as an experimental apparatus. It need not be permanent, but it should be constructed well enough to make its results trustworthy. Just as importantly, it should be described and documented sufficiently well that other scientists can replicate their results.

    “Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision.”

    This may be true of a data analysis script, but I have a hard time imagining this is true of a simulation.

  11. Change “scientists” to “designers” and you can add this post to the web development category (the cowboy thing still applies).

  12. Mike Anderson…You rock! Your “Basic Cost Estimate” must play out in every software shop in the country!

  13. Wonderfully written.

    The risk in the scientist approach is the assumption that the code may not be needed again. There is a solution though. Throw it up somewhere (github?). At least that way you might be able to use it again if needed, or confirm assumptions 5 years after the fact.

  14. If you want something done once, have a scientist do it. If you need it more than once, get an engineer/programmer. In addition, scientists rarely work in a true collaborative environment; working with grad students doesn’t count because they are not peers, while programmers/engineers have to work closely with peers.

  15. The same applies to hw engineers who create adapters (think video or communications), then write a simple device driver to check the functionality of the hw. Their software “tests” the hardware, and they are then done, “throwing it over the fence”, and expect it to be shipped ASAP.
    Programmers take the original device driver and may take it apart, add to it, or rewrite it in order to meet quality expectations in production. HW Engineers don’t necessarily see the need, and think that they are doing both hw AND sw jobs…

  16. Scientists also tend to use huge, old, very well tested libraries, like Numerical Recipes. I remember chatting with physicists a few years ago. They were vaguely considering transitioning from Fortran to C for a new project, but they were not completely convinced that Numerical Recipes would work in exactly the same way.
    So I think that they see that the part that should be kept from project to project is the library.
    Another important difference is that scientists, in my experience mostly with physicists, are very reluctant to share their code, particularly if it is old, beloved and able to produce simulations that then turn into papers that then turn into (forgive me the brevity) tenure.
    On the other hand, engineers like to show off their mad software skills.
    Another thing is that we all think “oh yeah, this is just a one-time program.” And then it slowly worms its way in until it is at the core of your business.

    One last thing, if you want to see true sloppiness and madness, just look at some code written by designers (worse yet, design students). Most times, it will make you cry bitter tears of despair.

  17. This is a lovely little post. The concept of “technical debt” is useful for reconciling these two points of view (though I would tread carefully introducing a financial metaphor to scientists! :) Projects can be “done” from a scope point of view, but still carry large technical debts. The level of technical debt will be unimportant for “exoskeletal” code, but critical for “robotic” code. Report technical debt from the inception of all projects to avoid nasty surprises when readying for production.

  18. I have been thinking about this very thing for many years now. Though I may not agree with the analogies in places, I do agree with the spirit of the post in many ways. I am at present a scientific software developer and have lived in both worlds of scientist and programmer. I relate strongly to John’s last paragraph:

    The real tension comes when a piece of research software is suddenly expected to be ready for production. The scientist will say “the code has already been written” and can’t imagine it would take much work, if any, to prepare the software for its new responsibilities. They don’t understand how hard it is for an engineer to turn an exoskeleton into a self-sufficient robot.

    I see this all too often with “scientists.” I use this term loosely because many of the scientists I know come from engineering and applied math backgrounds too, so I cannot pigeonhole any one discipline on this issue.

    As a computational physicist, I feel unless you have been through the whole life-cycle as a user, developer, producer, and maintainer (as someone mentioned) in the computational sciences then you often have blind spots and inexperienced conclusions as to what it takes to do this work.

    Like Mark Hoemmen, I have been saying for many years now that scientific software is an experimental apparatus and that scientists should apply and expect the same standards for their software construction and use as they do for experiments. To a scientist the word “reproducibility” should be dear to their heart. If it is not reproducible, for themselves and others, then it is not really science when all is said and done. I have followed John’s blog for a while and he is no stranger to the idea of reproducibility, so I do not think he has missed that point at all.

    Many experimental scientists are both scientist and engineer, so I do not see this as a programmer vs a scientist or an engineer vs a scientist or etc…

    I see this as about doing good science which takes both good science and engineering … yes even software engineering.

    What really bothers me is that many “scientists” let science, especially reproducibility, fly out the window when computer simulations are involved. Why, I am not sure. But I think John’s post is exploring possible answers to this question. I appreciate him taking the time to write his thoughts about it.

  19. Scientist Person

    I’m a scientist who is programmer-minded. My tendency to write good code slows down my publishing of papers compared to many scientists. Unfortunately, the establishment treats the number of publications as the be-all and end-all of science. What I know is that writing “throw-away” code is very subject to error. My guess is that large numbers of publications have MAJOR errors affecting the analysis, but since the analysis was done only once, they were never caught. Effectively, the scientist keeps reducing the data until they get results that “make sense” and are roughly what is expected. This tends to prevent detection of major blunders that still lead to reasonable-looking results. I know this occurs from editing my peers’ code. Being anal retentive, I find tons of mistakes and shortcuts that I consider inexcusable. Another problem is that scientists tend to use untrustworthy code as a black box to a degree far beyond what is healthy.

  20. Walter: People are funny about how they evaluate sources of error. They may be extremely conservative in their choice of libraries, then not bother to test the code they write using those libraries. Or they will make crude modeling assumptions then wonder whether floating point error is a problem.

    Scientist Person: A great deal of scientific analysis is simply wrong. (Here are some blog posts I’ve written about problems with reproducibility.) But there’s little incentive to be careful, unless you care about telling the truth. You can get more publications by being sloppy. Not only can you write each paper faster, you may get more positive results. In a field with few positive results, you may get significant results more easily by making mistakes than by being clever.

  21. .

    To sum it up, scientists never have to prove that their code can survive end-users interaction and expectations.

    Engineers are less lucky.

    .

  22. Scientists should learn to program! Then they, too, could know the superhuman strength of watching their ideas become something tangible! As developers, we have enough problems translating the ideas of others into usable requirements. If your field requires a quick turnaround and the ability to quickly adapt and rapidly develop one-time code, then it sounds like you really need to adopt the skills to do your job better. Anyone can learn to program if motivated. My 9-year-old son was writing software in less than a week with Python and a few YouTube videos. The reality is that at some point soon everyone should learn how to make a computer do what they want, the way they want it. Learn to drive; we are getting tired of driving everyone else around in their own expensive cars… As more and more people buy computers, colleges will need to train more programmers; as it stands, in my 25 years as a software developer we haven’t grown in numbers the way we need to in order to keep up with demand! We need more doers and less demanding users, or as a society we will drown under our own weight!

  23. John: Scientists can program, some quite well. But scientists and programmers have different goals.

    I prefer working with scientists who either cannot program or who can program well. The ones in the middle are the most frustrating to work with.

  24. We’ve seen far too much of this in the climate modeling community. The rush to publish, high dollar grants and, yes, personal ideologies have led to some very bad ‘science’. The infamous hockey-stick is simply the most prominent example, not the only abuse of software (intentional or no) going on.

  25. Very interesting article.

    I’m a PhD scientist (well, half applied maths and half freshwater scientist) with a background in programming. I struggle with this issue of whether to just dash off the code to make the correct plot from the data I currently have, or to think more broadly about getting the data structures right to increase future usability and reduce errors.

    Does anyone have any recommendations of how to learn what the best balance is? I guess experience and looking at the code of more experienced programmers will get me there in the end, but some shortcuts would be much appreciated.

    It seems well worth writing

  26. As a software engineer, when deciding which features to add or remove I always ask myself “Will the presence/absence of this feature increase the probability that I will be called at 2 a.m. or during the weekend?” The result is that my company’s operational team (the guys who support the software 24×7) is very pleased with my job performance, and I can sleep better at night and have my weekends to myself.

    I think many people involved in the design of software (scientists, product planners, etc.) don’t realize how much effort is required to make sure software has robust support for such trivial-sounding things as logging, error checking, configuration, and release management (e.g. what is the workflow and support for putting out new releases, backing them out if something goes wrong, etc.). This is a lot of fixed overhead and needs to be done right (and ideally consistently across the entire organization).
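
    A minimal sketch of that fixed overhead around an otherwise trivial job, assuming a simple batch script (the config format, names, and the placeholder computation are invented for illustration):

```python
# Sketch of the "boring" production scaffolding: logging, configuration,
# and error handling wrapped around the actual computation.
import json
import logging
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("analysis_job")

def load_config(path):
    """Read run parameters from a JSON file instead of hard-coding them."""
    with open(path) as f:
        return json.load(f)

def run(config):
    """Placeholder for the actual scientific computation."""
    log.info("running with parameters: %s", config)
    return sum(config.get("inputs", []))

def main(argv):
    try:
        config = load_config(argv[1])
        log.info("result: %s", run(config))
        return 0
    except Exception:
        log.exception("job failed")  # whoever is on call gets a useful trace, not a mystery
        return 1

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```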

  27. @Charlie It’s surprising how often “dashed-off” code intended for one use gets reused. Furthermore, you are your own end user. So the point is not so much to make your code maximally general, but to make it maximally reusable — by documenting it, testing it, and archiving it (e.g., as you would archive a lab notebook entry). If you can pick it up a year or two later, use it after a couple minutes reading your own “user” documentation, and start modifying it after a couple more minutes of reading the code documentation, you’ve succeeded :-)
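
    A minimal sketch of what that might look like for a small one-off analysis script; the file layout, column names, and computation are invented for illustration:

```python
"""Summarize reaction times per subject from a pilot-study CSV.

Usage:  python summarize.py data.csv
Input:  a CSV file with 'subject' and 'reaction_ms' columns.
Output: mean reaction time per subject, printed as CSV.
"""
import csv
import sys
from collections import defaultdict

def mean_by_subject(rows):
    """Return {subject: mean reaction time} for an iterable of dict rows."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["subject"]] += float(row["reaction_ms"])
        counts[row["subject"]] += 1
    return {s: totals[s] / counts[s] for s in totals}

if __name__ == "__main__":
    with open(sys.argv[1], newline="") as f:
        means = mean_by_subject(csv.DictReader(f))
    for subject, mean in sorted(means.items()):
        print(f"{subject},{mean:.1f}")
```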

  28. Thanks Mark. That sounds like a good level of documentation quality to aim for.

    Some of the software I work with is used commercially. In this code, the programmers have clearly thought about how best to structure their data and how to make it flexible for future developments. I think about doing this for my scripts and codes, but I think it would be more hassle than it is worth. Also it is pretty much impossible to make code which is entirely flexible to any unknown future development. It’s certainly something I think about though.

  29. I suppose being reusable, as you say, is a different thing and a bit more easily attainable.

  30. @Charlie That’s right. I think of reusability as a step towards reproducibility. If you can come back to a script and your original data a year later, and see right away how to run it again, it means you’re that much closer to being able to reproduce your results.

  31. I’m (unfortunately) in the best position to observe this difference in attitude as my job involves getting enterprise software developers to write programs that serve the scientific community.
    The fundamental problem is that the scientists are also the clients, and they don’t want a robot: probably they just asked for a better exoskeleton. Scientists want to get things done: they want a tool that does the job, and generally the really important part is the algorithm. My analogy is that of a scientist who wants to move a load of heavy stuff and asks the developer to build him a truck, but is presented instead with a Ferrari, because it is faster and more elegant.

    “Scientists need to understand that prototype code may need a complete rewrite before it can be used in production.”
    Says who? The scientist coder is also the scientist client: if she thinks that the software is feature complete, it is because she understands the community’s expectations, however lax these may appear to the professional developer.
    “The real tension comes when a piece of research software is suddenly expected to be ready for production. The scientist will say ‘the code has already been written’ and can’t imagine it would take much work, if any, to prepare the software for its new responsibilities.”
    The same concept applies here: are those changes strictly necessary, or are they scope creep, where it’s the developer that is adding features that the scientist sees no need for?

  32. There is more to software development than just a programmer or a developer / engineer.

    A client orders some software and has requirements for it. A software company comes up with a definition of the program that fulfills the requirements. Then the definition is given to a software designer. The designer designs the program, and then a programmer writes the program code. After that the code is tested. And finally the program is installed and the installation is handed over to a maintainer.

    Sure, there can be less overhead if you write little pieces of programs, but usually things get bigger. And if you just think “oh, we need software, OK, hire a programmer”, then you miss a lot of the software industry.

  33. I will add to the content of the blog that the end product of a programmer’s work is the software, and that is why they want to sell it or make its services available. The end product of a scientist producing software is not the software itself but the research outcomes: the publications, copyrights, and IP generated from it. So they see their software not as the end product but as an extension of their body, an exoskeleton that can help them get different end products!

  34. Dr. Oc. and others: I have found, in my experience, that it is not the addition of features that is the main problem, though that is certainly a concern that has to be watched for. But scope creep can be overcome with continuous planning and prioritizing with both the client and developers present as a team. It is how the actual software is developed that is often in question.

    The “scientist” will say, for example, “Why are you testing so much at this level? Just write the code and we will test it when we get it.” They think we are taking too long and are wasting time with how we are doing development. They want to test the code mainly as a very integrated product and are not testing each component to see if it meets certain “specs”, as an experimentalist would do with an experimental apparatus. They often threaten to write their own code or hire their own “programmers” to do it how they think it should be done. They pretty much throw the scientific method out the window for reasons they would not for an experiment. And even after getting an integrated product, the “scientist” does not do their simulation in a scientific way, which is a story for another time. I see this over and over in different places. I no longer think it is just the culture of one place and that other places have it right. It is widespread.

    When you have spent time doing each piece of the computational science life-cycle, which includes the software development life-cycle too, then you see there is in general a very strong disconnect between the computational and experimental methods when there really should not be. And computational reproducibility is usually seriously in question.

  35. My own thoughts turned into a rather lengthy blog post:

    http://sciblogs.co.nz/code-for-life/2011/07/27/research-project-coding-v-enduser-application-coding/

    The key addition I wanted to offer is that if an aim is to spin off an end-user application, ideally this should be recognised and built into the project from the outset, so that the coding approach reflects that the code is later intended to be part of an application. Of course, there is a lot of other stuff I ramble on about ;-)

  36. There’s at least one group who realized that the gap between what scientists write and production quality software needs to be addressed. Instead of writing stuff in Matlab which needs to be converted to C++, they wrote a nice Matlab-like library for C++:
    http://arma.sourceforge.net

  37. @tetley99 While easier-to-use libraries are good, that’s not really the point of this article as I understand it. The point is that many scientists see code as nothing more than whiteboard scribblings, whereas programmers see code as a document describing their activities. The programming language has very little to do with it.

  38. LOL… If the code is there, let’s build it and ship it… I dare you. Regarding the “done” status: software is often “done”, but rarely is the work finished. I, on the other hand, am often finished, even though the software isn’t done. I am finished at least until someone tries to use the product. I understand the difference between shelfware and software, and know that you can sell either.

  39. The problem manifests when a scientist takes a program that was intended to be used once in a limited fashion and wants to expand it to work differently. Software engineers know how to develop code that is flexible and internally decoupled to allow for that kind of change.

    I recently worked with a statistician who had developed a novel modeling program using Excel as a front end to drive statistical models in R. This was quite novel and worked well. Unfortunately, the way it worked was that Excel was used to populate a few tables as input, then generate some CSV files as input to the R programs — Excel didn’t really do any calculations because R was doing the heavy lifting.

    When I was consulted to create a web-based interface, I found it difficult because so much of the code was structured around the way Excel needed things to be represented rather than around proper data structures. Eventually an intern was brought in to add even more complicated scripting to the Excel sheet, instead of scrapping the UI and starting over with a proper web application interfacing to an R engine. Really a bad approach in the long run, forced by a series of bad application design decisions.
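
    A minimal sketch of the kind of decoupling that would have helped, with the engine separated from any particular front end (the function names and the stand-in calculation are invented; the real system used Excel and R):

```python
# Sketch: keep the modeling engine ignorant of its front end.
# Adapters translate an Excel-exported CSV or a web form into the
# plain data structures the engine expects.
import csv

def run_model(inputs):
    """The 'engine': pure computation over a list of {name, value} records."""
    return {row["name"]: float(row["value"]) * 2.0 for row in inputs}  # stand-in calculation

def inputs_from_csv(path):
    """Adapter: an Excel-exported CSV (columns 'name' and 'value') becomes the engine's input."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def inputs_from_web_form(form):
    """Adapter: a web form (field -> value) becomes the same input format."""
    return [{"name": k, "value": v} for k, v in form.items()]

# Either front end can drive the same engine:
# run_model(inputs_from_csv("scenario.csv"))
# run_model(inputs_from_web_form({"alpha": "1.5", "beta": "0.2"}))
```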

  40. If you want a programmer to understand that something is being developed for one-time throwaway use, stop calling it programming and start calling it scripting. This is a problem of implicit expectations regarding terminology.
