Reasoning about code

Posted on 11 August 2009 by John

I ran across a rant the other day from someone frustrated with Ruby on Rails. (I’d include a link, but the article has been deleted.) The main argument was this:

You cannot reason about your program, ever. There are no semantics!

I’m not going to take sides on whether he’s right; I’ve never written a line of Ruby. However, I believe I understand what he’s referring to. Ruby allows developers a great deal of flexibility. That flexibility can be a blessing when you want to use it yourself but a curse when others want to exercise it. The ability to redefine everything in sight can help you get your work done faster while confusing those who have to modify your code later.

The person quoted above complained that his colleagues had abused the flexibility of Ruby to the point that he couldn’t tell what their code did. He believes Ruby, and especially the Rails framework, encourages such abuse. A Ruby advocate might reply “Ruby is a powerful language, and with great power comes great responsibility. It’s not Ruby’s fault if your coworkers are irresponsible.” I don’t want to take sides in an argument over the virtues and vices of Ruby; I’m not qualified to comment. But there is a deeper issue that I do want to explore: the importance of being able to reason about code.

What assumptions can you make when looking at a line of code? You can only count on what the language enforces, but with some confidence you can depend on language conventions, at least as a working hypothesis. Therein lies the source of many programming language debates. To what extent are you willing to live with the constraints necessary to allow you to draw strong conclusions about what a line of code does and does not do? To what extent are you willing to trust your colleagues (and yourself) to make the intent of code plain?

This is not an absolute distinction. No language can guarantee everything you’d like to assume about a chunk of code, and you’ve got to have some level of trust in order to work on a substantial program. But some languages do allow you to take more for granted than others.

Statically typed languages guarantee that if code compiles, at least functions are being passed the correct data types. Attempting to pass an apple to a function expecting an orange will give a compile-time error. Proponents of dynamic typing say that such guarantees are not so valuable. Knowing that your orange function will only receive oranges doesn’t assure you that it will do the right thing with that orange. Some argue that the ability to detect type errors at compile time is not worth the increased overhead of explicit typing, especially if you have good unit tests.
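Python, which comes up later in this post, makes a handy contrast: it accepts type annotations, but the interpreter does not check them when the code runs. In the minimal sketch below, the `Apple` and `Orange` classes and the `juice_yield_ml` function are hypothetical stand-ins for the fruit above. The mismatched call runs without complaint; only a separate static checker such as mypy would report it before the program runs.

```python
from dataclasses import dataclass


@dataclass
class Orange:
    weight_grams: int


@dataclass
class Apple:
    weight_grams: int


def juice_yield_ml(orange: Orange) -> float:
    # The annotation documents intent; the Python interpreter does not enforce it.
    return orange.weight_grams * 0.5


# mypy would flag an incompatible argument type here, but plain Python
# runs the call because Apple happens to have the same attribute.
apple_juice = juice_yield_ml(Apple(weight_grams=200))
print(apple_juice)  # 100.0
```

Note that the apple call even "works", which is the dynamic-typing side of the argument: the code did something, just not necessarily what its author meant.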

Functional programming languages offer different kinds of guarantees. These languages do not allow functions to change the contents of variables as a side effect, or at least they require you to explicitly mark functions that can have side effects. The only impact of calling a function is the value it returns. This makes it possible to make strong assumptions when reasoning about code.
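Python does not enforce purity, but it can illustrate what the guarantee buys you. In this sketch the two functions are hypothetical: the first mutates its argument, so calling it changes state outside the function; the second builds a new value, so its only effect is what it returns.

```python
from __future__ import annotations


def add_item_in_place(cart: list[str], item: str) -> list[str]:
    # Side effect: the caller's list is modified, not just the return value.
    cart.append(item)
    return cart


def add_item_pure(cart: tuple[str, ...], item: str) -> tuple[str, ...]:
    # No side effect: a new tuple is built and returned; the argument is untouched.
    return cart + (item,)


shared_cart = ["apple"]
add_item_in_place(shared_cart, "pear")
print(shared_cart)  # ['apple', 'pear']: the call changed state outside itself

fixed_cart = ("apple",)
bigger_cart = add_item_pure(fixed_cart, "pear")
print(fixed_cart, bigger_cart)  # ('apple',) ('apple', 'pear')
```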

I find functional programming more interesting than static versus dynamic typing. It is nice to know, for example, that I didn’t accidentally pass a string to a function expecting a number. But it’s much more valuable to know that calling a function didn’t change the state of my program in an unexpected way. You can understand a functional program by understanding each function in isolation. With conventional imperative programs, you have to worry more about the interactions of functions and the order in which they are called.
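Here is a small sketch of that difference, with hypothetical function names. In the imperative version, `summary()` gives a different answer depending on whether `process()` has already run, and nothing at the call site tells you so. In the functional version, the state is an explicit argument and return value, so the dependency is visible in the data flow.

```python
from __future__ import annotations

# Imperative style: two functions communicate through hidden module-level state.
_processed: list[str] = []


def process(item: str) -> str:
    _processed.append(item)  # hidden side effect that summary() depends on
    return item.upper()


def summary() -> int:
    return len(_processed)  # answer depends on which calls happened earlier


count_before = summary()
process("a")
count_after = summary()
print(count_before, count_after)  # 0 1: same call, different answer


# Functional style: the state is passed in and returned explicitly.
def process_pure(log: tuple[str, ...], item: str) -> tuple[str, tuple[str, ...]]:
    return item.upper(), log + (item,)


def summary_pure(log: tuple[str, ...]) -> int:
    return len(log)


empty_log: tuple[str, ...] = ()
value, log_after = process_pure(empty_log, "a")
print(value, summary_pure(empty_log), summary_pure(log_after))  # A 0 1
```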

You can choose to program in a functional style using a non-functional language, say in Python, but doing so is a matter of convention and is not enforced by the language. This returns to the matter of trusting yourself and your colleagues rather than trusting the language.
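Python does offer a little help. A frozen dataclass raises `dataclasses.FrozenInstanceError` when code assigns to one of its fields, which the Python documentation describes as emulating read-only instances. The sketch below, using a hypothetical `Point` class, shows that this is a speed bump rather than a guarantee: `object.__setattr__` still bypasses it.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # ordinary assignment to a field of a frozen dataclass raises
    assignment_blocked = False
except dataclasses.FrozenInstanceError:
    assignment_blocked = True

# The protection can still be sidestepped deliberately, so in the end
# immutability rests on everyone agreeing not to do this.
object.__setattr__(p, "x", 10)
print(assignment_blocked, p)  # True Point(x=10, y=2)
```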

I believe that in the future there will be more emphasis on language features that make it easier to reason about code. One driving factor is the rise of multi-core processors. Reasoning about multi-threaded code is very difficult, and most programmers have avoided it until now, but programmers are being forced to write multi-threaded software in order to take advantage of multi-core processors. I don’t believe functional programming will ever become dominant, though I do believe it will become more common.
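A sketch of why purity helps with threads, using hypothetical function names: `square_plus_one` can be handed to a thread pool with no coordination because it touches no shared state, while `add_to_total` updates a shared counter and needs a lock to be correct. In standard CPython builds the global interpreter lock limits the speedup for CPU-bound work like this; the point is what you can conclude about the code, not performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square_plus_one(n: int) -> int:
    # Pure: the result depends only on n and no shared state is touched,
    # so any number of threads can call it at once.
    return n * n + 1


# Impure counterpart: every call reads and writes shared state.
total = 0
total_lock = threading.Lock()


def add_to_total(n: int) -> None:
    global total
    # Without the lock, two threads could interleave the read-modify-write
    # in `total += n` and lose an update.
    with total_lock:
        total += n


inputs = list(range(10))
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in input order, regardless of which thread ran which call.
    results = list(pool.map(square_plus_one, inputs))
    # list(...) forces every call to finish (and re-raises any exception).
    list(pool.map(add_to_total, inputs))

print(results[:4], total)  # [1, 2, 5, 10] 45
```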

10 thoughts on “Reasoning about code”

Daniel Lemire

11 August 2009 at 09:30

My experience has been that working with other people’s code is not a problem if everyone is an expert programmer. In these cases, the language is not much of an issue, though you need everyone to be fluent in the chosen languages.

You have problems when weak programmers are part of your team. Maybe they are weak because they have little experience with the language, or they are just not good programmers in general.

Regarding Python and functional programming, the problem with a pure functional approach is that it may make things awkward at times. Look for Haskell recipes and you will find out that people have had to be extremely clever to implement what would be an elementary algorithm in Java. In this regard, a hybrid approach like the one Python uses is best.

Thus, I am convinced that Ruby on Rails is indeed a great thing (even though I never tried it) but it solves one particular problem. Maybe functional programming in MapReduce is absolutely fantastic too.

Sadly, or happily, to be good at programming, you need to know many languages, many tricks, many techniques.

Danny

11 August 2009 at 10:40

This seems to be an argument for strict style guidelines. At the big software company I worked for, there were many disallowed constructs, even for C++ (e.g., no passing by reference if it’s not const). Obviously if there was some strange reason that you absolutely had to break the rules, you could, but it would require convincing the code reviewer that there was no other clean way, and you’d need to put in a comment explaining your decision and warning people that the typical convention doesn’t apply.

This was a real revelation for me. Once you have strict style guidelines (even down to the naming of functions and spacing of for loops), you can make very strong assumptions about what a line of code means, to the point where you can understand a whole block of somebody else’s code just by glancing at it. I don’t know how you could reasonably develop code collaboratively otherwise.

So what I’m getting at is that it seems to be a shortcoming of the company’s style practices, and it seems cheap to blame it on the language/framework itself.

Daniel Lemire

11 August 2009 at 11:09

@Danny Well, you had formal code review, which is, in itself, a fantastic tool. I’m not sure you need the company-specific conventions if you have code review. The code reviewer will enforce, de facto, the necessary conventions.

Combine unit testing and code review, and you will get clean, maintainable code every time. The language will not matter a lot. (Of course, higher-level languages will cost less.)

Daniel Lemire

11 August 2009 at 11:15

@John

Automated tests are useful, but necessarily shallow (given that we still don’t have strong AI). You may abide by the conventions but still write overly complicated code because you forgot to check for a simpler approach. A human being peering through your code will tend to ask “could this be simpler?”.

While style matters, I’m not sure it is all that important to know where you put your brackets. It is fine to have a common style… but it is probably less useful than people imagine (I’d love to see a study!).

Consider the case of research papers. We have fairly intense conventions. For example, even if you write the paper all by yourself, you must refer to yourself as “we”. In fact, I catch myself often talking to others about my research and referring to “we” or “us” even when I mean “me alone”. How useful is this convention? Not very useful, I would think.

Ultimately, to determine whether a piece of code or a research paper is well made, we need an expert human being to go through it all. Sure, I’m hoping this will not be true in 10, 20 or 50 years… but for now…

John

11 August 2009 at 10:57

@Daniel: I agree that the purest functional programming is impractical. But it should be possible to have large sections that are pure. Also, it is possible for a machine to verify that sections marked as pure really are pure.

@Danny: I agree that strict style guidelines are a good thing: rather than limiting creativity, they channel it into more productive areas. But I prefer machine-verified conventions when possible. Companies would do well to write static analyzers to verify that the information implied by their conventions is true. Trust but verify.
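As a rough illustration of such an analyzer, here is a sketch built on Python’s standard `ast` module. Everything in it is an assumption for illustration: the bare `@pure` decorator as the marker, the `purity_violations` function, and the `MUTATING_METHODS` denylist are all hypothetical. It is a heuristic, not a proof of purity; for example, it cannot see a call into some other function that has side effects.

```python
from __future__ import annotations

import ast

# Hypothetical denylist of list/dict/set methods that mutate their receiver.
MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove", "clear",
                    "update", "add", "discard", "sort", "reverse", "setdefault"}


def is_marked_pure(func: ast.FunctionDef) -> bool:
    # A function counts as "marked" if it carries a bare @pure decorator.
    return any(isinstance(d, ast.Name) and d.id == "pure" for d in func.decorator_list)


def purity_violations(source: str) -> list[str]:
    """Return messages for suspicious constructs inside @pure functions."""
    problems: list[str] = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef) or not is_marked_pure(func):
            continue
        for node in ast.walk(func):
            if isinstance(node, (ast.Global, ast.Nonlocal)):
                problems.append(f"{func.name}: line {node.lineno}: global/nonlocal statement")
            elif isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
                targets = node.targets if isinstance(node, ast.Assign) else [node.target]
                for target in targets:
                    if isinstance(target, (ast.Attribute, ast.Subscript)):
                        problems.append(f"{func.name}: line {node.lineno}: assigns to attribute or item")
            elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                  and node.func.attr in MUTATING_METHODS):
                problems.append(f"{func.name}: line {node.lineno}: calls .{node.func.attr}()")
    return problems


SAMPLE = """
@pure
def area(w, h):
    return w * h

@pure
def sneaky(items):
    items.append(0)
    return len(items)
"""

for message in purity_violations(SAMPLE):
    print(message)  # sneaky: line 8: calls .append()
```

The checker only parses the source, so the `pure` decorator never has to exist at runtime for the analysis to work.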

David Gladfelter

11 August 2009 at 14:02

Excellent post, thanks. I’ve been thinking along these lines myself. One feature that I wish C#/.NET had was “const” methods and properties. This feature is in C++ and I used it all the time in that language.

This would guarantee that property accessors, etc. didn’t have side effects, and it would help both general readability and especially multi-threaded programming. Knowing that a method didn’t change an object would make human and machine analysis of data shared between multiple threads much simpler.

Don’t underestimate the importance of standard casing/prefixing conventions, either. An experienced programmer knows without any conscious thought that a capital letter at the start of a word means a type or a method/property. I personally like the ‘m_’ prefix for private type data members, as it allows me to unconsciously scan for side effects and state dependence in code. Back in the ’90s I did follow the guidance from Microsoft and used Hungarian notation. I don’t go full Hungarian any more, as types are becoming less and less important as unit testing and runtime libraries replace the compiler as enforcers of and guides towards correctness. Also, Hungarian’s just plug-ugly.

Dave

John Moeller

11 August 2009 at 13:36

This reminds me of Scott Meyers’ injunction against writing “write-only” code. I’ve certainly been guilty of it, but fortunately that code (AFAIK) has had few maintenance issues. I also agree that FP deserves a serious look for concurrent programming.

On the topic of enforcement, whether it be automated testing, code reviews, or using the language’s rules in your favor, I think that unfortunately, any method requires discipline.

There are really good ways of using C++’s rules so that the compiler does the heavy lifting of enforcing correctness. Unfortunately, another programmer can just wield the broadsword of (void *) to circumvent all your work instead of asking for your clarification.

Problems like this are only solved by formal code reviews to the extent that anyone wants to take the time to participate. One can also game code reviews through social engineering (anyone who thinks that they’re immune to this is mistaken). Automated tests can be circumvented by simply not running them. This happens, for example, when there’s too much pressure to release on an unrealistic deadline.

I think that these systems are most effective when a culture of robustness and correctness is fostered. Then people will agree with the use of these systems.

(P.S. A comment RSS feed would be handy; I prefer to use RSS for checking up on comments)

John Moeller

11 August 2009 at 14:29

Also, Hungarian’s just plug-ugly.

I’ll say. There’s something about Hungarian notation that just offends me to the core. I can’t stand the stuff. I will say, grudgingly, that it leaves no question about what is what. However, its need is usually obviated by a good modern code editor.

What I find important about code conventions is readability more than anything else, and I like tools like Eclipse that give you a ridiculously easy way of enforcing style without getting in the way.

D. Nielsen

11 August 2009 at 15:33

The underlying topic here (referring back to the first paragraph of the original post) is really linguistic relativity (or the Sapir-Whorf postulate): the idea–or reality–that language constrains thought. It seems to be as true of programming languages as it is of natural languages. However, with programming languages it’s easier than with natural languages to switch and try reasoning in a different language.

In part, we are seduced by the power of modern functional languages to think that we can always and only reason about our code in the language itself. Contrast that with software development in assembly language, where reasoning about the code usually makes use of pseudocode, flowcharts, and other representations–other languages, if you will.

I mostly use Python these days, and appreciate its succinct expressive power, but there are certainly problems for which the solutions can be easily implemented in Python, but not easily arrived at just by thinking in Python. I’ve never used Ruby, but the same must certainly be true of it, so the answer to the original complaint about Ruby is simply to reason about your program in a different language. Having to do so is not (necessarily) any slight to your chosen implementation language.

Omar Gómez

13 August 2009 at 18:04

Pure functional programming is just like math. A well-designed Haskell program can’t fail. But at the same time, some types of problems are just too impractical to model in Haskell (try the simplest OpenGL video game in Haskell).

Python is dynamically typed and you can certainly make a mess, but it is this flexibility that allows you to express complex problems in elegant ways.

Neither is good or bad; they are just tools. You see, IMHO the problem with our craft is that we tend to forget that the language is just a tool. In the same way that a carpenter doesn’t ask his hammer about the chair he’s going to build, programmers can’t pretend their language of choice is any guarantee of a program’s semantics. Some things live outside the language. Look at GoF patterns. They’re extremely helpful for comprehending the complexity of large programs, but no tool can parse code and say, hey, there’s a template method here, or a proxy there. That can only live in the programmer’s head. Now consider that every problem domain has its own set of “patterns”. See how complex the problem of semantics is?

I think the Ruby community has failed at building a good vocabulary of patterns that programmers can use to communicate their problems/models. Not the Ruby community really, but the RoR community, which uses Ruby as its weapon of choice.
I’m sure there are great Ruby engineers (Martin Fowler comes to mind), but they are eclipsed by a myriad of mediocre RoR developers and their use-don’t-think philosophy.

Thanks for this fascinating post
–Omar

