What does this code do?

At the SciPy 2010 conference, a speaker showed several short code samples and asked us what each sample did. The samples were clearly written, but we had no comments to provide context. This was the last sample.

    def what( x, n ):
        if n < 0:         
            n = -n         
            x = 1.0 / x     
        z = 1.0     
        while n > 0:
            if n % 2 == 1:
                z *= x
            x *= x
            n /= 2
        return z

The quiz was at the end of the day and I was tired. I couldn’t tell what the code does. Then I found out to my chagrin that the sample above implements an algorithm I know well. I’ve written the same code and I’ve even blogged about it here.

This exercise changed my opinion of “self-documenting” code. Without some contextual clue, it is hard to understand the purpose of even a small piece of code.

Meaningful variable and function names would have helped, but a tiny comment might have helped even more. Not some redundant comment like explaining that the line x = 1.0 / x takes a reciprocal, but a comment explaining the problem the code is trying to solve.

For another example, what do you think this code does?

    uint what()
    {
        m_z = 36969 * (m_z & 65535) + (m_z >> 16);
        m_w = 18000 * (m_w & 65535) + (m_w >> 16);
        return (m_z << 16) + (m_w & 65535);
    }

It’s clear enough what the code does at a low level—it’s just a few operations—but it’s not at all clear what it’s for.

Try to figure out what the code samples do before reading further. But if you give up, the first example is described here and the second example comes from here.
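
For readers who gave up, here is a sketch of how the two samples might read with descriptive names and a short statement of purpose, following the commenters below who identify them as fast exponentiation and a multiply-with-carry random number generator. The names `power` and `MultiplyWithCarry`, the docstrings, and the seed parameters are illustrative assumptions, not part of the original samples. The second sample is translated to Python, with explicit masking standing in for 32-bit unsigned overflow.

```python
MASK32 = 0xFFFFFFFF  # emulates the wraparound of a 32-bit unsigned int


def power(x, n):
    """Return x**n for an integer n, using exponentiation by squaring.

    Needs about log2(n) multiplications rather than n - 1.
    """
    if n < 0:
        # x**(-n) equals (1/x)**n
        n = -n
        x = 1.0 / x
    result = 1.0
    while n > 0:
        if n % 2 == 1:  # lowest remaining bit of the exponent is set
            result *= x
        x *= x          # x now holds the next power: x**2, x**4, x**8, ...
        n //= 2         # integer division (the original's n /= 2 assumes Python 2 ints)
    return result


class MultiplyWithCarry:
    """Pseudo-random 32-bit integers from two 16-bit multiply-with-carry streams.

    The multipliers 36969 and 18000 are copied from the sample as-is;
    the seed arguments are a hypothetical addition for illustration.
    """

    def __init__(self, z_seed, w_seed):
        self.z = z_seed & MASK32
        self.w = w_seed & MASK32

    def next_uint(self):
        # Low 16 bits are the current value, high 16 bits are the carry.
        self.z = (36969 * (self.z & 65535) + (self.z >> 16)) & MASK32
        self.w = (18000 * (self.w & 65535) + (self.w >> 16)) & MASK32
        # Combine the low halves of both streams into one 32-bit result.
        return ((self.z << 16) + (self.w & 65535)) & MASK32
```

Even in this form the code says nothing about where the constants 36969 and 18000 come from or how good the generator is, which is the kind of context only a comment or external document can supply.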

In an ordinary face-to-face conversation, more information is conveyed non-verbally than verbally. We may think that our literal words are most important, but so much is conveyed by voice inflection, facial expression, posture, etc. Something similar is going on with source code. When we read a piece of source code, we typically bring a huge amount of implicit knowledge with us.

Suppose a coworker Sam asks you to look at his code. The fact that the question came up at work provides a large amount of context; this isn’t just a random code fragment on the web. More specifically, you know what kinds of projects Sam works on. You know why Sam wants you to look at the code. He may be showing you something he’s proud of or he may be asking for help finding a bug. You know a lot about his code before you see it.

Now suppose you’re a contractor. Sam was hit by a bus and you’ve been asked to work on his projects until he gets out of the hospital. You may complain to his office mate that Sam’s code is an awful mess, but she can’t understand what you’re talking about. She thinks his code is perfectly clear.

Now suppose you’re a contractor on the opposite side of the world from Sam. You have even less context than if you were in his office talking to his office mate. After a great deal of agony, you send your contribution back to Sam’s company. You comment your code beautifully, but Sam’s colleagues complain that your code is poorly written and that you didn’t solve the right problem.

Institutional memory is more valuable than source code comments. It costs a great deal to replace a programmer, even one who leaves behind well-commented code.

26 thoughts on “What does this code do?”

  1. In the first case, a meaningful name would have been perfectly sufficient, but in the second, a brief comment on the method — or just John’s link — would have been welcomed by me.

  2. Dimitris Leventeas

    # This is a comment.
    Every function/class/structure etc should have a short description which explains its goal. When a part of a function is heavily optimised (or just obscured), it should have a comment explaining the desired functionality and what this code snippet does.
    In other words, I believe that both functions demonstrated here should be commented, but for the first one a short description or the name of the algorithm should be enough.

  3. Mat: I agree that meaningful function names can be very helpful, maybe more helpful than comments. But a project can still be a mystery even if all function names are skillfully chosen.

  4. The biggest gotcha (to me) in the first example is requiring n to be an integer.
    what( 3, 1.0 ) has very different side effects than what( 3, 1 )

    Communicating the assumptions a method has is often as important as communicating what the method is for.

  5. Self-documenting code is a fallacy. There are several intrinsic issues with the notion. A symbolic language of operations and instructions cannot convey the same level of detail and context as the written word. Alas, neither is satisfactory in its own right. This should be no surprise to anyone, and I believe it’s only been a surprise to those who have decided not to observe the texts from which they gleaned their formal education.

    Maths are a fantastic analog. A mathematical expression does not stand alone in self-documenting form, despite the level of concision that mathematical notation provides. A chart illustrating statistical findings is without point unless context is provided. A program is little other than a collection of instructions, whose point is often obscured without a description of the function of the program. There is no reason program instructions differ from equations or figures. The figure may say a thousand words, and the same for a program, but it does not stand alone.

    Beyond the lack of context created by the void of documentation — program instructions often deserve a description of rationale. Irrespective of the constraints of a language or environment, there is always more than one way to solve a problem. Clearly provided rationale benefits future on-lookers and should be considered a necessary element of any algorithm implementation. Dubious and redundant comments are naturally unnecessary — just as a mathematical proof can omit many given postulates. But even a mathematical proof cannot omit written word. I see no reason program instructions would be different, and scoff at the repeated attempts to convince the communities otherwise.

    Writing legible code is a noble goal, and we should all share this goal and continue to reduce to practice methods for providing logical syntax and grammar which is as descriptive as possible. Asserting that intent can be clearly described using descriptive grammar, and that this practice is independently satisfactory for expressing our intent as authors, is ignoring all evidence to the contrary. Ignoring evidence is the antithesis of the scientific method, and a bane to progress. To accept any postulate ignoring evidence is to rely on faith. Faith belongs in religion, not in the discipline of developing software.

  6. uint multiply_with_carry_generator()
    {
        m_z = 36969 * (m_z & 65535) + (m_z >> 16);
        m_w = 18000 * (m_w & 65535) + (m_w >> 16);
        return (m_z << 16) + (m_w & 65535);
    }

  7. I’d like to add that meaningful variable names don’t always help.

    Suppose you’re coding up the quadratic formula. The code would be more readable with variables a, b, and c rather than quadratic_coefficient, linear_coefficient, and constant_term. It would be better to explain the meanings of a, b, and c in a comment than to use long variable names.

    Long variable names are often very useful, but sometimes there’s a conflict with problem domain conventions, as in the quadratic equation example. Or maybe the length of variables just makes the code too cluttered. I’d err on the side of long names, but you can’t make that a rigid rule.

    Larry Wall suggested that the length of a variable’s name should be proportional to its scope. I think that’s a sensible rule of thumb.

  8. It raises x to the power n.


    #!/usr/bin/python

    def what( x, n ):
        ....

    def main():
        for x in xrange(-10, 10):
            for n in xrange(0, 10):
                print("x=%d, n=%d what=%d" % (x, n, what(x, n)))

    if __name__ == '__main__': main()

  9. Paul Infield-Harm

    John: You imply that there is a difference between being “clearly written” and “understandable.” For example, you describe the snippet as “clearly written” but not easily understood. I’m not quite sure I get the distinction.

    By analogy, suppose we were talking about regular English writing. How would the following statement make sense? “That sentence is clearly written, but I don’t understand it.” The best I could come up with is “the sentence is legibly presented on the page, and conforms to English grammar, but I cannot restate with much degree of confidence what the author intended to convey.” This seems pretty weak, given how people usually use the term “clear” when discussing writing.

    Does your understanding of “clearly written” code extend beyond layout and being able to compile? If so, how?

  10. The first one I figured out pretty quickly. The reciprocal and multiplying x repeatedly based on n is a pretty good clue. The second I wasn’t able to figure out. While the & 65535 and >> 16 operations are used to move upper and lower bits around, the numbers 36969 and 18000 are just magical constants that are meaningless without context. Rename the functions to “exp” and “rng” and I wouldn’t complain; that is all that’s needed to understand the code. What is not said is why these particular versions of algorithms were chosen: Were they more efficient? Better distribution of values? First one found on Google?

  11. Comments and well-structured code are great. Documentation to reinforce those comments is even better. Too few engineers recognize the tangible benefit of well-written documentation.

  12. Paul: I agree that it’s odd to call something “clear” that isn’t understandable. The code example is clear at the lowest level of abstraction. It is conventional, not deliberately obfuscated, etc. But the meaning isn’t clear. I normally wouldn’t use “clear” to describe such code. I only meant that it was clear at the most concrete level.

  13. nes: The first example is special only because of my experience. I was presented with an algorithm I knew well and I couldn’t recognize it out of context. I’m not claiming that it is an especially tricky algorithm. I probably would have figured out its purpose if I had had more time and patience. But the code was not as easily recognizable as I would have supposed.

    Perhaps the title of this post is misleading. I didn’t intend this to be a puzzle per se; I wanted to use the examples as an introduction. My point was that we depend on context more than we are aware.

  14. In reality, though, very few commercial code bases are documented at all, and when they are it’s usually useless – my favourite looks something like this:

    // Add 2 to the result
    $i += 2;

    The ability to make sense of other people’s undocumented brainfarts is one of the attributes that distinguishes good and bad programmers.

  15. This is an easy fix…

    The first one can be re-written like this…

    def do_fast_exp( x, n ):
        if n < 0:
            n = -n
            x = 1.0 / x
        z = 1.0
        while n > 0:
            if n % 2 == 1:
                z *= x
            x *= x
            n /= 2
        return z

    The second one can be re-written…

    uint simple_random_number_generation()
    {
        m_z = 36969 * (m_z & 65535) + (m_z >> 16);
        m_w = 18000 * (m_w & 65535) + (m_w >> 16);
        return (m_z << 16) + (m_w & 65535);
    }

    In any event… self documenting code is useful at the top of the document for me. I have yet to come across a piece of code with instructions which assisted with making it better.

    I am quite serious when I say the only thing code comments have been good for is figuring out if the person knew what they were doing or if they were chasing imaginary problems. We all do it sometimes…

    What worries me is non-linear code… or code which is written schizophrenically. Trying to figure out where everything ties in can be impossible. Code is executed linearly… so why not write it like that… jmo…

  16. I think the rule is quite simple.

    First, try to attain self-documenting code. This won’t cover everything, but I suspect it can cover 80% of the codebase of every project.

    Second, if self-documenting code is hard or impossible to achieve due to either domain conflicts or a gamma ray burst, then have comments to supply the context of the code.

    Third, sometimes comments cannot convey a big enough picture. In that case, have documents, which might have all those fancy graphs, diagrams and nice formatting. Better yet, put the link or any other way to access the document in the comment.

    There you have it, three simple rules to follow. Notice that as you go from rule #1 to rule #2 to rule #3, there is more documentation maintenance to do. Code is easier to keep up to date than comments (i.e., it’s hard to change a hash from MD5 to SHA-2 without noticing that the function name is hashMD5(), for example) and comments are easier to keep up to date than documents (now which one of the documents refers to THIS particular line of code that I changed… etc).

    There is no perfect approach to anything, but at least we can start with the most “right” first and then start breaking the inconvenient rules one by one. Note that for each rule, there will be some cost/consequences down the line.

    Lastly, note that when choices have to be made, no one can get it right 100% of the time. Accept the fact that even the best programmers make mistakes. This will save you from a heart attack a few months later when you revisit the code you are writing today.

  17. Sorry, but the purpose of the first one should have been pretty obvious to any engineer.

  18. I guess the other lesson that can be learnt is not to analyze code when tired or under time pressure. IMHO, it’s difficult to easily recognize undocumented code when there are no visual landmarks that immediately scream out its purpose. Analyzing code line-by-line until its purpose is absolutely clear takes time.

  19. Khairul, that’s exactly the problem — we *have* to analyze code when we’re tired and under pressure. That’s why things like sensible naming, judicious comments, and maybe even external documents are essential.

  20. Looking at that second example and with the “simple_random_number_generator” name for it as a hint, I’d still find it woefully under-commented — in fact, I immediately have a further question: Did the author of this code get those numbers from someone who knew what they were doing, or did they just make them up? Is this generator actually any good (and for what values of any good?).

    If it were my code, and I were commenting it what I consider “right”, I’d add a comment along the lines of “Basic pseudo-random numbers. Constants from $source, periodicity $N.”

    The heuristic I’ve heard, which certainly applies here, is that comments are to explain why.

  21. I prefer the code in your examples to what I usually get handed to straighten out. I usually receive code where the only comments are lines of code that have been commented out. I particularly loathe large blocks of commented out code which contain one or two lines of operational code.
