My friend Clift Norris has identified a fundamental constant that I call Norris’ number, the average amount of code an untrained programmer can write before he or she hits a wall. Clift estimates this as 1,500 lines. Beyond that the code becomes so tangled that the author cannot debug or modify it without herculean effort.
37 thoughts on “Norris’ number”
So, is the answer giving the novices smaller projects?
The problem isn’t novices, people who intend to be professional programmers but who lack experience. The problem is non-programmers who nevertheless write code. The latter know enough to get something to work, sorta, but they are unaware of strategies to manage complexity.
Isn’t a novice definitionally someone who lacks experience? I’ve (we’ve all) met people who knew their DS&A (and otherwise “trained” in a traditional sense) but couldn’t code their way out of a paper bag.
To make the definition workable, if not more exact, you have to define what effort is herculean.
Don’t we also need to define what a reasonable line length is? I know many novices who cram far more onto a single line than is necessary or sane, all in aid of keeping the line count down. 1,500 lines of code can vary wildly.
This is meant to be humorous, not scientific. However, there is a vaguely quantifiable kernel of truth in there, that there is a remarkable consistency in how much an untrained programmer can write before hitting a wall.
@John There is a major kernel of truth, almost at Mythical Man-Month level.
What counts as training for you and Mr. Norris? I’ve been programming for pay for a decade now with no formal training, and at the beginning what you describe happened for me at about 500 lines. (I didn’t realize at first that variables could be local.) After a decade of coding, working with good coders, and reading everything I can, it starts to get hairy at about 20,000 — but there are still areas in which I’m completely untrained that a computer science education would have covered in the first year. On the other hand, I’ve known experienced “software engineers” with not CS, but CIS degrees who couldn’t write a piece of comprehensible code over 1500 lines.
So what kind of training is relevant here, in your view? From my anecdotes, experience and education even in combination aren’t enough; what’s needed is the combination of native aptitude, education, and experience.
So, is there any good reference for such training or is it only a matter of experience?
Interesting thought. Is the other way around also true? Once you’ve managed a working program of 1500+ lines of code, are you a trained programmer then?
Jason: What constitutes training? Not necessarily formal training. Learning from the wisdom of others, wherever you discover the information. Even just being aware that there are techniques for programming is a huge start, knowing that there is something to learn from others.
This is silly. I was an ‘untrained programmer’ (read: no CS degree) for many years before pursuing a MS. I was capable of writing large programs that were maintainable. And to add to that, the greatest programmer I’ve ever had the pleasure to work with never had one day of formal training. These types of generalizations only serve to discourage those who are passionate but lack the background.
What an excellent summation of an observation I’ve noted for years. And even the number itself seems reasonable for certain languages/toolsets.
For years as a larval (largely self-taught) programmer I hit this wall a lot. And then in the middle of the 90’s it just vanished. That “training” kicked in, I guess. I stumbled on the right mixtures of compartmentalization, code documentation, abstraction, etc. that just made writing large systems easy. But what’s more important is that they become *automatic* after a while.
It’s one thing to know what the optimal length/complexity of a method should be, if your class is getting too tightly coupled, or when a global variable is a good idea; it’s another thing entirely to have this happen unconsciously in the background and know how to deal with it effortlessly. The ability to recognize this in various toolsets (procedural, functional, object) is damned helpful too.
@Jason: I don’t think a degree (CS/CIS) or a toolkit gets you a free pass beyond this boundary.
This number can be verified empirically if you keep track of the drain clogs that wind up on your desk with a note that it needs to be faster or more general, or not to blow up three weeks into a run. I might lean toward counting semicolons, though, since languages with semi-colon statement endings are like bug lights to these folks. And like Homer Simpson putting speed holes in his car with a claw hammer, these guys love to stack up statements in one line or even better in expressions.
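To make the lines-versus-semicolons point concrete, here is a minimal sketch of how one might compare the two measures on a piece of source text. The function name `size_metrics` and the sample snippet are made up for illustration, and the count is deliberately naive: it assumes every semicolon ends a statement, so semicolons inside strings, comments, or `for` headers are counted too.

```python
def size_metrics(source: str) -> dict:
    """Return rough size measures for a piece of source code.

    Hypothetical helper, not from any library. The semicolon count is a
    naive character count and does not parse the language.
    """
    lines = source.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]
    return {
        "lines": len(lines),
        "nonblank_lines": len(nonblank),
        "semicolons": source.count(";"),
        # Longest line, as a hint that statements are being stacked up.
        "max_line_length": max((len(ln) for ln in lines), default=0),
    }


# Three statements crammed onto one line, then a blank line, then a return.
sample = "int a = 1; int b = 2; int c = a + b;\n\nreturn c;\n"
metrics = size_metrics(sample)
print(metrics["nonblank_lines"], metrics["semicolons"])
```

On the sample, the two non-blank lines hide four statements, which is the kind of gap a plain line count would miss.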
As a good friend liked to say, non-programmers think any damn fool can write software and then proceed to prove it.
There’s another wall at 20,000 lines. I hit that repeatedly right after college with different programs, and have seen others hit it pretty hard. I had to change a bunch of things about the way I programmed to break through, and now my wall is 200,000 lines. I was describing this effect to my friend a few weeks ago and he said that new programmers tend to brute-force problems. He didn’t mean using brute-force algorithms, but using force of will (or memory) to write code. Below 1,500 lines (and later 20,000 lines) you can do pretty much anything by just relying on your memory of what your code is doing.
Now, is this for languages like C, C++, C# or higher level languages like R, MATLAB, Python, etc.?
I wonder what that number would be for higher level languages? Higher or lower?
Thanks.
Stephen
rjs: Self-trained is still trained. Even a “self-trained” programmer learns from others, maybe reading a blog post or discussion group or a book.
Compare someone who “just plays” chess with someone who knows that there are strategies, patterns, etc. People who know these strategies are trained, no matter where they learned them.
StephenL: There have been studies that suggest programmers produce about the same number of lines of code per day, independent of language. In particular, I believe these studies compared assembly language with higher-level languages.
Assuming lines of code is a measure of complexity (in the Kolmogorov’s sense, that is: the code is a description of the problem hence its length is a measure of the problem complexity) and that Norris’ number holds across languages then a few corollaries can be deduced:
1.- Untrained programmers can tackle problems in a language L1 of at most 1500 current or *future* lines.
2.- Any problem exceeding that complexity can be tackled by an untrained programmer using a language L2 if L2 allows the problem complexity to be defined in fewer lines than L1.
Jcborras, that is only true if L2 can be understood without being trained or becoming trained in the process.
The ‘experienced programmer’ leaves the project before the wall gets hit.
Jason, that’s already in place for the programmer working on an L1-solution. After all that’s why we are calling it “untrained programmer”, right? Also I do agree that while approaching Norris’ limit one earns a bone or two, hence taking off from a totally newbie state.
A more sensible criticism to my comment above is my total disregard of issues that arise in practice like the underlying L2 platform or whatever overhead L2 requires in order to provide a simpler (in lines of code) description of the same problem. I.e. a heavy run-time penalizes your capacity to solve bigdata problems.
Though an actual number as a rule of thumb had never occurred to me, the idea behind it has crossed my mind often. Just watching someone who does not have the experience, or just does not think they need to do any better, code a nice-sized project leaves me cringing at what the final “product” will be like. This includes myself and my own projects at times. Reflection and root-cause analysis should be at the front of our work daily. Attempting to be a better software developer every day is an important trait, it seems to me.
Funnily enough I’m living proof of this number. I wrote some code (R and Sweave, nothing fancy) VERY quickly and brute forced it, it came in at 1500 lines. Going back and doing any kind of maintenance was totally impossible because it was such a mess, and I have now revised the whole code base twice as I debug and add complexity, each time I simplify and simplify the actual code and add more and more complexity to what it does, and it always comes in at 1500 lines.
I am totally untrained and would love to hear any ideas anyone has about how to acquire coding skills on the job. I already picked up the Code Complete book suggestion and it’s nestling in my Amazon basket as I write.
Chris: I highly recommend Code Complete. Don’t be intimidated by its size; just take it in a little at a time. Then re-read it a year later.
Hi John, Long time no see. Anyway, to the point: I see a lot of comments in agreement, but I just wanted to say that in my experience, Clift’s estimate seems a little light – by about 2 orders of magnitude!
Seriously, I’ve worked on a case where the programmer had domain knowledge but lacked somewhat in general programming skill and had no training in the language used (C++). He was allowed to work on the project way too long (13 years) without being redirected for training. This produced bugs that were unsolved after a month or more of effort, and too many bug fixes that introduced new bugs. To me, the idea that a program undergoes a single development path for 13 years is very sad. By the time a programmer has worked on a program for 5 years, they should recognize lots of opportunity for a complete rewrite, resulting in more flexible, more compact, more robust code. This is not my original idea – I read this long ago from a well-known expert whom I can’t remember and therefore properly credit. But I support this concept fully.
One good aspect of this: I was asked in the latter days to help find and fix bugs, and this really improved my debugging skills.
@Ken if my work is anything to go by, 5 years is too long. There are rewrites I have planned for code less than 1 year old.
Regardless of the actual numbers and their solidity, the flip side of this issue is that there’s value in reducing code size.
A related issue is that if you can define good interfaces between your modules (in the API sense, the java keyword is tangential here), you can do quite a lot with your 1500 lines.
Also, speaking personally, I take a lot of pride in the code that I eliminate, especially when I wrote it. There was value in writing it but there can also be value in eliminating it.
Wow, for me the number has to be more like 25.
I think that equating “lines of code” with “productivity” is silly.
Which would you rather have?
a. removing 1000 lines of code in a fashion which implements 10 features
b. 10000 lines of code which implements a crippled version of integer addition?
Yes, counting lines of code can be silly. Nevertheless, it’s true that a lot of people run into a wall at around 1500 lines of their own code.
Ok… yes… I think I would characterize one aspect of this as “the ability to build reusable abstractions is important”.
If you can build meaningful abstractions, your 1500 line limit can be a non-issue: you write some number of lines of code which does not exceed 1500 lines, then you use the abstractions provided from that code to write a different number of lines of code which does not exceed 1500 lines, repeating until you’ve done what you need to…
(Recognizing, of course that 1500 lines was an approximate concept to begin with.)
I think Clinton P. made a similar observation.
Still, it would be nice if we had some way of putting more attention on the job of “refactoring” or “code cleaning” or “weight reduction” or “redesign” or however we might characterize the refinement of complicated systems.
I hear what you’re saying, but given that most professional programmers work in groups and often enter large projects that have already been or are well on their way to being written, it’s really hard to extract a meaningful ‘lines of code’ estimate. Even if you go by lines of code written by the individual, if the individual is part of a particularly complex and hairy project, likely the wall is hit much sooner.
SDC: Professionals may work in groups, but amateurs don’t, and that’s who Norris had in mind. They may be professionals, but not professional programmers, i.e. they may be scientists, engineers, etc.
1500 lines.
That is unfortunate seeing as, as we all know, all small projects and prototyåes are right around 5000 lines.
As we also all know, å is right next to p on Scandinavian keyboards, so prototyåes is a common mis-typing of prototype.
Do you have any data to back this up or just anecdotal experience?
I think the distinction between data and anecdote is overstated. http://www.johndcook.com/blog/2016/03/03/anecdote-and-data/