My friend Clift Norris has identified a fundamental constant that I call Norris’ number, the average amount of code an untrained programmer can write before he or she hits a wall. Clift estimates this as 1,500 lines. Beyond that the code becomes so tangled that the author cannot debug or modify it without herculean effort.


Christopher Allen-Poole 11.22.11 at 08:16
So, is the answer giving the novices smaller projects?
John 11.22.11 at 08:21
The problem isn’t novices, people who intend to be professional programmers but who lack experience. The problem is non-programmers who nevertheless write code. The latter know enough to get something to work, sorta, but they are unaware of strategies to manage complexity.
Christopher Allen-Poole 11.22.11 at 08:25
Isn’t a novice definitionally someone who lacks experience? I’ve (we’ve all) met people who knew their DS&A (and otherwise “trained” in a traditional sense) but couldn’t code their way out of a paper bag.
Alexander Bogomolny 11.22.11 at 08:29
To make the definition workable, if not more exact, you have to define what effort is herculean.
Alex Chan 11.22.11 at 08:46
Don’t we also need to define what a reasonable line length is? I know many novices who cram far more onto a single line than is necessary or sane, all in aid of keeping the line count down. 1,500 lines of code can vary wildly.
John 11.22.11 at 08:52
This is meant to be humorous, not scientific. However, there is a vaguely quantifiable kernel of truth in there, that there is a remarkable consistency in how much an untrained programmer can write before hitting a wall.
Christopher Allen-Poole 11.22.11 at 08:55
@John There is a major kernel of truth, almost at Mythical Man-Month level.
Jason Fruit 11.22.11 at 09:08
What counts as training for you and Mr. Norris? I’ve been programming for pay for a decade now with no formal training, and at the beginning what you describe happened for me at about 500 lines. (I didn’t realize at first that variables could be local.) After a decade of coding, working with good coders, and reading everything I can, it starts to get hairy at about 20,000 — but there are still areas in which I’m completely untrained that a computer science education would have covered in the first year. On the other hand, I’ve known experienced “software engineers” with not CS, but CIS degrees who couldn’t write a piece of comprehensible code over 1500 lines.
So what kind of training is relevant here, in your view? From my anecdotes, experience and education even in combination aren’t enough; what’s needed is the combination of native aptitude, education, and experience.
Alex 11.22.11 at 09:13
So, is there any good reference for such training or is it only a matter of experience?
Joost Helberg 11.22.11 at 10:22
Interesting thought. Is the converse also true? Once you’ve managed a working program of 1,500+ lines, are you then a trained programmer?
John 11.22.11 at 10:39
Jason: What constitutes training? Not necessarily formal training. Learning from the wisdom of others, wherever you discover the information. Even just being aware that there are techniques for programming is a huge start, knowing that there is something to learn from others.
rjs 11.22.11 at 11:03
This is silly. I was an ‘untrained programmer’ (read: no CS degree) for many years before pursuing an MS. I was capable of writing large programs that were maintainable. And to add to that, the greatest programmer I’ve ever had the pleasure to work with never had one day of formal training. These types of generalizations only serve to discourage those who are passionate but lack the background.
Clinton P 11.22.11 at 11:23
What an excellent summation of an observation I’ve noted for years. And even the number itself seems reasonable for certain languages/toolsets.
For years as a larval (largely self-taught) programmer I hit this wall a lot. And then in the middle of the 90’s it just vanished. That “training” kicked in, I guess. I stumbled on the right mixtures of compartmentalization, code documentation, abstraction, etc. that just made writing large systems easy. But what’s more important is that they became *automatic* after a while.
It’s one thing to know what the optimal length/complexity of a method should be, if your class is getting too tightly coupled, or when a global variable is a good idea; it’s another thing entirely to have this happen unconsciously in the background and know how to deal with it effortlessly. The ability to recognize this in various toolsets (procedural, functional, object) is damned helpful too.
@Jason: I don’t think a degree (CS/CIS) or a toolkit gets you a free pass beyond this boundary.
John Venier 11.22.11 at 11:58
This number can be verified empirically if you keep track of the drain clogs that wind up on your desk with a note that it needs to be faster or more general, or not to blow up three weeks into a run. I might lean toward counting semicolons, though, since languages with semi-colon statement endings are like bug lights to these folks. And like Homer Simpson putting speed holes in his car with a claw hammer, these guys love to stack up statements in one line or even better in expressions.
As a good friend liked to say, non-programmers think any damn fool can write software and then proceed to prove it.
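The point raised in the comments about line counts versus semicolon counts can be illustrated with a small sketch. The function name and the two sample snippets below are hypothetical, made up for illustration, and the counting is deliberately naive (semicolons inside strings or comments are counted too). It only shows how the same statements can yield very different line counts while the semicolon count stays the same.

```python
def count_lines_and_semicolons(source: str) -> tuple[int, int]:
    """Return (non-blank physical lines, semicolon count) for a source string.

    Naive on purpose: semicolons inside string literals or comments are
    counted too, so treat the result as a rough size estimate only.
    """
    lines = [line for line in source.splitlines() if line.strip()]
    semicolons = source.count(";")
    return len(lines), semicolons


# Two hypothetical C-like snippets containing the same three statements.
spread_out = """
int a = 1;
int b = 2;
int c = a + b;
"""
crammed = "int a = 1; int b = 2; int c = a + b;"

print(count_lines_and_semicolons(spread_out))  # (3, 3)
print(count_lines_and_semicolons(crammed))     # (1, 3)
```

Under this rough measure, cramming statements onto one line lowers the line count but not the statement count, which is why a semicolon count may be the less gameable of the two.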
Lawrence Kesteloot 11.22.11 at 12:03
There’s another wall at 20,000 lines. I hit that repeatedly right after college with different programs, and have seen others hit it pretty hard. I had to change a bunch of things about the way I programmed to break through, and now my wall is 200,000 lines. I was describing this effect to my friend a few weeks ago and he said that new programmers tend to brute-force problems. He didn’t mean using brute-force algorithms, but using force of will (or memory) to write code. Below 1,500 lines (and later 20,000 lines) you can do pretty much anything by just relying on your memory of what your code is doing.
StephenL 11.22.11 at 12:27
Now, is this for languages like C, C++, and C#, or for higher-level languages like R, MATLAB, Python, etc.?
I wonder what that number would be for higher-level languages? Higher or lower?
Thanks.
Stephen
John 11.22.11 at 12:32
rjs: Self-trained is still trained. Even a “self-trained” programmer learns from others, maybe reading a blog post or discussion group or a book.
Compare someone who “just plays” chess with someone who knows that there are strategies, patterns, etc. People who know these strategies are trained, no matter where they learned them.
StephenL: There have been studies that suggest programmers produce about the same number of lines of code per day, independent of language. In particular, I believe these studies compared assembly language with higher-level languages.
jcborras 11.22.11 at 14:22
Assuming lines of code is a measure of complexity (in the Kolmogorov sense, that is: the code is a description of the problem, hence its length is a measure of the problem’s complexity) and that Norris’ number holds across languages, then a few corollaries can be deduced:
1.- Untrained programmers can tackle problems in a language L1 of at most 1500 current or *future* lines.
2.- Any problem exceeding that complexity can be tackled by an untrained programmer using a language L2 if L2 allows the problem to be described in fewer lines than L1, few enough to stay under the limit.
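One way to write the two corollaries above more compactly is sketched here. The notation is an assumption introduced only for illustration: $N$ stands for Norris' number and $\ell_{L}(P)$ for the number of lines needed to describe problem $P$ in language $L$.

```latex
% Assumed notation (not from the original comment):
%   N          Norris' number, taken here as roughly 1500 lines
%   \ell_L(P)  lines of code needed to describe problem P in language L
\[
  \text{(1)}\quad P \text{ is tractable in } L_1 \text{ for an untrained programmer}
  \;\Longrightarrow\; \ell_{L_1}(P) \le N
\]
\[
  \text{(2)}\quad \ell_{L_1}(P) > N \ \text{ and } \ \ell_{L_2}(P) \le N
  \;\Longrightarrow\; P \text{ is tractable in } L_2
\]
```

Both statements rest on the premise stated in the comment: that line count tracks problem complexity and that the same limit $N$ applies in every language.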
Jason Fruit 11.23.11 at 21:41
Jcborras, that is only true if L2 can be understood without being trained or becoming trained in the process.
Jon 11.24.11 at 10:46
The ‘experienced programmer’ leaves the project before the wall gets hit.
jcborras 11.25.11 at 02:58
Jason, that’s already in place for the programmer working on an L1-solution. After all, that’s why we are calling it “untrained programmer”, right? Also, I do agree that while approaching Norris’ limit one earns a bone or two, hence taking off from a totally newbie state.
A more sensible criticism of my comment above is my total disregard of issues that arise in practice, like the underlying L2 platform or whatever overhead L2 requires in order to provide a simpler (in lines of code) description of the same problem. For example, a heavy run-time penalizes your capacity to solve big-data problems.
Dr. Bubba 11.25.11 at 16:33
Though an actual number as a rule of thumb had never occurred to me, the idea behind it has crossed my mind often. Just watching someone who lacks the experience, or who does not think they need to do any better, code a good-sized project leaves me cringing at what the final “product” will be like. This includes myself and my own projects at times. Reflection and root-cause analysis should be at the front of our work daily. Attempting to be a better software developer every day is an important trait, it seems to me.
Chris Beeley 11.26.11 at 02:45
Funnily enough, I’m living proof of this number. I wrote some code (R and Sweave, nothing fancy) VERY quickly and brute-forced it; it came in at 1500 lines. Going back and doing any kind of maintenance was totally impossible because it was such a mess. I have now revised the whole code base twice as I debug and add complexity. Each time I simplify the actual code and add more complexity to what it does, and it always comes in at 1500 lines.
I am totally untrained and would love to hear any ideas anyone has about how to acquire coding skills on the job. I already picked up the Code Complete book suggestion and it’s nestling in my Amazon basket as I write.
John 11.26.11 at 04:28
Chris: I highly recommend Code Complete. Don’t be intimidated by its size; just take it in a little at a time. Then re-read it a year later.
Ken Hagin 11.26.11 at 11:46
Hi John, Long time no see. Anyway, to the point: I see a lot of comments in agreement, but I just wanted to say that in my experience, Clift’s estimate seems a little light – by about 2 orders of magnitude!
Seriously, I’ve worked on a case where the programmer had domain knowledge but lacked somewhat in general programming skill and had no training in the language used (C++). He was allowed to work on the project way too long (13 years) without being redirected for training. This produced bugs that were unsolved after a month or more of effort, and too many bug fixes that introduced new bugs. To me, the idea that a program undergoes a single development path for 13 years is very sad. By the time a programmer has worked on a program for 5 years, they should recognize lots of opportunity for a complete rewrite, resulting in more flexible, more compact, more robust code. This is not my original idea; I read this long ago from a well-known expert whose name I can’t remember and therefore can’t properly credit. But I support this concept fully.
One good aspect of this: I was asked in the latter days to help find and fix bugs, and this really improved my debugging skills.