Why computer scientists count from zero
In my previous post, “Cohort assignments in clinical trials,” I mentioned in passing how you could calculate cohort numbers from accrual numbers if the world were simpler than it really is.
Suppose you want to treat patients in groups of 3. If you count patients and cohorts starting from 1, then patients 1, 2, and 3 are in cohort 1. Patients 4, 5, and 6 are in cohort 2. Patients 7, 8, and 9 are in cohort 3, etc. In general patient n is in cohort 1 + ⌊(n-1)/3⌋.
If you start counting patients and cohorts from 0, then patients 0, 1, and 2 are in cohort 0. Patients 3, 4, and 5 are in cohort 1. Patients 6, 7, and 8 are in cohort 2, etc. In general patient n is in cohort ⌊n/3⌋.
These kinds of calculations, common in computer science, are often simpler when you start counting from 0. If you want to divide things (patients, memory locations, etc.) into groups of size k, the nth item is in group ⌊n/k⌋. In C notation, integer division truncates to an integer and so the expression is even simpler: n/k.
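Here is a minimal C sketch of the two cohort formulas above. The function names `cohort_one_based` and `cohort_zero_based` are made up for illustration. The sketch assumes nonnegative patient numbers, where C's truncating division agrees with the floor.

```c
#include <assert.h>

/* One-based counting: patients 1, 2, 3, ... fall into cohorts 1, 2, 3, ...
   Implements 1 + floor((n - 1) / k) for n >= 1 and k >= 1. */
int cohort_one_based(int n, int k) {
    assert(n >= 1 && k >= 1);
    return 1 + (n - 1) / k;
}

/* Zero-based counting: patients 0, 1, 2, ... fall into cohorts 0, 1, 2, ...
   Implements floor(n / k). For nonnegative n, C's integer division
   truncates toward zero, which is the same as taking the floor. */
int cohort_zero_based(int n, int k) {
    assert(n >= 0 && k >= 1);
    return n / k;
}
```

With cohorts of size 3, `cohort_one_based(4, 3)` gives 2 and `cohort_zero_based(3, 3)` gives 1, matching the groupings listed above. The zero-based version needs no adjustment on either side of the division.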
Counting centuries is confusing because we count from 1. That’s why the 1900s were the 20th century, and so on. If we called the century immediately following the birth of Christ the 0th century, then the 1900s would be the 19th century.
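The century arithmetic has the same shape as the cohort arithmetic with groups of 100. The sketch below assumes the usual convention that years are counted from 1 with no year 0. The zero-based scheme is purely hypothetical, and both function names are invented for illustration.

```c
#include <assert.h>

/* Conventional one-based count: years 1..100 form the 1st century. */
int century_one_based(int year) {
    assert(year >= 1);
    return 1 + (year - 1) / 100;
}

/* Hypothetical zero-based count: years 0..99 form the 0th century. */
int century_zero_based(int year) {
    assert(year >= 0);
    return year / 100;
}
```

For 1950, `century_one_based` gives 20 and `century_zero_based` gives 19, so only the zero-based count lines up with the leading digits of the year. Strictly speaking, the one-based 20th century runs from 1901 through 2000, so `century_one_based(1900)` gives 19, one more off-by-one trap.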
Because computer scientists usually count from 0, most programming languages also count from zero. Fortran and Visual Basic are notable exceptions.
The vast majority of humanity finds counting from 0 unnatural and so there is a conflict between how software producers and consumers count. Demanding that average users learn to count from zero is absurd. So the programmer must either use one-based counting internally, and risk confusing his peers, or use zero-based counting internally, and risk forgetting to do a conversion for input or output. I prefer the latter. The worst option is to vacillate between the two approaches.
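One way to reduce the risk of a forgotten conversion is to keep the conversions in a couple of small functions used only at the input and output boundaries. This is just a sketch: the helper names `to_internal_index`, `to_display_number`, and `display_cohort_for_patient` are hypothetical.

```c
#include <assert.h>

/* Input boundary: convert a number the user typed (1, 2, 3, ...)
   to the zero-based index used internally. */
int to_internal_index(int user_number) {
    assert(user_number >= 1);
    return user_number - 1;
}

/* Output boundary: convert a zero-based index back to the
   one-based number shown to the user. */
int to_display_number(int internal_index) {
    assert(internal_index >= 0);
    return internal_index + 1;
}

/* Everything between the two boundaries is zero-based, so the
   cohort calculation in the middle is a plain division. */
int display_cohort_for_patient(int user_patient_number, int cohort_size) {
    assert(cohort_size >= 1);
    int patient = to_internal_index(user_patient_number);
    int cohort = patient / cohort_size;
    return to_display_number(cohort);
}
```

The user sees patient 4 in cohort 2, as in the one-based description, while the code in the middle never mixes the two conventions.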
Tags: Programming
June 26th, 2008 at 09:42
I think a preprocessor / kludge is really bad, probably worse than vacillating. Unless that’s what you mean by vacillating. I really hated the code in Numerical Recipes in C because of the kludges they used in converting their Fortran to C. Double-plus ungood.
In a kindergarten somewhere a child of a programmer is counting to ten by saying, “Zero, one, two, three, four, five, six, seven, eight, nine.”
June 26th, 2008 at 10:01
By vacillating I meant inconsistent approaches in the same code base. Imagine this conversation.
“I wonder whether this index is one-based or zero-based.”
“Who wrote that part of the code?”
“Looks like it was Sam.”
“Oh, then it’s probably one-based.”
June 26th, 2008 at 18:54
We don’t count from zero, we index from zero. The first item has an index of zero in our little corner of the world. So item one is referred to as item[0]. It was a long time before humans had a concept of zero as a number (around the 4th century BC if my history serves me. I don’t think it was really codified until around 600 AD.)
June 30th, 2008 at 14:41
We computer scientists also describe intervals using half-open notation, starting at the start, but ending one before the end. This makes loops simple, as in this loop over [0,k), that is, from 0 inclusive to k exclusive:
for (int i = 0; i < k; i++) { }
We tend to index strings and compute substrings the same way, with "abcde".substring(0,3) being the same as "abc". But to keep you on your toes, we sometimes use start+length rather than start+end.
Set theorists do the same thing, implicitly starting from zero (coded as the empty set in logical theories of arithmetic) and indexing summations with Σ_{i < k}. Because zero is the smallest natural number they code, they get away without even writing the zero!
June 30th, 2008 at 16:59
Mathematicians are fairly consistent with using half-open intervals too. It’s most common to start indexing from 1 in math, but it varies by context. Matrix element subscripts, for example, start from 1, but power series coefficients start from 0.
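The half-open convention from the comments above can be sketched in C. C has no built-in substring function, so `substring_start_end` and `substring_start_length` are hypothetical helpers written for this example. They assume the caller supplies a large enough destination buffer.

```c
#include <assert.h>
#include <string.h>

/* Copy the half-open range [start, end) of src into dst.
   dst must have room for end - start characters plus the terminator. */
void substring_start_end(const char *src, size_t start, size_t end, char *dst) {
    assert(start <= end && end <= strlen(src));
    memcpy(dst, src + start, end - start);
    dst[end - start] = '\0';
}

/* The start+length flavor, expressed in terms of the start+end one. */
void substring_start_length(const char *src, size_t start, size_t length, char *dst) {
    substring_start_end(src, start, start + length, dst);
}

/* Sum of a[i] over the half-open range [0, k), i.e. all i < k. */
int sum_first_k(const int *a, int k) {
    int total = 0;
    for (int i = 0; i < k; i++) {
        total += a[i];
    }
    return total;
}
```

With the start+end version, the range (0, 3) of "abcde" is "abc" and the range (1, 3) is "bc". With the start+length version, (1, 3) is "bcd". In a half-open range the number of items is simply end minus start, and an empty range such as [0, 0) needs no special case.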