In my previous post, cohort assignments in clinical trials, I mentioned in passing how you could calculate cohort numbers from accrual numbers if the world were simpler than it really is.

Suppose you want to treat patients in groups of 3. If you count patients and cohorts starting from 1, then patients 1, 2, and 3 are in cohort 1. Patients 4, 5, and 6 are in cohort 2. Patients 7, 8, and 9 are in cohort 3, etc. In general patient *n* is in cohort 1 + ⌊(*n* − 1)/3⌋.

If you start counting patients and cohorts from 0, then patients 0, 1, and 2 are in cohort 0. Patients 3, 4, and 5 are in cohort 1. Patients 6, 7, and 8 are in cohort 2, etc. In general patient *n* is in cohort ⌊*n*/3⌋.

These kinds of calculations, common in computer science, are often simpler when you start counting from 0. If you want to divide things (patients, memory locations, etc.) into groups of size *k*, the *n*th item is in group ⌊*n*/*k*⌋. In C notation, integer division truncates to an integer and so the expression is even simpler: `n/k`.

Counting centuries is confusing because we count from 1. That’s why the 1900s were the 20th century, etc. If we called the century immediately following the birth of Christ the 0th century, then the 1900s would be the 19th century.

Because computer scientists usually count from 0, most programming languages also count from zero. Fortran and Visual Basic are notable exceptions.

The vast majority of humanity finds counting from 0 unnatural and so there is a conflict between how software producers and consumers count. Demanding that average users learn to count from zero is absurd. So the programmer must either use one-based counting internally, and risk confusing his peers, or use zero-based counting internally, and risk forgetting to do a conversion for input or output. I prefer the latter. The worst option is to vacillate between the two approaches.

I think a preprocessor / kludge is really bad, probably worse than vacillating. Unless that’s what you mean by vacillating. I really hated the code in Numerical Recipes in C because of the kludges they used in converting their Fortran to C. Double-plus ungood.

In a kindergarten somewhere a child of a programmer is counting to ten by saying, “Zero, one, two, three, four, five, six, seven, eight, nine.”

By vacillating I meant inconsistent approaches in the same code base. Imagine this conversation.

“I wonder whether this index is one-based or zero-based.”

“Who wrote that part of the code?”

“Looks like it was Sam.”

“Oh, then it’s probably one-based.”

We don’t count from zero, we index from zero. The first item has an index of zero in our little corner of the world. So item one is referred to as item[0]. It was a long time before humans had a concept of zero as a number (around the 4th century BC, if my history serves me; I don’t think it was really codified until around 600 AD).

We computer scientists also describe intervals using half-open notation, including the start point but excluding the end point. This makes loops simple, as in this loop over [0, k), that is, from 0 inclusive to k exclusive:

`for (int i = 0; i < k; i++) { }`

We tend to index strings and compute substrings the same way, with

`"abcde".substring(0,3)`

being the same as `"abc"`. But to keep you on your toes, we sometimes use start+length rather than start+end.

Set theorists do the same thing, implicitly starting from zero (coded as the empty set in logical theories of arithmetic) and indexing summations with `Σ_{i<k}`. Because zero is the smallest natural number they code, they get away without even writing the zero!

Mathematicians are fairly consistent with using half-open intervals too. It’s most common to start indexing from 1 in math, but it varies by context. Matrix element subscripts, for example, start from 1, but power series coefficients start from 0.

If you treated item 1 as index 1 (and not index 0) you could do lots of cool things, like reserving index 0 of the array for metadata. Then you could do

`for (array++)`

… check the array’s type, whether it’s empty, etc.

… then do something with each iteration.

Dijkstra has a nice and very short little paper on this subject:

http://userweb.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

Sir, I want to know what special difference 0 has brought to counting.

Normally the way of understanding a number is by its base: in base 10, “11” represents eleven, but in base 4 it represents five, since the base is the number of symbols we have for counting, including 0:

0 – zero, 10 – four

1 – one, 11 – five

2 – two, 12 – six

3 – three, 13 – seven, and so on…

A number is then computed place by place, i.e. “33” in base 4 means (3 × 4) + 3, which is fifteen. (Thinking in unit quantities: each symbol in the higher place stands for four units, so three fours plus three more units.)

But suppose 0 is not there. Then a base-3 number would be represented with the known symbols 1, 2, 3, i.e. four would be represented by 11. What difference or problem would this bring to counting?

Please do reply, sir.

As a programmer, counting from zero seems natural to me because we are counting *relative*. C’s idiom that arrays and pointers are just two different _perspectives_ on the same thing lends itself quite well to a natural mathematical mapping.

Most people are used to counting *absolutely* so they find the whole “start from zero thing” introduces all sorts of “fence post” bugs.

If kids were taught counting from zero instead of 1 they wouldn’t find the concept so foreign. It is only “absurd” because people don’t see the strengths and weaknesses of the _two_ systems: relative vs absolute.

I agree that one-based counting is more natural to most people (and that zero-based counting is more convenient when doing calculations). However, there may be an exception where zero-based counting is more natural (at least for Europeans): floor numbering. In most European countries (including the UK) the ground floor is level 0, which seems natural especially when there are basement floors (−1, −2).

I think the way many programmers count from zero is fundamentally flawed. The reason that zero is included in counting with programming etc. is for things such as counting how many times a process might go through a loop, i.e. if it’s a for loop you may well go through zero times. As for starting to count from zero, it’s silly and serves no purpose. If I’m counting CPUs, the first CPU is not CPU 0, it’s obviously CPU 1. But it’s entrenched in programmers that “you must start counting from zero.”