The confusion between numbers such as 7 and 007 comes up everywhere. We know they’re different—James Bond isn’t Agent 7—and yet the distinction isn’t quite trivial.
How should software handle the two kinds of numbers? The answer isn’t as simple as “Do what the user expects” because different users have different expectations.
If you type 007 into Excel, by default the software will respond as if to say “Got it. Seven.” If you configure a cell to be text, then it will retain the leading zeros. Many people find this surprising, myself included.
But you can be sure that Microsoft has good reasons for the default behaviors it chooses. These are often business reasons rather than technical reasons. Microsoft wants to please the majority of its user base, not tech wizards. Not only are wizards an unprofitable minority, wizards can take care of themselves.
Someone relayed the following conversation to me recently.
“It took me longer than I thought, but I got the zip codes wrangled.”
“Leading zeros trip you up?”
“Yeah, how did you guess?”
“This isn’t my 01st rodeo.”
I’ve run into this, as has almost everyone who has ever worked with zip codes. The Boston zip code 02134 is not the number 2,134.
In Perl the expression
(02134 > 2000) evaluates to false. That is because in some software, including the perl interpreter, a leading zero indicates that a number is written in octal, i.e. base 8. So 02134 represents 2134eight = 1116ten, which is less than 2000ten.
Update: I’d forgotten that C acts the same way until Wayne reminded me in the comments. I don’t think I’ve ever (deliberately) used that feature in C.
I’m an American, and I use American-style dates in public correspondence. But privately I use YYYY-MM-DD dates so that dates always sort as intended, regardless of whether a particular piece of software interprets these symbols as numbers, text, or dates.
Computer science versus software engineering
From a computer science perspective, the root of the problem is not being explicit about data types. In computer science lingo, 7 and 2134 are integers, while 007 and 02134 are “words” built on the “alphabet” consisting of the digits 0 through 9. Integers and words have different data types. Furthermore, 007 and 02134 are not just words but representations of different data types: one is a serial number and the other is a postal code. And neither is not an octal number.
Objects of different data types have may have similar text representations, but these representations are to be interpreted differently. And they have different sort orders, which may not correspond to their sort order as text. End of discussion.
This is fine for computer science, but it doesn’t address the software engineering problem of meeting user expectations. It will not do to say “Just make the user specify his types.” The average user doesn’t know what that means.
So what do you do? The software could make educated guesses, but then what? Ask the user for confirmation that the software guessed correctly? Or presume the guess was correct but provide a way to fix the assumption in case it was not? Demand that the user be more specific? The solution depends on context.
Even if you want to meet the expectations of a particular group, such as Excel users or Perl programmers, those expectations may evolve over time. We expect different behavior from software than we did a generation ago. But we also expect backward compatibility! So even within an individual you have conflicting expectations. There is no simple solution, even for such a simple problem of how to handle leading zeros.