Leading zeros

The confusion between numbers such as 7 and 007 comes up everywhere. We know they’re different—James Bond isn’t Agent 7—and yet the distinction isn’t quite trivial.

How should software handle the two kinds of numbers? The answer isn’t as simple as “Do what the user expects” because different users have different expectations.

Excel

If you type 007 into Excel, by default the software will respond as if to say “Got it. Seven.” If you configure a cell to be text, then it will retain the leading zeros. Many people find this surprising, myself included.

But you can be sure that Microsoft has good reasons for the default behaviors it chooses. These are often business reasons rather than technical reasons. Microsoft wants to please the majority of its user base, not tech wizards. Not only are wizards an unprofitable minority, wizards can take care of themselves.

Zip codes

Someone relayed the following conversation to me recently.

“It took me longer than I thought, but I got the zip codes wrangled.”

“Leading zeros trip you up?”

“Yeah, how did you guess?”

“This isn’t my 01st rodeo.”

I’ve run into this, as has almost everyone who has ever worked with zip codes. The Boston zip code 02134 is not the number 2,134.

Octal

In Perl the expression (02134 > 2000) evaluates to false. That is because in some software, including the perl interpreter, a leading zero indicates that a number is written in octal, i.e. base 8. So 02134 represents 2134eight = 1116ten, which is less than 2000ten.

Update: I’d forgotten that C acts the same way until Wayne reminded me in the comments.  I don’t think I’ve ever (deliberately) used that feature in C.

Dates

I’m an American, and I use American-style dates in public correspondence. But privately I use YYYY-MM-DD dates so that dates always sort as intended, regardless of whether a particular piece of software interprets these symbols as numbers, text, or dates.

Computer science versus software engineering

From a computer science perspective, the root of the problem is not being explicit about data types. In computer science lingo, 7 and 2134 are integers, while 007 and 02134 are “words” built on the “alphabet” consisting of the digits 0 through 9. Integers and words have different data types. Furthermore, 007 and 02134 are not just words but representations of different data types: one is a serial number and the other is a postal code. And neither is not an octal number.

Objects of different data types have may have similar text representations, but these representations are to be interpreted differently. And they have different sort orders, which may not correspond to their sort order as text. End of discussion.

This is fine for computer science, but it doesn’t address the software engineering problem of meeting user expectations. It will not do to say “Just make the user specify his types.” The average user doesn’t know what that means.

So what do you do? The software could make educated guesses, but then what? Ask the user for confirmation that the software guessed correctly? Or presume the guess was correct but provide a way to fix the assumption in case it was not? Demand that the user be more specific? The solution depends on context.

Even if you want to meet the expectations of a particular group, such as Excel users or Perl programmers, those expectations may evolve over time. We expect different behavior from software than we did a generation ago. But we also expect backward compatibility! So even within an individual you have conflicting expectations. There is no simple solution, even for such a simple problem of how to handle leading zeros.

6 thoughts on “Leading zeros

  1. “Not only are wizards an unprofitable minority, wizards can take care of themselves.”
    Nice!

    On similar lines, I’ve said, paraphrased, “[A wizard’s] entire life is an edge-case”

  2. Another fun one: Telephone numbers. Formally, all british area codes start with an 0, so any collection of british telephone numbers will be full of leading zeroes.

  3. In C/C++, the TIOBE Index #2 and #3 top languages (and also, for that matter, arithmetic in bash shell), prepending a leading zero makes the number octal. Seems like a big violation of the principle of least surprise–we learn in grade school (at least I did) that prepending a leading zero does not change the value of a number. Thankfully, recent versions of Python disallow leading zeros for integers unless there is explicit 0x, 0o or 0b. So at least you get an error instead of silently unexpected behavior.

  4. Thanks. I’d forgotten that C/C++ did that. Seems in character for Perl, and maybe K&R C, but not modern C or C++.

  5. > I don’t think I’ve ever (deliberately) used that feature in C.

    Neither have I, but I’ve been burned by it a couple of times, both when trying to be cute by adding leading zeros to numbers in order to make them line up nicely in the editor.

    After reading your first paragraph my immediate thought was “I hope he covers numbers with leading zeros being interpreted as octal” (my hope was fulfilled, so thank you for that!). The fact that I thought this encourages me, because it suggests that having made the same mistake twice, I shall not make it a third time.

Comments are closed.