From the category archives:

Computing

Worthless technical books

by John on October 20, 2009

I sold six technical books to a used book store on the way home today. The store paid me $5 total for four of the books. Two books they didn’t want at all. The books were not that old, but they were practically worthless.

It’s sobering to think how little a technical book is worth a few years after it is printed. It’s a good reminder to focus on things that will last.

Related posts:

Old math books
C. S. Lewis on reading old books
Mathematica turns 20

{ 7 comments }

Glynn Foster from Sun talks about OpenSolaris on FLOSS Weekly episode 75. After explaining how Solaris has always been a robust, scalable operating system, Foster brags that now on a Toshiba laptop with OpenSolaris pre-installed ” … the volume works, and the keys work…” Then host Jono Bacon laughs “The keys work?!” The dialog starts at about 23:30 into the podcast.

The other host, Leo Laporte, mumbled “so cool” after Glynn Foster says “the volume works” and apparently would have let him get away with saying “the keys work.” But Jono Bacon is the community manager for Ubuntu, a Linux distribution that cares more about whether the volume and keyboard work than whether the OS scales.

It was amusing to listen to Glynn Foster and Jono Bacon personify their respective operating system’s priorities, server performance for Solaris and desktop experience for Ubuntu. Foster says that OpenSolaris used to be a royal pain to install and configure but now it has gotten much better. I don’t know how well Ubuntu scales — I imagine it’s not nearly as scalable as OpenSolaris — but it was designed from the beginning to be easy to install.

{ 2 comments }

Three rules of thumb

by John on July 1, 2009

Here are three rules of thumb for back-of-the-envelope estimates:

  1. Duff’s rule: Pi seconds is a nanocentury.
  2. Hopper’s rule: Light travels one foot in a nanosecond.
  3. Rule of 72: An investment at n% interest will double in 72/n years.

How might you use these? How accurate are they?

Duff’s rule comes in handy when converting from times measured in seconds to times measured on calendars. This may not sound useful, but it often happens in software. For example, if a task takes a second to complete, how long would it take to do it a billion times? Well, a billion seconds, obviously. But how long is that in familiar terms? Duff’s rule says a century is about 3.14 billion seconds, so a billion seconds would be something like 30 years.

How accurate is Duff’s rule? A year is 31,536,000 seconds, whereas Duff’s rule would estimate 31,415,927 seconds, so it underestimates the number of seconds in a year by about 0.4%.

Hopper’s rule is useful in electrical engineering. For example, you might need to know how long it would take a radio signal to travel between a transmitter and receiver. Hopper’s rule can explain why computer chip clock rates are not increasing. Electrical signals travel at some fraction of the speed of light, and current chip designs are limited by whether a signal can move across the chip during a clock cycle.

How accurate is Hopper’s rule? Light travels 299,792,458 meters per second. That corresponds to 0.983 feet per nanosecond, so Hopper’s rule overestimates by about 1.7%.

Here is a terrific video of Grace Hopper explaining Hopper’s rule to David Letterman, around 4:25. (Thanks Bill!)

The Rule of 72 is obviously useful in financial estimates. For example, $1000 invested at 6% interest will become $2000 in 72/6 = 12 years.

How accurate is the rule of 72? The value of an initial investment P at time t with under a continuous interest rate r is P exp(rt). Solving exp(rt) = 2 for t gives t = log 2 / r. If we express r as a percentage, we have to multiply t by 100. This says that for continuously compounded interest, the rule of 72 would be exact if “72″ were replaced with 100 log 2 = 69.3. So for continuous interest, the rule overestimates the doubling time by 0.72/log 2 or about 4%. So why use 72 rather than 69.3? There are two reasons. First, 72 is easy to work with mentally since it is divisible by lots of small integers. Second, interest is often compounded periodically — say annually or monthly — rather than continuously.

The doubling time is longer for investments with periodic interest rather than continuous interest. The overestimate from using 72 rather than 69.3 is partially canceled out by the accounting for the longer doubling time for periodic compounding and so 72 may work better than 69.3. Exactly how accurate the rule of 72 is for periodically compounded interest depends on the interest rate and the compounding period.

Related post:

Bancroft’s rule (rule of thumb for estimating linear regression)

{ 4 comments }

Backup and recovery

by John on June 18, 2009

Paul Randal had the following to say about database backup and recovery in his interview with .NET Rocks.

Don’t ever, ever plan a backup strategy. Plan a restore strategy.

The point of a backup is to be able to restore data when necessary. That sounds obvious, but it’s easy to lose site of. Too many companies backup their data, or think that they back up their data, but don’t have much of a recovery plan. Or they have a recovery plan but they haven’t rehearsed it. Until you rehearse your recovery plan, you don’t know whether it’s going to work, you don’t know how long it’s going to take, and you don’t know whether you need to invest in hardware to speed it up.

{ 4 comments }

The Unix Programming Environment

by John on June 15, 2009

Joel Spolsky recommends the following books to self-taught programmers who apply to his company and need to fill in some gaps in their training.

  1. Structure and Interpretation of Computer Programs
  2. The C Programming Language
  3. The Unix Programming Environment
  4. Introduction to Algorithms

The one that has me scratching my head is The Unix Programming Environment, first published in 1984. After listening to Joel’s podcast, I thumbed through my old copy of the book and thought “Man, I could never work like this.” Of course I could work like that, because I did, back around 1990. But the world has really changed since then.

I appreciate history and old books. I see the value in learning things you might not directly apply. But imagine telling twentysomething applicants to go read an operating system book that was written before they were born. Most would probably think you’re insane.

{ 7 comments }

Upcoming Y2K-like problems

by John on June 13, 2009

The world’s computer systems kept working on January 1, 2000 thanks to billions of dollars spent on fixing old software. Two wrong conclusions to draw from Y2K are

  1. The programmers responsible for Y2K bugs were losers.
  2. That’s all behind us now.

The programmers who wrote the Y2K bugs were highly successful: their software lasted longer than anyone imagined it would. The two-digit dates were only a problem because their software was still in use decades later. (OK, some programmers were still writing Y2K bugs as late as 1999, but I’m thinking about COBOL programmers from the 1970’s.)

Y2K may be behind us, but we will be facing Y2K-like problems for years to come. Twitter just faced a Y2K-like problem last night, the so called Twitpocalypse. Twitter messages were indexed with a signed 32-bit integer. That means the original software was implicitly designed with a limit of around two billion messages. Like the COBOL programmers mentioned above, Twitter was more successful than anticipated. Twitter fixed the problem without any disruption, except that some third party Twitter clients need to be updated.

We are running out of Internet addresses because these addresses also use 32-bit integers. To make matters worse, an Internet address has an internal structure that greatly reduces the number of possible 32-bit addresses. IPv6 will fix this by using 128-bit addresses.

The US will run out of 10-digit phone numbers at some point, especially since not all 10-digit combinations are possible phone numbers. For example, the first three digits are a geographical area code. One area code can run out of 7-digit numbers while another has numbers left over.

At some point the US will run out of 9-digit social security numbers.

The original Unix systems counted time as the number of seconds since January 1, 1970, stored in a signed 32-bit integer. On January 19, 2038, the number of seconds will exceed the capacity of such an integer and the time will roll over to zero, i.e. it will be January 1, 1970 again. This is more insidious than the Y2K problem because there are many software date representations in common use, including the old Unix method. Some (parts of) software will have problems in 2038 while others will not, depending on the whim of the programmer when picking a way to represent dates.

There will always be Y2K-like problems. Computers are finite. Programmers have to guess at limitations for data. Sometimes these limitations are implicit, and so we can pretend they are not there, but they are. Sometimes programmers guess wrong because their software succeeds beyond their expectations.

{ 3 comments }

OS ecosystems

by John on May 29, 2009

Colin Howe wrote an interesting article last week comparing the Windows and Ubuntu worlds, not the operating systems per se. Feature-by-feature comparisons of operating systems are not that helpful. Contemporary operating systems have a lot in common in their details, but they create very different ecosystems. These ecosystem differences are not apparent at first, but in the long run they dominate the experience of using an operating system.

You can run a lot of the same software across different operating systems, but using software that wasn’t originally designed with your OS in mind can be like importing an invasive species. It may work at first but cause you grief over time when it doesn’t play well with others.

Related posts:

Negative space in operating systems
Would you rather have a chauffeur or a Ferrari?
Pepsi challenge for Windows Vista
Introduction to Mac for Windows developers
Moore’s law and software bloat

{ 0 comments }

Dan Bricklin commented in a recent interview on how the expectations of computers from science fiction have not panned out. The point is not that computers are more or less powerful than expected, but that we have wanted to put computers to different uses than expected.

photo of red Ferrari

Fictional computers such as the HAL 9000 from 2001: A Space Odyssey were envisioned as chauffeurs. You tell the computer what to do and then go along passively for the ride. Bricklin says it looks like people would rather have a Ferrari than a chauffeur. We want our computers to be powerful tools, but we want to be actively involved in using them.

I’d refine that to say we either want to actively use our computers, or we want them to be invisible. Maybe there’s an uncanny valley between these extremes. Most people are blissfully ignorant of the computers embedded in their cars, thermostats, etc. But they don’t want some weird HAL 9000-Clippy hybrid saying “Dave, it looks like you’re updating your résumé. I’ll take care of that for you.”

{ 9 comments }

Rules for computing happiness

by John on May 4, 2009

Some time ago I ran across a blog post Al3x’s rules for computing happiness by Alex Payne. I agree with the spirit of the list, though I disagree at least to some extent with most of the points. It seems to me that the underlying idea of the list is to set some boundaries on how you use your computer. Instead of just asking the easiest way to accomplish the immediate task, think of longer term (unintended) consequences.

Here’s Alex’s first rule:

Use as little software as possible.

You could interpret the first rule at least a couple ways. First, don’t use software when a low-tech solution works as well or better. Second, don’t buy or download hundreds of different applications. Learn how to use a few applications well. I agree with both interpretations.

Here are the second and third rules.

Use software that does one thing well. Do not use software that does many things poorly.

If that means having hundreds of little applications, then there’s a tension with the first rule. I suppose it matters how you define a “thing.” If your “thing” is broad enough, such as for example editing images, then there’s no conflict. I don’t think Alex would suggest using thousands of little utilities for image editing rather than using a package like Photoshop or GIMP. I imagine he’s referring to overly ambitious applications, such as software that tries to be word processor, email client, Lisp interpreter, floor wax, and dessert topping.

Here are a few more of the rules I appreciate.

  • Use a plain text editor that you know well.  Not a word processor, a plain text editor.
  • Keep as much as possible in plain text. Not Word or Pages documents, plain text.
  • Pay for software that’s worth paying for, but only after evaluating it for no less than two weeks.
  • Buy as large an external display as you can afford if you’ll be working on the computer for more than three hours at a time.

The emphasis on plain text files may seem reactionary, but there are still numerous advantages to plain text. Word has its advantages as well. Choose wisely.

I particularly like his advice to pay for software that’s worth paying for. I understand the attraction of software that is “free as in beer,” especially at work. Even though the cost of commercial software doesn’t come out of my pocket, the bureaucratic hassle and delay of corporate purchasing make free software more attractive.  But some free software gives a false economy because the software is difficult to use. The software may be free up front, but there’s an opportunity cost for using it, a tax you pay as long as you use it.

Related posts:

Living within chosen limits
What’s wrong with paper?
Selective use of technology
One program to rule them all

{ 1 comment }

Anatomy of a floating point number

by John on April 6, 2009

In my previous post, I explained that floating point numbers are a leaky abstraction. Often you can pretend that they are mathematical real numbers, but sometimes you cannot. This post peels back the abstraction and explains exactly what a floating point number is. (Technically, this post describes an IEEE 754 double precision floating point number, by far the most common kind of floating point number in practice.)

A floating point number has 64 bits that encode a number of the form ± p × 2e. The first bit encodes the sign, 0 for positive numbers and 1 for negative numbers. The next 11 bits encode the exponent e, and the last 52 bits encode the precision p. The encoding of the exponent and precision require some explanation.

The exponent is stored with a bias of 1023. That is, positive and negative exponents are all stored in a single positive number by storing e + 1023 rather than storing e directly. Eleven bits can represent integers from 0 up to 2047. Subtracting the bias, this corresponds to values of e from -1023 to +1024. Define emin = -1022 and emax = +1023. The values emin – 1 and emax + 1 are reserved for special use. More on that below.

Floating point numbers are typically stored in normalized form. In base 10, a number is in normalized scientific notation if the significand is ≥ 1 and < 10. For example, 3.14 × 102 is in normalized form, but 0.314 × 103 and 31.4 × 102 are not. In general, a number in base β is in normalized form if it is of the form p × βe where 1 ≤ p < β. This says that for binary, i.e. β = 2, the first bit of the significand of a normalized number is always 1. Since this bit never changes, it doesn’t need to be stored. Therefore we can express 53 bits of precision in 52 bits of storage. Instead of storing the significand directly, we store f, the fractional part, where the significand is of the form 1.f.

The scheme above does not explain how to store 0. Its impossible to specify values of f and e so that 1.f × 2e = 0. The floating point format makes an exception to the rules stated above. When e = emin – 1 and f = 0, the bits are interpreted as 0. When e = emin – 1 and f ≠ 0, the result is a denormalized number. The bits are interpreted as 0.f × 2emin. In short, the special exponent reserved below emin is used to represent 0 and denormalized floating point numbers.

The special exponent reserved above emax is used to represent ∞ and NaN. If e = emax + 1 and f = 0, the bits are interpreted as ∞. But if e = emax + 1 and f ≠ 0, the bits are interpreted as a NaN or “not a number.” See IEEE floating point exceptions for more information about ∞ and NaN.

Since the largest exponent is 1023 and the largest significant is 1.f where f has 52 ones, the largest floating point number is 21023(2 – 2-52) = 21024 – 2971 ≈ 21024 ≈ 1.8 × 10308. In C, this constant is defined as DBL_MAX, defined in <float.h>.

Since the smallest exponent is -1022, the smallest positive normalized number is 1.0 × 2-1022 ≈ 2.2 × 10-308. In C, this is defined as DBL_MIN. However, it is not the smallest positive number representable as a floating point number, only the smallest normalized floating point number. Smaller numbers can be expressed in denormalized form, albeit at a loss of significance. The smallest denormalized positive number occurs with f has 51 0’s followed by a single 1. This corresponds to 2-52*2-1022 = 2-1074 ≈ 4.9 × 10-324. Attempts to represent any smaller number must underflow to zero.

C gives the name DBL_EPSILON to the smallest positive number ε such that 1 + ε ≠ 1 to machine precision. Since the significant has 52 bits, it’s clear that DBL_EPSILON = 2-52 ≈ 2.2 × 10-16. That is why we say a floating point number has between 15 and 16 significant (decimal) figures.

For more details see What Every Computer Scientist Should Know About Floating-Point Arithmetic.

First post in this series: Floating point numbers are a leaky abstraction

{ 8 comments }

Joel Spolsky coined the term leaky abstraction for programming concepts that usually shield you from messy details but sometimes break down. A perfect abstraction is a black box that you never have to open. A leaky abstraction is a black box that you have to open up occasionally.

Floating point numbers, the computer representations of real numbers,  are leaky abstractions. They work remarkably well: you can usually pretend that a floating point type is a mathematical real number. But sometimes you can’t. The abstraction leaks, though not very often.

Most explanations I’ve heard for the limitations of machine numbers are pedantic. “There are only a finite number of floating point numbers so they can’t represent real numbers well.” That’s not much help. It doesn’t explain why floating point numbers actually do represent real numbers sufficiently well for most applications, and it doesn’t suggest where the abstraction might leak.

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10308, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton’s Law of Gravity is only known to four significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton’s gravitational constant, but still less than a floating point number. So when are 16 figures not enough? One problem area is subtraction. The other elementary operations — addition, multiplication, division — are very accurate. As long as you don’t overflow or underflow,  these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate.  If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations. For an example, see this post on calculating standard deviation. See also computing derivatives, the second example in  Five Tips for Floating Point Programming.

What about overflow or underflow? When do you need numbers bigger than 10308? Often you don’t. But in probability calculations, for example, you need them all the time unless you’re clever. It’s common in probability to compute a medium-sized number that is the product of an astronomically large number and an infinitesimally small number. The final result fits into a computer just fine, but the intermediate numbers might not due to overflow or underflow. For example, the maximum floating point number on most computers is somewhere between 170 factorial and 171 factorial. Such large factorials often appear in applications, often in ratios with other large factorials. See Avoiding Overflow, Underflow, and Loss of Precision for tricks on how to work with factorials that would overflow if computed directly.

Often you can afford to be blissfully ignorant of the details of floating point arithmetic, but sometimes you cannot. A great place to learn more is David Goldberg’s paper What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Update: See follow-up post, Anatomy of a floating point number.

Related posts:

How to compute binomial probabilities
NaN, 1.#IND, 1.#INF, and all that

{ 15 comments }

Data center energy efficiency

by John on February 24, 2009

Data centers consume 1.5% of the electricity produced in the United States and the percentage is increasing. What can be done to make data centers more energy efficient?

According to Ken Brill, 30% of servers could simply be turned off. These servers no longer do anything useful, but nobody was responsible for decommissioning them. (Along the same lines, Brill noted that while preparing for Y2K, companies discovered that half the software they owned didn’t need to be fixed simply because it wasn’t being used.)

After turning off unused equipment, the next thing to do is use more efficient power supplies. Brill said that these power supplies would pay for themselves quickly but are not commonly used.

Listen to Moira Gunn’s interview with Ken Brill.

Show notes | mp3

Related post: Exascale computing

{ 2 comments }

Rolling dice for normal samples

by John on February 10, 2009

Roll five dice and use the sum to simulate samples from a normal distribution. This is an idea I ran across in Teaching Statistics by Andrew Gelman and Deborah Nolan. The book mentions this idea in the context of a classroom demonstration where students are to simulate IQ scores.

I was surprised that just five dice would do a decent job.  So just how good a job does the dice generator do?

To find out, I looked at the difference between the cumulative densities of the dice generator and an exact normal generator. When you roll a single six-sided die, the outcomes have mean 3.5 and variance 35/12, and so the corresponding mean and variance for rolling 5 dice is 5 times greater. By the central limit theorem, the sum of the five rolls should have approximately the same distribution as a normal random variable with the same mean and variance. So what we want to do is compare the distribution of the sum of the five dice rolls to a Normal(17.5, 3.8192).

Now how do we get the distribution of the sum of the five dice? For one die it’s easy: the numbers 1 through 6 all have probability 1/6 of showing up. The sum of two dice is harder, but not too hard to brute force. But if you’re looking at five or more dice, you need to be a little clever.

By using generating functions, you can see that the probability of getting a sum of k spots on the five dice is the coefficient of xk in the expansion of

(x + x2 + x3 + x4 + x5 + x6)5 / 65

In Mathematica code, the probability density for the sum of the five dice is

pmf = Table[
  6^-5 Coefficient[(x + x^2 + x^3 + x^4 + x^5 + x^6)^5, x^i],
  {i, 30}]

To get the distribution function, we need to create a new list whose nth item is the sum of the first n items in the list above. Here’s the Mathematica code to do that.

cdf = Rest[FoldList[Plus, 0, pmf]]

The FoldList repeatedly applies the function supplied as its first argument, in this case Plus, starting with the second argument, 0. The Rest function removes the extra 0 that FoldList adds to the front of the list. (Like cdr for Lisp fans.)

Here’s the Mathematica code to produce the corresponding normal distribution.

norm = Table[
    CDF[NormalDistribution[3.5*5, Sqrt[5*35/12]], i],
    {i, 30}]

Here’s what the difference between the dice distribution and the normal distribution looks like:

plot of cdf - norm

The maximum error is about 5%. It’s not an industrial quality random number generator, but it’s perfectly adequate for classroom demonstrations.

Now what would happen if we rolled more than five dice? We’d expect a better approximation. In fact, we can be more precise. The central limit theorem suggests that as we roll n dice, the maximum error will be proportionate to 1/√ n when n is large. So if we roll four times as many dice, we’d expect the error to go down by a factor of two. In fact, that’s just what we see. When we go from 5 dice to 20 dice, the maximum error goes from 0.0525 to 0.0262.

{ 11 comments }

Starting number for HTML lists

by John on February 4, 2009

I recently found out how to make an HTML list start numbering somewhere other than at 1. This is handy when you have a list interrupted by some text and want to continue the original numbering without starting over. I’ve only been using HTML for 15 years. Maybe one of these days I’ll really learn it.

In the <ol> tag, add the attribute start="7", for example, to make the list start numbering with 7.  The start attribute can be any integer, even negative.

For example, the seven dwarfs are

  1. Dopey
  2. Grumpy
  3. Doc
  4. Happy
  5. Bashful
  6. Sneezy

and last but not least

  1. Sleepy.

Update: As pointed out in the comments below, the example in this post may not render correctly in your reader. See this post for a discussion of the problem.

{ 3 comments }

Free Ubuntu Linux book

by John on January 29, 2009

Andy Hunt announced this evening that Keir Thomas’ 170-page book Ubuntu Pocket Guide and Reference is available as a free PDF. It is also available on paper from Amazon.

{ 0 comments }