The Titanic Effect

Gerald Weinberg’s book Secrets of Consulting is filled with great aphorisms. One of these he calls the Titanic Effect:

The thought that disaster is impossible often leads to an unthinkable disaster.

If your model says disaster is extremely unlikely, the weakest link may be your model.

In The Black Swan, Nassim Taleb looks at the risks facing a casino. The biggest risks have not come from lucky gamblers. The actuaries working for casinos understand the risk of lucky customers very well and put policies into place to protect against it. But the actuaries didn’t account for the possibility that a tiger might maul an irreplaceable performer, costing the casino $100 million. Neither did they account for the possibility that an employee might forget to file tax paperwork or that someone might kidnap a casino owner’s daughter. No one could have foreseen these events, and that’s the point: there are always risks outside your model.

Related post: Feasibility studies

Kibi, mebi, gibi

Computers like powers of 2, people like powers of 10, and the fact that 2^10 is approximately 10^3 makes it easy to convert between the two.

A kilobyte should be 1000 bytes, just as a kilogram is 1000 grams. But for years the former was only approximate while the latter is exact: a kilobyte was actually 2^10 = 1024 bytes. Similarly a megabyte was 2^20 (approximately 10^6) bytes and a gigabyte was 2^30 (approximately 10^9) bytes. Then in 1998 the International Electrotechnical Commission (IEC) introduced new prefixes to eliminate this ambiguity. The prefix “kibi” means exactly 2^10, “mebi” means 2^20, and “gibi” means 2^30. (The IEC also introduced tebi, pebi, exbi, zebi, and yobi as binary analogs for tera, peta, exa, zetta, and yotta.)

I’ll give arguments for and against using the new units and conclude with a note about their implementation in PowerShell.

Pro

The number of bytes in a kibibyte is about 2.5% larger than the number of grams in a kilogram. But the number of bytes in a mebibyte is about 5% larger than the number of grams in a megagram. Every time you go up by a factor of 2^10, the approximation degrades by about another 2.5%. So a tebibyte is almost 1.1 trillion bytes. At larger sizes the traditional prefixes become progressively more misleading, and hence the need for more precise alternatives is greater.

Another advantage to the new prefixes is that “kibi, mebi, gibi” is reminiscent of “veni, vidi, vici.”

(Why does the approximation (2^10)^k ≈ 10^(3k) degrade by about 2.5% for each increment in k? Because 2^10/10^3 = 1.024, and by the binomial theorem (1 + 0.024)^k ≈ 1 + 0.024k.)
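
Here’s a quick Python sketch of that drift, computing how much larger each binary prefix is than its decimal counterpart:

    # Ratio of each binary prefix (2^10k) to its decimal counterpart (10^3k)
    prefixes = ["kibi/kilo", "mebi/mega", "gibi/giga", "tebi/tera",
                "pebi/peta", "exbi/exa", "zebi/zetta", "yobi/yotta"]

    for k, name in enumerate(prefixes, start=1):
        ratio = 2**(10 * k) / 10**(3 * k)
        print(f"{name}: {100 * (ratio - 1):.1f}% larger")

The output runs from about 2.4% for kibi/kilo up to about 21% for yobi/yotta.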

Con

The only thing special about powers of 2 whose exponents are multiples of 10 — 2^10, 2^20, 2^30, etc. — is that they are approximately round numbers in base 10 with traditional prefixes associated with them. From a computer hardware perspective, numbers like 2^16, 2^32, and 2^64 are more natural. If you’re not going to use the traditional prefixes, abandon them rather than create near copies. Just state the number of bytes, using scientific notation if necessary.

PowerShell

In Windows PowerShell, the suffixes kb, mb, and gb mean 2^10, 2^20, and 2^30. For example, if you type 2mb + 5kb at the PowerShell command prompt, the result is 2102272. If you really want to compute 2×10^6 + 5×10^3 and can’t do it in your head, you could type 2e6 + 5e3.
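
For comparison, here’s the same arithmetic written out in Python, with the binary meanings of the suffixes made explicit:

    # PowerShell's kb, mb, gb suffixes denote binary multiples
    kb, mb, gb = 2**10, 2**20, 2**30

    print(2 * mb + 5 * kb)   # 2102272, the same result as PowerShell's 2mb + 5kb
    print(int(2e6 + 5e3))    # 2005000, the decimal version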

Bruce Payette commented on this feature of PowerShell in his book PowerShell in Action.

Yes, the PowerShell team is aware that these notations are not consistent with the IEC recommendations (kibabyte, and so on). Since the point of this notation is convenience and most people in the IT space are more comfortable with Kb than with Ki, we choose to err on the side of comfort over conformance in this one case. Sorry. This particular issue generated easily the second most heated debate on the PowerShell internal and external beta tester lists.

Pi seconds equals one nanocentury

Duff’s rule says that a nanocentury is about π seconds. Assuming a year is 365.25 days, there are 3,155,760,000 seconds in a century. So a nanocentury, one billionth of a century, is 3.15576 seconds, roughly π seconds.

This odd fact is surprisingly useful in back-of-the-envelope calculations. In order to determine whether something is computationally feasible, you have to go from how much work can be done in a second to how much work could be done on calendar time scales. For example, suppose some operation is going to take 10^15 steps and you can carry out 10^6 operations per second. How long would that take? Obviously 10^9 seconds, but how long is that in familiar units of time? According to Duff’s rule it’s about a third of a century, so about 30 years.

Since the square root of 10 is approximately π, you could say there are about square root of 10 seconds in a nanocentury. In fact, the square root of 10 is 3.162, and so it’s closer to the number of seconds in a nanocentury than π is. The advantage to square root of 10 is that it is exactly half an order of magnitude, 10^(1/2). So you could say a century is 9.5 orders of magnitude longer than a second, or a year is 7.5 orders of magnitude longer than a second. Or perhaps more memorably,

A year is about 30 megaseconds.

If you interpret “mega” as 2^20 rather than 10^6, this approximation gets even better. (Technically, that would make it 30 mebiseconds. The SI prefix “mega” means 10^6 = 1,000,000, while the IEC binary prefix “mebi” means 2^20 = 1,048,576, though the latter isn’t widely used. Most people have either never heard of the new prefixes like “mebi” or think they sound silly, and prefer the ambiguity of using “mega” to mean two slightly different things. See Kibi, mebi, gibi.)
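
A few lines of Python confirm the numbers behind these rules of thumb:

    import math

    seconds_per_year = 365.25 * 24 * 60 * 60       # 31,557,600
    seconds_per_century = 100 * seconds_per_year   # 3,155,760,000

    print(seconds_per_century * 1e-9)   # 3.15576 seconds in a nanocentury
    print(math.pi)                      # 3.14159...
    print(math.sqrt(10))                # 3.16228..., a slightly better fit
    print(seconds_per_year / 2**20)     # about 30, so a year is ~30 mebiseconds
    print(1e9 / seconds_per_year)       # about 31.7, so 10^9 seconds is ~30 years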

I wrote briefly about Duff’s rule a while back in the post Three rules of thumb. That post also includes a great video of Grace Hopper explaining to David Letterman her rule of thumb that light travels about one foot in a nanosecond.

There isn’t a googol of anything

Before Google, there was googol, the number 10^100, written as a 1 followed by 100 zeroes.

There are about 4 × 10^79 atoms in the universe. (Here’s a derivation of that number.) You could bump that number up a little by counting particles rather than atoms, but not by much. There’s not a googol of anything physical in the universe.

On the other hand, numbers larger than a googol routinely arise in applications. When you’re counting potential things rather than physical things, you can run into numbers much larger than a googol. This happens all the time in probability calculations: inconceivably large numbers pop up in intermediate steps on the way to moderate-sized results.
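
For example, 1000! has more than 2,500 digits, dwarfing a googol, yet it is exactly the kind of quantity that shows up on the way to an ordinary probability. A small Python illustration:

    from math import comb, factorial

    googol = 10**100

    print(factorial(1000) > googol)    # True: 1000! has over 2,500 digits

    # Yet the final answer is modest: the probability of exactly
    # 500 heads in 1000 fair coin flips.
    p = comb(1000, 500) / 2**1000
    print(p)                           # about 0.0252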

Emacs calculator

Emacs has quite a sophisticated scientific calculator. Like many other things in Emacs, it is both powerful and idiosyncratic.

The calc module ships with Emacs as of version 22. The calc manual has full documentation, but it takes some work to understand. This post may make it easier to read the manual.

The manual says you can start calc with the command M-#. That did not work for me, but the command M-x calc did.

Calc has two modes: RPN (reverse Polish notation) and algebraic. RPN is the default mode and uses very terse commands, often one or two letters. Most people will find the algebraic mode syntax more familiar. Commands in algebraic mode begin with a quote and use longer, more descriptive names.

For example, suppose you want to find the cosine of 5. You can do this in RPN mode by typing 5 and pressing return, then typing shift-C. To do the same calculation in algebraic mode, type 'cos(5) and return.

I’ll step through the gamma function documentation as an example of how to interpret the manual. It says

The f g (calc-gamma) [gamma] command computes the Euler gamma function. …

This means that in RPN mode, you enter the argument of the gamma function, say 5, and then type f followed by g with no space in between, even though the manual writes the keystrokes as f g. You could also enter 5 and then type M-x calc-gamma. In algebraic mode, the syntax would be 'gamma(5). So the pattern in the manual is

RPN keystrokes (lisp function) [algebraic syntax].

The calc module has an amazing range of functionality — symbolic calculation, matrix operations, graphics, etc. — though I don’t imagine I’d use much of it. In my mind, the benefit of calc is being able to do a quick calculation without leaving Emacs. Although one could do sophisticated calculations in calc, I doubt I would for two reasons. First, if I have to look up how to do something in calc, I lose the benefit of not interrupting my workflow. Second, I don’t want to start a computation in calc and then discover I need something that isn’t there and then have to switch to another tool such as SciPy, Mathematica, or R.

Probability that a number is prime

The fastest ways to test whether a number is prime have some small probability of being wrong. Said another way, it’s easier to tell whether a number is “probably” prime than to tell with certainty that it’s prime. This post looks briefly at algorithms for primality testing then focuses on what exactly it means to say a number is “probably prime.”

The simplest probabilistic method for testing primes is based on Fermat’s Little Theorem. This theorem says that if p is prime and a is not divisible by p, then a^(p-1) has a remainder of 1 when divided by p. So to test a number n, pick some a and divide a^(n-1) by n. If the remainder is anything other than 1, we know we do not have a prime. If the remainder is 1, we don’t necessarily have a prime, but the chances that the number is prime have gone up. (At this point, some statisticians are cringing at my wording. I’ll get to these objections in a minute.)

If I repeat the test for several different values of a, the probability that my number is prime keeps going up. In principle I could keep testing with new values of a until I’m sufficiently satisfied that my number is prime. That’s not quite true, however, since there are some numbers known as Carmichael numbers that will slip through the test for every value of a. The smallest Carmichael number is 561. But these numbers are sufficiently rare that you are extraordinarily unlikely to run into one by accident when testing large numbers for primality.
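
Here’s a minimal sketch of the Fermat test in Python, using random bases. (This is just to illustrate the idea; production code would use the stronger Miller–Rabin test.)

    import random

    def fermat_test(n, trials=20):
        """Return False if n is certainly composite, True if n is probably prime.
        Assumes n > 3."""
        for _ in range(trials):
            a = random.randrange(2, n - 1)
            if pow(a, n - 1, n) != 1:     # Fermat's Little Theorem violated
                return False              # n is certainly composite
        return True                       # n is probably prime

    print(fermat_test(2**127 - 1))   # True: this Mersenne number is prime
    print(fermat_test(2**127 + 1))   # False: it's divisible by 3
    print(pow(2, 560, 561))          # 1, even though 561 = 3 * 11 * 17 is composite

The last line shows why 561 slips through: every base coprime to 561 satisfies the Fermat condition, so only a base sharing a factor with 561 can expose it.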

Primality testing involves some very interesting mathematics. I may blog about that sometime, but in this post I want to look at what it means to say a number is “probably prime.” This issue illustrates the differences between two schools of statistical interpretation.

A strict frequentist statistician would argue as follows. Suppose you have a large number N, and you run it through a probabilistic primality test. If it fails, the probability of it being prime is 0 because the number is certainly composite. If it passes, the probability that it is prime is either 0 or 1, but we just don’t know which. If it is prime, the probability that it is prime is 1. If it’s not prime, the probability is 0. It’s not very helpful to ask for the probability of a specific number being prime because there’s nothing random going on.

However, just before you write off the statistician as being worthless, he gives a further explanation. He explains that you can indeed talk about the probability of a number being prime if you rephrase your question. Suppose you pick a number at random from some range. Then you can speak of the probability that you chose a prime number. The number’s properties are not random but your choice is random. And further you can ask about the conditional probability that your selection will be prime given that it passes a primality test.

Perhaps you don’t find the argument above satisfying. You still want to say “I’ve got a number here and it passed the primality tests. Surely I can say it’s probably prime.” A Bayesian statistician may agree with you. There are varieties of Bayesians, but one school of thought says that probabilities represent uncertainty. Randomness is one cause of uncertainty, but not the only one or even the most important one. E. T. Jaynes was a major proponent of this view. He argued that much of what people say about randomness is nonsense and that we should focus on information instead. As he once said “probabilities do not describe reality, only our information about reality.”

Of course a particular number is either prime or not, but our knowledge of whether it is prime is uncertain. So the probability refers to our mental state, not to the number. (In a sense this isn’t too different from the frequentist saying the probability applies to our process of choosing numbers, not the number we drew.)

One criticism of Bayesian statistics is that Bayesian analysis begins with a prior distribution. So our Bayesian statistician would ask “What is your probability that the number will be prime before you select it?” A sensible answer would be to use the Prime Number Theorem to come up with an initial guess. For example, if you’re picking a number with 200 digits, the theorem says that the proportion of 200-digit numbers that are prime is roughly 1/(200 ln 10), or about 1/460.
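
The arithmetic behind that estimate takes a couple of lines of Python:

    import math

    # Prime Number Theorem: the density of primes near x is about 1/ln(x)
    print(1 / math.log(10**200))   # about 0.00217, i.e. roughly 1 in 460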

But what if someone chooses a prior that isn’t as clever? They’ll get a different probability that a particular number is prime! But that’s to be expected. If a probability represents uncertainty, some people will have more uncertainty than others because people know different things. What if someone secretly created your 200-digit number by multiplying two 100-digit numbers? His prior probability that the product is prime will be zero because he has no uncertainty.

Leaving aside primality for a moment, suppose I flip a fair coin. Before I flip the coin, a frequentist and a Bayesian agree that the probability that it will come up heads is 1/2. Now suppose the coin has fallen to the floor and only I can see the outcome. My probability of heads is now either 0 or 1. The Bayesian would say that his probability is still 1/2 because his information hasn’t changed. A strict frequentist would say it is now improper to assign a probability to the outcome: the coin’s status has changed from random to fixed but unknown, and in his view one may not use probabilities to quantify uncertain knowledge.

Going back to primality testing, recall we first said 1/460 was a good prior probability that a 200-digit number is prime. But we could improve this by saying the prior probability is 0 for even numbers and 1/230 for odd numbers, since we can easily rule out half of the 200-digit numbers. Our prior probability has become more refined because it now incorporates more knowledge, namely the fact that big even numbers are not prime. But why stop there?

We could rule out multiples of 3 as well and put all the positive probability on numbers not divisible by 2 or 3. We could continue, and at each step our prior distribution would change. Some people are disturbed by the thought of the prior changing like this, but it’s perfectly consonant with the idea of a prior distribution expressing prior knowledge. The more work you do up front, the less uncertainty you have.
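
Here’s a sketch, in the same Python setting as before, of how the prior sharpens as you rule out divisibility by more and more small primes:

    import math

    density = 1 / (200 * math.log(10))    # prior: about 1/460

    survivors = 1.0
    for p in [2, 3, 5, 7, 11, 13]:
        survivors *= 1 - 1 / p            # fraction with no factor of p so far
        refined = density / survivors     # prior, given no small factor found
        print(p, round(1 / refined))      # prints 2 230, then 3 154, then 5 123, ...

Each exclusion concentrates the same primes in a smaller pool of candidates, so the conditional probability of primality rises.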

We could even continue this process, in theory, until we know the primality status of every 200-digit number and there’s no uncertainty left. We’d be done before we got around to conducting our probabilistic primality test. In practice we’d never finish if we tried to test every 200-digit number for primality. There would be nowhere to store the results even if we could compute them. There are fewer than 10^80 atoms in the universe, so if we stored one result per atom, we could only store a table of primes up to about 80 digits.

Setting aside statistical fine points, suppose we are now quite certain that our number is prime. Say we report that the probability of our number not being prime is less than 1 chance in 10^50. Is that really believable? For such a tiny probability, isn’t it more likely that there was a bug in our computer program that produced the probability? Or if we assume the software is perfect, we have to admit that computer hardware has errors. These errors are rare, but are they as rare as one chance in 10^50? Or for that matter, what is the probability that radiation may have caused a bit to flip somewhere in the hardware as we were computing?

Suppose we want to know the primality of a number because we’re going to use the number in an encryption algorithm. Then our ultimate concern is whether the encryption keeps our data secure. If the probability of the primality algorithm failing is orders of magnitude less than the probability of other security threats, we should focus our attention elsewhere.

Update: See How likely is it that a probable prime is prime?.

Casablanca and Einstein

Humphrey Bogart and Ingrid Bergman in Casablanca

If you’ve ever seen Casablanca, you’ve heard the song As Time Goes By, but only the chorus.

You must remember this
A kiss is just a kiss, a sigh is just a sigh.
The fundamental things apply
As time goes by.

Did you know the song includes references to relativity and four-dimensional geometry?

Here’s the first verse.

This day and age we’re living in
Gives cause for apprehension
With speed and new invention
And things like fourth dimension.

Yet we get a trifle weary
With Mr. Einstein’s theory.
So we must get down to earth at times
Relax relieve the tension.

And no matter what the progress
Or what may yet be proved
The simple facts of life are such
They cannot be removed.

Here are the full lyrics.

Via Math Mutation podcast #134.

Probability and Statistics cheat sheet

Matthias Vallentin posted a comment on my post about a math/CS cheat sheet to say that he’s been working on a probability and statistics cheat sheet. Looks great, though at 24 pages it stretches the definition of “cheat sheet” even more than the computer science cheat sheet did.

Anybody know of other cool cheat sheets?
