Twin prime conjecture and the Pentium division bug

Twin primes are pairs of primes that differ by 2. For example, 3 and 5 are twin primes, as are 17 and 19. Importantly, so are 824633702441 and 824633702443. More on that in a minute.

No one knows whether there is a largest pair of twin primes. The twin prime conjecture says that there are infinitely many pairs of twin primes, but the conjecture has not been proven.

Now suppose we take the reciprocals of the twin primes and add them up.

\left(\frac{1}{3} + \frac{1}{5}\right) + \left(\frac{1}{5} + \frac{1}{7}\right) + \left(\frac{1}{11} + \frac{1}{13}\right) + \cdots

If there were only finitely many twin primes, the series would have finitely many terms and hence a finite sum. So if we could show that the sum diverges, we’d have a proof of the twin prime conjecture. But a series can converge even though it has infinitely many terms, so a finite sum by itself would settle nothing. Viggo Brun showed that the sum does converge. Its value, known as Brun’s constant, is a little more than 1.9.
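
To get a feel for how slowly this series creeps upward, here is a minimal Python sketch that computes partial sums of the series above. It is not Nicely's code and not how large-scale computations of Brun's constant are done; it simply sieves the primes up to a limit and adds the reciprocals of each twin pair. The names `sieve` and `brun_partial_sum` are hypothetical, chosen for illustration.

```python
import math


def sieve(limit: int) -> list[bool]:
    """Sieve of Eratosthenes: is_prime[k] is True exactly when k is prime."""
    is_prime = [True] * (limit + 1)
    for k in range(min(2, limit + 1)):
        is_prime[k] = False  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting at p*p.
            is_prime[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return is_prime


def brun_partial_sum(limit: int) -> float:
    """Sum of 1/p + 1/(p + 2) over twin-prime pairs (p, p + 2) with p + 2 <= limit."""
    is_prime = sieve(limit)
    terms = []
    for p in range(3, limit - 1):
        if is_prime[p] and is_prime[p + 2]:
            terms += [1 / p, 1 / (p + 2)]
    # math.fsum compensates for rounding in the running total, so adding many
    # tiny terms to a sum near 2 does not quietly lose digits.
    return math.fsum(terms)


for n in (10**2, 10**4, 10**6):
    print(n, brun_partial_sum(n))
```

With a limit of 100 the sum covers the eight pairs from (3, 5) through (71, 73) and comes to about 1.331. Raising the limit by orders of magnitude adds surprisingly little, which is why pinning down Brun's constant takes enormous computations.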

In 1994, Thomas Nicely was studying Brun’s constant when he found that his computer calculated 1/824633702441 incorrectly: every digit beyond the eighth significant figure was wrong. Nicely had discovered the infamous Pentium division bug.
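
As a quick check on the numbers involved, here is a small Python sketch that confirms by trial division that 824633702441 and 824633702443 are both prime, and measures the exact relative error of the double-precision value of 1/824633702441 on correctly working IEEE 754 hardware. The helper names are hypothetical. Because the divisor is below 2^53 it converts to a double exactly, and IEEE division is correctly rounded, so the relative error is at most 2^-53 (about 1.1e-16): roughly 16 correct significant digits rather than eight.

```python
import math
from fractions import Fraction


def is_prime(n: int) -> bool:
    """Trial division by 2 and by odd numbers up to sqrt(n); fast enough for 12-digit n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def reciprocal_relative_error(n: int) -> Fraction:
    """Exact relative error of the double computed as 1.0 / float(n)."""
    assert 0 < n < 2**53  # so float(n) represents n exactly
    computed = Fraction(1.0 / float(n))  # exact rational value of the double
    exact = Fraction(1, n)
    return abs(computed - exact) / exact


p = 824633702441
print(is_prime(p), is_prime(p + 2))
print(f"{1.0 / p:.17g}")
print(float(reciprocal_relative_error(p)))
```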

Intel responded by saying the division errors were inconsequential. Intel was absolutely correct, but the public couldn’t understand that. They only knew that the chips were “wrong.”

Intel estimated that the error would occur once in every 9 billion divisions. (I doubt any large program has ever been written that is as bug-free as the buggy Pentium chips.) And when an error did occur, the result was not entirely wrong, only less accurate than usual. The public only understood that sometimes the answers were “wrong.” Most people do not understand that floating-point arithmetic is nearly always “wrong” in the sense of being less than perfectly accurate.
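
To make the “nearly always wrong” point concrete, here is a small Python sketch using the standard `fractions` module, which converts a double to the exact rational number it stores. Even a flawless chip cannot store 1/3 or 0.1 exactly, so results are off in the last bit. The helper name `representation_error` is hypothetical.

```python
from fractions import Fraction


def representation_error(computed: float, exact: Fraction) -> Fraction:
    """Exact difference between a double and the real number it was meant to hold."""
    # Fraction(computed) is the exact rational value of the binary double.
    return Fraction(computed) - exact


third_error = representation_error(1.0 / 3.0, Fraction(1, 3))
tenth_error = representation_error(0.1, Fraction(1, 10))

print(third_error)             # -1/54043195528445952, i.e. -1/(3 * 2**54)
print(float(third_error * 3))  # relative error of 1/3: -2**-54, about -5.55e-17
print(tenth_error)             # positive: 0.1 is stored slightly too large
print(0.1 + 0.2 == 0.3)        # False
```

Errors like these, one part in about 10^16, are what correct hardware produces all the time; the Pentium bug was a rarer and larger version of the same kind of inaccuracy.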

At first Intel said it would only replace the chips for people who could show they were affected by the bug, i.e. almost nobody. Eventually Intel gave in to pressure and replaced the chips. The episode cost Intel half a billion dollars.


5 thoughts on “Twin prime conjecture and the Pentium division bug”

  1. The real question you need to ask, though, is “Was the Pentium up to the usual standards of its time?” If the answer to that is “no,” then Intel did screw up and did need to get rapped a bit for it. Otherwise, due to the Iron Law of Shareholder Value, the standards would have slipped to the point where worse bugs would be par for the course.

    Companies will do precisely what you let them get away with. This is a big reason Microsoft has such troubles with companies that write device drivers for Windows: They’ve let the bastards get away with buggy crap for so long, buggy crap is now pretty much the standard.

  2. That’s interesting. I had a very similar experience once. I found a bug in the floating-point unit of a DEC PDP-11 minicomputer.

    This was in the mid-1970s. I was a physics graduate student and (at that time) doing computer molecular-dynamics simulations of liquids on a DEC PDP-11 model 40. I was doing a simple example to get my feet wet (excuse the pun): liquid argon. So I was simulating a box of argon atoms and, while it behaved as expected in most respects, I noticed that the temperature was increasing, which should have been impossible the way the experiment was being conducted. The temperature is just proportional to the average (over all of the argon atoms) kinetic energy, which (if you remember from your physics classes) is equal to half the mass times the speed squared. In investigating this I also discovered that the system was accelerating in the negative (x, y, z) direction in the coordinate system that we were using.

    I spent days trying to understand what was causing this and finally in desperation I resorted to printing out the full binary representations of the operands and results of all floating-point math operations being performed. What I discovered was that floating-point subtraction (or adding a negative number) was the culprit. It was making a very infrequent error in the least-significant bit of the mantissa. Also, this error could go either way – i.e. when an error occurred, the result could be either greater than or less than it should have been, but there was a bias. The less-than error occurred a little more frequently; thus the acceleration in the negative direction.

    When I contacted DEC about this problem, I had a terrible time convincing them that it was really an error and, when I finally did convince them, they claimed that it still “met spec.” At that point, I gave up and switched to doing the calculations using a software floating-point package that only used the hardware to do fixed-point arithmetic. I knew that we were getting a new model (a 70) soon and I hoped that it wouldn’t have the problem, which thankfully turned out to be the case.

  3. Along the lines of Chris Barts’ market viewpoint, one commentator I read pointed out that Intel was making a broad push so that more people would know about its products. The “Intel Inside” marketing campaign started three years before the FDIV bug.

    Part of the market reaction against Intel was consequently likely because the increased marketing effort had changed the market enough that the previous standards of practice no longer applied. In that case, the question “Was the Pentium up to the usual standards of its time?” is irrelevant, since Intel was unknowingly in a new market with different standards from its old one.

  4. I worked on the FDIV software workaround. The error was in fact a lot more significant than you state here, since (a) the maximum error was 1/256, i.e. only ~2 correct digits, and (b) the bug was much more likely to be hit when you worked with numbers just less than an integer, as you typically get in numerical modelling or when doing something like 1.0/3.0 and then multiplying back by 3.0 to end up with 0.999999...
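
Two of the comments above describe arithmetic faults concretely enough to sketch in code. The Python below is illustrative only and makes several assumptions. `bits` prints the full binary layout of a double, the kind of dump the PDP-11 commenter resorted to (note it shows the IEEE 754 layout, not the PDP-11's own floating-point format). `fdiv_residual` is the widely circulated FDIV check using 4195835 / 3145727. `biased_sub` and `drift` are hypothetical models of a subtraction that is occasionally off by one unit in the last place, with a slight downward bias; the error rate and bias are made-up numbers, not measurements of either machine. Requires Python 3.9+ for `math.nextafter`.

```python
import math
import random
import struct


def bits(x: float) -> str:
    """IEEE 754 double layout of x: sign bit, 11 exponent bits, 52 mantissa bits."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    b = f"{n:064b}"
    return f"{b[0]} {b[1:12]} {b[12:]}"


def fdiv_residual(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """x - (x / y) * y: zero or within rounding error on a correct FPU;
    widely reported as 256 on a flawed Pentium."""
    return x - (x / y) * y


def biased_sub(a: float, b: float, rng: random.Random,
               p_error: float, p_down: float) -> float:
    """Hypothetical faulty subtraction: with probability p_error the result is
    moved by one ulp, downward with probability p_down, otherwise upward."""
    r = a - b
    if rng.random() < p_error:
        direction = -math.inf if rng.random() < p_down else math.inf
        r = math.nextafter(r, direction)
    return r


def drift(steps: int, p_error: float, p_down: float, seed: int = 0) -> float:
    """Repeatedly compute v - 0.0, whose exact answer is always v,
    and return how far v has wandered from its starting value 1.0."""
    rng = random.Random(seed)
    v = 1.0
    for _ in range(steps):
        v = biased_sub(v, 0.0, rng, p_error, p_down)
    return v - 1.0


print(bits(1.0))
print(fdiv_residual())
print(drift(100_000, 0.01, 0.6))
```

On a correctly working FPU, `fdiv_residual()` is zero or within rounding error; the flawed Pentiums were widely reported to return 256 because they computed the quotient as about 1.333739 instead of 1.333820. In the drift model no single error exceeds one ulp, yet with a 60/40 bias the value ends up below where it started, much like the argon atoms drifting in the negative direction.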

Comments are closed.