The Intel Pentium FDIV bug

20 years of the FDIV bug: The background

With his email of October 30, 1994, in which he disclosed the FDIV bug in the Pentium processor, Professor Dr. Thomas R. Nicely triggered a storm, above all a storm of indignation. Nicely himself was outraged as well, because he had earlier reported the error, which had shown up during number-theoretic calculations, to Intel support. Support professed great surprise, even though Intel had been aware of the error since June at the latest. Beyond that there was no reaction.

Nicely therefore sent his email on October 30th not only to Intel but also to well-known authors and journalists such as Andrew Schulman, and posted it on CompuServe's Canopus forum. The error was confirmed from all sides, and the avalanche took its course.

In Nicely's example, 1 / 824633702441.0, the relative error was only about 10^-9, but there were worse number pairs with division errors of more than 10^-5. That was billions of times larger than what the IEEE 754 standard allows for double-precision floating-point operations.
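To get a feeling for "billions of times", the following sketch compares a faulty quotient with the rounding error a correct double-precision division may have. The number pair 4195835 / 3145727 and the faulty result are not from this article; they are the widely cited test case for affected Pentiums and are used here as an assumption for illustration only.

```python
# Widely cited FDIV test pair (ASSUMPTION: not taken from the article above).
x, y = 4195835.0, 3145727.0
correct = x / y                 # computed by the machine running this script
flawed = 1.333739068902037      # value commonly reported for affected Pentiums

rel_error = abs(correct - flawed) / correct
unit_roundoff = 2.0 ** -53      # max. relative rounding error of one IEEE 754 double operation

print(f"relative error of flawed result: {rel_error:.3e}")
print(f"allowed rounding error (2^-53) : {unit_roundoff:.3e}")
print(f"factor between the two         : {rel_error / unit_roundoff:.3e}")
```

The last line prints how many times the faulty result exceeds the permitted rounding error.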

Intel then had to admit the error and explained that when a table for division by the SRT algorithm was transferred, five of a total of 1066 values had been forgotten. The error affected not only FDIV but also many other FPU instructions that use FDIV internally, such as FPTAN, FPATAN, FPREM ...

New, corrected steppings were already in the works. Intel had wanted to sweep the little slip under the carpet as quietly as possible, but that was no longer possible.

The story was reported around the world. c't first covered the embarrassing error in its "Processor Whispers" column in issue 1/1995 under the title "The bug of Intel's flagship". After numerous public protests, Intel boss Andrew Grove apologized, promised improvement and, from then on, timely notification of errors. He also announced a free, lifetime exchange guarantee.

SRT algorithm

The divider's underlying radix-4 implementation of the SRT algorithm is based on a master's thesis by David E. Atkins III at the University of Illinois, which can be read on openLibrary.org. It delivers two bits of the quotient per calculation step. In each step, the processor consults the so-called PD lookup table, using the upper mantissa bits of the remainder (6 bits) and of the divisor (5 bits) to determine the next quotient digit.

An error can occur only if the divisor shows one of the five affected bit patterns, and only if, in the course of the subsequent calculation steps, the remainder also hits one of the incorrect table entries.
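The digit recurrence described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Intel's implementation: the hypothetical function `select_digit` stands in for the PD lookup table and picks the quotient digit from the exact remainder and divisor, whereas the hardware only looks at a few truncated leading bits. The names, the pre-scaling by 1/4 and the use of exact fractions are all choices made for this illustration.

```python
from fractions import Fraction

def select_digit(shifted_rem, divisor):
    """Stand-in for the PD table lookup (ASSUMPTION: exact instead of truncated bits).

    Returns a quotient digit from the redundant digit set {-2, -1, 0, 1, 2}.
    """
    return max(-2, min(2, round(shifted_rem / divisor)))

def srt_radix4_divide(dividend, divisor, steps=30):
    """Approximate dividend / divisor for mantissas in [1, 2), two bits per step."""
    dividend, divisor = Fraction(dividend), Fraction(divisor)
    if not (1 <= dividend < 2 and 1 <= divisor < 2):
        raise ValueError("both operands must be mantissas in [1, 2)")

    rem = dividend / 4            # pre-scale so that |rem| <= 2/3 * divisor holds
    quotient = Fraction(0)
    weight = Fraction(1)
    bound = Fraction(2, 3) * divisor

    for _ in range(steps):
        weight /= 4                       # radix 4: each digit is worth two bits
        shifted = 4 * rem
        digit = select_digit(shifted, divisor)
        rem = shifted - digit * divisor   # new partial remainder
        quotient += digit * weight
        # A wrong digit (e.g. 0 instead of 2) would violate this bound,
        # and later steps could no longer repair the quotient.
        assert abs(rem) <= bound

    return 4 * quotient                   # undo the pre-scaling

print(float(srt_radix4_divide(Fraction(3, 2), Fraction(5, 4))))
```

The assertion inside the loop marks the place where a missing table entry does its damage: once the remainder leaves the permitted range, the redundant digit set can no longer compensate.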

Clever programmers such as Tim Coe of Vitesse Semiconductor were able to largely reproduce the problem in an emulator program, based on 20 erroneous number pairs sent to him, before Intel had announced the details. To limit the damage, Intel published the white paper "Statistical Analysis of Floating Point Flaw of the Pentium Processor" on November 30th, which contained, in addition to the cause of the error, a statistical assessment of its effects.

Floating-point calculations, Intel explained, matter to the average user mainly in spreadsheets. Results there are usually rounded generously anyway, so a (rare) relative error of 10^-5 has no further effect. Intel estimated that the average user would rack up only 15 minutes of computing time per day. That would amount to only 1500 FDIV operations, so a relevant error should occur only once every 27,000 years, far less often than other sources of error in the PC. DRAM without ECC, as still used today, stands out here: Intel estimated its error rate at one error every seven years.

IBM above all protested vigorously, calculated a somewhat higher error rate of one error per 6 hours of computing, and then stopped selling Pentium systems. This has to be seen against the background of the PowerPC processors that were just being launched at the time. IBM found that the floating-point numbers in spreadsheets are not evenly distributed, as Intel's estimate assumed. Rather, the risk candidates with affected bit patterns occur far more often: numbers such as 1.9999... instead of 2, with many ones in the binary mantissa.

Converted to Intel's average user with just 15 minutes of computing time per day, that would yield one relevant error every 24 days. c't made its own estimate which, at one error per 60 days, ultimately came much closer to IBM's value than to Intel's 27,000 years. Later, mathematicians took up the error and produced more precise estimates of its frequency and magnitude, for example at Stanford University, coincidentally at exactly the same time as the analysis by "Mr. MathWorks" Cleve Moler.
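The conversion between the three estimates is simple arithmetic and can be checked with a short script. It uses only the figures quoted above. The year length of 365.25 days is an assumption made for the comparison.

```python
HOURS_OF_USE_PER_DAY = 0.25          # Intel's assumed 15 minutes of computing per day
DAYS_PER_YEAR = 365.25               # ASSUMPTION: average year length

intel_days = 27_000 * DAYS_PER_YEAR  # Intel: one relevant error every 27,000 years
ibm_days = 6 / HOURS_OF_USE_PER_DAY  # IBM: one error per 6 computing hours
ct_days = 60                         # c't: one error per 60 days

for name, days in [("Intel", intel_days), ("IBM", ibm_days), ("c't", ct_days)]:
    print(f"{name:5s}: one error every {days:>12,.0f} days, "
          f"{intel_days / days:,.0f} times as often as in Intel's estimate")
```

Six computing hours at a quarter of an hour per day gives the 24 days mentioned in the text.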

Mitigation

The error was relatively easy to work around in software, at the expense of performance. It was sufficient to multiply numerator and denominator by a suitable value before dividing, so that the error patterns no longer occurred. However, some of the multipliers recommended at the time were completely unsuitable because they did not always avoid the faulty bit patterns. c't identified 1.0001 as the best and smallest value at the time.
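A minimal sketch of this workaround, under several assumptions: the helper names are hypothetical, the five divisor bit patterns and the run of six following one-bits come from widely circulated analyses of the bug rather than from this article, and the factor 1.0001 is the value named above. The sketch only shows the principle and makes no claim that this factor avoids every risky pattern.

```python
import math

# ASSUMPTION: the four mantissa bits after the leading 1 that are commonly
# reported as risky (1.0001, 1.0100, 1.0111, 1.1010, 1.1101 in binary).
RISKY_NIBBLES = {0b0001, 0b0100, 0b0111, 0b1010, 0b1101}

def looks_risky(divisor, ones_run=6):
    """Heuristic: risky nibble followed by `ones_run` one-bits (ASSUMPTION: 6)."""
    if divisor == 0.0 or math.isinf(divisor) or math.isnan(divisor):
        return False
    mantissa, _ = math.frexp(abs(divisor))   # mantissa in [0.5, 1)
    frac = 2.0 * mantissa - 1.0              # fractional part of 1.xxxx..., exact
    bits = int(frac * 2 ** (4 + ones_run))   # leading 4 + ones_run fraction bits
    nibble = bits >> ones_run
    tail = bits & ((1 << ones_run) - 1)
    return nibble in RISKY_NIBBLES and tail == (1 << ones_run) - 1

def scaled_divide(x, y, factor=1.0001):
    """Divide after scaling both operands; the quotient stays the same
    up to rounding, but the divisor gets a different bit pattern."""
    return (x * factor) / (y * factor)

for y in (3145727.0, 824633702441.0, 3.0):
    print(y, looks_risky(y), looks_risky(y * 1.0001))
```

The two extra multiplications are the performance cost mentioned above. A real workaround would typically apply the scaling only when the divisor looks risky.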

But there were also dubious programs that boastfully called themselves "Pentium optimizers". All they did was switch off the coprocessor entirely. Others hooked into emulation interrupt 7, but that also cost a lot of time. c't offered an efficient solution for the DOS compilers of the day, which usually called an emulator the first time an FPU instruction was executed and then replaced the call with the real coprocessor instruction. A resident handler could intervene here and specifically filter out FDIV instructions. Later compilers offered special FDIV handling as an option. Most of them dropped that support only in the last few years.

Exchange

At the time, Intel launched the FDIV Replacement Program with a dedicated web page. However, online support was discontinued several years ago.

Over the years, c't has put the lifetime exchange guarantee to the test twice, both times under cover, of course, and not under the editorial address. The first attempt took place in 2003. You had to call a special support number and somehow prove, for example with a screenshot, that the processor was affected. Then you were sent a replacement processor (unfortunately not an OverDrive) by DHL and had to return the old one.

The second attempt in 2008 was a lot more complicated. Support did not believe us at first, but we persisted. UPS then wanted to collect 100 euros for the "free" exchange. We refused, protested, and a few weeks later did in fact get a free delivery. The third attempt is still pending, but actually we don't want to part with our Pentium with the FDIV bug ...