DIVIDED IT FAILS - PENTIUM ARITHMETIC BUG ANGERS USERS (December 2nd 1994)

It is not often that the president of an $8bn (and rising) company spends the weekend drafting a message to be posted to a newsgroup. But that is what Andy Grove, head of Intel Corp, was doing last weekend. Over the past couple of weeks the Usenet newsgroup comp.sys.intel has been dominated by an angry debate over a bug in the Pentium's floating point unit which causes errors in the occasional division sum. If you are using a Pentium machine today then it will have the bug - Intel is now saying that it is sampling fixed chips with its manufacturer-customers, but that machines with corrected chips are not likely to appear in the shops until early next year.

What so enraged the Internet-based users was not so much the bug itself; bugs *do* appear in processors and all processors go through a constant process of improvement. Rather, it was Intel's apparent attitude to the problem. The company acknowledged that it had known about the problem since the summer; however, the perception was that it didn't actually let on until Dr Thomas R. Nicely of Lynchburg College let the cat out of the bag.

Dr Nicely had been doing some heavy duty number crunching when he realised that the answer to one sum, 1/824633702441, was accurate to only eight significant figures, rather than to fifteen decimal places. He had noted the problem in June and, having excluded all other sources of error, reported it to Intel on October 16th. The matter became public on October 30, when a memo to his colleagues was re-posted on Compuserve. Other researchers quickly chipped in and it was discovered that the problem extended across a range of numbers. The clearest analysis of the problem so far is contained within a Frequently Asked Questions (FAQ) document put together by Mike Carlton of the University of Southern California Information Sciences Institute.
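For readers who want to see what "accurate to only eight significant figures" would look like on their own machine, the single division Dr Nicely reported can be checked against exact rational arithmetic. What follows is only a rough sketch, not Dr Nicely's method or Intel's test: the function name relative_error and the 1e-15 threshold are our own assumptions, and it presumes that the language's floating point division is carried out by the processor's floating point unit.

```python
from fractions import Fraction

# The divisor quoted in the article (Dr Nicely's failing case).
NICELY_DIVISOR = 824633702441


def relative_error(divisor: int) -> float:
    """Relative error of the machine's double-precision 1/divisor,
    measured against an exact rational reference.

    Hypothetical helper for illustration only.
    """
    # Fraction(float) converts the double the hardware returned exactly.
    computed = Fraction(1.0 / divisor)
    # Exact value of 1/divisor as a rational number.
    exact = Fraction(1, divisor)
    return float(abs(computed - exact) / exact)


err = relative_error(NICELY_DIVISOR)
print(f"relative error of 1/{NICELY_DIVISOR}: {err:.3e}")

# Assumed threshold: a correctly rounded double-precision division has a
# relative error of at most 2**-53 (about 1.1e-16). A result good to only
# eight significant figures would show an error nearer 1e-8.
print("looks correct" if err < 1e-15 else "suspicious result")
```

On a processor that divides correctly the reported error should be far below the threshold; a result that is good to only eight figures would miss it by many orders of magnitude.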
Currently no-one outside Intel is sure exactly how many division-pairs will cause errors; however, it is known that at least 1,738 unique cases result in accuracy less than single precision, and of these 87 cases produce answers accurate to only around four decimal places.

Intel's initial public response stoked the flames rather than calming them: the company set up a fax-back system to brief worried users. The message described the bug as a "subtle flaw" and estimated that the average "spreadsheet user" would encounter the problem only once in every 27,000 years. The idea that Intel wanted to get across was that the rest of the PC was bound to fall apart before your Pentium processor produced an incorrect answer. However, the users immediately interpreted this as meaning that around three spreadsheet users a day worldwide would be getting erroneous results from their spreadsheets, with even more frequent errors for people doing serious scientific work. Most importantly, anyone doing iterative functions, where a variable is repeatedly calculated, could see the inaccuracies snowball through their calculations.

But above all, the question raised by the newsgroup was "Why didn't you tell us as soon as you knew that there was a problem, rather than keeping us in the dark?" The second question is invariably "Will you replace my chip?", to which the answer seems to be "probably not". Unless you can show Intel that you are doing high powered mathematics that needs full double precision figures, Intel is unlikely to oblige. To date we have only two reported examples to draw on: one Pentium user, an undergraduate mathematics student, says that he had his request for a replacement chip turned down, despite the fact that he could be doing these complex calculations on his PC.
The other user, who uses his computer for medical analysis ("if you were going under the knife, would you want to know that the analysis may be wrong?"), says that he was put on the list for a replacement after 10 minutes of discussion with an Intel rep.

Intel now admits that it should have been more open about the bug from the start. It was, if you'll excuse the gallows humour, a miscalculation on its part. But, it says, its initial engineering analysis convinced it that the bug was very unlikely ever to affect users. So the problem was noted and forwarded through the usual channels to be fixed in the chip's mask. To give a feel for how often this happens: the 486 mask has been through around 30 revisions. The changes to the Pentium weren't rushed through; the idea was to trickle them into the channel. It is incorrect to say that Intel did nothing until Dr Nicely dropped his small bombshell-ette - corrective action was already underway, it says. As a matter of interest, Nicely is now consulting for Intel, and has signed a non-disclosure agreement.

The message from Grove apologised for the situation, and revealed just how problematical it was for the company: "We would like to find all users of the Pentium processor who are engaged in work involving heavy duty scientific/floating point calculations and resolve their problem in the most appropriate fashion including, if necessary, by replacing their chips with new ones. We don't know how to set precise rules on this, so we decided to do it thru individual discussions between each of you and a technically trained Intel person... I would like to ask for your patience here." By Wednesday the company had received at least 5,000 calls worldwide. The problem is compounded, of course, by the fact that Intel had been partially targeting Pentium machines as low-end workstation replacements.
While Intel and users debate how often the error is likely to occur, the question of how this will affect Intel's business in the short, medium and long term also remains to be resolved. That depends on how long the issue remains "news" and so remains in the public's mind. At the beginning of the week, most financial analysts were saying that the story was interesting, but suggested no one would remember it in a week's time. Indeed, an initial 2% slump in Intel's share price last Friday was followed by a swift recovery on Monday. Then in the middle of the week analysts at Prudential Securities said they believed that the technical difficulties with Pentium's FDIV instruction were more deep-seated than previously thought, and a rumour spread on Wall St that all the faulty Pentiums would be recalled. Intel denied both suggestions and its share price stabilised again.

However, one of the most interesting aspects of the story is the Internet's role in all this - the story fermented in the Internet newsgroups for some time before bubbling over into the mainstream media. EE Times gets the credit for first picking up the story on November 7th, though it buried it somewhat. Since then, however, CNN and the Washington Post/Wall St Journal double-act have done their pieces, and the problem has appeared in The Economist, which pointed out that some banks track interest rates with a degree of precision that takes them into the danger zone. Even Channel 4 News in the UK took a bite at the cherry; not its usual fodder at all. Meanwhile IBM has announced that it will be replacing faulty processors for its customers. Intel's latest admission, that machines with the fixed chips will not appear until next year, is also guaranteed to keep the story bubbling, and no doubt the trade mags will keep an eye on the situation, looking for the first bug-free machine to ship.
And of course, things will carry on bubbling on the Internet: already users are talking about pursuing Intel or its suppliers through the courts on the grounds of selling faulty goods; there's nothing like a bit of litigation to keep people interested. There is even the possibility that one of the leaner, hungrier x86 processor-clone makers could be tempted into running an advertising campaign along the "99% Pentium-compatible, trust us, you don't want the other 5%" lines. Doing so would be risky, leaving the advertiser a hostage to fortune; still, the US advertising market is a rough and tumble place and no doubt someone will take a dig at the Intel Inside campaign, or 'Insel Intide' as The Economist dubbed it.

But perhaps the worst news for Intel is that the jokes have already started. Every human or marketing disaster is swiftly followed by black jokes; for a long time in the UK the car maker Skoda was the butt of jokes about its build quality - "Q. How do you double the value of a Skoda? A. Fill its tank with gasoline." It took a long time for the company to shift that image, despite the fact that Volkswagen took over the company and improved quality beyond recognition. Even today, Skoda drivers in the UK walk around with a sheepish air. The fact that it took less than a week for jokes such as "Q. How many Pentium engineers does it take to change a lightbulb? A. Errr, we're not quite sure, but don't worry, bulbs don't blow very often." to begin flying across the Internet suggests that Intel's damage control has completely failed. The problem is that people no longer really care that the bug is almost certain not to affect them; Pentium's inability to count has already become an urban myth and the jokes will continue to fly, irrespective of calming messages from Andy Grove on the Internet.

(C) PowerPC News - Free by mailing: add@power.globalnews.com