September 1st, 2021 ~ by admin

NEC’s Forgotten FPUs

NEC uPD70108C – V20 CPU – Late 1984

NEC had a cross-license agreement with Intel dating back to April of 1976 that allowed each company to make and sell products based on the other’s patents. This was particularly important in the 1970’s, when having a viable ‘second source’ for a design was considered critical to its success in the market. This was especially true for Intel, which wanted to get into the Japanese market. In 1979 NEC began to produce and sell the 8086 and 8088 processors. NEC wasn’t going to succeed just by being a second source to Intel, though; designing its own processors was of great importance. While producing the 8086/8088, NEC also began working on its own version, an enhanced 8086/8088 processor.

NEC V30 Die (courtesy Birdman) – 8086 with many enhancements

The result was the rather well-known V20/V30 processors of 1984. These were not just clones of the Intel MCS-86 (though determining this took several court cases and resulted in the Chip Act of 1984). The V30 had some pretty big differences. Notably, it had dual internal 16-bit busses, which allowed data to be moved much more efficiently, as data could be moved into and out of a register at (nearly) the same time. It also increased the microinstruction word from 21 bits to 29 bits, and added a hardware effective address generator, additional instruction pointers, and a hardware shift/loop counter. To take advantage of these features NEC added new instructions as well: 156 compared to the 8086’s base 133. The V30/V20 were the beginning of a line of V-series processors. NEC went on to make ‘186/188-style processors (the V40/V50) as well as a series of microcontroller versions (V25/V35 and others). The V20/V30 were to be supported by a math coprocessor like the 8087, called the uPD72091. Very little info is available on the 72091, as it was cancelled very early in its design; by 1984-1985 it was already out of date. Its replacement was to be a bit more powerful.

Design of the uPD72191 likely started around the time the V30 was released, around 1984-85, with specifications released in 1986 and plans for chips by 1987. This chip was in an advanced state of planning, such that many products, including motherboards (such as the Ampro Little Board PC) and industrial controllers, were designed with sockets for it. Preliminary datasheets exist, but alas, no actual chips seem to have been found.

LittleBoard PC (Ampro) with support for canceled upD72191 (V40 based)

The uPD72191 was made in CMOS and is a bit like an enhanced 80C187, but with support for the V20/V30. It is fully IEEE-754 compatible (the 8087 wasn’t, as the standard wasn’t finished yet) and supports an instruction set similar to that of the 80C187 (and thus the 80387). Unlike the 8087, it supports the full set of exponential, trig, logarithmic, and hyperbolic instructions. The 8087 was somewhat limited in this, as it was already pushing the limits of what was possible on a single chip at the time of its release. The 72191 supports FSIN/FCOS, which the 8087 doesn’t, and many other functions (its full instruction set could not be found). The 72191 has a mode pin that selects between the V20/V30 and V40/V50 interfaces (as these talked to coprocessors differently), so it was compatible with 4 distinct processors. The 80C187 could only be used with the 80186, and the 8087 could only be used with the 8086/8088.

upD72191 FPU Block Diagram – 1986ish

Looking at the block diagram of the ‘191 we notice something else: it’s a dual bus design, much like the V30 processor. Internally there are a pair of 74-bit busses for the mantissa (fraction) side and a pair of 16-bit busses for the exponent side. This is a striking difference from the 8087 and the ‘187. The 8087 has a single 16-bit bus for the exponent, and a 64-bit bus (68 bits into the shifter and ALU) for the mantissa. Those 4 additional bits are 3 extra bits for enhanced accuracy plus an extra leading bit that is always 1 for floating point math, leaving 64 bits of ‘data’.
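The text doesn’t say exactly how the 8087 uses its 3 extra low-order bits. One common use of such bits is guard/round/sticky rounding, so the sketch below assumes that scheme and IEEE-754’s default round-to-nearest-even mode to show why a few bits beyond the kept ones matter when a result is rounded back down. `round_nearest_even` is a hypothetical helper, not 8087 microcode.

```python
def round_nearest_even(sig: int, extra: int) -> int:
    """Round an integer significand that carries `extra` low-order bits.

    The high bits of `sig` are the bits to keep; the low `extra` bits are the
    additional precision, the last of which is assumed to be a sticky bit
    (the OR of everything shifted out earlier). Returns the kept bits rounded
    to nearest, ties to even. A carry out of the top bit (e.g. 0b111 rounding
    up to 0b1000) would need renormalising, which is not handled here.
    """
    if extra <= 0:
        return sig
    kept = sig >> extra
    rest = sig & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1
    return kept


# 4 kept bits followed by 3 extra bits, written as kept_extra
print(round_nearest_even(0b1010_011, 3))  # 10: below half, stays
print(round_nearest_even(0b1010_101, 3))  # 11: above half, rounds up
print(round_nearest_even(0b1010_100, 3))  # 10: exact tie, kept bits already even
print(round_nearest_even(0b1011_100, 3))  # 12: exact tie, odd kept bits round up
```

Simply truncating to the kept bits would give 10, 10, 10 and 11, so two of the four results would be off by one unit in the last place.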

The dual bus design makes sense, as NEC did the same for the V-series. Coupled with the right microcode, it can greatly enhance the speed of the FPU. So why, then, is the bus expanded to 74 bits for the mantissa? In the 80187 and 80387 this bus is still only 68 bits. We look to the design of NEC’s follow-on FPU for the answer. The uPD72291 (and its 32-bit bus 72691 version) are rather different beasts, made for the V33/V53 x86 CPUs and the V60/V70/V80 non-x86 CPUs. We’ll talk about them in more detail later, but they share the same 74-bit mantissa as the 72191, and in this case the designers wrote a paper on its design.

The FPP [72691] is the only floating point processor that provides the power function $x^y$. This function (called FPOWER in the instruction set) is difficult to implement not only for its complex definition but also for sufficient accuracy. The equation $x^y = e^{y \log_e x}$ does not give good accuracy because the accuracy error of the log function is augmented by the exponential function. The FPP solves this problem by providing a 74-bit data width for the mantissa data bus.
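The accuracy problem described in that quote is easy to reproduce with ordinary double precision. An absolute error in the argument $y \log_e x$ becomes a relative error of about the same size in $e^{y \log_e x}$, and for $2^{1000}$ the argument is about 693, where a single rounding of the product alone can be off by up to about $6 \times 10^{-14}$, hundreds of times double’s relative precision. The sketch below uses Python’s `math` and `decimal` modules as stand-ins (nothing here models the 72691’s actual microcode): it computes $x^y$ via exp/log with 53-bit doubles, then again with about 73 bits of intermediate precision, close in spirit to the 74-bit mantissa bus.

```python
import math
from decimal import Decimal, localcontext


def pow_via_exp_log(x: float, y: float) -> float:
    """Naive x**y = exp(y * ln x), every step rounded to a 53-bit double."""
    return math.exp(y * math.log(x))


def pow_wide(x: float, y: float, digits: int) -> float:
    """Same formula, but ln, multiply and exp are carried out with `digits`
    significant decimal digits, followed by one final rounding to double."""
    with localcontext() as ctx:
        ctx.prec = digits
        result = (Decimal(x).ln() * Decimal(y)).exp()
    return float(result)


def relative_error(approx: float, x: float, y: float) -> float:
    """Relative error of `approx` against a 60-digit reference for x**y."""
    with localcontext() as ctx:
        ctx.prec = 60
        ref = (Decimal(x).ln() * Decimal(y)).exp()
        return float(abs((Decimal(approx) - ref) / ref))


naive = pow_via_exp_log(2.0, 1000.0)
wide = pow_wide(2.0, 1000.0, 22)  # 22 decimal digits is roughly 73 bits

print(naive == 2.0 ** 1000)                # False
print(relative_error(naive, 2.0, 1000.0))  # far above double's ~1.1e-16 unit
print(wide == 2.0 ** 1000)                 # True
```

For small exponents such as $2^3$ the naive route is still fine; the error grows with the size of $y \log_e x$, which is where the wider datapath pays off.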

Since the 72191 was canceled, the ‘291/691 would in fact have been the only FPUs to support this in hardware, but it seems it was first implemented on the ‘191. The solution only works well for larger values of y (greater than 32); otherwise iterative multiplication is used. Where it can be used, though, it greatly speeds up the calculation.
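A minimal sketch of that kind of dispatch follows: iterative multiplication for small y, the exp/log route otherwise. Square-and-multiply is one common form of iterative multiplication; whether NEC’s microcode used it, and whether the cut-off applied only to integer y, are assumptions. `fpower_sketch` and `SMALL_Y_LIMIT` are hypothetical names.

```python
import math

SMALL_Y_LIMIT = 32  # threshold from the text; the exact dispatch rule is assumed


def pow_by_squaring(x: float, n: int) -> float:
    """x**n for a non-negative integer n by repeated squaring and multiplying."""
    result = 1.0
    base = x
    while n > 0:
        if n & 1:        # this bit of the exponent is set: fold the base in
            result *= base
        base *= base     # base becomes x**2, x**4, x**8, ...
        n >>= 1
    return result


def fpower_sketch(x: float, y: float) -> float:
    """Hypothetical FPOWER-style dispatch: multiply for small integer y,
    otherwise fall back to exp(y * ln x), which requires x > 0."""
    y = float(y)
    if y.is_integer() and 0 <= y <= SMALL_Y_LIMIT:
        return pow_by_squaring(x, int(y))
    return math.exp(y * math.log(x))


print(fpower_sketch(3.0, 5.0))  # 243.0, via three multiplications
print(fpower_sketch(2.0, 0.5))  # non-integer y takes the exp/log path
```

Multiplication keeps small integer powers exact where the operands allow it, while the exp/log path handles everything else at the cost of the accuracy issue described above.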

When the 72191 was canceled, NEC thoughtfully provided a single-chip solution, the uPD9335C, that allowed an 8087 to be interfaced to the V40/V50 processors, which, like the 186, used a HOLD/HOLDACK bus release protocol instead of the REQUEST/GRANT protocol of the 8086/8088 (and V20/V30). For applications using a V20/V30, an 8087 could be used directly.

NEC upD70632R-20 20MHz V70 Processor

In the late 1980’s NEC released the next members of the V-series, the V60, V70 and later the V80 processors. These were a departure from the previous ones in that they were no longer based on the x86 architecture, but on a completely new ISA (though the V60 and V70 had a V20/V30 emulation mode). These were full 32-bit designs, and were Japan’s first widely available 32-bit processors. Of course, with a new processor comes the need for a new FPU, and NEC had not one but two FPU options for these. The uPD72291 and uPD72691 are based on the same design, but with some major feature differences. The 72291 is designed to work with processors that have a 16-bit data bus, such as the V60. It could also be used with the older V33/V53 x86 designs. Internally it has eight floating point registers and supports all the typical floating point functions as well as vector math functions. The uPD72691 is designed for 32-bit data paths, but adds a bit more…

NEC uPD72291R-16 FPU

In addition to expanding the register set to 32 FP registers, the ‘691 also added a complete suite of matrix math functions. The ‘691 was made on a 1.2u CMOS process and contained 433,000 transistors (nearly 50,000 MORE than the V60 processor). Running at 20MHz it was capable of around 6.7 MFLOPS and supported 24 vector/matrix instructions as well as 22 mathematical functions. Like the 72191 it had a 74-bit mantissa datapath, but expanded the exponent path to 17 bits to support double extended precision number formats. It is a highly microcoded design using a 3072-word (43-bit word) microcode ROM: 20% for vector/matrix, 37% for arithmetic, and the rest for exception handling and other housekeeping. Interestingly, these micro-ops themselves encode additional instructions that NEC calls nano-ops, which control just the ALU operations of the instruction (the rest being bus control and sequencing). These nano-ops were stored in a 256-word x 74-bit nano ROM (only 120 words were used, the rest being left for potential expansion). This was the last of NEC’s line of dedicated FPUs (excluding the few MIPS FPUs they made). It’s a bit ironic that it seems they canceled as many designs as they made.
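The control-store figures above are easier to compare as raw bit counts. The arithmetic below uses only the quoted numbers, except for the last step: a hypothetical comparison that assumes each micro-op holds an 8-bit index into the 256-entry nano ROM (the real field layout isn’t given). It illustrates why a two-level micro/nano scheme saves ROM when only about 120 distinct ALU control patterns are needed.

```python
# Figures quoted for the uPD72691's control store
MICRO_WORDS, MICRO_WIDTH = 3072, 43   # microcode ROM: words x bits
NANO_WORDS, NANO_WIDTH = 256, 74      # nano ROM: words x bits
NANO_USED = 120                       # nano words actually used

micro_bits = MICRO_WORDS * MICRO_WIDTH
nano_bits = NANO_WORDS * NANO_WIDTH
nano_utilisation = NANO_USED / NANO_WORDS

# Split of microcode words by function (percentages from the text)
share = {"vector/matrix": 0.20, "arithmetic": 0.37}
share["exceptions/housekeeping"] = 1.0 - sum(share.values())
words = {name: round(MICRO_WORDS * frac) for name, frac in share.items()}

# Hypothetical: if every micro word carried the full 74-bit ALU control field
# instead of an index into the nano ROM (field layout assumed, not documented)
INDEX_BITS = (NANO_WORDS - 1).bit_length()   # 8 bits address 256 entries
flat_bits = MICRO_WORDS * (MICRO_WIDTH - INDEX_BITS + NANO_WIDTH)

print(micro_bits, nano_bits, micro_bits + nano_bits)  # 132096 18944 151040
print(words)
print(nano_utilisation, flat_bits)
```

Under that assumption, storing the 74-bit ALU field in every micro word would need roughly 335K bits instead of about 151K, more than doubling the ROM.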

…but perhaps they didn’t?



Posted in:
CPU of the Day

August 12th, 2021 ~ by admin

Forgotten Italian CPU – The Genesys B52 MMX

Introduction

On this site you can read about thousands of processor models, and every year it becomes more and more difficult to write about new (old) processors, since everything has been known for a long time. But there are exceptions to the rule, and we love to find them. In 2021 I learned about one unusual processor, and I want to share what I found with you. The roots of this processor’s history go back to Italy, in the distant year of 1998. This was the time of the confrontation between Intel, with its second-generation Pentium, and AMD, with its K6-2 and K6-3 processors. The Cyrix MII processors from Cyrix Corporation, IDT WinChip 2s and Rise mP6s were still going strong as well.

But before we talk about the Genesys B52 MMX processor, we should take a closer look at Intel Pentium II processors in general, as the Italian processor primarily owes its appearance to them.

Intel Pentium II

From 1993 to 1997, the Pentium dominated all market segments. Over time, the “Pentium” trademark even grew into a household name (It’s all about the Pentiums, baby), but with the release of the Pentium II, everything changed. Before this, Intel had not deeply segmented the market: there were Pentium Pros for workstations and servers, and for everything else there were various models of the Pentium, to which, near the end of their reign, Intel added MMX instructions, something the server-oriented Pentium Pro never got. The new slot form factor, the abandonment of the usual pins and ceramic packages, and further segmentation of the market (with the Intel Celeron processors and the new Xeon line) radically changed the course of microprocessor history.

May 7, 1997 saw the release of the first Intel Pentium II processors, manufactured on a 350nm process with a core voltage of 2.8 volts. The first models were based on the Klamath core (named after the river by which The CPU Shack is located), operating at 233 and 266 MHz. The main differences from the Pentium Pro it was based on were the L1 cache, increased from 16 to 32 KB, and the presence of the block of SIMD instructions called MMX, first introduced on the last P55C processors. Like the Pentium Pro it featured its own L2 cache on the module, but in this case it was 512KB fixed on the same PCB as the processor core, a much cheaper solution than the dual-cavity ceramic package of the Pentium Pro.

Before the Pentium II, only the Pentium Pro could boast of its own cache running at the frequency of the CPU core. But placing the CPU core and L2 cache on the same substrate was an expensive pleasure even for Intel, and the processors had to be cheaper to compete in an increasingly intense market. Intel then made a “wise” decision: the Pentium II got its own L2 cache, placed next to the CPU core. This engineering solution significantly reduced the cost of manufacturing processors. The BSRAM L2 cache chips were manufactured by Toshiba, SEC and NEC at that time, rather than being made in-house by Intel, further easing the cost burden.

Pentium II Klamath SECC1: PBGA core and 2x cache on front, 2x cache + TAG on back

For all models of Pentium II processors, the cache size remained unchanged at 512 KB, while different Pentium Pro models had a cache of 256 to 1024 KB. The L2 cache of the first Pentium II processors consisted of four chips located on both sides of the cartridge’s processor board, and operated at half the core frequency. In addition to the processor core and 4 L2 cache chips, there was also a tag-RAM chip on the cartridge PCB, for a total of 6 ICs.

Backside with 2x cache + TAG

The tag-RAM size/configuration determines which range of main memory can be cached. For example, if the L2 cache is 256 KB and the tag RAM is 8 bits wide, this is enough to cache up to 64 MB of main RAM. If you add RAM beyond that, the extra memory will not be cached unless you also expand the tag RAM. On Socket 1-3 486 systems, most motherboards allowed adding or changing L2 cache and tag-RAM chips for this purpose. The Pentium Pro had built-in L2 cache and tags capable of caching up to 4GB of main memory, whereas the first Pentium IIs could cache only up to 512MB of RAM. This was in part to set them apart from the server-oriented Pentium II Xeon, which had a full-speed cache capable of caching 4GB (or 64GB with PSE-36).
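The 256 KB / 8-bit / 64 MB example follows from simple arithmetic. Assuming a direct-mapped cache whose tag RAM stores only the upper address bits (ignoring any valid or dirty status bits), each tag bit doubles the range of main memory that can map onto the cache. A minimal sketch with hypothetical helper names; the `ways` parameter generalises it to set-associative caches, where each way covers its own slice of the cache.

```python
KIB = 1024
MIB = 1024 * KIB


def cacheable_bytes(cache_bytes: int, tag_bits: int, ways: int = 1) -> int:
    """Largest main-memory range the cache can cover.

    Each way of the cache maps `cache_bytes // ways` of address space; the tag
    holds the address bits above that, so coverage is way_size * 2**tag_bits.
    """
    return (cache_bytes // ways) << tag_bits


def tag_bits_needed(cache_bytes: int, ram_bytes: int, ways: int = 1) -> int:
    """Minimum tag width needed for all of `ram_bytes` to be cacheable."""
    way_size = cache_bytes // ways
    ratio = -(-ram_bytes // way_size)  # ceiling division
    return (ratio - 1).bit_length()    # ceil(log2(ratio))


print(cacheable_bytes(256 * KIB, 8) // MIB)   # 64: the 64 MB example
print(tag_bits_needed(256 * KIB, 128 * MIB))  # 9: doubling RAM needs one more tag bit
```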

In January 1998, Intel announced the Pentium II processor built on a new core, codenamed Deschutes (another river in Oregon). The processor core was manufactured on the smaller 250nm process, which lowered the operating voltage to 2.0 V, instead of 2.8 V for “Klamath”. The 512 KB L2 cache still worked at half the core frequency, but it was now made up of two BSRAM chips located beside the processor package. In later modifications of the Pentium II Deschutes core, Intel replaced the tag-RAM chip with the 82459AD revision, allowing the processors to cache up to 4 GB of RAM.

The first-generation Intel Celeron processors, based on the “Covington” core, were essentially processors on the “Deschutes” core, but without ANY L2 cache. As a result they performed poorly, but they overclocked very well, with the best examples reaching up to double their nominal clock frequency.

Deschutes core with Organic BGA core and 2x cache chips on front. TAG on back

Pentium II overclocking, as a rule, was limited by the characteristics of the BSRAM and tag-RAM chips used. The tag RAM, like the cache, strongly disliked voltage increases, and with careless handling an expensive Pentium II could be turned into a Celeron “Covington” if those chips failed. By the way, these chips also ran quite hot on Pentium II processors based on the “Klamath” core, so cooling was very important as well. The multiplier in 99% of Pentium II processors was locked (very early production parts, and of course Engineering Samples, were unlocked), so overclocking was performed by raising the FSB frequency, and how far that could go always depended on the cache and TAG chips installed in that particular processor.

 

A simple example: in Costa Rica, where Intel has an advanced processor assembly/test factory, high-frequency models at 450 and 300 MHz were assembled at the same time. The cartridge and core for these processors were identical (and the multiplier was the same 4.5x as well: 66×4.5 for the 300 and 100×4.5 for the 450). The only difference was the installed cache memory, with different speed ratings in nanoseconds. Sometimes the assembly line had only the fast cache memory capable of operating at 225 MHz, intended for the 450 MHz models. In this case it was also installed on the 300 MHz models, which as a result overclocked perfectly.
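The arithmetic behind that example: core clock = bus clock × multiplier, and the Pentium II’s L2 runs at half the core clock. A sketch with hypothetical helpers; the 150 MHz rating used for the slower cache is an illustrative assumption (the text doesn’t give the 300 MHz part’s cache rating), and the “66 MHz” bus is taken as its nominal 66.67 MHz.

```python
def core_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock from front-side bus clock and (locked) multiplier."""
    return fsb_mhz * multiplier


def l2_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Pentium II L2 cache runs at half the core clock."""
    return core_mhz(fsb_mhz, multiplier) / 2


def cache_keeps_up(fsb_mhz: float, multiplier: float, cache_rated_mhz: float) -> bool:
    """True if the installed cache is rated for the resulting L2 clock."""
    return l2_mhz(fsb_mhz, multiplier) <= cache_rated_mhz


FSB_66 = 200 / 3  # nominal "66 MHz" bus is 66.67 MHz

print(round(core_mhz(FSB_66, 4.5)))   # 300
print(l2_mhz(100, 4.5))               # 225.0
print(cache_keeps_up(100, 4.5, 225))  # True: "450-grade" cache at a 100 MHz FSB
print(cache_keeps_up(100, 4.5, 150))  # False: assumed slower cache at a 100 MHz FSB
```

With the multiplier locked, raising the FSB from 66 to 100 MHz is the only way to reach 450 MHz, so the cache rating decides whether a 300 MHz part gets there.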

Genesys B52 MMX CPU

The history of the Italian processor began in the city of Monopoli, in the province of Bari in Italy. In 1998, Italian Marcello Console founded Genesys, which initially employed 10 people. The main idea of the Genesys business was the production of modified Intel Pentium II processors based on the “Deschutes” core, at a much lower price than Pentium IIs of similar clock speed. On top of that came a warranty extended to 3 years and performance increased by 5% or more. An unprecedented display of generosity!

Genesys registered its own domain, www.b52mmx.com, and was getting ready to sell its processors in ready-made systems. Unfortunately, nothing is known about the manufacturing process; it remains a mystery to this day. There is not much information on these processors, but let’s try to figure out what they were.


August 2nd, 2021 ~ by admin

The 6502 Travels the World: The Story of the Indian SCL6502

Semiconductor Complex LTD SCL6502 CPU

India in the 1970’s was often considered a third world country, supported by a largely agrarian economy and with a wide swath of the population still living at subsistence level. However, it also had a robust space program, had mastered nuclear technology, and had a largely stable government that supported the advancement of technology development in the country. All the pieces were there to begin the shift to the robust high-tech economy it possesses today. In the 1970’s India had several government entities working on semiconductors and electronics, all managed under the direction of the Dept of Electronics. There was also a fair number of companies with plants in India doing electronics manufacturing and assembly. This was largely small-scale production of older technology. TTL circuits (starting with the 7420) were made in Bangalore by BEL back in 1971. But TTL circuits won’t get you far, and at that time the best process India had was around 8 microns, so in 1972 an initiative was started to develop an indigenous semiconductor industry within India.

SCL Fab – Currently 0.18 Micron

Politics are the same everywhere, and so this process took some time: people with experience had to be recruited to run it, and a suitable (politically and geographically) location had to be selected. Eventually, in the late 1970’s, Semiconductor Complex LTD was formed in the city of Mohali (Chandigarh) in the Indian state of Punjab. SCL was to be the state-supported enterprise to bring indigenous high-end (LSI and above) semiconductor production to India. Two things were needed to make this work: technology, and people who were experts in that field. SCL was tasked with going to Japan, America, and Western Europe in search of a company that would assist with the technology transfer, as well as finding some Non-Resident Indians who would be willing to come back to India to work on it. Many Indians had high-skill jobs in the industry outside of India, and it turned out that convincing them to come back to help their country was a non-issue (though generous incentives were provided). Getting the technology, on the other hand, was a bit more work.

The first trip of SCL’s technology transfer team was to Hitachi in Japan. Negotiations with Hitachi were grueling and, while not unproductive, did not yield the results SCL wanted. Hitachi was happy to license some designs to SCL, for a high fee and royalties, but did not want to immediately help create the 3-5 micron production fab that SCL envisioned. Hitachi called this approach ‘one step at a time’, whereas the Indians wanted to go all in from the start. Hitachi agreed only to help (somewhat) with a 5 micron process, and only to license products for digital clocks and watches. The SCL team then turned to the United States, likely expecting similar results.

The chosen company in the USA was AMI (American Microsystems Inc), a company with 7-8 times the turnover of Hitachi. AMI was at the time the largest maker of custom ICs in America, as well as a very large provider of second source ICs (such as the 6800 and 9900 CPUs). AMI’s CEO Roy Turner readily agreed to help SCL, much to the surprise of SCL’s negotiating team, and on the very first day offered SCL AMI’s 5 micron CMOS and NMOS processes, with the option to license its 3 micron CMOS and NMOS processes within 4 years of the agreement becoming effective. AMI also offered SCL access to AMI’s entire standard products catalog, as well as the possibility of joint development of additional products, all at a simple 50/50 split. AMI even offered to help with the technology export license that the US State Dept would require to transfer the fab tech to India. The agreement was signed in April of 1981.



Posted in:
CPU of the Day

July 15th, 2021 ~ by admin

The Intel 8086 Gets ICE’d

A while back I received this rather unusual board. Made in 1979, it was clearly a prototype: a completely handmade, wire-wrapped board built on a standard Intel MULTIBUS breadboard from 1974. No CPU was present, but there is a 3M TEXTOOL socket for one. The paper sticker on the board reads ICE-86/86A/88/88A TEST FIXTURE K95 and DSO TEST ENGINEERING.

ICE-86/86A/88/88A Prototype Test Board

The ICE-86 (and ICE-86A/ICE-88/88A) were all MULTIBUS In-Circuit Emulators that Intel made for the iAPX86 processors from 1979 to around 1985. These were three-board sets with an emulator pod (containing an 808x processor), meant for developing and testing x86 software and hardware designs. The boards would plug into an Intel MDS or MDS2 system (or Intel Intellec) and, with supporting software, formed the basis of much of the original x86 hardware/software design of the era. I assumed this board was part of that set, but alas, while researching it I got ICE’d.

Remember wire wrapping? And using all one color for everything?

The ICE-8x systems are based on an Intel 8080A processor, so I checked the pinout of the socket on the prototype: VCC/GND did not match that of an 8080A CPU, but it DID match that of an 8086. Furthermore, the clock generator on the board is a P8284, the clock generator for the 8086/88 processors, taking the 15MHz crystal input and outputting a 5MHz clock. The 8080A processor of the ICE-86 emulator system uses an 8224 clock generator (which divides its crystal by 9, usually running on a 9-10MHz or 18-19MHz crystal). To make matters more interesting, I also have a couple of later boards (1982 production) which are clearly production versions (likely limited, as the part numbers are still handwritten) of the prototype. They are labeled ICE-86 TEST – 1981.

Production version of the ICE-86 TEST made in early 1982. Curiously this is a MULTIBUS board but about an inch (2.5cm) taller than standard. This was probably not meant to remain in a host system for long.

The prototype has a switch labeled ‘ICE’ for switching the board from 8086 mode to 8088 mode, while the production board lacks such a switch (it’s designed solely for 8086 processors). The prototype has a pair of D3604A 4k (512×8) PROMs; the production version uses a pair of 3628A 8k versions, which were not available when the prototype was made. So what, then, would be the purpose of a board labeled ICE that, well, isn’t an ICE?

These boards were designed for testing ICE emulators, and eventually for giving end users the ability to test their software on a known-working 8086/88 system. Generally, when using an emulator, you would plug the probe into the processor socket of the target system you are developing, and the emulator system allows you to set breakpoints, check register values, memory, etc. These test boards would allow you to develop at least basic software WITHOUT having a target system of your own, as well as offer an in-system test of the entire ICE emulation. The production boards labeled ‘ICE 86 TEST’ seem to be just this: a way to ensure the proper function of the (by then) thousands of ICE-86/88 board sets in use. There was very likely a separate board for testing the ICE-88/A systems as well. Plug the tester into a MULTIBUS slot on the host system, plug the probe cable into the ZIF socket, and run the testing software. The ROMs on the proto board are labeled ‘STIPOL’, which is cryptic at best, but one of their purposes would likely be to provide STImulus of some sort to the ICE emulator being tested.

The test boards would also give developers either peace of mind or headaches when designing for the x86: is the emulator not working, or is there a bug in my design? Now I need to find boards from an actual ICE-86 system.


Posted in:
Boards and Systems

June 27th, 2021 ~ by admin

Navy Hydrophone Noise Canceller: Weitek 3332 Floating Point Based DSP

Navy 55910 ASSY 0120811 Eight Channel DSP – Serial #1

I got these boards some time ago, hoping to figure out more about them, but alas, information is very sparse. They are such good-looking boards, with impressive technology for the day, that I had to post them.

These boards came out of a US Navy system labeled “Hydrophone Noise Canceller”, which seems to have been part of a SONAR test system at a university. They date from the late 1980’s to the early 1990’s. The system comprised 16 boards: twelve 8-channel DSP boards, a control board, and 3 Ethernet boards. Each of these boards is a very heavy 4-layer PCB, with pretty much everything socketed.

The DSP boards are based on the Weitek 3332 FPU. These are full 32-bit floating point datapaths (MULT/DIV/ADD/SUB + registers) made on a CMOS process. They operate on a 100ns (10MHz) clock. These are the higher-end version of the 3132: they have a full 3 busses versus the single bus of the 3132. These 3 busses add a lot to the pin count (168 vs 144), and thus cost, but make designing a system more flexible, with no bus sharing to worry about. The 3332 was designed specifically to support high-speed DSP and graphics processing. It performed the ‘core’ of a DSP, allowing the user to build around it and essentially make a custom DSP for their application (unlike the purpose-built TI TMS320 series of DSPs also available at the time). On the board, each processor is backed by 4 Cypress CY7C128 2K SRAMs (8K total). There is no clock crystal on the board itself, which is typical of a system like this. To ensure everything stays in sync, the clock would be provided by the control board and distributed to each of the boards on the bus.

Navy 55910 ASSY 0125321 Controller A80386DX-25 (20MHz) Serial #2

The control board runs an Intel A80386DX processor. On this particular board it’s a 25MHz chip, but note the crystal next to it is an 80MHz crystal. A 386 internally divides its clock input by 2, so the 80MHz clock is most likely divided by 2 externally, resulting in a 40MHz input to the 80386 and a 20MHz CPU clock. I had another controller board with a 20MHz 80386, so they probably just used whatever they had available. This is Serial #2, after all. The 386 is supported by 4 27C256 EPROMs and 8 32K (CY7C198) SRAM chips, giving it 256K of SRAM. In addition there are twelve 8K (CY7C185) SRAM chips, each with its own pipeline register.
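The clock and memory figures can be checked with a little arithmetic. The sketch assumes, as the text speculates, that the 80MHz oscillator is divided by 2 externally before reaching the 386’s CLK2 input; the 386 then halves CLK2 internally. The helper names are hypothetical.

```python
def i386_core_mhz(oscillator_mhz: float, external_divider: int = 2) -> float:
    """Core clock of an 80386 fed from `oscillator_mhz` through an external divider.

    The 386 takes a double-frequency CLK2 input and divides it by 2 internally.
    """
    clk2 = oscillator_mhz / external_divider
    return clk2 / 2


def total_kib(chips: int, kib_per_chip: int) -> int:
    """Total SRAM from identical chips (sizes in KB, as quoted in the text)."""
    return chips * kib_per_chip


print(i386_core_mhz(80))  # 20.0: the 20MHz guess, below the chip's 25MHz rating
print(total_kib(8, 32))   # 256: 8 x CY7C198 system SRAM
print(total_kib(12, 8))   # 96: 12 x CY7C185, likely data buffers
```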

A typical 386 system would have several MB of RAM, but this system is set up for real-time data processing as a DSP system, so the only data that needs to be in RAM is the control program itself, and 256K of system RAM is plenty. The additional RAM is likely used solely for buffering data from the hydrophones.

It would be interesting to know what this board was used for in more detail, but even if that never happens, it’s an interesting board for its time. Clearly a vast amount of effort went into designing and building the system.

 


Posted in:
Boards and Systems

June 19th, 2021 ~ by admin

Intel P54CM Pentium: The Dual Pentium Processor

Intel Pentium P54CM – Q0475 Engineering Sample from November 1993

Today dual processors are incredibly common, even in home computing, and multicore processors even more so, but there was a time when this was not the case. There were of course multiprocessor systems in the 80’s and early 90’s, but these required extensive additional hardware to support them. The three main concerns when designing multiprocessing systems are how to efficiently handle interrupts (which CPU handles what), how to ensure the caches are kept current (and not used if they aren’t), and how the processors share the same bus.

Bus sharing was largely handled already, as busses have long been shared by all sorts of devices. Interrupts were made easier by the release of Intel’s APIC (Advanced Programmable Interrupt Controller) standard in the early 1990’s. The first version of this was implemented in the 82489DX IC. Each CPU (486 or original P60/66) needed its own 82489DX (local APIC), plus yet another one to work as an I/O APIC. Clunky, but it worked. The BIOS and OS were designed to help with cache coherency, coupled with a modified MESI protocol in the processors themselves for keeping track of which cache lines were valid.
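For reference, the plain textbook MESI protocol for a single cache line can be sketched as a handful of state transitions. The text notes the Pentium used a modified MESI scheme; this sketch does not model Intel’s variations, the bus signalling, or the write-back of dirty data, and the function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"    # only copy, dirty (differs from memory)
    EXCLUSIVE = "E"   # only copy, clean
    SHARED = "S"      # clean, other caches may hold it too
    INVALID = "I"     # not usable


def on_local_read(state: State, others_have_copy: bool) -> State:
    """This CPU reads the line."""
    if state is State.INVALID:
        return State.SHARED if others_have_copy else State.EXCLUSIVE
    return state  # read hits leave M, E and S unchanged


def on_local_write(state: State) -> State:
    """This CPU writes the line; from S or I the other copies would be
    invalidated on the bus (not modelled), so the result is always MODIFIED."""
    return State.MODIFIED


def on_snoop_read(state: State) -> State:
    """Another CPU reads the line. A MODIFIED line must be written back first
    (not modelled); afterwards both copies are SHARED."""
    if state in (State.MODIFIED, State.EXCLUSIVE):
        return State.SHARED
    return state


def on_snoop_write(state: State) -> State:
    """Another CPU writes the line: our copy is now stale."""
    return State.INVALID


# Two CPUs touching the same line
cpu0 = on_local_read(State.INVALID, others_have_copy=False)  # EXCLUSIVE
cpu0 = on_local_write(cpu0)                                  # MODIFIED
cpu1 = on_local_read(State.INVALID, others_have_copy=True)   # SHARED
cpu0 = on_snoop_read(cpu0)                                   # SHARED (after write-back)
print(cpu0, cpu1)
```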

P54CM50-75 Q033 – Early October 1993 Sample – 75MHz modified Socket 5

After the release of the first (P5, Socket 4) Pentiums, Intel decided to integrate an APIC onto the CPU core itself. This greatly simplified dual processor setups. Within only a few months of the release of Socket 4, Intel was already working on the P54C Pentium. These were to use a whole new socket, Socket 5 (much to the annoyance of those who had just dropped some serious coin on a Socket 4 system). The Socket 5 systems, using the Intel Neptune 430NX chipset, would support dual processors. To do this Intel designed a separate Pentium processor core called the P54CM and, originally, a separate, slightly modified socket for it. The secondary socket had a slightly different pinout and was to run the P54CM processor, OR could be used as an OverDrive socket, with the OverDrive becoming a second CPU (why both, no one is entirely sure).

P54CM50-75 Q033 – Mod Socket 5 – Oct 1993 Q0475 Nov 1993 – Standard Socket 5

Samples of the P54CM debuted in October of 1993 using the new pinout.  Samples from just weeks later had reverted to the standard Socket 5 pinout, clearly someone at Intel decided that yet another socket (and package) design would be uneconomical.  The separate core, however, remained.

Early Pentium Print Ad shows the modified Socket.

The P54CM core was only produced in a very few specs: the SX874 B1 stepping in STD voltage (3.135V–3.465V), and the SX942 (STD), SX943 (VRE 3.3V–3.465V) and SX944 (MD: faster timings on several pins/3.135V–3.465V) in the B3 stepping. There were also several ES versions made: Q033 P54CM50-75, Q0475, Q0519 and Q0520 with the B0 stepping, and Q0543 with the B1 stepping. These processors, including the production versions, were incredibly rare. Very few companies used them in actual machines. Why? Because a normal P54C (provided it supported dual processing) could be run just as well. The only real difference in the P54CM core was that the DPEN/ output pin was driven low on RESET. On a P54CM this pin is an output that tells the primary processor ‘hey, a second processor exists’, while on the standard P54C, DPEN/ is an input.

SX874 – P54CM-B1 (with the FDIV bug) from October 1994

It turns out that the P54C/CM core ALSO has a CPUTYPE pin that can be set to tell a system whether the processor is a primary or a secondary processor (and early Pentium dual boards had a jumper to do just this). You didn’t actually NEED a P54CM as the secondary processor; a normal P54C would work just fine. There was even some trickery to allow a system to boot off of a secondary P54CM CPU. It was not officially supported by Intel, but in systems designed for redundancy the DPEN/ pin could be overridden and the P54CM used to boot a system (normally the primary CPU would handle all the boot-up duties and only enable the secondary CPU once it was ready).

Later Socket 5/7 Pentiums (C0 and later steppings) supported multiprocessing natively, with a few exceptions. The SU114/SL25H Pentium 200s did not have a functional APIC and thus were not DP compatible. These were even mismarked by Intel, with the marking ‘VSS’ on the back. That last ‘S’ means they were tested to support UP, DP and MP configurations, when in fact they were not; the code on the back should have been VSU (‘U’ means they were tested for MP and uniprocessor use, but NOT DP, as DP required a working APIC). The SY045 (200) and SY037 (166) were also ‘VSU’ processors, not tested for DP use, likely because of some issue with the APIC.

Mismarked SU114 (VSS) and correctly marked SY045 VSU

Intel OverDrive processors suffer a similar fate: they will not run in the primary socket of a DP system, but will in the secondary socket. This is most likely because DPEN/ is not supported as an input on the OverDrive, so it wouldn’t know a secondary processor exists. A shame, really, as a dual OverDrive system would be pretty neat.

At the beginning of the P5 era Intel seemed to be all in on DP systems, but with the coming release of the Pentium Pro, they began to use dual processing as a way to differentiate their products.  DP support was removed in the next Pentium chipset (the FX Triton), only to return later in the HX Triton II.  The VX and TX Pentium chipsets also lacked DP support.

Quite famously later in the 1990s Intel marketed the Pentium II/III with multi-processor support, and sold the Celeron as uniprocessor only.  It turned out that the lowly Celeron was quite happy to run in DP configuration, much to the annoyance of Intel, but joy of enthusiasts around the world.  Perhaps someone will figure out a way to run Pentium Overdrives in dual processor systems, if there is a will there tends to eventually be a way.

 

Posted in:
CPU of the Day

May 17th, 2021 ~ by admin

First & Last AMD Socket A Athlons – Thunderbird vs. Barton – Part 2

Continuing our exploration of the evolution of the Socket A architecture.  See Part 1 here

Test Stand

To test all processors at a final frequency of 1 GHz, several FSB/RAM operating modes were selected: 100/100, 100/133, 133/133 and 133/166 MHz. Priority was given to the modes with the highest RAM frequency.

The main components of the system:

CPUs:

AMD Athlon XP-M (10x 100 and 7.5x 133), 1000 MHz, Barton
AMD Athlon (B), (10x 100) 1000 MHz, Thunderbird
AMD Athlon (C), (7.5x 133) 1000 MHz, Thunderbird
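All three CPUs reach 1 GHz through different multiplier and bus combinations. As a quick sanity check, here is the arithmetic as a small Python sketch, assuming (as is conventional) that the nominal "133 MHz" bus clock is really 400/3, roughly 133.33 MHz, which is why a 7.5x multiplier lands exactly on 1000 MHz. The helper name is illustrative only.

```python
from fractions import Fraction

# Base bus clocks in MHz. The nominal "133 MHz" clock is 400/3 MHz,
# so exact fractions avoid rounding noise in the products below.
BUS_100 = Fraction(100)
BUS_133 = Fraction(400, 3)


def core_clock_mhz(multiplier, bus_mhz):
    """Core frequency = multiplier x base bus clock (hypothetical helper)."""
    return Fraction(multiplier) * bus_mhz


configs = {
    "Athlon XP-M, 10x100": core_clock_mhz(10, BUS_100),
    "Athlon XP-M, 7.5x133": core_clock_mhz(Fraction(15, 2), BUS_133),
    "Athlon (B), 10x100": core_clock_mhz(10, BUS_100),
    "Athlon (C), 7.5x133": core_clock_mhz(Fraction(15, 2), BUS_133),
}

for name, mhz in configs.items():
    print(f"{name}: {float(mhz):.0f} MHz")  # every line prints 1000 MHz
```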

Motherboards:

  • ASUS A7V, chipset VIA Apollo KT133
  • ABIT KR7A, chipset VIA KT266A
  • EPOX EP-8K3A, chipset VIA KT333
  • EPOX EP-8K9A7I, chipset VIA KT400A
  • EPOX EP-8RDA3I, chipset Nvidia NForce 2 Ultra 400

Memory:

  • OCZ PC3200 EL Platinum Edition (OCZ4001024ELDCPE-K), 2x 512 MB (PC3200), CL=2

Video card

  • Gainward – GeForce 6800 Ultra AGP 256 MB (ForceWare 81.85).

Testing was carried out in Windows XP SP3 using the following software:
• Super Pi mod. 1.5XS (1M task)
• PiFast v.4.1
• WinRAR x86 v. 5.40
• Cinebench 2003
• 3Dmark2001SE Pro b330
• 3DMark 2003 v.3.6.1
• AIDA64 5.50.3600
• PCMark 2004 v.1.30
• Max Payne
• Far Cry
• DOOM III

Tests

When testing all platforms, I used the same Windows XP SP3 distribution with the same list of running services and settings. The Gainward GeForce 6800 Ultra AGP 256 MB and a Kingston V300 60 GB SSD remained constant companions throughout the tests. Windows XP SP3 was installed from scratch for each platform. All VIA chipsets used the VIA Hyperion 4-in-1 driver, version 4.51, and the video card used ForceWare 81.85. All unnecessary services were disabled, and the system was set to high performance mode.


Posted in:
Boards and Systems

May 14th, 2021 ~ by admin

First & Last AMD Socket A Athlons – Thunderbird vs. Barton – Part 1

Introduction

The AMD Socket 462, or Socket A, was a rather interesting and long-lasting CPU socket. The first Socket 462 processors appeared in the summer of 2000: the AMD Athlon “Thunderbird” in a ceramic package, clocked at 600 MHz, with 256 KB of L2 cache, an effective system bus frequency of 200 MHz, and support for MMX and AMD’s own 3DNow! instructions (of course, SSE was out of the question in those days). The Thunderbirds were produced on a 180 nm process, the operating voltage was set in the range of 1.70-1.75 volts, and the maximum heat dissipation was 72 watts for the top 1400 MHz models.  These replaced the older Slot A cartridge-based Thunderbirds, made possible by the L2 cache being moved on die instead of off die (in similar fashion to Intel’s Coppermine Pentium IIIs moving from Slot 1 to S370).

Thunderbird die exposed

The last processor designed for Socket 462 was the AMD Athlon XP using the “Barton” core, released in early 2003, which held its position throughout 2004. With Barton the ceramic package was a thing of the past, replaced by an organic PGA package. The process shrank to 130 nm, the L2 cache capacity doubled, the system bus frequency doubled, and clock speeds exceeded 2.2 GHz.

The fastest model had a real frequency of 2200 MHz and a performance rating of 3200+, with an operating voltage of 1.65 V and a TDP of 77 W on a 400 FSB.  There was also another AXDA3200 with a 333 FSB; it actually clocked slightly faster, at 2.333 GHz, but was given the same PR rating because of its slower FSB. The processor gained first-generation SSE instructions, and motherboards of the day added support for dual-channel RAM operation. Add to this that the first Socket 462 motherboards worked with SDRAM and later ones with DDR SDRAM, and by a number of measures the platform’s main characteristics doubled within a single socket.
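The two 3200+ variants and the bus and memory figures can be checked with a bit of arithmetic. This is a sketch under stated assumptions: the quoted FSB figures are double-pumped (a "400" bus is a 200 MHz clock with two transfers per clock), each transfer carries 64 bits (8 bytes), and the multipliers 11 and 14 are not stated in the text but simply fall out of dividing each core clock by its bus clock. The function names are hypothetical.

```python
from fractions import Fraction

BUS_WIDTH_BYTES = 8  # assumption: 64 bits of data per transfer


def core_clock_mhz(multiplier, base_clock_mhz):
    """Core frequency = multiplier x base (not effective) bus clock."""
    return multiplier * base_clock_mhz


def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock=2, width_bytes=BUS_WIDTH_BYTES):
    """Peak bandwidth = base clock x transfers per clock x bytes per transfer."""
    return base_clock_mhz * transfers_per_clock * width_bytes


# Athlon XP 3200+ on a "400 FSB": 200 MHz base clock, 11x multiplier
xp3200_400 = core_clock_mhz(11, Fraction(200))
# Athlon XP 3200+ on a "333 FSB": 500/3 (about 166.67) MHz base clock, 14x multiplier
xp3200_333 = core_clock_mhz(14, Fraction(500, 3))

print(float(xp3200_400))            # 2200.0
print(round(float(xp3200_333), 1))  # 2333.3

print(peak_bandwidth_mb_s(100))                          # 1600: 200 MHz effective bus (first Thunderbirds)
print(peak_bandwidth_mb_s(200))                          # 3200: 400 MHz effective bus (Barton 3200+)
print(peak_bandwidth_mb_s(100, transfers_per_clock=1))   # 800: single-channel PC100 SDRAM
print(peak_bandwidth_mb_s(200, width_bytes=16))          # 6400: dual-channel DDR400 (2 x 64-bit)
```

The 333 FSB part needs a higher multiplier to reach a similar core clock, which is why it ends up slightly faster in MHz while keeping the same PR rating.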

This comparison reminds me of the present day: from the first generation of AMD Ryzen processors in 2017 to the latest (fourth) generation, which debuted at the very end of last year (2020), all processors have also used a single socket, AM4. Ryzen performance gains across all four generations are clearly shown in the following slides:

Processor support on AM4 has mostly gone smoothly, although AMD officially announced that Ryzen 5000 series desktop processors would only be supported on boards with 400 and 500 series chipsets. Therefore, the latest-generation processors cannot officially be used on motherboards released for first-generation Ryzen. There are reports online of Ryzen 5000 series processors running on motherboards with the older X370 chipset, but AMD’s official position is as stated above.

In the wake of such analogies, I thought: why not compare the first and the last Athlons for Socket 462 on several motherboards at the same clock frequency, with the same system configuration? Read this article to the end to find out what came of it.

Continuation of the Idea

The essence of the idea is simple: take the first AMD Athlon based on the “Thunderbird” core, with a 200 or 266 MHz system bus and a clock speed of 1 GHz, and an AMD Athlon XP on the “Barton” core at the same round frequency of 1 GHz, and compare them to find out how much the first generation loses to the last. Over the life of Socket 462, several generations of processors with different cores came and went: Thunderbird – Palomino – Thoroughbred – Thorton – Barton.  This will be an interesting test of the raw architectural improvements of the core, all other things being as equal as they can be.


Posted in:
Boards and Systems

March 23rd, 2021 ~ by admin

CPU of the Day: National Elentari x86 and What Lies Beyond – Part 2

Last week we talked about a little-known, but not unheard of, 486 built by National Semiconductor called the NS486 Elentari.  As interesting as a non-Intel x86 architecture is, that’s not what led me down the aforementioned rabbit hole.  This is what did…

The entrance to the Rabbit hole lay in an issue of Boot Magazine.

This small blurb in Boot magazine from back in August of 1997 is all it took.  What was this mysterious N7 processor that even Boot Magazine felt the need to mention?  It is compared to the Cyrix MediaGX; coincidentally, National had agreed to merge with Cyrix right about the time this issue went to press, a fact that may or may not have been known to the authors at the time.  Regardless, the deal wasn’t officially completed until 1998, so this mysterious N7 had been in development for some time, and had probably become something a bit more than a gleam in an engineer’s eye… and indeed it had.

Mentions of the ‘N7’ in the press at the time start in early 1996 and continue through 1997, which indicates that the N7 was likely planned soon after work on the NS486 core began.  It’s very likely that the NS486 was meant as a stepping stone to the bigger, more powerful N7.  The N7 is described as a 133MHz ‘Pentium compatible’ processor.  The NS586 core (as National called it) was an enhanced NS486, with the pipeline extended to 5 stages and built on National’s new 0.35u process, just as some had originally suggested for the NS486.  This resulted in a 3.3V processor running at 133MHz.

NS586 5-Stage Pipeline – Cache could happen on Stage 3 or 4 and Memory Access was non-blocking (image (c) MPR)

The NS586 was planned to be at least two, and most likely three, different processors (in similar fashion to the NS486SXF and NS486SXL).  The common core for all of the designs was the 5-stage NS586.  This took the NS486 and greatly enhanced it, adding 8KB of L1 cache (4KB instruction + 4KB data).  The prefetch buffer was doubled in size to 32 bytes, and some out-of-order execution support was added.  The decode and memory/cache access logic was also further optimized.  Cache accesses can be shifted between the 3rd and 4th stages as needed, allowing cache data to be modified, loaded or stored in two consecutive cycles.  Unlike the K6 or PII, the NS586 does not translate x86 code into intermediate instructions; it directly executes each x86 instruction (like the NS486 before it).  The updated pipeline executes all code as fast as or faster than a 486, and in some cases faster than a Pentium.  National claimed that the 133MHz core would perform like a Pentium 95, compared to, say, an AMD 5×86-133 rated at the Pentium 75 level.  As with the NS486 before it, it lacked an on-chip FPU.

The NS586 core was not exactly small: even on the 0.35u process it took 930,000 transistors (426,000 of which are the cache).  This resulted in a die size of around 25.8mm² (roughly the same size as the core-only NS486 on 0.65u).  And it was intended to be even bigger…
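The transistor figures above can be sanity-checked with a few lines of arithmetic. The sketch computes the cache’s share of the NS586 budget and then, assuming conventional 6-transistor SRAM cells (an assumption; National’s actual cell design is not documented here), how much of the 426,000 the raw 8KB data arrays would account for, leaving the remainder for tags, valid bits and support circuitry.

```python
CORE_TOTAL = 930_000    # NS586 transistor count quoted in the text
CACHE = 426_000         # portion attributed to the L1 cache
CACHE_BYTES = 8 * 1024  # 4KB instruction + 4KB data

# Share of the whole transistor budget spent on cache
cache_share = CACHE / CORE_TOTAL
print(f"cache share: {cache_share:.1%}")  # 45.8%

# Assumption: one 6-transistor SRAM cell per bit of cache data
data_array = CACHE_BYTES * 8 * 6
print(f"6T data arrays: {data_array:,}")  # 393,216

# Whatever is left would cover tags, valid bits, sense amps, etc.
print(f"remainder: {CACHE - data_array:,}")  # 32,784
```

Under that assumption the data arrays alone explain most of the cache figure, which is consistent with the cache taking close to half of the transistor budget.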

Like the NS486 before it, the NS586 was to be integrated with a variety of peripherals, and this time National was going big on integration.   At the top was the N7-Lite, which integrated the NS586 core with an SVGA 2D graphics controller, a TI TMS320C50-based DSP and an audio controller, in addition to a PCI bus, DMA controller, USB, IrDA, and other normal peripherals of the era.  The N7-Lite does not have a traditional DRAM controller, instead using a controller geared towards a UMA (Unified Memory Architecture), so the system RAM serves both the CPU and the onboard GPU. The GPU is designed to support only TV out (NTSC, PAL and SECAM outputs), as this was to be a full Network PC on a chip, basically what became the set-top boxes of the 1990’s.

NS586L and N7-Lite shared the same core but with different peripherals as well as busses (image (c) MPR)

At the low end was the NS586L, which dropped the audio, video, and PCI bus and added a standard Pentium-compatible VL-Bus, ISA, and a DRAM/ROM controller.  This is more of an enhanced NS486 with a similar set of peripherals, and would likely have been the logical successor in designs using the 486.  Speed was to be 100MHz (again to differentiate it from the N7-Lite) and estimated cost was $25/chip.  Pricing for the N7-Lite was not announced. It’s unknown how far these designs progressed, or whether actual silicon was made.  Having a transistor count and die size indicates they were pretty far along, perhaps with the chip’s floor plan finalized and work underway on taping it out for masks (6 months seems reasonable for samples after tape-out).

Both of these chips were scheduled (as of October 1997) to begin sampling in the second quarter of 1998.  It’s very likely that a third chip was planned, if only because of the name ‘N7-Lite.’ ‘Lite’ indicates something less than the full version, and the Boot blurb (as well as some other press mentions) refers only to the ‘N7.’  In all likelihood there was to be a top-end version known simply as the N7.  Such a chip would likely replace the TV-only GPU of the N7-Lite with one supporting standard CRTs or LCD panels, and perhaps add more RAM support and Ethernet and/or an EIDE hard drive controller (something the competing Cyrix-derived ST STPC included).  We may never know, as despite the efforts of the engineering team the project was ultimately canceled in favor of the newly acquired MediaGX line from Cyrix.  That line continued to be developed at National even after they sold the rest of Cyrix to VIA (eventually selling the MediaGX division to AMD).

Sadly they didn’t

Perhaps someone who worked in the Arador group at National can offer more insight, but until then we can only speculate about what could have been another interesting processor on the x86 scene in the 1990’s.

Posted in:
Uncategorized

March 18th, 2021 ~ by admin

CPU of the Day: National Elentari x86 and What Lies Beyond – Part 1

National Semi NS486SXF-25 Rev C0 -1999

While I was casually reading an issue of ‘Boot’ Magazine from 1997, I was sent down the rabbit hole by a mention of a processor in a small blurb in a footnote of an article.  Just a few lines, really, but about a processor I was not familiar with, an x86 one at that! So nearly a month later, I have emerged from the rabbit hole.  We will begin not with what sent me down the hole in the first place, but with when and where the hole itself came from.  The year: 1995.  The place? National Semiconductor.

As mentioned in the 486 Overclocking article, the 1990’s were a boon for up and coming x86 processors.  In some ways it was similar to the processor bonanza of the 1970’s but centered on x86.  Many companies wanted to have a go at the x86 architecture market.  National Semiconductor was of course interested in making something with x86 as well.  They rightly decided that a head to head competition with Intel for mainstream PC processors wasn’t the best idea, but that embedded computing, low cost set top box (as they later would become) and ‘Network PC’s’ would be a good market.  The goal was to design a simple efficient x86 processor and integrate it with many peripherals, and sell it for $20-30 each.

Elentari Core. 16-byte Prefetch Buffer, 1K Cache, 16-bit Data Bus and support for 2 8M DRAM Pages

 

NS486SXF/L Block Diagram (SXL omits blocks in dashed lines)

The core project began in very early 1995 (or late 1994) and was known as Elentari, Queen of the Stars in Lord of the Rings mythology.  Elentari (aka the ESF94001) had three priorities in its development, in order: 1) schedule, 2) low cost, 3) performance.  Time to market was essential, even at the expense of performance optimization. The core (which the marketing department quickly renamed the NS486) was to be a 486-compatible core (protected mode only) with some optimizations, and was organized officially under a new unit at National Semiconductor called the Arador Unit (someone really liked LotR). The target speed was 25MHz at 5V on National’s 0.65u process, using a very simple 3-stage pipeline (Fetch/Decode, Execute, Write Back).

NS486SXL-25 No Rev Marked 1996 (courtesy xSecret)

Balancing cost and performance meant that die area had to be minimized, as it affects yields and parts per wafer.  On a 0.65u process, this left room for only a small cache.  After a fair amount of analysis, National went with a 1K direct-mapped instruction cache (with bus snooping) and a 16-byte prefetch buffer.  This is in great contrast to the Intel 486, which had an 8KB unified cache (16K on later 486s).  But for embedded use, caching instructions gives a bigger performance gain than caching data.  Cache also presents some difficulties for real-time computing, as it’s difficult to know how long an operation will take if you don’t also know whether it is served from cache or from main memory.  National provided a method on the NS486 to load and lock the cache with a set of instructions that would ALWAYS run out of cache.  Combined with assigning one DRAM page to data and another to the stack, this made timing more predictable and consistent when needed.  As part of the development process National used IP licensed from another company, IIT, which had earlier designed a 486-class processor.  IIT’s IP was not used in the NS486 itself, but it helped in debugging, designing and developing the chip and its test environment.
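To make the load-and-lock idea concrete, here is a toy behavioural model in Python of a 1KB direct-mapped instruction cache with lockable lines. It is a sketch under stated assumptions, not National’s implementation: the 16-byte line size, the class and method names, and the rule that a locked line is never replaced on a miss are all hypothetical, and bus snooping is not modelled.

```python
class DirectMappedICache:
    """Toy 1KB direct-mapped instruction cache with per-line locking.

    Assumptions (hypothetical, not from National's documentation):
    16-byte lines, so 1024 / 16 = 64 lines; a locked line is never
    evicted by a miss.
    """

    def __init__(self, size_bytes=1024, line_bytes=16):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines     # None means the line is invalid
        self.locked = [False] * self.num_lines

    def _split(self, addr):
        """Split an address into (line index, tag) for a direct-mapped lookup."""
        line_addr = addr // self.line_bytes
        return line_addr % self.num_lines, line_addr // self.num_lines

    def fetch(self, addr):
        """Return True on a hit; on a miss, fill the line unless it is locked."""
        index, tag = self._split(addr)
        if self.tags[index] == tag:
            return True
        if not self.locked[index]:
            self.tags[index] = tag
        return False

    def load_and_lock(self, start, length):
        """Preload a code range and pin its lines (e.g. a time-critical handler)."""
        first = start - start % self.line_bytes  # align down to a line boundary
        for addr in range(first, start + length, self.line_bytes):
            index, tag = self._split(addr)
            self.tags[index] = tag
            self.locked[index] = True


cache = DirectMappedICache()
cache.load_and_lock(0x1000, 64)  # pin a hypothetical 64-byte routine (4 lines)
print(cache.fetch(0x1000))       # True: the locked line always hits
print(cache.fetch(0x1400))       # False: same index (1KB apart), but the line is locked
print(cache.fetch(0x1000))       # True: the locked line was not evicted
```

Because the cache is direct-mapped, any two addresses exactly 1KB apart compete for the same line. Locking is what keeps the pinned routine resident, so its fetch timing stays constant no matter what other code runs.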

NS486SXL-25 Rev A – SXL unique die – 1998

The NS486 core lacked both an FPU and an MMU, and had a 16-bit data bus, which allowed for a fairly small core.  The core alone took up about 256,000 transistors (roughly half of what the Intel 486 integer core used), and on the initial 0.65u 3-layer process this resulted in a core die size of 29.6mm² including the cache (the SXL die, with its limited peripherals, pushed that to around 64mm²).  The short pipeline greatly restricted the speed; it never made it above 25MHz (though 33MHz was apparently achievable).

National Semiconductor by this time had become dominant in integrated peripheral chips, led by its ‘SuperIO’ chip line, and it was this integration that made the NS486 unique.  National designed two versions of the NS486: the NS486SXF with a full set of peripherals, and the smaller NS486SXL with fewer.  Integrating the peripherals was one of the most challenging aspects; the core itself is relatively simple, but adding other features, often with different clock and signal domains, is much harder to design and test.  This is where National’s expertise with SuperIO chips came in handy.

The other challenging aspect of an x86 design in the 1990’s came from the legal department.  Intel claimed that even a clean design of anything x86 ‘MUST’ violate at least one Intel patent.  National, however, had designed the NS486 from the ground up, including the microcode, AND as a backup also held a license from Intel dating back to the 1970’s (it was that license that helped lead to the National/Cyrix merger).

Feature | NS486SXF | NS486SXL
Package | 160PQFP | 132PQFP
Cost | $25 | $15
486 Core | Yes | Yes
DRAM Controller | Yes | Yes
DMA Controller | Yes | No
LCD Controller | Yes | No
ISA Bus Interface | Yes | Yes
External Bus Master Controller | No | Yes
UART/IrDA | Yes | Yes
ECP Parallel Port | Yes | No
PCMCIA Controller | Yes | No
Real-Time Clock, Timers | Yes | Yes
Programmable Interrupt Controller | Yes | Yes
Reconfigurable I/O | Yes | Yes
Programmable Chip Select | Yes | Yes
3-Wire Serial Peripheral Interface | Yes | Yes

NS486SXL Rev A0 die – matches package markings – Still 0.65u (courtesy aberco)

Initially both the NS486SXL and SXF used the same die, with some of the on-chip features disabled on the SXL. National planned to make a separate die for the SXL later to further reduce costs, and did so around 1998.  They also aimed to shrink the design to their upcoming 0.35u process, but it is unknown whether they succeeded (dies from 1998 are still of the 0.65u variety).

Initial samples were available by early 1996, a rather quick development.  The NS486 was well supported in both hardware and software.  It supported a number of common real-time operating systems of the time, including pSOS+, QNX, VxWorks, and VRTX. It did not, however, support DOS, as it had no real mode support. In 1997 the NS486SXF was used to implement Jav Nanokernel, a Java-based OS running the Java VM directly on hardware. Hardware vendors included PARVUS (an NS486-based PC104 board), BCT (dev boards) and several others making ready-made NS486-based SBCs.   In November of 1996 National released a full web-browser-based network computer reference design using the NS486, called ‘Odin.’  This was the first sub-$200 web-browser-capable computer of the time.

NS486SXL die – Peripherals take up about a third of the die (die photo from aberco)

In 1997 things got a bit more interesting.  National Semiconductor decided to merge (in reality it was an acquisition) with Cyrix.  The NS486 continued to be made, but by 1999 National listed it as ‘not recommended for new designs.’  It would also appear that some things never really got finished: datasheets up through at least December 1997 were still ‘Preliminary,’ though the silicon had been in production for some time.  Production of the NS486 continued well into the 2000s, with chips being made at least into 2003 and probably later.

The NS486’s performance in integer tasks was pretty good, in some cases beating the Intel 486DX. This is largely due to its optimized instruction timings: many instructions are single-cycle, much faster than on other cores.

At the time of its introduction it had little competition in the x86 realm.  Intel had the 386EX and AMD had the 386SC (what later became the ElanSC300 line).  Both of these were 386-class parts that were slower and (in the case of the 386EX) more expensive.  Intel itself did not have a good embedded 486 option, largely due to a lack of trailing-edge fab capacity.  Most of its fabs had been (or were being) converted to higher-end processes to make new Pentiums and P6 chips, while its older fabs were filled to the brim with Intel’s then-booming chipset business.

Chip | ARM 610 | Motorola 68349 | Hitachi SH7032 | NEC V820 | MIPS LR33020 | Intel i960CA | AMD 386SC | Intel 386EX | NS486SXF
Frequency | 20 MHz | 25 MHz | 20 MHz | 25 MHz | 25 MHz | 25 MHz | 25 MHz | 25 MHz | 25 MHz
Dhrystone MIPS | 18 | 9 | 16 | 18 | 14 | 30 | 5.4 | 7.1 | 12
FPU | No | No | No | Yes | No | No | No | No | No
MMU | No | No | No | No | No | No | Yes | No | No
Cache | None | 1K Inst | None | 1K Inst | 4K/4K | 1K Inst | None | None | 1K Inst
Peripherals | Some | Some | Some | Some | Some | Some | Full Set | Some | Full Set
Transistors | 359k | 550k | 593k | 380k | 700k | 600k | 335k | ?? | 500k*
Process | 1.0u | 0.8u | 0.8u | 0.8u | 0.7u | 1.0u | 0.7u | 0.8u | 0.65u
Price | $20 | $33 | $30 | $80 | $67 | $90 | $49 | $33 | $25

*Estimated Core = 256k Cache = ~50k
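Raw Dhrystone numbers are easier to compare once they are normalized. This short sketch recomputes MIPS per MHz and MIPS per dollar using only the comparison table’s own frequency, Dhrystone and price figures (no new data), then prints the chips sorted by MIPS per dollar.

```python
# name: (frequency in MHz, Dhrystone MIPS, price in USD), copied from the comparison table
chips = {
    "ARM 610": (20, 18, 20),
    "Motorola 68349": (25, 9, 33),
    "Hitachi SH7032": (20, 16, 30),
    "NEC V820": (25, 18, 80),
    "MIPS LR33020": (25, 14, 67),
    "Intel i960CA": (25, 30, 90),
    "AMD 386SC": (25, 5.4, 49),
    "Intel 386EX": (25, 7.1, 33),
    "NS486SXF": (25, 12, 25),
}

# Efficiency per clock and value per dollar
per_mhz = {name: mips / mhz for name, (mhz, mips, _) in chips.items()}
per_dollar = {name: mips / price for name, (_, mips, price) in chips.items()}

print(f"{'Chip':<16}{'MIPS/MHz':>10}{'MIPS/$':>10}")
for name in sorted(chips, key=per_dollar.get, reverse=True):
    print(f"{name:<16}{per_mhz[name]:>10.2f}{per_dollar[name]:>10.3f}")
```

Sorted this way, the NS486SXF comes third behind the ARM 610 and the Hitachi SH7032, and well ahead of both 386-class x86 parts; the result is only as good as the table’s figures.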

It was suggested that if National lengthened the NS486’s pipeline to the then-standard 5 stages and moved it to their new 0.35u process, it could ‘easily’ hit 133MHz at 3.3V.  But what embedded designer would want to deal with that fast a processor?  It would seem the NS486 team had ideas beyond the purely embedded market, as something more than the NS486 was their ultimate goal, and that is exactly what led me down this rabbit hole……

In Part 2 we’ll look at what National developed from the NS486, which, if not for the MediaGX they acquired, very likely would have made it to market.

Posted in:
CPU of the Day