Archive for the 'CPU of the Day' Category

November 26th, 2016 ~ by admin

HP 3000 Series 33: 16-bits of Sapphire

HP 3000 Series 33 – 16-bits 11MHz. They were integrated into the desk, with a 20MB hard drive on the left, and the computer on the right (with a 1.2MB 8″ Floppy Drive)

In 1972 HP introduced the HP 3000 line of minicomputers.  Mini of course meaning they didn't take up the entire room.  They competed against the likes of the DEC PDP-11 and the TI-990.  Originally called the System/3000 (apparently to compare favorably to the IBM System/360), they were renamed the HP 3000.  These were 16-bit computers employing a stack-based design.  They had no general-purpose registers; all operations worked directly on one of several stacks.  The first models were designed using bipolar discrete logic, with ROM for the microcode.  This allowed for good performance but was expensive and large.  Just the processor for the high-end Series III of 1978 was 9 boards.

The Series 33 (and the smaller Series 30) were to be cost-reduced versions, to slot in between the high-end Series III and the newly introduced HP 300 microcomputer.  In order to do this those 9 boards for the processor needed to be greatly simplified.  HP engineers decided to use a processor they already had: the CPU from the HP 300 Amigo.  The HP 300 Amigo was a bit of a disaster for HP; after 5 years of development, including

1AB4-6003 RALU -Silicon on Sapphire – 8000 Transistors

designing an entirely new processor, it was a failure in the market, suffering more from management and politics than from technical shortcomings (it was not file-system compatible with the 3000 line, which caused some concern).  After its release in 1978 it made only around $15 million in sales and was canceled after a short time.

Part of that 5-year development went into its 16-bit VLSI processor.  To get the speed needed for the HP 300 at a low price, the processor needed to be a VLSI design (a few chips rather than a few boards).  To fit in a smaller pedestal cabinet it needed to be energy efficient and heat efficient as well.  HP's engineers decided to use a Silicon on Sapphire (SoS) CMOS design, a process HP already had considerable experience with from the MC2 processor.  SoS is a form of Silicon on Insulator, a manufacturing method that is very common in today's ICs (using silicon dioxide).  Instead of an IC being made on a pure silicon wafer, the silicon is deposited on a wafer of sapphire.  Sapphire is an excellent insulator, which helps reduce leakage currents, as well as spurious currents from such things as radiation.  Radiation tolerance is perhaps what SoS became best known for, but its low-power performance was what HP was after in the 1970s.

Die shot of the RALU with labels.

The processor for the HP 300 was designed as 3 separate ICs, totaling 20,000 transistors (some documentation says 25,000) and running at a clock of 11MHz.  The processor control unit (PCU 1AB2-6003) chip generates microinstruction addresses that control the other two chips: the register, address, skip, and special (RASS 1AB3-6003) chip and the register, arithmetic, and logic unit (RALU 1AB4-6003) chip.

The PCU contains 5000 transistors and handles the microsequencing, clock generation, and a subroutine save stack.  Clock generation is interesting as it is single-phase and variable: the PCU can lengthen or shorten the clock period as needed.  If a memory operation needs longer to complete, the PCU simply holds the clock period longer.  Data path functions are handled by the RASS and RALU chips.  The RASS contains about 7000 transistors and provides a register file for the second operand to the RALU, as well as address generation and skip logic.  The largest of the chips is the RALU.  It handles all of the standard ALU functions as well as hardware multiply/divide.  It also contains 16 registers: 8 general-purpose registers and 8 for address storage.  Together these three chips form the CPU of the HP 300 and consume only 1 Watt of power.  The processor is a microcoded design, so the actual instruction set resides in ROM, in this case on a separate board.  In the case of the HP 300 this also allowed the I/O processor duties to be microcoded into the general processor, eliminating another subsystem.

Read More »


Posted in:
CPU of the Day

November 5th, 2016 ~ by admin

GRAPE-6 Processor: A Gravitational Force of Reckoning

GRAPE-6 Processor – 90MHz – 2000

Understanding the movements of the stars has been on mankind's mind probably since we first stared into the sky.  Through the ages we have learned to predict where a star or planet will be in the sky in the next few months, years, even hundreds of years, but being able to predict the exact orbital details for ALL time is rather more tricky.

This helps us understand how planetary systems form, and the conditions that make that possible.  It allows us to see what happens when two massive black holes pass each other by: will they merge? will they orbit? will one go rogue?  These are interactions that take millions of years, and thus we need to calculate the gravitational forces very accurately.  This isn't a terribly hard problem for two bodies, and is doable for three with little fuss, but for numbers of bodies greater than that the calculations grow rapidly, on the order of N²/2.
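
To make that scaling concrete, here is a minimal, hypothetical Python sketch of the direct-summation approach: every unique pair of bodies gets one force evaluation, which is the N(N−1)/2 ≈ N²/2 growth mentioned above.

```python
import numpy as np

G = 6.674e-11  # gravitational constant

def pairwise_accelerations(pos, mass):
    """Direct-summation gravity: every unique pair is evaluated once,
    so the work grows as N*(N-1)/2, roughly N^2/2 force calculations."""
    n = len(mass)
    acc = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):           # N*(N-1)/2 pairs
            r = pos[j] - pos[i]
            d3 = np.dot(r, r) ** 1.5
            acc[i] += G * mass[j] * r / d3  # pull on body i from body j
            acc[j] -= G * mass[i] * r / d3  # equal and opposite on body j
    return acc
```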

In the late 1980s Tokyo University began work on developing a computer to calculate these forces.  Every gravitational force had to be calculated, with its effect on every other body in the system.  These results were then fed to a commodity computer for summation and final results.  This made the Tokyo project a sort of gravity co-processor, or as they called it a Gravity Pipeline, GRAPE for short.  The GRAPE would do the main calculations and feed its results to another computer.

Read More »

Posted in:
CPU of the Day

October 20th, 2016 ~ by admin

Processors to Emulate Processors: The Palladium II

Cadence Palladium II Processor MCM 1536 cores – 128MB GDDR – Manufactured by IBM

Several years ago we posted an unusual MCM whose purpose was a mystery.  It was clearly made by IBM and clearly high end.  While researching another mystery IBM MCM, both of their identities came to light.  The original MCM is an emulation processor from a Cadence Palladium Emulator/Accelerator system.

In the 1990s IBM had been working on technology to make emulating hardware/software designs more efficient as such designs got more complicated.  At the time it was most common to emulate a system in an FPGA for testing, but as designs grew more complex this became a slower and slower process.  IBM developed the idea of an emulation processor.  This was to be known as CoBALT (Concurrent Broadcast Array Logic Technology).  It was licensed to a company called QuickTurn in 1996.  At its heart the QuickTurn CoBALT was a massively parallel array of boolean logic processors.  Boolean processors are similar to a normal processor

Here is a flipped (and very rough) die from a Palladium II. You can make out the highly repetitive layout of the 768 boolean processors.

but only handle boolean data: logic functions such as AND, OR, XOR, etc.  Perhaps the best known is the boolean sub-processor that Intel built into the 8051; it excelled at bit manipulation.  The same applies to the emulation processors in CoBALT.  Each boolean processor has at its heart a LUT (Look Up Table), with 8 bits encoding the logic function (allowing any of the 256 possible 3-input logic functions) and the 3 gate inputs serving as an index into the LUT, along with the associated control logic, networking logic, etc.
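
To illustrate just the LUT idea described above (a conceptual sketch, not CoBALT's actual implementation), an 8-bit truth table can encode any 3-input boolean function, and the three inputs simply select one bit of it:

```python
def lut3(table: int, a: int, b: int, c: int) -> int:
    """Evaluate a 3-input boolean function encoded as an 8-bit truth table.
    The inputs form a 3-bit index (0-7) that selects one bit of the table."""
    index = (a << 2) | (b << 1) | c
    return (table >> index) & 1

# Example: 3-input AND is true only for index 7, so its table is 0b10000000.
assert lut3(0b10000000, 1, 1, 1) == 1
assert lut3(0b10000000, 1, 0, 1) == 0
```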

A target design is compiled and emulated by the CoBALT system.  The compiling is the tricky part: the entire design is broken down into 3-input logic gates, allowing the emulator to emulate any design.  Each processor element can handle one logic function, or act as a memory cell (as many designs obviously include memory).  The CoBALT had 65 processors per chip, and 65 chips per board, with a system supporting up to 8 boards.  This 33,280 processor system could compile 2 Million gates/Hour.  The CoBALT Plus sped this up a bit and supported 16 boards, doubling capacity and adding on-board memory.

Read More »

October 16th, 2016 ~ by admin

Signetics 2650: An IBM on a Chip

Signetics 2650I – Original Version from May of 1976

The Signetics 2650 processor has always been described as ‘very mini-computer like,’ and for good reason: it truly is very minicomputer-like in design.  It is an 8-bit processor released in July of 1975, made on an NMOS process.  The 2650 has a 15-bit address bus (the upper bit (16) of an address word is reserved for specifying indirect addressing), allowing addressing of up to 32K of memory.  It has 7 registers: R0, which is used as an accumulator, plus 2 banks of 3 8-bit registers.  The 2650 supports 8 different addressing modes, including direct, and indirect with autoincrement/decrement.  It’s clearly a mini-computer design, and there is a reason for that: it was based on one.
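
A rough sketch of that indirect-addressing convention (the field layout here is assumed for illustration, not taken from the 2650 datasheet): the top bit of a 16-bit address word flags indirection, leaving 15 bits to address up to 32K.

```python
def decode_address(word: int, memory: list[int]) -> int:
    """Illustrative decode of a 16-bit address word: bit 15 flags indirect
    addressing, the low 15 bits give the address (32K address space)."""
    indirect = (word >> 15) & 1
    addr = word & 0x7FFF              # 15-bit address
    if indirect:
        addr = memory[addr] & 0x7FFF  # fetch the effective address from memory
    return addr
```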

The 2650 is very closely based on the IBM 1130 mini-computer released in 1965.  Both use 15-bit addressing, many addressing modes, and a set of 3 registers (Signetics added support for 2 banks of 3).  The Signetics 2650 is often noted for its novel use of a 16-bit PSW status register, but this too is from the 1130, which used a 16-bit Device Status Register for talking with various I/O components.  So why would Signetics base a processor released in 1975 on a 1965 mini-computer?

Because the 2650 was designed long before it was released.  J. Kessler was hired by Signetics in 1972 in part to help design an 8-bit processor.  Kessler was hired by Jack Curtis (of Write Only Memory fame) from…IBM.  Kessler designed the architecture, very similar to the IBM 1130's, and Kent Andreas did the silicon layout.  The design contains 576 bits of ROM (microcode mainly), ~250 bits of RAM (for registers, stack, etc) and about 900 gates for logic.  Clock speed was 1.25MHz (2MHz on the -1 version) on an ion-implanted NMOS process, very good for 1972 (this was as fast as the fastest IBM 1130 made), but Signetics was tied up working with Dolby Labs on audio products (noise canceling etc) and didn’t have the resources (or perhaps the desire) to do both, so the 2650 was pushed back to 1975.  In 1972 the IBM 1130 it was inspired by was still being made.  If the 2650 had been released in 1972 it would have had the Intel 4004 and 8008 as competition, both of which were not easy to use, and had complex power supply and clocking requirements.  The 2650 needed only a 5V supply and a simple TTL single-phase clock.

Read More »

October 4th, 2016 ~ by admin

Testing all the ARMs

ARM946E on a Chartered Semiconductor 0.18u Process

ARM is one of the most popular RISC cores used today, and has been for over a decade now.  ARM is an IP company. They license processor designs/architectures for others to use, but do not actually manufacture the processors themselves…or do they?

ARM offers a variety of cores, and licenses them in a variety of different ways.  There are, in general, three main ways to get an ARM design.  Larger companies with many resources (such as Apple, Broadcom, or Qualcomm) will purchase an ARM architecture license.  This isn’t specific to any ARM core in particular (such as, say, an ARM946) but covers the entire ARM architecture, allowing these companies to design their own ARM processors from the ground up.  This takes a lot of resources and talent that many companies lack.

Second, ARM offers RTL (Register Transfer Level) processor models; these are provided in a hardware description language such as VHDL or Verilog.  They can be dropped into a design along with other IP blocks (memory, graphics, etc) and wrapped with whatever a company needs.  This is a fairly common method, and typically the least expensive.  It does require more work and testing though.  Designing a chip is only part of the process. Once it’s designed it still must be fab’d.

ARM7EJ-S on a TSMC 0.18u Process. Wafer #25 from June 2003

Third, ARM offers models that are transistor-level designs, pre-tested on various fab processes.  Pre-tested means exactly what it sounds like: ARM designed them, had them manufactured, and fixed any problems, and can thus say this core will run at this speed on this fab’s process.  Testing and validation may often go as far as a particular fab’s particular process, in a particular package.  It’s more work, and thus costs more, but these make for drop-in ARM cores.  Want to use an ARM946 core on a TSMC 0.18u process in a lead-free Amkor BGA package?  Yah, ARM’s tested that and can provide you with a design they know is compatible.  This allows extremely fast turnaround from concept, to design, to silicon.

In the below picture (click to enlarge) you can see a large variety of ARM cores from the early 2000s.  They span ARM7, ARM9, ARM10 and ARM11 designs.  Each is marked with info as to what exactly it is: the core name, the revision (such as r2p0, meaning major revision 2, pass/subversion 0), the fab (TSMC, UMC, SMIC, Chartered), and the design node (all of these are either 0.18u or 0.13u parts).

21 various ARM design test chips from TSMC, UMC, and Chartered, covering many ARM cores.

Also noted on some is the exact wafer the die was cut from; this is typical of VERY early production tests, usually first-run silicon, so that any physical/manufacturing defects can be identified more easily.  Some design modifications have little to do with the processor itself, but are done to increase yields on a given process/node.

ARM926EJ on a UMC 0.13u Process. 

Package type (in this case most are Amkor BGA) and other features are noted.  Many say ‘ETM,’ which is ARM’s Embedded Trace Macrocell, a debugging tool that allows instruction and data traces of a core in operation, very useful for debugging.  ARM offers an ETM for each of their processor types (ETM9, for example, covers all ARM9-type cores), and it has a revision number of its own as well.

Some of these chips come in an interesting BGA package. The package has a removable die cover for inspection/testing (and possibly modification). Note the large die in the ARM926EJ on the left, though the processor core itself is very small (it’s in the upper left, only a few square mm).  This is done to facilitate bonding into the package: in this type of package there wouldn’t be any way to connect all the bonding wires to the very tiny ARM core, so the die has a lot of ‘wasted’ space on it.

So does ARM make processors? Yup! But only for internal use, to help develop the best possible IP for their clients.

 


Posted in:
CPU of the Day

August 25th, 2016 ~ by admin

Intel i486 Prototype: Intel’s Gamble with CISC

Intel A80486DX SXE19 Engineering Sample – May 1989

The Intel 80486 was announced at COMDEX on April 11th, 1989, just 3 years after the 80386 hit the market.  The 80486 was really a greatly enhanced 80386. It added a few instructions, an on-chip 8KB write-through cache (available off-chip on 386 systems), as well as an integrated FPU.  Instruction performance was increased through a tight pipeline, allowing it to be about twice as fast as the 80386 clock for clock.  Like the 80386 the 80486 was a CISC design, in an era when the RISC processor, in its many flavors, was being touted as the future of ALL computing.  MIPS, SPARC, and ARM all were introduced in the late 1980s.  Intel themselves had just announced a RISC processor, the i860, and Motorola had the 88k series.  Intel in fact was a bit divided, with RISC and CISC teams working on different floors of the same building, competing for the best engineering talent.  Would the future be CISC, with the 80486? Or would RISC truly displace the CISC-based x86 and its 10 years of legacy?

This dilemma is likely why Intel’s CEO, Andy Grove, was nearly silent at COMDEX.  It was only 4 years previous that Mr. Grove, then President, made the decision to exit the memory market and focus on processors, and now a decision would soon loom as to which type of processor Intel would focus on.  Intel eventually ditched the i860, and RISC with it, focusing on the x86 architecture.  It turns out that ultimately CISC vs. RISC didn’t greatly matter; studies have shown that the microarchitecture, rather than the Instruction Set Architecture, is much more important.

Intel A80486DX-25 – SX249 – B4 Mask from Sept 1989 with FPU Bugs

Whether due to the competition from the i860 RISC team, or knowing the market’s demands, the 80486 team knew that the processor had to be executed flawlessly.  They could ill afford delays and bugs.  Samples of the 80486 were scheduled to be released in the 3rd quarter of 1989 with production parts shipping in the 4th quarter.  The above pictured sample is from May of 1989, a quarter ahead of schedule.  Production parts began to ship in late September and early October, just barely beating the announced ship date.

Perhaps due to the rush to get chips shipping, a few minor bugs were found in the FPU of the 486 (similar to bugs found in the FPU of the 387DX).  Chips with the B4 mask revision and earlier (SX249) were affected.  These bugs were relatively minor and quickly fixed in the B5 mask revision (SX250), which became available in late November of 1989, still within Intel’s goal of the 4th quarter.

The 80486 was a success in the market and secured CISC as the backbone of personal computing.  Today, the CISC x86 ISA is still used, alongside the greats of RISC as well.

August 19th, 2016 ~ by admin

CPU of the Day: Motorola MC6801 – The (second) first 6800 MCU

Motorola XC6801L – Early White ceramic package from early 1979. XC denotes a not fully qualified part.

A microcontroller (or microcomputer) is a CPU with additional on-board peripherals, usually RAM, ROM, and I/O, so as to serve as a single-chip (or close to single-chip) solution for a computer system.  As the program space is typically small, they were designed and used for high-volume, low-cost, simple applications.  Today we would refer to these as embedded applications.  The Motorola MC6800, released in 1974, was a decent 8-bit processor.  It was, however, not inexpensive (a fact not lost upon one of its designers, Chuck Peddle, who left to design the 6502).  Initial pricing for the MC6800 was $360, dropping to $175 the next year.

For embedded use, prices need to be in the few-dollars range, with as few chips as possible required for a design.  By 1977 Motorola had a solution, the MC6802.  The MC6802 was an enhanced MC6800 with 64 bytes of RAM and an on-board clock generator.  When combined with the MC6846 (which provided ROM, I/O, and timers) a complete system could be built.  Defective MC6802s were often sold as RAM-less MC6808s.

Motorola MC6802L – Dated March of 1978. The 6802 had 64-bytes of RAM and no ROM.

The MC6802 was followed by the more complex MC6801, which integrated the features of the MC6846 on die and increased the RAM to 128 bytes, making a true 8-bit single-chip microcomputer.  Most sources refer to the MC6801 as being released in 1978; however, it was actually released in 1977, likely at the same time as, or close to, the MC6802.  US Patent Application US4156867, filed on September 9th of 1977, references both processors.  GM was to be the lead customer for the MC6801; it was the MCU of choice for the digital trip meter (TripMaster) of the 1978 Cadillac Seville.  The 1978 Seville began production on September 29, 1977.  It is likely that all of the first production of the 6801 was reserved for GM, and it wasn’t until 1978 and later that Motorola began to market it (it begins to show up in Motorola marketing only in 1979).  The TripMaster was a $920 factory option that proved to be rather unpopular, likely because it added nearly $1000 in cost to a $14,000 car.

Motorola MC68701U4L-1 1987 6801 with upgraded RAM/ROM and Timers

This lack of early availability, coupled with the fact that the 35,000-transistor 6801, while capable, wasn’t particularly inexpensive, led it to have very little success in the market.  The EPROM version, the MC68701, is in fact much more common, likely because it was used in lower-volume products where cost wasn’t such an issue.  In 1979 Motorola attempted to remedy this by releasing the MC6805 series.  This was designed from the ground up to be low cost.  The first versions had half the ROM and half the RAM of the 6801, while keeping the I/O.  They were also available in CMOS (as the MC146805).  They were inexpensive and highly functional, and were widely used.  The 6805 continues to see use today as the 68HC05 and 68HC08 series.

Motorola XC68HC11A0FN – 1987 – Preproduction, Enhanced 6801

The MC6801 was not, however, done.  By this time manufacturing had improved, allowing costs to be lower.  Motorola released an upgraded 6801, the MC6801U4, which expanded the timer functions, increased the ROM to 4K, and increased the RAM to 192 bytes.  In 1985 the MC6801 was upgraded again: a second 16-bit index register was added, as well as true bit-manipulation instructions.  The Motorola MC68HC11, the name change reflecting the greatly enhanced core, was made in many varieties with different sizes of RAM, ROM, and EEPROM.  The MC68HC11A8 was also the first MCU to integrate EEPROM on die, in this case 512 bytes’ worth.  The MC68HC11 series, and its 68HC12 and 68HC16 successors, continue to be made and used today, frequently and rather ironically in automotive applications, where the original MC6801 failed to be a success.

 

 

May 21st, 2016 ~ by admin

Azul Systems Vega 3: 54 Cups of Coffee

Azul Systems V03A0L1-Vega 3 – 54-core RISC Java Processor

Azul Systems was started in 2002 to do what anyone who has used Java wishes for: make it faster and more scalable.  Azul did this using both software (optimized Java compilers/runtime environments) and hardware.  The Vega processor line was Azul’s attempt at hardware acceleration of Java.  This wasn’t a new concept; many companies have created hardware implementations to execute Java.  Notable are the Jazelle extensions from ARM, which can directly execute Java byte-codes, and Sun’s picoJava processor, which did similar.  The Vega takes a rather different route though.  Azul found that direct execution of Java byte-codes wasn’t really that important if you had very efficient JIT (Just In Time) compilation to an efficient architecture.  This allows the processor to be a bit more adaptable, as you now have a layer between the hard-to-change hardware and the Java feeding it.  New instructions, or workarounds/speed-ups, become easier to implement.

The Vega 3, the last of the Vega series, is a 54-core processor; each core is a classic 3-address 64-bit RISC processor with 32 registers and 16K of instruction cache plus 16K of data cache.  The architecture is designed to be ‘Java friendly,’ with a fairly weak memory model for easier scaling, support for more robust garbage collection, and not a large focus on FPU performance.  There is 12MB of L2 cache on chip as well (each group of 9 cores shares 2MB).  The chips are fab’d by TSMC on a 65nm or 90nm process (it isn’t clear which from Azul’s documentation).  All registers and caches support ECC, and the chips self-report any problems, allowing the system (which may use up to 16 chips, for 864 cores) to disable any misbehaving processor or memory.

Vega 3 – 54-core die. Truly massive die. Software though allows workaround for many hardware defects.

The Vega 3, and the systems it was used in, allowed Java to be scaled to much larger heap sizes (500G+) and core counts, without coherency problems.  Many institutions (especially financial ones) still use Java programs that were written long ago; recoding them would speed them up, but that is not practical.  The Vega 3 (and other Azul products) allows old code to be run faster with no modifications.

Azul sold many systems running the Vega processors but eventually moved to software-only solutions that could efficiently run Java on existing x86 hardware.  The methods are similar, there is just no longer a need for custom hardware to run them on.  Azul appliances can be added to any datacenter to catch and accelerate Java applications.

Azul wasn’t the first company to accelerate Java, and they certainly won’t be the last.  Java’s simplicity and platform independence will keep it around, and the ability to run decades-old code fast and safely on modern hardware will continue to drive products.  It’s like COBOL all over again…

Posted in:
CPU of the Day

April 28th, 2016 ~ by admin

The Evolution of the Intel 8051 Processes

Intel C8051-3 – 1981 – Original 3.5u HMOS

That’s not a typo: we’re going to look briefly at the technology processes (rather than the processors themselves) Intel went through in the first 5 years of the MCS-51 microcontrollers, and at the exceedingly confusing nature of the resulting naming.  When the Intel 8051 series was released in 1980 it was made on two different processes.  The 8031/8051 (non-EPROM versions) were made on the HMOS-I process, a 3.5 micron single-poly process.

Intel C8751-8 – 1982 – Original 3.5u HMOS-E

The EPROM version, the 8751, was made on an EPROM process, HMOS-E, which was still a 3.5 micron process, but with 2 poly layers.  This resulted in some slight differences in electrical characteristics (not to mention the programming features not needed on the MaskROM and ROMless versions).

Intel 8751H B-2 ENG. SAMPLE – 1985 -HMOSII-E – 2u

Intel then moved to the HMOS-II (Intel Process P414.1) process in 1984.  This was a shrink to 2 microns, and the EPROM version was also shrunk, but again using a slightly different EPROM process (Intel Process P421.X).  The HMOS-II MaskROM and ROMless versions received the suffix AH: ‘A’ denoting a minor update to the architecture, and ‘H’ the new HMOS-II process.  The EPROM version did not see the same updates though; it received EPROM security-bit support and was simply called the 8751H.

Read More »

Posted in:
CPU of the Day

April 14th, 2016 ~ by admin

DEC NVAX++ NV5: The End of VAX

DEC NVAX 21-34457-05 246B – 1992 – 71MHz

About a year ago we covered the DEC RIGEL VAX Processor.  After RIGEL, DEC moved to make a single-chip VAX processor that would include the CPU, FPU, and cache controller on one die.  Work on the design began in 1987, with first silicon shipping in 1991.  Performance ended up being as good as or better than the very high-end VAX 9000 systems (implemented in ECL logic).

The original NVAX processor was made on a 0.75u 3-layer CMOS process (DEC CMOS-4) and contained 1.3 million transistors in a 339-pin CPGA package.  Initial clock speed, in 1991, was 71MHz, making the NVAX then the fastest CISC processor made.  Speeds ramped up to 90.9MHz at the high end, with a lower end of 62.5MHz.  The first NVAX models were identified as 246B and 246C.  Later versions, made well into 1996, were made on the CMOS-4S process, a 10% shrink to 0.675u, and were labeled 1001C.

Internally NVAX was very familiar: the FPU was largely reused directly from RIGEL.  The NVAX also maintained the 4-phase clocking scheme from RIGEL, but moved the clock generator on chip.  It also kept the 2K of on-die instruction cache from RIGEL, but added an 8K mixed data/instruction cache as well.  An off-chip L2 cache was supported in sizes of 256K, 512K, 1M, or 2M.  The NVAX continued the 6-stage pipeline of RIGEL with some enhancements.  One of the greatest performance enhancements over RIGEL is the handling of pipeline stalls: in the RIGEL pipeline, a stall in one stage would stall the entire pipeline, whereas on NVAX, in most cases, a stall in one stage does not prevent the other stages from continuing.

At nearly the same time as the development of the NVAX, DEC was also developing a competitor to MIPS, a RISC architecture.  This new architecture was codenamed EVAX, for Enhanced VAX, and was a purely RISC design that could run translated VAX CISC code with very little performance penalty.  It did, however, borrow from VAX: like the NVAX, EVAX used the FPU from RIGEL.  DEC went on to brand the EVAX as Alpha AXP, to separate it from the VAX line, though its internal naming of EV4, EV5, etc. was left intact, as the last remnant of VAX.

DEC 21-40568-02 299D NVAX++ 170.9MHz – 1996 – from a VAX7800

Having two high-performance processor types at the same time left DEC in a bit of a dilemma, so they created a third, known as the NVAX+ (DEC 262D).  The NVAX+ was originally made on the same CMOS-4 process as the NVAX and ran at 90.9MHz.  It was meant to be a bridge between the VAX line and the Alpha AXP: an NVAX core wrapped in an EVAX (Alpha AXP) external interface.  It was made in the same 431-pin PGA as the Alpha 21064 and was pin-for-pin compatible, so the same board could be used for either.  It supported more L2 cache than the NVAX, with six cache sizes (4MB, 2MB, 1MB, 512KB, 256KB, 128KB).

In 1994 the NVAX+ was shrunk to the DEC CMOS-5 4-layer 0.5 micron process, resulting in the NVAX++ (DEC 299D), which ran from 133 to 170.9MHz.  These remained the fastest CISC processors until Intel released the Pentium Pro at 180 and 200MHz in 1996.  Ultimately Intel’s dominance, and the coming dominance of RISC performance, were the writing on the wall, and the VAX, and not long after it DEC itself, were doomed to reside in the history books.  By 1997 the NVAX++ was off the market.  In 1997 the DEC Alpha team was operating out of offices owned by Intel (who also took over DEC’s fabs), and in 1998 the remains of DEC, and the Alpha team, were bought by Compaq.  And by 2004 Alpha was phased out in favor of Itanium (a now rather ironic decision by HP/Compaq).

 

Posted in:
CPU of the Day