Archive for the 'CPU of the Day' Category

May 21st, 2016 ~ by admin

Azul Systems Vega 3: 54 Cups of Coffee

Azul Systems V03A0L1-Vega 3 - 54-core RISC Java Processor

Azul Systems was founded in 2002 to do what anyone who has used Java wishes for: make it faster and more scalable. Azul did this using both software (optimized Java compilers/runtime environments) and hardware. The Vega processor line was Azul’s attempt at hardware acceleration of Java. This wasn’t a new concept; many companies have created hardware implementations to execute Java. Notable are ARM’s Jazelle extensions, which can directly execute Java bytecodes, and Sun’s picoJava processor, which did similar. The Vega takes a rather different route, though. Azul found that direct execution of Java bytecodes wasn’t really that important if you had very efficient JIT (Just-In-Time) compilation to an efficient architecture. This allows the processor to be a bit more adaptable, as there is now a layer between the hard-to-change hardware and the Java feeding it. New instructions, workarounds, and speed-ups become easier to implement.

The Vega 3, the last of the Vega series, is a 54-core processor; each core is a classic 3-address 64-bit RISC processor with 32 registers and 16K of instruction cache plus 16K of data cache. The architecture is designed to be ‘Java friendly’, with a fairly weak memory model for easier scaling, support for more robust garbage collection, and little focus on FPU performance. There is also 12MB of L2 cache on chip (each group of 9 cores shares 2MB). The chips are fabbed by TSMC on a 65nm or 90nm process (it isn’t clear which from Azul’s documentation). All registers and caches support ECC, and the chips themselves self-report any problems, allowing the system (which may use up to 16 chips, or 864 cores) to disable any misbehaving processor or memory.
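As a quick consistency check on the figures above (assuming each 2MB block of L2 is shared by its own separate group of 9 cores), the core and cache counts work out as follows:

```latex
\frac{54\ \text{cores}}{9\ \text{cores per group}} = 6\ \text{groups},\qquad
6 \times 2\ \text{MB} = 12\ \text{MB of L2},\qquad
16\ \text{chips} \times 54\ \text{cores} = 864\ \text{cores}
```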

Vega 3 - 54-core die. Truly massive die. Software, though, allows workarounds for many hardware defects.

The Vega 3, and the systems it was used in, allowed Java to scale to much larger heap sizes (500GB+) and core counts without coherency problems. Many institutions (especially financial ones) still use Java programs that were written long ago; recoding them would speed them up, but that is not practical. The Vega 3 (and other Azul products) allows old code to run faster with no modifications.

Azul sold many systems running the Vega processors but eventually moved to software-only solutions that could efficiently run Java on existing x86 hardware. The methods are similar; there is simply no longer a need for custom hardware to run them on. Azul appliances can be added to any datacenter to catch and accelerate Java applications.

Azul wasn’t the first company to accelerate Java, and they certainly won’t be the last. Java’s simplicity and platform independence will keep it around, and the ability to run decades-old code fast and safely on modern hardware will continue to drive products. It’s like COBOL all over again…

Posted in:
CPU of the Day

April 28th, 2016 ~ by admin

The Evolution of the Intel 8051 Processes

Intel C8051-3 - 1981 - Original 3.5u HMOS-I

That’s not a typo: we’re going to look briefly at the technology processes (rather than the processors themselves) Intel went through in the first 5 years of the MCS-51 microcontrollers, and the exceedingly confusing naming that resulted. When the Intel 8051 series was released in 1980 it was made on two different processes. The 8031/8051 (non-EPROM) were made on the HMOS-I process, a 3.5 micron single-poly process.

Intel C8751-8 - 1982 - Original 3.5u HMOS-E

The EPROM version, the 8751, was made on an EPROM process, HMOS-E, which was still a 3.5 micron process but with 2 poly layers. This resulted in some slight differences in electrical characteristics (not to mention the programming features not needed on the MaskROM and ROMless versions).

Intel 8751H B-2 ENG. SAMPLE - 1985 - HMOSII-E - 2u

Intel then moved to the HMOS-II (Intel Process P414.1) process in 1984. This was a shrink to 2 microns, and the EPROM version was also shrunk, but again using a slightly different EPROM process (Intel Process P421.X). The HMOS-II MaskROM and ROMless versions received the suffix AH, ‘A’ denoting a minor update to the architecture and ‘H’ the new HMOS-II process. The EPROM version did not see the same updates; it gained EPROM security bit support and was simply called the 8751H.

April 14th, 2016 ~ by admin

DEC NVAX++ NV5: The End of VAX

DEC NVAX 21-34457-05 246B - 1992 - 71MHz

About a year ago we covered the DEC RIGEL VAX processor. After RIGEL, DEC moved to make a single-chip VAX processor that would include the CPU, FPU, and cache controller on one die. Work on the design began in 1987, with first silicon shipping in 1991. Performance ended up being as good as or better than the very high-end VAX 9000 systems (implemented in ECL logic).

The original NVAX processor was made on a 0.75u 3-layer CMOS process (DEC CMOS-4) and contained 1.3 million transistors in a 339-pin CPGA package. Initial clock speed, in 1991, was 71MHz, and NVAX was then the fastest CISC processor made. Speeds ramped up to 90.9MHz at the high end, with a low end of 62.5MHz. The first NVAX models were identified as 246B and 246C. Later versions, made well into 1996, were made on the CMOS-4S process, a 10% shrink to 0.675u, and were labeled 1001C.

Internally NVAX was very familiar; the FPU was largely reused directly from RIGEL. The NVAX also kept RIGEL’s 4-phase clocking scheme but moved the clock generator on chip. It retained RIGEL’s 2K on-die instruction cache and added an 8K mixed instruction/data cache as well. An off-chip L2 cache was supported in sizes of 256K, 512K, 1M, or 2M. The NVAX continued RIGEL’s 6-stage pipeline with some enhancements. One of the greatest performance enhancements over RIGEL is the handling of pipeline stalls: in the RIGEL pipeline, a stall in one stage would stall the entire pipeline, whereas on NVAX, in most cases, a stall in one stage does not prevent the other stages from continuing.

At nearly the same time as the development of the NVAX, DEC was also developing a competitor to MIPS, a RISC architecture. This new architecture was codenamed EVAX, for Enhanced VAX, and was a pure RISC design that could run translated VAX CISC code with very little performance penalty. It did, however, borrow from VAX: like the NVAX, EVAX used the FPU from RIGEL. DEC went on to brand EVAX as Alpha AXP to separate it from the VAX line, though its internal naming (EV4, EV5, etc.) was left intact as the last remnant of VAX.

DEC 21-40568-02 299D NVAX++ 170.9MHz - 1996 - from a VAX7800

Having two high-performance processor types at the same time left DEC in a bit of a dilemma, so they created a third, known as the NVAX+ (DEC 262D). The NVAX+ was originally made on the same CMOS-4 process as the NVAX and ran at 90.9MHz. The NVAX+ was meant to be a bridge between the VAX line and the Alpha AXP. It was an NVAX core wrapped in an EVAX (Alpha AXP) external interface; it was made in the same 431-pin PGA as the Alpha 21064 and was pin-for-pin compatible, so the same board could be used for either. It supported more L2 cache than the NVAX, with six cache sizes (4MB, 2MB, 1MB, 512KB, 256KB, 128KB).

In 1994 the NVAX+ was shrunk to the DEC CMOS-5 4-layer 0.5 micron process, resulting in the NVAX++ (DEC 299D), which ran from 133-170.9MHz. These remained the fastest CISC processors until Intel released the Pentium Pro at 180 and 200MHz in 1996. Ultimately Intel’s dominance, and the coming dominance of RISC performance, were the writing on the wall, and the VAX, and not long after it DEC itself, were doomed to reside in the history books. By 1997 the NVAX++ was off the market. In 1997 the DEC Alpha team was operating out of offices owned by Intel (which also took over DEC’s fabs), and in 1998 the remains of DEC, and the Alpha team, were bought by Compaq. By 2004 Alpha was phased out in favor of Itanium (a now rather ironic decision by HP/Compaq).


March 10th, 2016 ~ by admin

Milandr K1886VE: The PIC That Went to Russia

Milandr K1886VE2U PIC17C756A w/ Flash Memory

We have previously talked about the Microchip PIC17 and its less than stellar success in the market. After being introduced in the early 1990s it was discontinued in the early 2000s, though Microchip continued to provide support (and some devices) to users for some time after that.

In the early 1990s an IC company was formed in Zelenograd, Russia (just a short distance NW of Moscow), the Silicon Valley of Russia and home to the Angstrem and Micron IC design houses. This company was Milandr, one of the first post-Soviet IC companies, with ambitious plans and many highly capable engineers from Soviet times. They are a fabless company, though with their own packaging/test facilities, specializing in high-reliability metal/ceramic packages.

The K1886VE is Milandr’s version of the PIC17C756A, updated for the 21st century. While mask-ROM versions are available, the VE2 version replaces the ROM with modern Flash memory. This is an upgrade that perhaps would have kept the PIC17 alive had Microchip done something similar. It is packaged in a 64-pin white ceramic CQFP package with a metal lid and gold leads, not what one is used to seeing a PIC in. Production of these PICs continues at Milandr (the pictured example is from 2012), as customers still use the parts, mainly in industrial and other applications where reliability is key.

The use of a PIC in high-reliability applications isn’t entirely new. The Microhard MHX-2400 radio system, designed for small satellites such as CubeSats, runs on a PIC17C756A; a version flew on NASA’s GeneSat-1 in 2006 carrying bacteria samples. Milandr does offer radiation-resistant devices, so it’s likely that a Milandr PIC has flown to space as well.


February 13th, 2016 ~ by admin

RCA CDP1855: A Multiplier for the COSMAC

RCA CDP1855CE - 3.2MHz @ 5V

In the 1970s hardware MULT/DIV instructions were fairly uncommon on processors. They were implemented in software (usually by the compiler, or hand coded) as a series of adds and subtracts/shifts. In some cases dedicated hardware, usually a series of bit-slice processors or ‘181 ALUs, was added to handle MULT/DIV requirements.

In 1978 RCA announced the CDP1855 Programmable Multiplier/Divider for the 1802 COSMAC processor. Sampling began in 1979, making this one of the earliest ‘math coprocessors’. The 1855 was an 8×8 multiplier/divider, handling multiplies with add/shift-right operations and divides with subtract/shift-left operations. Like the COSMAC, it was made in CMOS, and at 10V it ran at 6.4MHz, allowing an 8×8 MULT to finish in 2.8us. The CDP1855 was also designed to be cascaded with up to 3 others, providing up to a 32×32-bit multiply in around 12usec, astonishing speed at the time. Even the slower CDP1855CE (using a 5V supply and clocked at 3.2MHz) could accomplish a full 32×32 MULT in 24usec. An AMD Am9511 (released a year earlier) can do a 32×32 fixed-point multiply in 63usec (@ 3MHz).
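The add/shift-right and subtract/shift-left schemes described above are the classic shift-and-add multiply and restoring division. The Python sketch below shows both for a single 8×8 unit. It assumes an unsigned 16-bit product and a 16-bit dividend divided by an 8-bit divisor, and it does not model the CDP1855’s actual registers, control bits, clocking, or cascading; the function names are hypothetical.

```python
def mul8x8(multiplicand: int, multiplier: int) -> int:
    """Unsigned 8x8 -> 16-bit multiply using add / shift-right steps."""
    if not (0 <= multiplicand <= 0xFF and 0 <= multiplier <= 0xFF):
        raise ValueError("operands must be unsigned 8-bit values")
    hi = 0           # upper product byte (the accumulator)
    lo = multiplier  # lower byte: multiplier bits shift out, product bits in
    for _ in range(8):
        if lo & 1:               # current multiplier bit is set: add
            hi += multiplicand   # may carry into bit 8
        # shift the (carry:hi:lo) pair right by one bit
        lo = (lo >> 1) | ((hi & 1) << 7)
        hi >>= 1
    return (hi << 8) | lo


def div16by8(dividend: int, divisor: int) -> tuple:
    """Unsigned 16/8 restoring division using shift-left / subtract steps.

    Returns (quotient, remainder); the quotient must fit in 8 bits.
    """
    if not (0 <= dividend <= 0xFFFF and 1 <= divisor <= 0xFF):
        raise ValueError("need a 16-bit dividend and a non-zero 8-bit divisor")
    rem = dividend >> 8    # upper byte is the partial remainder
    quo = dividend & 0xFF  # lower byte: dividend bits out, quotient bits in
    if rem >= divisor:
        raise OverflowError("quotient would not fit in 8 bits")
    for _ in range(8):
        # shift the (rem:quo) pair left by one bit
        rem = (rem << 1) | (quo >> 7)
        quo = (quo << 1) & 0xFF
        if rem >= divisor:   # trial subtraction succeeds, so keep it
            rem -= divisor
            quo |= 1         # and record a 1 quotient bit
    return quo, rem
```

For example, `mul8x8(0x12, 0x34)` returns `0x3A8` (936) and `div16by8(1000, 7)` returns `(142, 6)`. In this sketch each operation takes exactly 8 iterations, one per operand bit.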

Soviet Integral 588VR2A - CDP1855 'Analog' from 1991

The CDP1855 was designed to interface directly with the 1802 processor but could be used with any other 8-bit processor as well. It was programmable, so the host processor only needed to load the data to be multiplied/divided and the control values telling it what to do, then wait for the results.

As was typical, the Soviets made an ‘analog’ of the CDP1855, called the 588VR2 and 588VR2A. The 588VR2 was packaged in a 24-pin package vs the 28 pins of the CDP1855, so it’s certainly not directly compatible. Soviet IC design houses were instructed and paid to design and make copies of Western devices; original ideas were typically discouraged. This led to many devices that were similar to, but not the same as, their Western counterparts: a design firm could make a somewhat original device and then simply claim to the bureaucrats that it was an ‘analog’ of a certain Western design. Thus the 588VR2 is ‘similar’ to, or an ‘analog’ of, the 1855.

The CDP1855 continued to be made, and sold into the late 1990s, much like the 1802 processor it supported.


February 3rd, 2016 ~ by admin

The End of the Omega

ST STi5500 - The Original 50MHz Transputer based Omega

In January ST announced that they would be exiting the digital set-top box (STB) market, a market they arguably led for the last 20 years, and one that really began with their Omega processor in 1997. The ST Omega processor line, beginning with the STi5500, powered set-top boxes for cable companies, satellite companies, and DVRs, as well as other TV-connected devices. Open up a satellite TV receiver from the last 20 years and you are very likely to find an STi Omega chipset.

The STi5500 was the beginning, and interestingly at its core was an ST20 processor, based on the Inmos Transputer (which ST now owned) from the late 1980s. The Transputer was meant to revolutionize computing by making processors so cheap that they could be embedded into pretty much any other logic device, what today we call an SoC, but in 1985 was a novel idea. At the time it didn’t really succeed, but it ended up seeing its intended use 10+ years later in the Omega. In the 1980s the Transputer reached speeds of up to 30MHz; in the STi5500 it ran at 50MHz with 2K of I-cache and 2K of data cache, as well as 2K of SRAM that could be used as data cache.

ST STi5514 - Enhanced 180MHz Omega

In the early 2000s the Omega was upgraded to a faster ST20 core, eventually hitting 243MHz in the STi5100, with the caches increased to 8K each, plus 8K of SRAM. This was approaching the limit of the ST20 Transputer core. ST needed a core that could support higher speeds running such things as Java and Windows CE, amongst other things, as well as meet the higher resolution and audio quality requirements.

ST handled this in two entirely different ways. First, they licensed the SH-4 32-bit RISC core from Hitachi, a rather surprising move, but STBs were not a market Hitachi was in, so it was in both companies’ best interest. ST was also working on their own new core to replace the ST20, and they had help from a very surprising partner.

January 15th, 2016 ~ by admin

The Oracle SPARC M4 and how it became the M5 (but really didn’t)

Oracle SPARC M4 Wafer # 1 - No date, likely early 2011.

The story of the Oracle SPARC M4 is best told starting with Afara Websystems. Afara was the original developer of the SPARC processor that became the Sun UltraSPARC T1, aka Niagara. Sun acquired Afara in 2002 in a sale that was really designed as a capital campaign for Afara: they had the technology and design for the processor, just not the money to enter the market, and Sun had the money (or so they thought at the time). The T1 was released in 2005 and had 4-8 cores. The individual cores were called the SPARC S1 core (now an open-source SPARC core). In 2007 Sun released Niagara 2, the UltraSPARC T2, with 4-8 cores, based on the second version of the S1, the S2. Both the S1 and S2 were designed with multi-threading as the primary performance point. They excelled at it, and the UltraSPARC T3, released in September 2010 (though it had been sampling as far back as December 2009), did even better at multi-threaded applications. The T3 was also fabbed by TSMC, a change from previous SPARCs, which were almost entirely fabbed by Texas Instruments.

The T3, and the S2 core it was based on, had one major problem: sub-par single-thread performance. While the workloads given to a SPARC server can be tailored somewhat to match what the processor does best (multi-threading), there is always going to be a point at which a single-threaded task must be done, and it will hold up the entire processor if it cannot be processed efficiently.

January 6th, 2016 ~ by admin

Signetics SPC-16/10: Another Mini goes Micro

Philips P860 Minicomputer - 1971

In the 1960s the Dutch Philips Data Systems marketed computers from Honeywell. By 1970 they decided that simply reselling others’ machines was not the best value for them or their customers, and set off to design their own series of minicomputers. The first design was the 8-bit P410, which saw only limited success; it was a bit too mini for the early 1970s, when 16 bits or better was the standard. 1970 saw Philips begin work on its successor in Fontenay-aux-Roses, near Paris, France, a project known internally as Sagittaire. It was released in 1971 as the P800 series of minicomputers, starting with the P850. These were a 16-bit design using 16 16-bit registers. It shipped with 2k x 16 bits of memory and had a cycle time of 3.2 microseconds (~312kHz). Further versions were released that supported up to 32k x 16 bits of memory and faster cycle times.

Philips P851 Chipset

The P800 architecture used the A0 register as the program counter and the last register (A15) as a stack pointer. The design supported up to 64 I/O devices and 64 interrupt levels. The addressing modes include direct, register, indirect, indexed, and indexed indirect, and can operate on bits, bytes (characters), words, and double words. Since the stack is maintained in memory, the stack pointer can be rewritten, preserving the current stack for easier context switches. This is of course important, as the P800 was designed as a multi-user, multi-tasking computer. The P800 instruction set included 97 instructions, including MULT/DIV, though depending on the model some of these were simulated (microcoded). The P800 family found wide use in offices and eventually banks (always the big money market) throughout Europe. It also proved useful in industrial environments, a somewhat underappreciated market for minicomputers at the time.

IRAS - Infrared Astronomical Satellite - Launched 1983 - Based on P851 chipset

In 1979 Philips released the P851, a single-board computer (SBC) version of the P800 series. It included the full 32k words of memory and was an LSI implementation using 5 Philips LSIs: 4 4-bit ALUs and a control path. The P851 was used extensively for industrial automation as well as in Philips’ own PM4400 computer system. This system became the basis of the PM4421 development system, which supported development and emulation of many processors, including the Intel 8085/86/88, Zilog Z80, 650x, Motorola MC68k, Signetics 2650 and many others.

The P851 LSI design was also used in space missions, perhaps most famously the IRAS mission launched in 1983. This was the first full infrared mapping mission, and in its 10-month mission it mapped almost the entire sky in 4 different IR wavelengths; some of the objects IRAS discovered are even today not yet identified. The mission was of course limited by the coolant carried to keep the IR detector cold, but the IRAS satellite continues to orbit Earth to this day, with a 16-bit P851 computer still on board.

January 1st, 2016 ~ by admin

Siemens SAB80199: 16-bits for Europe

Siemens SAB80199 - Introduced 1983 @ 20MHz. This example was made in 1985.

By 1982 Siemens had firmly established themselves as a semiconductor powerhouse in West Germany and the entirety of Western Europe. Their manufacturing prowess made them Intel’s second source of choice in Europe, building 8008, 8080, and 8086/8 processors, with production beginning on the 186 and 286 processors as well. Siemens’ expertise was not just in second-sourcing others’ work; they had their own design/development as well, doing a large amount of work for the industrial automation market among others.

In late 1982 they announced a new 16-bit processor of their own design. Production began in 1983 and continued for over a decade. The 80199 had an 8086-compatible bus, but that’s where the similarities end. The 80199 is often described as a ‘Terminal Control Processor’ or a ‘Printer Controller’, which is a bit deceptive. It was designed from the outset as a real-time processor, capable of handling multiple real-time tasks.

Siemens SAB80199 made in 1990, and still marked 'W. GERMANY'

The SAB80199 was built on a 3 micron NMOS process and contains 40,000 transistors on a 45mm² die. Clock speed is 20MHz (faster than almost anything else in 1983), with an instruction cycle of 0.5 microseconds. It moved many of the RTOS functions from software (or an external chip like Intel’s 80130 RTOS co-processor for the 808x) to on-chip hardware. It had 8 status registers, 8 instruction pointers, and 8 sets of registers. This allowed very rapid task switching, as each task’s data did not have to be saved/restored; a complete task switch took 1 microsecond. In addition the 80199 had another feature that was rather novel at the time: cache. The processor contained an on-chip instruction cache that could hold 16 16-bit instructions. For some code, such as a simple loop, all of the loop’s instructions would reside on chip, resulting in very fast execution. Today of course caches for data/instructions are normal, and very large, measured in KB and MB, but in 1983 they were virtually unknown.
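The paragraph above says the 80199 kept 8 instruction pointers, 8 status registers, and 8 register sets on chip, so a task switch did not have to save or restore anything. The toy Python model below is a sketch, not a description of the real 80199 programming model: it contrasts that banked approach with a conventional single-register-set CPU that must spill and reload context on every switch. The class names, `REGS_PER_SET = 8`, and the memory-transfer counting are hypothetical illustrations, and timing (the quoted 1 microsecond switch) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

NUM_CONTEXTS = 8  # from the text: 8 IPs, 8 status registers, 8 register sets
REGS_PER_SET = 8  # ASSUMPTION: illustrative size, not a documented 80199 figure


@dataclass
class Context:
    """One task's state: instruction pointer, status word, general registers."""
    ip: int = 0
    status: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * REGS_PER_SET)

    def copy(self) -> "Context":
        return Context(self.ip, self.status, list(self.regs))


class BankedCPU:
    """Hypothetical model: every task owns an on-chip context, so a task
    switch only changes which bank is active. Nothing touches memory."""

    def __init__(self) -> None:
        self.banks = [Context() for _ in range(NUM_CONTEXTS)]
        self.active = 0
        self.mem_transfers = 0  # never incremented in this model

    @property
    def ctx(self) -> Context:
        return self.banks[self.active]

    def switch_to(self, task: int) -> None:
        if not 0 <= task < NUM_CONTEXTS:
            raise ValueError("only 8 hardware contexts")
        self.active = task


class SingleBankCPU:
    """Hypothetical conventional CPU: one register set, so a switch stores
    the outgoing task's context and loads the incoming one from memory."""

    def __init__(self) -> None:
        self.ctx = Context()
        self.saved = [Context() for _ in range(NUM_CONTEXTS)]
        self.active = 0
        self.mem_transfers = 0  # words moved by context switches

    def switch_to(self, task: int) -> None:
        if not 0 <= task < NUM_CONTEXTS:
            raise ValueError("task number out of range")
        words = 2 + REGS_PER_SET                   # ip + status + registers
        self.saved[self.active] = self.ctx.copy()  # store outgoing context
        self.ctx = self.saved[task].copy()         # load incoming context
        self.active = task
        self.mem_transfers += 2 * words            # one store plus one load


def run_demo(cpu) -> int:
    """Give tasks 0..3 distinct state, revisit them, and check nothing leaked.
    Returns the number of words the CPU moved to or from memory."""
    for t in range(4):
        cpu.switch_to(t)
        cpu.ctx.ip = 0x10 * t
        cpu.ctx.regs[0] = 100 + t
    for t in range(4):
        cpu.switch_to(t)
        assert cpu.ctx.ip == 0x10 * t and cpu.ctx.regs[0] == 100 + t
    return cpu.mem_transfers
```

Both models preserve every task’s state, but `run_demo(BankedCPU())` returns 0 while `run_demo(SingleBankCPU())` returns 160 (8 switches × 2 × 10 words under the assumed register count), which is the save/restore traffic the on-chip register sets avoid.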

In 1983 the ‘West Europe Report’ called the Siemens 80199 the ‘Fast Bavarian’. Fast indeed, and it was adopted across Europe, but it never made it to the American market in any quantity. It is perhaps one of the ‘forgottens’, but it certainly deserves a place in the history of real-time computing.

December 11th, 2015 ~ by admin

Akatsuki: Dawn rises again at Venus

Akatsuki - By now its main antenna is probably brown or black from being baked by the sun. Powered by a NEC uPD55117B-018 16-bit processor.

Akatsuki, Japanese for Dawn, was launched in May 2010 on a JAXA H-IIA rocket for a journey to the morning star, Venus. The H-IIA flight computer runs on a space-rated version of the NEC V70 32-bit processor, running the NEC RX616 RTOS, a processor significantly faster than that of the interplanetary probe it was launching.

“it will have a short cruise to Venus, entering its long, elliptical orbit in December. Its mission should last several years.”

In space, things don’t always go as planned…

On December 7th Akatsuki entered orbit around Venus, in December of 2015 rather than 2010. Due to a valve in the fuel pressurization system not opening all the way, the orbital insertion engine ran much too lean on its attempt to enter orbit, causing it to overheat and catastrophically fail. This left the probe in a heliocentric orbit, moving away from Venus. The Japan Aerospace Exploration Agency (JAXA) was not deterred: Akatsuki’s orbit would eventually meet up with Venus again, almost exactly 5 years later. JAXA determined they could use the probe’s attitude control thrusters, which feed off the same fuel tank as the failed main thruster, to insert Akatsuki into a highly elliptical, yet still useful, orbit. Had the attitude control system used a separate fuel system (actually the more common design), this would not have been possible, as the maneuver required a relatively large amount of fuel, fuel that was available on Akatsuki only because the main engine had failed and been shut down before its burn was completed. It should be noted that such a maneuver had never previously even been proposed, let alone attempted. There was, however, another small problem…
