June 5th, 2017 ~ by admin

SiFive FE310: Setting The RISC Free

SiFive FE310 RISC-V Processor. Early LSI SPARC Processor for size comparison. Both are based on U.C. Berkeley RISC designs.

The idea of RISC (Reduced Instruction Set Computer) processors began in education, specifically at the University of California, Berkeley in the early 1980s, and it was out of universities that some of the most famous RISC designs came.  MIPS, still in use today, started life as a project at Stanford University, and SPARC, made famous by Sun and now made by Oracle and Fujitsu, started life as a Berkeley project.  Universities have continued to work with RISC architectures for research and teaching.  The simplicity of RISC designs makes them an ideal educational tool for learning how computers and processors function at their most basic level.

By the late 1980s RISC had become a commercial revolution, with nearly every player having its own RISC design.  AMD (29k), Intel (i960), HP (PA-RISC), Weitek (XL8000), MIPS, SPARC, ARM, Hitachi (SH-RISC), IBM (POWER), and others offered their take on the RISC design.  Most were proprietary and a few were licensable, but none were open architectures for anyone to use.

Unfortunately, outside of the university, RISC processors are not as simple.  The architectures and their use may be, but licensing them for a design is not.  It can often take more time and effort to license a modern RISC processor than it does to actually implement it.  The costs of using these architectures, both in time and money, often prohibit their very use.

SiFive FE310 – Sample Donated by SiFive. Full 32-bit RISC on a 7.2mm² die in a ~36mm² package

It is out of this that SiFive began.  SiFive was founded by the creators of the first commercially successful open RISC architecture, known as RISC-V.  RISC-V was developed at Berkeley, fittingly, in 2010 and was designed to be a truly useful, general purpose RISC processor: easy to design with, easy to code for, and with enough features to be commercially useful, not limited to the classroom.  It is called RISC-V because it is the fifth RISC design developed at Berkeley, RISC I and RISC II being designed in 1981, followed by SOAR (Smalltalk On A RISC) in 1984 and SPUR (Symbolic Processing Using RISC) in 1988.  RISC-V has already proved to be a success.  It is licensed freely, and in a way (BSD license) that allows products that use it to be either open or proprietary.  One of the better-known users is Nvidia, which announced it is replacing its own proprietary FALCON processors (used in its GPUs and Tegra processors) with RISC-V.  Samsung, Qualcomm, and others are already using RISC-V.  These cores are often so deeply embedded that their existence goes without mention, but they are there, working in the background to make whatever tech needs to work, work.

The RISC-V architecture supports 122 instructions, 98 of which are common to almost all prior RISC designs and 18 common to a few.  Six completely new instructions were added to handle unique attributes of the architecture (using a 64-bit Performance Register in a 32-bit arch.) and to support a more powerful sign-injection instruction (which can be used for absolute value, among other things).  It uses 31 general-purpose 32-bit registers plus Register 0, which is hard-wired to the constant ‘0’, with optional support for 32 floating point registers.  True to the RISC design, it is a pure Load/Store processor: the only accesses to memory are via the Load/Store instructions.
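To make the sign-injection idea concrete, here is a minimal Python sketch that models it on 32-bit float bit patterns.  The function names mirror the FSGNJ.S / FSGNJN.S / FSGNJX.S mnemonics from the RISC-V single-precision floating point extension; the helper names (to_bits, from_bits, fabs, fneg) are my own and purely illustrative, and this is a software model, not how the hardware is built.

```python
import struct

SIGN_BIT = 0x80000000   # bit 31 of an IEEE 754 single-precision value
MAGNITUDE = 0x7FFFFFFF  # everything except the sign bit


def to_bits(x: float) -> int:
    """Reinterpret a float as its 32-bit single-precision bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(b: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def fsgnj(a: float, b: float) -> float:
    """Magnitude of a, sign of b."""
    return from_bits((to_bits(a) & MAGNITUDE) | (to_bits(b) & SIGN_BIT))


def fsgnjn(a: float, b: float) -> float:
    """Magnitude of a, opposite of b's sign."""
    return from_bits((to_bits(a) & MAGNITUDE) | (~to_bits(b) & SIGN_BIT))


def fsgnjx(a: float, b: float) -> float:
    """Magnitude of a, sign = sign(a) XOR sign(b)."""
    sign = (to_bits(a) ^ to_bits(b)) & SIGN_BIT
    return from_bits((to_bits(a) & MAGNITUDE) | sign)


def fabs(x: float) -> float:
    # A sign XORed with itself is always 0, so the result is non-negative.
    return fsgnjx(x, x)


def fneg(x: float) -> float:
    # Injecting the inverse of x's own sign flips it.
    return fsgnjn(x, x)
```

With both operands set to the same register, one family of instructions covers absolute value, negation, and a plain register move, which is why no dedicated instructions are needed for them.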

Intel 4004 with 5 SiFive RISC Processors. The 4004 was meant for a calculator. The FE310 is meant for whatever your mind may dream up.

SiFive is unique among RISC IP companies.  They not only license IP but also sell processors and dev boards.  The FE310 (Freedom Everywhere 310) is a 320MHz RISC-V processor with 16K of I-cache and 16K of scratchpad RAM, fabbed by TSMC on a 180nm process.  Even on this process, which is now a commodity process, the FE310’s efficient design results in a die size of only 2.65mm x 2.72mm.  On a standard 200mm wafer, this results in 3500 die per wafer, greatly helping lower the cost.  It’s an impressive chip, and one that is completely open source.  What is more impressive is how SiFive cores are licensed: it is a simple and straightforward process.  The core (32-bit E31 or 64-bit E51) can be configured on SiFive’s site, with pricing shown as you go.  The license is a simple 7 page document that can be signed and submitted online.  Pricing starts at $275,000 and is a one-time fee; there are no continuing royalty payments.  The entire process can be completed in a week or less.
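As a rough sanity check on the die-per-wafer figure, here is a small Python sketch using a common textbook approximation for gross die per wafer.  The function name and the choice of formula are my own assumptions, not anything from SiFive or TSMC.  The formula ignores scribe lanes, edge exclusion, test structures and yield, so it should land somewhat above the 3500 quoted above.

```python
import math


def gross_die_per_wafer(wafer_diameter_mm: float,
                        die_w_mm: float,
                        die_h_mm: float) -> int:
    """Approximate gross die count: wafer area / die area, minus an
    edge-loss term for partial dies around the circumference.
    Ignores scribe lanes, edge exclusion and yield (an assumption)."""
    die_area = die_w_mm * die_h_mm
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area)
    return int(wafer_area / die_area - edge_loss)


# FE310 numbers from the article: 2.65mm x 2.72mm die, 200mm wafer.
die_area = 2.65 * 2.72
estimate = gross_die_per_wafer(200, 2.65, 2.72)

print(f"die area: {die_area:.2f} mm^2")
print(f"gross die estimate: {estimate}")
```

The gap between this idealized count and the article's figure is about what you would expect once real-world wafer overhead is taken out.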

In comparison, ARM, the biggest licensor of RISC processors, does not publish pricing, charges 1-2% royalties on every chip made, and has a license process that can take over a year.  The base fees start at around $1 million and go into the tens of millions, depending on how you want to use the IP, where it will be, and for how long.  For many small companies and users this is simply not feasible, and it is these smaller users that SiFive wishes to work with.  Licensing a processor for the next great tech should not be the hurdle that it has become.  Many great ideas never make it to fruition due to these roadblocks.  We look forward to finding SiFive processors and cores in all sorts of products in the future.

Thanks to SiFive for their generous donation of several FE310 processors to the CPU Shack Museum.

January 28th, 2017 ~ by admin

Stratus: Servers that won’t quit – The 24 year running computer.

Stratus XA/R (courtesy of the Computer History Museum)

Making the rounds this week is the Computer World story of a Stratus Tech. computer at a parts manufacturer in Michigan.  This computer has not had an unscheduled outage in 24 years, which seems rather impressive.  Originally installed in 1993, it has served well.  In 2010 it was recognized as the longest-serving Stratus computer, at 17 years by then.  Phil Hogan, who originally installed the computer in 1993 and continues to maintain it to this day, said in 2010, “Around Y2K, we thought it might be time to update the hardware, but we just didn’t get around to it.”  In other words, if it’s not broke, don’t fix it.

Stratus computers are designed very similarly to those used in space.  The two main differences are: 1) there is no need for radiation-tolerant designs (let’s face it, if radiation tolerance becomes an issue in Michigan, there are things of greater importance than the server crashing) and 2) hot-swappable components.  Nearly everything on a Stratus is hot-swappable.  Stratus servers of this type are based on an architecture they refer to as pair and spare.  Each logical processor is actually made from 4 physical CPUs, arranged in 2 pairs.

Stratus G860 (XA/R) board diagram. Each board has 2 voting i860s (the pair) and each system has 2 boards (the spare).  The XP-based systems were similar but had more cache and supported more CPUs.

Each pair executes the exact same code in lock-step.  CPU check logic compares the results from each, and if there is a discrepancy (one CPU comes up with a different result than the other), the system immediately disables that pair and uses the remaining pair.  Since both pairs are working at the same time there is no fail-over time delay; it’s seamless and instant.  The technician can then pull the misbehaving processor rack out and replace it while the system is running.  Memory, power supplies, etc. all work in similar fashion.
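The pair and spare idea can be sketched in a few lines of Python.  This is a toy model under my own assumptions: the class names (Pair, LogicalProcessor) and the "CPUs" (plain functions, one of which gives a wrong answer for a single input) are hypothetical, and real Stratus check logic works in hardware on every bus cycle rather than on function results.

```python
from typing import Callable, List, Optional

Cpu = Callable[[int], int]  # a "CPU" here is just a function of its input


class Pair:
    """Two CPUs run in lock-step; any disagreement takes the pair offline."""

    def __init__(self, cpu_a: Cpu, cpu_b: Cpu) -> None:
        self.cpus = (cpu_a, cpu_b)
        self.online = True

    def step(self, x: int) -> Optional[int]:
        if not self.online:
            return None
        a, b = (cpu(x) for cpu in self.cpus)
        if a != b:               # check logic: results must match exactly
            self.online = False  # disable the whole pair, not just one CPU
            return None
        return a


class LogicalProcessor:
    """Both pairs execute every step, so there is no fail-over delay."""

    def __init__(self, pairs: List[Pair]) -> None:
        self.pairs = pairs

    def step(self, x: int) -> int:
        results = [pair.step(x) for pair in self.pairs]
        healthy = [r for r in results if r is not None]
        if not healthy:
            raise RuntimeError("no healthy pair left")
        return healthy[0]


def good(x: int) -> int:
    return x * 2


def flaky(x: int) -> int:
    # Hypothetical fault: wrong answer for one particular input.
    return x * 2 + (1 if x == 3 else 0)


pair_a = Pair(good, flaky)
pair_b = Pair(good, good)
cpu = LogicalProcessor([pair_a, pair_b])
outputs = [cpu.step(x) for x in (1, 3, 4)]
```

Note that a pair cannot tell which of its two CPUs is wrong, only that they disagree, which is why the whole pair is retired and the spare pair carries on.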

These systems are typically used in areas where downtime is absolutely unacceptable, such as banking and credit card processing.  The exact server in this case is a Stratus XA/R 10.  This was Stratus’s gap filler.  Since their creation in the early 1980s their servers had been based on Motorola 68k processors, but in the late 1980s they decided to move to a RISC architecture and chose HP’s PA-RISC.  There was a small problem with this: it wasn’t ready, so Stratus developed the XA line to fill the gap of several years until it was.  The first XA/R systems became available in early 1991 and cost from $145,000 to over $1 million.

Intel A80860XR-33 – 33MHz as used in the XA/R systems. Could be upgraded to an XP.

The XA is based on another RISC processor, the Intel i860XR/XP.  Initial systems were based on 32MHz i860XR processors.  The 860XR has 4K of I-cache and 8K of D-cache and typically ran at 33MHz.  Stratus’s speed rating may be based on the effective speed after the CPU check logic is applied, or they may have downclocked it slightly for reliability.  Later XA/R systems were based on the second-generation i860XP.  The 860XP ran at 48MHz, had increased cache sizes (16K/16K), and had some other enhancements as well.  These servers continued to be made until the Continuum Product Line (using Hewlett Packard “PA-RISC” architecture) was released in March of 1995.

This type of redundancy is largely a thing of the past, at least for commercial systems.  The use of the cloud, with server farms made of hundreds, thousands, and often more computers that are transparent to the user, has achieved much the same goal, provided one’s connection to the cloud is also redundant.  Mainframes and supercomputers are designed for fault tolerance, but most of it is now handled in software rather than pure hardware.

April 14th, 2016 ~ by admin

DEC NVAX++ NV5: The End of VAX

DEC NVAX 21-34457-05 246B – 1992 – 71MHz

About a year ago we covered the DEC RIGEL VAX Processor.  After the RIGEL, DEC moved to make a single chip VAX processor that would include the CPU, FPU, and cache controller on one single die.  Work on the design began in 1987, and first silicon shipped in 1991.  Performance ended up being as good as or better than the very high end VAX 9000 systems (implemented in ECL logic).

The original NVAX processor was made on a 0.75u 3-Layer CMOS process (DEC CMOS-4) and contained 1.3 million transistors in a 339 pin CPGA package.  Initial clock speed in 1991 was 71MHz.  NVAX was then the fastest CISC processor made.  Speeds ranged from 62.5MHz at the low end up to 90.9MHz at the high end.  The first NVAX models were identified as 246B and 246C.  Later versions, made well into 1996, were made on the CMOS-4S process, a 10% shrink to 0.675u, and were labeled 1001C.

Internally NVAX was very familiar; the FPU was largely reused directly from RIGEL.  The NVAX also maintains the 4-phase clocking scheme from RIGEL, but moves the clock generator on chip.  It also maintained the 2K of on-die instruction cache from RIGEL, but added an 8K mixed data/instruction cache as well.  An L2 cache was supported in sizes of 256K, 512K, 1M or 2M, located off chip.  The NVAX continued the 6-stage pipeline of RIGEL with some enhancements.  One of the greatest performance enhancements over RIGEL is the handling of pipeline stalls.  In the RIGEL pipeline, a stall in one stage would stall the entire pipeline, whereas on NVAX, in most cases, a stall in one stage does not prevent the other stages from continuing.

At nearly the same time as the development of the NVAX, DEC was also developing a competitor to MIPS, a RISC architecture.  This new RISC architecture was codenamed EVAX, for Enhanced VAX, and was a purely RISC architecture that could run translated VAX CISC code with very little performance penalty.  It did, however, borrow from VAX: like the NVAX, EVAX used the FPU from the RIGEL.  DEC went on to brand the EVAX as Alpha AXP, to separate it from the VAX line, though its internal naming of EV4, EV5, etc. was left intact, as the last remnant of VAX.

DEC 21-40568-02 299D NVAX++ 170.9MHz – 1996 – from a VAX7800

Having two high performance processor types at the same time left DEC in a bit of a dilemma, so they created a third, known as the NVAX+ (DEC 262D).  The NVAX+ was originally made on the same CMOS-4 process as the NVAX and ran at 90.9MHz.  The NVAX+ was meant to be a bridge between the VAX line and the Alpha AXP.  It was an NVAX core wrapped in an EVAX (Alpha AXP) external interface.  It was made in the same 431-pin PGA as the Alpha 21064 and was pin for pin compatible, so the same board could be used for either.  It supported more L2 cache than the NVAX, supporting six cache sizes (4MB, 2MB, 1MB, 512KB, 256KB, 128KB).

In 1994 the NVAX+ was shrunk to the DEC CMOS-5 4-Layer 0.5 micron process, resulting in the NVAX++ (DEC 299D), which ran from 133-170.9MHz.  These remained the fastest CISC processors until Intel released the Pentium Pro at 180 and 200MHz in 1996.  Ultimately Intel’s dominance, and the coming dominance of RISC performance, were the writing on the wall, and the VAX, and not long after it DEC itself, were doomed to reside in the history books.  By 1997 the NVAX++ was off the market.  In 1997 the DEC Alpha team was operating out of offices owned by Intel (who also took over DEC’s fabs), and in 1998 the remains of DEC, and the Alpha team, were bought by Compaq.  By 2004 Alpha was phased out in favor of Itanium (a now rather ironic decision by HP/Compaq).


Posted in:
CPU of the Day

September 15th, 2013 ~ by admin

Compaq 21364 Processor – The Omega of the Alpha

Compaq 21364 Alpha Prototype – 2002

The DEC Alpha was one of the fastest processors of the 1990’s. The original 21064, manufactured in CMOS, rivaled the fastest ECL processors and blew away most everything else.  Clock speeds were 150-200MHz (eventually hitting 275MHz) at a time when a standard Intel PC was hitting 66MHz, at the very top end. It was manufactured on a 0.75u process using 1.68 million transistors.  The Alpha was a 64-bit RISC design, at a time when 16-bit computing was still rather common.  This gave the architecture a good chance at success and a long life.

The 21064 was followed by the 21164 in 1995 with speeds up to 333MHz on a 0.5u process, now using 9.3 million transistors.  It added an on-die secondary cache (called the Scache) of 96KB as well as 8KB instruction and data caches.  These accounted for 7.2 million transistors; the processor core itself was only around 2.1 million, a small increase over the 21064.  At the time the main competition was the Pentium Pro, the HP PA-8000 and the MIPS R10000.  Improved versions were made by both DEC and Samsung, increasing clock speeds to 666MHz by 1998.

In 1996 DEC released the next in the series, the 21264.  The 21264 dropped the secondary cache from the die and implemented it off chip (now called a Bcache).  The level 1 caches were increased to 64KB each for instruction and data, resulting in a transistor count rise to 15.2 million, 9.2 million of which were for the cache and the branch prediction tables.  Frequency eventually reached 1.33GHz on models fabbed by IBM.  However, the end of the Alpha had already begun.  DEC was purchased by Compaq in 1998, in the midst of the development of the enhanced 21264A.  Compaq was an Intel customer, and Intel was developing something special to compete with the Alpha.

September 3rd, 2013 ~ by admin

ARCA: The Processor that came from the East

Arca-1 Rev2 166MHz Processor – Late 2001

China is generally seen as where devices are made or assembled, rather than where they are designed or invented, certainly in the computer world.  In 2001 a Chinese government-funded venture known as ARCA Technologies changed that.  ARCA (Advanced RISC Computer Architecture) designed and released a completely new processor known as the Arca-1.  At the time there were two design houses working to create China’s first CPU: Arca and BLX.  BLX made the Godson series of processors, which are MIPS32 and MIPS64 implementations.  Arca took a different approach.  Not only did they seek to make an indigenous design, but they wanted to do so with their own Instruction Set Architecture (ISA).

The ArcaISA is, of course, RISC based.  It contains 80 instructions, each with up to 3 operands, and has 32 general purpose registers.  The original Arca-1 design is made on a 0.25 micron process (by which foundry is unclear; BLX used ST) with a 5-stage pipeline, drawing 1.2W at a clock speed of 166MHz.  It contained separate 32-way associative 8K caches for instruction and data.  The Arca also includes a DSP unit that has a pair of Multiply/Accumulate units (MACs) as well as basic SIMD support for media acceleration (including hardware MPEG2).  Not exactly impressive for 2001, but not bad for a first release.  However, there was more to come.

April 2nd, 2013 ~ by admin

CPU of the Day: Motorola XC88110 88000 RISC Processor

MC88100 20MHz – 1992

In the late 1980s Motorola was developing a full 32-bit RISC processor from the ground up.  Initially called the 78000, it was renamed the 88000.  The first implementation of the 88000 Instruction Set Architecture was the 88100.  It included an FPU and integer unit but required a separate chip (the 88200 CMMU) for caching and memory management.  Typically 2 of the 88200s were required (one for instruction cache, one for data, 16KB of cache each).  A 64KB cache version, called the 88204, was also available.  Made on a 1.5u process, the 88100 contained 165,000 transistors while the CMMU chips contained 750,000.  Each chip dissipated 1.5 Watts at 25MHz.  Prices in 1989 were $494 for the CPU and $619 each for the CMMUs.  A complete system of 3 chips would be nearly $2000.  Not exactly competitive pricing.

The initial, and biggest, customers for the 88000 were to be Apple and Ford Motor Company, an unusual combination to say the least.  Apple invested in the 88000 to be the replacement for the 680x0 processors it had been using.  Ford was looking to replace the Intel 8061 processors (from which the MCS-96 MCUs were developed) that had run their EEC-IV engine computers since the early 1980s.  Motorola (as well as Toshiba) had been second sourcing these for Ford for some time.  Ford based its choice of an 88100-based ECU on the assumption that Apple’s adoption of the 88100 would guarantee good software and compiler support.  If Apple stuck with it, that is...

January 18th, 2013 ~ by admin

CPU of the Day: Cypress CY7C601 25MHz SPARC

Cypress CY7C601-25GC – First package with heatspreader – Omitted on later versions

In mid-1987 Sun Microsystems (now owned by Oracle) released the SPARC (Scalable Processor ARChitecture) processor architecture to be used in their computers (replacing the 68k based systems they had previously used).  The SPARC was designed from the outset to be an open architecture, allowing manufacturers to license and build processors that implemented it using whatever technology they wished.  The goal of this was to 1) build a large SPARC ecosystem and 2) keep prices in check by fostering competition among manufacturers.  The SPARC is still used today by Oracle, Fujitsu, the European Space Agency and others, owing largely to its design as an open architecture from the very beginning.

The first version was made by Fujitsu on a 20,000 gate array at 1.2 micron and ran at 16.6MHz.  In July 1988 Cypress (later to be spun off as Ross and make the famous HyperSPARC line) announced the CY7C601.  This was the fastest implementation of the SPARC at the time.  It was made on a 0.8u CMOS process and contained 165,000 transistors, dissipating around 3.3 Watts.  As was typical of many processor designs of the time, it was an integer only processor, requiring a separate chip (the CY7C602) for floating point work.  In September of 1988, Cypress cross licensed the ‘601 to Texas Instruments in exchange for rights to the 8847 floating point processor.  This was mainly to appease one of Cypress’s main customers, who demanded that a second source for the ‘601 chips be available, a demand more common in the 1970s than in 1988, but Cypress obliged.  Cypress also gained the rights to make the next generation SPARC processor that TI was developing.  TI would go on to make many SPARC processors, and continued to be the primary fab for Sun up through the SPARC T2 Plus in 2008.  Oracle now uses TSMC to fab the T3 and T4 SPARC processors.

January 16th, 2011 ~ by admin

CPU of the Week: Intergraph Clipper C4 MCM

Fairchild developed the Clipper architecture in 1986, and sold it to Intergraph in 1987.  The design never enjoyed wide success and was only used in systems made by Intergraph, as well as some by ‘High Level Hardware.’  The design itself was RISC-like and competed mainly with the Sun SPARC processors.

The final version was the C400, which was released in 1993 (preceded by the C100 and C300).  Presumably there was a C200 but I have not seen any documentation on it.  The C400 ran at 50MHz (like the C300) and actually consisted of 3 separate chips: the CPU, the FPU and the CAMMU (Cache/Memory Management Unit).  Intergraph developed their own version of UNIX called CLIX to run on the Clipper, and demonstrated a version of Windows NT that ran on the C400 as well.  Ultimately the lack of software support and the slow adoption killed the Clipper.  While Intergraph was designing the C5, Intel assured them a good supply of processors, and this convinced Intergraph to cancel the C5.

Intergraph C4 MCM

It was also available as an MCM (multi-chip-module) incorporating all three dies in a single ceramic package.  This is one of the nicest looking MCMs I have seen.  Unfortunately the bottom plate was missing when I got it, but the dies are at least visible.  I am not sure which die is which, so if you know, let me know.

February 4th, 2009 ~ by admin

Mystery Uncovered: National Winbond Nuvoton PC97551

So I bought some chips on eBay.  They arrived, and are New Old Stock, made in 2004, really fairly recent.  I have a datasheet for them that is marked Winbond, which I found rather strange, since the chips, as you can see, are marked National. This in itself isn’t super unusual. Occasionally a smaller company will use a larger company’s markings to get design wins. The larger company acts in essence like a co-signer, validating and approving of the design.

National PC97551

Winbond isn’t small though, and the datasheet was marked 2006.  A quick look on Winbond’s site shows no info on this chip. Turns out Winbond spun off their controller business to a company called Nuvoton. And how did Winbond get the design? Yup, National sold off their Super I/O and embedded controller division to Winbond in 2005.

And it is of course a processor, in this case a 16-bit RISC processor running at 20MHz, based on the (formerly) National CompactRISC architecture.

February 3rd, 2009 ~ by admin

Software Configurable Processors – The Stretch S6000 Line

When designing a system, the best performance is often reached by using an ASIC: you can customize it to your design and tweak it for maximum performance.  This, however, adds costly development time and leaves little flexibility.  You could use a general purpose processor; this saves dev time and cost, but at the expense of performance.  What if you could have both? Off the shelf processor technology, AND customizable speed.

You can. This is what Software Configurable Processors are designed for. In simple terms they are a standard CPU core wrapped in an FPGA.  This way instructions for the processor can be configured for maximum speed.  If you have a function in your code that is repetitive, it can be reduced to a single instruction for the processor.


One of the leaders in Software Configurable Processors is called Stretch. Their S6000 line of processors uses a Tensilica Xtensa core (a VLIW RISC design) wrapped in a custom FPGA. In this way the RISC core can be programmed on the fly, providing much faster performance than a normal processor or DSP.