October 20th, 2016 ~ by admin

Processors to Emulate Processors: The Palladium II

Cadence Palladium II Processor MCM 1536 cores - 128MB GDDR - Manufactured by IBM

Several years ago we posted an unusual MCM whose purpose was a mystery. It was clearly made by IBM and clearly high-end. While we were researching another mysterious IBM MCM, the identities of both came to light. The original MCM is an emulation processor from a Cadence Palladium emulator/accelerator system.

In the 1990s IBM was working on technology to make emulating hardware/software designs more efficient as those designs grew more complicated. At the time it was most common to emulate a system in FPGAs for testing, but as designs grew more complex this became an ever slower process. IBM developed the idea of an emulation processor, which became known as CoBALT (Concurrent Broadcast Array Logic Technology), and licensed it to a company called QuickTurn in 1996. At its heart the QuickTurn CoBALT was a massively parallel array of boolean logic processors. Boolean processors are similar to normal processors but handle only boolean data, performing logic functions such as AND, OR, and XOR.

Here is a flipped (and very rough) die from a Palladium II. You can make out the highly repetitive layout of the 768 boolean processors.

Perhaps the best-known example is the boolean sub-processor that Intel built into the 8051, which excelled at bit manipulation. The emulation processors in CoBALT work on the same principle. At the heart of each boolean processor is a LUT (Look-Up Table) holding 8 bits that encode the logic function (enough to represent any of the 256 possible 3-input logic functions), with the 3 gate inputs serving as the index into the LUT. Around it sit the associated control logic, networking logic, and so on.
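To make the LUT idea concrete, here is a minimal Python sketch of a single 3-input LUT element. The bit ordering (input a as the most significant index bit) and the helper names `make_lut3` and `eval_lut3` are assumptions for illustration; this is not how IBM or Cadence actually laid out their tables.

```python
from typing import Callable


def make_lut3(fn: Callable[[int, int, int], int]) -> int:
    """Build the 8-bit truth table for a 3-input boolean function.

    Bit i of the result holds fn(a, b, c), where i = (a << 2) | (b << 1) | c.
    This bit ordering is an assumption chosen for illustration.
    """
    table = 0
    for i in range(8):
        a, b, c = (i >> 2) & 1, (i >> 1) & 1, i & 1
        if fn(a, b, c) & 1:
            table |= 1 << i
    return table


def eval_lut3(table: int, a: int, b: int, c: int) -> int:
    """Evaluate a 3-input LUT: the three inputs form a 3-bit index into the table."""
    index = (a << 2) | (b << 1) | c
    return (table >> index) & 1


# Any 3-input function fits in 8 bits, so there are 2**8 = 256 possible tables.
AND3 = make_lut3(lambda a, b, c: a & b & c)  # only index 7 is set -> 0x80
OR3 = make_lut3(lambda a, b, c: a | b | c)   # every index except 0 -> 0xFE
XOR3 = make_lut3(lambda a, b, c: a ^ b ^ c)  # odd-parity indices 1, 2, 4, 7 -> 0x96

print(hex(XOR3), eval_lut3(XOR3, 1, 0, 1))  # 0x96 0
```

Changing the logic function means loading a different 8-bit value; the evaluation hardware (here, a shift and a mask) stays the same, which is what lets one array of identical processors emulate arbitrary logic.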

A target design is compiled and then emulated by the CoBALT system. Compiling is the tricky part: the entire design is broken down into 3-input logic gates, which allows the emulator to emulate any design. Each processor element can handle one logic function or act as a memory cell (as many designs obviously include memory). The CoBALT had 65 processors per chip and 65 chips per board, with a system supporting up to 8 boards. This 33,280-processor system could compile 2 million gates/hour. The CoBALT Plus sped this up somewhat, supported 16 boards (doubling capacity), and added on-board memory.
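As a minimal sketch of what "breaking a design down into 3-input gates" can look like, assuming a hand-built netlist: a 2-bit ripple-carry adder maps onto exactly four 3-input LUTs (XOR3 for each sum bit, a majority function for each carry). The `Gate` type, the `run` loop, and the signal names are hypothetical. The loop evaluates gates one after another in dependency order, whereas the CoBALT distributed this work across its massively parallel array of processors; memory cells are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# 8-bit truth tables, bit index = (a << 2) | (b << 1) | c (assumed ordering).
XOR3 = 0x96  # a ^ b ^ c: sum bit of a full adder
MAJ3 = 0xE8  # majority(a, b, c): carry-out of a full adder


@dataclass
class Gate:
    out: str                   # signal this gate drives
    table: int                 # 8-bit LUT contents
    ins: Tuple[str, str, str]  # the three input signals


def eval_lut3(table: int, a: int, b: int, c: int) -> int:
    return (table >> ((a << 2) | (b << 1) | c)) & 1


def run(netlist: List[Gate], signals: Dict[str, int]) -> Dict[str, int]:
    """Evaluate gates in the given (already dependency-sorted) order.

    Each Gate stands in for one boolean processor performing one lookup.
    """
    values = dict(signals)
    for g in netlist:
        a, b, c = (values[name] for name in g.ins)
        values[g.out] = eval_lut3(g.table, a, b, c)
    return values


# Hypothetical 2-bit adder "compiled" by hand into four 3-input gates.
ADDER2 = [
    Gate("s0", XOR3, ("a0", "b0", "cin")),
    Gate("c1", MAJ3, ("a0", "b0", "cin")),
    Gate("s1", XOR3, ("a1", "b1", "c1")),
    Gate("cout", MAJ3, ("a1", "b1", "c1")),
]


def add2(x: int, y: int, cin: int = 0) -> int:
    """Add two 2-bit numbers by running the gate netlist."""
    inputs = {
        "a0": x & 1, "a1": (x >> 1) & 1,
        "b0": y & 1, "b1": (y >> 1) & 1,
        "cin": cin,
    }
    v = run(ADDER2, inputs)
    return v["s0"] | (v["s1"] << 1) | (v["cout"] << 2)


print(add2(3, 2))  # 5
```

The hard part a real compiler solves, and this sketch skips, is doing that decomposition automatically for millions of gates and then placing and scheduling them across thousands of processors.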

Cadence Palladium MCM - 256 core processor + 64MB DRAM + 1MB SRAM

The next design (known as ET4) was in development around the time Cadence bought QuickTurn in 1999. This is the first MCM we found; it includes a single processor die containing 256 boolean processors, 64MB of DRAM, and 1MB of SRAM. The boolean processors were upgraded so that designs compile down to 4-input gates rather than the prior 3. A 16-board system would then have 66,560 processors and 65GB of memory and could compile 5 million gates/hour. Emulation speed was 750 kHz, about the speed of an Intel 4004 in 1972, but for a massive array of processors emulating a design, 750 kHz is considered VERY good emulation speed.
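Why does a 4-input LUT help? A k-input LUT stores 2^k output bits, so it can hold 2^(2^k) distinct functions: 256 for 3 inputs but 65,536 for 4. Wider gates also mean fewer of them. The hypothetical sketch below (an illustration, not Cadence's compiler) XORs a 16-bit word using only k-input gates and counts how many are needed; each full gate cuts the number of pending signals by k - 1.

```python
from collections import deque
from functools import reduce
from operator import xor
from typing import List, Tuple


def num_functions(k: int) -> int:
    """A k-input LUT holds 2**k output bits, so 2**(2**k) possible functions."""
    return 2 ** (2 ** k)


def parity_with_luts(bits: List[int], k: int) -> Tuple[int, int]:
    """XOR a non-empty list of bits using gates with at most k inputs.

    Returns (parity, gates_used). Each gate consumes up to k pending signals
    and produces one new signal, which goes to the back of the queue.
    """
    pending = deque(bits)
    gates = 0
    while len(pending) > 1:
        take = min(k, len(pending))
        group = [pending.popleft() for _ in range(take)]
        pending.append(reduce(xor, group))  # one LUT computing a take-input XOR
        gates += 1
    return pending[0], gates


bits = [i % 2 for i in range(16)]  # example 16-bit word with eight 1s
for k in (3, 4):
    print(k, num_functions(k), parity_with_luts(bits, k))
# 3 256 (0, 8)
# 4 65536 (0, 5)
```

For this particular function, moving from 3-input to 4-input gates drops the count from 8 to 5, so the same number of processors can hold a larger design.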

In 2004 Cadence released the Palladium II (Cadence believes it needs to double emulation speed and capacity every 2 years to keep up with the designs being emulated). The Palladium II is made on the same 90nm IBM process but moves to a 2-die MCM with 128MB of 300MHz GDDR memory per MCM. Each die contains 768 processors, for a total of 1536 per MCM. Inter-chip communication runs at 190MHz and emulation speed is 1.5MHz, with support for designs of up to 256 million gates and 61,440 I/Os. It can compile 10-30 million gates/hour, a job that could take several days on an FPGA-based emulator.

Palladium III System. Note the massive emulation cable. This is a clock-speed-enhanced Palladium II.

What does this mean? For design companies it shortens time to market. 80% of most chip design cost is spent on emulation. If emulation can be sped up, the chip reaches market, and revenue, sooner. This makes these $1 million-plus systems worth it. It also means they become obsolete FAST. One can pick up a Palladium or Palladium II on eBay for a few thousand dollars, or for less at a scrap yard.

Cadence continues the Palladium line: the more recent Z1 can emulate designs of up to 9.2 billion gates and compiles at 140 million gates/hour, so a design that took an hour to compile on the original CoBALT is handled in about 51 seconds. One of the best-known users of such systems is Nvidia. Nvidia is a fabless company, so the faster it can design, emulate, and verify its next GPU or processor, the sooner it reaches market and revenue. The Nvidia Fermi was emulated on a massive Palladium system, with literally a cable running to the back of a PC, where the emulated chip stood in for the graphics card. The Tegra processors, and the new Nvidia Drive PX 2 autonomous vehicle system that is powering Tesla vehicles around the globe, were emulated on a Cadence Palladium-class system.
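The compile-time comparison is simple throughput arithmetic. A quick sketch using the two rates quoted in this article (2 million gates/hour for CoBALT, 140 million gates/hour for the Z1); `compile_seconds` is just an illustrative helper, not part of any Cadence tool.

```python
def compile_seconds(gates: float, gates_per_hour: float) -> float:
    """Time to compile `gates` gates at a given throughput, in seconds."""
    return gates / gates_per_hour * 3600.0


COBALT_RATE = 2e6  # gates/hour, original CoBALT
Z1_RATE = 140e6    # gates/hour, Palladium Z1

# One hour of CoBALT compile work is a 2-million-gate design.
one_hour_design = COBALT_RATE * 1.0

print(Z1_RATE / COBALT_RATE)                                 # 70.0
print(round(compile_seconds(one_hour_design, Z1_RATE), 1))   # 51.4
```

The two rates differ by a factor of 70, so one hour of CoBALT work shrinks to 3600 / 70, or roughly 51 seconds.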
