August 30th, 2012 ~ by admin

“We are hitting the limits of physics in many cases” – IBM zEC12 5.5GHz

z12 MCM Layout

“We are hitting the limits of physics in many cases.”  These words, spoken by an IBM engineer about the new zEnterprise EC12 mainframe, do well to describe the processor that runs it.  The z12, as we’ll refer to this processor, replaces the z196 as IBM’s top performer.  The z196 ran at a comparatively slothful 5.2GHz, which made it the fastest commercial processor in the world until now.  The z12 runs at 5.5GHz and was designed to be clocked up to 6GHz.  It is made on a 13-layer 32nm High-K process (the z196 was made on a 45nm process), which allowed a doubling of logic and cache density.

The EC12 is designed with single-thread performance in mind.  While many systems today focus on massive parallelism and on optimizing code for multi-threading, some tasks do not work well that way.  Data analytics, batch processing and the like are fundamentally serial processes, so fewer cores and more speed per core matter far more.  The z12 is based on an MCM (Multi-chip module) that contains 6 Processing Units (PUs) and 2 Storage Controllers (SCs, each containing 192MB of L4 cache) for a total of 8 dies on each MCM.  Each PU contains 4, 5 or 6 active cores.  The MCM is a 103-layer glass ceramic substrate (96 x 96 mm) containing eight chip sites and 7356 land grid array (LGA) connections.

IBM zEC12 6-core PU – 2.75 Billion Transistors – 5.5GHz

Each PU chip has 2.75 billion transistors. Each of the six cores has its own L1 cache, with 64 KB for instructions and 96 KB for data. Next to each core resides its private L2 cache, with 1 MB for instructions and 1 MB for data.


There is one 48 MB L3 cache per PU chip. It is a store-in cache shared across all cores on the chip. It has 192 x 512Kb eDRAM macros, dual address-sliced and dual store pipe support, an integrated on-chip coherency manager, cache and cross-bar switch. The L3 directory filters queries from local L4. Both L3 slices can deliver up to 160 GB/s bandwidth to each core simultaneously. The L3 cache interconnects the six cores, GX I/O buses, and memory controllers (MCs) with the storage control (SC) chips. Each SC chip contains 3.3 billion transistors, so each MCM carries about 23 billion transistors (6 x 2.75 + 2 x 3.3 = 23.1) and consumes over 1800 watts of power.
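The per-die figures quoted above can be added up to check the module totals. The following is only a back-of-the-envelope sketch using the numbers in this article; the constant and function names are made up for illustration and do not come from IBM.

```python
# Per-die figures as quoted in the article.
PU_TRANSISTORS_BILLION = 2.75  # each Processing Unit (PU) die
SC_TRANSISTORS_BILLION = 3.3   # each Storage Controller (SC) die
PUS_PER_MCM = 6
SCS_PER_MCM = 2
L3_PER_PU_MB = 48              # one shared L3 cache per PU die


def mcm_transistors_billion(pus=PUS_PER_MCM, scs=SCS_PER_MCM):
    """Total transistor count of all dies on one MCM, in billions."""
    return pus * PU_TRANSISTORS_BILLION + scs * SC_TRANSISTORS_BILLION


def mcm_l3_total_mb(pus=PUS_PER_MCM):
    """Combined L3 capacity of all PU dies on one MCM, in MB."""
    return pus * L3_PER_PU_MB


if __name__ == "__main__":
    print(f"Transistors per MCM: {mcm_transistors_billion():.1f} billion")
    print(f"L3 cache per MCM:    {mcm_l3_total_mb()} MB")
```

The six PU dies contribute 16.5 billion transistors and the two SC dies another 6.6 billion, so the SC chips account for well over a quarter of the module despite being only two of the eight dies.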

 

IBM zEC12 Individual core die shot

Each core is a superscalar, out-of-order processor with six execution units, as follows:

  •  Two fixed point (integer)
  •  Two load/store
  •  One binary floating point
  •  One decimal floating point

Up to three instructions can be decoded per cycle, and up to seven instructions/operations can be initiated to execute per clock cycle (about 0.18 ns at 5.5GHz). Instruction execution can occur out of program order, and memory address generation and memory accesses can also occur out of program order. Each core has special circuitry to make execution and memory accesses appear in order to software. Not all instructions are directly executed by the hardware. This is the case for several complex instructions: some are executed by millicode, and some are cracked into multiple operations, which are then executed by the hardware.
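The cycle time and the per-cycle limits above translate into simple upper bounds. The sketch below is plain arithmetic on the article's figures, with hypothetical names; the results are theoretical peaks per core, not sustained throughput, which depends on the workload.

```python
CLOCK_HZ = 5.5e9   # z12 clock frequency from the article
DECODE_WIDTH = 3   # instructions decoded per cycle (upper bound)
ISSUE_WIDTH = 7    # operations initiated per cycle (upper bound)


def cycle_time_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def peak_rate_per_second(clock_hz, per_cycle):
    """Theoretical per-core ceiling: cycles per second times events per cycle."""
    return clock_hz * per_cycle


if __name__ == "__main__":
    print(f"Cycle time:  {cycle_time_ns(CLOCK_HZ):.4f} ns")
    print(f"Peak decode: {peak_rate_per_second(CLOCK_HZ, DECODE_WIDTH) / 1e9:.1f} G instr/s")
    print(f"Peak issue:  {peak_rate_per_second(CLOCK_HZ, ISSUE_WIDTH) / 1e9:.1f} G ops/s")
```

At 5.5GHz one cycle lasts roughly 0.18 ns, and the same function shows that the 6GHz design target would shorten it to about 0.17 ns.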

The zEnterprise EC12 systems are available with 1 to 4 MCMs, using up to 120 processor cores.  Power input requirements for each server range from around 5 kVA to over 27 kVA, depending on the number of MCMs, the amount of memory (up to 3 terabytes is allowed) and the I/O installed.  It may be a while before the museum is able to acquire one of these, but at least we have die shots.
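The 120-core maximum fits with the earlier note that each PU has 4, 5 or 6 active cores. A rough check, assuming only the figures in this article (the names are illustrative, and how active cores are actually distributed across individual PUs is not stated here):

```python
MCMS_MAX = 4            # largest configuration mentioned in the article
PUS_PER_MCM = 6
CORES_PER_PU_MAX = 6    # each PU die has six core sites
ACTIVE_CORES_MAX = 120  # maximum core count quoted in the article


def pu_count(mcms):
    """Number of PU dies in a system with the given number of MCMs."""
    return mcms * PUS_PER_MCM


def core_sites(mcms):
    """Physical core sites, counting every PU as fully populated."""
    return pu_count(mcms) * CORES_PER_PU_MAX


def average_active_cores_per_pu(active_cores, mcms):
    """Mean number of active cores per PU die."""
    return active_cores / pu_count(mcms)


if __name__ == "__main__":
    print(f"PU dies:          {pu_count(MCMS_MAX)}")
    print(f"Core sites:       {core_sites(MCMS_MAX)}")
    print(f"Avg active cores: {average_active_cores_per_pu(ACTIVE_CORES_MAX, MCMS_MAX):.1f} per PU")
```

A full four-MCM system has 144 core sites across 24 PU dies, so 120 active cores averages out to 5 per PU, squarely inside the 4 to 6 range.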

Sources:
IBM zEnterprise EC12 Overview 
IBM zEnterprise Redbook (PDF)
