
INTEL P6 UNVEILED AT ISSCC

San Francisco, CA -- February 16, 1995 -- Intel's presentation at the
ISSCC conference today confirmed some of the early predictions about the
microarchitecture, and presented some vital statistics for their current
0.6um implementation. P6 will be a two-chip implementation that packages
the P6 CPU together with an L2 cache, so that the bus between the two
runs at the full CPU clock speed.

The processor issues x86 instructions in order, generating "uops", which
are the atomic RISC-like operations supported by the underlying hardware.
Up to 3 uops can be decoded per cycle, with some restrictions on the types
that can be decoded simultaneously. A micro-instruction sequencer hanging
off the decoder provides the mapping from x86 instructions to uops. The
uops are then passed to a single 20-entry reservation station, which
decouples the in-order issue from the out-of-order execution units. When a
uop completes, it is placed into the 40-entry reorder buffer. Retirement
from the reorder buffer is in-order, allowing for precise interrupts and
exceptions.
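
The issue/execute/retire flow described above can be illustrated with a
small toy model. This is only a sketch under loud assumptions, not Intel's
design: the function name simulate and its parameters are hypothetical,
uops have fixed latencies and no data dependencies, the reorder buffer's
40-entry limit is not modeled, and any number of finished uops may retire
per cycle (the presentation summary gives no retire width).

```python
from collections import deque

def simulate(latencies, issue_width=3, rs_size=20):
    """Toy model: in-order issue, out-of-order completion, in-order retire.

    latencies[i] is the execute latency in cycles of uop i (program order).
    Returns (completion_order, retirement_order) as lists of uop indices.
    """
    pending = deque(range(len(latencies)))  # not yet issued, program order
    in_flight = {}                          # uop -> cycle at which it finishes
    completed = set()                       # finished but not yet retired
    completion_order, retirement_order = [], []
    next_to_retire = 0
    cycle = 0
    while next_to_retire < len(latencies):
        # Retire strictly in program order: stop at the first unfinished uop.
        while next_to_retire in completed:
            completed.remove(next_to_retire)
            retirement_order.append(next_to_retire)
            next_to_retire += 1
        # Issue in program order, limited by issue width and station size.
        issued = 0
        while pending and issued < issue_width and len(in_flight) < rs_size:
            uop = pending.popleft()
            in_flight[uop] = cycle + latencies[uop]
            issued += 1
        cycle += 1
        # Uops finish whenever their latency has elapsed, in any order.
        for uop in sorted(u for u, done in in_flight.items() if done <= cycle):
            del in_flight[uop]
            completed.add(uop)
            completion_order.append(uop)
    return completion_order, retirement_order

# A slow first uop is overtaken during execution but still retires first.
completion, retirement = simulate([5, 1, 1, 3])
print(completion)  # [1, 2, 3, 0]
print(retirement)  # [0, 1, 2, 3]
```

The point of the example is the difference between the two lists: results
become available out of order, but architectural state is only updated in
program order, which is what makes precise exceptions possible.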

The instruction fetch and decode unit is 8 pipeline stages deep. It also
includes a 512-entry branch-target buffer. Of those 8 stages, 2.5 are
dedicated to the instruction-cache access alone, with another 2.5 stages
used for decoding the x86 instruction. The out-of-order core is 3 pipeline
stages deep for simple one-cycle execute instructions -- 2 cycles are
required to set up the reservation station accesses. The floating point
execution pipelines are deeper. Lastly, the reorder buffer and retirement
unit account for another 3 pipeline stages. The minimum number of cycles
required to complete an instruction is therefore 14 (8 + 3 + 3).
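
The stage counts above can be checked with a few lines of arithmetic. The
sketch below simply adds the three pipeline sections and converts the
total to time at the 133 MHz clock quoted in the statistics; the
dictionary keys and variable names are made up for illustration.

```python
# Pipeline sections and their depths, as quoted in the article.
STAGES = {
    "fetch_and_decode": 8,
    "out_of_order_core": 3,   # simple one-cycle execute instructions
    "retirement": 3,
}
CLOCK_MHZ = 133               # from the vital statistics

min_cycles = sum(STAGES.values())
cycle_ns = 1000 / CLOCK_MHZ
min_latency_ns = min_cycles * cycle_ns

# Front-end stages not attributed to I-cache access (2.5) or decode (2.5).
other_front_end_stages = STAGES["fetch_and_decode"] - 2.5 - 2.5

print(min_cycles)                 # 14
print(round(min_latency_ns, 1))   # 105.3
print(other_front_end_stages)     # 3.0
```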

Intel is still fabricating with BiCMOS technology, and claimed that using
BiNMOS drivers for large-fanout gates provides a 15% performance increase.

Chip Vital Statistics
---------------------

Performance:   200 SPECint92 (estimated). No SPECfp given.
Clock:         133 MHz
Process:       4-metal 0.6um BiCMOS
Vdd:           2.9 V
Power:         14 Watts (estimated)
Transistors:   5.5 million
Package:       Dual-cavity PGA
L1 Cache:      8k I/D caches. Dual-ported D-cache supports one
               load and one store per cycle.
L2 Cache:      256 kB, unified
External Bus:  64-bit; can operate at 1/2, 1/3, or 1/4 of the CPU
               clock speed, with one data transfer per cycle.
Superscalar:   In-order issue, out-of-order execution, in-order
               retire. Supports speculative execution. Peak
               issue is 3 uops ( <= 3 x86 instructions ).
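
A few derived figures follow directly from the statistics above. The
sketch below is plain arithmetic on the quoted numbers, assuming 1 MB/s
means 10^6 bytes per second and that the estimated power is drawn entirely
at Vdd; the function and variable names are hypothetical.

```python
CPU_CLOCK_MHZ = 133
BUS_WIDTH_BYTES = 64 // 8      # 64-bit external bus
SPECINT92 = 200                # estimated
POWER_W = 14                   # estimated
VDD_V = 2.9

def bus_bandwidth_mb_s(divisor):
    """Peak external bus bandwidth with one transfer per bus cycle."""
    return CPU_CLOCK_MHZ / divisor * BUS_WIDTH_BYTES

bandwidths = {d: bus_bandwidth_mb_s(d) for d in (2, 3, 4)}
specint_per_mhz = SPECINT92 / CPU_CLOCK_MHZ
supply_current_a = POWER_W / VDD_V

for d, bw in bandwidths.items():
    print(f"1/{d} clock: {bw:.1f} MB/s")   # 532.0, 354.7, 266.0
print(round(specint_per_mhz, 2))           # 1.5
print(round(supply_current_a, 2))          # 4.83
```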
   

 


 