
POWERPC 620 - A LOOK UNDER THE HOOD (October 21st 1994)

If there is one thing that the new PowerPC 620 proves, it is that there are other ways to increase the speed of a chip than cramming in more functional units. After all, it contains only the same number of execution units as the 604, yet provides substantial, if not scorching, speed improvements. Factoring out clock-speed differences shows that a 133MHz 604 would reach SPECint 212.8 and SPECfp 219.5, compared with the 225 and 300 respectively that the PowerPC 620 is meant to achieve.
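The same-clock comparison above can be turned into percentages with a few lines of arithmetic. This is only a sketch using the four figures quoted in the paragraph; the function and variable names are illustrative and not taken from any Somerset material.

```python
# SPEC marks quoted in the article, both at 133MHz.
SCALED_604 = {"SPECint": 212.8, "SPECfp": 219.5}  # 604 scaled up to 133MHz
TARGET_620 = {"SPECint": 225.0, "SPECfp": 300.0}  # projected 620 figures


def per_clock_gain(scaled_604, target_620):
    """Return the 620's same-clock improvement over the 604, in percent."""
    return {
        name: round((target_620[name] / scaled_604[name] - 1.0) * 100.0, 1)
        for name in scaled_604
    }


gains = per_clock_gain(SCALED_604, TARGET_620)
print(gains)
```

On these numbers the clock-for-clock gain is about 5.7% for integer work and about 36.7% for floating point, which is why the designers prefer to talk about transaction processing rather than SPECint.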

SPEC marks are not the only bench test, of course. Time and time again, when the Somerset designers talk about their new baby, they emphasise that it is a processor optimised for commercial, transaction-processing work. They say the 133MHz 620 will deliver twice the TP performance of the 100MHz 604.

How? Make no bones about it; a lot of this speed improvement is down to the enlarged cache, which is twice the size of the 604's and eight-way set-associative, as opposed to the 604's four-way effort. A fractional improvement is also provided by a slightly faster transistor design. The rest comes from a number of relatively small nips and tucks.

To begin at the beginning, the 620 adds a pre-decode stage to the instruction pipeline. The stage categorises instructions as they are pulled from the instruction cache in terms of the resources that they will use: operands required, registers used and so on. The data needed by the pre-decode stage is actually held within an additional 7k bytes of cache. This, together with 1k used for parity information, means that the chip's instruction cache is really 40k in size, though only 32k is visible to the outside world. The pre-decode is designed to eliminate an entire stage from the instruction pipeline, reducing the performance penalty if branch prediction screws up and the processor takes a wrong turn.
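The cache figures above also say something about how much pre-decode information travels with each instruction. The sketch below simply spreads the extra storage evenly over the visible cache. It assumes that "k" means 1024 bytes and relies on PowerPC instructions being a fixed 32 bits wide; how the 620 actually lays out these bits is not stated in the article.

```python
KB = 1024              # assumption: "k" in the article means 1024 bytes
INSTRUCTION_BYTES = 4  # PowerPC instructions are a fixed 32 bits wide

visible_bytes = 32 * KB    # instruction cache seen by the outside world
predecode_bytes = 7 * KB   # extra storage for pre-decode data
parity_bytes = 1 * KB      # extra storage for parity

# Number of instructions the visible cache can hold.
instructions = visible_bytes // INSTRUCTION_BYTES

# Spread the extra storage evenly across those instructions.
predecode_bits_per_instruction = predecode_bytes * 8 / instructions
parity_bits_per_instruction = parity_bytes * 8 / instructions
total_kb = (visible_bytes + predecode_bytes + parity_bytes) // KB

print(instructions, predecode_bits_per_instruction,
      parity_bits_per_instruction, total_kb)
```

Under those assumptions the 7k works out to seven pre-decode bits per cached instruction and the 1k to one parity bit per instruction, giving the 40k total.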

That's only an outside chance however, according to the designers. Motorola's Brad Beavers says that simulations show the 620 will get it right about 90% of the time. The branch prediction capabilities have been improved over the 604 through the simple act of bumping up the size of the branch history table from 512 entries to 2048. The BHT predicts the likelihood of any branch being taken from past behaviour. At the same time, the branch target address cache is increased from 64 to 256 entries, so that the chip not only knows whether it should branch or not but also where it should branch to. Speculative execution is also improved; the new chip can run past four unresolved branches, where the 604 could manage just two.
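To show how a branch history table and a branch target address cache work together, here is a toy model using the 620's table sizes. It is a sketch of the general technique only: the two-bit saturating counters, the direct-mapped target cache, the indexing by word address and the example addresses are all assumptions for illustration, since the article does not describe the 620's actual scheme.

```python
class BranchPredictor:
    """Toy predictor: a table of 2-bit counters plus a target cache."""

    def __init__(self, bht_entries=2048, btac_entries=256):
        self.bht_entries = bht_entries
        self.btac_entries = btac_entries
        # Assumption: every counter starts at 1 ("weakly not taken").
        self.counters = [1] * bht_entries
        # Assumption: direct-mapped cache of (branch address, target) pairs.
        self.targets = [None] * btac_entries

    def _index(self, address, size):
        # Assumption: index by word address (instructions are 4 bytes).
        return (address >> 2) % size

    def predict(self, address):
        """Return (predicted_taken, predicted_target_or_None)."""
        taken = self.counters[self._index(address, self.bht_entries)] >= 2
        entry = self.targets[self._index(address, self.btac_entries)]
        target = entry[1] if entry and entry[0] == address else None
        return taken, target

    def update(self, address, taken, target=None):
        """Record the real outcome once the branch resolves."""
        i = self._index(address, self.bht_entries)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            slot = self._index(address, self.btac_entries)
            self.targets[slot] = (address, target)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = BranchPredictor()
loop_branch, loop_top = 0x1000, 0x0F00  # hypothetical addresses
for _ in range(3):
    predictor.update(loop_branch, taken=True, target=loop_top)
print(predictor.predict(loop_branch))
```

After a few taken iterations the model predicts both the direction and the destination of the loop branch, and a single not-taken outcome is not enough to flip the prediction. Larger tables mean fewer unrelated branches share an entry, which is the effect the bigger BHT and target cache are meant to buy.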

Processor stalls have also been reduced somewhat by the addition of extra reservation stations in front of the Load/Store Unit and the Branch processing unit. But other than that the main functional units look virtually identical from the raw specifications.

The areas where the 620 really differs from its predecessor (other than the 64-bit data extensions) are its cache, its memory handling and the system bus. Big data sets and multiprocessing should be the 620's forte. Brad Beavers says that they expect to be able to stick six PowerPC 620s into a machine with no additional glue logic. In addition to the bigger on-board L1 cache, the chip also has its L2 cache controller on-board, meaning that an external cache can be added with the minimum of glue logic. The L2 cache can be configured anywhere between 1MB and a chunky 128MB.

Address bus capacity is a classic problem with large multiprocessor configurations: each processor needs to keep an eye on the memory that the other processors are modifying, to avoid clashing or trying to access the same piece of data. Usually this is done through a system of "snoops", querying other processors' caches. The 620 can essentially pipeline snoop queries and responses so that it can put out new addresses every other cycle without having to wait for responses from the other processors. Bus width has also been extended to 128 bits, though it can also work in a 64-bit mode. The 620's designers tend to brush aside any queries over the processor's performance by saying that SPECints really don't do justice to its TP-aimed performance. The trouble is, some of the other RISC manufacturers are saying exactly the same.
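The benefit of pipelining snoop queries, as described above, can be illustrated with a simple cycle count. This is a back-of-envelope model, not a description of the real bus protocol: the two-cycle issue interval comes from the article, while the snoop response latency of six cycles and the burst of eight addresses are hypothetical values chosen purely for illustration.

```python
def bus_cycles(transactions, snoop_latency, pipelined, issue_interval=2):
    """Cycles until the last snoop response arrives for a burst of addresses."""
    if transactions == 0:
        return 0
    if pipelined:
        # A new address goes out every `issue_interval` cycles and the
        # responses overlap, so only the last one adds its full latency.
        return (transactions - 1) * issue_interval + snoop_latency
    # Without pipelining, each address waits for the previous response.
    return transactions * max(snoop_latency, issue_interval)


SNOOP_LATENCY = 6  # hypothetical response latency, in bus cycles
BURST = 8          # hypothetical number of back-to-back addresses

serial = bus_cycles(BURST, SNOOP_LATENCY, pipelined=False)
overlapped = bus_cycles(BURST, SNOOP_LATENCY, pipelined=True)
print(serial, overlapped)
```

With these assumed values the serial bus needs 48 cycles against 20 for the pipelined one, and the gap widens as the response latency grows, which matters more as processors are added to the machine.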

 


 
Copyright © 2006 CPUShack.Net All pictures and content are property of CPUShack.Net. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed without the express written permission of CPUShack.Net
