January 18th, 2019 ~ by admin

Part 4: Mini-Mainframe at Home: Benchmarks and Overclocking

Part 4 of the Story of a 6-CPU Server from 1997. In this final section we will first briefly explore the theory of running a 6-CPU SMP system (with processors designed for 2- or 4-way operation) and then benchmark and overclock the system.

For the background of the ALR 6×6 and Pentium Pro processors that form the basis of this project please see:

Previous Parts of the Series

Part 1: Mini-Mainframe at Home – Introduction
Part 2: Mini-Mainframe at Home: Installing a Modern OS
Part 3: Mini-Mainframe at Home: The ALR 6×6 Hardware and BIOS

Architecture and operation of the six-CPU system

The server originally shipped with six Pentium Pro “Black” processors, so for contrast I decided to add a set of six Pentium Pro “Gold” processors running at 200 MHz with a 256 KB L2 cache. That is exactly one quarter of the cache per chip, so it will be interesting to see what difference the total amount of cache makes: six megabytes versus one and a half. Before starting the tests, though, I will look at how the six processors interact in this system. To get around Intel’s limit of four processors per system, ALR engineers, with support from Unisys, proposed an inter-processor scheme based on bus arbitration:

The theory behind this architecture is as simple as it is powerful. Inside new six-way systems are two Tri-6 CPU cards, A and B (Figure 1). Each of these cards is an independent, three processor ready SMP bus, complete with all logic Active CPR processor protection, and auto-recovery technology built on each CPU card. These two Tri-6 CPU cards are then plugged into a 64-bit parity SMP bus. This design keeps the processors closely coupled, just like a parallel bus architecture, without the related heat and design problems. A separate four-way interleaved memory card is attached to the bus, supporting a sustained data bandwidth of 533-MB per second. This bandwidth is ample to support two full PCI buses as well as an EISA bus bridge.

To overcome the logical limitations of the Pentium Pro chip, six-way servers use a unique expanded bus arbitration configuration referred to as Dynamic Orchestration. The best way to understand how this system works is to compare it to a typical four-way SMP architecture. On a four-way system, bus arbitration is implemented in a “round robin” fashion. That is, each processor has equal rights to the bus, and access is handled in an orderly fashion. For example, if all processors needed access to the bus, CPU 0 would gain access first, followed by CPU 1, CPU 2, CPU 3, and then back to CPU 0. If CPU 2 was executing a cycle, and both CPU 3 and CPU 1 requested use of the bus, control would first pass to CPU 3, before cycling back to CPU 1.

For purposes of this four-way arbitration, processors are identified using the two-bit ID code. The six-way solution borrows this convention, with some important modifications. Within each Tri6 CPU card, individual processors are identified using the two-bit ID code. This yields four possible combinations, although only ID codes 0 through 2 are needed. A chip on each Tri6 card handles the arbitration, following the “round robin” scheme found in a four-way system. In this case, however, the fourth processor has been replaced by a sort of “phantom” processor that actually represents the other Tri6 card:

The figure above shows the six-processor layout of the ALR Revolution 6×6 server board and its clones. This approach made systems with 8, 10 and more processors possible.
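
The round-robin arbitration quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `next_owner` and `PHANTOM` are hypothetical names, and modelling the other Tri6 card as agent ID 3 that requests the bus whenever one of its own CPUs does is my reading of the description, not ALR’s documented logic.

```python
PHANTOM = 3  # assumption: ID 3 on a Tri6 card stands in for the other card


def next_owner(current, requests, n_agents=4):
    """Return the next bus owner in round-robin order.

    current  -- ID of the agent that owns the bus right now
    requests -- set of agent IDs currently requesting the bus
    Returns None when nobody is requesting.
    """
    for step in range(1, n_agents + 1):
        candidate = (current + step) % n_agents
        if candidate in requests:
            return candidate
    return None


# Four-way example from the text: CPU 2 owns the bus, CPUs 3 and 1 request it.
first = next_owner(2, {1, 3})    # CPU 3 is next in rotation after CPU 2
second = next_owner(first, {1})  # then the bus cycles back to CPU 1

# Tri6 card: CPUs 0-2 are real; a win by PHANTOM is taken to mean the bus is
# handed to the other card, which runs the same rotation among its own CPUs.
handoff = next_owner(2, {0, PHANTOM}) == PHANTOM
```

The rotation always starts searching from the agent after the current owner, which is what gives every requester an equal turn.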

While laying out a chessboard of various Pentium Pro models, I thought I would never find a physically larger processor. Even the 32-core AMD Threadripper 2990WX does not look that big next to an Intel Pentium Pro.

However, The CPU Shack sent me this photo. On the left is an engineering sample of the Xeon Gold 6142 for the LGA3647 socket; on the right is another engineering sample, this time an Intel Xeon Phi in the same LGA3647 package. As you can see, history has come full circle, and perhaps future processors will no longer fit on an open palm. LGA2066 processors, though, are still far smaller than the Intel Pentium Pro.

Overclocking 6 cores together and separately

Manufacturers of server hardware rarely let you run processors or RAM outside their standard settings, let alone overclock them. In my experience, the best manufacturer in this respect is Asus: its workstation motherboards mostly do offer that choice, as I confirmed in previous articles.

With the ALR Revolution 6×6, things are a little more complicated. If overclocking depended entirely on the motherboard’s BIOS, it would be out of the question. But something came to the rescue: jumpers!

“Jumpers are an overclocker’s best friends,” as popular wisdom has it, and it is good when a board has them. The motherboard manual contains just such a table for selecting the CPU multiplier through jumper combinations.

There are 14 options in total. Half of them select the FSB, which can be either 60 or 66 MHz; the rest select the multiplier. As a result, the Pentium Pro, which has an unlocked multiplier, can hypothetically be overclocked to 366 MHz, but I’m afraid such figures would require at least liquid helium. It’s really quite impressive that ALR chose to implement such a large range of multipliers. Perhaps they were hoping for faster Pentium Pros than Intel ever made. A similar table is printed on the PCB of the server’s motherboard itself.
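
The arithmetic behind the jumper table is simply core clock = FSB x multiplier. The sketch below reproduces the frequencies mentioned in this article. The multiplier list contains only the values implied by the text (3, 3.5, 4 and 5.5), not the full table from the manual, and the nominal 66 MHz bus is taken as 66.67 MHz; both are assumptions, as are the function names. Note that a reader comment at the end of this article disputes that multipliers above 4x work on the Pentium Pro, so the 366 MHz row is purely hypothetical.

```python
FSB_MHZ = {"60": 60.0, "66": 200.0 / 3.0}  # assumption: nominal 66 MHz is 66.67 MHz
MULTIPLIERS = [3.0, 3.5, 4.0, 5.5]         # assumption: only values implied by the article


def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock in MHz for a given front-side bus and multiplier."""
    return fsb_mhz * multiplier


def jumper_table():
    """Map (FSB label, multiplier) to the core clock, truncated to whole MHz."""
    return {
        (label, mult): int(core_clock_mhz(fsb, mult))
        for label, fsb in FSB_MHZ.items()
        for mult in MULTIPLIERS
    }


if __name__ == "__main__":
    for (label, mult), mhz in sorted(jumper_table().items()):
        print(f"{label} MHz x {mult}: {mhz} MHz")
```

Truncating to whole megahertz matches the way the article quotes 233, 266 and 366 MHz rather than the rounded values.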

The Pentium Pro tops out at around 300 MHz under liquid nitrogen. On air cooling, going from 200 MHz to 233 MHz is already a good result, and anything beyond that is definitely a success. But since I am using six processors and did not hand-pick any of them, the main factor in my case is luck :D

As a result, with the jumpers set to 66.6 x 3.5, six 200 MHz Intel Pentium Pros with 1 MB of L2 cache started up at 233 MHz. That is a 16.5% gain per processor, which across six processors adds up to 99% of one stock processor’s performance: almost a seventh processor for free. Below is a link to the CPU-Z validation. http://valid.x86.fr/0fisvz

The processors refused to overclock any further.

Then, after squeezing half a tube of thermal paste onto a new batch of test subjects (chips of this size need a fair amount), I installed heatsinks on six 200 MHz Intel Pentium Pro CPUs with 256 KB of cache, that is, four times less. Having forgotten to reset the jumpers from the previous run, I saw 233 MHz on the screen. The next step was 240 MHz, set as 60 x 4, and the system took it! Luck did not let me down this time either.

That is already 20% more performance per processor. The next mark was supposed to be 266 MHz, but alas, I saw only a black screen. The six samurai could not conquer this frequency. But even 240 MHz is a decent figure for so many CPUs. All that remains is to find out how they perform in the tests!
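
The percentages quoted for the two overclocks can be checked with a short calculation. It assumes ideal linear scaling with clock speed and across all six processors, which real workloads rarely achieve; the function names are illustrative.

```python
def gain_percent(stock_mhz, overclocked_mhz):
    """Per-processor clock gain in percent."""
    return (overclocked_mhz / stock_mhz - 1.0) * 100.0


def extra_cpus(stock_mhz, overclocked_mhz, n_cpus=6):
    """Total gain expressed in stock processors, assuming linear scaling."""
    return n_cpus * gain_percent(stock_mhz, overclocked_mhz) / 100.0


gain_1mb = gain_percent(200, 233)    # 1 MB chips at 66.6 x 3.5
gain_256k = gain_percent(200, 240)   # 256 KB chips at 60 x 4

print(f"233 MHz: +{gain_1mb:.1f}% per CPU, ~{extra_cpus(200, 233):.2f} extra CPUs")
print(f"240 MHz: +{gain_256k:.1f}% per CPU, ~{extra_cpus(200, 240):.2f} extra CPUs")
```

On paper, then, the 233 MHz setup is worth just under one extra stock processor and the 240 MHz setup slightly more than one.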

Test bench and test results

The test bench will include processors:

  • 6x Pentium Pro 200 MHz L2=1024KB;
  • 6x Pentium Pro 200 MHz L2=256KB;

Motherboard:

  • Unisys Aquanta HS6 (10140), «Intel 450GX» chipset (6x Socket 8);

Video card:

  • PNY GeForce2 MX400 PCI 64 MB (Forceware 93.21);

SSD:

  • Kingston SSDNow V300 (60 GB).

Performance testing was carried out under “Windows Whistler .Net Advanced Enterprise Server, Build 2600, Service Pack 2, 3 in 1” using the following software:

  • Super Pi mod. 1.5XS (1M Task);
  • PiFast v.4.1;
  • wPrime v.1.43;
  • HWBOT Prime v.0.8.3;
  • CPU-Z v.1.87.0;
  • WinRAR x86 v. 5.40;
  • 7-Zip v.16.04;
  • AIDA64 5.50.3600;
  • SiSoftware Sandra 2004 SP2;
  • Cinebench 2003;
  • Cinebench R10.

Tests

To warm up, a pair of single-threaded tests: Super Pi (1M task) and PiFast.

Super Pi mod. 1.5XS (1M task)

Minutes (less is better)

Comparing a 200 MHz Pentium Pro with a plain 100 MHz Pentium shows the server chip to be more than twice as fast. The L2 cache size affects the final result: the 240 MHz processor could not beat its 233 MHz colleague with four times the L2 cache. Here the limiting factor was mainly memory bandwidth. The PPro at 233 MHz with 1 MB of cache outperformed the 240 MHz 256 KB model because it was running on a higher bus speed (66 MHz vs 60 MHz) and had to go to main memory for data less often (as more could be cached on chip). At the same time, no Pentium Pro managed to catch the 233 MHz Pentium II (Klamath). A Pentium I overclocked to 300 MHz using extreme cooling wins this round, despite lacking an onboard second-level cache.

PiFast v.4.1

Seconds (less is better)

In this test, 300 MHz did not bring the Pentium I a win, and the Pentium Pro even outperformed the equivalent Pentium II (Klamath). The two tests seem similar, but they use different algorithms. These tests were only for illustration, of course: the whole power of the ALR Revolution 6×6 lies in multithreading, and that is where it gets room to stretch out.

For all of the tests below, I tried to find comparable results from systems with different numbers of physical processors, so that it is easier to picture the actual performance of this interesting machine. The results were taken from the HWBOT.org database.

wPrime v.1.43

Seconds (less is better)

You can never have too many CPU cores, or physical processors, can you? At stock settings, six Pentium Pros turned out to be faster than a pair of the very first 500 MHz Pentium IIIs in the Slot 1 form factor, and even faster than a pair of 550 MHz Xeons with a hefty 2 MB L2 cache in the Slot 2 form factor. But a single 1 GHz Socket 462 AMD Athlon with the Thunderbird core turned out to be faster.

Overclocking each processor by 16.5%, the equivalent of one additional processor, is enough to overtake a pair of 600 MHz Pentium III E processors on the Coppermine-256 core. And six Pentium Pros with 256 KB of L2 cache can compete on almost equal terms with a 2.4 GHz Celeron on the Northwood core, which went on sale in March 2003 at a price of $127. By then, only 6 years had passed since the ALR Revolution 6×6 came to market. Progress spares no one, not even monsters like the ALR Revolution 6×6.

HWBOT Prime v.0.8.3

Total score (more is better)

Running this Java test was possible thanks to my miracle of a Windows installation. The results are, of course, very funny; there is no other word for them, LOL. Bear in mind that Java only came out in 1995, just a few months before the Pentium Pro was released. Set aside the comparison with two Pentium III Xeons and look at how close six 200 MHz Pentium Pros sit to a Samsung Galaxy S3 with its already old quad-core 1200 MHz ARM Cortex-A9. Modern flagships, needless to say, would leave no stone standing of the superserver of those years. And if performance is divided by the weight of the device, the ALR Revolution 6×6 certainly does not stand a chance.

Another very nice pairing is the Intel Atom Z2480, a 2 GHz single-core mobile processor with Hyper-Threading, and the Pentium Pros overclocked to their limit of 240 MHz. This “baby”, measuring 12×12 mm with a TDP of just three watts, performs on equal terms with six processors from the past. No further comment is needed.

WinRAR x86 v. 5.40

Kb/sec (more is better)

The test system’s archiving performance is more than decent: it even overtook dual-processor systems equipped with more modern RAM that is far faster than the ancient Fast Page Mode memory.

7-Zip v.16.04 (dictionary size 32 MB)

Total score in MIPS (more is better)

In this test, too, the six processors did not lose face. Even next to a pair of 333 MHz Pentium II Overdrives they look good. Cache also contributes to the final result, as does the FSB frequency, which kept the six 240 MHz Pentium Pros from rising above the six 233 MHz Pentium Pros.

AIDA64 5.50.3600

Reading from memory, MB/s

Below in the picture is a screenshot from AIDA64.

Memory read speed on this system is very good, but the same cannot be said of memory write speed.

AIDA64 5.50.3600

Write memory, MB/s

AIDA64 5.50.3600

CPU Queen, score (more is better) FPU Julia, score (more is better)
The test system turned out to be more than 7 times faster than a 166 MHz Pentium and 3.15 times faster than a Compaq ProLiant 800 server with a pair of 200 MHz Pentium Pros. Likewise, the six processors overclocked to 240 MHz are faster than the same Compaq ProLiant 800 with its pair of 200 MHz Pentium Pros.

And of course, the Cache and Memory Benchmark. From left to right: Pentium Pro 200 MHz (L2 = 1024 KB), Pentium Pro @ 233 MHz (L2 = 1024 KB) and Pentium Pro @ 240 MHz (L2 = 256 KB).

SiSoftware Sandra 2004 SP2

Arithmetic benchmark, MIPS (more is better) Multi-media benchmark, it/s (more is better)

 

Cinebench 2003

Score (more is better)

Cinebench R10

Score (more is better)

This speaks to the question of what hardware is best for final rendering. It is a very responsive test. It is a pity that I could not find comparable results, but the graph shows that the final result depends on both the clock frequency and the size of the processor’s second-level cache.

As food for thought, here is the total time taken to complete this test: the Intel Core i7 7800X finished the final rendering in exactly 20 seconds, while the six Pentium Pros took 21 minutes and 14 seconds for the same task. There you have it!
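
To put the two render times on the same scale, a quick conversion to seconds gives the speed-up directly. This is just a worked check of the figures quoted above; the helper name is illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes:seconds time to total seconds."""
    return minutes * 60 + seconds


ppro_s = to_seconds(21, 14)  # six Pentium Pros
i7_s = to_seconds(0, 20)     # Intel Core i7 7800X
ratio = ppro_s / i7_s

print(f"{ppro_s} s vs {i7_s} s: the Core i7 is about {ratio:.1f}x faster")
```

That works out to 1274 seconds against 20, so the modern chip finishes the render roughly 64 times sooner.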

CPU-Z 1.87.0 Benchmark

Score (more is better)

Conclusion

Well, that is probably all. All promises have been kept, the results were very interesting and give plenty to think about, and I have nothing more to add. I hope the half a year spent on this project was not spent in vain. The question of the 333 MHz Intel Pentium II Overdrive remains open, but I have not lost hope of a second part to this article. Once we find 6 working Overdrives, we will see what kind of revolution they bring to the ALR 6×6. And most importantly: can they be overclocked?

 

3 Responses to Part 4: Mini-Mainframe at Home: Benchmarks and Overclocking

  1. Andre L. C.

    Hello from Brazil,

    A very interesting experiment and a true demonstration of patience and determination.
    Thanks for sharing your work and congratulations.

  2. ph4nt0m

    “As a result, the Pentium Pro, which has a free multiplier, can hypothetically be overclocked to 366 MHz.”

    It cannot. The Pentium Pro treats multipliers above 4x as 2x. 266MHz is as high as it can go on the 66MHz system bus. It takes a bit of luck to find a 512K chip that can actually operate at 266MHz. 256K and 1M are nearly hopeless.

  3. Sudos

    A friend of mine has an ALR 6-way and I just went through this entire article thinking he had the last two of these in existence, only to be pleasantly surprised that someone already did all the hard work for us of getting it running *something* modern. Are you able to put a compressed dd image of the boot drive you made for this anywhere so that others have some sort of modern OS that this machine can boot?
