May 13th, 2020 ~ by admin

Chapter 2: Mini-Mainframe at Home: The Story of a 6-CPU Server from 1997

At the end of 2018, I started a project called “Mini-Mainframe at Home: The Story of a 6-CPU Server from 1997”. It was dedicated to the ALR Revolution 6×6 super server, which had six Intel Pentium Pro processors and, in 1997, a cost comparable to that of a brand new Ferrari. Some 450 days later the story continues: the super server has received its long-awaited upgrade, six Intel Pentium II Overdrive 333 MHz processors! For those years such power was simply colossal. How it compares with today’s hardware, and how much performance the upgrade added, you will learn from this article.

I’ll admit 450 days is quite a long time, so I will briefly recap the previous part of this series.
It all started like this: plunging into the world of mainframes and supercomputers, I wanted to try some super powerful system, and the choice fell on the ALR Revolution 6×6 super server, which had six Socket 8 sockets and supported up to 4 GB of RAM. For the late 90s these were scary numbers, as was its cost. Intel priced one processor for such a system at $2675, and six were required; one 256 MB module of server memory cost $3500, and sixteen were needed to reach the coveted 4 GB of RAM.

A disk subsystem was also available, with seven RAID controllers and an 860 GB disk array, plus a twenty-kilogram power supply unit and the server itself… As a result, the total could reach 270 to 500 thousand dollars, and adjusted for inflation over the years these numbers range from 435 to almost 800 thousand dollars. Today any low-cost computer is faster than this monster, but the very opportunity to feel the full power of that era in 2020 makes these large numbers insignificant; it is much more important to find and assemble such a monster.
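To put the quoted prices in perspective, here is a minimal Python sketch that tallies the processor and memory costs and derives the inflation multiplier implied by the two price ranges. The variable and function names are illustrative, and only figures quoted in the text are used.

```python
# Prices as quoted in the article (1997 US dollars).
CPU_PRICE = 2675      # one Pentium Pro
CPU_COUNT = 6
DIMM_PRICE = 3500     # one 256 MB server memory module
DIMM_COUNT = 16       # sixteen modules for the full 4 GB
DIMM_SIZE_MB = 256


def component_cost(unit_price, count):
    """Total cost of `count` identical parts."""
    return unit_price * count


cpu_total = component_cost(CPU_PRICE, CPU_COUNT)
ram_total = component_cost(DIMM_PRICE, DIMM_COUNT)
ram_total_mb = DIMM_SIZE_MB * DIMM_COUNT

# Inflation multiplier implied by the article's own ranges
# (270 -> 435 and 500 -> 800 thousand dollars).
implied_low = 435 / 270
implied_high = 800 / 500

print(f"Six CPUs:        ${cpu_total}")
print(f"Sixteen modules: ${ram_total} for {ram_total_mb} MB")
print(f"Implied inflation factor: {implied_low:.2f} to {implied_high:.2f}")
```

Processors and memory alone come to a fraction of the quoted system totals, so most of the price evidently sat in the chassis, disk subsystem and controllers.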

ALR 6×6 Available Options

In the previous story, I studied performance with six 200 MHz Intel Pentium Pro processors with a 256 KB second-level cache, and even overclocked all six to 240 MHz. I also tested six top-end “black” Intel Pentium Pros at 200 MHz with a 1 MB L2 cache, which overclocked to 233 MHz. My configuration had 2 GB of standard FPM RAM, sixteen 128 MB modules, which took over 4 minutes to initialize during POST.

Four gigabytes of RAM would bring this figure to 9 minutes, which is comparable to a train getting up to speed or an airplane taking off, although the latter can do it much faster. But once the system had booted, I had six physical cores at my disposal at once, though without support for MMX, let alone SSE instructions.

Intel Pentium II Overdrive 333 MHz processor

The basis of any computer is the central processor. Intel Pentium Pro processors first appeared in 1995. The regular Pentiums without the Pro suffix already existed, and the suffix indicated that these processors were positioned primarily as solutions for servers and workstations, with their own special Socket 8. The regular Intel Pentiums were installed in Socket 5 and 7. A significant difference between the Pro and the regular desktop Pentium was the Pro’s second-level cache, which sat in the same package and ran at the processor’s core frequency, significantly increasing performance.

For the various Intel Pentium Pro models, the L2 cache size ranged from 256 KB to 1 MB. The Pentium Pro’s first-level cache was 16 KB, 8 KB for data and 8 KB for instructions. In the subsequent Intel Pentium IIs, the second-level cache ran at half the processor core frequency and was 512 KB on all models, located in separate chips on the cartridge, away from the CPU die itself. The L1 cache was doubled to 32 KB, which offset the performance hit of the slower L2 cache.

Pentium Pro Slot 1 slockets. Slot 2 versions were also made.

The tested processors were produced on a 350 nm process. The Pentium Pro had 5.5 million transistors in the processor core itself, and as many as 15.5 to 31 million in the L2 cache, depending on its size. The L2 cache itself was located on a separate die next to the CPU core. The processor had an unlocked multiplier, and the system bus frequency, depending on the model, was 60 or 66 MHz. Overclocking the processor came down to overclocking the L2 cache, which was the limiting factor.

CPU core on the right, L2 cache on the left

The Intel Pentium II Overdrive 333 MHz was a very interesting processor. It arguably owes its existence to the US Government, which funded a program to build supercomputers for modeling nuclear explosions and monitoring the state of the country’s nuclear arsenal. The government allocated funds for the construction of such a supercomputer, Intel won the tender, and in 1997 it handed over a turnkey supercomputer called “ASCI Red”.

ASCI Red consisted of 9298 200 MHz Pentium Pro processors, and all of the supercomputer’s modules were housed in 85 rack cabinets. The total amount of RAM was 594 gigabytes, and the disk subsystem consisted of 640 hard drives with a total capacity of 2 terabytes (consider that this amount of storage is now provided by a single inexpensive hard drive). ASCI Red was the first supercomputer to break the barrier of 1000 GFLOPS, or 1 teraflops. For several years in a row it led the TOP500 list of the fastest supercomputers in the world.

In 1999, modeling tasks became more complicated and ASCI Red’s capacity was starting to fall short; an upgrade was needed. Programmers will always find a way to need more performance, no matter what you give them, especially if it’s for modeling the reliability of a strategic deterrent, or the weather, or… Intel won the tender again, and thanks to this a unique processor was born, one with a Socket 8 interface and the power of the Pentium II: the Intel Pentium II OverDrive at 333 MHz. The upgraded second-generation ASCI Red, with 9632 processors, delivered 2.38 TFLOPS in the Linpack benchmark. Such performance allowed ASCI Red to hold the title of the fastest supercomputer until June 2000.
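As a rough illustration of what those totals mean per chip, the sketch below divides the quoted Linpack figures by the quoted processor counts. Note the assumption: the text only says the original machine broke 1 teraflops, so the first figure is a lower bound rather than a measured result. The function name is illustrative.

```python
def mflops_per_cpu(total_tflops, cpu_count):
    """Average Linpack throughput per processor, in MFLOPS."""
    return total_tflops * 1_000_000 / cpu_count


# Original ASCI Red: "broke 1 TFLOPS" with 9298 Pentium Pros.
# Assumption: exactly 1.0 TFLOPS is used here as a floor.
original = mflops_per_cpu(1.0, 9298)

# Upgraded ASCI Red: 2.38 TFLOPS with 9632 Pentium II OverDrives.
upgraded = mflops_per_cpu(2.38, 9632)

print(f"Pentium Pro era:  at least {original:.1f} MFLOPS per CPU")
print(f"OverDrive era:    about {upgraded:.1f} MFLOPS per CPU")
print(f"Per-CPU ratio:    at most {upgraded / original:.2f}x")
```

These are whole-machine averages, so they fold in interconnect and memory effects and should not be read as a single-chip benchmark.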

The Intel Pentium II OverDrive, the final stage in the evolution of Socket 8, belonged to the sixth generation of Intel processors (P6). The processor was announced in August 1998; despite its niche nature, its recommended price in batches of 1000 was $599. Physically this processor installs in Socket 8, but in fact it is a “Deschutes” core Pentium II, supplemented by a 512 KB L2 cache running at the processor core frequency, unlike the normal Deschutes-core PIIs. These are the only Pentium II processors with a full-speed L2 cache (excluding, of course, the Celerons, which had an on-die cache, and the mobile Dixon core, which had 256 KB of full-speed cache). The Pentium II OverDrive’s VRM was integrated into the module and stepped the voltage supplied by the motherboard (3.1-3.3 V) down to the 2 volts required by the PII core.

Pentium II Overdrive Module with Heatsink removed. CPU die is on the left and 512K of Cache on the right

The processor multiplier is locked at 5x, which with a 66.6 MHz FSB gives a total of 333 MHz. There are two versions of this processor: the first, S-spec SL2KE, is equipped with active cooling, and the second, SL3EA, with passive cooling. But the biggest plus is not only the increased clock speed, but also support for the MMX instruction set and some others.

Since the motherboard supports multipliers up to 5.5x, which would result in 366 MHz, I also looked into engineering samples of the Intel Pentium II Overdrive 333 MHz with S-spec Q0125. As the owner of one such processor told me, even this engineering sample’s multiplier is locked. Maybe it’s for the better, since acquiring six of these ES processors would cost about as much as any top-end modern CPU, and first you would have to find that many of them somewhere.
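The clock figures above follow from the simple rule that core clock equals bus frequency times multiplier. A minimal sketch, using only the 66.6 MHz bus and the 5x and 5.5x multipliers quoted in the text (the function name is illustrative):

```python
FSB_MHZ = 66.6  # front-side bus frequency as quoted in the article


def core_clock_mhz(multiplier, fsb_mhz=FSB_MHZ):
    """Core clock = bus frequency x multiplier."""
    return multiplier * fsb_mhz


locked = core_clock_mhz(5.0)        # the Overdrive's locked multiplier
hypothetical = core_clock_mhz(5.5)  # the board's maximum, if the CPU allowed it

print(f"5.0x -> {locked:.0f} MHz")
print(f"5.5x -> {hypothetical:.0f} MHz")
print(f"Potential gain: {(hypothetical / locked - 1) * 100:.0f}%")
```

With the multiplier locked, the only remaining lever would be the bus frequency itself.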

Mendocino

It would seem that, having spent more than a year finding and purchasing six Intel Pentium II Overdrive 333 MHz processors, which now sell for an average of $200 at the world-famous flea market (eBay.com), I had reached the maximum ALR Revolution 6×6 configuration. But as always, there is no limit to perfection. More on that below.

Mendocino is the name of the core of the Celeron processors manufactured since 1998 in SEPP (Slot 1) and PPGA (Socket 370) packages. In 1999, Intel abandoned the Slot 1 form factor in favor of PPGA. The plastic Celeron processors were cheaper to manufacture, were made on a 250 nm process, and had a built-in 128 KB L2 cache running at full processor core speed. Frequencies ranged from 300 to 533 MHz.

PPGA Celeron Processor – Full speed 128K of cache

And where does the Celeron Mendocino fit in? The fact is that Celeron processors can be run in SMP (symmetric multiprocessing), and enthusiasts have been doing this for quite some time. At its heart the Celeron has the core of a full-fledged Pentium II, which, as you know, supports SMP. The only difference between these processors is the L2 cache, 128 KB on the Celeron, but the Celeron’s frequency reaches a higher 533 MHz versus 450 MHz for the Pentium II.

SMP support depends on the BR#1 signal, which is physically present in the processor itself but was not routed on the motherboard. Once this secret was discovered, the solution to the SMP problem was not long in coming. Enthusiasts picked up their soldering irons, and the motherboard manufacturers ABIT and QDI, inspired by the idea, even released production boards. Suffice it to recall the ABIT BP6 motherboard, based on the Intel 440BX chipset, with two Socket 370 sockets. (Editor’s Note: Oh, the days of running my BP6 with dual Celeron 366s happily running at 550 MHz. Intel was not amused, but I was.)

Further, there is an adapter from PowerLeap, the PL-ProII, which allows you to install Intel Socket 370 Celeron processors in Socket 8 motherboards; they are that closely related.

Therefore, it is theoretically possible to install six Intel 533 MHz Celerons, which in total would give us about 3200 MHz. Of course, I don’t know whether all six processors would work, but the chances are not bad =) However much I searched the Internet, I found no implementation of such a bizarre idea. I can find six Celerons without difficulty, but six PowerLeap PL-ProIIs are unlikely. I had one such adapter, but I had to sell it, along with some of my other exhibits, to fund this project (( So if someone has one, or knows where to find one for reasonable money, write to me in the comments on this article or at my e-mail: max1024@tut.by (perhaps they could be recreated?)

Let’s keep fantasizing, LOL. If you push the boundaries of imagination even further and install another adapter, one supporting Tualatin-core Pentium III processors, into the PowerLeap adapter, then who knows what might come out of it. Maybe a sandwich like this…
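For a sense of scale, the sketch below sums the clock speeds of the three six-way configurations mentioned in this article. Aggregate megahertz is only a playful yardstick, not a performance measure, and the Celeron row is the untested, hypothetical configuration; the names are illustrative.

```python
def aggregate_mhz(cpu_count, clock_mhz):
    """Naive 'total clock' of an SMP system: count x per-CPU clock."""
    return cpu_count * clock_mhz


configs = {
    "6x Pentium Pro 200": aggregate_mhz(6, 200),
    "6x Pentium II Overdrive 333": aggregate_mhz(6, 333),
    "6x Celeron 533 (hypothetical)": aggregate_mhz(6, 533),
}

for name, total in configs.items():
    print(f"{name:32s} {total} MHz")
```

The exact Celeron total is a little under the rounded 3200 MHz figure.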

The idea turned out to be interesting, so I have not given up hope that the next part of this story will someday be published. Perhaps in 2025.

Windows Vista Server

Now having six Intel Pentium II Overdrive 333 MHz processors at my disposal, which have gained support for MMX instructions and climbed one more rung up the processor evolution ladder, I wanted to try installing an even more modern operating system.

Let me remind you that last time I managed to install an operating system different from the recommended ones, Microsoft Windows NT Server 4.0 Enterprise and Microsoft Windows 2000 Advanced / Datacenter Server, which do not let you run programs and tests written for the beloved Windows XP. After lengthy experiments, I ended up with this OS: “Windows .Net Enterprise Server, Build 2600 Service Pack 2”, a sort of server operating system with a Windows XP kernel.

This time I wanted to raise the bar even higher and aimed at the family of operating systems based on the Windows Vista kernel. The ideal option was Windows Server 2008 Enterprise Edition (x86), but first I decided to try installing Windows Server 2003 Enterprise Edition. On paper, there is a hardware erratum in the CPU core of the Intel Pentium Pro family and early Pentium IIs, with a related memory “leak” problem and the inability of these processors to work in SMP mode, but I decided to check this in practice.

The result of this experiment was predictable: six Intel Pentium II Overdrive 333 MHz processors will not work in the ALR Revolution 6×6 under the Windows Server 2003 family of operating systems. Only one CPU is visible. It’s a pity; this is how one hardware erratum puts an end to the happy future of such interesting processors.

The next step was to install Windows Server 2008 Enterprise Edition. On many counts the ALR Revolution 6×6 met the minimum requirements for installing this operating system. The installation started without incident, and files began copying from the DVD-ROM to the SSD.

But after the reboot I saw a screen like this:

Again ACPI rears its ugly head… In saying that my configuration met almost all the minimum requirements, I did not mention that starting with Windows Vista, the kernels of this and all subsequent operating systems require ACPI; in other words, nothing will work without hardware ACPI support. The cause lies in the BIOS of the ALR Revolution 6×6, which was released long before the advent of ACPI.

There is still a chance of installation, but it requires modifying the BIOS code, and unfortunately I still can’t get hold of a BIOS programmer. Back in the days of Socket 7, when the first revision of ACPI began to appear, motherboard manufacturers released new BIOS versions with support for this technology. I went through this with an Asus P5A motherboard on the ALi Aladdin V chipset for Socket 7 processors, when ACPI BIOS revision 1006 was released. This made it possible to install Microsoft Windows 7 x86 on that motherboard with an AMD K6-2+ processor.

An alternative solution was to look for early builds of Windows Vista Server. This development project was originally called “Longhorn”.

The image of this OS was found on the Internet (of course) and burned to DVD, and the installation process began:

Everything went as usual and the files were copied, but after the copying finished and the system rebooted, the same ACPI error was waiting for me.

Again, having spent a fair amount of time, I decided to search earlier versions of Windows Vista or the Longhorn project for an operating system kernel that does not require ACPI. Perhaps such builds exist. If any early build installs, then implementing SMP support will be easier. I tried different builds: 4042, 5098, as well as the Beta 2 builds. It should have turned out like this:

But success again came down to ACPI support on my test system. All the tested builds still required ACPI. As a result, I shelved this idea and decided to run all the tests on the proven Windows XP-like OS, where the six Intel Pentium Pros had felt great. One head is not enough to solve this problem, so feel free to share ideas in the comments on this article, do not be shy 😉

Test system and test results

The test bench will include processors:
• 6x Pentium II Overdrive 333 MHz, L2 = 512 KB
• 6x Pentium Pro 200 MHz, L2 = 1024 KB
• 6x Pentium Pro 200 MHz, L2 = 256 KB

Motherboard:
• Unisys Aquanta HS6 (10140) chipset «Intel 450GX» (6x Socket 8);

Video card:
• PNY GeForce2 MX400 PCI 64 MB (Forceware 93.21);

SSD:
• Kingston SSDNow V300 (60 GB).

Performance testing was carried out in the author’s edition of “Windows Whistler .Net Advanced Enterprise Server, Build 2600, Service Pack 2, 3 in 1” using the following software:
• Super Pi mod. 1.5XS (1M task)
• PiFast v.4.1
• wPrime v.1.43
• HWBOT Prime v.0.8.3
• CPU-Z v.1.87.0
• WinRAR x86 v. 5.40
• 7-Zip v.16.04
• AIDA64 5.50.3600
• SiSoftware Sandra 2004 SP2
• Cinebench 2003
• Cinebench R10

Tests

To start, a couple of single-threaded tests: Super Pi (1M task) and PiFast.

Super Pi mod. 1.5XS (1M task)
Minutes (less is better)

If we compare against the fastest Pentium Pro, with a 200 MHz clock and a 1 MB L2 cache, then swapping in one Pentium II Overdrive 333 MHz adds about a third more performance. And if the number of such processors is the same as in ASCI Red, 9632 of them, then it comes to almost 3 million percent, if I calculated everything correctly. You can see that the L2 cache size helps some, but mostly this is a pure clock speed/architecture test.

PiFast v.4.1
Seconds (less is better)

In this test, the same performance growth between the Pentium Pro and the Pentium II Overdrive is preserved. Although this test cares more about processor clock speed than cache size, the Overdrive’s gap to the 400 MHz Celeron turned out to be not very large. I really want to install six such Celerons in this system.

wPrime v.1.43

This is the first test that supports multithreading. For this article I decided to measure not only six Intel Pentium II Overdrives, but also what five and four processors can do, since the system allows even odd configurations and scales well.

Seconds (less is better)

The ratio turned out to be six to four: the performance of six Pentium Pros, or more precisely six Pentium Pros overclocked to 233 MHz, corresponds to four Pentium II Overdrives. Six Overdrives pull far enough ahead to match four server Xeons clocked at 400 MHz, or one AMD Athlon XP 2100+ at 1733 MHz, released in early 2002. It took a little less than four years for an ‘ordinary’ processor to match the performance of the 6x Overdrives.

Also of note: adding a CPU (going from 5 to 6 Overdrives) results in a nearly linear performance increase. The ALR has very little overhead in handling additional processors.

HWBOT Prime v.0.8.3
Total score (more is better)

If in the past the performance of a pair of 1 GHz Intel Pentium III Xeons was something fantastic, now six Overdrives even manage to outperform that pair. A slightly overclocked (by 5%) representative of the 64-bit new school, the AMD Athlon 64 3800+ on Socket 939, is only slightly faster despite the technological abyss between them. Adding processors here resulted in smaller gains than in wPrime.

WinRAR x86 v. 5.40
KB/s (more is better)

The memory subsystem has not changed with the upgrade; the same 66 MHz Fast Page Mode memory is used, but the numbers nevertheless increased thanks to brute processor power.

7-Zip v.16.04 (dictionary size 32 MB)
Total score in MIPS (more is better)

Here again we see the six-to-four effect: four Overdrives are on par with six Pentium Pros. The slow memory subsystem holds the system back in archiving against more modern opponents; if only it could be overclocked to 75 MHz… perhaps in the future.

AIDA64 5.50.3600
I present to you the results of six Intel Pentium II Overdrive 333 MHz processors in this test suite.

And my favorite test is the Cache and Memory Benchmark. See how the cache speeds have increased between the two processors. From left to right: Pentium II Overdrive 333 MHz and Pentium Pro 200 MHz (L2 = 1024 KB). Interestingly, the L2 cache write speed is nearly 25% faster on the original Pentium Pro, and its latency is better as well.

AIDA64 5.50.3600
CPU Queen, score (more is better)

The 6x Overdrives beat a 2.8 GHz Pentium 4. The P6 architecture was faster than NetBurst clock for clock.

AIDA64 5.50.3600
FPU Julia, score (more is better)

AIDA64 5.50.3600
FPU VP8, score (more is better)

Both of these FPU tests are less dependent on multiple cores, which significantly impacts the ALR’s score. The Pentium III FPU was also greatly enhanced (with the addition of SSE, amongst other things), which is readily apparent here, as both the VP8 and Julia tests are heavily optimized for them.

SiSoftware Sandra 2004 SP2
Arithmetic benchmark, MIPS (more is better)

SiSoftware Sandra 2004 SP2
Multi-media benchmark, it/s (more is better)

At least in the integer test the 6x Pentium II Overdrives do well; the multimedia test, being more FPU-heavy, favors the PIII core, but at least we can say we beat a quad Itanium?

Now we get to the most popular multithreaded number-crunching test: Cinebench!
Cinebench 2003
points (more is better)

On the question of how many cores, and which ones, are better for rendering: the Socket 370 Pentium III-S 1400 MHz, with its Tualatin-S core, is nearly as fast as the ALR. The dual Slot 1 Intel Pentium III EB 933 MHz processors are quite a bit faster. Clock speed (total available clock speed) as well as architecture matter a lot here, though you can easily see the weakness of the P4 core.

Cinebench R10
points (more is better)

 

Interesting numbers, aren’t they? You can try to find this test and check your own result. The final render on the previous system, with six 1 MB Pentium Pros at 200 MHz, completed in 21 minutes and 14 seconds. Overclocking the six cores to 233 MHz reduced this to 18 minutes and 13 seconds, and the six Pentium II OverDrives took 13 minutes and 32 seconds. The advantage over the overclocked Pentium Pros is 4 minutes 41 seconds, and if we multiply this time by the total number of processors in the ASCI Red supercomputer, we get 31 days of round-the-clock time, or 1/12 of a year saved, and this is already a tangible figure.
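The arithmetic in this paragraph can be checked with a short Python sketch. It uses only the three render times and the 9632-processor count quoted in this article; the helper and variable names are illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes:seconds render time to seconds."""
    return minutes * 60 + seconds


ppro_200 = to_seconds(21, 14)  # six Pentium Pro 200 MHz, 1 MB L2
ppro_233 = to_seconds(18, 13)  # the same six, overclocked to 233 MHz
p2od_333 = to_seconds(13, 32)  # six Pentium II OverDrive 333 MHz

# Time saved versus the overclocked Pentium Pros.
saved = ppro_233 - p2od_333
saved_min, saved_sec = divmod(saved, 60)

# The tongue-in-cheek scaling: saving multiplied by ASCI Red's CPU count.
ASCI_RED_CPUS = 9632
total_days = saved * ASCI_RED_CPUS / 86_400  # 86,400 seconds per day

print(f"Saved per render: {saved_min} min {saved_sec} s")
print(f"Times {ASCI_RED_CPUS} CPUs: {total_days:.1f} days")
print(f"Speedup vs stock Pentium Pro: {ppro_200 / p2od_333:.2f}x")
print(f"Clock ratio 333/200:          {333 / 200:.2f}x")
```

The render speedup over the stock Pentium Pros comes out somewhat below the raw clock ratio, which fits the unchanged 66 MHz memory subsystem noted earlier.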

In the last article, I compared the six Pentium Pros in this test with the Intel Core i7-7800X, which rendered the final image in 20 seconds. This time I wondered how many seconds Intel’s current top model, the Core i9-10980XE, would need. I found a man who owns this processor, and he agreed to help me and ran the tests. So, at default settings the test completed in 11 seconds, and with all 18 cores overclocked to 5 GHz, in nine! And although Cinebench R10 supports only 16 threads, you can still imagine the difference between waiting a few hours on a desktop PC back then and literally a few seconds now for the same task.

And as a small bonus, here are the results of the built-in CPU-Z benchmark:
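To express that gap as a ratio, here is a small sketch that divides the six-Overdrive render time (13 minutes 32 seconds, from the Cinebench R10 results above) by each modern result quoted in this paragraph. The dictionary layout and names are illustrative.

```python
OVERDRIVE_SECONDS = 13 * 60 + 32  # six Pentium II OverDrives, Cinebench R10

# Modern render times in seconds, as quoted in the article.
modern_results = {
    "Core i7-7800X": 20,
    "Core i9-10980XE (default)": 11,
    "Core i9-10980XE (5 GHz)": 9,
}

speedups = {
    name: OVERDRIVE_SECONDS / seconds
    for name, seconds in modern_results.items()
}

for name, factor in speedups.items():
    print(f"{name:28s} {factor:.1f}x faster than 6x Overdrive")
```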

Conclusion

It is time to draw a conclusion. No doubt the ALR Revolution 6×6 and similar systems are fantastic. It’s even interesting to use such a machine at home. One processor can host the server for a network game such as C&C, StarCraft or Counter-Strike; a second can run the game client; a third, a second client; a fourth can play MP3s in Winamp; and there will still be a couple of free cores that you can always load with something in the background. Two or four gigabytes of RAM should be more than enough for these and other tasks.

So far I have only one problem: what should occupy the 8 free PCI slots? Ha-ha )

For its time, such performance was out of reach for most organizations because of the extreme cost of such systems. But the most interesting thing is that from the late 90s progress rapidly gained momentum, and after just 4-5 years single-core home processors costing hundreds of times less overtook this monster.

What we have now needs no explanation. Progress over the past 7 years slowed significantly; however, since AMD’s “return” to the people with the “Ryzen” brand and the enterprise “Epyc”, it has picked up noticeably. And we can only rejoice at this. Perhaps in a couple of years the Cinebench R10 test will run on a next-gen (no, no, not THAT NexGen) processor in 1 second; then we can assume that the future has already come =)

I don’t want to put an end to this experiment; as long as there is something to strive for, I will try to pursue it, although it becomes more difficult every year, but I’ll come up with something… There is still the possibility of faster RAM, overclocked Overdrives, the elusive 6-way Celerons, or perhaps an ACPI-compliant BIOS.


7 Responses to Chapter 2: Mini-Mainframe at Home: The Story of a 6-CPU Server from 1997

  1. fogus

    I bought one of these in 2002 as a fun project to work on from time to time. The computer was not in operating condition and took me a year (in between other festivities) to get up and running again. Even then, years after its heyday, it felt like I had control over something powerful and primal. There was a fringe industry back then of folks in the web resurrecting these beauties, running benchmarks, and posting their pictures. I’m not sure that anyone has ever run anything other than benchmark programs on these things for the past 20 years. Thank you for this trip down memory lane!

  2. kls0e

    beautiful project 🙂 did you try to boot debian as well? i think it’d make a great linux server. do you have any numbers about power consumption?

  3. Max1024

    Thanks for the feedback)) I have installed Microsoft operating systems for now. Linux, BeOS has not tried it yet, but is still to come. For power consumption, the maximum figure on the meter I saw 295 watts.

  4. Liam Proven

    Interesting experiment! Good work! 🙂

    I too would be interested to see how well BeOS could work on that… or yellowTab Zeta, the final iteration of the original BeOS codebase.

    Something else interesting would be Solaris. Before proprietary *nix sank into obscurity, the smart observers commented that Sun’s multiprocessor scalability was the best in the industry.

    It would be fascinating to see some multicore server benchmarks between OpenIndiana or Illumos and Linux on that.

  5. rm

    Just wanted to note these all are pretty hard to read. The language feels like it was written in colorful and long-winded Russian, then taken and translated word-for-word into English, without much restructuring, if any at all. (Saying this as a Russian native speaker also proficient in English).

  6. rm

    …well, maybe just for me, due to knowing both languages, reading one disguised into the other feels weird. Maybe for English native speakers it’s all fine.

  7. admin

    Because you know both its probably a lot more apparent lol, Maxsim writes these in Belorussian, then translates them to English (Google Translate I think?) Then I go through and edit them, trying to fix various translation issues and generally clean up the grammar. It’s not perfect for sure, but its pretty readable for someone who doesn’t know Russian at all (me haha).
