Archive for the 'Boards and Systems' Category

August 14th, 2019 ~ by admin

How to 386 Your AT: Intel Inboard 386/AT

With the release of the 32-bit Intel 386 processor in 1986, owners of IBM PC/XT and AT type systems (8088 and 80286 systems) were left a bit in the dust. This was a concern (or opportunity) for Intel as well. Alongside the 386, they designed an upgrade solution that could be used in these now obsolete computers. This was the Intel InBoard 386 series of upgrade cards.

InBoard 386 AT with 1MB of RAM and 80287 FPU Option (very unusual on a late model InBoard; this one is from 1990, but the FPU is from 1986)

The InBoard, as its name implies, was an internal 16-bit ISA card that was used to upgrade these systems. It included a 386DX processor running at 16MHz, 64K of cache, and (optionally) 1-3MB of additional RAM. Two versions of the board were made: the PC/XT version was designed for 8088 processor based systems, and the AT version was for the 286 systems. These boards required the removal of the original processor, and then a cable was run from the old CPU socket to the InBoard 386 board. On system start up the original BIOS booted the system and loaded the DOS operating system. The config.sys file would then call on the drivers to load the InBoard 386 specific features. The original system was essentially unaware of the new processor; instructions were executed by the InBoard transparently.

Flat ribbon cable used for connecting the board to the old CPU socket. If the cable could not reach the socket, your system was not compatible. Cable length was restricted by signal timing, rather than the common complaint of Intel being ‘stingy’.

Early AT systems used a 6MHz CPU and ISA bus speed, so Intel provided an 8MHz crystal to replace the original on the motherboard. This ensured that the ISA bus the InBoard used to communicate with the original memory and peripherals ran fast enough and did not become such a huge bottleneck. The base model InBoard did not come with any RAM; it could use your existing system RAM just fine. Adding RAM, however, was a worthwhile upgrade. The board itself supports 1M (36 100ns 256 kbit chips, including parity) and a daughter card could add another 1M or 2M. This RAM was accessed via the 80386's 32-bit address bus, so it was much quicker. It was also a single wait state access. You could configure the InBoard to backfill (take over for) your existing system RAM, at least down to 256K, so that the computer would only use the first 256K of the slower RAM before moving to the RAM on the InBoard. If your system had 512K of RAM you would ‘waste’ half of it, but with the benefit of much faster access times. The InBoard 386 had another trick up its sleeve to improve speed…
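The chip count quoted above can be checked with a little arithmetic. This is only a sketch: it assumes the usual parity arrangement of one parity bit stored per 8-bit byte (9 bits per byte), which the description implies but does not spell out, and the function name is purely illustrative.

```python
def usable_kib(chips, kbit_per_chip, bits_per_stored_byte=9):
    """Usable capacity in KiB for a bank of DRAM chips with parity.

    Assumption: every 8 data bits are stored alongside 1 parity bit,
    so 9 stored bits yield 1 usable byte.
    """
    total_kbit = chips * kbit_per_chip
    return total_kbit // bits_per_stored_byte


onboard = usable_kib(36, 256)   # the 36 x 256 kbit chips on the board
data_chips = 36 * 8 // 9        # chips' worth of data bits
parity_chips = 36 - data_chips  # chips' worth of parity bits
print(onboard, data_chips, parity_chips)
```

Under that assumption, 32 chips' worth of capacity holds data and 4 chips' worth holds parity, which matches the 1M figure for the board.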

June 12th, 2019 ~ by admin

Xeon Overclocking: Making Gallatin Gallop

This article is part of The CPU Shack’s continued partnership with guest author max1024, hailing from Belarus. I have provided some minor edits/tweaks in the translation from Belarusian to English.

If you still remember the times of the Pentium 4 running on Socket 478 with the Northwood, Prescott and Gallatin cores, then you should remember how these processor cores differed from each other. Northwood was fast like a mountain doe thanks to a shorter 20-stage pipeline that allowed it to perform many operations very quickly without tremendous losses due to branch mispredictions and the like, but it was inferior to Prescott in overclocking frequency potential. Prescott, in turn, was as strong as a buffalo, due to twice the L2 cache (1M vs 512K) and a finer process (90nm vs 130nm). But like any hoofed animal, it was not agile: to achieve higher clock speeds its pipeline was extended to 31 stages. In some cases it outperformed Northwood clock for clock, but it did so at the expense of much heat.

A separate niche in the food chain was occupied by “Gallatin”, which combined the properties of the two previous iterations, a shorter 20-stage pipeline with the high clock speed of the Prescott, but in its arsenal it also had a very formidable weapon: an additional L3 cache of 2 MB. The price of ownership of this “beast” was high in the literal sense of the word; like any other representative of the Extreme Edition series, it cost $999. I resisted this extreme processor, choosing a hero from AMD, the FX-51, which I consider to be one of the most outstanding processors of all times and peoples.

Xeon Universal Chip Analyzer by x86.fr

What could be better, cooler or faster? I’ve been looking for an answer to this question for a long time, until I became acquainted with the Intel Xeon server processors on Socket 604 and in particular with processors based on the Prescott 2M core, which have twice the cache size compared to their desktop counterparts and can run on ASUS production boards.

As everybody knows, the advanced desktop flagships of both processor manufacturers originate from the server segment. So the Opteron gave rise to the AMD Athlon FX-51, and the Intel Xeon MP to the Pentium Extreme Edition. This parity of events has been preserved until now.

Xeon Gallatin MP

The server representatives of Intel Xeon processors on the Gallatin core are divided into two branches: Xeon MP (Gallatin) and simply Xeon (Gallatin). The difference is in the number of processors supported simultaneously in the system. Xeon MP supported running up to four processors, while the usual Xeon could be installed in servers only in pairs. There is also a difference in the steppings of the processor core itself. Let me remind you that the desktop version of “Gallatin” was the M0 stepping, just like the regular Intel Xeon series.

The Xeon MP line, by contrast, is based on earlier steppings, from A0 to C0. Among the representatives of the M0 stepping, you can find four Xeon models (Gallatin) with 1M of L3 cache, with frequencies from 2.4 GHz to 3.2 GHz, and one model with the L3 cache doubled to 2 MB, pretty much the same as a Pentium 4 Extreme Edition. This model gave rise to the first “extreme” Pentium.

January 18th, 2019 ~ by admin

Part 4: Mini-Mainframe at Home: Benchmarks and Overclocking

Part 4 of the Story of a 6-CPU Server from 1997.  In this final section we will first explore (briefly) the theory of running a 6-CPU SMP system (with processors designed for 2 or 4 way) and then move to benchmark the system and overclock it.

For the background of the ALR 6×6 and Pentium Pro processors that form the basis of this project please see:

Previous Parts of the Series

Part 1: Mini-Mainframe at Home – Introduction
Part 2: Mini-Mainframe at Home: Installing a Modern OS
Part 3: Mini-Mainframe at Home: The ALR 6×6 Hardware and BIOS

Features of the architecture and operation of the six CPUs

So, as the server was originally shipped with six Pentium Pro “Black” processors, I decided to add six Pentium Pro “Gold” processors with a frequency of 200 MHz and a 256 KB L2 cache for contrast. That is exactly four times less cache per processor, and it will be interesting to check the effect of the total cache size: six megabytes versus one and a half. But before starting the tests, I will focus on how the six processors interact in this system. To overcome Intel’s limitation on building a system with more than four processors, ALR engineers, with the support of Unisys, proposed an inter-processor interaction scheme using arbitration:

The theory behind this architecture is as simple as it is powerful. Inside new six-way systems are two Tri-6 CPU cards, A and B (Figure 1). Each of these cards is an independent, three processor ready SMP bus, complete with all logic Active CPR processor protection, and auto-recovery technology built on each CPU card. These two Tri-6 CPU cards are then plugged into a 64-bit parity SMP bus. This design keeps the processors closely coupled, just like a parallel bus architecture, without the related heat and design problems. A separate four-way interleaved memory card is attached to the bus, supporting a sustained data bandwidth of 533-MB per second. This bandwidth is ample to support two full PCI buses as well as an EISA bus bridge.

To overcome the logical limitations of the Pentium Pro chip, six-way servers use a unique expanded bus arbitration configuration referred to as Dynamic Orchestration. The best way to understand how this system works is to compare it to a typical four-way SMP architecture. On a four-way system, bus arbitration is implemented in a “round robin” fashion. That is, each processor has equal rights to the bus, and access is handled in an orderly fashion. For example, if all processors needed access to the bus, CPU 0 would gain access first, followed by CPU 1, CPU 2, CPU 3, and then back to CPU 0. If CPU 2 was executing a cycle, and both CPU 3 and CPU 1 requested use of the bus, control would first pass to CPU 3, before cycling back to CPU 1.

For purposes of this four-way arbitration, processors are identified using the two-bit ID code. The six-way solution borrows this convention, with some important modifications. Within each Tri6 CPU card, individual processors are identified using the two-bit ID code. This yields four possible combinations, although only ID codes 0 through 2 are needed. A chip on each Tri6 card handles the arbitration, following the “round robin” scheme found in a four-way system. In this case, however, the fourth processor has been replaced by a sort of “phantom” processor that actually represents the other Tri6 card:

The figure above shows the six-processor scheme of the server board ALR Revolution 6×6 and its clones. Thanks to this approach, the appearance of 8, 10 and more processor systems has become possible.
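The round-robin arbitration described in the quoted text can be sketched in a few lines. This is a simplified model for illustration, not ALR's actual logic: the function names are hypothetical, and the way the “phantom” slot is driven (it simply requests the bus whenever the other card wants it) is an assumption.

```python
def next_grant(current, requesting, slots=4):
    """Return the next slot to own the bus after `current`, scanning
    in round-robin order; None if nobody is requesting."""
    for step in range(1, slots + 1):
        candidate = (current + step) % slots
        if candidate in requesting:
            return candidate
    return None


# Four-way example from the text: CPU 2 holds the bus while
# CPU 3 and CPU 1 both request it.
first = next_grant(2, {1, 3})
second = next_grant(first, {1})

PHANTOM = 3  # ID 3 on each Tri6 card stands in for the other card


def card_grant(current, local_requests, other_card_wants_bus):
    """Arbitrate on one Tri6 card (CPU IDs 0-2 plus the phantom slot).

    Assumption: the phantom requests whenever the other card does.
    """
    requesting = set(local_requests)
    if other_card_wants_bus:
        requesting.add(PHANTOM)
    return next_grant(current, requesting)
```

In this model, a grant to the phantom slot means the bus passes to the other card, whose own arbiter then picks among its three processors in the same way.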

Building a chessboard from various models of Pentium Pro, I thought that I could not find a larger processor. Even the 32-core AMD Threadripper 2990WX next to the Intel Pentium Pro does not seem so big.

However, The CPU Shack sent me this photo. On the left is an engineering sample of the Xeon Gold 6142 for the LGA3647 socket, on the right another engineering sample, this time of the Intel Xeon Phi, also for LGA3647. As you can see, the story has come full circle, and perhaps all subsequent processors will no longer fit on the open palm of the hand. LGA2066 processors, though, are still far from the size of the Intel Pentium Pro.

Overclocking 6 cores together and separately

January 16th, 2019 ~ by admin

Part 3: Mini-Mainframe at Home: The ALR 6×6 Hardware and BIOS

Part 3 of The Story of a 6 CPU Server from 1997 – In this section we’ll learn about the hardware and BIOS that makes the ALR Revolution 6×6 with 6 Pentium Pro Processors work.

For the background of the ALR 6×6 and Pentium Pro processors that form the basis of this project please see:

Part 1: Mini-Mainframe at Home – Introduction
Part 2: Mini-Mainframe at Home: Installing a Modern OS

Exterior and Interior

The size of the case is quite large for a desktop (and it came with wheels, so probably not good to have rolling about one’s desk), but relatively compact for servers of this class. The server is 68 cm high, 32 cm wide and 58 cm deep. The weight of the server starts from 52 kg. I have a complete server kit, but the case is missing because, due to its size and weight, shipping it to Belarus would cost around $400, if not more, so the photos of the exterior were taken from the Internet.
Editor’s Note: The empty case is currently serving as a kitchen counter at the CPU Shack Museum. It’s really THAT big.

The first thing that catches the eye is the information touch! LCD display, the task of which is to display all the information about the status of the six processors, RAM, temperature, status of hard drives and other vital information. Today, such informative displays are the norm, but 21 years ago I could not even imagine that such a thing existed. The front of the case also has two compartments: the upper one for 5.25” devices, such as CD-ROMs, and the lower one giving access to the cage with SCSI drives. At the back you can see 14 expansion slots, a cooling system and a cage with power supplies.

To run the server, two power supplies are needed, which are connected to a special board in the cage. The third power supply unit is a spare in case a single power supply fails. Up to four power supplies can be installed, with the two pairs connected to a pair of electrical outlets, for complete redundancy of the server's power.

January 14th, 2019 ~ by admin

Part 2: Mini-Mainframe at Home: Installing a Modern OS

Part 2 of The Story of a 6 CPU Server from 1997 – In this section we’ll try to get a modern OS running on the ALR Revolution 6×6 with 6 Pentium Pro Processors.

For the background of the ALR 6×6 and Pentium Pro processors that form the basis of this project please see Part 1 of the project.

Part 2: Installing and Using an OS

Before you start installing the OS, you need to select the correct kernel of the operating system. To do this, at the initial stage of installation, press the F5 key.

In this case, we choose “MPS Multiprocessor PC”, since the other options simply do not fit: this server naturally does not support ACPI. In general, I advise anyone who experiments with an OS more “modern” than the hardware itself to turn off ACPI support in the BIOS (if present). This simple action will save your nerves.

Windows Server 2003 R2 Enterprise Edition was installed and, as I wrote above, the system had one working CPU core.

Next, I attempted to install the operating system from within the running OS using the upgrade method, but at the initial stage the Windows Server 2003 Enterprise Edition installer warned me that the multiprocessor configuration in use was not supported by the operating system.

But there are many ways to install the OS. Alternatively, I tried the OS “transfer method” with a known-workable SMP configuration. I took the ASUS P2L97-DS motherboard (Intel 440LX chipset) with a pair of Intel Pentium II processors at 450 MHz, which should be free of hardware errors, and chose the “MPS Multiprocessor PC” kernel, but the installation hung at the stage of copying the initial files, before reaching the installation to the hard disk or even the choice of the installation source. Much was tried, including cables, different drives and RAM, but all to no avail. A single Pentium III on the Asus P3B-F motherboard (Intel 440BX chipset) hung at the same point.

In the end, I decided to take another board with two Slot 1 connectors, the Asus P2B-D (Intel 440BX chipset), and a pair of Intel Pentium III processors. Windows Server 2003 R2 Enterprise Edition installed without trouble; it only remained to transfer it to the six-processor server. Having moved the hard drive, I decided to do the first boot in “safe mode” in order to exclude the influence of the two systems' different devices on each other, but as a result I received a BSOD.

January 12th, 2019 ~ by admin

Part 1: Mini-Mainframe at Home: The Story of a 6-CPU Server from 1997

Introduction

This article/project is provided in cooperation with guest author max1024, hailing from Belarus. I have provided some minor edits/tweaks in the translation from Belarusian to English.

As part of this project, you will have a unique opportunity to learn about a mini mainframe worth more than a Ferrari, which had enormous power by the standards of 1997, as well as the intricacies of installing a more modern operating system on it and other interesting details. I think that to some readers, the bold name of the super server ALR Revolution 6×6 already says something, and it will be discussed in this article.

Pentium Pro Processor versions – Minus the Overdrive

Alone, it would simply not have been realistic for me to carry out everything I had planned. Without the help of my comrades from the United States, Russia and Great Britain, this would have remained a project on paper, but their invaluable help made it possible for almost forty kilograms of net weight (nearly 90 lbs) to travel a long way, more than 11 thousand kilometers (6,800 miles), in three separate packages. The total distance over which all the parts came together was 30 thousand kilometers (18,000 miles); for reference, the circumference of the Earth is 40 thousand km (~25,000 miles). So this work is partly their merit, for which I am immensely grateful.

Editor’s Note: This ALR 6×6 came from the CPU Shack Museum, having sat in my house for some years. While chatting to Maksim last year he mentioned he would like to find one, so it was clearly meant to be. You can’t just ship an ALR 6×6 across the world to Belarus, at least not economically, so over several months I disassembled the entire server and shipped it in pieces to a mutual friend in Russia, who then forwarded it to Maksim in Belarus.

Connor Krukosky and his IBM z890

Before embarking on the initial part of the project, I’ll tell you that while trying to understand mainframes and supercomputers, I realized one thing: it’s quite possible to assemble even a “mini” mainframe at home, as Connor Krukosky did, and overclocking one would be even more interesting.

Studying such computational supermachines, I decided to settle on systems built from Pentium Pro processors, so that by installing Windows-compatible applications and benchmarks, one could see how far performance has advanced over the decades. Ideally, of course, it would be nice to get Intel ASCI Red, but I decided to start with its mini version.

May 27th, 2018 ~ by admin

Mainframes and Supercomputers, From the Beginning Till Today.

This article is provided by guest author max1024, hailing from Belarus.  I have provided some minor edits/tweaks in the translation from Belarusian to English.

Introduction

We all have computers that we like to use, but there are also more productive options in the form of servers with two or even four processor sockets. Then one day I wondered: what is even faster? The answer to my question led me to a separate class of computers: supercomputers and mainframes. How this class of computer equipment developed, what it was like in the past and what it has achieved now, what performance figures it reached and whether it is possible to use such machines at home: I will try to explain all this in this article.

FLOPS

First you need to determine how a supercomputer differs from a mainframe and which is faster. Supercomputers are the fastest computers. Their main difference from mainframes is that all the computing resources of such a computer are aimed at solving one global problem in the shortest possible time. Mainframes, on the contrary, solve many different tasks at once. Supercomputers are at the very top of any computer charts and as a result are faster than mainframes.

The need for mankind to quickly solve various problems has always existed, but the impetus for the emergence of superfast machines was the arms race of well-known superpower countries and the need for nuclear calculations for the design and modeling of nuclear explosions and weapons. To create an atomic weapon, colossal computational power was required, since neither physicists nor mathematicians were able to calculate and make long-term forecasts using the colossal amounts of data by hand. For such purposes, a computer “brain” was required. Further, the military purposes smoothly passed into biological, chemical, astronomical, meteorological and others. All this made it necessary to invent not just a personal computer, but something more, so the first mainframes and supercomputers appeared.

The production of ultrafast machines began in the mid-1960s. An important criterion for any machine was its performance. And here every user thinks of the well-known abbreviation “FLOPS”. Most of those who overclock or test processors for stability are likely to use the utility “LinX”, which gives the final performance result in gigaflops. “FLOPS” means FLoating-point Operations Per Second. It is a system-independent unit used to measure the performance of any computer, and it shows how many floating-point arithmetic operations per second a given computing system performs.

“LinX” is a convenient graphical front end for “Intel Linpack”, designed to simplify checking the performance and stability of a system using the Intel Linpack (Math Kernel Library) test. In turn, Linpack is the most popular software product for evaluating the performance of the supercomputers and mainframes included in the TOP500 supercomputer ranking, which is compiled twice a year by specialists in the United States from the Lawrence Berkeley National Laboratory and the University of Tennessee.

When comparing results in Giga, Mega and Tera-FLOPS, it should be remembered that the performance results of supercomputers are always based on 64-bit processing, while in everyday life processor or graphics card makers may quote performance on 32-bit data, so the result may appear to be doubled.
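The doubling effect is easy to see with a rough peak-performance estimate. The sketch below is illustrative only: the formula is the usual back-of-the-envelope one (cores × clock × vector lanes × operations per lane per cycle), and the core count, clock speed, vector width and operations per cycle are made-up example values, not figures for any specific processor.

```python
def peak_gflops(cores, ghz, vector_bits, element_bits,
                ops_per_lane_per_cycle=2):
    """Rough theoretical peak in GFLOPS.

    Assumption: each core issues one vector instruction per cycle and
    each lane performs `ops_per_lane_per_cycle` floating-point
    operations (2 would correspond to a fused multiply-add).
    """
    lanes = vector_bits // element_bits
    return cores * ghz * lanes * ops_per_lane_per_cycle


# Hypothetical 4-core, 3.0 GHz chip with 256-bit vector units
fp64 = peak_gflops(4, 3.0, 256, 64)  # 64-bit, as used for supercomputer results
fp32 = peak_gflops(4, 3.0, 256, 32)  # 32-bit, as often quoted in marketing
print(fp64, fp32, fp32 / fp64)
```

With the same hardware, halving the element size doubles the number of lanes, and with it the headline figure.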

The Beginning

November 22nd, 2017 ~ by admin

CPU of the Day: DEC LSI-11 Chipset

LSI-11 Chipset with EIS/FIS Chip – 1976-1977

Back in 2014 we discussed the Western Digital WD/9000 Pascal Microcomputer system.  Today we’ll look at the LSI-11 chip set, the basis of the Pascal.  Back in 1974 DEC (Digital Equipment Corporation) contracted Western Digital to design and build a 16-bit chipset to emulate the Bipolar PDP-11/05 Minicomputer.  Western Digital was paid $6.3 million for the work, and would be allowed to market and sell the resulting chipset themselves, as well as grant license to it to others (including DEC).

The LSI-11 was to be a 16-bit chipset, but was based around an 8-bit data chip (the 1611). The 1611 has an 8-bit ALU, 26 8-bit registers and a microinstruction register. This is controlled by the 1621 control chip, which interprets macroinstructions and handles all the timing, as well as interrupts. The 1621 control chip is what allows the 8-bit 1611 to be used as a 16-bit processor. The chips are connected by an 18-bit microinstruction bus, and a 16-bit address/data bus handles access to the rest of the system (memory/I/O). Each MICROM is a 512 word by 22-bit ROM, which can hold 80 instructions. It is these MICROMs that allow the WD MCP1600 to function as a PDP-11/05. The instructions in the MICROMs (2 are required for the LSI-11) emulate the PDP-11 instructions.

DEC M7264 LSI-11 KD11-L Board from PDP-11/03
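To illustrate how an 8-bit ALU can serve a 16-bit instruction set, here is the general technique of handling a wide operand in two narrow passes with a carry between them. This is a sketch of the idea only, not the actual LSI-11 microcode, and the function names are hypothetical.

```python
def add8(a, b, carry_in=0):
    """One pass through an 8-bit adder: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_via_8bit(x, y):
    """Add two 16-bit values using two 8-bit passes.

    The low bytes are added first; the carry out of that pass feeds
    the second pass, which adds the high bytes.
    """
    lo, carry = add8(x & 0xFF, y & 0xFF)
    hi, carry = add8(x >> 8, y >> 8, carry)
    return (hi << 8) | lo, carry


print(add16_via_8bit(0x12FF, 0x0001))
```

Sequencing the passes and tracking the carry between them is the kind of work that falls to control logic and microcode in a design like this.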

First production of the LSI-11 chipset began in March of 1975 with shipments commencing that year. The PDP-11/03 based on this chipset was released later that year. The KD-11 M7264 board formed the heart of the 11/03 (as well as other DEC systems). In typical DEC fashion it came in many flavors with different amounts of memory, as well as different instruction support. This was completely due to the design of the LSI-11 chipset and its MICROMs. The basic LSI-11 needs 2 MICROMs to handle the basic PDP-11 instructions; the chipset, however, supported 4. This meant that more instructions could be added. One of the most common and useful additions was the EIS/FIS (Extended Instruction Set/Floating Point Instruction Set) MICROM. This added 8 more instructions including MUL, DIV, FADD, FSUB, FMUL, FDIV and 2 register shifts (ASH, ASHC). Adding the EIS/FIS chip to a standard KD-11-F board turned it into a KD-11-L (like the one pictured).

Western Digital 1611 Die –
Pauli Rautakorpi

There were other MICROMs available as well. This included a set of 2 for support of DIBOL (Digital Business Oriented Language), a DEC language similar to COBOL. Since the DIBOL chipset needed 2 chips, a system could support DIBOL OR EIS/FIS, but not both. MICROMs were revised as bugs were found, or as faster ways of handling an instruction were devised. MICROM revisions could also be made to support different PCB revisions. In some ways they played the part of firmware to the PCB, as well as the instruction set for the processor. In this way many MICROMs are specific to PCB etch revisions and other revisions of the system outside of the processor itself. Matching the correct MICROMs, as well as Control and Data chips, to the correct board is a bit of a task, and takes several dozen pages of the LSI-11 maintenance manual.
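The DIBOL versus EIS/FIS trade-off is just a matter of counting MICROM positions. A minimal sketch, using only the figures given in the text (4 MICROMs supported, 2 for the standard instructions, 1 for EIS/FIS, 2 for DIBOL); the names in the code are illustrative.

```python
MICROM_POSITIONS = 4  # the chipset supports up to 4 MICROMs

# Number of MICROM chips each option occupies, per the text
OPTION_CHIPS = {"STD": 2, "EIS/FIS": 1, "DIBOL": 2}


def fits(options):
    """True if the chosen options fit in the available MICROM positions."""
    return sum(OPTION_CHIPS[name] for name in options) <= MICROM_POSITIONS


print(fits(["STD", "EIS/FIS"]))           # 3 of 4 positions used
print(fits(["STD", "DIBOL"]))             # 4 of 4 positions used
print(fits(["STD", "EIS/FIS", "DIBOL"]))  # would need 5 positions
```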

Here are a few part #s to help sort things out

Data Chip | Control Chip | MICROMs | Notes
DEC 1611 | DEC 2007C | MICROM 1 3010D/A; MICROM 2 3007D; EIS/FIS 3015D | -
21-11549-01 | - | 23-008B5-00 | STD INST 1
21-15579-00 (1611H) | 23-003C4-00 | 23-007B5-00 | STD INST 2
21-16890-00 (1611H) | 23-002C4-00 | 23-003B5-00 | EIS/FIS
- | 23-001C3 CP1621B14 | 23-009B5-00 | EIS/FIS
- | 23-001C2-01 CP1621B451 | 23-001B5-00 CP1631B103 | STD INST 1
- | - | 23-002B5 CP1631B073 | STD INST 2
- | - | 23-091A5-01 CP1631B153 | EIS/FIS
- | - | 23-004B5 | DIBOL 1
- | - | 23-005B5 | DIBOL 2
- | - | 23-008A5-01 CP1631B-10 | STD INST 1
- | - | 23-007A5-01 CP1631B-07 | STD INST 2

DEC M7270 LSI-11 – 1982 – All WD Chips

There are more to be found, as DEC and Western Digital made many versions. In early 1976 Western Digital licensed the MCP1600 chipset design to National Semiconductor, in exchange for some RAM technology licensing. It is unclear if National actually made any of the MCP1600 chipset. By 1977 DEC had started to produce the LSI-11 chips itself while continuing to source parts from Western Digital as well. It is common to see LSI-11 boards with DEC and WD chips mixed well into 1982.

The popularity of the PDP-11 in the 1970’s resulted in many customers for the LSI-11 based PDP’s, and their use continued well into the 1990’s with many systems continuing to be used today.  As with many such systems, they found use in industrial control and automation, where they continue to work.

March 29th, 2017 ~ by admin

TeraNex: Filling the GAPP

Teranex Piranha TN3260B – 1024 PE Array @ 64-90MHz

The GAPP (Geometric Arithmetic Parallel Processor) was designed in 1981 at Martin Marietta, which later became Lockheed Martin Electronics & Missiles. It was funded in large part by the US Dept. of Defense as a way to develop technologies for ultra high-speed image processing. There was a strong need for near-real-time image processing in military applications, in particular pattern recognition. Being able to process a moving image and match its features to known patterns was very useful for the targeting of many weapon systems.

The GAPP processor was a massively parallel SIMD (Single Instruction Multiple Data) processor.  SIMD works very well on large sets of data that are processed in the same way.  In the design of GAPP, this data set was the 2D-array of an image, or frame, from a video.  The GAPP is at its core a very large array of simple processors, called processor elements (PE).  Each PE is relatively simple, containing a single bit ALU and registers/memory.  Each PE handles a single pixel of the image/frame, and is connected in a 2-D mesh to its 4 nearest neighbors.  This allows arrays of these PE’s to scale very well.  By 1992 Lockheed had GAPP systems with 82,944 elements and by the 2000’s systems were available with nearly 300,000.
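The idea of one instruction applied by every PE to its own pixel and its four mesh neighbors can be modeled in a few lines. This is a software sketch of the SIMD concept only, not the GAPP instruction set: the function names are hypothetical, the example operation (a 1-bit erosion) is chosen for illustration, and treating neighbors beyond the array edge as 0 is an assumption.

```python
def simd_step(frame, op):
    """Apply the same operation at every pixel position, as if each
    position were a PE wired to its north/south/west/east neighbors.

    Assumption: neighbors beyond the array edge read as 0.
    """
    rows, cols = len(frame), len(frame[0])

    def at(r, c):
        return frame[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [op(at(r, c), at(r - 1, c), at(r + 1, c), at(r, c - 1), at(r, c + 1))
         for c in range(cols)]
        for r in range(rows)
    ]


def erode(center, north, south, west, east):
    """1-bit erosion: a pixel stays set only if all four neighbors are set."""
    return center & north & south & west & east


frame = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
result = simd_step(frame, erode)
print(result)
```

In hardware all positions would be evaluated in the same cycle; the nested loops here only stand in for that parallelism.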

TeraNex was formed in 1998 to commercialize this technology. At the time there was a looming problem in television, one that the GAPP and the newly formed TeraNex were well suited to solve.

March 15th, 2017 ~ by admin

MC6801/6803 Expansion Now Available for the 680x/650x Test System

6801/6803 Expansion Board and PCB

After several months of development, an expansion for the 680x/650x Test system is now available to support the very popular and widely used 6801 and 6803 MCUs. The Motorola 6801 was one of the first MCUs (with the 6802) that Motorola made based on the MC6800 8-bit processor. It includes RAM/ROM, Serial I/O and timers. The test board tests the function of the base CPU, the timers/data capture, and the Serial I/O. The MC6803 is a 6801 without the built-in ROM and with less I/O.

The expansion supports both types as well as their copies/derivatives made by Hitachi, Fujitsu, SGS and others. The expansion is included in the complete 680x/650x Test system, bringing its total supported processors to well over 35. The expansion does require updated firmware, which is included in all new systems (and available as an upgrade for previously sold systems).