September 5th, 2014 ~ by admin

MasPar: Massively Parallel Computers – 32 cores on a chip

MasPar PE3232 – 32 12.5MHz 32 bit Processing Elements – 1992

In the 1980s DEC researchers were designing a supercomputer based on the Goodyear MPP from 1983.  Jeff Kalb was in charge of the DEC division involved in this work.  The original Goodyear MPP was based on a 1-bit processing element (PE).  DEC increased that to a 4-bit PE and also increased the connectivity between PEs.  When DEC decided not to commercialize the supercomputer design, Kalb left (with DEC’s blessing) to start a company of his own that would.  Thus MasPar was created in 1987.

MasPar derives its name from the product it sought to create, a Massively Parallel supercomputer.  These types of computers, also referred to as vector processors, are SIMD machines: Single Instruction, Multiple Data.  They perform the same operation on a very large set of data.  SIMD instructions are now found on almost all desktop processors, where they can greatly speed up multimedia processing.  In the late 1980s there were several companies making such MPP computers.  Perhaps the most famous was Cray, but there were also Thinking Machines’ Connection Machine, Intel’s Paragon (i860 based), nCUBE’s hypercube, Meiko Scientific’s CS-1 (Transputer based) and several others.  Such systems cost upwards of $100,000 each, so sales were not vast; companies typically sold a few hundred to a few thousand systems.

MasPar’s first design, the MP-1, was based directly on the research done at DEC.  Each processing element contained a 4-bit ALU, a 1-bit logic unit, and a 64/16 (mantissa/exponent) unit for handling floating point.  Each PE also had 48 32-bit registers.  The PEs were designed as 32-bit RISC processors, which means that, with the 4-bit ALU, any 32-bit ALU operation would take at least 8 cycles.  This was considered acceptable in an MPP type system.  Each custom VLSI CMOS MP-1 chip contained 32 individual PEs.  The chips were made on a 1.6 micron process and contained 400,000 transistors.  Clock speed was a fairly low 12.5MHz, but this allowed the chips to be air cooled with no special cooling systems.  They were packaged in an inexpensive 208-pin PQFP; nothing special was needed due to the low heat dissipation.  A 1024 PE board (32 chips) dissipated only 50 watts, and an entire 16k processor system dissipated less than 1,000 watts.
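
To see where the "at least 8 cycles" figure comes from, here is a minimal Python sketch of a 32-bit add carried out on a 4-bit adder, one nibble at a time.  The function name and the one-nibble-per-cycle assumption are illustrative only; this is not MasPar's actual microarchitecture, just the arithmetic behind 32 / 4 = 8.

```python
def add32_nibble_serial(a, b):
    """Add two 32-bit values using only a 4-bit adder.

    Assumption (not from the article): one 4-bit slice is processed per cycle.
    Returns (32-bit result, final carry out, cycles used).
    """
    result = 0
    carry = 0
    cycles = 0
    for i in range(8):  # 32 bits / 4 bits per pass = 8 passes
        shift = 4 * i
        nib_a = (a >> shift) & 0xF
        nib_b = (b >> shift) & 0xF
        total = nib_a + nib_b + carry      # what a 4-bit adder with carry-in computes
        result |= (total & 0xF) << shift   # keep the low 4 bits of this slice
        carry = total >> 4                 # carry into the next slice
        cycles += 1
    return result, carry, cycles


value, carry_out, cycles = add32_nibble_serial(0x12345678, 0x11111111)
print(hex(value), carry_out, cycles)
```

Each pass depends on the carry from the previous one, which is why the eight slices cannot simply be done at once on a single 4-bit ALU.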

MasPar MP-2 Board – 1024 Processor Elements dissipating 50W total – 32 PE chips and 3 router chips along with 192 DRAM chips.

The PEs were very much like independent processors, except that they lacked the ability to fetch instructions.  Instructions and data were assigned to them by another system processor called the ACU (Array Control Unit).  The ACU was similar to a PE but included the logic needed to fetch and decode instructions.
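
The division of labor described above can be sketched in a few lines of Python.  This is a toy model only: the class names, the tuple instruction format, and the two operations are all hypothetical and are not MasPar's instruction set.  It shows the one point the paragraph makes, that only the control unit walks the program while every PE applies the same instruction to its own registers.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical operations, each wrapping to 32 bits.
OPS = {
    "add": lambda x, y: (x + y) & MASK32,
    "mul": lambda x, y: (x * y) & MASK32,
}


class ProcessingElement:
    """Holds registers and an ALU, but has no instruction fetch of its own."""

    def __init__(self, num_registers=48):
        self.regs = [0] * num_registers

    def execute(self, op, dst, src_a, src_b):
        self.regs[dst] = OPS[op](self.regs[src_a], self.regs[src_b])


class ArrayControlUnit:
    """Fetches and decodes instructions, then broadcasts them to every PE."""

    def __init__(self, num_pes):
        self.pes = [ProcessingElement() for _ in range(num_pes)]

    def run(self, program):
        for op, dst, src_a, src_b in program:    # single instruction stream
            for pe in self.pes:                  # same instruction, different data
                pe.execute(op, dst, src_a, src_b)


acu = ArrayControlUnit(num_pes=1024)
for i, pe in enumerate(acu.pes):
    pe.regs[0] = i      # each PE starts with different data
    pe.regs[1] = 10

acu.run([("add", 2, 0, 1), ("mul", 3, 2, 2)])
print(acu.pes[5].regs[2], acu.pes[5].regs[3])
```

In real hardware the inner loop happens in parallel across all PEs; the Python loop only imitates the result.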

First silicon was finished in 1989 and a working system was made in 1990.  The MP-1 system was of course scalable.  Systems could have from 1024 PEs (32 32-PE chips on one board) to 16,384 PEs, using 16 boards.  By 1992 MasPar had sold over 130 MP-1 systems, at a starting price of $150,000 for the 1024 PE system.
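
As a quick sanity check on the scaling numbers, the short Python snippet below multiplies out the figures quoted in this article.  The constant names are made up for illustration, and the board power total covers only the PE boards, not whatever else is in the cabinet.

```python
PES_PER_CHIP = 32
CHIPS_PER_BOARD = 32
MAX_BOARDS = 16
WATTS_PER_BOARD = 50          # per 1024 PE board, as quoted above
BASE_PRICE_USD = 150_000      # starting price of the 1024 PE MP-1

pes_per_board = PES_PER_CHIP * CHIPS_PER_BOARD        # 1024
max_pes = pes_per_board * MAX_BOARDS                  # 16384
pe_board_watts = WATTS_PER_BOARD * MAX_BOARDS         # 800, PE boards only
price_per_pe = BASE_PRICE_USD / pes_per_board         # dollars per PE, base system

print(pes_per_board, max_pes, pe_board_watts, round(price_per_pe, 2))
```

The 800 watts for sixteen PE boards fits under the "less than 1,000 watts" quoted for a full 16k system, and the base system works out to roughly $146 per processing element.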

In 1992 the MP-2 was announced.  It was largely the same as the MP-1, with some important upgrades.  The ALU had been upgraded to a full 32 bits.  This greatly increased performance, and it did so with no change in code, since the MP-2 was binary compatible with the MP-1.  This compatibility is very important when most of the software running on a MasPar is custom written.  A client could upgrade their performance with no change to their software.  The MP-1’s 48 32-bit registers had been upgraded to 64 32-bit registers as well.  The MP-2 chips were made on a 1 micron process and now contained 950,000 transistors.  The process shrink allowed them to dissipate a similar amount of heat at the same 12.5 MHz clock.

MP-2 PE Diagram

The MP-2 VLSI design was done by Won Kim, with the goal of an affordable, well-performing design.  Keeping the clock rate down greatly simplified the design, allowing for a smaller die (196 mm²) and greater yields.  Each PE took up about 3% of the die, with the associated extra logic taking up the remaining 4%.  The registers (64Kbit of SRAM) consumed 20% of the die area and 50% of the transistor count.  Each chip dissipated just 0.8W.  Each 32 PE chip shared 6 DRAMs, which were connected directly to the chip; no glue logic was required, again simplifying the design.
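
The figures in this paragraph hang together, which a few lines of Python make easy to verify.  The variable names are illustrative, and the percentages are the rounded values quoted here, so treat this as a consistency check and not as exact die accounting.

```python
PES_PER_CHIP = 32
CHIPS_PER_BOARD = 32
REGS_PER_PE = 64
BITS_PER_REG = 32

# Register file: 32 PEs x 64 registers x 32 bits
register_bits = PES_PER_CHIP * REGS_PER_PE * BITS_PER_REG
register_kbit = register_bits // 1024                     # 64 Kbit of SRAM

# Die area: about 3% per PE plus 4% of shared logic
die_percent = PES_PER_CHIP * 3 + 4                        # 100

# Memory: 6 DRAMs per PE chip
drams_per_board = 6 * CHIPS_PER_BOARD                     # 192, as in the board photo caption

# Power: 0.8W per PE chip
pe_chip_watts_per_board = 0.8 * CHIPS_PER_BOARD           # about 25.6W of the 50W board

print(register_kbit, die_percent, drams_per_board, pe_chip_watts_per_board)
```

By these numbers the 32 PE chips account for roughly half of a board's 50W, with the rest presumably going to the DRAM and router chips.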

The price of an MP-2 base system was $260,000, rising to $1.6 million for a fully equipped 16k system.  The MP-1 continued to be sold at a reduced price of $75,000, which wasn’t much more than that of a high end workstation at the time.  DEC worked as a second source for MasPar, selling the systems as well as maintaining them.  By 1996 demand for MPP supercomputers had dropped off and MasPar exited the hardware market.  In their six years of sales they sold just over 200 systems.


2 Responses to MasPar: Massively Parallel Computers – 32 cores on a chip

  1. Blase

    Hey, I have a 1992 speech system, and based on what I saw when I took it apart, this board looks identical to the one in the machine. I put it back together perfectly and it is functional.

  2. Blase

    I just don’t know what this is. So far this is the only website that has given me hope on what this system could be
