September 5th, 2014 ~ by admin

MasPar: Massively Parallel Computers – 32 cores on a chip

MasPar PE3232 - 32 12.5MHz 32 bit Processing Elements - 1992

MasPar PE3232 – 32 12.5MHz 32 bit Processing Elements – 1992

In the 1980’s DEC researchers were designing a supercomputer based on the Goodyear MPP from 1983.  Jeff Kalb was in charge of the division of DEC involved in this work.  The original Goodyear MPP wa based on a 1-bit processor element (PE).  DEC increased that to a 4-bit PE as well as increased the connectivity between PE’s.  When DEC decided to not commercialize the supercomputer design Kalb left (with DEC’s blessing) to start a company of his own that would.  Thus the creation of MasPar in 1987.

MasPar derives its name from the product it sought to create, a Massively Parallel supercomputer.  These type of computers, also referred to as vector processors are SIMD machines, Single Instruction, Multiple Data.  They perform the same operation on a very large set of data.  SIMD instructions are now found on most all desktop processors, where they can greatly speed up processing of multimedia.  In the late 1980’s there was several companies making such MPP computers.  Perhaps the most famous was Cray, but there was also Thinking Machine’s Connection Machine, Intel’s Paragon (i860 based), nCUBE’s hypercube, Meiko Scientific’s CS-1 (Transputer based) and several others.  Such systems cost from upwards of $100,000 each so sales were not vast, typically companies sold a few hundred to a few thousand systems.

MasPar’s first design, the MP-1 was based directly on the research done at DEC.  Each processing element contained a 4-bit ALU, a 1-bit logic unit, a 64/16 (mantissa/exponent) unit for handling floating point.  Each PE also had 48 32-bit registers.  There were designed as a 32-bit RISC processor, which means, that with the 4-bit ALU, any ALU operation would take at least 8 cycles.  This was considered acceptable in a MPP type system.  Each custom VLSI CMOS MP-1 chip contained 32 individual PE’s.  They were made on a 1.6u process and contained 400,000 transistors.  Clock speed was a fairly low 12.5MHz but this allowed the chips to be air cooled with no special cooling systems.   They were packaged in an inexpensive 208 PQFP, nothing special needed due to the low heat dissipation.  A 1024 PE board (32 chips) dissipated only 50 Watts, and an entire 16k processor system dissipated less than 1,000 watts.

Read More »