The CPU Shack Museum – CPU History Museum for Intel CPUs, AMD Processors, Cyrix Microprocessors, Microcontrollers and more.

RIP Chuck Peddle: Father of the 6502 – Fri, 27 Dec 2019 22:56:14 +0000

Original MOS 6501 Processor from 1975 – Designed by Chuck Peddle.

On December 15th one of the true greats of processor design passed away at age 82.  Chuck Peddle, born in 1937, a decade before the transistor was even invented, designed the 6502 processor back in 1974.  The 6502 (originally the 6501, in fact) went on to become one of the most popular and widely used processors of all time.  It powered the likes of the Apple 1, Commodores, ATARIs and hundreds of others.  It was copied, cloned, and expanded by dozens of companies in dozens of countries.  It was so popular that computers were designed around it in the Soviet Union and the Eastern Bloc, which eventually made their own versions (such as the Pravetz computers in Bulgaria).

Sitronix ST2064B – Based on the 65C02 – Core is visible in the upper right of the die. (photo by aberco)

The 6502 was a simple but useful 8-bit design, which meant that as time went along, processors migrated to 16, 32 and 64 bits, and speeds jumped from MHz to GHz, the venerable 6502 continued to find uses, to be made, and to be expanded.  Chuck continued to be involved in all things 6502 until only a few years ago, designing new ways to interface FLASH memory (which hadn’t been invented when he designed the 6502) to the 6502.

The chips themselves, now in CMOS of course, continue to be made to this day by Western Design Center (WDC), and the 65C02 core is used in many, many applications, notably LCD monitor controllers and keyboard controllers.  We can hope that the 6502 will have as long a life as Mr. Peddle, though I would wager that somewhere, somehow, in 2056 a 6502 will still be running.

CPU of the Day: Motorola MC68040VL Fri, 01 Nov 2019 23:12:28 +0000

Motorola MC68040VL

A month or so ago a friend was opening up a bunch of unmarked packages and taking die photos, and came across an interesting Motorola.  The die looked familiar, but at the same time different.  The die was marked 68040VL, and appeared to be a smaller version of the 68040V.  The Motorola 68040V is a 3.3V static design of the Motorola MC68LC040 (it has dual MMUs but lacks the FPU of the 68040).  The 68040V was made on a 0.5u process and introduced in 1995.  Looking closely at the mask revealed the answer, in the form of four characters: F94E.

Motorola Mask F94E – COLDFIRE 5102

Motorola uses mask codes for nearly all of their products; in many ways these are similar to Intel’s sspecs, but they are more closely related to actual silicon mask changes in the device.  Multiple devices may use the same mask/mask code, just with different features enabled/disabled.  The mask code F94E is that of the first generation Motorola COLDFIRE CPU, the MCF5102.  The COLDFIRE was the replacement for the Motorola 68k line; it was designed to be a 32-bit VL-RISC processor, thus the name 68040VL, for VL-RISC.  VL-RISC architectures support fixed-length instructions (like a typical RISC) but also support variable-length instructions like a traditional CISC processor.  This allows a lot more code flexibility and higher code density.  While this may be heresy to RISC purists it has become rather common.  The ST Transputer-based ST20 core is a VL-RISC design, as is the more modern RISC-V architecture.  The COLDFIRE 5102 also had another trick, or treat, up its sleeve.  It could execute 68040 code.

Motorola XCF5102PV20A 03F94E – 1995

The COLDFIRE, and the 68040, are microcoded processors, meaning they do not execute instructions directly; the opcodes are translated in a PLA into the actual micro-operations that manipulate the flow of data.  This is common in processors today and allows greater flexibility.  It’s what allows the COLDFIRE to execute 68040 code as well as the new VL-RISC instructions.  In fact, it’s what allowed Motorola to re-spin the 68040V as the COLDFIRE; at its heart the COLDFIRE 5102 is actually a slightly modified 68040V.  It seems that Motorola may even have been thinking about calling it the 68040VL before renaming it COLDFIRE.  There are some minor differences, however.

The 68040V had dual 4K instruction/data caches, which in the COLDFIRE 5102 have been reduced to a 2K instruction cache and a 1K data cache (clearly visible on the die).

68040V on the left, with clearly larger caches; COLDFIRE on the right, with smaller caches. Overall very similar designs.

It omits the dual MMUs of the 68040V, which are less needed in embedded processors (they were added back in the V4e version a decade later).  The COLDFIRE retains the 6-stage pipeline of the 68040 but decouples the instruction fetch and decode stages, allowing for somewhat faster processing.  The register structure is also the same, with 8 data registers, 8 address registers and a PC.  The COLDFIRE instruction set is actually a subset of the 68040’s, so most COLDFIRE code will run on a 68020 or higher.  For later versions of the COLDFIRE the reverse is not true, but in the 5102 the additional 68040 instructions are supported, to allow an easier transition to the platform.

Motorola MC68040RC25V

By the 1990s the 68k line was getting a bit tired, and increasing competition was making it less relevant and competitive.  Motorola’s quick update to the design, made possible by good engineering and microcoding, allowed them to make a ‘new’ product and compete again in the 32-bit embedded market.  The complete renaming of the design from 68040VL to COLDFIRE helped market it as ‘new’, and certainly COLDFIRE is a cool sounding name for a product line that had grown cold and needed a bit of reheating.

Thanks to my friend aberco for sending me down this rabbit hole with his nice die photos.

The Forgotten Ones: RISCy Business of Winbond Mon, 07 Oct 2019 22:05:38 +0000

Winbond W77E58P-40 – Your typical Winbond MCS-51 MCU

Winbond Electronics was founded in Taiwan back in 1987, and is most widely known for their memory products and system I/O controllers (found on many motherboards of the 1990s).  They also made a wide variety of microcontrollers, mostly based on the Intel MCS-51 core, like many, many other companies have done and continue to do.  They also made a few 8042 based controllers, typically used as keyboard controllers, and often integrated into their Super I/O chips.  So why do I find myself writing about Winbond, whose product portfolio seems admittedly boring?

It turns out that, once upon a time, Winbond decided to take a journey down a rather ambitious path.  Back in the early 1990’s they began work on a 32-bit RISC processor, and not an ARM or MIPS design like those just starting to become known at the time, but a processor based on the HP PA-RISC architecture.  This may seem a bit odd, but HP, in a shift from their previous architectures, wanted the PA-RISC design to be available to others.  The Precision RISC Organization was formed to market and develop designs using the architecture outside of HP.  HP wanted to move all of their non-x86 systems to a single RISC architecture, and to help it become popular and well supported, it was to be licensed to others.  This is one of the same reasons that made x86 so dominant in the PC universe.  More platforms running PA-RISC, even if they were not HP’s, meant more developers writing PA-RISC code, and that meant more software, more support, and a wider user base.  Along with Winbond, Hitachi and OKI also developed PA-RISC controllers.  Winbond’s path was innovative and much different than others’: they saw easy development as crucial to their product’s success, so when they designed their first PA-RISC processor, the W89K, they made it a bit special.

Original Winbond W90210F Development board

In 1994 most everyone had an Intel 486 based computer, so Winbond decided to make the W89K 486DX compatible, electrically and logically, which allowed many existing boards to be used as development systems.  Replace the 486DX processor with the W89K and replace the BIOS with a Winbond one, and you had an instant development system.  The system hardware (RAM, PCI slots, etc.) is agnostic about what CPU is talking to it, so this is easier than it sounds, and was much more so back in the 1990’s than it would be today.

The W89K was made on a 0.8u double metal CMOS process and ran at up to 66MHz (a clock doubled version using the standard 33MHz 486 bus) with 1.1 million transistors.  It implemented PA-RISC V1.1 with a 5-stage pipeline and 2K each of instruction and data caches, both fully associative.  It was designed with only the integer unit (no FPU) as a way to reduce die size and cost.  This was considered acceptable as it was targeted as a high end embedded controller (for use in things like printers).  It also supported an external L2 cache, something that was unusual for PA-RISC.  Performance was around 89 DMIPS (Dhrystone MIPS) for the 66MHz part.

Winbond W90210F 66MHz PA-RISC 0.8u CMOS – 4K instruction cache is in the upper left, while the smaller 2K Data cache is in the upper right (die shot by aberco)

The successor to the W89K was the W90K family, developed in 1997.  The first processor of this family was the W90210F.  It was originally going to remain part of the W89K family, but Winbond decided in late 1997 to call it the W90K, due to its greatly improved design compared to the W89K.  The 90K maintained the same PA-RISC core as before but added a host of peripherals to increase its usefulness as an embedded controller.  These included embedded ROM/Flash interfaces, a DRAM controller, a DMA controller and various timers/counters.  It also added the 5 PA-RISC multimedia instructions (MAX-1).  These were some of the very first SIMD instructions added to general purpose processors (originally designed for the PA7100LC).  Intel added similar support to the Pentium as ‘MMX’.  The W90210 also changed the cache structure.  The L1 instruction cache continued to be direct mapped but was increased from 2K to 4K.  The data cache remained 2K but was now 2-way set associative.  Clock speed remained the same at 66MHz.  A W90215F version was also made that did not come with an embedded OS license (write your own).  These were used in a number of printers, set top boxes and digital picture frames back in the late 1990s.

W90221X – 100MHz with hardware MAC, SDRAM support, and built-in 2D graphics. Maintained the same package as the W90210F to simplify designs.

The last versions, the W90220 and W90221, followed a few years later.  These both had some big improvements over the previous design.  They were made on a triple layer metal 0.35u CMOS process allowing clock speeds of up to 150MHz (this appears to have been the design goal; actual devices may have topped out at 133MHz in practice).  A multiply-accumulate (MAC) unit was added, allowing for DSP-like functionality, and the pipeline was increased to 6 stages (adding a load/store stage), which helped achieve the faster clock speeds.  Both caches were now 4K, with the instruction cache still direct mapped and the data cache 4-way set associative.  These were also the first of the line to support hardware branch prediction.  They were 3.3V parts with 5V I/O.

Digital picture Frame based on a PA-RISC processor (The W90215F)

The W90220 added a 2D graphics controller to the system as well as support (finally) for SDRAM.  Two versions of the 90220 were made: the W90220F, which was to hit 180MHz and supported SDRAM and EDO RAM, and the cost reduced W90220X, which was limited to 80-100MHz, had less I/O and no EDO RAM support.

By the early 2000’s the W90K PA-RISC processor was dead.  It’s a rather unfortunate end to an ambitious project, and to a processor that really had great potential and performance for its time.  In researching these processors it seemed that one of the reasons the line failed was poor support from Winbond.  It’s ironic that a processor designed for easy development and interfacing with existing PC peripherals would be hindered by poor tech support from the manufacturer, but that appears to be the case.  Also contributing to its demise was the fall of PA-RISC itself; by 2000 HP was ‘all in’ with Intel on PA-RISC’s successor, the IA-64 architecture Itanium processor, and we all know how that turned out.  It’s perhaps interesting then that laser printers and digital picture frames were powered by a processor that evolved into what was supposed to be the next great Intel architecture, but now, in a twist of fate, is itself a wall hanger.

The Story of the IBM Pentium 4 64-bit CPU Tue, 01 Oct 2019 23:07:09 +0000 Introduction

This time we will talk about one unique Intel processor, which never appeared on the retail market and whose reviews you will not find on the Internet.  This processor was produced purely by special order for one well-known manufacturer of computer equipment.  In this article I will also try to assemble one of the most powerful retro systems with this processor.

From the title of the article, I think many people understand that we will be talking about a Socket 478 Intel processor.

Most people are familiar with the Socket 478 that replaced Socket 370 at the end of 2001 (we omit Socket 423 due to its short lifespan of less than a year) and allowed the use of single-core, and then, with Hyper-Threading technology, “pseudo-dual” processors that can perform two tasks in parallel.  All production Intel processors for Socket 478 were 32-bit, even the couple of representatives from the server segment, the Pentium 4 Extreme Editions on the «Gallatin» core.  But as always there are exceptions.  And this exception, or to be more precise, these two exceptions, were two models of Pentium 4 processors with the Prescott core, which had 64-bit instructions (EM64T) at their disposal.

Intel Pentium 4 SL7QB 3.2GHz: 64-bits on S478

This pair of processors was commissioned by IBM for its eServer xSeries servers.  These processors never hit the retail market and their circulation was not very large, so finding them now is very problematic.  It is interesting that, if you want and naturally have the right amount of money, or a large enough order, you can count on a special order of the processor needed for your specific needs, with characteristics that are unique and not repeated in standard production products.  And it should be noted that quite a few such processors have been released; in fact, in the 70’s and early 80’s this was the very purpose of the now-ubiquitous ‘sspec.’  Chips with an sspec (Specification #) were chips that had some specification DIFFERENT from the standard part/datasheet.  A chip WITHOUT an sspec was a standard product.  By the late 1980’s all chips began to receive sspecs as a means of tracking things like revisions, steppings, etc.  I will talk about some of these a little later.

That’s how the processor looks through the eyes of the CPU-Z utility. In the “Instructions” field, after SSE3, EM64T proudly shows off! Link to the popular CPU-Z validation.

Special processors made for IBM belonged to the Prescott core and were based on E0 stepping with support for 64-bit instructions, which is not typical for Socket 478! The first 64-bit CPUs for “everyone” appeared only with the arrival of the next LGA775 socket, and even then it wasn’t right away; some Pentium 4 models in LGA775 version were 32-bit. I specifically pointed out that the Pentium 4 Socket 478 model with EM64T support belonged to the E0-stepping, although later the more advanced stepping G1 was released, which did not have such innovations. The first model worked at a frequency of 3.2 GHz and had a SPEC code – SL7QB, the second was slightly faster with a frequency of 3.4 GHz and the SPEC code – SL7Q8.

In all other respects these were the usual «Prescott».  But the presence of 64-bit instructions made these processors unique, capable of working with 64-bit operating systems and applications, allowing them to do what their 32-bit comrades simply could not.


Not many companies were able to place such an order with Intel, but the «Blue Giant» IBM could, all in order to defeat HP and Dell in a fierce struggle for the server market share for small and medium-sized businesses.  And also to extend the life of their Socket 478 servers.  For these purposes these two processors, capable of executing 64-bit instructions, were released.  Another advantage of such processors in conjunction with 64-bit operating systems is support for a large amount of RAM, though interestingly, in the age of DDR1 with its small module capacities and the chipsets of the time, using more than four gigabytes of RAM was physically impossible even with 64 bits.

So the whole point of using these processors was precisely in supporting 64-bit operating systems and software, in which IBM saw a promising future, just as there once was in the change from 16-bit software to 32-bit back in the days of the i386.  And it should be noted they guessed correctly: the sun was setting on the 32-bit era.

I managed to find a processor running at 3.2 GHz with SPEC code SL7QB in Canada, so its journey to me was not a short one.  This processor was part of an IBM eServer xSeries 306 server.  This is a regular single-processor 1U server that can be installed in a rack.  Inside, a single Socket 478 held the Pentium 4 processor, with support for up to 4 gigabytes of RAM (the chipset couldn’t see more), two Gigabit network controllers, a pair of 64-bit/66 MHz PCI-X expansion slots and the ability to support not very sophisticated RAID arrays of SATA-150 or SCSI drives.

Initially, such IBM servers supported conventional 32-bit Pentium 4 processors with the Prescott core, and then the option of using the 64-bit Pentium 4 was added.  These processors are listed under part number 26K8430 in the IBM spare parts (FRU) database for the 41x and 45x server models.

If you look at the motherboard of this server, you can see that it is the simplest of solutions.  In fact, this is dictated by the use of the Intel E7210 chipset, which is a close relative of the desktop Intel 875P but lacks an AGP port; it offers a pair of PCI-X slots instead.

Windows Server 2003, x64 Edition, or various types of Linux were installed on the IBM eServer xSeries 306 server with a 64-bit Pentium 4. Subsequently, IBM expanded the range of its servers, where it was possible to install SL7QB or SL7Q8, among them were models: x206, x226 and x236.

Thanks to its pricing policy, the cost of new 64-bit servers was very affordable compared to competitors. At the time the updated servers were released (2nd half of 2004), prices for the xSeries 206 model started at $909 for a system with a 3.2 GHz processor and 256 MB of memory, the cost of a more advanced xSeries 306 started at $1,409 for a system with a 3.2 GHz processor and 512 MB of memory.

In the server lineup there are also similar models, but with the letter “m” added to the model name.  Do not pay attention to them, as these are completely different machines, based on processors in a different form: LGA775.

Squeeze everything to the last drop.

In assembling such a system, I wanted to squeeze everything possible out of it, and even more.  But I ran into a number of problems, both hardware and software.  My goal was 8 GB of RAM + Windows 10 x64.  But here a number of nuances arose.

Let’s start with the hardware problems.  4 GB of RAM is easily supported by all the boards; even with DDR1 you can get 4 GB in four slots with four one-gigabyte sticks.  But that is boring and not interesting.  DDR2 opens up much more promising horizons, but here a problem arises: suitable motherboards often offer only 2 memory slots.  A simple solution: install two 4 GB sticks.  But the creator (Intel) introduced its own limitations, which I will dwell on in a bit more detail.

Questions often arise about installing more than 4 GB of memory on the relatively “recent” Intel chipsets with an external memory controller (Memory Controller Hub, MCH).  Here we briefly consider the necessary conditions, since the maximum possible amount is not always written in the board’s manual.  You need an x86 processor with support for the 64-bit extension (EM64T), and a board that, in principle, allows you to install more than 4 GB of memory (a sufficient number of slots and supported memory densities; this depends not only on the chipset but also on the specific board).  And of course, a BIOS that can initialize this memory, correctly configure the mapping of PCI devices, and so on.  Not all motherboards have a BIOS capable of doing this, because there were no 64-bit CPUs on Socket 478, and all of the motherboards from which the choice was made are transitional models: their chipsets existed for LGA775 as well, and so were already familiar with Intel’s 64-bit CPUs.

CPU: In fact, for addressing more than 4 GB of memory, a 64-bit x86 processor is generally not required: starting with the Pentium Pro, physical address extension (PAE) to 64 GB has been available (address lines A32#–A35# were added), though each task can still address no more than 4 GB.  However, a processor with a 64-bit mode allows you to get the most benefit from RAM over 4 GB, with far fewer problems with the operating system and drivers than in PAE mode.  Note that the width of the address bus for 64-bit processors under LGA775, and even Xeons under LGA771, remained the same (36 bits); that is, they still top out at 64 GB of memory, like the Pentium Pro.  Isn’t it impressive how much potential was laid down back in 1995?
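The arithmetic behind these limits is simple; here is a quick sketch (the helper function is purely illustrative, not from any real tool):

```python
# With PAE, Pentium Pro-era CPUs gained address lines A32#-A35#,
# for 36 physical address bits in total.

def max_physical_memory(address_bits: int) -> int:
    """Return the maximum addressable physical memory in bytes."""
    return 2 ** address_bits

GIB = 2 ** 30
# 32 address bits (A0-A31) -> the classic 4 GB ceiling
assert max_physical_memory(32) // GIB == 4
# 36 address bits (PAE, Pentium Pro and later) -> 64 GB
assert max_physical_memory(36) // GIB == 64
```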

Chipset: The chipset must be able to address the address space beyond 4 GB, and this ability is not directly tied to the supported DRAM organizations, since memory is understood here in the broad sense: all the address space available to the processor, including the memory of PCI devices, the BIOS, the APIC, etc.  To do this, the chipset must have at least one additional address line.  That is, the presence of the HA32# line provides addressing up to 8GB, HA33# up to 16GB, HA34# up to 32GB, and HA35# up to 64GB.
And while the server chipsets from Intel (for S603/604/771) have no particular problems with addressing, a study of the datasheets for Intel’s desktop chipsets shows that Intel’s first desktop chipset to support extended addressing is the 955X.  The earlier 865, 915, 925 and 945 top out at address line HA31#; that is, it is physically impossible to install more than 4 GB of RAM in motherboards based on these chipsets.
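The HA-line rule above can be sketched in code (the function follows the article's HA-line naming; it is an illustration, not a real chipset API):

```python
# The highest host-address line a chipset routes determines the top of
# its addressable space: lines HA0..HA{n} give n+1 address bits.

def addressable_space_gib(highest_ha_line: int) -> int:
    """Max addressable space in GiB given the highest HA line number."""
    return 2 ** (highest_ha_line + 1) // 2 ** 30

assert addressable_space_gib(31) == 4   # 865/915/945-class: 4 GB ceiling
assert addressable_space_gib(32) == 8   # HA32# -> 8 GB
assert addressable_space_gib(35) == 64  # HA35# -> 64 GB (955X-class)
```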

To summarize, success on the hardware side requires the right BIOS that “understands” all available RAM + a 64-bit processor + a chipset no earlier than the Intel 955X.  But there is one more nuance: the manufacturer of the final motherboard, which, even with a good combination of all the above, may have decided to save money and simply not route the necessary address lines from the chipset; the lower the cost of the motherboard, the higher the risk.  And the boards under consideration are from this lower cost range.

Is there a way out?  It seems there is (though I’m not completely sure, for lack of the necessary board), and it lies in Socket 478 motherboards based on the Intel G31/G41 chipsets.  There are enough examples of 8 GB of RAM working on G31-based motherboards in LGA775 form, but I haven’t seen it on Socket 478; as they say, there’s a chance =)  I’ll leave this for the near or distant future.

Software problems: As I wrote above, the ultimate goal was to launch Windows 10 x64.  So far I have not been able to do this; it just hasn’t worked out yet, but theoretically it is possible.  Windows 7 x64 ran with a bang, no problems arose.  But with the installation of Windows 8.1 there were problems, or rather, there was only one problem: the processor’s lack of the NX bit, and without this «feature» installation of a modern OS is impossible.

The fact is that NX-bit support is very different for x86 in 32-bit mode, x86 in 64-bit mode, and PAE mode.  In 32-bit mode, the OS checks for the good old PAE and NX bits via CPUID.  That is, basically, you just need to change the value returned in EDX after CPUID with EAX = 80000001h (for example, remove the CPUID check or change the value in EDX to the desired one).  NX-bit functions are not supported in normal 32-bit mode anyway, so you just need to «calm» the OS.  There are software PAE patches for the kernels of OSes where everything works, including Windows 8.1 and early builds of Windows 10.
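For reference, a minimal sketch of what such a check looks at: CPUID leaf EAX = 80000001h reports the NX/XD capability in bit 20 of EDX (and EM64T/long mode in bit 29).  The EDX value below is a made-up sample, not read from real hardware:

```python
# Feature bits in EDX of CPUID leaf 80000001h
NX_BIT = 1 << 20   # No-eXecute / Execute Disable
LM_BIT = 1 << 29   # EM64T / long mode

def has_nx(edx: int) -> bool:
    """True if the CPUID EDX value reports NX support."""
    return bool(edx & NX_BIT)

# A CPU reporting long mode but not NX -- exactly the situation of these
# Prescott Socket 478 parts that blocks Windows 8.1+ installation:
sample_edx = LM_BIT
assert not has_nx(sample_edx)
assert has_nx(sample_edx | NX_BIT)
```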

In 64-bit mode the NX bit is actually used: its value lives in the 64-bit page table and page directory entries (PTE and PDE).  The difficulty is that even if you manage to trick the OS by removing its check for the NX bit, the kernel (and all other drivers/programs) will still try to set the NX bit each time page table entries are written.  This will cause the system to crash.  So far I have found no confirmation of Windows 10 x64 running on a Socket 478 Pentium 4 (SL7QB or SL7Q8), possibly due to the rarity of these processors, but I want to believe it can still be done; it is not for nothing that I tried out dozens of early builds of Windows 10.

We assemble a Super Socket 478/x64 PC.

Having such a unique processor at your disposal, it would be absurd not to build a powerful x64 retro system around it.  One option for such a system is a universal “PC-harvester” that supports all Microsoft operating systems from DOS to Windows 10.  And here the most interesting part begins: the selection of components and software.  The main component is of course the processor, the heart of the system; it remains to choose a motherboard where it can be installed.

The selection criteria shifted towards building the fastest system with the fastest interfaces: no AGP slot, only a PCI-Express x16 graphics port, plus at least one PCI-Express x1 (preferably a couple), several PCI slots, support for at least DDR2 memory (DDR3 as an option), and the more memory the better.  The list of candidates was as follows:

  • ASUS P4GD1 (Intel 915P / DDR1, 4GB DDR-400 / PCI-Express x16, 2x PCI-Express x1, 3x PCI)
  • Biostar G31-M4 (Intel G31 / DDR2, 4GB DDR2-800 / PCI-Express x16, 2x PCI)
  • AsRock P4i945GC (Intel 945GC / DDR2, 4GB DDR2-667 / PCI-Express x16, 1x PCI-Express x1, 2x PCI)


The ASUS P4GD1 looks the best in terms of the number of available PCI-Express connectors and configuration flexibility; its one drawback is first-generation DDR memory, and all its SATA connectors support only 150 MB/s.

Biostar G31-M4

The Biostar G31-M4 looks like a winner due to its support for 800MHz DDR2 memory and four 300MB/s SATA2 ports, but the board is completely devoid of PCI-Express x1 slots and, most importantly, only processors with a TDP of up to 95 watts are supported, which means goodbye to «Prescott», which needs more than 95W.  This minus crosses out all the available advantages, one of which is support for all operating systems, with appropriate drivers up to Windows 10 x64!

AsRock P4i945GC

The AsRock P4i945GC is the best solution: one additional PCI-Express x1 slot, a pair of PCI slots, four SATA2 ports, and support for DDR2 memory at 667 MHz.  After weighing the pros and cons, I settled on the AsRock P4i945GC, also because it is much easier to find on sale these days, while finding an ASUS P4GD1 is already a problem.

For such a system an SSD is a prerequisite, and it is better installed in a PCI-Express slot.  The memory capacity is 4 GB.  As a video card I decided to use the GeForce GTX 980 Ti with 6GB, a memory capacity larger than that of the system itself.  In a couple of free slots you can install a pair of 3dfx Voodoo 2s in SLI, or something “cool” in PCI form, for example a 3dfx Voodoo5 5500.  The final build was as follows:

  • Intel Pentium 4, 3.2GHz, Socket 478, «Prescott», SL7QB “64-bit Edition”
  • Thermaltake Big Typhoon
  • AsRock P4i945GC, Intel 945GC + ICH7, Socket 478, PCI-Express, DDR2-667MHz, SATA-2
  • 4 GB (2x 2GB) DDR2 800MHz
  • GeForce GTX 980 Ti, 6GB, KFA2 8Pack Edition
  • SSD HyperX Predator PCIe 240GB
  • Zalman ZM1000-EBT 1000W PSU


On your marks, let’s go!

But first, let’s go into the BIOS of the motherboard.

The photo shows that the processor is correctly recognized in the BIOS, indicating its 64-bit capability.  And this is how the 240 GB HyperX Predator PCIe x4 drive installed in the PCI-Express x1 slot is displayed in the BIOS.

I like this solution more than the SATA options: cables do not get tangled and the system looks more «serious».  Let’s see how using just one PCI-Express lane, instead of the recommended four, affects the performance of this SSD.
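As a rough back-of-the-envelope check (assuming PCI Express 1.x signaling on this 945-era platform, with roughly 250 MB/s of usable bandwidth per lane; the drive itself is a Gen2 x4 device):

```python
# Approximate usable bandwidth per PCI Express 1.x lane, in MB/s
PCIE1_LANE_MBPS = 250

def link_bandwidth_mbps(lanes: int) -> int:
    """Rough ceiling for a PCIe 1.x link of the given width."""
    return lanes * PCIE1_LANE_MBPS

assert link_bandwidth_mbps(1) == 250    # the x1 slot used here
assert link_bandwidth_mbps(4) == 1000   # what the drive would prefer
```

So the x1 slot caps the drive at roughly a quarter of the bandwidth it was designed for, which matches the results below.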

If this result is considered relative to modern systems, it is clearly better than any HDD but loses to modern SSDs.  But considering that such numbers are achieved on a Socket 478 Pentium 4(!), you can only rejoice for the old man; the responsiveness of the system turned out to be at a very high level.  You could still connect the drive to a PCI-Express x4 slot, though then you would have to install either a PCI video card or run the video card in the PCI-Express x1 slot.  If only the motherboard had another PCI-Express x4 slot =)

(CPU-Z info – click to enlarge)

I really want to try this monster in practice, but before the test results I will dwell a little on the «not for everyone» processors; this should be interesting.

Not like everyone else.

Before starting the tests, I would like to dwell on some processor models which, let’s say, appeared due to the «efforts» of other companies rather than at the direct initiative of Intel/AMD.  First, a look into the distant past.

Let’s start with a Socket 7 AMD processor belonging to the K6-2 line on the «CXT» core, a processor with the non-traditional model name AMD K6-2 38L3054.  This processor operates at a frequency of 337 MHz, obtained by multiplying a 4.5x multiplier by a 75 MHz system bus.  The solution is, to put it mildly, not standard; if you look at the official AMD datasheet for the K6-2 processor line you can see the various models,

but the 337 MHz model is missing, because it was commissioned by IBM. This is what a processor made for IBM branded PCs looks like:


AMD K6-2 38L3054 – 337MHz

As you can see, there is no clock marking on the processor lid.  In its place is the marking AMD K6-2 38L3054 (apparently an IBM part number).  Below in the photo is a close AMD K6-2 model with a frequency of 333 MHz (3.5 x 95 MHz).

AMD K6-2 333MHz


Xeon X5698

In this case, everything is in place, including information about the frequency of the model.
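The clock arithmetic in both cases is just multiplier × front-side bus, which also shows why both marked frequencies are rounded:

```python
# Core frequency = multiplier x front-side bus clock
def core_clock_mhz(multiplier: float, fsb_mhz: float) -> float:
    return multiplier * fsb_mhz

assert core_clock_mhz(4.5, 75) == 337.5   # IBM's custom K6-2 "337 MHz"
assert core_clock_mhz(3.5, 95) == 332.5   # the standard "333 MHz" part
```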


The following example applies to the LGA1366 socket.  The Intel Xeon processor with the X5698 index, belonging to the «Westmere» microarchitecture, has only two cores at its disposal, while all the other representatives of this server socket have at least four.  But these two cores work at a record clock frequency of 4.4 GHz, and their speed does not decrease under any circumstances; the processor also retains the full 12 MB of third-level cache.  The Intel Xeon X5698 was released by special order in limited quantities.

The processor is in fact a 6-core Xeon with 4 cores disabled; the remaining two are selected at the production stage and are able to operate at that frequency 24/7 under full load.  According to one version, these processors were manufactured for the New York Stock Exchange, where at that time the highest per-core performance was needed, so that multi-billion dollar banking transactions from Wall Street would instantly reach the addressee.  The cost of such a processor was set at $20,000 apiece.  You can still find such a processor now, but the cost of a used one will be at the level of the fastest Ryzen 9.

Intel Black Ops

These processors were installed in pairs, resulting in a workstation with four cores operating at 4.4 GHz, and all this at the beginning of 2011. Each processor had a TDP of 130 watts, and water cooling was clearly assumed. It would be nice to find two of these processors and install them in the EVGA SR-2 motherboard.

Continuing the story of Wall Street, it is worth mentioning an even more interesting processor that replaced the Intel Xeon X5698. A special processor model belonging to the «Ivy Bridge» microarchitecture got its own name, immortalized on the lid of the heat spreader, something you do not often see. The name of this processor is Intel “BLACKOPS”. By special order, Intel released two “BLACKOPS” models. The first ran at a frequency of 4.4 GHz with 4 cores, yet all 25 MB of the third-level cache remained available.

Finding photos in decent quality of this processor is not so easy. But I managed to find a screenshot of the CPU-Z of this processor. It can be seen below.

An x44 multiplier, four cores and a TDP of 250 W; not every motherboard’s VRM can handle such a processor.

The older model ran at 4.6 GHz with six active cores and 25 MB of L3 cache. Both processors have Hyper-Threading Technology disabled. The processors were installed in motherboards with an LGA2011 socket and had a TDP of 250 W, which naturally implied the use of a beefy factory-built VRM. The presence of 25 MB of L3 cache indicates that these processors were binned from the most successful 10-core dies. I could not find information about the cost of these processors, but I think it is not far from the cost of the Xeon X5698; in any case it was clearly well into four digits. More information about these processors, and others of Intel’s special ‘Everest’ series, can be found in the CPU Shack’s Everest article.

Dual marked Pentium 4 3GHz, or 3.4GHz (one would hope it would also run at 3.2GHz)

At the time of the LGA775 Pentium, Core 2 Duo and Quad, Intel made some of its processor models specifically for Dell, IBM, and Apple. While the Intel Pentium 4 550 model was available for all markets, according to their S-Specs the SL8BY and SL8BM variants were intended for Dell. In the first case the frequency was lowered from 3.4 GHz to 3.2 GHz, in the second to 3.0 GHz. This allowed a single processor to be used in multiple build configurations, simplifying the supply chain and logistics for the builder.

Intel Xeon X5557 SLBFX – Made specifically for Apple for use in the Mac Pro without a heatspreader.

To some extent the Core 2 Duo E8290 model may be interesting; the model number itself already looks unusual. This 2-core processor operates at a frequency of 2833 MHz with a 1333 MHz system bus and is based on the Wolfdale core. It differs from the usual Intel Core 2 Duo E8300 in the absence of Virtualization technology and Intel Trusted Execution security technology; otherwise they are completely identical. Like its predecessor the Core 2 Duo E8190, it was used in the Apple iMac. This list also includes the Core 2 Quad Q9700 and Core 2 Quad Q9705, which are 167 MHz faster than the well-known Core 2 Quad Q9650 but have only half the L2 cache, 6 MB instead of 12 MB for the Core 2 Quad Q9650.


There are still a lot of other processors that came through OEM channels and are practically impossible to find at retail. The most modern processor of this kind is the Intel Core i9-9990XE, for which Intel did not even set a selling price, since the production run obviously does not reach 1000 pieces (the typical minimum order quantity).

After a short digression, it’s time to press the «Power» button and launch the slowest x64 Monster.


Tests are a good thing, especially when there is something to compare against. As part of this experiment I did not want to compare Prescott with Prescott, I just don’t see the point, and it was not for nothing that I installed the GTX 980 Ti. Below I will give the results of tests that are geared toward 64 bits, and also try to play some modern games.

Testing was conducted in Windows 7 x64 SP1 using the following software:

  • WinRAR x64 v. 5.40
  • WinRAR x32 v. 5.40
  • Cinebench 11.5 x64
  • Cinebench R15
  • Cinebench R20
  • 3DMark 2006 v.1.1.1
  • 3DMark 2011 v.
  • 3DMark (2013) v.2.9.6631
  • Far Cry
  • Battlefield 4
  • Crysis 3
  • Rise of the Tomb Raider

WinRAR v. 5.40 (32/64-bit version)
Kb/s (more is better)

The difference is not significant, only about 2% faster, but it is in favor of the 64-bit version

It also gives you a reminder that the 64-bit version is better

Cinebench 11.5 (32/64-bit version);
points (more is better)

Everything here is similar to the previous result, around 2%

Cinebench R15
points (more is better)

Here it’s already more interesting, since Cinebench R15 exists only in a 64-bit version, so you could say the increase was 100% compared to the usual «Prescott». I therefore decided to add some comparable competitors. Interestingly, the Athlon 64 3200+ is identical in performance (for once the PR rating seems to be correct).

Cinebench R20
I will not give graphs, I’ll just say that while the test was “spinning”, I managed to drink coffee twice =) I will give only a screenshot with the final result.  This test really rewards multi-core CPUs, so being limited to one core, and a small cache, really hinders it.

HWBOT x265 Benchmark v.2.2.0 – 1080p
FPS (more is better)
All the difference is visible in the screenshot.

Geekbench 4 v.4.2.3, Single/Multi-Core Score
points (more is better)

Now we pass to the 3D tests =) Will the giant GeForce GTX 980 Ti be able to help? The difference in age between them is as much as 11 years, yet during their «honeymoon» month together in one system there were no serious quarrels between them 😉 It’s scary to think what would have happened if a GeForce RTX 2080 Ti had been installed instead of the GeForce GTX 980 Ti.

3Dmark 2006 v.1.1.1, Score

Although the Pentium 4 tried its best, it couldn’t «satisfy» the GeForce GTX 980 Ti. The final result is 4666 3DMarks. In the HWBOT database I found a similar score of 5155, which was obtained on an Intel Pentium 4 3.2 GHz Northwood and a GeForce 9800 GT @ 850/1102 MHz.

Despite a difference of at least 10 generations, the more powerful video card could not «pull out» the final result without processor support. The balance of components must be observed under any conditions and at any time, and a GeForce RTX 2080 should not be paired with a quad- or, God forbid, a dual-core CPU.

3DMark 2011 v.1.0.132 – Performance 720p/ Extreme 1080p

The final numbers have not changed much, and the FPS in a number of subtests froze in place; the video card is clearly experiencing processor starvation. Under equal conditions the GeForce GTX 980 Ti on modern systems scores ~P20123 and X9123. It’s not difficult to calculate the difference.

3DMark (2013) Fire Strike/ Extreme
In fact, I wanted to launch Fire Strike most of all, the very feeling that «this» works already instills pride and confidence in the future.

Yes, the result, as in the previous case, is extremely small, but it is still there! I think many users are still armed with a GeForce GTX 980 Ti, so you can compare the results with your own and be glad at how much your system outpaces mine =)

What about the games? Easy, let’s start with the “heavy.”

Battlefield 4 (Tashgar)
Frames/sec (Medium / min / max)

Even despite the high-speed SSD, loading took longer than on a modern PC, but in the end the Tashgar map was chosen, where you can ride a jeep with a breeze. All graphics settings in both resolutions were set to Medium. Although looking at the graph we can say: yes, what difference does it make 😀 It’s a pity that the FPS did not reach 30 frames per second; I hope that a future overclock will help to close the gap.

Rise of the Tomb Raider
An unpleasant surprise awaited me here: the game refused to start, even after a couple of reinstallations. After clicking the desktop shortcut only an error warning appeared, the reason for which I did not understand. I can only assume that launching requires some processor instruction set that is physically unavailable on this processor.

Crysis 3
Here the situation is a little better: it was possible to get to the main menu and select settings, but I could not advance further than the menu. Neither a “new” game nor loading existing saves produced a 3D screen, only a black screen, frozen forever. Why didn’t 3D rendering begin? Perhaps for the same reason as with Rise of the Tomb Raider.

Far Cry (1024×768/1280×1080, Max Quality, demo 3DNews – Research, 2x loop)
Average result, frames/sec

In higher resolution, greater FPS? It’s just that the video card is tired of working in low resolutions =)

What can be said to summarize the 3D component? There is a lack of processor power for this video card, so it does not matter what settings or resolution are set. You could tighten up performance by replacing the RAM with faster modules, or by dropping the timings from fives to fours, or even threes; that is possible at such a frequency, but miracles cannot be expected from my “Chinese” kit. Better to overclock the processor to at least 3.8 GHz, or ideally a full 4 GHz; I don’t know how the motherboard will behave, but I have a desire to try it.

In terms of pure processor power, you need to understand that this is an ordinary “Prescott”, albeit with a tremendous zest under the hood.


As for first impressions of the resulting 64-bit system on Socket 478, they are the most positive, even though the processor was unable to keep up with the video card. But as I wrote at the beginning of the article, this build claims a «jack of all trades» role, right down to launching DOS games or GLIDE titles from 3Dfx.

This article is part of The CPU Shack’s continued partnership with guest author max1024, hailing from Belarus. I have provided some minor edits/tweaks in the translation from Belarusian to English.

]]> 4
Pardon the Mess…Upgrading PHP – FIXED Wed, 18 Sep 2019 20:51:39 +0000 Moving The CPU Shack to PHP 7 and it has broken some old legacy code (now why would a museum have old code? ha).  A few things (like the header and the OLD pictures section) are not working, should be fixed soon.


EDIT: Looks like we got it all fixed, if ya notice anything broken/not working let me know


]]> 0
Sushi Tacos and Lasers: Marking Intel Processors Wed, 28 Aug 2019 21:02:26 +0000

Intel ink stamp used for marking chips in the 1970’s

In 1987 Intel became the first semiconductor manufacturer to use lasers to mark all component parts, including ceramic packages (they still used ink for some, but had the capability, and eventually rolled out laser marking to nearly all of their assembly/test locations). Conventional ink marking for ceramic packages required a post-mark ink cure time, and production yields ranged from 96%-98% before rework. That percentage may be good on a school exam, but in a production environment having to rework 2-4% of everything off the line is unacceptable. It costs resources, money and time that do not go toward making a profit.

Intel A80387-20B SX024 remarked with a laser

With lasers, however, the cure operation was not needed and yields increased to better than 99.95%. Lasers were so consistent that marking became a zero-rework process and overall productivity increased by 25%. Throughput also increased significantly (less rework, and lasers are faster) and inspection requirements dropped by 95%. These lasers were originally developed for ceramic packages but were found to work well on plastic packages too. They also made remarking significantly easier: old markings could be crossed out with the laser and new markings made. No stencils, pads or masks were needed; the lasers were programmable and very fast.

Intel continues to use laser marking today (as do most manufacturers). Intel uses laser marking systems from Rofin-Sinar (now owned by Coherent). These lasers are typically from the PowerLine E line, which are diode end-pumped Nd:YVO4 (neodymium-doped yttrium vanadate) lasers. These are basically a high-end, high-power version of the DPSS lasers used in green laser pointers. Intel went with these lasers as they were faster and cleaner than CO2

Intel Package marked SUSHI TACO SALAD. Perhaps the technician was getting hungry while trying to dial in the laser settings.

lasers (at the same power levels). These lasers typically run in the 10-40 Watt range. Most commonly they are a 532nm laser (green light). In order to achieve the speeds needed, these marking systems are run in a pulsed mode, 1-200 KHz depending on the speed and the material being marked. This allows the laser to run at very high power for very short pulses.

This of course requires some tuning, essentially simple trial and error to find the right settings for a given material. Today’s packages are very thin, and marking on the organic substrate (or the silicon die itself) must be done in a way that leaves the markings visible but does not damage the underlying structure. These markings are often only a few microns deep on silicon and 25 microns on a package, as any deeper than that is the chip’s circuitry.

Motorola PP603 Engineering Sample with ROFIN BAASEL test marking on the die

Rofin offers testing and calibration for some of their bigger customers (such as Intel) where they help develop the settings needed. This results in a lot of ‘oddly’ marked chips. Companies will ship packages, dies and whatever else needs to be marked to Rofin, along with specifications of the markings (how wide, tall, deep etc.), and the systems/settings are worked out to make it workable on the production line. Anyone that has used a desktop CO2 laser knows they are not the fastest thing around; an engraving project’s completion time is measured in minutes. When marking chips, speed and accuracy are of paramount importance. Rofin advertises their lasers as such: “Our semiconductor marking solutions achieve marking speeds up to 1600 characters/second. Even at a character height of 0.2 mm and line widths of less than 30 µm they still ensure best readability.”

Package with laser settings engraved

Here we have a test chip package from Intel, marked up by Rofin. There are tests of the 3D bar code, lot numbers, s-specs and others. There are also some calibration markings; it’s useful to engrave the settings used for the test as part of the test itself. In this case we see 25k, 650mms and 23.8A. These are 3 of the fundamental settings for the laser system. 25k is the pulse rate (25 KHz) of the laser; 650mms is the speed, or feed rate, 650 mm per sec (about 2 ft/sec), a relatively slow speed, but probably one step in the calibration process. The 23.8A is the current for the laser, in amps. That’s a rather high current compared to, say, a continuous-wave CO2 laser, which runs currents in the milliamps, but these are pulsed lasers, so that current is only needed for a fraction of a second.
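As a sanity check, two of those engraved settings together fix how far apart successive pulses land on the package; a quick back-of-the-envelope calculation (variable names are mine):

```python
# Pulse spacing = feed rate / pulse rate.
feed_mm_s = 650.0    # the "650mms" marking: 650 mm per second
pulse_hz = 25_000    # the "25k" marking: 25 KHz pulse rate
spacing_um = feed_mm_s / pulse_hz * 1000  # convert mm to micrometers
print(spacing_um)  # 26.0 -> about 26 um between pulses, in line with
                   # the sub-30 um line widths Rofin quotes
```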

Marking can also be done on the die itself. Here we see a sample

Flip chip marking marketing sample by ROFIN SINAR in Tempe, AZ

(probably an actual marketing sample given away to customers) of a flip chip die, with ROFIN SINAR markings on it, and even their phone number for their location in Tempe, AZ, only a few miles from several fabs in Chandler, AZ (including Intel and Motorola, now NXP).

As chips become smaller, marking technology continues to evolve with them. Markings today have become much less about what the consumer sees, and much more about traceability and trackability: being able to follow a device through the supply chain, or trace a defective device back to when and where it was produced. Marking enhancements also play a great role in combating counterfeiting, helping keep counterfeits out of the supply chain.

There is a lot that goes into designing, making, assembling and even marking a computer chip, and oftentimes the things that seem simplest, such as placing markings on a chip, are anything but, and just as important as the fabrication of the die itself.

]]> 0
How to 386 Your AT: Intel Inboard 386/AT Wed, 14 Aug 2019 22:40:54 +0000 With the release of the 32-bit Intel 386 processor in 1986, owners of IBM PC/XT and AT type systems (8088 and 80286 systems) were left a bit in the dust. This was a concern (or opportunity) for Intel as well. Alongside the 386, they designed an upgrade solution that could be used in these now-obsolete computers. This was the Intel InBoard 386 series of upgrade cards.

InBoard 386 AT with 1MB of RAM and 80287 FPU Option (very unusual on a late-model InBoard; this one is from 1990, but the FPU is from 1986)

The InBoard, as its name implies, was an internal 16-bit ISA card used to upgrade these systems. It included a 386DX processor running at 16MHz, 64K of cache, and (optionally) 1-3MB of additional RAM. Two versions of the board were made: the PC/XT version was designed for 8088-based systems, and the AT version for 286 systems. These boards required the removal of the original processor; a cable was then run from the old CPU socket to the InBoard 386 board. On system start-up the original BIOS booted the system and loaded the DOS operating system. The config.sys file would then call on the drivers to enable the InBoard 386 specific features. The original system was essentially unaware of the new processor; instructions were executed by the InBoard transparently.

Flat ribbon cable used for connecting the board to the old CPU socket. If the cable could not reach the socket, your system was not compatible. Cable length was restricted by signal timing, rather than the common complaint of Intel being ‘stingy’

Early AT systems used a 6MHz CPU and ISA bus speed, so Intel provided an 8MHz crystal to replace the original on the motherboard. This ensured the ISA bus that the InBoard used to communicate with the original memory and peripherals ran fast enough and did not become a huge bottleneck. The base model InBoard did not come with any RAM; it could use your existing system RAM just fine. Adding RAM, however, was a worthwhile upgrade. The board itself supports 1M (36 100ns 256-kbit chips, including parity) and a daughter card could add another 1M or 2M. This RAM was accessed via the 80386’s 32-bit bus so it was much quicker, and accesses took only a single wait state. You could configure the InBoard to backfill (take over for) your existing system RAM, at least down to 256K, so that the computer would only use the first 256K of the slower RAM before moving to the RAM on the InBoard. If your system had 512K of RAM you would ‘waste’ half of it, but with the benefit of much faster access times. The InBoard 386 had another trick up its sleeve to improve speed…

The InBoard included 64K of 35-45ns cache for its 386 processor, which contributed greatly to its performance increase. It was implemented with 8x Motorola MCM6290P35 (or similar) 16k x 4 SRAMs, plus 3x TC55416P-25 16k x 4 SRAMs for the cache tags. Early boards had unmarked gold-cap chips implementing the cache; this type and speed of SRAM was relatively expensive in 1986. The cache allowed the InBoard to satisfy most (Intel claimed 90%) of memory requests without having to go to main memory (on the original motherboard or on the InBoard card itself). Today cache is a common feature on almost all processors, with L1, L2, L3 and sometimes even fourth-level caches, but back in 1986 it was a relatively new thing. This cache was accessed by the 386 with 0 wait states, making it very fast. In some tests this allowed the InBoard 386/AT to actually outperform a native 386 system (without cache).
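The value of that claimed 90% hit rate can be seen with the standard average-access-time formula; a sketch with illustrative cycle counts (the numbers below are hypothetical, not measured InBoard timings):

```python
# Average memory access = hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles.
def avg_access(hit_rate: float, hit_cycles: float, miss_cycles: float) -> float:
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles

# Hypothetical: 2 cycles for a 0-wait-state cache hit, 12 cycles for a
# miss that has to go out over the slow ISA bus to motherboard RAM.
print(avg_access(0.9, 2, 12))  # 3.0 cycles on average, vs 12 with no cache
```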

Intel A80386DX-16 SX213 from 1990 and 1986 80287 FPU option

The InBoard also supported an optional 80387 coprocessor for even faster performance. There was one problem with this when the InBoards were released… the 80387 was not yet ready, so instead Intel made an adapter to plug into the 80387 socket and run a 10MHz 80287 coprocessor instead. This was adequate considering that most 286s at the time were 8-10MHz chips (which meant that if they DID have a coprocessor it would run at 5-7MHz, 2/3 the CPU speed).

The software that came with the InBoard 386 had some interesting features. It could be configured to change the board speed at boot-up, and on the fly. Originally this allowed 2 options, 8MHz and the full 16MHz; later versions allowed 4 different speed steps. The onboard cache could also be enabled/disabled on the fly, and a disk cache could be enabled to use extended memory as a hard drive cache, further increasing performance.

When the InBoards came out in 1986/87 they were, at least by today’s standards, rather expensive. The base model with no RAM and no FPU was $1995; add 1MB of RAM and that hits $2495 (and nearly $3000 for the 2MB version). The FPU option (the 287-based one at least) was another $495. The installation kits (there were different ones for PLCC and PGA based 286s) were $200 each. The ribbon cable to connect the board to the CPU socket was also a $200 item, so careful installation was a must.

A ‘PLUG’ was required for the 287/387 option to go in the existing motherboard 287 socket. Its only purpose is to tell the system that it does have an FPU (as some software required a hardware FPU)

Consider though that a basic PC at the time was $5000, about the same as the price of a new economy car ($11,000 in 2019 dollars), so that made the InBoard an appealing upgrade for those with 286 based systems. For half the money of a new PC one could have nearly the same (and in some cases more) performance. By 1989, and the release of the i486, the base price of the InBoard 386/AT had dropped to $1295, or $1995 for the 1MB version. The AT version didn’t become as popular as the PC/XT version, perhaps because the performance gains of going from an 8088, even on an 8-bit bus system, to a 386 were greater than those of a 286->386.

Benchmarks from PC Mag (and similar ones from InfoWorld) showed the InBoard 386 could be 10-12% FASTER than a standard 386 system. This is largely due to the cache.

As computers have gotten so much less expensive, and become obsolete so much faster, upgrades like these have become a thing of the past, but in the 80’s and 90’s they were incredibly popular in their many forms. After the InBoard series, Intel made a line of Overdrive processors throughout the 1990’s, and companies such as Evergreen, PNY, Kingston, PowerLeap and many others made entire businesses out of designing upgrades for older computers.

]]> 1
Xeon Overclocking: Making Gallatin Gallop Thu, 13 Jun 2019 05:15:04 +0000

This article is part of The CPU Shack’s continued partnership with guest author max1024, hailing from Belarus. I have provided some minor edits/tweaks in the translation from Belarusian to English.

If you still remember the times of the Pentium 4 running on Socket 478 with the Northwood, Prescott and Gallatin cores, then you should remember how these processor cores differed from each other. Northwood was fast like a mountain doe thanks to its shorter 20-stage pipeline, which allowed it to perform many operations very quickly without tremendous losses from branch mis-predictions and the like, but it was inferior to Prescott in overclocking frequency potential. Prescott, in turn, was as strong as a buffalo, thanks to twice the L2 cache (1M vs 512K) and a finer process (90nm vs 130nm). But like any hoofed animal it was not agile: to achieve higher clock speeds its pipeline was extended to 31 stages, in some cases outperforming Northwood clock for clock, but at the expense of much heat.

A separate niche in the food chain was occupied by “Gallatin”, which combined the properties of the two previous iterations, the shorter 20-stage pipeline with the high clock speed of the Prescott, but in its arsenal it also had a very formidable weapon: an additional L3 cache of 2 MB. The price of ownership of this “beast” was high in the literal sense of the word; like any other representative of the Extreme Edition series, it was $999. I resisted this extreme processor, choosing instead a hero from AMD, the FX-51, which I consider one of the most outstanding processors of all time.

Xeon Universal Chip Analyzer by

What could be better, cooler or faster? I’ve been looking for an answer to this question for a long time, until I became acquainted with the Intel Xeon server processors on Socket 604 and in particular with processors based on the Prescott 2M core, which have twice the cache size compared to their desktop counterparts and can run on ASUS production boards.

As everybody knows, the advanced desktop flagships of both processor manufacturers originate from the server segment. From the Opterons came the AMD Athlon FX-51, and from the Intel Xeon MP, the Pentium Extreme Edition. This parity of events has been preserved to this day.

Xeon Gallatin MP

The server representatives of Intel Xeon processors on the Gallatin core are divided into two branches: Xeon MP (Gallatin) and simply Xeon (Gallatin). The differences are in the number of simultaneously supported processors in a system: Xeon MP supported running up to four processors, while the usual Xeon could be installed in servers only in pairs. There is also a difference in steppings of the processor core itself. Let me remind you that the desktop version of “Gallatin” was the M0 stepping, just like the regular Intel Xeon series.

The Xeon MP line, by contrast, is based on earlier steppings from A0 to C0. Among the representatives of the M0 stepping you can find four Xeon (Gallatin) models with 1M of L3 cache, with frequencies from 2.4 GHz to 3.2 GHz, and one model with an L3 cache doubled to 2 MB, pretty much the same as a Pentium 4 Extreme Edition. This model gave rise to the first “extreme” Pentium.

The most powerful representative of the NetBurst microarchitecture on the Gallatin core belongs to the Xeon MP line, which also has models with a 2 MB L3 cache, but there is one processor model with twice the L3 cache of the Pentium Extreme Edition: a Xeon MP (Gallatin) with a frequency of 3 GHz and the sSpec code SL79V. Such a “Gallatin” with 4M of L3 should be called “Ultra Gallatin”!

I will dwell a little more on this processor. It is based on the C0 stepping and has an effective frequency of 3 GHz, with an FSB frequency of 400 MHz (quad-pumped, so a 100MHz base). The processor is compatible with three similar counterparts in Socket 603 motherboards. The cost in batches of 1000 pieces began at $3,692. As you can see, for that you could purchase as many as three Intel Pentium Extreme Editions and have money left over for a motherboard and RAM. This processor was introduced on March 2, 2004, while the Pentium 4 Extreme Edition with a frequency of 3.2 GHz, on a more recent stepping, came a little earlier, on November 3, 2003.
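The quad-pumped bus and the core clock are tied together by simple arithmetic; a small sketch (the x30 multiplier follows from 3000 MHz / 100 MHz):

```python
# NetBurst clocking for the Xeon MP SL79V.
base_fsb = 100                    # MHz, the actual bus clock
effective_fsb = base_fsb * 4      # quad-pumped: 4 transfers per clock
multiplier = 30                   # 3000 MHz core / 100 MHz bus
core_mhz = base_fsb * multiplier
print(effective_fsb, core_mhz)    # 400 3000
```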

Apparently, producing such a monster was not an easy thing for Intel. The processor is visually different from all the other representatives of the Xeon line: it has a much larger mass, it unfortunately does not have its part number on top (Xeons of this era had the markings on the BOTTOM, on the PCB), and it has a more massive heat spreader. For clarity, I took a photo of five similar processors from my collection. In the top row, on the left is the classic Socket 423 Intel Pentium 4; next to it is a Socket 603 Intel Xeon DP on the Prestonia core, a sort of Northwood in server form.

In the bottom row from left to right: today’s hero, the Xeon MP (Gallatin) with L3 = 4 MB; then the Socket 604 Intel Xeon on the Irwindale core (essentially a 2M Prescott); and the usual representative of Socket 478. Sockets 423, 478, 603 and 604 all carry similar NetBurst architectures. But on the Xeon MP (Gallatin) you can see with the naked eye that the heat spreader cover is just huge. Its height is also greater than that of any other Intel Xeon.

Even if you put any Socket 478 processor on the cover of the Xeon MP, it will fit in its entirety.

On the reverse side the processors look like this (after Socket 423, Intel stopped using the staggered PGA arrangement and went for a higher-density non-staggered PGA arrangement):

True, this processor has one weak point. No, it is not the cost, although at the time of its appearance a ready-to-go four-processor server cost tens of thousands of dollars; still, single processors did not reach five-digit sums, as in the case of six Pentium Pros. I have not yet met a single processor product where everything is perfect; there is always at least one minus, such as, for example, the AMD Athlon FX-51’s support for registered memory only. The Achilles heel here was the FSB frequency: a modest 400 MHz, whereas the Pentium 4 Extreme Edition had 800 MHz.

Naturally, there were limitations of a purely technical nature. For a single processor an 800MHz FSB was possible, but designing a 4-way system with such an FSB was not possible at the time, due to the architecture. Since April 2003 the Intel Xeon line had a very serious competitor, the 1st-generation 64-bit AMD Opteron server processor, which by its characteristics was superior. If you look at the organization of a four-processor AMD Opteron-based server, you can see that there are no bottlenecks in the interaction of processors, chipset and RAM.

The AMD Opterons of the time were advanced processors: each Opteron had its own built-in north bridge and memory controller, so there were no problems interacting with the external chipset. In addition, each Opteron had three point-to-point HyperTransport buses providing 3.2 GB/s of bandwidth in each direction (6.4 GB/s full duplex). Thanks to this scheme of processor interaction, the main advantage was achieved: scalability, which is a very serious argument for building any server platform.

Now let’s look at a similar scheme for Intel server processors:

Back then the concept of a “chipset” in its original form meant what it was supposed to: nothing “superfluous” was in the processor core yet besides the ALU, FPU and cache, and interaction with the north bridge, memory and other processors was carried out over a single shared 64-bit bus with 3.2 GB/s of bandwidth. If one processor was used, there were no particular problems with insufficient bandwidth. For a pair of processors it could be tolerated, but when the system had four Xeon MPs installed, each processor had to be satisfied with an exchange rate of only 800 MB/s, which was clearly not enough. That is precisely why, in order to avoid frequent trips to RAM as much as possible, Intel decided to increase the cache, which in essence was the only right decision at the time.
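The 800 MB/s figure above is just the shared-bus bandwidth divided four ways; a quick check:

```python
# Shared front-side bus: 64 bits (8 bytes) wide, quad-pumped from 100 MHz.
bus_bytes = 8
transfers_per_s = 100e6 * 4             # 400 million transfers per second
total_bw = bus_bytes * transfers_per_s  # 3.2e9 bytes/s = 3.2 GB/s, shared
per_cpu_bw = total_bw / 4               # four Xeon MPs contend for one bus
print(total_bw / 1e9, per_cpu_bw / 1e6)  # 3.2 800.0
```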

Looking ahead, I note that I do not have a four-processor server, but I do have a couple of processors with the SL79V marking. I have wanted to compare the “Gallatin” with 4 MB of L3 against the desktop extreme “Gallatin” for a long time, and now there was such an opportunity. It is good that computer components become obsolete so rapidly: from a cost of $3,692, one can now be had for a mere fraction of that, about $5 apiece.

Motherboard Selection

As is known, the absolute majority of server and workstation motherboards are of no interest to enthusiasts due to the complete absence of any settings in the BIOS, and often the AGP or PCI-Express interfaces are missing too. Below is an example of such a boring motherboard.

Xeon MP “Gallatin” processors with 4 MB L3 are not difficult to find; finding the right motherboard is not so simple. Sometimes it takes a year, and sometimes even more, to find the right hardware, a specific motherboard or a rare processor. To implement this idea I was looking for only one motherboard model, the ASUS PC-DL Deluxe, which is ideally suited for overclocking, fits my concept of building such a system, and also supports the Xeon MP “Gallatin” with 4 MB L3, although there is no official mention of this even on the official Asus website.

ASUS PC-DL Deluxe – Dual Gallatin Xeon MP Support (unofficially)

I searched for it for more than a year, but in the end two of my comrades helped me get a brand new old-stock board. Speaking of motherboards that can still overclock, my list of recommended Socket 603/604 boards is as follows: ASUS PC-DL Deluxe, ASUS NCT-D, ASUS NCCH-DL and Iwill DH800. That's all. Only four models, three of which are based on the familiar Intel 875P logic set from desktop Socket 478 motherboards, and just one, the ASUS NCT-D, on the Intel E7525. So if someone wants to build such a monster, these are your reference points ;-)

Turning on the board with a single Xeon MP "Gallatin" processor with 4 MB of L3, you see the following POST screen on the monitor:

I used the latest BIOS version, 1009, which correctly identifies the fastest Gallatin, reporting the processor's clock speed and its distinctive feature, a very large third-level cache. Since this processor supports Hyper-Threading technology, the line with the model name appears twice on the screen. If two processors are installed, the number of processor records doubles.

Enter the BIOS Setup by pressing the "Del" key. Below are the main settings screens that are interesting from an enthusiast's point of view.

The "Advanced" menu contains all the motherboard devices that are implemented as separate controllers or are part of the board's system logic, and which can be turned off or on as needed.

All the most interesting settings are concentrated in the section – Advanced Chipset Features:

This section contains settings that affect the operation of the processor and memory subsystem.

The Asus motherboard allows you to raise the FSB to 165 MHz through the BIOS Setup. The board also has a jumper that selects the base FSB frequency, 100 or 133 MHz. With the jumper set to 100 MHz, a maximum FSB frequency of 132 MHz is available in the BIOS. This is sometimes enough to overclock 400FSB Xeons, since they have high multipliers (my 3 GHz processor's multiplier is x30), but sometimes it is not. There is a way out, however, thanks to the SetFSB overclocking utility.

Since this motherboard, owing to the system logic used, is close to the desktop Socket 478 boards with the Intel 875P chipset, I used the version of the utility meant for the Asus P4C800 motherboard, which allows you to push the FSB even further than the board's own settings permit. If you are wondering, "What if we just use the jumper to switch the FSB straight to the 133 MHz position?", the answer is that the board then refuses to start at all, although with the help of SetFSB I overclocked the bus to 140 MHz.

The Xeon MP processor allows you to change the multiplier in the BIOS, so purely theoretically, at a system bus frequency of 140 MHz and a x30 multiplier, its clock frequency would be 4200 MHz, though in reality this is not achievable (something would probably melt if it were). Moving on to the RAM settings: as you could already see, the Advanced Chipset Features section makes it possible to change the four primary timings of main memory, but as for the memory clock frequency, the choice here is not great:

The entire choice comes down to DDR266 and Auto, which likewise corresponds to PC2100/DDR266 mode. Still, you can squeeze something more out of the board by resorting to software overclocking, and another utility, MemSet, helps with the memory settings.

Although it mistakenly reports the RAM frequency, it does make it possible to change the timings "on the fly" in the OS. By default the board takes its settings from the SPD (Serial Presence Detect, a way for DIMMs to tell the BIOS what they can do), but for this test I used a pair of sticks based on Winbond BH-5 chips, which are known for their ability to work at the lowest possible timings. Naturally, I set all the memory timings to twos. Summing up, this motherboard lets you configure the system flexibly, and for a two-processor board that is the exception rather than the rule. The only thing it lacks is any choice of processor supply voltage.


Gallatin the Great

As the main air cooling I used a classic cooler, the Thermaltake Big Typhoon, for the S478 P4 EE, and an aluminum server heatsink for the Socket 603. I had a copper heatsink inherited from an ASUS NCT-D on Socket 604, but the mounting-hole dimensions on the ASUS PC-DL Deluxe are different and it simply does not fit; we will consider that a nuance. Overclocking was performed using the SetFSB and MemSet utilities and the BIOS Setup.

This is how the processor is seen by the CPU-Z and AIDA64 utilities.

And this is how a pair of Gallatins appears, one Socket 478, the other Socket 603. The strength of one is its third-level cache; the other has an FSB frequency twice as high.

And the complete system configuration according to CPU-Z utility:

The system turned out to be interesting; it remains to test its overclocking and, at the same time, find out which matters more, the system bus frequency or the cache. The two test systems cannot run at the same bus speed, primarily because of different RAM frequencies, but it will be all the more interesting to find out where the "bottleneck" is in each of them.

As I mentioned above, the system bus frequency with the fastest Gallatin peaked at 140 MHz. The maximum validated processor frequency was 3597 MHz; there is no talk of stability there, but the figure gives an idea of the overclocking potential. The system could pass short tests at 3500 MHz, but was completely stable at around 3400 MHz. In principle, this gives us the opportunity to compare both "Gallatins" at equal frequencies.

To get 3200 MHz, the FSB must be raised to 106.8 MHz, and to get 3400 MHz, to 113.4 MHz; the RAM in the first case worked in DDR285 mode, and in the second as DDR302.
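The clock figures above follow directly from FSB x multiplier. The memory numbers imply a 4:3 memory-to-FSB ratio (my inference from the quoted DDR285/DDR302 modes at the DDR266 setting, not something the article states explicitly):

```python
MULTIPLIER = 30    # fixed x30 multiplier of the 3 GHz Gallatin
MEM_RATIO = 4 / 3  # memory clock : FSB, inferred from DDR285 @ 106.8 MHz

def cpu_mhz(fsb):
    """CPU core clock = FSB (base clock) x multiplier."""
    return fsb * MULTIPLIER

def ddr_mode(fsb):
    """Effective DDR rating = memory clock x 2 (double data rate)."""
    return fsb * MEM_RATIO * 2

for fsb in (106.8, 113.4, 140.0):
    print(f"FSB {fsb:5.1f} MHz -> CPU {cpu_mhz(fsb):4.0f} MHz, DDR{ddr_mode(fsb):.0f}")
```

This reproduces ~3200 MHz/DDR285, ~3400 MHz/DDR302, and the theoretical 4200 MHz ceiling at 140 MHz.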

Test Configuration

The main components of the system:
• Intel Xeon MP, 3000 MHz "Gallatin", L3 = 4 MB, SL79V;
• ASUS PC-DL Deluxe, Socket 603, Intel 875P chipset;
• DDR1 SDRAM, 256 MB x2, 400 MHz (Winbond BH-5), CL=2;
• Gainward GeForce 6800 Ultra, AGP, 256 MB (ForceWare 81.85).

Testing was conducted in Windows XP SP3 and Windows 7 x64 using the following software:
• Super Pi mod. 1.5XS (task 1M);
• PiFast v.4.1;
• wPrime v.1.43;
• AIDA64 5.50.3600;
• WinRAR x86 v. 5.40;
• Cinebench 2003;
• Cinebench 11.5;
• PCMark 2005 v.1.20;
• 3Dmark2001SE Pro b330;
• 3DMark 2005 v.1.3.1;
• Far Cry


Super Pi mod. 1.5XS (task 1M)
Seconds (less is better)

In a single-threaded test, the presence of an impressive cache, from which the data is clearly being served, gives an obvious advantage. The 3 GHz Xeon MP outpaced the higher-clocked Pentium 4 Extreme Edition and, with it, all the other Socket 478 representatives as well as the other Xeons. Overclocked to 3400 MHz, the Xeon MP again left all of the above behind; to catch it you would need either a Pentium 4 Extreme Edition running at ~3700 MHz or a ~3.9 GHz Prescott. The AMD representatives look very good here: to catch the AMD FX-51, Intel needs an extra 1 GHz.

PiFast v.4.1
Seconds (less is better)

The situation changed a bit in this test: the Pentium 4 Extreme Edition overtook the Xeon MP at an equivalent frequency, so apparently cache size is not critical here. At 3.4 GHz the Xeon MP is no longer dominant, occupying a middle position instead.

AIDA64 5.50.3600
Reading from memory, MB/s
AIDA64 5.50.3600
Writing to memory, MB/s

So we got to the weak point, the memory subsystem. In terms of speed, the Xeon MP-based system does not exactly shine, and if it were given full-speed DDR400, the situation could be completely different…


Cache and Memory benchmark from the AIDA64 test package: Pentium 4 Extreme Edition and Xeon MP with L3 = 4 MB at the same clock frequency – 3.2 GHz.

The cache latency of the 1st and 2nd levels is identical for both processors. But the third-level cache of the Xeon MP is a little slower than its desktop counterpart's, and in RAM speed there is a gap of almost two-fold in favor of the desktop system over the server one.

And relative results to other platforms
(Reading from memory, MB/s)
And relative results to other platforms
(Write to memory, MB/s)

AIDA64 5.50.3600 CPU Queen Test
score (more is better)

WinRAR x86 v. 5.40
Kb / sec (more is better)

A large third-level cache saves the situation to some extent, but the frequency of the RAM is also very important in this test.

PCMark 2005 v.1.20
score (more is better)

For PCMark 2005 the Xeon MP is clearly not the best candidate, although at the same frequency as the Pentium 4 Extreme Edition it scores almost the same number of final points; you could say bus versus cache ends in parity.

3DMark 2001SE Pro b330
Total Score (more is better)

But this popular "game" from 2001 was to the "Gallatin 4M's" liking. At an equal frequency it overtook both the Pentium 4 Extreme Edition and the other representatives, despite the low RAM frequency.

3Dmark 2005 v.1.3.1
Total Score (more is better)

In the more modern test, the clock frequency and memory bandwidth are not enough; despite all the advantages of a large cache, the result was the worst among equals.

DOOM III (1024×768, High Quality, AA4x, timedemo1, 3x loop)
Average result, frames per second

Is something missing for this system in DOOM III? Is the game unable to use the entire cache, or is there simply not enough clock frequency? Raising the frequency from 3000 to 3400 MHz (a 13% increase) yields only a ~4% increase in frame rate. That frequency increase also raises the FSB, so if memory were the bottleneck we would expect performance to rise roughly in line with it; there must be another limit here (perhaps AGP bandwidth or something similar).
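That scaling argument can be made concrete. Using only the figures quoted above (the ~4% FPS gain is the article's number; everything else is arithmetic):

```python
def pct_gain(before, after):
    """Percentage increase from before to after."""
    return (after / before - 1) * 100

clock_gain = pct_gain(3000, 3400)   # ~13.3% higher clock (and FSB)
fps_gain = 4.0                      # ~4% more FPS, per the article
efficiency = fps_gain / clock_gain  # fraction of the clock bump realized
print(f"clock +{clock_gain:.1f}% -> fps +{fps_gain:.1f}% "
      f"(scaling efficiency {efficiency:.2f})")
```

An efficiency of about 0.3 means roughly two-thirds of the extra clock speed is wasted, pointing at a bottleneck outside the CPU and memory.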

Far Cry (1024×768, Max Quality, demo 3DNews – Research, 3x loop)
Average result, frames per second

In Far Cry the situation is not much better, literally a couple of FPS of difference. Here we start to see similar diminishing returns even for the P4EE: going from 3600 to 4100 MHz (~14%) yields only a 5.4% gain. Video card bandwidth may be the limit here.

We turn to multi-threaded tests to see what a single Xeon MP (Gallatin) with 4 MB of L3 is capable of, and how a pair of them works together.

wPrime v.1.43
Seconds (less is better)

This simple multi-threaded test loves clock frequency above all, so the single Xeon MP lost to everyone; even the 4 MB L3 cache did not save it. At 3.2 GHz, however, the larger cache made it one of the leaders among equal-frequency models, which means the cache still carries a certain advantage. There is a critical point here where the clock speed becomes high enough to make the cache matter: at lower speeds the CPU is not completing instructions fast enough to drain the L1/L2 caches, but above 3 GHz it can process more data than L1/L2 alone can supply, making the larger L3 cache helpful once again. Along the same lines, dual 3400 MHz Gallatins are almost twice as fast as a single one. Slow them down to 3000 MHz, and the gap, while small, widens a bit (from 0.3% to 0.7%).

Cinebench 2003
points (more is better)

In this 2003 rendering task the Xeon MP also looks decent: a pair of processors at 3.4 GHz comes close to the next-generation 3.8 GHz Xeon "Irwindale", and in a uniprocessor configuration it actually beats the 3.8 GHz Irwindale.

Cinebench 11.5
points (more is better)

In the 2010 release of Cinebench, the Gallatin, while still quick, is not quite as close as in the 2003 version, but still manages to compete with higher-clocked Irwindales in dual configurations.


Undoubtedly, the system based on two Xeon MP "Gallatins" and the ASUS PC-DL Deluxe is interesting. If I were asked which system is better and more interesting for a retro PC build, a pair of Xeon MP "Gallatins" on an ASUS PC-DL Deluxe with AGP, or a pair of Intel Xeon DPs on an ASUS NCT-D with PCI-Express, I would answer without hesitation: the first. It may lose in pure performance, but it is much more interesting to work with, and a pin-mod remains to be done to bring the system to a new level of performance, but that is for another time. Once again I can say I am not ashamed of this Asus; its convenient layout, thoughtful design and the overclocking abilities of this two-socket board make it the best choice for retro-clocking.

As for the processor itself, it surprised on the one hand and left a good impression on the other: despite such an impressive third-level cache, everything was fine with overclocking. The only thing I would still like is a photograph of what is under that massive heat-spreader lid. (coming soon 🙂 )

All Boxed up: Retail Boxed CPU's Sat, 01 Jun 2019 06:28:18 +0000

NIB MOS 6502 CPU

New In Box MOS MCS6502 CPU from 1975 (Michael Steil)

Today almost all processors are permanently installed in their device (soldered in) or were taken from a bulk tray and installed by an OEM such as Dell or HP.  AMD has, at least with their higher-end CPUs, gotten quite creative with the marking on the chip itself, and both AMD and Intel still offer some pretty amazing retail packaging for their enthusiast processors (the i9 in a dodecahedron package is pretty cool).  There was a time, though, when almost all processors were available in retail packaging.  This was the era of physical computer shops, largely bypassed now by the Internet, where the packaging of a processor helped sell it.

I collect such New In Box (NIB) processors as they are pretty neat for seeing the branding/marketing that went with the CPUs of years past, and I was reminded of this when I saw perhaps one of the oldest NIB CPUs I have ever seen on Michael Steil's blog.  An original MOS 6502 processor from 1975 in its original shipping box, as close to NIB as one can get.  MOS's packaging would make Apple proud with its simplicity and design, keeping everything tidy and the MCS6502 visible as soon as the box is opened (I am happy they didn't use the miserable black foam either, so the CPU is pristine after 45 years).  Even the original invoice is included: $25 for the CPU ($118 in 2019 dollars) and $10 (nearly half the cost of the CPU, or $47 in 2019) for documentation.

Cyrix 83D87 386 FPU

Cyrix 83D87 386 FPU Bundled with Borland Quattro PRO Spreadsheet software (a big thing back in 1992)

Intel started offering retail boxed CPUs with the 8087 coprocessor.  This was really the first chip designed as a user upgrade to their PC (a new thing back then).  Before this, Intel's closest thing to a NIB was University Kits or Dev Kits for various chips/processors.  With the introduction of the PC, and the many thousands of beige-box clones that followed, people began buying processors and building computers for themselves at a much greater pace than before.  There were many companies making compatible processors at the time, so packaging helped set them apart.  This began with upgrade products; math coprocessors for the 808x, 286 and 386 were the most common (by Intel, AMD, IIT, ULSI, Cyrix and more), but eventually processors themselves started getting the NIB treatment.  Intel made OverDrive processors (still technically an upgrade product) for the 486, followed by actual Pentium CPUs in the retail box.  By the late 1990s everything from Celerons to Xeon server processors could be had in a retail box.  Buying a retail boxed Xeon for your rackmount server seems like an odd thing to do, but apparently Intel figured it would need to be done.

Quad AMD Opteron 6128s in Retail Box

Quad AMD Opteron 6128s in Retail Box

Other companies such as AMD, Cyrix and VIA made NIB processors but they are much less common, and in a lot of ways more interesting.  AMD made retail Durons, Athlons, and Opterons, and, in one of the most unusual things I have seen for a NIB, an actual 4-pack of Opteron 6128s (pictured).  The Opteron 6128 is an 8-core Magny-Cours server processor introduced in 2010 at $266 each.  This NIB set is dated late 2011, so it would probably have been a bit cheaper, but still $800 or so, and the large SWTX motherboards needed to run four Socket G34 processors require somewhat special cases and PSUs, but at least you can have half a terabyte of RAM.  Inside the retail box are 4 smaller boxes, each containing an Opteron 6128 CPU, installation instructions, warranty info, and a case badge (you get 4 case badges in total).  It seems this packaging was designed to support different configurations (probably a single Opteron 6128, and duals).

Tiered up for 3D-FPGAs: The Story of the Tier Logic FPGA-ASIC Thu, 18 Apr 2019 19:56:20 +0000

100K LUT Tier Logic FPGA TL1F100 on the left and TL1A100 ASIC on the right

This is the CPU Shack Museum, but occasionally I find a chip that's not really a CPU but is of such interest that I keep it, especially if it's novel and relatively unknown.  So today we have a bit of the story of Tier Logic.  Tier Logic set out to make FPGAs (Field Programmable Gate Arrays) better, and to make the transition (or choice) between them and ASICs (Application Specific Integrated Circuits) easier.

FPGAs are great for smaller product runs: they are configurable and relatively easy to reprogram, and designs can easily be updated and tested at no additional cost.  FPGAs, however, are large in terms of die area, power budget, and cost per chip.  ASICs, on the other hand, take longer to develop (re-spinning silicon every time an error is found) and have a much larger upfront cost, as well as an entirely different tool chain to design with.  They are, however, smaller, use less power, and once the design is finalized, the per-unit cost is very low.  This presents a design dilemma: which should one choose for a project?  What if you didn't have to choose?  What if you could have the flexibility of an FPGA and the benefits of an ASIC all at once?

It is exactly this that Tier Logic set out to do.  Tier Logic was founded by FPGA process-technology pioneer Raminda Madurawe (from Altera) in 2003 and was led by Doug Laird, a founder of Transmeta (famous for the Crusoe VLIW processors).  For 7 years they worked to design a solution, operating in what is known as 'stealth mode.'  Stealth mode is a way for companies to work quietly, with little to no PR, until they have a product ready to release.  Often the company exists but is completely unknown to outsiders.  This has some definite benefits: there is no constant barrage of having to answer to the media and others, and there is less risk of someone seeing what you are doing and trying to beat you to market with it.  Seven years, however, is a very long time to be in stealth mode, and the reason for it is that Tier Logic was not only inventing a new style of FPGA/ASIC, they had to develop a new silicon process to make it work.

Cross section showing the TFT layer (180nm) on top of 9 90nm CMOS layers with Cu interconnects

In most traditional FPGAs, SRAM is used to store the configuration information (essentially look-up tables that determine where all the gates are connected/routed).  These SRAM cells are rather large on a standard CMOS process and are integrated among all the logic on the die.  This results in larger dies, greater power consumption, and more expense.  Tier Logic wanted to separate the SRAM from the rest of the design, allowing all the logic to be much more compact.  The process they developed (with Toshiba, their fab partner) used a standard 90nm, 9-metal-layer CMOS process, with 8 of those layers for the FPGA's logic; on top of this they added a layer of TFT (Thin Film Transistor) based SRAM cells (the 9th layer was then used to connect these to the logic below).  Each TFT SRAM bit uses 9 transistors, and the Tier Logic design used 230 million of them (to give around 26 Mb of SRAM) to support its 100K LUT design.  TFT SRAM isn't anything new, but integrating it onto a CMOS process was.  TFT SRAM is usually manufactured at over 400C, which is more than standard CMOS with Cu interconnects can handle without degradation, so Toshiba found a way around this, allowing the TFT transistors to be built on the same die with no ill effects.  The TFT layer was done at a 180nm feature size (twice as large as the underlying metal layers) using technology from Toshiba's 65nm node.  This made for an FPGA that was much smaller and had a much smaller power budget than a traditional FPGA, as well as being much faster: 1.8-3.5 times faster than a traditional FPGA on the same process.  That's 1-2 process nodes of improvement for free.  Tier Logic wasn't done, however; having the SRAM completely separate (on the top layer, making this a 3-D part) allowed them to do something quite remarkable.
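The SRAM figures quoted above are self-consistent, as a quick sanity check shows:

```python
# 230 million TFT transistors at 9 transistors per SRAM bit should
# give roughly the "around 26 Mb" of configuration memory the
# 100K LUT design needed.
TRANSISTORS_PER_BIT = 9          # 9T TFT SRAM cell
TOTAL_TFT_TRANSISTORS = 230_000_000

bits = TOTAL_TFT_TRANSISTORS / TRANSISTORS_PER_BIT
print(f"{bits / 1e6:.1f} Mbit of TFT configuration SRAM")  # ~25.6 Mbit
```

About 25.6 Mbit, which matches the article's rounded "around 26 Mb" figure.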

Only one layer has to change to switch from FPGA to ASIC

If a customer wanted to convert their FPGA design to an ASIC, the TFT SRAM layer was omitted and the 9th layer became a MaskROM layer, essentially hard-coding what would have been stored in the SRAM cells.  Timing and logic remained identical, and it could be done for less than $50,000 and 4 weeks' time.  Customers could also develop an ASIC using the FPGA version and industry-standard FPGA tool chains (specifically Mentor tools), and then, when ready for the final version, have the ASIC created.   Tier Logic held wafers at metal layer 8, so that they could be finished as an FPGA or an ASIC without having to fab and test all the logic for every device.  For large orders (over $100,000) Tier Logic even offered to custom-package the parts to match the pin-out of your existing FPGA design (whether it was a Xilinx, Altera, or other FPGA part).

Tier Logic had plans for larger and faster designs as well, including integrating switch fabric into the 3-D mix.

In March of 2010 Tier Logic exited stealth mode and, with early orders in hand, went public with their technology and products.  They needed $20 million in funding to ramp up the business and begin the 3D-FPGA revolution, but for reasons that are still unclear, that funding never came, and on July 16th of that year, only 5 months after exiting stealth mode, Tier Logic was closed.  It's possible this was a result of having been unknown for so long; one wonders if they would have been more successful had they gone public earlier.  It's also possible that the existing FPGA companies were no fans of what Tier Logic's technology could have done to the market and pressured customers and investors away from them, but we will likely never know.  Perhaps most tragic of all, the technology never made it to market.  The relevant patents ended up in the hands of Callahan Cellular LLC, a shell company of Intellectual Ventures, which makes the majority of its money not from making actual products but from licensing and litigating the vast portfolio of patents it has acquired.  They are the 5th largest patent holder in the US and often referred to as a 'patent troll' due to their techniques for extracting revenue through litigation.

For a company that existed in the public eye for but a fleeting moment, it is nice that its memory and history can be preserved in a few of the only samples of its technology ever made.  Truly it was top-tier engineering.
