Section One: Before the Great Dark Cloud.
The 4004 had 46 instructions, using only 2,300 transistors in a 16-pin DIP. It ran at a clock rate of 740kHz (eight clock cycles per CPU cycle of 10.8 microseconds) - the original goal was 1MHz, to allow it to compute BCD arithmetic as fast (per digit) as a 1960's era IBM 1620.
The 4040 (1972) was an enhanced version of the 4004, adding 14 instructions, larger (8 level) stack, 8K program space, and interrupt abilities (including shadows of the first 8 registers). Should Pioneer 10 and Pioneer 11 ever be found by an extraterrestrial species, the 4004 will represent an example of Earth's technology.
[for additional information, see Appendix E]
. . Some have suggested that the MP944 digital processor used for the F-14 Tomcat aircraft of the U.S Navy qualifies as the "first microprocessor". Although interesting, it was not a single-chip processor, and was not general purpose - it was more like a set of parallel building blocks you could use to make a special purpose digital signal processor from (in the form of one or more data pipelines in parallel). It's only included here because at least two people asked me about it.
It was bit serial to reduce connections between chips, with highly parallel design and high clock rate to compensate. Words were 20 bits (required by the precision of the sensor and control values) and ALU units could perform operations on input bits as they were read in, while bits of the previous result was read out. "Steering Logic" (SL) units switched input signals to output lines (and added or subtracted, if two inputs went to one output), which could be directed to multiplication, division, and special logic units (which acted a little like a Transfer Triggered Architecture). Bits read serially from the ROMS (eight banks with 128 20-bit words, each with its own program counter) directed the data movement and unit operations, but had to be synchronized with data movement making programming difficult (basically microcode). RAM (called Random Access Storage) consisted of units with sixteen 20-bit words. Programming consisted of using the SLs to direct instruction and data words to the function units, which could be hooked to other function units in a pipeline, along with other pipelines in parallel. A separate set of eight ROMs could be used for data.
It took until 1998 to declassify a paper on the 1970 design. Although impressively elegant, it probably didn't warrant that length of secrecy.
Part II: TMS 1000, First microcontroller (1974) .Texas Instruments followed the Intel 4004/4040 closely with the 4-bit TMS 1000, which was the first microprocessor to include enough RAM, and space for a program ROM, and I/O support on a single chip to allow it to operate without multiple external support chips, making it the first microcontroller. It also featured an innovative feature to add custom instructions to the CPU.
It included a 4-bit accumulator, 4-bit Y register and 2 or 3-bit X register, which combined to create a 6 or 7 bit index register for the 64 or 128 nybbles of on chip RAM. A 1-bit status register was used for various purposes in different contexts. The 6-bit PC combined with a 4 bit page register and an optional 1 bit bank ('chapter') register to produce 10 or 11 address bits to 1KB or 2KB of on-chip program ROM. There was also a 6-bit subroutine return register and 4-bit page buffer, used as the destination on a branch, or exchanged with the PC and page registers for a subroutine (amounting to a 1-element stack, branches could not be performed within a subroutine).
An interesting feature of the PC is it was incremented using a feedback shift register, not a counter, so instructions were not consecutive in memory, but since all memory was internal, this was not a problem. Instructions were 8 bits with twelve hardwired, and with a 31X16 element PLA allowing 31 custom microprogrammed instructions. All hardwired instructions were single cycle, and no interrupts were allowed.
It gained fame in the movie "ET: The Extraterrestrial" as the brains in the Texas Instruments "Speak and Spell" educational toy.
4040). While the 8008 had 14 bit PC and addressing, the 8080 had a 16 bit address bus and an 8 bit data bus. Internally it had seven 8 bit registers (A-E, H, L - pairs BC, DE and HL could be combined as 16 bit registers), a 16 bit stack pointer to memory which replaced the 8 level internal stack of the 8008, and a 16 bit program counter. It also had several I/O ports - 256 of them, so I/O devices could be hooked up without taking away or interfering with the addressing space, and a signal pin that allowed the stack to occupy a separate bank of memory.
The 8080 was used in the Altair 8800, the first widely-known personal computer (though the definition of 'first PC' is fuzzy. Some claim that the 12-bit LINC (Laboratory INstruments Computer) was the first 'personal computer'. Developed at MIT (Lincoln Labs) in 1963 using DEC components, it inspired DEC to design its own PDP-8 in 1965, also considered an early 'personal computer'). 'Home computer' would probably be a better term here, though).
Intel updated the design with the 8085 (1976), which added two instructions to enable/disable three added interrupt pins (and the serial I/O pins), and simplified hardware by only using +5V power, and adding clock generator and bus controller circuits on-chip.
8080 (designed by ex-Intel engineers), and it was - vastly improved. It also used 8 bit data and 16 bit addressing, and could execute all of the 8080 (but not 8085) op codes, but included 80 more, instructions (1, 4, 8 and 16 bit operations and even block move and block I/O). The register set was doubled, with two banks of data registers (including A and F) that could be switched between. This allowed fast operating system or interrupt context switches. The Z-80 also added two index registers (IX and IY) and 2 types of relocatable vectored interrupts (direct or via the 8-bit I register).
Clock speeds ranged from the original Z-80 2.5MHz to the Z80-H (later called Z80-C) at 8MHz, and later a CMOS version at 10MHz.
Like many processors (including the 8085), the Z-80 featured many undocumented instructions. In some cases, they were a by-product of early designs (which did not trap invalid op codes, but tried to interpret them as best they could), and in other cases chip area near the edge was used for added instructions, but fabrication made the failure rate high. Instructions that often failed were just not documented, increasing chip yield. Later fabrication made these more reliable.
But the thing that really made the Z-80 popular in designs was the memory interface - the CPU generated its own RAM refresh signals, which meant easier design and lower system cost, the deciding factor in its selection for the TRS-80 Model 1. That and its 8080 compatibility, and CP/M, the first standard microprocessor operating system, made it the first choice of many systems.
Embedded varients of the Z-80 were also produced. Hitachi produced the 64180 (1984) with added components (two 16 bit timers, two DMA controllers, three serial ports, and a segmented MMU mapping a 20 bit (1M) address space to any three variable sized segments in the 16 bit (64K) Z-80 memory map), a design Zilog and Hitachi later refined to produce the Z-180 and HD64180Z (1987?) which were compatible with Z-80 peripheral chips, plus variants (Z-181, Z-182). The Z-280 was a 16 bit version introduced about July, 1987 (loosely based on the ill-fated Z-800), with a paged (like Z-180) 24 bit (16M) MMU (8 or 16 bit bus resizing), user/supervisor modes and features for multitasking, a 256 byte (4-way) cache, 4 channel DMA, and a huge number of new op codes tacked on (total of almost 3,500, including previously undocumented Z-80 instructions), though the size made some very slow. Internal clock could be run at twice the external clock (ex. 16MHz CPU with a 8MHz bus), and additional on-chip components were available. A 16/32 bit Z-380 version also exists (1994) with added 32-bit linear addressing mode (16-bit mode is Z-80 and Z-180 binary compatible, but not Z-280 compatible).
Rabbit Semiconductor's Rabbit 2000 (1999/2000?) with a Z-80 derived instruction set which drops some instructions (mostly I/O, some less useful instructions), and adds others (16-bit data, computed address). It also drops dynamic RAM support, because embedded systems more often use stgatic RAM, and adds serial, parallel, and inter-processor communication units. Program space is extended to 20 bits using an 8-bit page register, rather than the Z-180's MMU.
The Z-8 (1979) was an embedded processor with on-chip RAM (actually a set of 124 general and 20 special purpose registers) and ROM (often a BASIC interpreter), and is available in a variety of custom configurations up to 20MHz. Not actually related to the Z-80.
8080, Motorola introduced the 6800. Some of the designers (notably Chuck Peddle) left to start MOS Technologies (later bought by Commodore), which introduced the 650x series which included the 6501 (pin compatible with the 6800, taken off the market almost immediately for legal reasons) and the 6502 (used in early Commodores, Apples and Ataris). Like the 6800 series, varients were produced which added features like I/O ports (6510 in the Commodore 64) or reduced costs with smaller address buses (6507 13-bit 8K address bus in the Atari 2600). The 650x was little endian (lower address byte could be added to an index register while higher byte was fetched) and had a completely different instruction set from the big endian 6800. Apple designer Steve Wozniak described it as the first chip you could get for less than a hundred dollars (actually a quarter of the 6800 price) - it became the CPU of choice for many early home computers (8 bit Commodore and Atari products).
Unlike the 8080 and its kind, the 6502 (and 6800) had very few registers. It was an 8 bit processor, with 16 bit address bus. Inside was one 8 bit data register, two 8 bit index registers, and an 8 bit stack pointer (stack was preset from address 256 ($100 hex) to 511 ($1FF)). It used these index and stack registers effectively, with more addressing modes, including a fast zero-page mode that accessed memory addresses from address 0 to 255 ($FF) with an 8-bit address that speeded operations (it didn't have to fetch a second byte for the address).
Back when the 6502 was introduced, RAM was actually faster than microprocessors, so it made sense to optimize for RAM access rather than increase the number of registers on a chip. It also had a lower gate count (and cost) than its competitors.
The 650x also had undocumented instructions, including JAM, which simply causes the CPU to freeze, requiring a hardware reset or power cycle to restart.
The CMOS 65C02/65C02S fixed some original 6502 design flaws, and the 65816 (officially W65C816S, both designed by Bill Mensch of Western Design Center Inc.) extended the 650x to 16 bits internally, including index and stack registers, with a 16-bit direct page register (similar to the 6809), and 24-bit address bus (16 bit registers plus 8 bit data/program bank registers). It included an 8-bit emulation mode. Microcontroller versions of both exist, and a 32-bit version (the 65832) is planned. Various licensed versions are supplied by GTE (16 bit G65SC802 (pin compatible with 6502), and G65SC816 (support for VM, I/D cache, and multiprocessing)) and Rockwell (R65C40), and Mitsubishi has a redesigned compatible version. The 6502 remains surprisingly popular largely because of the variety of sources and support for it.
The 6502-based Apple II line (not backwards compatible with the Apple I) was among the first microcomputers introduced and became the longest running PC line, eventually including the 65816-based Apple IIgs The 6502 was also used in the Nintendo entertainment system (NES), and the 65816 was in the 16-bit successor, the Super NES, before Nintendo switched to MIPS embedded processors.
6502, the 6809 was based on the Motorola 6800 (August 1974), though the 6809 expanded the design significantly. The 6809 had two 8 bit accumulators (A & B) and could combine them into a single 16 bit register (D). It also featured two index registers (X & Y) and two stack pointers (S & U), which allowed for some very advanced addressing modes (The 6800 had A & B (and D) accumulators, one index register and one stack register). The 6809 was source compatible with the 6800, even though the 6800 had 78 instructions and the 6809 only had around 59. Some instructions were replaced by more general ones which the assembler would translate, and some were even replaced by addressing modes. While the 6800 and 6502 both had a fast 8 bit mode to address the first 256 bytes of RAM, the 6809 had an 8 bit Direct Page register to locate this fast address page anywhere in the 64K address space.
Other features were one of the first multiplication instructions of the time, 16 bit arithmetic, and a special fast interrupt. But it was also highly optimized, gaining up to five times the speed of the 6800 series CPU. Like the 6800, it included the undocumented HCF (Halt Catch Fire) instruction to incrementally strobe the address lines for bus testing ("jump to accumulator (A or B)" in the 6800, implemented and documented as $00 in the 68HC11 which is described below).
The 6800 and 6809, like the 6502 series, used a single clock cycle to generate the timing for four internal execution stages by using the rising and falling edges of the base cycle (not just rising edges), and another clock 90 degrees out of phase (giving two rising and two falling edges per cycle) - this allowed instructions to execute in one external 'cycle' rather than four for most CPUs, such as the 8080, which used the external clock directly, so an equivalent instruction would take four cycles, meaning a 2MHz 6809 would be roughly equivalent to a 8MHz 8080. This is different from clock-doubling, which uses a phase-locked-loop to generate a faster internal clock (for the CPU) which is synchronised with an external clock (for the bus). Motorola later produced CPUs in this line with a standard four-cycle clock. The 680x and 650x only accessed memory every other cycle, allowing a peripheral (such as video, or even a second cpu) to access the same memory without conflict.
The 6800 lived on as well, becoming the 6801/3, which included ROM, some RAM, a serial I/O port, and other goodies on the chip (as an embedded controller, minimizing part counts - but expensive at 35,000 transistors. The 6805 was a cheaper 6801/3, dropping seldom used instructions and features). Later the 68HC11 version (two 8 bit/one 16 bit data register, two 16 bit index, and one 16 bit stack register, and an expanded instruction set with 16 bit multiply operations) was extended to 16 bits as the 68HC16 (additional 16-bit accumulator E, three index registers IX, IY, IZ, plus extension registers to add 4 bits to addresses and accumulator E for a 1M address space, plus 16-bit multiply registers HR and IR and 36-bit AM accumulator), and a lower cost 16 bit 68HC12 (May 1996). It remains a popular embedded processor (with over 2 billion 6800 varients sold), and radiation hardened versions of the 68HC11 have been used in communications satellites. But the 6809 was a very fast and flexible chip for its time, particularly with the addition of the OS-9 operating system.
Of course, I'm a 6809 fan myself...
As a note, Hitachi produced a version called the 6309. Compatible with the 6809, it added 2 new 8-bit registers (E and F) that could be combined to form a second 16 bit register (W), and all four 8-bit registers could form a 32 bit register (Q). It also featured hardware division, and some 32 bit arithmetic, a zero register (always 0 on read), block move, and was generally 30% faster in native mode. Also, unlike the 6809, the 6309 could trap on an illegal instruction. These enhancements, surprisingly, never appeared in official Hitachi documentation.
The Am2901, from Advanced Micro Devices, was a popular 4-bit-slice processor. It featured sixteen 4-bit registers and a 4-bit ALU, and operation signals to allow carry/borrow or shift operations and such to operate across any number of other 2901s. An address sequencer (such as the 2910) could provide control signals with the use of custom microcode in ROM.
The Am2903 featured hardware multiply.
Legend holds that some Soviet clones of the PDP-11 were assembled from Soviet clones of the Am2901.
Since it doesn't fit anywhere else in this list, I'll mention it here...
AMD also produced what is probably the first floating point "coprocessor" for microprocessors, the AMD 9511 "arithmetic circuit" (1979), which performed 32 bit (23 + 7 bit floating point) RPN-style operations (4 element stack) under CPU control - the 64-bit 9512 (1980) lacked the transcendental functions. It was based on a 16-bit ALU, performed add, subtract, multiply, and divide (plus sine and cosine), and while faster than software on microprocessors of the time (about 4X speedup over a 4MHz Z-80), it was much slower (at 200+ cycles for 32*32->32 bit multiply) than more modern math coprocessors are.
It was used in some CP/M (Z-80) systems (I heard
it was used on an S-100 bus math card for NorthStar systems, but that
was in fact used a 74181 BCD (Binary Coded Decimal) ALU, and ten PROM
chips for microcode). Calculator circuits (such as the National Semiconductor
MM57109 (1980), actually a 4-bit NS COP400 processor with floating
point routines in ROM) were also sometimes used, with emulated keypresses
sent to it and results read back, to simplify programming rather than
Fairchild F8, the Intel 8048 was also designed as a microcontroller rather than a microprocessor - low cost and small size was the main goal. For this reason, data is stored on-chip, while program code is external (a true Harvard architecture, although program and data use the same address lines). The 8048 was eventually replaced by the very popular but bizarre 8051 and 8052, available with on-chip program ROMs (the 8031 version still used external ROMs).
While the 8048 used 1-byte instructions, the 8051 has a more flexible 2-byte instruction set. It has eight 8-bit registers, plus an accumulator A. Data space is 128 bytes accessed directly or indirectly by a register, plus another 128 above that in the 8052 which can only be accessed indirectly (usually for a stack). External memory occupies the same address space, and can be accessed directly (in a 256 byte page via I/O ports) or through the 16 bit DPTR address register much like in the RCA 1802. Direct data above location 32 is bit-addressable. Data and program memory share the address space (and address lines, when using external memory). Although complicated, these memory models allow flexibility in embeded designs, making the 8051 very popular (over 1 billion sold since 1988).
The Siemens 80C517 adds a math coprocessor to the CPU which provides 16 and 32 bit integer support plus basic floating point assistance (32 bit normalise and shift), reminiscent of the old AMD 9511. The Texas Instruments TMS370 is similar to the 8051, Adding a B accumulator and some 16 bit support.
Harvard Architecture) for a Defense Department project, but was beaten by a simpler (and more reliable at the time) single memory design from Princeton. Harvard Architecture was first used in the Signetics 8x300, and was adapted by General Instruments for use as a peripheral interface controller (PIC) which was designed to compensate for poor I/O in its 16 bit CP1600 CPU. The microelectronics division was eventually spun off into Arizona Microchip Technology (around 1985), with the PIC as its main product.
The PIC has a large register set (from 25 to 192 8-bit registers, compared to the Z-8's 144). There are up to 31 direct registers, plus an accumulator W, though R1 to R8 also have special functions - R2 is the PC (with implicit stack (2 to 16 level)), and R5 to R8 control I/O ports. R0 is mapped to the register R4 (FSR) points to (similar to the ISAR in the F8, it's the only way to access R32 or above).
The 16x is very simple and RISC-like (but less so than the RCA 1802 or the more recent Atmel AVR microcontroller. It has only 33 fixed length 12-bit instructions, including several with a skip-on-condition flag to skip the next instruction (for loops and conditional branches), producing tight code important in embedded applications. It's marginally pipelined (2 stages - fetch and execute) - combined with single cycle execution (except for branches - 2 cycles), performance is very good for its processor catagory.
The 17x has more addressing modes (direct, indirect, and relative - indirect mode instructions take 2 execution cycles), more instructions (58 16-bit), more registers (232 to 454), plus up to 64K-word program space (2K to 8K on chip). The high end versions also have single cycle 8-bit unsigned multiply instructions.
The PIC 16x is an interesting look at an 8 bit design made with slightly newer design techniques than other 8 bit CPUs in this list - around 1978 by General Instruments (the 1650, a successor to the more general 1600). It lost out to more popular CPUs and was later sold to Microchip Technology, which still sells it for small embedded applications. An example of this microprocessor is a small PC board called the BASIC Stamp, consisting of 2 ICs - an 18-pin PIC 16C56 CPU (with a BASIC interpreter in 512 word ROM (yes, 512)) and 8-pin 256 byte serial EEPROM (also made by Microchip) on an I/O port where user programs (about 80 tokenized lines of BASIC) are stored.
Part X: Atmel AVR - RISC ridiculously small (June 1997) .There's not much to say about the 8-bit Atmel AVR microcontroller, an attempt to bring RISC design down to 8-bit levels. It's a canonical simple load-store design - 16-bit instructions, 2-stage pipeline, thirty-two 8-bit data registers (six usable as three 16-bit X, Y, and Z address registers), load/store architecture (plus data/subroutine stack).
Table of Contents