DEC Alpha

DEC Alpha registers
General-purpose registers	R0 to R30, plus R31 (always zero)
Floating-point registers	F0 to F30, plus F31 (always zero)
Program counter	PC
Control registers	LR0 (Lock Register 0), LR1 (Lock Register 1), FPCR (FP Control Register)

Alpha
Designer	Digital Equipment Corporation
Bits	64-bit
Introduced	1992
Design	RISC
Type	Register-Register
Encoding	Fixed
Endianness	Bi
Extensions	Byte/Word Extension (BWX), Square-root and Floating-point Convert Extension (FIX), Count Extension (CIX), Motion Video Instructions (MVI)
Open	Yes
Registers
General-purpose	31 plus always-zero R31
Floating point	31 plus always-0.0 F31

Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computing (RISC) instruction set developed by Digital Equipment Corporation (DEC), designed to replace their 32-bit VAX complex instruction set computer (CISC) instruction set. Alpha was implemented in microprocessors originally developed and fabricated by DEC. These microprocessors were most prominently used in a variety of DEC workstations and servers, which eventually formed the basis for almost all of their mid-to-upper-scale lineup. Several third-party vendors also produced Alpha systems, including PC form factor motherboards.

Operating systems that supported Alpha included OpenVMS (previously known as OpenVMS AXP), Tru64 UNIX (previously known as DEC OSF/1 AXP and Digital UNIX), Windows NT (discontinued after 4.0 SP6 and Windows 2000 RC1),^[2] Linux (Debian GNU/Linux, SUSE Linux,^[3] Gentoo Linux and Red Hat Linux), BSD UNIX (NetBSD, OpenBSD and FreeBSD up to 6.x), Plan 9 from Bell Labs, and the L4Ka::Pistachio kernel.

The Alpha architecture was sold, along with most parts of DEC, to Compaq in 1998. Compaq, already an Intel customer, decided to phase out Alpha in favor of the forthcoming Hewlett-Packard/Intel Itanium architecture, and sold all Alpha intellectual property to Intel in 2001, effectively killing the product. Hewlett-Packard purchased Compaq later that same year, continuing development of the existing product line until 2004, and promising to continue selling Alpha-based systems, largely to the existing customer base, until October 2006 (later extended to April 2007).^[4]

History

PRISM

Alpha was born out of an earlier RISC project named PRISM, itself the final product of several earlier projects. PRISM was intended to be a flexible design, supporting both Unix-like applications and Digital's existing VMS programs from the VAX after minor conversion. A new Unix-like^{[citation needed]} operating system known as Mica would run applications natively, supporting VMS under emulation running at the same time.

During development, the Palo Alto design team were working on a Unix-only workstation that originally included the PRISM. However, development of the workstation was well ahead of the PRISM, and the engineers proposed that they release the machines using the MIPS R2000 processor instead, moving its release date up considerably. DEC management doubted the need to produce a new computer architecture to replace their existing VAX and DECstation lines, and eventually ended the PRISM project in 1988.

By the time of cancellation, however, second-generation RISC chips (such as the newer SPARC architecture) were offering much better price/performance ratios than the VAX lineup. It was clear a third generation would completely outperform the VAX in all ways, not just on cost.

Alpha

Another study was started to see if a new RISC architecture could be defined that could directly support the VMS operating system. The new design used most of the basic PRISM concepts, but was re-tuned to allow VMS and VMS programs to run at reasonable speed with no conversion at all. The decision was also made to upgrade the design to a full 64-bit implementation from PRISM's 32-bit, a conversion all of the major RISC vendors were undertaking. Eventually that new architecture became Alpha. The primary Alpha instruction set architects were Richard L. Sites and Richard T. Witek. The PRISM's Epicode was developed into the Alpha's PALcode, providing an abstracted interface to platform- and processor implementation-specific features.

The main contribution of Alpha to the microprocessor industry, and the main reason for its performance, was not so much the architecture but rather its implementation.^{[citation needed]} At that time (as it is now), the microchip industry was dominated by automated design and layout tools. The chip designers at Digital continued pursuing sophisticated manual circuit design in order to deal with the overly complex VAX architecture. The Alpha chips showed that manual circuit design applied to a simpler, cleaner architecture allowed for much higher operating frequencies than those that were possible with the more automated design systems. These chips caused a renaissance of custom circuit design within the microprocessor design community.

Originally, the Alpha processors were designated the DECchip 21x64 series, with "DECchip" replaced in the mid-1990s with "Alpha". The first two digits, "21", signify the 21st century, and the last two digits, "64", signify 64 bits. The Alpha was designed as 64-bit from the start and there is no 32-bit version. The middle digit corresponded to the generation of the Alpha architecture. Internally, Alpha processors were also identified by EV numbers, EV officially standing for "Extended VAX" but having an alternative humorous meaning of "Electric Vlasic", giving homage to the Electric Pickle experiment at Western Research Lab.^[5]

Improved models

The first few generations of the Alpha chips were some of the most innovative of their time. The first version, the Alpha 21064 or EV4, was the first CMOS microprocessor whose operating frequency rivalled higher-powered ECL minicomputers and mainframes. The second, 21164 or EV5, was the first microprocessor to place a large secondary cache on chip. The third, 21264 or EV6, was the first microprocessor to combine both high operating frequency and the more complicated out-of-order execution microarchitecture. The 21364 or EV7 was the first high performance processor to have an on-chip memory controller. The unproduced 21464 or EV8 would have been the first to include simultaneous multithreading, but this version was canceled after the sale of DEC to Compaq. The Tarantula research project, which most likely would have been called EV9, would have been the first Alpha processor to feature a vector unit.^[6]

A persistent report attributed to DEC insiders suggests the choice of the AXP tag for the processor was made by DEC's legal department, which was still smarting from the VAX trademark fiasco.^{[citation needed]} After a lengthy search the tag "AXP" was found to be entirely unencumbered. Within the computer industry, a joke got started that the acronym AXP meant "Almost eXactly PRISM".

Design principles

The Alpha architecture was intended to be a high-performance design. Digital intended the architecture to support a one-thousandfold increase in performance over twenty-five years. To ensure this, any architectural feature that impeded multiple instruction issue, clock rate or multiprocessing was removed. As a result, the Alpha does not have:

Branch delay slots
Suppressed instructions
Byte load or store instructions (later added with the Byte Word Extensions (BWX))

Condition codes

The Alpha does not have condition codes for integer instructions, which removes a potential bottleneck at the condition status register. Instructions resulting in an overflow, such as adding two numbers whose result does not fit in 64 bits, write the 32 or 64 least significant bits to the destination register. The carry is generated by performing an unsigned compare of the result with either operand to see whether the result is smaller than that operand. If the test is true, the value one is written to the least significant bit of the destination register to indicate the condition.
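The carry recovery described above can be sketched in a few lines. This is a minimal illustration in Python, not Alpha code: the function names `add64` and `carry_out` are hypothetical, and the 64-bit mask stands in for the width of an integer register.

```python
MASK64 = (1 << 64) - 1  # models the width of a 64-bit integer register


def add64(a: int, b: int) -> int:
    """Return the low 64 bits of a + b; the overflow is silently dropped."""
    return (a + b) & MASK64


def carry_out(a: int, b: int) -> int:
    """Recover the carry with an unsigned compare: 1 if the sum wrapped, else 0."""
    total = add64(a, b)
    # A wrapped sum is always smaller than either operand.
    return 1 if total < a else 0


print(carry_out(MASK64, 1))  # the sum wraps around to 0
print(carry_out(2, 3))       # the sum fits in 64 bits
```

The same compare works against either operand, because a wrapped sum is smaller than both.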

Registers

The architecture defined a set of 32 integer registers and a set of 32 floating-point registers in addition to a program counter, two lock registers and a floating-point control register (FPCR). It also defined registers that were optional, implemented only if the implementation required them. Lastly, registers for PALcode were defined.

The integer registers were denoted by R0 to R31 and floating-point registers were denoted by F0 to F31. The R31 and F31 registers were hardwired to zero, and writes to those registers by instructions were ignored. Digital considered using a combined register file, but a split register file was determined to be better, as it enabled two-chip implementations to have a register file located on each chip and integer-only implementations to omit the floating-point register file. A split register file was also determined to be more suitable for multiple instruction issue due to the reduced number of read and write ports. The number of registers per register file was also considered, with 32 and 64 being contenders. Digital concluded that 32 registers was more suitable as it required less die space, which improved clock frequencies. This number of registers was deemed not to be a major issue with respect to performance and future growth, as thirty-two registers could support at least eight-way instruction issue.
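The behaviour of the hardwired zero register can be modelled with a small sketch. The class name `IntegerRegisterFile` and its methods are hypothetical and only illustrate the rule that R31 reads as zero and ignores writes.

```python
MASK64 = (1 << 64) - 1  # register width


class IntegerRegisterFile:
    """Hypothetical model of R0 to R31, where R31 is hardwired to zero."""

    ZERO_REGISTER = 31

    def __init__(self) -> None:
        self._values = [0] * 32

    def read(self, number: int) -> int:
        # R31 always reads as zero, whatever was written to it.
        return 0 if number == self.ZERO_REGISTER else self._values[number]

    def write(self, number: int, value: int) -> None:
        # Writes to R31 are ignored; other registers keep the low 64 bits.
        if number != self.ZERO_REGISTER:
            self._values[number] = value & MASK64


registers = IntegerRegisterFile()
registers.write(1, 42)
registers.write(31, 42)
print(registers.read(1), registers.read(31))
```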

The program counter is a 64-bit register which contains a longword-aligned virtual byte address, that is, the low two bits of the program counter are always zero. The PC is incremented by four to the address of the next instruction when an instruction is decoded. A lock flag and locked physical address register are used by the load-locked and store-conditional instructions for multiprocessor support. The floating-point control register (FPCR) is a 64-bit register defined by the architecture intended for use by Alpha implementations with IEEE 754-compliant floating-point hardware.

Data types

In the Alpha architecture, a byte was defined as an 8-bit datum (octet), a word as a 16-bit datum, a longword as a 32-bit datum, a quadword as a 64-bit datum, and an octaword as a 128-bit datum.

The Alpha architecture originally defined six data types. The first four were:

Quadword (64-bit) integer
Longword (32-bit) integer
IEEE T-floating-point (double precision, 64-bit)
IEEE S-floating-point (single precision, 32-bit)

To maintain a level of compatibility with the VAX, the 32-bit architecture that preceded the Alpha, two other floating-point data types were included:

VAX G-floating point (double precision, 64-bit)
VAX F-floating point (single precision, 32-bit)

The Alpha had some provision for future expansion of the instruction set to include 128-bit data types.

Memory

The Alpha has a 64-bit linear virtual address space with no memory segmentation. Implementations can implement a smaller virtual address space with a minimum size of 43 bits. Although the unused bits were not implemented in hardware such as TLBs, the architecture required implementations to check whether they are zero to ensure software compatibility with implementations with a larger (or full) virtual address space.

Instruction formats

The Alpha ISA has a fixed instruction length of 32 bits. It has six instruction formats.

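Type	Fields (bit positions, most significant first)
Integer operate	Opcode (31–26), Ra (25–21), Rb (20–16), Unused (15–13), 0 (12), Function (11–5), Rc (4–0)
Integer operate, literal	Opcode (31–26), Ra (25–21), Literal (20–13), 1 (12), Function (11–5), Rc (4–0)
Floating-point operate	Opcode (31–26), Ra (25–21), Rb (20–16), Function (15–5), Rc (4–0)
Memory format	Opcode (31–26), Ra (25–21), Rb (20–16), Displacement (15–0)
Branch format	Opcode (31–26), Ra (25–21), Displacement (20–0)
CALL_PAL format	Opcode (31–26), Function (25–0)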

The integer operate format is used by integer instructions. It contains a 6-bit opcode field, followed by the Ra field, which specifies the register containing the first operand, and the Rb field, which specifies the register containing the second operand. Next is a 3-bit field which is unused and reserved. A 1-bit field contains a "0", which distinguishes this format from the integer literal format. A 7-bit function field follows, which is used in conjunction with the opcode to specify an operation. The last field is the Rc field, which specifies the register to which the result of a computation should be written. The register fields are all 5 bits long, as required to address 32 unique locations, the 32 integer registers.

The integer literal format is used by integer instructions which use a literal as one of the operands. The format is the same as the integer operate format except for the replacement of the 5-bit Rb field and the 3 bits of unused space with an 8-bit literal field which is zero-extended to a 64-bit operand.

The floating-point operate format is used by floating-point instructions. It is similar to the integer operate format, but has an 11-bit function field made possible by using the literal and unused bits which are reserved in integer operate format.

The memory format is used mostly by load and store instructions. It has a 6-bit opcode field, a 5-bit Ra field, a 5-bit Rb field and a 16-bit displacement field.

Branch instructions have a 6-bit opcode field, a 5-bit Ra field and a 21-bit displacement field. The Ra field specifies a register to be tested by a conditional branch instruction, and if the condition is met, the program counter is updated by adding the contents of the displacement field to the program counter. The displacement field contains a signed integer: if the branch is taken, a positive value moves the program counter forward and a negative value moves it backward. The range of a branch is thus ±1 Mi instructions, or ±4 MiB. The Alpha architecture was designed with a large range as part of its forward-looking goal.

The CALL_PAL format is used by the CALL_PAL instruction, which is used to call PALcode subroutines. The format retains the opcode field but replaces the others with a 26-bit function field, which contains an integer specifying a PAL subroutine.
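The field layouts described in this section can be expressed as a short decoding sketch. The helper names (`field`, `decode_integer_operate` and so on) are hypothetical, and the field values in the sample words are arbitrary numbers chosen for illustration rather than real instruction encodings.

```python
def field(word: int, high: int, low: int) -> int:
    """Extract bits high..low (inclusive) from a 32-bit instruction word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_integer_operate(word: int) -> dict:
    """Decode either integer operate format; bit 12 selects the literal form."""
    decoded = {
        "opcode": field(word, 31, 26),
        "ra": field(word, 25, 21),
        "function": field(word, 11, 5),
        "rc": field(word, 4, 0),
    }
    if field(word, 12, 12):
        decoded["literal"] = field(word, 20, 13)  # 8-bit literal, zero-extended
    else:
        decoded["rb"] = field(word, 20, 16)
    return decoded


def decode_memory(word: int) -> dict:
    """Memory format: opcode, Ra, Rb and a 16-bit displacement."""
    return {"opcode": field(word, 31, 26), "ra": field(word, 25, 21),
            "rb": field(word, 20, 16), "displacement": field(word, 15, 0)}


def decode_branch(word: int) -> dict:
    """Branch format: opcode, Ra and a 21-bit displacement (raw, not sign-extended)."""
    return {"opcode": field(word, 31, 26), "ra": field(word, 25, 21),
            "displacement": field(word, 20, 0)}


def decode_call_pal(word: int) -> dict:
    """CALL_PAL format: opcode and a 26-bit function field."""
    return {"opcode": field(word, 31, 26), "function": field(word, 25, 0)}


# Arbitrary illustrative field values, not taken from the text above.
register_form = (0x10 << 26) | (1 << 21) | (2 << 16) | (0x20 << 5) | 3
literal_form = (0x10 << 26) | (1 << 21) | (200 << 13) | (1 << 12) | (0x20 << 5) | 3
print(decode_integer_operate(register_form))
print(decode_integer_operate(literal_form))
```

Because every format keeps the opcode in bits 31 to 26, a decoder can read the opcode first and then pick the layout for the remaining 26 bits.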

Instruction set

Control instructions

The control instructions consist of conditional and unconditional branches, and jumps. The conditional and unconditional branch instructions use the branch instruction format, while the jump instructions use the memory instruction format.

Conditional branches test whether the least significant bit of a register is set or clear, or compare a register as a signed quadword to zero, and branch if the specified condition is true. The conditions available for comparing a register to zero are equality, inequality, less than, less than or equal to, greater than or equal to, and greater than. The new address is computed by longword aligning and sign extending the 21-bit displacement and adding it to the address of the instruction following the conditional branch.
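The target address computation can be sketched as follows. The names `sign_extend` and `branch_target` are hypothetical; the sketch assumes the displacement counts instructions, so longword alignment is a multiplication by four.

```python
MASK64 = (1 << 64) - 1  # the program counter is a 64-bit register


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement integer."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(branch_address: int, displacement: int) -> int:
    """Target = address of the following instruction + 4 * signed 21-bit displacement."""
    next_instruction = branch_address + 4
    offset = sign_extend(displacement & 0x1FFFFF, 21) * 4  # longword-aligned
    return (next_instruction + offset) & MASK64


print(hex(branch_target(0x1000, 3)))         # three instructions past the next one
print(hex(branch_target(0x1000, 0x1FFFFF)))  # displacement of -1 branches to itself
```

The extreme displacements, 2^20 - 1 and -2^20 instructions, give the ±1 Mi instruction (±4 MiB) range stated for the branch format.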

Unconditional branches update the program counter with a new address computed in the same way as conditional branches. They also save the address of the instruction following the unconditional branch to a register. There are two such instructions, and they differ only in the hints provided for the branch prediction hardware.

There are four jump instructions. These all perform the same operation, saving the address of the instruction following the jump, and providing the program counter with a new address from a register. They differ in the hints provided to the branch prediction hardware. The unused displacement field is used for this purpose.

Integer arithmetic

The integer arithmetic instructions perform addition, multiplication, and subtraction on longwords and quadwords, and comparison on quadwords. There are no instructions for division, as the architects considered the implementation of division in hardware to be adverse to simplicity. In addition to the standard add and subtract instructions, there are scaled versions. These versions shift the first operand to the left by two or three bits before adding or subtracting. The Multiply Longword and Multiply Quadword instructions write the least significant 32 or 64 bits of a 64- or 128-bit result to the destination register, respectively. Since it is useful to obtain the most significant half, the Unsigned Multiply Quadword High (UMULH) instruction is provided. UMULH is used for implementing multi-precision arithmetic and division algorithms. The concept of a separate instruction for multiplication that returns the most significant half of a result was taken from PRISM.
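The split between the low and high halves of a quadword product can be illustrated with Python's arbitrary-precision integers. The function names `multiply_quadword` and `unsigned_multiply_high` are hypothetical stand-ins for the two operations described above.

```python
MASK64 = (1 << 64) - 1  # register width


def multiply_quadword(a: int, b: int) -> int:
    """Low 64 bits of the 128-bit product, as Multiply Quadword writes them."""
    return (a * b) & MASK64


def unsigned_multiply_high(a: int, b: int) -> int:
    """High 64 bits of the unsigned 128-bit product, as UMULH returns them."""
    return ((a & MASK64) * (b & MASK64)) >> 64


a = b = MASK64  # the largest unsigned quadword
low = multiply_quadword(a, b)
high = unsigned_multiply_high(a, b)
# Together the two halves reconstruct the full 128-bit product.
print(hex(high), hex(low), ((high << 64) | low) == a * b)
```

Multi-precision routines use the pair in this way: the low half is one limb of the result, and the high half is carried into the next limb.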

The instructions that operate on longwords ignore the most significant half of the register, and the 32-bit result is sign-extended before it is written to the destination register. By default, the add, multiply, and subtract instructions do not trap on overflow. When such functionality is required, versions of these instructions that detect overflow and trap are provided; UMULH and the scaled versions of add and subtract are the exceptions and have no trapping versions.
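The longword behaviour can be sketched as follows, modelling the non-trapping form only. The function name `add_longword` is hypothetical, and register contents are represented as unsigned 64-bit bit patterns.

```python
def add_longword(a: int, b: int) -> int:
    """Add the low 32 bits of a and b and sign-extend the 32-bit sum to 64 bits."""
    low32 = (a + b) & 0xFFFFFFFF  # the upper halves of the operands do not matter
    if low32 & 0x80000000:        # bit 31 set: replicate it into bits 63..32
        low32 |= 0xFFFFFFFF00000000
    return low32


print(hex(add_longword(0x7FFFFFFF, 1)))          # wraps to the most negative longword
print(hex(add_longword(0xDEADBEEF00000001, 2)))  # upper half of the operand is ignored
```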

The compare instructions compare two registers or a register and a literal and write '1' to the destination register if the specified condition is true or '0' if not. The conditions are equality, inequality, less than or equal to, and less than. With the exception of the instructions that specify the former two conditions, there are versions that perform signed and unsigned compares.
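The difference between the signed and unsigned versions shows up when the most significant bit is set. The sketch below uses the hypothetical names `as_signed` and `compare_less_than`; like the instructions described above, the compare returns 1 or 0 rather than setting a flag.

```python
MASK64 = (1 << 64) - 1  # register width


def as_signed(value: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed quadword."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def compare_less_than(a: int, b: int, signed: bool = True) -> int:
    """Write-style result: 1 if a < b under the chosen interpretation, else 0."""
    if signed:
        return int(as_signed(a) < as_signed(b))
    return int((a & MASK64) < (b & MASK64))


all_ones = MASK64  # -1 when signed, the largest value when unsigned
print(compare_less_than(all_ones, 1, signed=True))
print(compare_less_than(all_ones, 1, signed=False))
```

A result register holding 0 or 1 can then be tested directly by a conditional branch or conditional move.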

The integer arithmetic instructions use the integer operate instruction formats.

Logical and shift

The logical instructions consist of those for performing bitwise logical operations and conditional moves on the integer registers. The bitwise logical instructions perform AND, NAND, NOR, OR, XNOR, and XOR between two registers or a register and literal. The conditional move instructions test a register as a signed quadword to zero and move if the specified condition is true. The specified conditions are equality, inequality, less than or equal to, less than, greater than or equal to, and greater than. The shift instructions perform arithmetic right shift, and logical left and right shifts. The shift amount is given by a register or literal. Logical and shift instructions use the integer operate instruction formats.
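A brief sketch contrasts the two right shifts and shows one conditional move. All function names are hypothetical, and the sketch assumes that only the low six bits of the shift amount are used, which is not stated in the text above.

```python
MASK64 = (1 << 64) - 1  # register width


def as_signed(value: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed quadword."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def shift_right_logical(value: int, amount: int) -> int:
    """Shift right, filling vacated bits with zeros."""
    return (value & MASK64) >> (amount & 63)  # assumption: amount taken modulo 64


def shift_right_arithmetic(value: int, amount: int) -> int:
    """Shift right, filling vacated bits with copies of the sign bit."""
    return (as_signed(value) >> (amount & 63)) & MASK64


def move_if_equal_to_zero(test: int, source: int, destination: int) -> int:
    """Conditional move: return source if test is zero, else the old destination."""
    return source if (test & MASK64) == 0 else destination


top_bit = 1 << 63
print(hex(shift_right_logical(top_bit, 63)))
print(hex(shift_right_arithmetic(top_bit, 63)))
print(move_if_equal_to_zero(0, 7, 9), move_if_equal_to_zero(1, 7, 9))
```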

Extensions

Byte-Word Extensions (BWX)

Later, the Alpha included byte-word extensions, a set of instructions to manipulate 8-bit and 16-bit data types. These instructions were first introduced in the 21164A (EV56) microprocessor and are present in all subsequent implementations. These instructions performed operations that previously required multiple instructions to implement, which improved code density and the performance of certain applications. BWX also made the emulation of x86 machine code and the writing of device drivers easier.^[7]

Mnemonic	Instruction
`LDBU`	Load Zero-Extended Byte from Memory to Register
`LDWU`	Load Zero-Extended Word from Memory to Register
`SEXTB`	Sign Extend Byte
`SEXTW`	Sign Extend Word
`STB`	Store Byte from Register to Memory
`STW`	Store Word from Register to Memory
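The multi-instruction sequence that a byte load replaces can be sketched as follows. This is an illustration under stated assumptions, not the exact pre-BWX instruction sequence: memory is modelled as a little-endian Python `bytes` object, and both function names are hypothetical.

```python
def load_byte_without_bwx(memory: bytes, address: int) -> int:
    """Fetch the enclosing aligned quadword, then shift and mask out one byte."""
    aligned = address & ~7                               # step 1: align down to 8 bytes
    quadword = int.from_bytes(memory[aligned:aligned + 8], "little")  # step 2: load
    shift = (address & 7) * 8                            # step 3: byte position in bits
    return (quadword >> shift) & 0xFF                    # step 4: extract, zero-extended


def load_byte_with_bwx(memory: bytes, address: int) -> int:
    """What a single zero-extending byte load such as LDBU delivers."""
    return memory[address]


memory = bytes(range(0x10, 0x20))  # 16 bytes of sample data
print(all(load_byte_without_bwx(memory, a) == load_byte_with_bwx(memory, a)
          for a in range(len(memory))))
```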

Motion Video Instructions (MVI)

Motion Video Instructions (MVI) was an instruction set extension to the Alpha ISA that added instructions for single instruction, multiple data (SIMD) operations.^[8] Alpha implementations that implement MVI, in chronological order, are the Alpha 21164PC (PCA56 and PCA57), Alpha 21264 (EV6) and Alpha 21364 (EV7). Unlike most other SIMD instruction sets of the same period, such as MIPS' MDMX or SPARC's Visual Instruction Set, but like PA-RISC's Multimedia Acceleration eXtensions, MVI was a simple instruction set composed of a few instructions that operate on integer data types stored in existing integer registers.

MVI's simplicity was due to two reasons. Firstly, Digital had determined that the Alpha 21164 was already capable of performing DVD decoding through software, therefore not requiring hardware provisions for the purpose, but was inefficient in MPEG-2 encoding. The second reason was the requirement to retain the fast cycle times of implementations. Adding many instructions would have complicated and enlarged the instruction decode logic, reducing an implementation's clock frequency.

MVI consisted of 13 instructions:

Mnemonic	Instruction
`MAXSB8`	Vector Signed Byte Maximum
`MAXSW4`	Vector Signed Word Maximum
`MAXUB8`	Vector Unsigned Byte Maximum
`MAXUW4`	Vector Unsigned Word Maximum
`MINSB8`	Vector Signed Byte Minimum
`MINSW4`	Vector Signed Word Minimum
`MINUB8`	Vector Unsigned Byte Minimum
`MINUW4`	Vector Unsigned Word Minimum
`PERR`	Pixel Error
`PKLB`	Pack Longwords to Bytes
`PKWB`	Pack Words to Bytes
`UNPKBL`	Unpack Bytes to Longwords
`UNPKBW`	Unpack Bytes to Words
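Two of these instructions can be sketched on a 64-bit integer treated as eight packed bytes. The sketch assumes that Pixel Error sums the absolute differences of the eight unsigned byte pairs, which the table above does not spell out; the Python function names are hypothetical.

```python
def split_bytes(quadword: int) -> list:
    """Split a 64-bit value into eight unsigned bytes, least significant first."""
    return [(quadword >> (8 * i)) & 0xFF for i in range(8)]


def join_bytes(values: list) -> int:
    """Pack eight unsigned bytes back into a 64-bit value."""
    return sum(value << (8 * i) for i, value in enumerate(values))


def max_unsigned_bytes(a: int, b: int) -> int:
    """Lane-wise unsigned maximum, in the spirit of MAXUB8."""
    return join_bytes([max(x, y) for x, y in zip(split_bytes(a), split_bytes(b))])


def pixel_error(a: int, b: int) -> int:
    """Sum of absolute byte differences (assumed PERR behaviour)."""
    return sum(abs(x - y) for x, y in zip(split_bytes(a), split_bytes(b)))


print(hex(max_unsigned_bytes(0x01FF, 0x0280)))
print(pixel_error(0x0105, 0x0302))
```

Because the operands live in ordinary integer registers, no separate vector register file is needed, which matches the simplicity argument made above.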

Floating-point Extensions (FIX)

Floating-point extensions (FIX) was an extension to the Alpha architecture. It introduced nine instructions for floating-point square root and for transferring data between the integer registers and floating-point registers. The Alpha 21264 (EV6) was the first microprocessor to implement these instructions.

Mnemonic	Instruction
`FTOIS`	Floating-point to Integer Register Move, S_floating
`FTOIT`	Floating-point to Integer Register Move, T_floating
`ITOFF`	Integer to Floating-point Register Move, F_floating
`ITOFS`	Integer to Floating-point Register Move, S_floating
`ITOFT`	Integer to Floating-point Register Move, T_floating
`SQRTF`	Square root F_floating
`SQRTG`	Square root G_floating
`SQRTS`	Square root S_floating
`SQRTT`	Square root T_floating
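The T_floating register moves can be pictured as reinterpreting the same 64 bits. The sketch below models only the IEEE double-precision case using Python's `struct` module; the function names are hypothetical, and the S_floating and VAX formats are deliberately left out.

```python
import math
import struct


def float_to_integer_bits(value: float) -> int:
    """Copy the 64-bit pattern of a double into an integer, unchanged."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def integer_bits_to_float(bits: int) -> float:
    """Copy a 64-bit integer pattern back into a double, unchanged."""
    return struct.unpack("<d", struct.pack("<Q", bits & ((1 << 64) - 1)))[0]


bits = float_to_integer_bits(1.0)
print(hex(bits))                    # sign 0, biased exponent 0x3FF, fraction 0
print(integer_bits_to_float(bits))  # round trip restores the value
print(math.sqrt(2.0))               # the operation a T_floating square root performs
```

A move of this kind converts nothing: the bit pattern is transferred as is, so any numeric conversion remains a separate step.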

Count Extensions (CIX)

Count Extensions (CIX) was an extension to the architecture which introduced three instructions for counting bits. These instructions were categorized as integer arithmetic instructions. They were first implemented on the Alpha 21264A (EV67).

Mnemonic	Instruction
`CTLZ`	Count Leading Zero
`CTPOP`	Count Population
`CTTZ`	Count Trailing Zero
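The three counts can be sketched on a 64-bit value as follows. The function names are hypothetical, and returning 64 for a zero input in the leading and trailing counts is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1  # register width


def count_population(value: int) -> int:
    """Number of set bits in the 64-bit value."""
    return bin(value & MASK64).count("1")


def count_leading_zeros(value: int) -> int:
    """Zero bits above the most significant set bit (64 if value is zero)."""
    return 64 - (value & MASK64).bit_length()


def count_trailing_zeros(value: int) -> int:
    """Zero bits below the least significant set bit (64 if value is zero)."""
    value &= MASK64
    if value == 0:
        return 64
    return (value & -value).bit_length() - 1  # isolate the lowest set bit


print(count_population(0xF0), count_leading_zeros(1), count_trailing_zeros(8))
```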

Implementations

At the time of its announcement, Alpha was heralded as an architecture for the next 25 years. While this was not to be, Alpha nevertheless had a reasonably long life. The first version, the Alpha 21064 (otherwise known as the EV4), was introduced in November 1992 running at up to 192 MHz; a slight shrink of the die (the EV4S, shrunk from 0.75 µm to 0.675 µm) ran at 200 MHz a few months later. The 64-bit processor was a superpipelined and superscalar design, like other RISC designs, but nevertheless outperformed them all, and DEC touted it as the world's fastest processor. Careful attention to circuit design, a hallmark of the Hudson design team, including a huge centralized clock circuit, allowed them to run the CPU at higher speeds, even though the microarchitecture was fairly similar to other RISC chips. In comparison, the less expensive Intel Pentium ran at 66 MHz when it was launched the following spring.

The Alpha 21164 or EV5 became available in 1995 at processor frequencies of up to 333 MHz. In July 1996 the line was speed bumped to 500 MHz, in March 1998 to 666 MHz. Also in 1998 the Alpha 21264 (EV6) was released at 450 MHz, eventually reaching (in 2001 with the 21264C/EV68CB) 1.25 GHz. In 2003, the Alpha 21364 or EV7 Marvel was launched, essentially an EV68 core with four 1.6 GB/s^[9] inter-processor communication links for improved multiprocessor system performance, running at 1 or 1.15 GHz.

In 1996, the production of Alpha chips was licensed to Samsung Electronics Company. Following the purchase of Digital by Compaq the majority of the Alpha products were placed with API NetWorks, Inc. (previously Alpha Processor Inc.), a private company funded by Samsung and Compaq. In October 2001, Microway became the exclusive sales and service provider of API NetWorks' Alpha-based product line.

On June 25, 2001, Compaq announced that Alpha would be phased out by 2004 in favor of Intel's Itanium, canceled the planned EV8 chip, and sold all Alpha intellectual property to Intel.^[10] HP, new owner of Compaq later the same year, announced that development of the Alpha series would continue for a few more years, including the release of a 1.3 GHz EV7 variant called the EV7z. This would be the final iteration of Alpha, the 0.13 µm EV79 also being canceled.

Alpha was also implemented in the Piranha, a research prototype developed by Compaq's Corporate Research and Nonstop Hardware Development groups at the Western Research Laboratory and Systems Research Center. Piranha was a multicore design for transaction processing workloads that contained eight simple cores. It was described at the 27th Annual International Symposium on Computer Architecture in June 2000.^[11]

Model history

Model	Model number	Year	Frequency [MHz]	Process [µm]	Transistors [millions]	Die size [mm²]	IO Pins	Power [W]	Voltage	Dcache [KB]^[12]	Icache [KB]	Scache	Bcache	ISA
EV4	21064	1992	100–200	0.75	1.68	234	290	30	3.3	8	8	–	128 KB–16 MB
EV4S	21064	1993	100–200	0.675	1.68	186	290	27	3.3	8	8	–	128 KB–16 MB
EV45	21064A	1994	200–300	0.5	2.85	164		33	3.3	16	16	–	256 KB–16 MB
LCA4	21066	1993	100–166	0.675	1.75	209		21	3.3	8	8	–
LCA4	21068	1994	66	0.675	1.75	209		9	3.3	8	8	–
LCA45	21066A	1994	100–266	0.5	1.8	161		23	3.3	8	8	–
LCA45	21068A	1994	100	0.5	1.8	161			3.3	8	8	–
EV5	21164	1995	266–500	0.5	9.3	299	296	56	3.3/2.5	8	8	96 KB	Up to 64 MB	R
EV56	21164A	1996	366–666^[1]	0.35	9.66^[1]	209		31–55^[1]	3.3/2.5^[1]	8	8	96 KB	Up to 64 MB	R,B
PCA56	21164PC	1997	400–533	0.35	3.5	141	264	26–35	3.3/2.5	8	16	–	512 KB–4 MB	R,B,M
PCA57	21164PC		600–666	0.28	5.7	101	283	18–23	2.5/2.0	16	32^[1]	–	512 KB–4 MB	R,B,M
EV6	21264	1998	450–600	0.35	15.2	314	389	73	2.0	64	64	–	2–8 MB	R,B,M,F
EV67	21264A	1999	600–750	0.25	15.2	210	389		2.0	64	64	–	2–8 MB	R,B,M,F,C
EV68AL	21264B	2001	800–833	0.18	15.2	125			1.7	64	64	–	2–8 MB	R,B,M,F,C,T
EV68CB	21264C	2001	1000–1250	0.18	15.2	125		65–75	1.65	64	64	–	2–8 MB	R,B,M,F,C,T
EV68CX	21264D								1.65	64	64	–	2–8 MB	R,B,M,F,C,T
EV7	21364	2003	1000–1150	0.18	130	397		125	1.5	64	64	1.75 MB	–	R,B,M,F,C,T
EV7z	21364	2004	1300	0.18	130	397		125	1.5	64	64	1.75 MB	–	R,B,M,F,C,T
Cancelled
EV78/EV79	21364A	Slated for 2004	1700	0.13	152	300		120	1.2	64	64	1.75 MB	–	R,B,M,F,C,T
EV8	21464	Slated for 2003	1200–2000	0.125	250	420	1800	??	1.2	64	64	3 MB	–	R,B,M,F,C,T

ISA extensions

R – Hardware support for rounding to infinity and negative infinity.^[13]
B – BWX, the "Byte/Word Extension", adding instructions to allow 8- and 16-bit operations from memory and I/O
M – MVI, "multimedia" instructions
F – FIX, instructions to move data between integer and floating point registers and for square root
C – CIX, instructions for counting and finding bits
T – support for prefetch with modify intent to improve the performance of the first attempt to acquire a lock

Performance

To illustrate the comparative performance of Alpha-based systems, some SPEC performance numbers (SPECint95, SPECfp95) are listed below. Note that SPEC results report the measured performance of a whole computer system (CPU, bus, memory, compiler optimizer), not just the CPU. Also note that the benchmark and scale changed from 1992 to 1995. Nevertheless, the figures give a rough impression of the performance of the 64-bit Alpha architecture compared with the contemporary HP (64-bit) and Intel-based (32-bit) offerings. Perhaps the most obvious trend is that while Intel could always get reasonably close to Alpha in integer performance, the difference in floating-point performance was considerable. On the other hand, HP (PA-RISC) was also reasonably close to Alpha, but its CPUs ran at significantly lower clock rates (MHz). The tables lack two important values: the power consumption and the price of a CPU.

SPEC benchmark 1995 performance comparison (using *SPECint95* and *SPECfp95* results [1])
System	CPU	MHz	integer	floating point
AlphaServer 8400 5/350	21164 (EV5)	350	10.1	14.2
Intel Alder System (200 MHz, 256 KB L2)	Pentium Pro	200	8.9	6.75
HP 9000 C160	PA 8000	160	10.4	16.3

2000 performance comparison (using *SPECint95* and *SPECfp95* results)
System	CPU	MHz	integer	floating point
AlphaServer ES40 6/833	21264 (EV6)	833	50.0	100.0
Intel VC820 motherboard	Pentium III	1000	46.8	31.9
HP 9000 C3600	PA-8600	552	42.1	64.0
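The clock-rate point can be made concrete by normalising the 1995 figures from the first table to a common clock. The per-100 MHz metric and the function name `per_100_mhz` are illustrative choices only; SPEC defines no such figure, and the whole-system caveat above still applies.

```python
# (CPU, clock in MHz, SPECint95, SPECfp95), copied from the 1995 table above
results_1995 = [
    ("21164 (EV5)", 350, 10.1, 14.2),
    ("Pentium Pro", 200, 8.9, 6.75),
    ("PA 8000", 160, 10.4, 16.3),
]


def per_100_mhz(score: float, mhz: float) -> float:
    """Benchmark score scaled to a notional 100 MHz clock."""
    return 100 * score / mhz


for cpu, mhz, spec_int, spec_fp in results_1995:
    print(f"{cpu}: int {per_100_mhz(spec_int, mhz):.2f}, "
          f"fp {per_100_mhz(spec_fp, mhz):.2f} per 100 MHz")
```

On this view the Alpha reached its scores mainly through clock frequency, which is consistent with the emphasis on circuit design earlier in the article.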

Alpha-based systems

The first generation of DEC Alpha-based systems comprised the DEC 3000 AXP series workstations and low-end servers, DEC 4000 AXP series mid-range servers, and DEC 7000 AXP and 10000 AXP series high-end servers. The DEC 3000 AXP systems used the same TURBOchannel bus as the previous MIPS-based DECstation models, whereas the 4000 was based on FutureBus+ and the 7000/10000 shared an architecture with corresponding VAX models.

DEC also produced a PC-like Alpha workstation with an EISA bus, the DECpc AXP 150 (codename "Jensen", also known as the DEC 2000 AXP). This was the first Alpha system to support Windows NT. DEC later produced Alpha versions of their Celebris XL and Digital Personal Workstation PC lines, with 21164 processors.

Digital also produced single board computers based on the VMEbus for embedded and industrial use. The first generation included the 21068-based AXPvme 64 and AXPvme 64LC, and the 21066-based AXPvme 160. These were introduced on March 1, 1994. Later models such as the AXPvme 100, AXPvme 166 and AXPvme 230 were based on the 21066A processor, while the Alpha VME 4/224 and Alpha VME 4/288 were based on the 21064A processor. The last models, the Alpha VME 5/352 and Alpha VME 5/480, were based on the 21164 processor.

The 21066 chip was used in the DEC Multia VX40/41/42 compact workstation and the ALPHAbook 1 laptop from Tadpole Technology.

In 1994, DEC launched a new range of AlphaStation and AlphaServer systems. These used 21064 or 21164 processors and introduced the PCI bus, VGA-compatible frame buffers and PS/2-style keyboards and mice. The AlphaServer 8000 series superseded the DEC 7000/10000 AXP and also employed XMI and FutureBus+ buses.

The AlphaStation XP1000 was the first workstation based on the 21264 processor. Later AlphaServer/Station models based on the 21264 were categorised into DS (departmental server), ES (enterprise server) or GS (global server) families.

The final 21364 chip was used in the AlphaServer ES47, ES80 and GS1280 models and the AlphaStation ES47.

A number of OEM motherboards were produced by DEC, such as the 21066 and 21068-based AXPpci 33 "NoName", which was part of a major push into the OEM market by the company,^[14] the 21164-based AlphaPC 164 and AlphaPC 164LX, the 21164PC-based AlphaPC 164SX and AlphaPC 164RX and the 21264-based AlphaPC 264DP. Several third parties such as Samsung and API also produced OEM motherboards such as the API UP1000 and UP2000.

To assist third parties in developing hardware and software for the platform, DEC produced Evaluation Boards, such as the EB64+ and EB164 for the Alpha 21064A and 21164 microprocessors respectively.

The 21164 and 21264 processors were used by NetApp in various Network Attached Storage systems, while the 21064 and 21164 processors were used by Cray in their T3D and T3E massively parallel supercomputers.

Supercomputers

The fastest supercomputers based on Alpha processors:

Sunway TaihuLight at Chinese National Supercomputing Center in Wuxi. Machine: Sunway TaihuLight MPP. CPU: 40,960 SW26010 (256+4 cores/CPU, 1.45 GHz). Rmax=93PFlops, Rpeak=125PFlops.
Sunway BlueLight at Chinese National Supercomputing Center in Jinan. Machine: Sunway BlueLight MPP. CPU: 8575 SW1600 (16 cores/CPU, 21164A EV-56, 975 MHz). Rmax=795.9TFlops, Rpeak=1070.2TFlops.^[15]

ASCI Q at Los Alamos National Laboratory. Machine: HP AlphaServer SC45/GS Cluster. CPU: 4096 Alpha (21264 EV-68, 1.25 GHz). Rmax: 7.727 Teraflops.^[16]

References

[1] Paul V. Bolotoff (21 April 2007). "Alpha: The History in Facts and Comments". Retrieved 2008-11-22.
[2] Aaron Sakovich (2001). "Windows 2000?". The AlphaNT Source. Retrieved 2007-01-01.
[3] "SUSE Linux 7.0 Alpha Edition". SUSE. 2000. Retrieved 2014-01-08.
[4] "Transforming your AlphaServer environment". HP. Retrieved 2007-01-11.
[5] Bill Hamburgen; Jeff Mogul; Brian Reid; Alan Eustace; Richard Swan; Mary Jo Doherty; Joel Bartlett (1989). "WRL Technical Note TN-13: Characterization of Organic Illumination Systems" (PDF). Digital Equipment Corporation. Retrieved 2007-10-04.
[6] Roger Espasa; Federico Ardanaz; Julio Gago; Roger Gramunt; Isaac Hernandez; Toni Juan; Joel Emer; Stephen Felix; Geoff Lowney; Matthew Mattina; Andre Seznec (2002). "Tarantula: A Vector Extension to the Alpha Architecture" (PDF). In Danielle C. Martin (ed.). Proceedings: 29th Annual International Symposium on Computer Architecture (ISCA '02). Joe Daigle/Studio Productions. Los Alamitos, Calif: IEEE Computer Society. pp. 281–292. doi:10.1109/ISCA.2002.1003586. ISBN 0-7695-1605-X. Retrieved 2007-10-04.
[7] Gronowski, P. E.; Bowhill, W. J.; Donchin, D. R.; Blake-Campos, R. P.; Carlson, D. A.; Equi, E. R.; Loughlin, B. J.; Mehta, S.; Mueller, R. O.; Olesin, A.; Noorlag, D. J. W.; Preston, R. P. (1996). "A 433-MHz 64-b quad-issue RISC microprocessor". IEEE Journal of Solid-State Circuits. 31 (11): 1687–1696. doi:10.1109/JSSC.1996.542313.
[8] Gwennap, Linley (18 November 1996). "Digital, MIPS Add Multimedia Extensions". Microprocessor Report.
[9] In the context of data transfer, 1 GB is used to mean 1 billion bytes.
[10] Popovich, Ken (2001-06-28). "Alpha proved costly for Compaq". www.zdnet.com. ZDNet. Retrieved 2016-03-02.
[11] Luiz André Barroso; Kourosh Gharachorloo; Robert McNamara; Andreas Nowatzyk; Shaz Qadeer; Barton Sano; Scott Smith; Robert Stets; Ben Verghese (2000). "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing". Proceedings of the 27th Annual International Symposium on Computer Architecture.
[12] In the context of cache memory, 1 KB = 1024 bytes; 1 MB = 1024 KB.
[13] David Mosberger. "Overview of Alpha Family". Retrieved 2009-12-09.
[14] Reinhardt Krause (4 April 1994). "DEC launching Alpha board push". Electronic News.
[15] TOP500 (2011). "Sunway Blue Light - Sunway BlueLight MPP, ShenWei processor SW1600 975.00 MHz, Infiniband QDR". Retrieved 2012-09-15.
[16] Los Alamos National Laboratories (2002). "The ASCI Q System: 30 TeraOPS Capability at Los Alamos National Laboratory" (PDF). Retrieved 2010-06-06.

External links

The Alpha Architecture Handbook, Version 4
The Alpha Architecture Handbook, Version 3
Digital Technical Journal, Volume 4, Number 4, Special Issue 1992: Alpha AXP Architecture and Systems. This issue contains several articles by Alpha's architects.
Archived technical documentation library. Contains the hardware reference manuals and datasheets for Alpha microprocessors, chipsets and OEM motherboards, including the Alpha Architecture Handbook and various programming manuals.
A Conversation with Dan Dobberpuhl (October 1, 2003)
Dr. Bruce Hutton's excellent lecture notes on Computer Architecture

