Comparison of instruction set architectures
Factors
Bits
Computer architectures are often described as n-bit architectures. Today n is often 8, 16, 32, or 64, but other sizes have been used. This is a strong simplification. An architecture often has a few more or less "natural" data sizes in its instruction set, but the hardware implementation of these may be very different. Many architectures have instructions that operate on half and/or twice the width of the processor's major internal datapaths. Examples include the 8080, Z80, and MC68000, among many others. On implementations of this type, an operation that is twice as wide typically also takes around twice as many clock cycles (which is not the case on high-performance implementations). On the 68000, for instance, this means 8 instead of 4 clock ticks, and this particular chip may be described as a 32-bit architecture with a 16-bit implementation. The external data bus width is often not a useful guide to the width of the architecture; the NS32008, NS32016 and NS32032 were basically the same 32-bit chip with different external data buses. The NS32764 had a 64-bit bus, but used 32-bit registers.
The width of addresses may or may not differ from the width of data. Early 32-bit microprocessors often had a 24-bit address, as did the System/360 processors.
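To illustrate why a double-width operation tends to cost about twice as much on a narrower datapath, the following Python sketch adds two 32-bit values using only 16-bit additions. It is a simplified model under stated assumptions, not a description of how any particular chip is wired; the function name and the constant are hypothetical.

```python
MASK16 = 0xFFFF  # width of the modelled internal datapath (assumed 16 bits)


def add32_on_16bit_datapath(a: int, b: int) -> int:
    """Add two 32-bit values using two passes through a 16-bit adder."""
    # Pass 1: add the low halves; bit 16 of the sum is the carry out.
    low = (a & MASK16) + (b & MASK16)
    carry = low >> 16
    # Pass 2: add the high halves plus the carry from the low half.
    high = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry
    # Reassemble the result; a carry out of bit 31 is discarded (wraparound).
    return ((high & MASK16) << 16) | (low & MASK16)


print(hex(add32_on_16bit_datapath(0x0001FFFF, 0x00000001)))
```

The two passes correspond to the doubling of clock cycles described above; a full-width datapath would perform the same addition in a single pass.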
Operands
The number of operands an instruction can specify is one of the factors that may indicate the performance of an instruction set. A three-operand architecture will allow
A := B + C
to be computed in one instruction.
A two-operand architecture will allow
A := A + B
to be computed in one instruction, so two instructions will need to be executed to simulate a single three-operand instruction:
A := B
A := A + C
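The expansion of a three-operand operation into two-operand instructions can be sketched in Python as follows. This is an illustrative model only: the function name, the string notation, and the scratch register `T` are assumptions made for the example and do not correspond to any real assembler or compiler.

```python
from typing import List

# Operators for which the operands may be swapped freely (assumed set).
COMMUTATIVE = {"+", "*", "&", "|", "^"}


def lower_to_two_operand(dest: str, src1: str, op: str, src2: str,
                         temp: str = "T") -> List[str]:
    """Rewrite `dest := src1 op src2` using only `x := y` and `x := x op y`."""
    if dest == src2 and dest != src1:
        if op in COMMUTATIVE:
            # A := B + A can simply be treated as A := A + B.
            src1, src2 = src2, src1
        else:
            # Copying src1 into dest would destroy src2, so save it first.
            return [f"{temp} := {src2}",
                    f"{dest} := {src1}",
                    f"{dest} := {dest} {op} {temp}"]
    if dest == src1:
        # Already in two-operand form: one instruction suffices.
        return [f"{dest} := {dest} {op} {src2}"]
    # General case: copy the first source, then operate in place.
    return [f"{dest} := {src1}", f"{dest} := {dest} {op} {src2}"]


for line in lower_to_two_operand("A", "B", "+", "C"):
    print(line)
```

For `A := B + C` the sketch produces the same two-instruction sequence as shown above, while `A := A + B` stays a single instruction.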
Endianness
An architecture may use "big" or "little" endianness, or both, or be configurable to use either. Little-endian processors order bytes in memory with the least significant byte of a multi-byte value in the lowest-numbered memory location. Big-endian architectures instead order them with the most significant byte at the lowest-numbered address. The x86 architecture and several 8-bit architectures are little endian, as was ARM originally. Most RISC architectures (SPARC, Power, PowerPC, MIPS) were originally big endian, but many are now configurable, as is ARM.
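The two byte orders can be shown with a short Python sketch. The helper name `memory_layout` and the sample value are chosen for illustration; the sketch uses only the standard `int.to_bytes` and `int.from_bytes` methods.

```python
from typing import List


def memory_layout(value: int, size: int, byteorder: str) -> List[str]:
    """Return the bytes of `value` in increasing address order, as hex strings."""
    return [f"{b:02X}" for b in value.to_bytes(size, byteorder=byteorder)]


value = 0x0A0B0C0D

# Little endian: least significant byte (0D) at the lowest address.
print(memory_layout(value, 4, "little"))

# Big endian: most significant byte (0A) at the lowest address.
print(memory_layout(value, 4, "big"))

# Reading little-endian bytes as if they were big endian yields a different value.
misread = int.from_bytes(value.to_bytes(4, "little"), "big")
print(hex(misread))
```

The last step shows why data exchanged between machines of different endianness must be byte-swapped or written in an agreed order.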
Architectures
The table below compares basic information about CPU architectures.
Architecture | Bits | Version | Introduced | Max # Operands | Type | Design | Registers | Instruction encoding | Branch Evaluation | Endianness | Extensions | Open | Royalty-free |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alpha | 64 | | 1992 | 3 | Register-Register | RISC | 32 | Fixed | Condition register | Bi | Motion Video Instructions, Byte-Word Extensions, Floating-point Extensions, Count Extensions | No | Unknown |
ARM | 32 | ARMv7 and earlier | 1983 | 3 | Register-Register | RISC | 16 | Fixed (32-bit), Thumb: Fixed (16-bit), Thumb-2: Variable (16 and 32-bit) | Condition code | Bi | NEON, Jazelle, Vector Floating Point, TrustZone, LPAE | Unknown | No |
ARM | 64 | ARMv8[1] | 2011[2] | 3 | Register-Register | RISC | 30 | Fixed (32-bit), Thumb: Fixed (16-bit), Thumb-2: Variable (16 and 32-bit), A64 | Condition code | Bi | NEON, Jazelle, Vector Floating Point, TrustZone | Unknown | No |
AVR32 | 32 | Rev 2 | 2006 | 2-3 | | RISC | 15 | Variable[3] | | Big | Java Virtual Machine | Unknown | Unknown |
Blackfin | 32 | | 2000 | | | RISC[4] | 8 | | | Little[5] | | Unknown | Unknown |
DLX | 32 | | 1990 | 3 | | RISC | 32 | Fixed (32-bit) | | Big | | Unknown | Unknown |
eSi-RISC | 16/32 | | 2009 | 3 | Register-Register | RISC | 8-72 | Variable (16- or 32-bit) | Compare and branch and condition register | Bi | User-defined instructions | No | No |
Itanium (IA-64) | 64 | | 2001 | | Register-Register | EPIC | 128 | | Condition register | Bi (selectable) | Intel Virtualization Technology | Yes | Yes |
M32R | 32 | | 1997 | | | RISC | 16 | Fixed (16- or 32-bit) | | Bi | | Unknown | Unknown |
m68k | 16/32 | | 1979 | | | CISC | 16 | | | Big | | Unknown | Unknown |
Mico32 | 32 | | 2006 | 3 | Register-Register | RISC | 32[6] | Fixed (32-bit) | Compare and branch | Big | User-defined instructions | Yes[7] | Yes |
MIPS | 64 (32→64) | 5 | 1981 | 3 | Register-Register | RISC | 32 | Fixed (32-bit) | Condition register | Bi | MDMX, MIPS-3D | Unknown | No |
MMIX | 64 | | 1999 | 3 | Register-Register | RISC | 256 | Fixed (32-bit) | | Big | | Yes | Yes |
PA-RISC (HP/PA) | 64 (32→64) | 2.0 | 1986 | 3 | | RISC | 32 | Fixed | Compare and branch | Big | Multimedia Acceleration eXtensions (MAX), MAX-2 | No | Unknown |
PowerPC | 32/64 (32→64) | 2.06[8] | 1991 | 3 | Register-Register | RISC | 32 | Fixed, Variable | Condition code | Big/Bi | AltiVec, APU, VSX, Cell | Yes[9] | No |
S+core | 16/32 | | 2005 | | | RISC | | | | Little | | Unknown | Unknown |
Series 32000 | 32 | | 1982 | 5 | Memory-Memory | CISC | 8 | Variable Huffman coded, up to 23 bytes long | Condition code | Little | BitBlt instructions | Unknown | Unknown |
SPARC | 64 (32→64) | V9 | 1985 | 3 | Register-Register | RISC | 31 (of at least 55) | Fixed | Condition code | Big → Bi | VIS 1.0, 2.0, 3.0 | Yes | Yes[10] |
SuperH (SH) | 32 | | 1990s | 2 | Register-Register/Register-Memory | RISC | 16 | Fixed | Condition code (single bit) | Bi | | Unknown | Unknown |
System/360 / System/370 / z/Architecture | 64 (32→64) | 3 | 1964 | | Register-Memory/Memory-Memory | CISC | 16 | Fixed | Condition code | Big | | Unknown | Unknown |
VAX | 32 | | 1977 | 6 | Memory-Memory | CISC | 16 | Variable | Compare and branch | Little | VAX Vector Architecture | Unknown | Unknown |
x86 | 32 (16→32) | | 1978 | 2 | Register-Memory | CISC | 8 | Variable | Condition code | Little | MMX, 3DNow!, SSE, PAE | No | No |
x86-64 | 64 | | 2003 | 2 | Register-Memory | CISC | 16 | Variable | Condition code | Little | MMX, 3DNow!, PAE, AVX | No | No |
Microarchitectures
The following table compares specific microarchitectures.
Microarchitecture | Pipeline stages | Misc |
---|---|---|
AMD K5 | | Out-of-order execution, register renaming, speculative execution |
AMD K6 | | Superscalar, branch prediction |
AMD K6-III | | Branch prediction, speculative execution, out-of-order execution[11] |
AMD K7 | | Out-of-order execution, branch prediction, Harvard architecture |
AMD K8 | | 64-bit, integrated memory controller, 16-byte instruction prefetching |
AMD K10 | | Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching |
ARM7TDMI(-S) | 3 | |
ARM7EJ-S | 5 | |
ARM810 | 5 | |
ARM9TDMI | 5 | |
ARM1020E | 6 | |
XScale PXA210/PXA250 | 7 | |
ARM1136J(F)-S | 8 | |
ARM1156T2(F)-S | 9 | |
ARM Cortex-A5 | 8 | |
ARM Cortex-A8 | 13 | |
ARM Cortex-A9 | | Out-of-order, speculative issue, superscalar |
ARM Cortex-A15 | | Multicore (up to 16) |
AVR32 AP7 | 7 | |
AVR32 UC3 | 3 | Harvard architecture |
Bobcat | | Out-of-order execution |
Bulldozer | | Shared L3 cache, multithreading, multicore, integrated memory controller |
Crusoe | | In-order execution, 128-bit VLIW, integrated memory controller |
Efficeon | | In-order execution, 256-bit VLIW, fully integrated memory controller |
Cyrix Cx5x86 | 6[12] | Branch prediction |
Cyrix 6x86 | | Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution |
DLX | 5 | |
eSi-3200 | 5 | In-order, speculative issue |
eSi-3250 | 5 | In-order, speculative issue |
EV4 (Alpha 21064) | | Superscalar |
EV7 (Alpha 21364) | | Superscalar design with out-of-order execution, branch prediction, 4-way SMT, integrated memory controller |
EV8 (Alpha 21464) | | Superscalar design with out-of-order execution |
P5 (Pentium) | 5 | Superscalar |
P6 (Pentium Pro) | 14 | Speculative execution, Register renaming, superscalar design with out-of-order execution |
P6 (Pentium II) | | Branch prediction |
P6 (Pentium III) | 10 | |
Itanium | 8[13] | Speculative execution, branch prediction, register renaming, 30 execution units, multithreading |
NetBurst (Willamette) | 20 | Simultaneous multithreading |
NetBurst (Northwood) | 20 | Simultaneous multithreading |
NetBurst (Prescott) | 31 | Simultaneous multithreading |
NetBurst (Cedar Mill) | 31 | Simultaneous multithreading |
Core | 14 | |
Intel Atom | 16 | Simultaneous multithreading, in-order. No instruction reordering, speculative execution, or register renaming. |
Nehalem | | Simultaneous multithreading, integrated memory controller, L1/L2/L3 cache |
Sandy Bridge | | Simultaneous multithreading, multicore, integrated memory controller, L1/L2/L3 cache. 2 threads per core. |
Haswell | 14 | Multicore |
LatticeMico32 | 6 | Harvard architecture |
POWER1 | | Superscalar, out-of-order execution |
POWER3 | | Superscalar, out-of-order execution |
POWER4 | | Superscalar, speculative execution, out-of-order execution |
POWER5 | | Simultaneous multithreading, out-of-order execution, integrated memory controller |
POWER6 | | 2-way simultaneous multithreading, in-order execution |
POWER7 | | 4 SMT threads per core, 12 execution units per core |
PowerPC 401 | 3 | |
PowerPC 405 | 5 | |
PowerPC 440 | 7 | |
PowerPC 470 | 9 | SMP |
PowerPC A2 | 15 | |
PowerPC e300 | 4 | Superscalar, Branch prediction |
PowerPC e500 | Dual 7 stage | Multicore |
PowerPC e600 | 3-issue 7 stage | Superscalar out-of-order execution, branch prediction |
PowerPC e5500 | 4-issue 7 stage | Out-of-order, multicore |
PowerPC 603 | 4 | 5 execution units, branch prediction. No SMP. |
PowerPC 603q | 5 | In-order |
PowerPC 604 | 6 | Superscalar, out-of-order execution, 6 execution units. SMP support. |
PowerPC 620 | 5 | Out-of-order execution, SMP support. |
PWRficient | | Superscalar, out-of-order execution, 6 execution units |
R4000 | 8 | Scalar |
StrongARM SA-110 | 5 | Scalar, in-order |
SuperH SH2 | 5 | |
SuperH SH2A | 5 | Superscalar, Harvard architecture |
SPARC | | Superscalar |
HyperSPARC | | Superscalar |
SuperSPARC | | Superscalar, in-order |
SPARC64 VI/VII/VII+ | | Superscalar, out-of-order[14] |
UltraSPARC | 9 | |
UltraSPARC T1 | 6 | Open source, multithreading, multi-core, 4 threads per core, integrated memory controller |
UltraSPARC T2 | 8 | Open source, multithreading, multi-core, 8 threads per core |
SPARC T3 | 8 | Multithreading, multi-core, 8 threads per core, SMP |
SPARC T4 | 16 | Multithreading, multi-core, 8 threads per core, SMP, out-of-order |
VIA C7 | | In-order execution |
VIA Nano (Isaiah) | | Superscalar out-of-order execution, branch prediction, 7 execution units |
WinChip | 4 | In-order execution |
See also
- Central processing unit (CPU)
- CPU design
- Instruction set
- List of instruction sets
- Microprocessor
- Benchmark (computing)
References
- [1] ARMv8 Technology Preview.
- [2] "ARM goes 64-bit with new ARMv8 chip architecture". Retrieved 26 May 2012.
- [3] "AVR32 Architecture Document" (PDF). Atmel. Retrieved 2008-06-15.
- [4] "Blackfin Processor Architecture Overview". Analog Devices. Retrieved 2009-05-10.
- [5] "Blackfin memory architecture". Analog Devices. Retrieved 2009-12-18.
- [6] "LatticeMico32 Architecture". Lattice Semiconductor. Retrieved 2009-12-18.
- [7] "Open Source Licensing". Lattice Semiconductor. Retrieved 2009-12-18.
- [8] "Power ISA V2.06" (PDF). IBM. Retrieved 2009-07-04.
- [9] "New to Cell/B.E., multicore, and Power Architecture technology". http://www.ibm.com/developerworks/power/newto/#2
- [10] "SPARC Architecture License". http://www.sparc.org/specificationsDocuments.html##ArchLic
- [11] http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_1260_1288%5E1295,00.html
- [12] http://www.pcguide.com/ref/cpu/fam/g4C5x86-c.html
- [13] Intel Itanium 2 Processor Hardware Developer's Manual (2002). p. 14. http://www.intel.com/design/itanium2/manuals/25110901.pdf Retrieved November 28, 2011.
- [14] http://www.fujitsu.com/global/services/computing/server/sparcenterprise/technology/performance/processor.html