Comparison of instruction set architectures: Difference between revisions

Revision as of 12:58, 23 July 2013

Factors

Bits

Computer architectures are often described as n-bit architectures. Today n is often 8, 16, 32, or 64, but other sizes have been used. This is a strong simplification. A computer architecture often has a few more or less "natural" data sizes in its instruction set, but the hardware implementation of these may be very different. Many architectures have instructions that operate on half and/or twice the width of the processor's major internal datapaths. Examples include the 8080, Z80, and MC68000, as well as many others. On implementations of this type, an operation twice as wide typically also takes around twice as many clock cycles (which is not the case on high-performance implementations). On the 68000, for instance, this means 8 instead of 4 clock ticks, and this particular chip may be described as a 32-bit architecture with a 16-bit implementation. The external data bus width is often not useful for determining the width of the architecture; the NS32008, NS32016 and NS32032 were basically the same 32-bit chip with different external data buses. The NS32764 had a 64-bit bus, but used 32-bit registers.
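The idea of a 32-bit operation carried out on a 16-bit datapath can be illustrated with a minimal Python sketch. It is not a model of any particular chip: the function name `add32_on_16bit_datapath` and the pass counter are hypothetical, and the sketch only assumes that the wide addition is split into two 16-bit additions linked by a carry.

```python
MASK16 = 0xFFFF  # width of the assumed internal datapath


def add32_on_16bit_datapath(a, b):
    """Add two 32-bit values using two passes through a 16-bit adder.

    Returns the 32-bit result (wrapped on overflow) and the number of
    16-bit adder passes that were needed.
    """
    # Pass 1: add the low halves; bit 16 of the sum is the carry out.
    low = (a & MASK16) + (b & MASK16)
    carry = low >> 16

    # Pass 2: add the high halves together with the carry from pass 1.
    high = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry

    result = ((high & MASK16) << 16) | (low & MASK16)
    passes = 2  # one pass per 16-bit half
    return result, passes


result, passes = add32_on_16bit_datapath(0x0001FFFF, 0x00000001)
print(hex(result), passes)
```

A 16-bit addition would need only one pass in this sketch, which mirrors the roughly doubled cycle count described above for double-width operations.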

The width of addresses may or may not differ from the width of data. Early 32-bit microprocessors often had a 24-bit address, as did the System/360 processors.
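As a rough illustration of a 24-bit address space alongside 32-bit data, the following Python sketch masks a 32-bit register value down to 24 address bits. It assumes that the unused upper bits are simply ignored, which is only one possible behaviour and is not stated above; the names `ADDRESS_BITS` and `effective_address` are hypothetical.

```python
ADDRESS_BITS = 24                        # assumed address width
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # 0xFFFFFF


def effective_address(register_value):
    """Return the address seen by memory for a 32-bit register value.

    Assumption: the upper 8 bits of the register are ignored.
    """
    return register_value & ADDRESS_MASK


# Two different 32-bit values that refer to the same 24-bit address.
print(hex(effective_address(0x00001234)))
print(hex(effective_address(0xFF001234)))
```

Under this assumption the addressable memory is 2^24 bytes (16 MiB), even though registers hold 32 bits.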

Operands

The number of operands is one of the factors that may give an indication about the performance of the instruction set. A three-operand architecture will allow

A := B + C

to be computed in one instruction.

A two-operand architecture will allow

A := A + B

to be computed in one instruction, so two instructions will need to be executed to simulate a single three-operand instruction:

A := B
A := A + C
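The difference in instruction count can be sketched with a toy register machine in Python. The opcodes `add3`, `add2` and `mov`, and the `run` helper, are hypothetical names chosen for this illustration and do not correspond to any real instruction set.

```python
def run(program, regs):
    """Execute a list of instruction tuples on a copy of the register dict."""
    regs = dict(regs)
    for op, *args in program:
        if op == "add3":        # three-operand form: dst := src1 + src2
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        elif op == "add2":      # two-operand form: dst := dst + src
            dst, src = args
            regs[dst] = regs[dst] + regs[src]
        elif op == "mov":       # copy: dst := src
            dst, src = args
            regs[dst] = regs[src]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs


start = {"A": 0, "B": 2, "C": 3}

# A := B + C in one instruction.
three_operand = [("add3", "A", "B", "C")]

# A := B, then A := A + C, in two instructions.
two_operand = [("mov", "A", "B"), ("add2", "A", "C")]

print(run(three_operand, start)["A"], len(three_operand))
print(run(two_operand, start)["A"], len(two_operand))
```

Both programs leave the same value in A; only the number of instructions executed differs. Instruction count is just one factor, so this says nothing by itself about the relative speed of real processors.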

Endianness

An architecture may use "big" or "little" endianness, or both, or be configurable to use either. Little-endian processors order bytes in memory with the least significant byte of a multi-byte value in the lowest-numbered memory location. Big-endian architectures instead order them with the most significant byte at the lowest-numbered address. The x86 and ARM architectures, as well as several 8-bit architectures, are little endian. Most other RISC architectures (SPARC, Power, PowerPC, MIPS) were originally big endian, but many (including ARM) are now configurable.
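The two byte orders can be shown with a short Python sketch using the built-in `int.to_bytes` and `int.from_bytes` methods. The value `0x12345678` is an arbitrary example; index 0 of each byte string stands for the lowest-numbered memory location.

```python
value = 0x12345678

# Little endian: least significant byte at the lowest address (index 0).
little = value.to_bytes(4, byteorder="little")

# Big endian: most significant byte at the lowest address (index 0).
big = value.to_bytes(4, byteorder="big")

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading the same bytes back with the wrong byte order gives a different value.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))  # 0x78563412
```

A configurable (bi-endian) processor can be thought of as choosing between these two interpretations of the same bytes in memory.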

Architectures

The table below compares basic information about CPU architectures.

Architecture	Bits	Version	Introduced	Max # Operands	Type	Design	Registers	Instruction encoding	Branch Evaluation	Endianness	Extensions	Open	Royalty-free
Alpha	64		1992	3	Register-Register	RISC	32	Fixed	Condition register	Bi	Motion Video Instructions, Byte-Word Extensions, Floating-point Extensions, Count Extensions	No	Unknown
ARM	32	ARMv7 and earlier	1983	3	Register-Register	RISC	16	Fixed (32-bit), Thumb: Fixed (16-bit), Thumb-2: Variable (16 and 32-bit)	Condition code	Bi	NEON, Jazelle, Vector Floating Point, TrustZone, LPAE	Unknown	No
ARM	64	ARMv8^[1]	2011^[2]	3	Register-Register	RISC	31	Fixed (32-bit), Thumb: Fixed (16-bit), Thumb-2: Variable (16 and 32-bit), A64	Condition code	Bi	NEON, Jazelle, Vector Floating Point, TrustZone	Unknown	No
AVR32	32	Rev 2	2006	2-3		RISC	15	Variable^[3]		Big	Java Virtual Machine	Unknown	Unknown
Blackfin	32		2000			RISC^[4]	8			Little^[5]		Unknown	Unknown
DLX	32		1990	3		RISC	32	Fixed (32-bit)		Big		Unknown	Unknown
eSi-RISC	16/32		2009	3	Register-Register	RISC	8-72	Variable (16 or 32-bit)	Compare and branch and condition register	Bi	User-defined instructions	No	No
Itanium (IA-64)	64		2001		Register-Register	EPIC	128		Condition register	Bi (selectable)	Intel Virtualization Technology	Yes	Yes
M32R	32		1997			RISC	16	Fixed (16- or 32-bit)		Bi		Unknown	Unknown
MC68K	32		1979	2	Register-Memory	CISC	16	Variable	Condition register	Big		Unknown	Unknown
Mico32	32		2006	3	Register-Register	RISC	32^[6]	Fixed (32-bit)	Compare and branch	Big	User-defined instructions	Yes^[7]	Yes
MIPS	64 (32→64)	5	1981	3	Register-Register	RISC	32	Fixed (32-bit)	Condition register	Bi	MDMX, MIPS-3D	Unknown	No
MMIX	64		1999	3	Register-Register	RISC	256	Fixed (32-bit)		Big		Yes	Yes
PA-RISC (HP/PA)	64 (32→64)	2.0	1986	3		RISC	32	Fixed	Compare and branch	Big	Multimedia Acceleration eXtensions (MAX), MAX-2	No	Unknown
PowerPC	32/64 (32→64)	2.06^[8]	1991	3	Register-Register	RISC	32	Fixed, Variable	Condition code	Big/Bi	AltiVec, APU, VSX, Cell	Yes^[9]	No
S+core	16/32		2005			RISC				Little		Unknown	Unknown
Series 32000	32		1982	5	Memory-Memory	CISC	8	Variable Huffman coded, up to 23 bytes long	Condition Code	Little	BitBlt instructions	Unknown	Unknown
SPARC	64 (32→64)	V9	1985	3	Register-Register	RISC	31 (of at least 55)	Fixed	Condition code	Big → Bi	VIS 1.0, 2.0, 3.0	Yes	Yes^[10]
SuperH (SH)	32		1990s	2	Register-Register/ Register-Memory	RISC	16	Fixed	Condition Code (Single Bit)	Bi		Unknown	Unknown
System/360 / System/370 / z/Architecture	64 (32→64)	3	1964		Register-Memory/Memory-Memory	CISC	16	Fixed	Condition code	Big		Unknown	Unknown
VAX	32		1977	6	Memory-Memory	CISC	16	Variable	Compare and branch	Little	VAX Vector Architecture	Unknown	Unknown
x86	32 (16→32)		1978	2	Register-Memory	CISC	8	Variable	Condition code	Little	MMX, 3DNow!, SSE, PAE	No	No
x86-64	64		2003	2	Register-Memory	CISC	16	Variable	Condition code	Little	MMX, 3DNow!, PAE, AVX	No	No

Microarchitectures

The following table compares specific microarchitectures.

Microarchitecture	Pipeline stages	Misc
AMD K5		Out-of-order execution, register renaming, speculative execution
AMD K6		Superscalar, branch prediction
AMD K6-III		Branch prediction, speculative execution, out-of-order execution^[11]
AMD K7		Out-of-order execution, branch prediction, Harvard architecture
AMD K8		64-bit, integrated memory controller, 16 byte instruction prefetching
AMD K10		Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching
ARM7TDMI(-S)	3
ARM7EJ-S	5
ARM810	5
ARM9TDMI	5
ARM1020E	6
XScale PXA210/PXA250	7
ARM1136J(F)-S	8
ARM1156T2(F)-S	9
ARM Cortex-A5	8	Single issue, in-order
ARM Cortex-A7 MPCore	8	Partial dual-issue, in-order
ARM Cortex-A8	13	Dual-issue
ARM Cortex-A9 MPCore	8-11	Out-of-order, speculative issue, superscalar
ARM Cortex-A15 MPCore	15	Multicore (up to 16), out-of-order, speculative issue, 3-way superscalar
ARM Cortex-A53		Partial dual-issue, in-order
ARM Cortex-A57		Deeply out-of-order, wide multi-issue, 3-way superscalar, RISC
AVR32 AP7	7
AVR32 UC3	3	Harvard architecture
Bobcat		Out-of-order execution
Bulldozer		Shared multithreaded L2 cache, multithreading, multicore, pipeline of around 20 stages, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, virtualization, Turbo Core, FlexFPU, which uses simultaneous multithreading^[12]
Piledriver		Shared multithreaded L2 cache, multithreading, multicore, pipeline of around 20 stages, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, virtualization, FlexFPU, which uses simultaneous multithreading,^[13] up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core, CISC
Crusoe		In-order execution, 128-bit VLIW, integrated memory controller
Efficeon		In-order execution, 256-bit VLIW, fully integrated memory controller
Cyrix Cx5x86	6^[14]	Branch prediction
Cyrix 6x86		Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution
DLX	5
eSi-3200	5	In-order, speculative issue
eSi-3250	5	In-order, speculative issue
EV4 (Alpha 21064)		Superscalar
EV7 (Alpha 21364)		Superscalar design with out-of-order execution, branch prediction, 4-way SMT, integrated memory controller
EV8 (Alpha 21464)		Superscalar design with out-of-order execution
P5 (Pentium)	5	Superscalar
P6 (Pentium Pro)	14	Speculative execution, Register renaming, superscalar design with out-of-order execution
P6 (Pentium II)		Branch prediction
P6 (Pentium III)	10
Intel Itanium	8^[15]	Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multicore, coarse-grained multithreading, 2-way simultaneous multithreading, Turbo Boost, virtualization, VLIW
Intel NetBurst (Willamette)	20	2-way simultaneous multithreading (Hyperthreading), Rapid Execution Engine, Execution Trace Cache, quad-pumped front-side bus, Hyper-Pipelined Technology, superscalar, out-of-order, CISC
NetBurst (Northwood)	20	2-way Simultaneous multithreading
NetBurst (Prescott)	31	2-way Simultaneous multithreading
NetBurst (Cedar Mill)	31	2-way Simultaneous multithreading
Core	14	Multicore, out-of-order, superscalar
Intel Atom	16	2-way simultaneous multithreading, in-order; no instruction reordering, speculative execution, or register renaming; CISC
Intel Atom Oak Trail		2-way simultaneous multithreading, in-order, Burst Mode, 512 KB L2 cache
Intel Atom Silvermont		Out-of-order execution
Nehalem		2-way simultaneous multithreading, out-of-order, superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost
Sandy Bridge		2-way simultaneous multithreading (2 threads per core), multicore, integrated memory controller, L1/L2/L3 cache, Turbo Boost
Intel Haswell	14	Multicore, multithreading, 2-way simultaneous multithreading, transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme), virtualization, CISC
Intel Xeon Phi 7120x		Multicore, multithreading, 4 hardware-based simultaneous threads per core which cannot be disabled (unlike regular Hyperthreading), 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order, CISC
LatticeMico32	6	Harvard architecture
POWER1		Superscalar, out-of-order execution
POWER3		Superscalar, out-of-order execution
POWER4		Superscalar, speculative execution, out-of-order execution
POWER5		2-way Simultaneous multithreading, out-of-order execution, integrated memory controller
IBM POWER6		2-way simultaneous multithreading, in-order execution, up to 5 GHz, RISC
IBM POWER7+		Multicore, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, virtualization, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal-fixed unit, decimal floating-point unit, RISC
PowerPC 401	3
PowerPC 405	5
PowerPC 440	7
PowerPC 470	9	SMP
PowerPC A2	15
PowerPC e300	4	Superscalar, Branch prediction
PowerPC e500	Dual 7 stage	Multicore
PowerPC e600	3-issue 7 stage	Superscalar out-of-order execution, branch prediction
PowerPC e5500	4-issue 7 stage	Out-of-order, multicore
PowerPC e6500		multicore
PowerPC 603	4	5 execution units, branch prediction. No SMP.
PowerPC 603q	5	In-order
PowerPC 604	6	Superscalar, out-of-order execution, 6 execution units. SMP support.
PowerPC 620	5	Out-of-order execution. SMP support.
PWRficient		Superscalar, out-of-order execution, 6 execution units
R4000	8	Scalar
StrongARM SA-110	5	Scalar, in-order
SuperH SH2	5
SuperH SH2A	5	Superscalar, Harvard architecture
SPARC		Superscalar
HyperSPARC		Superscalar
SuperSPARC		Superscalar, in-order
SPARC64 VI/VII/VII+		Superscalar, out-of-order^[16]
UltraSPARC	9
UltraSPARC T1	6	Open source, multithreading, multi-core, 4 threads per core, integrated memory controller
UltraSPARC T2	8	Open source, multithreading, multi-core, 8 threads per core
SPARC T3	8	Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache
Oracle SPARC T4	16	Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 16 cores per chip, out-of-order, 4 MB L3 cache
Oracle Corporation SPARC T5	16	Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order execution, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, virtualization, RISC
Oracle Sparc M5	16	Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order execution, 48 MB L3 cache, virtualization
Fujitsu Sparc64 X		Multithreading, multicore, 2-way SMT, 16 cores per chip, out-of-order, 24 MB L2 cache, virtualization, RISC
Imagination Technologies Mips Warrior
VIA C7		In-order execution
VIA Nano (Isaiah)		Superscalar out-of-order execution, branch prediction, 7 execution units
WinChip	4	In-order execution

References

[1] ARMv8 Technology Preview
[2] "ARM goes 64-bit with new ARMv8 chip architecture". Retrieved 26 May 2012.
[3] "AVR32 Architecture Document" (PDF). Atmel. Retrieved 2008-06-15.
[4] "Blackfin Processor Architecture Overview". Analog Devices. Retrieved 2009-05-10.
[5] "Blackfin memory architecture". Analog Devices. Retrieved 2009-12-18.
[6] "LatticeMico32 Architecture". Lattice Semiconductor. Retrieved 2009-12-18.
[7] "Open Source Licensing". Lattice Semiconductor. Retrieved 2009-12-18.
[8] "Power ISA V2.06" (PDF). IBM. Retrieved 2009-07-04. [dead link]
[9] "New to Cell/B.E., multicore, and Power Architecture technology". http://www.ibm.com/developerworks/power/newto/#2
[10] "SPARC Architecture License". http://www.sparc.org/specificationsDocuments.html##ArchLic
[11] http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_1260_1288%5E1295,00.html
[12] http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg
[13] http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg
[14] http://www.pcguide.com/ref/cpu/fam/g4C5x86-c.html
[15] Intel Itanium 2 Processor Hardware Developer's Manual. p. 14. http://www.intel.com/design/itanium2/manuals/25110901.pdf (2002). Retrieved November 28, 2011.
[16] http://www.fujitsu.com/global/services/computing/server/sparcenterprise/technology/performance/processor.html

@@ Line 501: / Line 501: @@
 | ARM Cortex-A57
 |
-| Deeply out-of-order, wide multi-issue, 3-way superscalar
+| Deeply out-of-order, wide multi-issue, 3-way superscalar,[[RISC]]
 |-
 | [[AVR32#AP7 Core|AVR32 AP7]]
@@ Line 522: / Line 522: @@
 |
 |Shared multithreaded L2 cache, multithreading, multicore, around 20 stage long pipeline,integrated memory controller,out-of-order,superscalar,up to 16mb lv2 cache,up to 16mb lv3 cache,Virtualization,FlexFPU which use [[simultaneous multithreading]]<ref>http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg</ref>
-,up to 16 cores per chip,up to 5Ghz clock speed,up to 220w TDP,Turbo Core.
+,up to 16 cores per chip,up to 5Ghz clock speed,up to 220w TDP,Turbo Core,[[CISC]]
 |-
 | [[Transmeta Crusoe|Crusoe]]
@@ Line 586: / Line 586: @@
 |[[Intel]] [[NetBurst (microarchitecture)|NetBurst]] ([[Pentium 4#Willamette|Willamette]])
 | 20
-| 2-way [[Simultaneous multithreading]][[(Hyperthreading)]],Rapid Execution Engine,Execution Trace Cache,quad-pumped Front-Side Bus,Hyper-pipelined Technology,superscalar,out-of order
+| 2-way [[Simultaneous multithreading]][[(Hyperthreading)]],Rapid Execution Engine,Execution Trace Cache,quad-pumped Front-Side Bus,Hyper-pipelined Technology,superscalar,out-of order,[[CISC]]
 |-
 | NetBurst ([[Pentium 4#Northwood|Northwood]])
@@ Line 606: / Line 606: @@
 | [[Intel Atom]]
 | 16
-|2-way Simultaneous multithreading, in-order. No instruction reordering, speculative execution, or register renaming.
+|2-way Simultaneous multithreading, in-order. No instruction reordering, speculative execution, or register renaming,CISC
 |-
 |[[Intel Atom]] Oak Trail
@@ Line 626: / Line 626: @@
 |[[Intel]] [[Haswell (microarchitecture)|Haswell]]
 | 14
-| Multicore,multithreading,2-way simultaneous multithreading,[[Transactional memory]](in selected models),LV4 Cache(in GT3 models),Turbo Boost,out-of-order,superscalar,up to 8mb lv3 cache(mainstream),up to 20mb lv3 cache(Extreme),Virtualization
+| Multicore,multithreading,2-way simultaneous multithreading,[[Transactional memory]](in selected models),LV4 Cache(in GT3 models),Turbo Boost,out-of-order,superscalar,up to 8mb lv3 cache(mainstream),up to 20mb lv3 cache(Extreme),Virtualization,[CISC]]
 |-
 |[[Intel]] [[Xeon Phi]] 7120x
 |
-| Multicore,[[multithreading]],4 hardware based simultaneous threads per core which cant be disabled unlike regular [[Hyperthreading]],61 cores per chip,244 threads per chip,30.5mb lv2 cache,300w TDP ,Turbo Boost,in-order
+| Multicore,[[multithreading]],4 hardware based simultaneous threads per core which cant be disabled unlike regular [[Hyperthreading]],61 cores per chip,244 threads per chip,30.5mb lv2 cache,300w TDP ,Turbo Boost,in-order,[[CISC]]
 |-
 | [[LatticeMico32]]
@@ Line 654: / Line 654: @@
 |[[IBM]] [[POWER6]]
 |
-| 2-way simultaneous multithreading, in-order execution,up to 5ghz
+| 2-way simultaneous multithreading, in-order execution,up to 5ghz,[[RISC]]
 |-
 |[[IBM]] [[POWER7+]]
 |
-| multicore,multithreading,out-of-order,superscalar,4 intelligent simultaneous threads per core, 12 execution units per core,8 cores per chip,80mb lv3 cache,Virtualization,true hardware entropy generator,hardware-assisted cryptographic acceleration,Fixed-point unit,Decimal-Fixed Unit,Decimal Floating-Point Unit
+| multicore,multithreading,out-of-order,superscalar,4 intelligent simultaneous threads per core, 12 execution units per core,8 cores per chip,80mb lv3 cache,Virtualization,true hardware entropy generator,hardware-assisted cryptographic acceleration,Fixed-point unit,Decimal-Fixed Unit,Decimal Floating-Point Unit,[[RISC]]
 |-
 | [[PowerPC 400#PowerPC 401|PowerPC 401]]
@@ Line 774: / Line 774: @@
 |[[Oracle Corporation]] [[SPARC T5]]
 | 16
-| Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously,2-way [[simultaneous multithreading]],16 cores per chip,out-of-order,16-way associative shared 8mb lv3 cache,hardware-assisted cryptographic acceleration,Stream-Processing unit,out-of order execution,Virtualization
+| Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously,2-way [[simultaneous multithreading]],16 cores per chip,out-of-order,16-way associative shared 8mb lv3 cache,hardware-assisted cryptographic acceleration,Stream-Processing unit,out-of order execution,Virtualization,[[RISC]]
 |-
 |Oracle [[Sparc M5]]
@@ Line 782: / Line 782: @@
 |[[Fujitsu]] Sparc64 X
 |
-|Multithreading,multicore,2-way SMT,16 cores per chip,out-of order,24mb lv2 cache,out-of order,Virtualization
+|Multithreading,multicore,2-way SMT,16 cores per chip,out-of order,24mb lv2 cache,out-of order,Virtualization,[[RISC]]
 |-
 |[[Imagination Technologies]] Mips Warrior