Comparison of ARMv8-A cores

From Wikipedia, the free encyclopedia

This is a table of 64/32-bit ARMv8-A architecture cores, comparing microarchitectures that implement the AArch64 instruction set and its mandatory or optional extensions. Most chips also support the 32-bit AArch32 execution state for legacy applications; the Qualcomm Falkor data center core does not. All chips of this type have a floating-point unit (FPU) that is more capable than the one in older ARMv7 chips with NEON (SIMD). Some of these chips have coprocessors; the AppliedMicro Helix, for example, also includes cores from the older 32-bit ARMv7 architecture. Some of the chips are SoCs that combine ARM Cortex-A53 and Cortex-A57 cores, such as the Samsung Exynos 7 Octa.

Table

| Company | Core | Released | Revision | Decode | Pipeline depth | Out-of-order execution | Branch prediction | big.LITTLE role | Execution ports | Fab (nm) | L1 cache, instr. + data (KiB) | L2 cache | L3 cache | Core configurations | DMIPS/MHz |
| ARM Holdings | Cortex-A32 (32-bit)[1] |  | ARMv8.0-A (32-bit only) | ? | ? | ? | ? | LITTLE |  | 28[2] | 8–32 + 8–32 | 0–1 MiB | No | 1–4+ |  |
| ARM Holdings | Cortex-A35[3] |  | ARMv8.0-A | 2-wide[4] | 8 | No | Yes | LITTLE | ? | 28 / 16 / 14 / 10 | 8–64 + 8–64 | 0 / 128 KiB–1 MiB | No | 1–4+ | 1.78 |
| ARM Holdings | Cortex-A53[5] |  | ARMv8.0-A | 2-wide | 8 | No | Conditional + indirect branch prediction | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | 8–64 + 8–64 | 128 KiB–2 MiB | No | 1–4+ | 2.24 |
| ARM Holdings | Cortex-A55[6] |  | ARMv8.2-A | 2-wide | 8 | No |  | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | 16–64 + 16–64 | 0–256 KiB/core | 0–4 MiB | 1–8+ | ? |
| ARM Holdings | Cortex-A57 |  | ARMv8.0-A | 3-wide | 15 | Yes, 8-wide dispatch | Two-level | big | 8 | 28 / 20 / 16[7] / 14 | 48 + 32 | 0.5–2 MiB | No | 1–4+ | 4.6 |
| ARM Holdings | Cortex-A72[8] |  | ARMv8.0-A | 3-wide | 15 | Yes, 8-wide dispatch | Two-level | big | 8 | 28 / 16 | 48 + 32 | 0.5–4 MiB | No | 1–4+ | 4.72 |
| ARM Holdings | Cortex-A73[9] |  | ARMv8.0-A | 2-wide | 11–12 | Yes, 7-wide dispatch | Two-level | big | 7 | 28 / 16 / 10 | 64 + 32/64 | 1–8 MiB | No | 1–4+ | ~6.35 |
| ARM Holdings | Cortex-A75[6] |  | ARMv8.2-A | 3-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 28 / 16 / 10 | 64 + 64 | 256–512 KiB/core | 0–4 MiB | 1–8+ | ? |
| ARM Holdings | Cortex-A76[10] |  | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Yes | big | 8 | 7 | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ? |
| Apple Inc. | Cyclone[11] |  | ARMv8.0-A | 6-wide[12] | 16[12] | Yes[12] | Yes | No | 9[12] | 28[13] | 64 + 64[12] | 1 MiB[12] | 4 MiB[12] | 2[14] | ? |
| Apple Inc. | Typhoon |  | ARMv8.0-A | 6-wide[15] | 16[15] | Yes[15] | Yes | No | 9 | 20 | 64 + 64[12] | 1 MiB[15] | 4 MiB[12] | 2, 3 (A8X) | ? |
| Apple Inc. | Twister |  | ARMv8.0-A | 6-wide[15] | 16[15] | Yes[15] | Yes | No | 9 | 16 / 14 | 64 + 64[15] | 3 MiB[15] | 4 MiB[15] | 2 | ? |
| Apple Inc. | Hurricane |  | ARMv8.0-A | 7-wide[16] | 16 | Yes | Yes | "big" (paired with "LITTLE" Zephyr cores in A10/A10X) | 9 | 16 (A10), 10 (A10X) | 64 + 64[17] | 3 MiB[17] (A10), 8 MiB (A10X) | 4 MiB[17] (A10), No (A10X) | 2 + 2× Zephyr (A10), 3 + 3× Zephyr (A10X) | ? |
| Apple Inc. | Monsoon |  | ARMv8.2-A[18] | 7-wide | 16 | Yes | Yes | "big" (paired with "LITTLE" Mistral cores in Apple A11) | 9 | 10 | 64 + 64[19] | 8 MiB | No | 2 + 4× Mistral | ? |
| Apple Inc. | Vortex |  | ARMv8.3-A[20] | 7-wide | 16 | Yes | Yes | "big" (paired with "LITTLE" Tempest cores in Apple A12) | 9 | 7 | 128 + 128[19] | 8 MiB | No | 2 + 4× Tempest | ? |
| Nvidia | Denver[21][22] |  | ARMv8-A | 2-wide hardware decoder; up to 7-wide variable-length VLIW micro-ops | 13 | Not when the hardware decoder is in use; can be provided by dynamic software translation into VLIW | Direct + indirect branch prediction | No | 7 | 28 | 128 + 64 | 2 MiB | No | 2 | ? |
| Nvidia | Denver 2[23] |  | ARMv8-A | ? | 13 | ? | ? | "Super" (Nvidia's own implementation) | ? | 16 | 128 + 64 | 2 MiB | No | 2 | ? |
| Cavium | ThunderX[24][25] |  | ARMv8-A | 2-wide | ? | No | Two-level | ? | ? | 28 | 78 + 32[26][27] | 16 MiB[26][27] | No | 8–16, 24–48 | ? |
| Cavium | ThunderX2[25] (formerly Broadcom Vulcan[28]) | May 2018[29] | ARMv8.1-A[30] | 8-wide; "4 μops"[31][32]; "quad-threaded" | ? | Yes[33] | Multi-level | ? | ? | 16[34] | 32 + 32 (data 8-way) | 256 KiB per core[35] | 1 MiB per core[35] | 16–32[35] | ? |
| AppliedMicro | Helix | ? | ? | ? | ? | ? | ? | ? | ? | 40 / 28 | 32 + 32 (per core; write-through with parity)[36] | 256 KiB shared per core pair (with ECC) | 1 MiB/core | 2, 4, 8 | ? |
| AppliedMicro | X-Gene |  | ? | 4-wide | 15 | Yes | ? | ? | ? | 40[37] |  |  | 8 MiB | 8 | 4.2 |
| AppliedMicro | X-Gene 2 |  | ? | 4-wide | 15 | Yes | ? | ? | ? | 28[38] |  |  | 8 MiB | 8 | 4.2 |
| AppliedMicro | X-Gene 3[38] |  | ? | ? | ? | ? | ? | ? | ? | 16 | ? | ? | 32 MiB | 32 | ? |
| Qualcomm | Kryo |  | ARMv8-A | ? | ? | Yes | Two-level? | "big" or "LITTLE" (Qualcomm's own similar implementation) | ? | 14[39] | 32 + 32[40] | 0.5–1 MiB |  | 2, 4 | 6.3 |
| Qualcomm | Kryo 2XX |  | ARMv8-A |  |  |  |  | Yes |  | 10 LPE[41] |  |  |  |  |  |
| Qualcomm | Kryo 3XX |  | ARMv8.2-A |  |  |  |  | DynamIQ |  | 10 LPP[41] | 64 + 64[41] | 0.5 + 1 MiB | 2 MiB | 4 + 4 |  |
| Qualcomm | Falkor[42][43] | 8 November 2017[44] | "ARMv8.1-A features";[43] AArch64 only (not 32-bit)[43] | 4-wide | 10–15 | Yes, 8-wide dispatch | Yes | ? | 8 | 10 | 88[43] + 32 | 500 KiB | 1.25 MiB | 40–48 | ? |
| Samsung | M1/M2[45][46] | 2015 | ARMv8-A | 4-wide | 13[47] | Yes, 9-wide dispatch[48] | Two-level | big | 8 | 14 / 10 | 64 + 32 | 2 MiB[49] | No | 4 | ? |
| Samsung | M3[50][47] | 2018 | ARMv8-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 10 | ? | 512 KiB per core | 4 MiB | 4 | ? |

Because Dhrystone (the benchmark behind the "DMIPS", or Dhrystone MIPS, figures) is a synthetic benchmark developed in the 1980s, it is no longer representative of prevailing workloads; use these figures with caution.
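Because the DMIPS/MHz column is normalized to clock speed, an estimated Dhrystone score at a given frequency is the per-MHz figure multiplied by the clock in MHz. The Python sketch below shows that arithmetic for a few cores from the table. The `estimated_dmips` helper and the 1500 MHz clock are hypothetical, chosen only for illustration, and the sketch assumes performance scales linearly with frequency, which ignores effects such as memory latency.

```python
# Per-clock Dhrystone figures copied from the DMIPS/MHz column of the table.
DMIPS_PER_MHZ = {
    "Cortex-A35": 1.78,
    "Cortex-A53": 2.24,
    "Cortex-A57": 4.6,
    "Cortex-A72": 4.72,
}


def estimated_dmips(core: str, clock_mhz: float) -> float:
    """Scale a per-MHz Dhrystone figure to a given clock speed.

    Hypothetical helper: assumes DMIPS grows linearly with clock speed,
    which ignores memory and other effects that do not scale with the core clock.
    """
    return DMIPS_PER_MHZ[core] * clock_mhz


if __name__ == "__main__":
    clock = 1500.0  # hypothetical clock speed in MHz, not taken from the table
    for core in DMIPS_PER_MHZ:
        print(f"{core}: ~{estimated_dmips(core, clock):.0f} DMIPS at {clock:.0f} MHz")
```

For example, the Cortex-A53's 2.24 DMIPS/MHz works out to 2.24 × 1500 = 3360 DMIPS at 1500 MHz. Such estimates inherit all of Dhrystone's limitations.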

See also

References

  1. Frumusanu, Andrei (22 February 2016). "ARM Announces Cortex-A32 IoT and Embedded Processor". AnandTech. Retrieved 13 June 2016.
  2. "New Ultra-efficient ARM Cortex-A32 Processor Expands… - ARM". www.arm.com. Retrieved 1 October 2016.
  3. "Cortex-A35 Processor". ARM. ARM Ltd.
  4. Frumusanu, Andrei. "ARM Announces New Cortex-A35 CPU - Ultra-High Efficiency For Wearables & More".
  5. "Cortex-A53 Processor". ARM. ARM Ltd.
  6. Humrick, Matt (29 May 2017). "Exploring DynamIQ and ARM's New CPUs: Cortex-A75, Cortex-A55". AnandTech. Retrieved 29 May 2017.
  7. "TSMC Delivers First Fully Functional 16FinFET Networking Processor". TSMC. 25 September 2014. Retrieved 19 February 2015.
  8. Frumusanu, Andrei. "ARM Reveals Cortex-A72 Architecture Details". AnandTech. Retrieved 25 April 2015.
  9. Frumusanu, Andrei (29 May 2016). "The ARM Cortex A73 - Artemis Unveiled". AnandTech. Retrieved 31 May 2016.
  10. Frumusanu, Andrei (31 May 2018). "ARM Cortex-A76 CPU Unveiled". AnandTech. Retrieved 1 June 2018.
  11. Lal Shimpi, Anand (17 September 2013). "The iPhone 5s Review: The Move to 64-bit". AnandTech. Retrieved 3 July 2014.
  12. Lal Shimpi, Anand (31 March 2014). "Apple's Cyclone Microarchitecture Detailed". AnandTech. Retrieved 3 July 2014.
  13. Dixon-Warren, Sinjin (20 January 2014). "Samsung 28nm HKMG Inside the Apple A7". Chipworks. Retrieved 3 July 2014.
  14. Lal Shimpi, Anand (17 September 2013). "The iPhone 5s Review: A7 SoC Explained". AnandTech. Retrieved 3 July 2014.
  15. Ho, Joshua; Smith, Ryan (2 November 2015). "The Apple iPhone 6s and iPhone 6s Plus Review". AnandTech. Retrieved 13 February 2016.
  16. "Apple had shifted the microarchitecture in Hurricane (A10) from a 6-wide decode to a 7-wide decode". AnandTech. 5 October 2018.
  17. "Apple A10 Fusion". system-on-a-chip.specout.com. Retrieved 1 October 2016.
  18. "Apple A11 New Instruction Set Extensions" (PDF). Apple Inc. 8 June 2018.
  19. "Measured and Estimated Cache Sizes". AnandTech. 5 October 2018.
  20. "Apple A12 Pointer Authentication Codes". Jonathan Levin, @Morpheus. 12 September 2018.
  21. Stam, Nick (11 August 2014). "Mile High Milestone: Tegra K1 "Denver" Will Be First 64-bit ARM Processor for Android". Nvidia. Retrieved 11 August 2014.
  22. Gwennap, Linley. "Denver Uses Dynamic Translation to Outperform Mobile Rivals". The Linley Group. Retrieved 24 April 2015.
  23. Ho, Joshua (25 August 2016). "Hot Chips 2016: NVIDIA Discloses Tegra Parker Details". AnandTech. Retrieved 25 August 2016.
  24. De Gelas, Johan (16 December 2014). "ARM Challenging Intel in the Server Market". AnandTech. Retrieved 8 March 2017.
  25. De Gelas, Johan (15 June 2016). "Investigating the Cavium ThunderX". AnandTech. Retrieved 8 March 2017.
  26. "64-bit Cortex Platform To Take On x86 Servers In The Cloud". Electronic Design. 5 June 2014. Retrieved 7 February 2015.
  27. "ThunderX_CP™ Family of Workload Optimized Compute Processors" (PDF). Cavium. 2014. Retrieved 7 February 2015.
  28. "D30510 Vulcan is now ThunderX2T99". reviews.llvm.org.
  29. Kennedy, Patrick (7 May 2018). "Cavium ThunderX2 256 Thread Arm Platforms Hit General Availability". Retrieved 10 May 2018.
  30. "D21500 [AARCH64] Add support for Broadcom Vulcan". reviews.llvm.org.
  31. https://hpcuserforum.com/presentations/santafe2014/Broadcom%20Monday%20night.pdf
  32. "The Linley Group - Processor Conference 2013". www.linleygroup.com.
  33. "ThunderX2 ARM Processors - A Game Changing Family of Workload Optimized Processors for Data Center and Cloud Applications - Cavium". www.cavium.com.
  34. "Broadcom Announces Server-Class ARMv8-A Multi-Core Processor Architecture". Broadcom. 15 October 2013. Retrieved 11 August 2014.
  35. Kennedy, Patrick (9 May 2018). "Cavium ThunderX2 Review and Benchmarks a Real Arm Server Option". Serve the Home. Retrieved 10 May 2018.
  36. Ganesh T S (3 October 2014). "ARMv8 Goes Embedded with Applied Micro's HeliX SoCs". AnandTech. Retrieved 9 October 2014.
  37. Morgan, Timothy Prickett (12 August 2014). "Applied Micro Plots Out X-Gene ARM Server Future". EnterpriseTech. Retrieved 9 October 2014.
  38. De Gelas, Johan (15 March 2017). "AppliedMicro's X-Gene 3 SoC Begins Sampling". AnandTech. Retrieved 15 March 2017.
  39. "Snapdragon 820 and Kryo CPU: heterogeneous computing and the role of custom compute". Qualcomm. 2 September 2015. Retrieved 6 September 2015.
  40. Frumusanu, Andrei; Smith, Ryan. "The Qualcomm Snapdragon 820 Performance Preview: Meet Kryo".
  41. Frumusanu, Andrei; Smith, Ryan. "The Snapdragon 845 Performance Preview: Setting the Stage for Flagship Android 2018". Retrieved 11 June 2018.
  42. Shilov, Anton (16 December 2016). "Qualcomm Demos 48-Core Centriq 2400 SoC in Action, Begins Sampling". AnandTech. Retrieved 8 March 2017. In 2015, Qualcomm teamed up with Xilinx and Mellanox to ensure that its server SoCs are compatible with FPGA-based accelerators and data-center connectivity solutions (the fruits of this partnership will likely emerge in 2018 at best).
  43. Cutress, Ian (20 August 2017). "Analyzing Falkor's Microarchitecture". AnandTech. Retrieved 21 August 2017. The CPU cores, code named Falkor, will be ARMv8.0 compliant although with ARMv8.1 features, allowing software to potentially seamlessly transition from other ARM environments (or need a recompile). The Centriq 2400 family is set to be AArch64 only, without support for AArch32: Qualcomm states that this saves some power and die area, but that they primarily chose this route because the ecosystems they are targeting have already migrated to 64-bit. Qualcomm's Chris Bergen, Senior Director of Product Management for the Centriq 2400, stated that the majority of new and upcoming companies have started off with 64-bit as their base in the data center, and not even considering 32-bit, which is a reason for the AArch64-only choice here. [..] Micro-op cache / L0 I-cache with Way prediction [..] The L1 I-cache is 64KB, which is similar to other ARM architecture core designs, and also uses 64-byte lines but with an 8-way associativity. To software, as the L0 is transparent, the L1 I-cache will show as an 88KB cache.
  44. Shrout, Ryan (8 November 2017). "Qualcomm Centriq 2400 Arm-based Server Processor Begins Commercial Shipment". PC Per. Retrieved 8 November 2017.
  45. Frumusanu, Andrei. "Samsung Announces Exynos 8890 with Cat.12/13 Modem and Custom CPU".
  46. Ho, Joshua. "Hot Chips 2016: Exynos M1 Architecture Disclosed".
  47. Frumusanu, Andrei (23 January 2018). "The Samsung Exynos M3 - 6-wide Decode with 50%+ IPC Increase". AnandTech. Retrieved 25 January 2018.
  48. Frumusanu, Andrei. "Hot Chips 2016: Exynos M1 Architecture Disclosed". AnandTech. Retrieved 29 May 2017.
  49. "'Neural network' spotted deep inside Samsung's Galaxy S7 silicon brain".
  50. Howse, Brett; Frumusanu, Andrei (3 January 2018). "Samsung Announces New 9810 SoC: DynamiQ & 3rd Gen CPU". AnandTech. Retrieved 25 January 2018.