= ARM Cortex-A72 =

ARM Cortex-A72
- Pcode1: Maya
- Fastest: 2.5 GHz
- Size-from: 16 nm
- Designfirm: ARM Holdings
- Arch: ARMv8-A
- Numcores: 1–4 per cluster, multiple clusters
- L1Cache: 80 KiB (48 KiB I-cache with parity, 32 KiB D-cache with ECC) per core
- L2Cache: 512 KiB to 4 MiB
- L3Cache: None
- Predecessor: ARM Cortex-A57
- Successor: ARM Cortex-A73

The ARM Cortex-A72 is a central processing unit implementing the ARMv8-A 64-bit instruction set, designed by ARM Holdings' Austin design centre. It has a 3-way decode, out-of-order superscalar pipeline. It is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die constituting a system on a chip (SoC). The Cortex-A72 was announced in 2015 as the successor to the Cortex-A57, and was designed to use 20% less power or offer 90% greater performance.

==Overview==
- Pipelined processor with deeply out-of-order, speculative issue 3-way superscalar execution pipeline
- DSP and NEON SIMD extensions are mandatory per core
- VFPv4 Floating Point Unit onboard (per core)
- Hardware virtualization support
- Thumb-2 instruction set encoding reduces the size of 32-bit programs with little impact on performance.
- TrustZone security extensions
- Program Trace Macrocell and CoreSight Design Kit for unobtrusive tracing of instruction execution
- 32 KiB data (2-way set-associative) + 48 KiB instruction (3-way set-associative) L1 cache per core
- Integrated low-latency level-2 (16-way set-associative) cache controller, 512 KiB to 4 MiB configurable size per cluster
- 48-entry fully associative L1 instruction translation lookaside buffer (TLB) with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
- 32-entry fully associative L1 data TLB with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
  - 4-way set-associative, 1024-entry unified L2 TLB per core; supports hit-under-miss
- Sophisticated branch prediction algorithm that significantly increases performance and reduces energy from misprediction and speculation
- Early IC tag: 3-way L1 cache at direct-mapped power
- Regionalized TLB and μBTB tagging
- Small-offset branch-target optimizations
- Suppression of superfluous branch predictor accesses
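
The cache and TLB figures listed above imply a particular geometry, which can be worked out with simple arithmetic. The following is a minimal sketch, not taken from ARM documentation: the helper names `cache_sets` and `tlb_reach` are hypothetical, and the 64-byte cache line size is an assumption that is not stated in the list above. The number of sets is the cache size divided by (ways × line size), and the TLB reach is the number of entries multiplied by the page size.

```python
# Illustrative arithmetic only; helper names are hypothetical.
# ASSUMPTION: 64-byte cache lines (not stated in the feature list above).
KIB = 1024
MIB = 1024 * KIB


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_bytes


l1d_sets = cache_sets(32 * KIB, ways=2)        # 256 sets
l1i_sets = cache_sets(48 * KIB, ways=3)        # 256 sets
l2_sets_min = cache_sets(512 * KIB, ways=16)   # 512 sets
l2_sets_max = cache_sets(4 * MIB, ways=16)     # 4096 sets

# Reach with the smallest (4 KiB) page size
l1i_tlb_reach = tlb_reach(48, 4 * KIB)         # 192 KiB
l1d_tlb_reach = tlb_reach(32, 4 * KIB)         # 128 KiB
l2_tlb_reach = tlb_reach(1024, 4 * KIB)        # 4 MiB

print("L1 D-cache sets:", l1d_sets)
print("L1 I-cache sets:", l1i_sets)
print("L2 cache sets:", l2_sets_min, "to", l2_sets_max)
print("L2 TLB reach (4 KiB pages):", l2_tlb_reach // MIB, "MiB")
```

Under the 64-byte line assumption, the 3-way 48 KiB instruction cache and the 2-way 32 KiB data cache both have 256 sets, which is why a 48 KiB size pairs naturally with 3-way associativity.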

==Chips==
- Broadcom BCM2711 system on a chip with four A72 cores, used in the Raspberry Pi 4
- Qualcomm Snapdragon 650, 652, and 653
- NXP i.MX8, Layerscape LS1026A/LS1046A, LS2044A/LS2084A, LS2048A/LS2088A, LX2160A/LX2120A/LX2080A, LS1028A
- Texas Instruments Jacinto 7 family of automotive and industrial SoC processors
- Rockchip RK3399, RK3576
- AWS Graviton

==See also==

- ARM Cortex-A57, predecessor
- ARM Cortex-A73, successor
- Comparison of ARMv8-A cores, ARMv8 family
