= ARM Cortex-X1 =

ARM Cortex-X1
- Design firm: ARM Ltd.
- Fastest clock: 3.0 GHz in phones and 3.3 GHz in tablets/laptops
- Address width: 40-bit
- L1 cache: 64 KB I-cache with parity and 64 KB D-cache per core
- L2 cache: up to 1 MB per core
- Microarchitecture: ARM Cortex-X1
- Architecture: ARMv8-A: A64, A32, and T32
- Extensions: ARMv8.1-A, ARMv8.2-A, cryptography, RAS, ARMv8.3-A LDAPR instructions, ARMv8.4-A dot product
- Cores: 1–4 per cluster
- Product code name: Hera
- Variant: ARM Cortex-A78, ARM Neoverse V1
- Successor: ARM Cortex-X2

The ARM Cortex-X1 is a central processing unit that implements the ARMv8.2-A 64-bit instruction set. It was designed by ARM Holdings' Austin design centre as part of ARM's Cortex-X Custom (CXC) program.

== Design ==
The Cortex-X1 design is based on the ARM Cortex-A78, but it was redesigned to prioritize performance rather than to balance performance, power, and area (PPA).

The Cortex-X1 is a 5-wide decode, out-of-order superscalar design with a 3K-entry macro-op (MOP) cache. Each cycle it can fetch 5 instructions and 8 MOPs, and rename and dispatch 8 MOPs and 16 μOPs. The out-of-order window has been increased to 224 entries. The back end has 15 execution ports, the pipeline is 13 stages deep, and execution latencies account for 10 stages. It also has four 128-bit SIMD units.
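The per-cycle widths above put an upper bound on how quickly work can flow through the core. The Python sketch below does that arithmetic. It is illustrative only: it assumes each instruction decodes to exactly one MOP, ignores stalls, branch mispredictions and cache misses, and assumes every SIMD unit accepts one full-width 128-bit operation per cycle. The helpers `front_end_width`, `min_cycles` and `simd_lanes` are hypothetical names, not part of any ARM tool.

```python
import math

# Peak per-cycle widths quoted for the Cortex-X1 (ideal, not sustained, rates).
DECODE_WIDTH = 5      # instructions decoded per cycle on the decoder path
MOP_CACHE_WIDTH = 8   # MOPs delivered per cycle from the macro-op cache
DISPATCH_WIDTH = 8    # MOPs renamed and dispatched per cycle
SIMD_UNITS = 4        # number of SIMD units
SIMD_BITS = 128       # width of each SIMD unit in bits


def front_end_width(mop_cache_hit: bool) -> int:
    """Peak MOPs/cycle reaching dispatch (assumes 1 instruction == 1 MOP)."""
    supply = MOP_CACHE_WIDTH if mop_cache_hit else DECODE_WIDTH
    # The narrower of supply and dispatch is the bottleneck.
    return min(supply, DISPATCH_WIDTH)


def min_cycles(n_mops: int, mop_cache_hit: bool) -> int:
    """Lower bound on front-end cycles for n_mops, ignoring all stalls."""
    return math.ceil(n_mops / front_end_width(mop_cache_hit))


def simd_lanes(element_bits: int) -> int:
    """Elements per cycle if every SIMD unit issues one full-width op."""
    return SIMD_UNITS * (SIMD_BITS // element_bits)


if __name__ == "__main__":
    print(front_end_width(True), front_end_width(False))
    print(min_cycles(1000, True), min_cycles(1000, False))
    print(simd_lanes(32), simd_lanes(64))
```

Under these assumptions a hot loop served from the MOP cache can reach dispatch at 8 MOPs per cycle, while code falling back to the decoders is capped at 5, and the four SIMD units together cover 16 single-precision (32-bit) lanes per cycle.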

ARM claims the Cortex-X1 offers 30% higher integer performance and twice the machine learning performance of the ARM Cortex-A77.

The Cortex-X1 supports ARM's DynamIQ technology and is expected to serve as the high-performance core alongside ARM Cortex-A78 mid cores and ARM Cortex-A55 little cores.
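In a DynamIQ system that mixes core types, an arm64 Linux kernel reports each core's implementer and part number in `/proc/cpuinfo`, so the high-performance, mid and little cores can be told apart. The Python sketch below classifies cores from that text. The part-number table (for example 0xD44 for the Cortex-X1 and 0xD41 for the Cortex-A78) mirrors the values in the Linux kernel's arm64 `cputype.h` and should be checked against ARM's technical reference manuals before being relied on. `classify_cores` is a hypothetical helper, and the sample input is a made-up 1 + 3 + 4 layout, not a dump from a real device.

```python
ARM_IMPLEMENTER = 0x41  # MIDR implementer code for Arm Ltd. ('A')

# MIDR part numbers for some Arm-designed cores (implementer 0x41).
# Values as in the Linux arm64 cputype.h; verify against the relevant TRM.
ARM_PARTS = {
    0xD05: "Cortex-A55",
    0xD0B: "Cortex-A76",
    0xD0D: "Cortex-A77",
    0xD41: "Cortex-A78",
    0xD44: "Cortex-X1",
}


def classify_cores(cpuinfo_text: str) -> dict[int, str]:
    """Map each logical CPU number to a core name, parsed from cpuinfo text."""
    cores: dict[int, str] = {}
    cpu = None
    implementer = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue  # skip blank separator lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
            implementer = None  # reset per-CPU state
        elif key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and cpu is not None:
            part = int(value, 16)
            if implementer == ARM_IMPLEMENTER:
                cores[cpu] = ARM_PARTS.get(part, f"unknown Arm part {part:#x}")
            else:
                cores[cpu] = f"non-Arm part {part:#x}"
    return cores


if __name__ == "__main__":
    # Hypothetical cpuinfo excerpt: four A55, three A78 and one X1 core.
    sample = "".join(
        f"processor\t: {n}\nCPU implementer\t: 0x41\nCPU part\t: {part:#x}\n\n"
        for n, part in enumerate([0xD05] * 4 + [0xD41] * 3 + [0xD44])
    )
    print(classify_cores(sample))
```

Matching on the implementer as well as the part number keeps the sketch from mislabelling cores from other vendors, whose part numbers come from a separate namespace.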

== Architecture changes in comparison with ARM Cortex-A78 ==

- Around 20% higher performance than the Cortex-A78 (about 30% over the Cortex-A77)
  - 30% faster integer performance than the Cortex-A77
  - 100% faster machine learning performance than the Cortex-A77
- Out-of-order window size has been increased to 224 entries (from 160 entries)
- Up to 4x128b SIMD units (from 2x128b)
- 15% more silicon area
- 5-way decode (from 4-way)
- 8 MOPs/cycle decoded cache bandwidth (from 6 MOPs/cycle)
- 64 KB L1D + 64 KB L1I (from a configurable 32 or 64 KB each)
- Up to 1 MB/core L2 cache (from 512 KB/core max)
- Up to 8 MB L3 cache (from 4 MB max)
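The list above gives absolute figures; the relative increases follow directly from them. This Python sketch uses only numbers from the list, taking the Cortex-A78's maximum where its figure is a configurable upper limit. `A78_VS_X1` and `percent_increase` are illustrative names chosen for this example.

```python
# (Cortex-A78 value, Cortex-X1 value), taken from the comparison list.
# Where the A78 figure is a configurable maximum, the maximum is used.
A78_VS_X1 = {
    "out-of-order window (entries)": (160, 224),
    "128-bit SIMD units": (2, 4),
    "decode width (instructions/cycle)": (4, 5),
    "MOP cache bandwidth (MOPs/cycle)": (6, 8),
    "max L2 per core (KB)": (512, 1024),
    "max L3 (MB)": (4, 8),
}


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for name, (a78, x1) in A78_VS_X1.items():
        print(f"{name}: {a78} -> {x1} (+{percent_increase(a78, x1):.1f}%)")
```

For instance, the out-of-order window grows by 40% (160 to 224 entries), while the SIMD unit count, maximum L2 and maximum L3 all double.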

== Licensing ==
The Cortex-X1 is available as a SIP core to partners in ARM's Cortex-X Custom (CXC) program, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die constituting a system on a chip (SoC).

== Usage ==
- Samsung Exynos 2100
- Qualcomm Snapdragon 888 and 888+
- Google Tensor

== See also ==

- ARM Cortex-A78, related high-performance microarchitecture
- ARM Neoverse V1 (Zeus), server sister core to the Cortex-X1
- Comparison of ARMv8-A cores, ARMv8 family
