= List of discontinued x86 instructions =

This is a list of instructions that have at some point been present as documented instructions in one or more x86 processors, but whose processor series have since been discontinued or superseded, with no known plans to reintroduce the instructions.

== Intel instructions ==

=== i386 instructions ===

The following instructions were introduced in the Intel 80386, but later discontinued:
| Instruction | Opcode | Description | Eventual fate |
| | | Extract Bit String | Discontinued from revision B1 of the 80386 onwards. |
| IBTS r/m, r | 0F A7 /r | Insert Bit String | Discontinued from revision B1 of the 80386 onwards. |
| MOV r32,TRx | 0F 24 /r | Move from test register | Present in Intel 386 and 486; not present in Intel Pentium or any later Intel CPUs, except the i486-derived Quark X1000. Present in all Cyrix CPUs. |
| MOV TRx,r32 | 0F 26 /r | Move to test register | Present in Intel 386 and 486; not present in Intel Pentium or any later Intel CPUs, except the i486-derived Quark X1000. Present in all Cyrix CPUs. |

=== Itanium instructions ===

These instructions are only present in the x86 operation mode of early Intel Itanium processors with hardware support for x86. This support was added in "Merced" and removed in "Montecito", replaced with software emulation.
| Instruction | Opcode | Description |
| JMPE r/m16; JMPE r/m32 | NFx 0F 00 /6 | Jump to Intel Itanium instruction set. |

=== MPX instructions ===

These instructions were introduced in 6th generation Intel Core "Skylake" CPUs. The last CPU generation to support them was the 9th generation Core "Coffee Lake" CPUs.

Intel MPX adds 4 new registers, BND0 to BND3, that each contains a pair of addresses. MPX also defines a bounds-table as a 2-level directory/table data structure in memory that contains sets of upper/lower bounds.
| Instruction | Opcode | Description |
| BNDMK b, m | | Make lower and upper bound from memory address expression. |
| BNDCL b, r/m | F3 0F 1A /r | Check address against lower bound. |
| BNDCU b, r/m | F2 0F 1A /r | Check address against upper bound in 1's-complement form. |
| BNDCN b, r/m | F2 0F 1B /r | Check address against upper bound. |
| BNDMOV b, b/m | 66 0F 1A /r | Move a pair of memory bounds to/from memory or between bounds registers. |
| BNDMOV b/m, b | | Move a pair of memory bounds to/from memory or between bounds registers. |
| BNDLDX b,mib | NP 0F 1A /r | Load bounds from the bounds table, with address translation driven by an SIB-addressing expression mib. |
| BNDSTX mib,b | NP 0F 1B /r | Store bounds into the bounds table, with address translation driven by an SIB-addressing expression mib. |
| BND | F2 | Instruction prefix used with certain branch instructions to indicate that they should not clear the bounds registers. |
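The bounds checks in the table above can be illustrated with a small software model. This is a minimal sketch, not a description of the hardware: the function names simply mirror the mnemonics, the 64-bit address width is an assumption, and the choice to have `bndmk` store the upper bound in 1's-complement form (so that it pairs with `bndcu`) is an assumption based on the BNDCU description.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address width


class BoundsViolation(Exception):
    """Stands in for the exception raised by a failed bounds check."""


def bndmk(base, offset):
    """Model of BNDMK: make bounds covering the byte range [base, base + offset]."""
    lower = base & MASK64
    upper = (base + offset) & MASK64
    # Assumption: the upper bound is kept in 1's-complement form.
    return (lower, ~upper & MASK64)


def bndcl(bnd, addr):
    """Model of BNDCL: fail if addr is below the lower bound."""
    if addr < bnd[0]:
        raise BoundsViolation(f"{addr:#x} below lower bound {bnd[0]:#x}")


def bndcu(bnd, addr):
    """Model of BNDCU: the stored upper bound is in 1's-complement form."""
    upper = ~bnd[1] & MASK64
    if addr > upper:
        raise BoundsViolation(f"{addr:#x} above upper bound {upper:#x}")


def bndcn(bnd, addr):
    """Model of BNDCN: the stored upper bound is used as-is."""
    if addr > bnd[1]:
        raise BoundsViolation(f"{addr:#x} above upper bound {bnd[1]:#x}")
```

In this model, a 256-byte buffer at address 0x1000 would be described by `bndmk(0x1000, 0xFF)`, and each pointer dereference would be preceded by a `bndcl` and a `bndcu` call.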

The XSAVE area that was used to save MPX registers on context switches has been repurposed for APX.

=== Hardware Lock Elision ===

The Hardware Lock Elision feature of Intel TSX is marked in the Intel SDM as removed from 2019 onwards. This feature took the form of two instruction prefixes, XACQUIRE and XRELEASE, that could be attached to memory atomics/stores to elide the memory locking that they represent.

| Instruction prefix | Opcode | Description |
| XACQUIRE | F2 | Instruction prefix to indicate start of hardware lock elision, used with memory atomic instructions only (for other instructions, the F2 prefix may have other meanings). When used with such instructions, may start a transaction instead of performing the memory atomic operation. |
| XRELEASE | F3 | Instruction prefix to indicate end of hardware lock elision, used with memory atomic/store instructions only (for other instructions, the F3 prefix may have other meanings). When used with such instructions during hardware lock elision, will end the associated transaction instead of performing the store/atomic. |
The XACQUIRE and XRELEASE prefixes can be used with the following instructions:
- LOCK-prefixed memory read-modify-write forms of the instructions XCHG, ADD, ADC, SUB, SBB, AND, OR, XOR, INC, DEC, NEG, NOT, BTC, BTR, BTS, XADD, CMPXCHG and CMPXCHG8B.
- The memory read-modify-write form of the XCHG instruction, regardless of LOCK prefix.
- XRELEASE can also be used with some store instructions without LOCK prefix: MOV mem,reg (opcodes 88h/89h) and MOV mem,imm (opcodes C6h/C7h).
These prefixes cannot be used with the CMPXCHG16B instruction, nor with the direct-offset store opcodes (opcodes A2h/A3h).
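The eligibility rules above can be summarized as a small predicate. This is only a sketch of the rules as listed here: the function name `hle_prefix_allowed` and its parameters are hypothetical, and the predicate says nothing about how a real processor treats an ineligible prefix.

```python
# Instructions whose LOCK-prefixed memory read-modify-write forms are eligible.
LOCKABLE_RMW = {
    "XCHG", "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR", "INC", "DEC",
    "NEG", "NOT", "BTC", "BTR", "BTS", "XADD", "CMPXCHG", "CMPXCHG8B",
}

# MOV mem,reg (88h/89h) and MOV mem,imm (C6h/C7h): eligible for XRELEASE only.
XRELEASE_STORE_OPCODES = {0x88, 0x89, 0xC6, 0xC7}


def hle_prefix_allowed(prefix, mnemonic, *, lock=False, mem_dest=True, opcode=None):
    """Return True if the listed rules allow `prefix` on the given instruction."""
    if prefix not in ("XACQUIRE", "XRELEASE"):
        raise ValueError("prefix must be XACQUIRE or XRELEASE")
    if not mem_dest:
        return False  # only forms with a memory destination qualify
    if mnemonic == "XCHG":
        return True  # eligible regardless of the LOCK prefix
    if lock and mnemonic in LOCKABLE_RMW:
        return True
    if prefix == "XRELEASE" and mnemonic == "MOV":
        return opcode in XRELEASE_STORE_OPCODES
    return False  # covers CMPXCHG16B and the A2h/A3h direct-offset stores
```

For example, `hle_prefix_allowed("XRELEASE", "MOV", opcode=0x89)` is true, while the same call with `"XACQUIRE"` is false.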

=== VP2INTERSECT instructions ===
The VP2INTERSECT instructions (an AVX-512 subset) were introduced in Tiger Lake (11th generation mobile Core processors), but were never officially supported on any other Intel processors. They are now considered deprecated and are listed in the Intel SDM as removed from 2023 onwards.

As of July 2024, the VP2INTERSECT instructions have been re-introduced on AMD Zen 5 processors.
| Instruction | Opcode | Description |
| VP2INTERSECTD k1+1, ymm2, ymm3/m256/m32bcst; VP2INTERSECTD k1+1, zmm2, zmm3/m512/m32bcst | | Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 32-bit lanes in the two vector source arguments. |
| VP2INTERSECTQ k1+1, ymm2, ymm3/m256/m64bcst; VP2INTERSECTQ k1+1, zmm2, zmm3/m512/m64bcst | | Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 64-bit lanes in the two vector source arguments. |
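The operation described in the table can be sketched in scalar form. This is an illustrative model under stated assumptions: the function name `vp2intersect` is hypothetical, lanes are plain Python integers, and it is assumed that the first mask of the pair flags lanes of the first source while the second mask flags lanes of the second source.

```python
def vp2intersect(a, b):
    """Return (mask_a, mask_b) for two equally sized lists of lane values.

    Bit i of mask_a is set if a[i] equals any lane of b;
    bit j of mask_b is set if b[j] equals any lane of a.
    """
    if len(a) != len(b):
        raise ValueError("both sources must have the same number of lanes")
    mask_a = 0
    mask_b = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                mask_a |= 1 << i
                mask_b |= 1 << j
    return mask_a, mask_b
```

With `a = [1, 2, 3, 4]` and `b = [4, 9, 1, 1]`, lanes 0 and 3 of `a` and lanes 0, 2 and 3 of `b` take part in a match, so the result is `(0b1001, 0b1101)`.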

=== Instructions specific to Xeon Phi processors ===

==== "Knights Corner" instructions ====

The first generation Xeon Phi processors, codenamed "Knights Corner" (KNC), supported a large number of instructions that are not seen in any later x86 processor. An instruction reference is available − the instructions/opcodes unique to KNC are the ones with VEX and MVEX prefixes (except for the KMOV, KNOT and KORTEST instructions − these are kept with the same opcodes and function in AVX-512, but with a "W" appended to their instruction names).

Most of these KNC-unique instructions are similar but not identical to instructions in AVX-512 − later Xeon Phi processors replaced these instructions with AVX-512.

Early versions of AVX-512 avoided the instruction encodings used by KNC's MVEX prefix. However, with the introduction of Intel APX (Advanced Performance Extensions) in 2023, some of the old KNC MVEX instruction encodings have been reused for new APX encodings. For example, a single instruction encoding can be accepted as valid by both KNC and APX, which assign different meanings to it:
- KNC: vector load with data conversion
- APX: vector load with one of the new APX extended-GPRs used as scaled index

==== "Knights Landing" and "Knights Mill" instructions ====

Some of the AVX-512 instructions in the Xeon Phi "Knights Landing" and later models belong to the AVX-512 subsets "AVX512ER", "AVX512_4FMAPS", "AVX512PF" and "AVX512_4VNNIW", all of which are unique to the Xeon Phi series of processors. The ER and PF subsets were introduced in "Knights Landing" − the 4FMAPS and 4VNNIW instructions were later added in "Knights Mill".

The ER and 4FMAPS instructions are floating-point arithmetic instructions that all follow a given pattern where:
- EVEX.W is used to specify floating-point format (0=FP32, 1=FP64)
- The bottom opcode bit is used to select between packed and scalar operation (0: packed, 1: scalar)
- For a given operation, all the scalar/packed variants belong to the same AVX-512 subset.
- The instructions all support result masking by opmask registers. The AVX512ER instructions also all support broadcast of memory operands.
- The only supported vector width is 512 bits.
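The first two points of this pattern can be read as a tiny decoder. The sketch below is only an illustration of that pattern; the function name `decode_er_4fmaps` is hypothetical and the function does not check that the opcode actually belongs to one of these subsets.

```python
def decode_er_4fmaps(opcode, evex_w):
    """Classify an ER/4FMAPS encoding by EVEX.W and the bottom opcode bit."""
    if evex_w not in (0, 1):
        raise ValueError("EVEX.W is a single bit")
    fp_format = "FP64" if evex_w else "FP32"           # EVEX.W selects the format
    operation = "scalar" if opcode & 1 else "packed"   # bottom opcode bit
    return fp_format, operation
```

For opcode C8 with W=0, this yields `("FP32", "packed")`, matching the packed FP32 exponential instruction VEXP2PS.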

| Operation | AVX-512 subset | Basic opcode | FP32 packed (W=0) | FP32 scalar (W=0) | FP64 packed (W=1) | FP64 scalar (W=1) | SAE support |
| Xeon Phi specific instructions (ER, 4FMAPS) | | | | | | | |
| Reciprocal approximation with an accuracy of $2^{-28}$ | ER | | | | | | SAE |
| Reciprocal square root approximation with an accuracy of $2^{-28}$ | ER | | | | | | SAE |
| Exponential $2^{x}$ approximation with $2^{-23}$ relative error | ER | EVEX.66.0F38 C8 /r | VEXP2PS z,z/m512 | | VEXP2PD z,z/m512 | | SAE |
| Fused-multiply-add, 4 iterations | 4FMAPS | | | | | | |
| Fused negate-multiply-add, 4 iterations | 4FMAPS | | | | | | |

The AVX512PF instructions are a set of 16 prefetch instructions. They all use VSIB encoding: a memory addressing mode using the SIB byte is required, and the index part of the SIB byte selects a register from the AVX512 vector register file rather than the GPR register file. The selected vector register is interpreted as a vector of indexes, so that the standard x86 base+index+displacement address calculation is performed once per vector lane, with one associated memory operation (a prefetch, in the case of the AVX512PF instructions) performed for each active lane. The instruction encodings all follow a pattern where:
- EVEX.W is used to specify format of the prefetchable data (0:FP32, 1:FP64)
- The bottom bit of the opcode is used to indicate whether the AVX512 index register is considered a vector of sixteen signed 32-bit indexes (bit 0 not set) or eight signed 64-bit indexes (bit 0 set)
- The instructions all support operation masking by opmask registers.
- The only supported vector width is 512 bits.
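The per-lane address calculation and the index-width rule can be sketched as follows. This is a simplified model, not a decoder: the function names are hypothetical, addresses are assumed to wrap at 64 bits, and the opmask is passed as a plain integer whose bit n enables lane n.

```python
def index_layout(opcode):
    """Return (lane count, index width in bits) from the bottom opcode bit."""
    return (8, 64) if opcode & 1 else (16, 32)


def vsib_addresses(base, indexes, scale, disp, mask, addr_bits=64):
    """Compute base + index*scale + disp for every lane enabled in `mask`."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("SIB scale must be 1, 2, 4 or 8")
    wrap = (1 << addr_bits) - 1
    return [
        (base + idx * scale + disp) & wrap
        for lane, idx in enumerate(indexes)  # indexes are signed integers
        if (mask >> lane) & 1                # skip lanes masked off by the opmask
    ]
```

Each returned address corresponds to one prefetch that the instruction would issue; lanes that are masked off produce no memory operation in this model.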

| Operation | Basic opcode | 32-bit indexes (opcode C6), FP32 prefetch (W=0) | 32-bit indexes (opcode C6), FP64 prefetch (W=1) | 64-bit indexes (opcode C7), FP32 prefetch (W=0) | 64-bit indexes (opcode C7), FP64 prefetch (W=1) |
| Prefetch into L1 cache (T0 hint) | | | | | |
| Prefetch into L2 cache (T1 hint) | | | | | |
| Prefetch into L1 cache (T0 hint) with intent to write | | | | | |
| Prefetch into L2 cache (T1 hint) with intent to write | | | | | |

The AVX512_4VNNIW instructions read a 128-bit data item from memory, containing 4 two-component vectors (each component being a signed 16-bit value). Then, for each of 4 consecutive AVX-512 source registers, they interpret every 32-bit lane as a two-component vector (signed 16-bit) and compute its dot-product with the corresponding two-component vector that was read from memory (the first two-component vector from memory is used for the first AVX-512 source register, and so on). These results are then accumulated into a destination vector register.
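The accumulation described above can be sketched in scalar form. This is a minimal model under stated assumptions: the function name `vp4dpw` and its data layout are hypothetical, the non-saturating form is assumed to wrap to signed 32 bits, and the saturating form is assumed to clamp after each of the 4 iterations.

```python
def _wrap32(x):
    """Wrap an integer to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def _sat32(x):
    """Clamp an integer to the signed 32-bit range."""
    return max(-(1 << 31), min((1 << 31) - 1, x))


def vp4dpw(acc, srcs, mem_pairs, saturate=False):
    """Accumulate 4 iterations of per-lane two-component dot products.

    acc:       list of signed 32-bit accumulators, one per lane
    srcs:      4 source registers, each a list of (lo, hi) signed 16-bit pairs
    mem_pairs: 4 (lo, hi) signed 16-bit pairs read from memory
    """
    if len(srcs) != 4 or len(mem_pairs) != 4:
        raise ValueError("expected 4 source registers and 4 memory pairs")
    fix = _sat32 if saturate else _wrap32
    out = list(acc)
    for reg, (m0, m1) in zip(srcs, mem_pairs):
        for lane, (s0, s1) in enumerate(reg):
            out[lane] = fix(out[lane] + s0 * m0 + s1 * m1)
    return out
```

The `saturate` flag distinguishes the two instruction variants listed in the table below.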

| Instruction | Opcode | Description |
| | | Dot-product of signed words with dword accumulation, 4 iterations |
| | | Dot-product of signed words with dword accumulation and saturation, 4 iterations |

Xeon Phi processors (from Knights Landing onwards) also featured the PREFETCHWT1 m8 instruction (opcode 0F 0D /2, prefetch into L2 cache with intent to write) − these were the only Intel CPUs to officially support this instruction, but it continues to be supported on some non-Intel processors (e.g. Zhaoxin YongFeng).

== AMD instructions ==

=== Am386 SMM instructions ===

A handful of instructions to support System Management Mode were introduced in the Am386SXLV and Am386DXLV processors. They were also present in the later Am486SXLV/DXLV and Elan SC300/310 processors.

The SMM functionality of these processors was implemented using Intel ICE microcode without a valid license, resulting in a lawsuit that AMD lost in late 1994. As a result of this loss, the ICE microcode was removed from all later AMD CPUs, and the SMM instructions removed with it.

| Instruction | Opcode | Description |
| SMI | F1 | Call SMM interrupt handler (only if DR7 bit 12 is set; not available on Am486SXLV/DXLV) |
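| UMOV r/m8, r8 | 0F 10 /r | Move data between registers and main system memory |
| UMOV r/m, r16/32 | 0F 11 /r | Move data between registers and main system memory |
| UMOV r8, r/m8 | 0F 12 /r | Move data between registers and main system memory |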
| RES3 | 0F 07 | Return from SMM interrupt handler (Am386SXLV/DXLV only). Takes a pointer in ES:EDI to a processor save state to resume from − this save state has a format nearly identical to that of the undocumented Intel 386 LOADALL instruction. |
| RES4 | 0F 07 | Return from SMM interrupt handler (Am486SXLV/DXLV only). Similar to RES3, but with a different save state format. |

These SMM instructions were also present on the IBM 386SLC and its derivatives (albeit with the LOADALL-like SMM return opcode 0F 07 named ICERET), as well as on the UMC U5S processor.

=== 3DNow! instructions ===

The 3DNow! instruction set extension was introduced in the AMD K6-2, mainly adding support for floating-point SIMD instructions using the MMX registers (two FP32 components in a 64-bit vector register). The instructions were mainly promoted by AMD, but were supported on some non-AMD CPUs as well. The processors supporting 3DNow! were:
- AMD K6-2, K6-III, and all processors based on the K7, K8 and K10 microarchitectures. (Later AMD microarchitectures such as Bulldozer, Bobcat and Zen do not support 3DNow!)
- IDT WinChip 2 and 3
- VIA Cyrix III (both "Joshua" and "Samuel" variants), and the "Samuel" and "Ezra" revisions of VIA C3. (Later VIA CPUs, from C3 "Nehemiah" onwards, dropped 3DNow! in favor of SSE.)
- National Semiconductor Geode GX2; AMD Geode GX and LX.

| Instruction | Opcode | Instruction description |
| PFADD mm1,mm2/m64 | 0F 0F /r 9E | Packed floating-point addition: dst <- dst + src |
| PFSUB mm1,mm2/m64 | 0F 0F /r 9A | Packed floating-point subtraction: dst <- dst − src |
| PFSUBR mm1,mm2/m64 | 0F 0F /r AA | Packed floating-point reverse subtraction: dst <- src − dst |
| PFMUL mm1,mm2/m64 | 0F 0F /r B4 | Packed floating-point multiplication: dst <- dst * src |
| PFMAX mm1,mm2/m64 | 0F 0F /r A4 | Packed floating-point maximum: dst <- (dst > src) ? dst : src |
| PFMIN mm1,mm2/m64 | 0F 0F /r 94 | Packed floating-point minimum: dst <- (dst < src) ? dst : src |
| PFCMPEQ mm1,mm2/m64 | 0F 0F /r B0 | Packed floating-point comparison, equal: dst <- (dst == src) ? 0xFFFFFFFF : 0 |
| PFCMPGE mm1,mm2/m64 | 0F 0F /r 90 | Packed floating-point comparison, greater than or equal: dst <- (dst >= src) ? 0xFFFFFFFF : 0 |
| PFCMPGT mm1,mm2/m64 | 0F 0F /r A0 | Packed floating-point comparison, greater than: dst <- (dst > src) ? 0xFFFFFFFF : 0 |
| PF2ID mm1,mm2/m64 | 0F 0F /r 1D | Converts packed floating-point operand to packed 32-bit signed integer, with round-to-zero |
| PI2FD mm1,mm2/m64 | | Packed 32-bit signed integer to floating-point conversion, with round-to-zero |
| PFRCP mm1,mm2/m64 | 0F 0F /r 96 | Floating-point reciprocal approximation (at least 14-bit precision) |
| PFRSQRT mm1,mm2/m64 | 0F 0F /r 97 | Floating-point reciprocal square root approximation (at least 15-bit precision) |
| | 0F 0F /r A6 | Packed floating-point reciprocal, first iteration step |
| | 0F 0F /r A7 | Packed floating-point reciprocal square root, first iteration step |
| | 0F 0F /r B6 | Packed floating-point reciprocal/reciprocal square root, second iteration step |
| PFACC mm1,mm2/m64 | 0F 0F /r AE | Floating-point accumulate (horizontal add) |
| | 0F 0F /r B7 | Multiply signed packed 16-bit integers with rounding and store the high 16 bits: dst <- ((dst * src) + 0x8000) >> 16 |
| PAVGUSB mm1,mm2/m64 | 0F 0F /r BF | Average of unsigned packed 8-bit integers: dst <- (src+dst+1) >> 1 |
| FEMMS | 0F 0E | Faster Enter/Exit of the MMX or x87 floating-point state |
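The two integer formulas in the table can be checked with a short scalar sketch. The function names below are hypothetical, each MMX register is modelled as a plain list of lane values, and Python's `>>` is used as an arithmetic (sign-preserving) shift.

```python
def mul_high_round16(dst, src):
    """Per lane: ((dst * src) + 0x8000) >> 16 on signed 16-bit integers."""
    return [((d * s) + 0x8000) >> 16 for d, s in zip(dst, src)]


def avg_unsigned8(dst, src):
    """Per lane: (src + dst + 1) >> 1 on unsigned 8-bit integers (PAVGUSB)."""
    return [(s + d + 1) >> 1 for d, s in zip(dst, src)]
```

Adding 0x8000 before the shift rounds the product to the nearest value instead of truncating it, and the `+ 1` in the average rounds halves upwards, so neither result can leave the range of its lane type.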

3DNow! also introduced a couple of prefetch instructions, one of them being PREFETCHW. These instructions, unlike the rest of 3DNow!, are not discontinued but continue to be supported on modern AMD CPUs. The PREFETCHW instruction is also supported on Intel CPUs starting with Pentium 4, albeit executed as NOP until Broadwell.

==== 3DNow+ instructions added with Athlon and K6-2+ ====

| Instruction | Opcode | Instruction description |
| PF2IW mm1,mm2/m64 | 0F 0F /r 1C | Packed 32-bit floating-point to 16-bit signed integer conversion, with round-to-zero |
| PI2FW mm1,mm2/m64 | 0F 0F /r 0C | Packed 16-bit signed integer to 32-bit floating-point conversion |
| PSWAPD mm1,mm2/m64 | | |
