Streaming SIMD Extensions

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In computing, Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! (which had debuted a year earlier). SSE contains 70 new instructions.

It was originally known as KNI for Katmai New Instructions (Katmai being the code name for the first Pentium III core revision). During the Katmai project Intel was looking to distinguish it from their earlier product line, particularly their flagship Pentium II. It was later renamed ISSE, for Internet Streaming SIMD Extensions, then SSE. AMD eventually added support for SSE instructions, starting with its Athlon XP and Duron (Morgan core) processors.

Intel's first IA-32 SIMD effort, MMX, had two main problems: it re-used existing floating point registers making the CPU unable to work on both floating point and SIMD data at the same time, and it only worked on integers.

Contents

[edit] Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The AMD64 extensions from AMD (originally called x86-64 and later duplicated by Intel) add a further eight registers XMM8 through XMM15. There is also a new 32-bit control/status register, MXCSR. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode.

XMM registers.svg

Each register packs together:

The Integer operations have instructions for signed and unsigned variants. Integer SIMD operations may still be performed with the eight 64-bit MMX registers.

Because these 128-bit registers are additional program states that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions which can save all x86 and SSE register states all at once. This support was quickly added to all major IA-32 operating systems.

Because SSE adds floating point support, it sees much more use than MMX. The addition of SSE2's integer support makes SSE even more flexible. While MMX is redundant, operations can be operated in parallel with SSE operations offering further performance increases in some situations.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the FPU. While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock-cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating point operations to be mixed without the performance hit from explicit MMX/floating point mode switching.

[edit] SSE Instructions

SSE introduced both scalar and packed floating point instructions.

Floating point instructions

Integer instructions

Other instructions

[edit] Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single precision, 4-component vectors together using x86 requires four floating point addition instructions

vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;

This would correspond to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128 bit 'packed-add' instruction can replace the four scalar addition instructions.

movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x 
addps xmm0,address-of-v2 ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x movaps address-of-vec_res,xmm0

[edit] Later versions

[edit] References

Personal tools
Namespaces
Variants
Actions
Navigation
Interaction
Toolbox
Print/export
Languages