Gather/scatter is a type of memory addressing that, in a single operation, collects (gathers) data from, or stores (scatters) data to, multiple arbitrary indices. Examples of its use include sparse linear algebra operations,[1] sorting algorithms, fast Fourier transforms,[2] and some computational graph theory problems.[3] It is the vector equivalent of register indirect addressing: gather performs indexed reads, and scatter performs indexed writes. Vector processors (and some SIMD units in CPUs) have hardware support for gather and scatter operations.

## Definitions

### Gather

A sparsely populated vector ${\displaystyle y}$ holding ${\displaystyle N}$ non-empty elements can be represented by two densely populated vectors of length ${\displaystyle N}$: ${\displaystyle x}$, containing the non-empty elements of ${\displaystyle y}$, and ${\displaystyle idx}$, giving the index in ${\displaystyle y}$ at which each element of ${\displaystyle x}$ is located. The gather of ${\displaystyle y}$ into ${\displaystyle x}$, denoted ${\displaystyle x\leftarrow y|_{x}}$, assigns ${\displaystyle x(i)=y(idx(i))}$, with ${\displaystyle idx}$ having already been calculated.[4] Assuming no pointer aliasing between x[], y[], and idx[], a C implementation is:

```c
/* Gather: pack the elements of y selected by idx into x. */
for (size_t i = 0; i < N; ++i)
    x[i] = y[idx[i]];
```
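
As a self-contained illustration, the loop above can be wrapped in a function. This is a minimal sketch rather than a reference implementation: the function name `gather`, the element type `double`, and the extra `y_len` parameter (used only to check that every index is in range) are assumptions made for the example. The `restrict` qualifiers express the no-aliasing assumption stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: x[i] = y[idx[i]] for every i in [0, n).
 * y_len is a hypothetical extra parameter, used only to validate indices. */
void gather(double *restrict x, const double *restrict y,
            const size_t *restrict idx, size_t n, size_t y_len)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < y_len);   /* every index must fall inside y */
        x[i] = y[idx[i]];         /* indexed read, dense write */
    }
}
```

For example, with `y = {0, 3, 0, 0, 7, 0, 1, 0}` and `idx = {1, 4, 6}`, calling `gather(x, y, idx, 3, 8)` fills `x` with `{3, 7, 1}`.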

### Scatter

The sparse scatter, denoted ${\displaystyle y|_{x}\leftarrow x}$, is the reverse operation. It copies the values of ${\displaystyle x}$ into the corresponding locations in the sparsely populated vector ${\displaystyle y}$, i.e. ${\displaystyle y(idx(i))=x(i)}$. Under the same no-aliasing assumption, a C implementation is:

```c
/* Scatter: write each element of x to the position in y given by idx. */
for (size_t i = 0; i < N; ++i)
    y[idx[i]] = x[i];
```
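
The scatter loop above can likewise be packaged as a function. The sketch below makes the same assumptions as any illustrative wrapper would: the name `scatter`, the element type `double`, and the `y_len` bounds-checking parameter are hypothetical and not part of any standard interface.

```c
#include <assert.h>
#include <stddef.h>

/* Scatter: y[idx[i]] = x[i] for every i in [0, n).
 * Elements of y whose positions do not appear in idx are left untouched.
 * y_len is a hypothetical extra parameter, used only to validate indices. */
void scatter(double *restrict y, const double *restrict x,
             const size_t *restrict idx, size_t n, size_t y_len)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < y_len);   /* every index must fall inside y */
        y[idx[i]] = x[i];         /* dense read, indexed write */
    }
}
```

Starting from a zero-filled `y` of length 8, scattering `x = {3, 7, 1}` with `idx = {1, 4, 6}` yields `y = {0, 3, 0, 0, 7, 0, 1, 0}`. If `idx` contains a repeated entry, this sequential loop keeps the value written last.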

## Support

x86-64 CPUs that support the AVX2 instruction set can gather 32-bit and 64-bit elements with memory offsets from a base address. A second (mask) register determines whether each element is loaded, and faults occurring from invalid memory accesses by masked-out elements are suppressed.[5]: 503–4 The AVX-512 instruction set also contains (potentially masked) scatter operations.[5]: 539 [6] The ARM instruction set's Scalable Vector Extension includes gather and scatter operations on 8-, 16-, 32- and 64-bit elements.[7][8] InfiniBand has hardware support for gather/scatter.[9]
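
As a rough illustration of the AVX2 case, the following sketch gathers 32-bit elements eight at a time with the `_mm256_i32gather_epi32` intrinsic and falls back to the scalar loop elsewhere. It assumes GCC or Clang (for the `target` attribute and `__builtin_cpu_supports`), indices that are non-negative and in range, and no aliasing between the arrays; the names `gather8_avx2`, `gather_i32` and `HAVE_AVX2_PATH` are hypothetical. It uses the unmasked form of the instruction, so every element is loaded.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX2_PATH 1

/* The intrinsic takes an int pointer; this sketch passes int32_t data. */
static_assert(sizeof(int) == sizeof(int32_t), "int must be 32 bits wide");

/* Gather eight 32-bit elements with a single AVX2 gather. */
__attribute__((target("avx2")))
static void gather8_avx2(int32_t *x, const int32_t *y, const int32_t *idx)
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);
    /* Scale 4: indices count 4-byte elements, the hardware adds byte offsets. */
    __m256i v = _mm256_i32gather_epi32((const int *)y, vidx, 4);
    _mm256_storeu_si256((__m256i *)x, v);
}
#endif

/* x[i] = y[idx[i]] for i in [0, n); vector path when available at run time. */
void gather_i32(int32_t *x, const int32_t *y, const int32_t *idx, size_t n)
{
    size_t i = 0;
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        for (; i + 8 <= n; i += 8)
            gather8_avx2(x + i, y, idx + i);
    }
#endif
    for (; i < n; ++i)            /* scalar tail, or full scalar fallback */
        x[i] = y[idx[i]];
}
```

The run-time check keeps the function usable on CPUs without AVX2, at the cost of one branch per call. Whether the vector path is actually faster than the scalar loop depends on the processor and the access pattern.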

Without instruction-level gather/scatter, implementations may need tuning to perform well, for example with prefetching; libraries such as OpenMPI may provide such primitives.[2][7]
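
One way such tuning might look is a scalar gather that issues a software prefetch for an element it will need a few iterations later. This is only a sketch: `__builtin_prefetch` is a GCC/Clang builtin, the function name `gather_prefetch` is hypothetical, and the look-ahead distance `PREFETCH_DISTANCE` is an arbitrary assumed value that would need to be measured on the target machine. Prefetching may bring no benefit, or even a slowdown, on some hardware.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 16
static_assert(PREFETCH_DISTANCE > 0, "look-ahead must be positive");

/* Scalar gather, x[i] = y[idx[i]], with a read prefetch of a later element. */
void gather_prefetch(double *restrict x, const double *restrict y,
                     const size_t *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        /* Hint only: request y[idx[i + D]] early; 0 = read, 0 = low locality. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&y[idx[i + PREFETCH_DISTANCE]], 0, 0);
#endif
        x[i] = y[idx[i]];
    }
}
```

The prefetch is a hint and does not change the result, so the function produces the same output as the plain gather loop.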