Gather/scatter (vector addressing)

From Wikipedia, the free encyclopedia

Gather/scatter is a type of memory addressing that at once collects (gathers) from, or stores (scatters) data to, multiple, arbitrary indices. Examples of its use include sparse linear algebra operations,[1] sorting algorithms, fast Fourier transforms,[2] and some computational graph theory problems.[3] It is the vector equivalent of register indirect addressing, with gather involving indexed reads, and scatter, indexed writes. Vector processors (and some SIMD units in CPUs) have hardware support for gather and scatter operations.



A sparsely populated vector holding non-empty elements can be represented by two densely populated vectors of length ; containing the non-empty elements of , and giving the index in where 's element is located. The gather of into , denoted , assigns with having already been calculated.[4] Assuming no pointer aliasing between x[], y[],idx[], a C implementation is

for (i = 0; i < N; ++i)
    x[i] = y[idx[i]];


The sparse scatter, denoted is the reverse operation. It copies the values of into the corresponding locations in the sparsely populated vector , i.e. .

for (i = 0; i < N; ++i)
    y[idx[i]] = x[i];


x86-64 CPUs which support the AVX2 instruction set can gather 32-bit and 64-bit elements with memory offsets from a base address. A second register determines whether the particular element is loaded, and faults occurring from invalid memory accesses by masked-out elements are suppressed.[5]: 503–4  The AVX-512 instruction set also contains (potentially masked) scatter operations.[5]: 539 [6] The ARM instruction set's Scalable Vector Extension includes gather and scatter operations on 8-, 16-, 32- and 64-bit elements.[7][8] InfiniBand has hardware support for gather/scatter.[9]

Without instruction-level gather/scatter, efficient implementations may need to be tuned for optimal performance, for example with prefetching; libraries such as OpenMPI may provide such primitives.[2][7]

See also[edit]


  1. ^ Lewis, John G.; Simon, Horst D. (1 March 1988). "The Impact of Hardware Gather/Scatter on Sparse Gaussian Elimination". SIAM Journal on Scientific and Statistical Computing. 9 (2): 304–311. doi:10.1137/0909019.
  2. ^ a b He, Bingsheng; Govindaraju, Naga K.; Luo, Qiong; Smith, Burton (2007). "Efficient gather and scatter operations on graphics processors" (PDF). Proceedings of the 2007 ACM/IEEE Conference on Supercomputing - SC '07: 1. doi:10.1145/1362622.1362684. ISBN 9781595937643. S2CID 2928233.
  3. ^ Kumar, Manoj; Serrano, Mauricio; Moreira, Jose; Pattnaik, Pratap; Horn, W P; Jann, Joefon; Tanase, Gabriel (September 2016). "Efficient implementation of scatter-gather operations for large scale graph analytics". 2016 IEEE High Performance Extreme Computing Conference (HPEC): 1–7. doi:10.1109/HPEC.2016.7761578. ISBN 978-1-5090-3525-0. S2CID 10566760.
  4. ^ BLAS Technical Forum standard, Chapter 3: Sparse BLAS.
  5. ^ a b Kusswurm, Daniel (2022). Modern parallel programming with C++ and Assembly language : X86 SIMD development using AVX, AVX2, and AVX-512. Apress Media. ISBN 978-1-4842-7917-5.
  6. ^ Hossain, Md Maruf; Saule, Erik (9 August 2021). "Impact of AVX-512 Instructions on Graph Partitioning Problems". 50th International Conference on Parallel Processing Workshop: 1–9. doi:10.1145/3458744.3473362. ISBN 9781450384414. S2CID 237350994.
  7. ^ a b Zhong, Dong; Shamis, Pavel; Cao, Qinglei; Bosilca, George; Sumimoto, Shinji; Miura, Kenichi; Dongarra, Jack (May 2020). "Using Arm Scalable Vector Extension to Optimize OPEN MPI" (PDF). 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID): 222–231. doi:10.1109/CCGrid49817.2020.00-71. ISBN 978-1-7281-6095-5. S2CID 220604878.
  8. ^ "What is the Scalable Vector Extension?". ARM Developer. Retrieved 19 November 2022.
  9. ^ Gainaru, Ana; Graham, Richard L.; Polyakov, Artem; Shainer, Gilad (25 September 2016). "Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All". Proceedings of the 23rd European MPI Users' Group Meeting: 167–179. doi:10.1145/2966884.2966918. ISBN 9781450342346. S2CID 15880901.