AOS and SOA

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

In computing, AoS, SoA and SoS refer to contrasting ways to arrange a sequence of records in memory, with regard to interleaving, and are of interest in SIMD programming. SoS is the mix of the benefit of Array of Structure, and the benefits of Structure of array, it is Structure of Structure.

An Array of Structs (AoS) layout keeps each record as one contiguous block of memory, as is conventional in object-oriented systems. For example, to store N points in 3D space using AoS:

1 struct point3D {
2     float x;
3     float y;
4     float z;
5 };
6 struct point3D points[N];
7 float get_point_x(int i) { return points[i].x; }

A Struct of Arrays (SoA) instead stores each field's data contiguously across all records, which often simplifies SIMD programming:

1 struct pointlist3D {
2     float x[N];
3     float y[N];
4     float z[N];
5 };
6 struct pointlist3D points;
7 float get_point_x(int i) { return points.x[i]; }

A Struct of Structs (SoS) keeps only one SIMD register width of fields contiguous, which keeps the speed of SoA, but also gains the modular fixed-size structs of AoS. For example, if the SIMD register width is 4:

1 struct point3Dx4 {
2     float x[4];
3     float y[4];
4     float z[4];
5 };
6 struct point3Dx4 points[(N+3)/4];
7 float get_point_x(int i) { return points[i/4].x[i%4]; }

Structure of arrays[edit]

Structure of arrays (or SoA) is a layout separating elements of a record (or 'struct' in the C programming language) into one parallel array per field.[1] The motivation is easier manipulation with packed SIMD instructions in most instruction set architectures, since a single SIMD register can load homogeneous data, possibly transferred by a wide internal datapath (e.g. 128-bit). If only a specific part of the record is needed, only those parts need to be iterated over, allowing more data to fit onto a single cache line. The downside is requiring more cache ways when traversing data, and inefficient indexed addressing. (see also: planar image format)

Array of structures[edit]

Array of structures (or AoS) is the opposite (and more conventional) layout, in which data for different fields is interleaved. This is often more intuitive, and supported directly by most programming languages.

Structure of structures[edit]

Structure of structures (or SoS) is the merge of SoA and AoS layout, in which data for different fields is interleaved, but sorted by vector size equal to the SIMD vector size. This is often less intuitive, but a lot more friendly to modern cache and load port architectures of modern processors. programming languages.


Example[edit]

Here is an example of the two layouts written in C++:

  1 #include <iostream>
  2 #include <string>
  3 #include <vector>
  4 
  5 struct point
  6 {
  7     int x, y;
  8     point() = default;
  9     point(int x_, int y_) : x{x_}, y{y_} { }
 10     point& operator+=(point const& other)
 11     {
 12         x += other.x;
 13         y += other.y;
 14         return *this;
 15     }
 16 };
 17 
 18 // Array of structures approach
 19 namespace array_of_structures {
 20     // Each struct contains information about an individual ball.
 21     struct ball
 22     {
 23         point position;
 24         point velocity;
 25         std::string name;
 26     };
 27     using balls_t = std::vector<ball>;
 28 
 29     // Updating is as simple as looping over the array of balls.
 30     void update_positions(balls_t& bs)
 31     {
 32         for (auto& b : bs)
 33         {
 34             b.position += b.velocity;
 35         }
 36     }
 37 
 38     // If we care about a particular attribute (in this case, the name) we
 39     // have to load the entire ball into the function, which could be wasteful
 40     void print_names(balls_t const& bs)
 41     {
 42         for (ball const& b : bs)
 43         {
 44             std::cout << b.name << '\n';
 45         }
 46     }
 47 }
 48 
 49 // Structure of arrays approach
 50 namespace structure_of_arrays {
 51     using points_t = std::vector<point>;
 52 
 53     // The struct consists of arrays of each particular attribute of the ball.
 54     struct balls_t
 55     {
 56         balls_t() = default;
 57         balls_t(std::size_t count_)
 58           : count{count_}
 59           , positions{count_}
 60           , velocities{count_}
 61           , names{count_}
 62         {
 63         }
 64 
 65         std::size_t count;
 66 
 67         points_t positions;
 68         points_t velocities;
 69         std::vector<std::string> names;
 70     };
 71 
 72     // To iterate over the structure of arrays, we have to iterate over each of
 73     // the relevant arrays.
 74     void update_positions(balls_t& bs)
 75     {
 76         for (std::size_t i = 0; i < bs.count; ++i)
 77         {
 78             // Retrieve the information of the ball at this index
 79             auto& ip = bs.positions[i];
 80             auto& iv = bs.velocities[i];
 81             // Do stuff with it
 82             ip += iv;
 83         }
 84     }
 85 
 86     // If we only care about a particular attribute, we only have to load the
 87     // relevant arrays, and nothing else.
 88     void print_names(balls_t const& bs)
 89     {
 90         for (std::string const& bn : bs.names)
 91         {
 92             std::cout << bn << '\n';
 93         }
 94     }
 95 }
 96 
 97 int main()
 98 {
 99     constexpr std::size_t balls_count = 5;
100     {
101         array_of_structures::balls_t balls(balls_count);
102         for (std::size_t i = 0; i < balls_count; ++i)
103         {
104             array_of_structures::ball& b = balls[i];
105             b.position = point{0, 0};
106             b.velocity =
107                 point{static_cast<int>(i) + 1, static_cast<int>(i) + 1};
108             b.name = "AoS ball #" + std::to_string(i);
109         }
110         array_of_structures::update_positions(balls);
111 
112         std::cout << std::string(80, '=') << '\n';
113         for (auto const& b : balls)
114         {
115             std::cout << b.name << " x: " << b.position.x << ", "
116                       << " y: " << b.position.y << '\n';
117         }
118 
119         std::cout << std::string(80, '-') << '\n';
120         array_of_structures::print_names(balls);
121     }
122     {
123         structure_of_arrays::balls_t balls(balls_count);
124         for (std::size_t i = 0; i < balls_count; ++i)
125         {
126             balls.positions[i] = point{0, 0};
127             balls.velocities[i] =
128                 point{static_cast<int>(i) + 1, static_cast<int>(i) + 1};
129             balls.names[i] = "SoA ball #" + std::to_string(i);
130         }
131         structure_of_arrays::update_positions(balls);
132 
133         std::cout << std::string(80, '=') << '\n';
134         for (std::size_t i = 0; i < balls_count; ++i)
135         {
136             std::cout << balls.names[i] << " x: " << balls.positions[i].x
137                       << ", "
138                       << " y: " << balls.positions[i].y << '\n';
139         }
140 
141         std::cout << std::string(80, '-') << '\n';
142         structure_of_arrays::print_names(balls);
143     }
144 }

Alternatives[edit]

A hybrid approach is possible, e.g. de-interleaving a number of objects that correspond to the number of SIMD lanes.

It is also possible to split some subset of a structure (rather than each individual field) into a parallel array – this can actually improve locality of reference if different pieces of fields are used at different times in the program (see data oriented design).

Some SIMD architectures provide strided load/store instructions to load homogeneous data from SoA format. Yet another option used in some Cell libraries is to de-interleave data from AoS format when loading sources into registers, and interleave when writing out results (facilitated by superscalar issue of permutes). Some vector maths libraries align floating point 4D vectors with the SIMD register to leverage the associated data path and instructions, whilst still providing programmer convenience, although this does not scale to SIMD units wider than four lanes.

In the GPU stream processing model, SoA layout is generally not required, as they provide true vector addressing.

4D vectors[edit]

AoS vs SoA presents a choice when considering 3D or 4D vector data on machines with 4-lane SIMD hardware. SIMD ISAs are usually designed for homogeneous data, however some provide a dot product instruction[2] and additional permutes making the AoS case easier to handle. Although most GPU hardware has moved away from 4D instructions to scalar SIMT pipelines,[3] modern compute kernels using SoA can still give better performance due to memory coalescing [4].

Software support[edit]

Most languages support the AoS format more naturally by combining records and various array abstract data types. The experimental JAI programming language is a recent attempt to provide language level SoA support.[5] Julia language, supports multi-dimensional arrays, with AoS, or SoA (through a package). The Datadraw code generator creates SoA data structures for the C language.

References[edit]

  1. ^ "How to Manipulate Data Structure to Optimize Memory Use". Intel. 2012-02-09. Retrieved 2019-03-17.
  2. ^ "Intel SSE4 Floating Point Dot Product Intrinsics". Intel. Retrieved 2019-03-17.
  3. ^ "Modern GPU Architecture (See Scalar Unified Pipelines)" (PDF). NVIDIA. Retrieved 2019-03-17.
  4. ^ Kim, Hyesoon (2010-02-08). "CUDA Optimization Strategies" (PDF). CS4803 Design Game Consoles. Retrieved 2019-03-17.
  5. ^ Blow, Jonathan (2015-01-21). "Data oriented demo: SoA, composition". Retrieved 2019-03-17. Demonstration of data-oriented and SoA features in the JAI language, also explaining the motivation.