Heterogeneous System Architecture

Steps when offloading calculations to the GPU without HSA.
Steps when offloading calculations to the GPU and making use of HSA.

Heterogeneous System Architecture (HSA) is a type of computer processor architecture that integrates central processing units and graphics processors on the same bus, with shared tasking and memory.[1] HSA is being developed by the HSA Foundation, which includes (among many others) AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and to make these devices more compatible from a programmer's perspective,[2]:3[3] relieving the programmer of the task of planning the movement of data between devices' disjoint memories (as must be done with OpenCL or CUDA).[4]

Heterogeneous computing is widely used in system-on-chip devices, such as tablets, smartphones, and other mobile devices.[5] HSA allows programs to use the graphics processor for floating point calculations without separate memory or scheduling.[6]


Heterogeneous System Architecture is a set of features that define a system architecture intended to facilitate the spread of heterogeneous computing, i.e. the operation of systems that contain multiple kinds of processing units: CPUs, GPUs, DSPs and/or any other type of ASIC. The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU.

Among its main features, HSA defines a unified virtual address space for compute devices: where GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share page tables so that they can exchange data by sharing pointers. This is to be supported by custom memory management units.[2]:6–7
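The contrast between copying data and sharing pointers can be sketched with a toy model in Python. This is an analogy only: plain lists stand in for memories, a Python reference stands in for a shared pointer, and the function names (`offload_with_copies`, `offload_with_shared_pointer`) are hypothetical and not part of any HSA API.

```python
# Toy model of offloading work to an accelerator.
# All names are hypothetical illustrations, not an HSA interface.

def offload_with_copies(host_data, kernel):
    """Disjoint memories: data is copied to the device and back."""
    elements_copied = 0
    device_memory = list(host_data)           # copy host -> device
    elements_copied += len(host_data)
    for i, x in enumerate(device_memory):     # "kernel" runs on the device copy
        device_memory[i] = kernel(x)
    host_data[:] = device_memory              # copy device -> host
    elements_copied += len(device_memory)
    return elements_copied


def offload_with_shared_pointer(host_data, kernel):
    """Shared address space: only a reference to the buffer is handed over."""
    shared = host_data                        # same object, nothing copied
    for i, x in enumerate(shared):            # "kernel" works on the data in place
        shared[i] = kernel(x)
    return 0                                  # no elements were copied


a = [1, 2, 3]
b = [1, 2, 3]
copies_a = offload_with_copies(a, lambda x: x * 2)
copies_b = offload_with_shared_pointer(b, lambda x: x * 2)
```

Both calls leave the same result in the caller's list. The first moves every element twice, while the second moves none, which is the saving that a shared address space is meant to provide.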

To make interoperability possible and to ease various aspects of programming, HSA is intended to:

  • Be ISA-agnostic for both CPUs and accelerators.
  • Support high-level programming languages.

So far, the HSA specifications cover:

  • HSA Intermediate Language (HSAIL)
    • virtual instruction set for parallel programs
    • similar to LLVM IR and OpenCL SPIR
    • finalized to a specific instruction set by a JIT compiler
    • allows late decisions on which core(s) should run a task
    • explicitly parallel
    • supports exceptions, virtual functions and other high-level features
    • syscall methods (I/O, printf, etc.)
    • debugging support
  • HSA memory model
    • compatible with C++11, OpenCL, Java and .NET memory models
    • relaxed consistency
    • designed to support both managed languages (e.g. Java) and unmanaged languages (e.g. C)
    • intended to make it easier to develop third-party compilers for a wide range of heterogeneous products programmed in Fortran, C++, C++ AMP, Java, and other languages
  • HSA dispatcher and run-time
    • designed to enable heterogeneous task queueing: a work queue per core, distribution of work into queues, load balancing by work stealing
    • any core can schedule work for any other, including itself
    • significant reduction of overhead of scheduling work for a core
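
The queueing scheme described for the dispatcher (one work queue per core, with load balancing by work stealing) can be illustrated with a small single-threaded simulation in Python. This is a minimal sketch under stated assumptions, not the HSA run-time: the `Core` class and `run_with_work_stealing` function are hypothetical names, and the "cores" take turns in a loop rather than running in parallel.

```python
from collections import deque


class Core:
    """A compute unit with its own work queue (hypothetical toy class)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.done = []                        # results of tasks this core ran

    def enqueue(self, task):
        # Any core may call this on any core, including itself.
        self.queue.append(task)


def run_with_work_stealing(cores):
    """Run until all queues are empty; idle cores steal from the fullest queue."""
    while any(core.queue for core in cores):
        for core in cores:
            if core.queue:
                task = core.queue.popleft()   # own work: taken from the front
            else:
                victim = max(cores, key=lambda c: len(c.queue))
                if not victim.queue:
                    continue                  # nothing left to steal
                task = victim.queue.pop()     # stolen work: taken from the back
            core.done.append(task())


cpu0, gpu0 = Core("cpu0"), Core("gpu0")
for i in range(4):
    cpu0.enqueue(lambda i=i: i * i)           # all work initially lands on cpu0
run_with_work_stealing([cpu0, gpu0])
```

Although all four tasks are queued on `cpu0`, `gpu0` ends up running half of them because it steals from the opposite end of `cpu0`'s queue whenever its own queue is empty.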

Hardware implementations

Mobile devices are one of HSA's application areas, where it yields improved power efficiency.[5]

Standard architecture with a discrete GPU attached to the PCI Express bus. Zero-copy between the GPU and CPU is not possible due to distinct physical memories. 
HSA brings unified virtual memory, and facilitates passing pointers over PCI Express instead of copying the data in its entirety.
In partitioned main memory, one part of the system memory is exclusively allocated to the GPU. As a result, zero-copy operations are not possible.
Unified main memory, made possible by a combination of HSA-enabled GPU and CPU. As a result, it is possible to perform zero-copy operations.[7] 

Software support

For Radeon graphics, there are both a DRM kernel driver, which is part of the free and open-source driver, and a proprietary kernel blob, which is part of AMD Catalyst. As of GDC 2014, AMD was exploring a strategy for Catalyst to use the DRM driver.[8]

Some of the HSA-specific features implemented in the hardware need to be supported by the operating system's kernel and by specific device drivers. For example, in July 2014 AMD published a set of 83 patches to be merged into version 3.17 of the Linux kernel mainline, aimed at supporting its Radeon and AMD FirePro graphics cards, and APUs based on Graphics Core Next (GCN).[9] This first implementation focuses on "Kaveri" or "Berlin" APUs and works alongside the existing radeon kernel graphics driver (kgd).

Integrated support for HSA platforms has been announced for version 9 of the Java Virtual Machine, due in 2015.[10]
