ROCm

ROCm
Developer(s)	AMD
Initial release	November 14, 2016; 8 years ago
Stable release	5.3.0 / October 4, 2022; 2 years ago
Repository	Meta-repository; github.com/radeonopencompute/rocm
Written in	C, C++, Python, Fortran, Julia
Middleware	HIP
Engine	AMDGPU kernel driver, HIPCC, an LLVM-based compiler
Operating system	Linux, Windows
Platform	Supported GPUs
Predecessor	Close to Metal, Stream, HSA
Size	<2 GiB
Type	GPGPU libraries and APIs
License	Libre
Website	www.amd.com/en/graphics/servers-solutions-rocm

ROCm^[2] is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high-performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP/Message Passing Interface (MPI) (directive-based programming), and OpenCL.

ROCm is free, libre and open-source software (except for the GPU firmware blobs^[3]); it is distributed under various licenses.

Background

The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream.

ROCm was launched in 2016^[4] with the Boltzmann Initiative.^[5] The ROCm stack builds upon previous AMD GPU stacks: some tools trace back to GPUOpen, others to the Heterogeneous System Architecture (HSA).

Heterogeneous System Architecture

HSA aimed to produce a mid-level, hardware-agnostic intermediate representation that could be JIT-compiled for the eventual hardware (GPU, FPGA, etc.) by the appropriate finalizer. ROCm dropped this approach: it builds only GPU code, using LLVM and its upstreamed AMDGPU backend,^[6] although research on such enhanced modularity continues with LLVM MLIR.^[7]

Microsoft C++ AMP 1.2

Programming abilities

ROCm as a stack ranges from the kernel driver to end-user applications. AMD offers introductory videos on AMD GCN hardware^[8] and ROCm programming^[9] through its learning portal.^[10]

A detailed technical introduction to the stack and to ROCm/HIP programming was published on Reddit.^[11]

High-level programming

HIP programming

HIP (HCC) kernel language

Memory allocation

NUMA

Heterogeneous Memory Model and Shared Virtual Memory

ROCm code objects

Compute/Graphics interop

Low-level programming

Hardware support

ROCm is primarily targeted at discrete professional GPUs,^[1] but unofficial support includes Vega-family and RDNA 2 consumer GPUs.

Accelerated Processing Units (APUs) are "enabled" but not officially supported; getting ROCm to work on them takes considerable effort.^[12]

Professional-grade GPUs

AMD Instinct accelerators, alongside the prosumer Radeon Pro GPU series, are first-class ROCm citizens: they mostly see full support.

As of January 2022, the only consumer-grade GPU with comparable support is the Radeon VII (GCN 5, Vega).

Consumer-grade GPUs

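Name of GPU series	Southern Islands	Sea Islands	Volcanic Islands	Arctic Islands/Polaris	Vega	Navi 1X	Navi 2X
Released	Jan 2012	Sep 2013	Jun 2015	Jun 2016	Jun 2017	Jul 2019	Nov 2020
Marketing name	Radeon HD 7000	Radeon Rx 200	Radeon Rx 300	Radeon RX 400/500	Radeon RX Vega/Radeon VII (7 nm)	Radeon RX 5000	Radeon RX 6000
AMD support
Instruction set	GCN	GCN	GCN	GCN	GCN	RDNA	RDNA
Microarchitecture	GCN 1st gen	GCN 2nd gen	GCN 3rd gen	GCN 4th gen	GCN 5th gen	RDNA	RDNA 2
Type	Unified shader model	Unified shader model	Unified shader model	Unified shader model	Unified shader model	Unified shader model	Unified shader model
ROCm^[13]
OpenCL	1.2 (on Linux: 1.1 (no Image support) with Mesa 3D)	2.0 (Adrenalin driver on Win7+) (on Linux: 1.1 (no Image support) with Mesa 3D, 2.0 with AMD drivers or AMD ROCm)	2.0 (Adrenalin driver on Win7+) (on Linux: 1.1 (no Image support) with Mesa 3D, 2.0 with AMD drivers or AMD ROCm)	2.0 (Adrenalin driver on Win7+) (on Linux: 1.1 (no Image support) with Mesa 3D, 2.0 with AMD drivers or AMD ROCm)	2.0 (Adrenalin driver on Win7+) (on Linux: 1.1 (no Image support) with Mesa 3D, 2.0 with AMD drivers or AMD ROCm)	2.0	2.1^[14]
Vulkan	1.0 (Win 7+ or Mesa 17+)	1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0)	1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0)	1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0)	1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0)	1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0)	1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0)
Shader model	5.1	5.1 6.3	5.1 6.3	5.1 6.3	6.4	6.4	6.5
OpenGL	4.6 (on Linux: 4.6 (Mesa 3D 20.0))	4.6 (on Linux: 4.6 (Mesa 3D 20.0))	4.6 (on Linux: 4.6 (Mesa 3D 20.0))	4.6 (on Linux: 4.6 (Mesa 3D 20.0))	4.6 (on Linux: 4.6 (Mesa 3D 20.0))	4.6 (on Linux: 4.6 (Mesa 3D 20.0))	4.6 (on Linux: 4.6 (Mesa 3D 20.0))
Direct3D	11 (11_1) 12 (11_1)	11 (12_0) 12 (12_0)	11 (12_0) 12 (12_0)	11 (12_0) 12 (12_0)	11 (12_1) 12 (12_1)	11 (12_1) 12 (12_1)	11 (12_1) 12 (12_2)
/drm/amdgpu^[a]	Experimental^[15]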

^ DRM (Direct Rendering Manager) is a component of the Linux kernel.

Software ecosystem

Learning resources

An AMD ROCm product manager gave a tour of the stack.^[16]

Third-party integration

The main consumers of the stack are machine learning and high-performance computing/GPGPU applications.

Machine learning

Various deep learning frameworks have a ROCm backend:^[17]

PyTorch
TensorFlow
ONNX
MXNet
CuPy^[18]
MIOpen
Caffe
IREE (which uses LLVM's Multi-Level Intermediate Representation (MLIR))

Supercomputing

ROCm is gaining traction in the TOP500 list of supercomputers.^[19] It is used on the exascale supercomputers El Capitan^[20]^[21] and Frontier.

Some related software can be found on AMD Infinity Hub.

Other acceleration & graphics interoperation

Since version 3.0, Blender can use HIP compute kernels in its Cycles renderer.^[22]

Other Languages

Julia

Julia has the AMDGPU.jl package,^[23] which integrates with LLVM and with selected components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which LLVM then consumes to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code.

AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned.

Software distribution

Official

ROCm software is currently spread across dozens of public GitHub repositories. The main public meta-repository contains an XML manifest for each official release; using git-repo, a version control tool built on top of Git, is the recommended way to synchronize with the stack locally.^[24]

As of early 2022, the release of ROCm 5.1 was expected around mid-February, given a cadence of one minor release per month.^[17]


Stack area	Public GitHub organisation
Low-level (mostly)	https://github.com/radeonopencompute
Mid-level (mostly)	https://github.com/rocm-developer-tools
High-level (mostly)	https://github.com/rocmsoftwareplatform/

AMD has started distributing containerized applications for ROCm, notably scientific research applications gathered under AMD Infinity Hub.^[25]

AMD itself distributes packages tailored to various Linux distributions.

Third-party

There is a growing third-party ecosystem packaging ROCm.

Several Linux distributions officially (natively) package ROCm, with various degrees of progress: Arch,^[26] Gentoo,^[27] Debian and Fedora,^[28] GNU Guix and NixOS.

Spack packages are also available.^[29]

Components

The stack comprises roughly a hundred components: a single kernel-space component, ROCk, and user-space modules for all the rest.

The unofficial typographic convention is an uppercase "ROC" followed by lowercase letters for low-level libraries (e.g. ROCt), and the reverse for user-facing libraries (e.g. rocBLAS).^[30]
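The convention can be expressed as a check on the first four characters of a component name. This is a purely illustrative sketch: `classify_rocm_name` and `LibraryLevel` are hypothetical names, and not every ROCm name follows the rule (the stack's own name, ROCm, would be classified as low-level by this check).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical classification following the unofficial naming convention:
//   "ROC" + lowercase letter -> low-level library (e.g. ROCt, ROCr)
//   "roc" + uppercase letter -> user-facing library (e.g. rocBLAS)
enum class LibraryLevel { LowLevel, UserFacing, Unknown };

LibraryLevel classify_rocm_name(const std::string& name) {
    // Need the three-letter prefix plus at least one following character.
    if (name.size() < 4) return LibraryLevel::Unknown;
    const std::string prefix = name.substr(0, 3);
    const unsigned char next = static_cast<unsigned char>(name[3]);
    if (prefix == "ROC" && std::islower(next)) return LibraryLevel::LowLevel;
    if (prefix == "roc" && std::isupper(next)) return LibraryLevel::UserFacing;
    return LibraryLevel::Unknown;  // e.g. HIP, MIOpen
}
```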

AMD actively develops with the LLVM community, but upstreaming takes time and, as of January 2022, was still lagging.^[31] AMD still officially packages various LLVM forks^[32]^[33]^[7] for parts that are not yet upstreamed, such as compiler optimizations destined to remain proprietary, debug support and OpenMP offloading, among others.

Low-level

ROCk - Kernel driver

ROCm - Device libraries

These are support libraries implemented as LLVM bitcode. They provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc.
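Launch-parameter queries are what let each work-item locate its share of the data. As a hedged illustration, the plain C++ below (not HIP code, and not the device libraries' actual API) emulates on the CPU how a one-dimensional launch assigns a global index to each work-item, mirroring the HIP expression `blockIdx.x * blockDim.x + threadIdx.x`. The helpers `global_id` and `emulate_vector_add` are hypothetical, and the nested loops stand in for the GPU's hardware scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors blockIdx.x * blockDim.x + threadIdx.x in a HIP kernel.
std::size_t global_id(std::size_t block_id, std::size_t block_dim, std::size_t thread_id) {
    assert(thread_id < block_dim);
    return block_id * block_dim + thread_id;
}

// Emulates a 1-D "vector add" launch over n elements. The grid is rounded
// up so every element is covered; work-items whose global index falls past
// the end do nothing, as a bounds-checked kernel would.
void emulate_vector_add(const std::vector<float>& a, const std::vector<float>& b,
                        std::vector<float>& c, std::size_t block_dim) {
    assert(block_dim > 0 && a.size() == b.size() && c.size() == a.size());
    const std::size_t n = a.size();
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;  // ceil(n / block_dim)
    for (std::size_t block = 0; block < grid_dim; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            const std::size_t i = global_id(block, block_dim, thread);
            if (i >= n) continue;  // guard for the partially filled last block
            c[i] = a[i] + b[i];
        }
    }
}
```

With five elements and a block size of two, the emulated grid has three blocks, and the last block's second work-item is masked off by the bounds check.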

ROCt - Thunk

The thunk is a thin user-space interface library that wraps the system calls (ioctls) exposed by the ROCk kernel driver; the runtime is built on top of it.

ROCr - Runtime

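The ROC runtime implements the HSA runtime API: it manages queues, signals and memory, and loads code objects onto the device. Unlike the ROC Common Language Runtime, it is a low-level layer rather than an indirection layer serving the language runtimes.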

ROCm - CompilerSupport

The ROCm code object manager (Comgr) is in charge of interacting with LLVM intermediate representation and handling the resulting code objects.

Mid-level

ROCclr - Common Language Runtime

The common language runtime is an indirection layer that adapts calls to ROCr on Linux and to PAL on Windows. It used to be able to route between different compilers, such as the HSAIL compiler. It is now being absorbed by the indirection layers above it (HIP, OpenCL).
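Such an indirection layer can be pictured as one interface with a backend per platform. In this sketch, `DeviceBackend`, `RocrBackend`, `PalBackend`, `HostOs` and `make_backend` are hypothetical names, not ROCclr's real classes; the backends only report where a launch would be routed instead of actually calling ROCr or PAL (Platform Abstraction Library).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the device operations an indirection layer
// offers to the language runtimes above it (HIP, OpenCL).
class DeviceBackend {
public:
    virtual ~DeviceBackend() = default;
    virtual std::string name() const = 0;
    // Describes where the launch was routed; a real layer would submit work.
    std::string launch(const std::string& kernel) const {
        assert(!kernel.empty());
        return name() + ": " + kernel;
    }
};

// Linux path: calls would be forwarded to ROCr (the HSA runtime).
class RocrBackend final : public DeviceBackend {
public:
    std::string name() const override { return "ROCr"; }
};

// Windows path: calls would be forwarded to PAL.
class PalBackend final : public DeviceBackend {
public:
    std::string name() const override { return "PAL"; }
};

enum class HostOs { Linux, Windows };

// Callers program against DeviceBackend and never name ROCr or PAL directly.
std::unique_ptr<DeviceBackend> make_backend(HostOs os) {
    switch (os) {
        case HostOs::Linux:   return std::make_unique<RocrBackend>();
        case HostOs::Windows: return std::make_unique<PalBackend>();
    }
    return nullptr;
}
```

Because the upper layers hold only a `DeviceBackend` pointer, supporting another platform means adding one class and one `case`, without touching the callers.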

OpenCL

ROCm ships its Installable Client Driver (ICD) loader bundled with an OpenCL^[34] implementation. As of January 2022, ROCm 4.5.2 shipped OpenCL 2.2 and was lagging behind the competition.^[35]

HIP - Heterogeneous Interface for Portability

The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation mostly for demonstration purposes.

HIPCC

HIP provides a compiler driver, `HIPCC`, that either wraps Clang and compiles with LLVM's open AMDGPU backend, or redirects to the NVIDIA compiler.^[36]
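The driver's role can be modelled as a dispatch on the target platform. This is a simplified sketch under stated assumptions: the real `hipcc` takes its platform from the `HIP_PLATFORM` environment variable (values `amd` or `nvidia`) and adds many platform-specific flags, whereas the hypothetical `dispatch` function here only picks the underlying compiler and forwards the user's arguments unchanged.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of HIPCC's dispatch step (requires C++17).
// Platform-specific flags that the real driver adds are deliberately omitted.
struct CompilerInvocation {
    std::string compiler;           // underlying compiler to run
    std::vector<std::string> args;  // user arguments, forwarded unchanged here
};

std::optional<CompilerInvocation> dispatch(const std::string& platform,
                                           const std::vector<std::string>& user_args) {
    if (platform == "amd") {
        // AMD path: Clang, compiling with LLVM's AMDGPU backend.
        return CompilerInvocation{"clang++", user_args};
    }
    if (platform == "nvidia") {
        // NVIDIA path: redirect to the NVIDIA CUDA compiler.
        return CompilerInvocation{"nvcc", user_args};
    }
    return std::nullopt;  // unknown platform: refuse rather than guess
}
```

Returning an empty `std::optional` for an unrecognized platform keeps the failure explicit instead of silently defaulting to one toolchain.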

HIPIFY

HIPIFY is a source-to-source translation tool: it translates CUDA to HIP and the reverse, using either a Clang-based tool or a sed-like Perl script.
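At its simplest, the sed-like approach renames CUDA identifiers to their HIP counterparts. The hypothetical `hipify_identifiers` below applies a small hand-picked table of runtime API renames (for example `cudaMalloc` to `hipMalloc`); the real tools cover far more of the API, plus headers and kernel-launch syntax. Matching whole identifiers rather than substrings avoids corrupting longer names that share a prefix, such as `cudaMemcpyHostToDevice`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// A tiny, illustrative subset of CUDA-to-HIP runtime API renames.
const std::unordered_map<std::string, std::string>& cuda_to_hip_table() {
    static const std::unordered_map<std::string, std::string> table = {
        {"cudaMalloc", "hipMalloc"},
        {"cudaFree", "hipFree"},
        {"cudaMemcpy", "hipMemcpy"},
        {"cudaMemcpyHostToDevice", "hipMemcpyHostToDevice"},
        {"cudaMemcpyDeviceToHost", "hipMemcpyDeviceToHost"},
        {"cudaDeviceSynchronize", "hipDeviceSynchronize"},
        {"cudaError_t", "hipError_t"},
        {"cudaSuccess", "hipSuccess"},
    };
    return table;
}

static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Rewrites whole identifiers only; everything else is copied verbatim.
std::string hipify_identifiers(const std::string& src) {
    const auto& table = cuda_to_hip_table();
    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        if (is_ident_char(src[i])) {
            std::size_t j = i;
            while (j < src.size() && is_ident_char(src[j])) ++j;  // end of token
            const std::string ident = src.substr(i, j - i);
            const auto it = table.find(ident);
            out += (it != table.end()) ? it->second : ident;
            i = j;
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Like any purely textual tool, this sketch also rewrites matches inside comments and string literals; a Clang-based translator avoids that by working on the parsed source.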

GPUFORT

Like HIPIFY, GPUFORT is a source-to-source translation tool between third-generation languages, allowing users to migrate from CUDA Fortran to HIP Fortran. It is even more of a research project than HIPIFY.^[37]

High-level

ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries fall into the General Matrix Multiply (GEMM) category, a workload at which GPU architectures excel.
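GEMM computes C ← αAB + βC. Each of the m × n output entries is an independent dot product of a row of A with a column of B, which is why the operation parallelizes so well on GPUs. The naive, row-major reference below (the name `gemm_reference` is hypothetical) only shows the arithmetic; libraries such as rocBLAS rely on heavily tuned GPU kernels instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference GEMM, row-major storage: C = alpha * A * B + beta * C,
// where A is m x k, B is k x n and C is m x n.
void gemm_reference(std::size_t m, std::size_t n, std::size_t k, float alpha,
                    const std::vector<float>& A, const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Each (i, j) output is an independent dot product.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

For A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], C filled with ones, α = 1 and β = 2, the product AB is [[19, 22], [43, 50]] and the updated C is [[21, 24], [45, 52]].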

Most of these user-facing libraries come in dual form: hip for the indirection layer that can route to Nvidia hardware, and roc for the AMD implementation.^[38]

rocBLAS / hipBLAS

rocBLAS and hipBLAS are central among the high-level libraries: they provide the AMD implementation of the Basic Linear Algebra Subprograms (BLAS). rocBLAS uses the Tensile library internally.

rocSOLVER / hipSOLVER

This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS.

Utilities

ROCm developer tools: Debug, tracer, profiler, System Management Interface, Validation suite, Cluster management.
GPUOpen tools: GPU analyzer, memory visualizer...
External tools: radeontop (TUI overview)

Comparison with competitors

ROCm competes with similar stacks aimed at GPU computing: Nvidia CUDA and Intel oneAPI.

Nvidia CUDA

Nvidia's stack is closed-source up to and including cuBLAS and similar high-level libraries.
Nvidia ships an LLVM-based compiler (NVVM), which targets its Parallel Thread Execution (PTX) virtual instruction set, as part of the Nvidia CUDA Compiler (NVCC).
Open-source layers, such as RAPIDS, are built on top of it.

Intel oneAPI

References

[1] "ROCm 5.3 release". GitHub. 4 October 2022. Retrieved 10 October 2022.
[2] "Question: What does ROCm stand for? · Issue #1628 · RadeonOpenCompute/ROCm". GitHub. Retrieved 18 January 2022.
[3] "Debian -- Details of package firmware-amd-graphics in buster". Packages.debian.org. Retrieved 18 January 2022.
[4] "AMD @ SC16: Radeon Open Compute Platform (ROCm) 1.3 Released, Boltzmann Comes to Fruition". anandtech.com. Retrieved 19 January 2022.
[5] "AMD @ SC15: Boltzmann Initiative Announced - C++ and CUDA Compilers for AMD GPUs". anandtech.com. Retrieved 19 January 2022.
[6] "User Guide for AMDGPU Backend — LLVM 13 documentation". Llvm.org. Retrieved 18 January 2022.
[7] "The LLVM Compiler Infrastructure". GitHub. 19 January 2022.
[8] "Introduction to AMD GPU Hardware". Via www.youtube.com.
[9] "Fundamentals of HIP Programming". AMD.
[10] "ROCm™ Learning Center". AMD.
[11] "AMD ROCm / HCC programming: Introduction". December 26, 2018.
[12] "Here's something you don't see every day: PyTorch running on top of ROCm on a 6800M (6700XT) laptop! Took a ton of minor config tweaks and a few patches but it actually functionally works. HUGE!". 10 December 2021.
[13] "ROCm Getting Started Guide v5.2.3".
[14] "AMD Radeon RX 6800 XT Specs". TechPowerUp. Retrieved 1 January 2021.
[15] Larabel, Michael (7 December 2016). "The Best Features of the Linux 4.9 Kernel". Phoronix. Retrieved 7 December 2016.
[16] "ROCm presentation". HPCwire.com. 6 July 2020. Retrieved 18 January 2022.
[17] "AMD Introduces Its Deep-Learning Accelerator Instinct MI200 Series GPUs". Infoq.com. Retrieved 18 January 2022.
[18] "Installation".
[19] "AMD Chips Away at Intel in World's Top 500 Supercomputers as GPU War Looms". 16 November 2020.
[20] "El Capitan Supercomputer Detailed: AMD CPUs & GPUs to Drive 2 Exaflops of Compute".
[21] "Livermore's el Capitan Supercomputer to Debut HPE 'Rabbit' Near Node Local Storage". 18 February 2021.
[22] "Blender 3.0 takes support for AMD GPUs to the next level. Beta support available now!". Gpuopen.com. 15 November 2021. Retrieved 18 January 2022.
[23] "AMD ROCm ⋅ JuliaGPU". juliagpu.org.
[24] "ROCm Installation v4.3 — ROCm 4.5.0 documentation". Rocmdocs.amd.com. Retrieved 18 January 2022.
[25] "Running Scientific Applications on AMD Instinct Accelerators Just Got Easier". HPCwire.com. 18 October 2021. Retrieved 25 January 2022.
[26] "ROCm for Arch Linux". GitHub. 17 January 2022. Retrieved 18 January 2022.
[27] "Gentoo Linux Packages Up AMD ROCm, Makes Progress On RISC-V, LTO+PGO Python". Phoronix.com. Retrieved 18 January 2022.
[28] "Fedora & Debian Developers Look At Packaging ROCm For Easier Radeon GPU Computing Experience". Phoronix.com. Retrieved 18 January 2022.
[29] Gamblin, Todd; LeGendre, Matthew; Collette, Michael R.; Lee, Gregory L.; Moody, Adam; de Supinski, Bronis R.; Futral, Scott (November 15, 2015). "The Spack Package Manager: Bringing Order to HPC Software Chaos". Via GitHub.
[30] Bloor, Cordell. "20211221 Packaging session notes and small update". debian-ai@lists.debian.org (Mailing list). Retrieved 18 January 2022.
[31] "[Debian official packaging] How is ROCm LLVM fork still needed? · Issue #2449 · ROCm-Developer-Tools/HIP". GitHub.
[32] "Aomp - V 14.0-1". GitHub. 22 January 2022.
[33] "The LLVM Compiler Infrastructure". GitHub. 10 January 2022.
[34] "Khronos OpenCL Registry - The Khronos Group Inc". www.khronos.org.
[35] "List of OpenCL Conformant Products - The Khronos Group Inc". www.khronos.org. 3 February 2022.
[36] "Figure 3. HIPCC compilation process illustration. The clang compiler".
[37] "AMD Publishes Open-Source "GPUFORT" as Newest Effort to Help Transition Away from CUDA".
[38] Maia, Julio; Chalmers, Noel; T. Bauman, Paul; Curtis, Nicholas; Malaya, Nicholas; McDougall, Damon; van Oostrum, Rene; Wolfe, Noah (May 2021). ROCm Library Support & Profiling Tools (PDF). AMD.

External links

"ROCm official documentation". AMD. 2022-02-10.
"ROCm Learning Center". AMD. 2022-01-25.
"ROCm official documentation on the github super-project". AMD. 2022-01-25.
"ROCm official documentation - pre 5.0". AMD. 2022-01-19.
"GPU-Accelerated Applications with AMD Instinct Accelerators & AMD ROCm Software" (PDF). AMD. 2022-01-25.
"AMD Infinity Hub". AMD. 2022-01-25. — Docker containers for scientific applications.
