|Developer(s)||G.A.P./Universidad Politécnica de Valencia, H.P.C.A/Universitat Jaume I|
|Stable release||4.0.1 / February 26, 2013|
|License||Proprietary (Free for academic use)|
rCUDA is a middleware that enables Compute Unified Device Architecture (CUDA) remoting over a commodity network. That is, the middleware allows an application to use a CUDA-compatible graphics processing unit (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. This approach is based on the observation that GPUs in a cluster are usually not fully utilized. It aims to reduce the number of GPUs in the cluster, lowering acquisition and maintenance costs while keeping performance close to that of the fully equipped configuration.
rCUDA follows a proposed distributed acceleration architecture for high performance computing clusters in which GPUs are attached to only a few of the nodes (see Figure 1). When a node without a local GPU executes an application that uses a GPU to accelerate part of its code (usually referred to as a kernel), support must be provided for the data and code transfers between the local main memory and the remote GPU memory, as well as for the remote execution of the kernel.
rCUDA follows a client-server distributed architecture: on one side, clients employ a library of wrappers to the high-level CUDA Runtime API and, on the other side, a GPU network service listens for requests on a TCP port. Figure 1 illustrates this proposal, in which several nodes running different GPU-accelerated applications can concurrently use the whole set of accelerators installed in the cluster. When an application requests a GPU service, the request is passed to the client side of the architecture, which runs on the same computer as the application.
The client forwards the request to one of the servers, which executes it on the GPU installed in the server's computer. Time-multiplexing (sharing) the GPU is accomplished by spawning a different server process for each remote execution over a new GPU context.
The rCUDA Framework enables the concurrent usage of CUDA-compatible devices remotely.
Because rCUDA employs the socket API for communication between clients and servers, it can be useful in three different environments:
- Clusters. To reduce the number of GPUs installed in high performance clusters. This leads to energy savings, as well as related savings in acquisition, maintenance, space, and cooling.
- Academia. In commodity networks, to offer many students concurrent access to a few high performance GPUs.
- Virtual machines. To give applications running inside a virtual machine access to the CUDA facilities of the physical machine.
Version 3.1 of rCUDA implements all functions in the CUDA Runtime API version 4.0, excluding graphics interoperability. It targets the Linux OS (32- and 64-bit architectures) on both the client and server sides.
Currently, rCUDA-ready applications have to be programmed using the plain C API. In addition, host and device code need to be compiled separately. Code examples are available in the rCUDA SDK package, which is based on the NVIDIA CUDA SDK. The rCUDA User's Guide on the rCUDA webpage provides further information.
- Duato, José; Igual, Francisco; Mayo, Rafael; Peña, Antonio; Quintana-Ortí, Enrique; Silla, Federico (August 25, 2009). An Efficient Implementation of GPU Virtualization in High Performance Clusters. Lecture Notes in Computer Science 6043. Euro-Par 2009 – Parallel Processing Workshops HPPC, HeteroPar, PROPER, ROIA, UNICORE, VHPC, Delft, The Netherlands. pp. 385–394. doi:10.1007/978-3-642-14122-5_441. ISBN 978-3-642-14122-5.
- Duato, José; Peña, Antonio; Silla, Federico; Mayo, Rafael; Quintana-Ortí, Enrique (June 28, 2010). rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. High Performance Computing and Simulation (HPCS), 2010 International Conference on, Caen, France. pp. 224–231. doi:10.1109/HPCS.2010.5547126. ISBN 978-1-4244-6827-0.
- Duato, José; Peña, Antonio; Silla, Federico; Mayo, Rafael; Quintana-Ortí, Enrique (September 13, 2011). Performance of CUDA Virtualized Remote GPUs in High Performance Clusters. Parallel Processing (ICPP), 2011 International Conference on, Taipei, Taiwan. pp. 365–374. doi:10.1109/ICPP.2011.58. ISBN 978-1-4577-1336-1.