A GPU cluster is a computer cluster in which each node is equipped with a graphics processing unit (GPU). By harnessing the computational power of modern GPUs through general-purpose computing on graphics processing units (GPGPU), a GPU cluster can perform very fast calculations.
GPU clusters fall into two hardware categories: heterogeneous and homogeneous.
Heterogeneous: hardware from both major independent hardware vendors (IHVs), AMD and NVIDIA, can be used. Even a cluster that mixes different models of the same GPU (e.g. 8800GT with 8800GTX) is considered heterogeneous.
Homogeneous: every GPU is of the same hardware class, make, and model (e.g. a cluster of 100 8800GTs, all with the same amount of memory).
This classification largely shapes software development on the cluster, because different GPUs have different capabilities that software can exploit.
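The homogeneous/heterogeneous distinction can be expressed as a simple check over every GPU in the cluster. The sketch below is illustrative only: the `GPU` record, its fields, and `classify_cluster` are hypothetical names, and the memory sizes are example values. It treats the cluster as homogeneous only if vendor, model, and memory size all match.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GPU:
    """One GPU as reported by a node (hypothetical record for illustration)."""
    vendor: str       # e.g. "NVIDIA" or "AMD"
    model: str        # e.g. "8800GT"
    memory_mb: int    # on-board memory in MiB


def classify_cluster(nodes: List[List[GPU]]) -> str:
    """Return "homogeneous" if every GPU on every node has the same vendor,
    model and memory size, otherwise "heterogeneous"."""
    all_gpus = [gpu for node in nodes for gpu in node]
    if not all_gpus:
        raise ValueError("cluster contains no GPUs")
    # Frozen dataclasses are hashable, so identical GPUs collapse into one set entry.
    distinct = set(all_gpus)
    return "homogeneous" if len(distinct) == 1 else "heterogeneous"


# 100 identical 8800GTs versus a mix of 8800GT and 8800GTX (memory sizes are examples).
uniform = [[GPU("NVIDIA", "8800GT", 512)] for _ in range(100)]
mixed = [[GPU("NVIDIA", "8800GT", 512)], [GPU("NVIDIA", "8800GTX", 768)]]
print(classify_cluster(uniform))  # homogeneous
print(classify_cluster(mixed))    # heterogeneous
```

Because memory size is part of the comparison, two nodes with the same GPU model but different amounts of memory would also be reported as heterogeneous, matching the "same amount of memory" condition above.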
In addition to the compute nodes and their GPUs, a sufficiently fast interconnect is needed to move data between the nodes. The choice of interconnect depends largely on the number of nodes. Examples include Gigabit Ethernet and InfiniBand.
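A rough way to see why interconnect speed matters is to estimate how long a bulk transfer takes at a given link rate. The sketch below is a back-of-the-envelope model, not a benchmark: `transfer_seconds` is a hypothetical helper, it ignores latency, and the `efficiency` factor is an assumed fraction of the nominal line rate. The 40 Gbit/s figure is chosen only for comparison.

```python
def transfer_seconds(payload_bytes: int, link_gbit_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Estimate the time to move payload_bytes over a single link.

    link_gbit_per_s is the nominal line rate; efficiency in (0, 1] is an
    assumed fraction of that rate actually achieved after protocol overhead.
    Latency is ignored, so this is only a lower bound for large transfers.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = payload_bytes * 8
    return bits / (link_gbit_per_s * 1e9 * efficiency)


one_gib = 1 << 30
# Gigabit Ethernet: 1 Gbit/s nominal.
print(f"{transfer_seconds(one_gib, 1.0):.2f} s")   # 8.59 s
# An assumed 40 Gbit/s link, for comparison.
print(f"{transfer_seconds(one_gib, 40.0):.3f} s")  # 0.215 s
```

Under these assumptions, moving 1 GiB between two nodes takes several seconds over Gigabit Ethernet, which can easily exceed the time the GPUs spend computing on that data.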
NVIDIA maintains a list of Tesla Preferred Partners (TPP) that can build and deliver fully configured GPU clusters using Tesla 20-series GPGPUs. AMAX Information Technologies, Dell, Hewlett-Packard and Silicon Graphics are among the few companies that offer a complete line of GPU clusters and systems.
The software components that are required to make many GPU-equipped machines act as one include:
- Operating system
- A GPU driver for each type of GPU present in each cluster node.
- A clustering API (such as the Message Passing Interface, MPI).
- The VirtualCL (VCL) cluster platform, a wrapper for OpenCL™ that allows most unmodified applications to transparently use multiple OpenCL devices in a cluster as if all the devices were on the local computer.
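When a clustering API such as MPI runs several processes per node, each process usually has to decide which local GPU to use. One common convention is to derive a node-local rank and wrap it around the node's GPUs. The sketch below is hypothetical: `gpu_for_rank` is an illustrative name, and it assumes ranks are placed in contiguous blocks per node, which real launchers do not guarantee.

```python
def gpu_for_rank(global_rank: int, ranks_per_node: int, gpus_per_node: int) -> int:
    """Pick a local GPU index for an MPI-style process.

    Assumes ranks are placed in contiguous blocks of ranks_per_node on each
    node (ranks 0..k-1 on node 0, k..2k-1 on node 1, ...). The node-local rank
    is wrapped around the node's GPUs, so GPUs are shared round-robin when a
    node runs more ranks than it has GPUs.
    """
    if global_rank < 0:
        raise ValueError("global_rank must be non-negative")
    if ranks_per_node <= 0 or gpus_per_node <= 0:
        raise ValueError("ranks_per_node and gpus_per_node must be positive")
    local_rank = global_rank % ranks_per_node
    return local_rank % gpus_per_node


# 2 nodes, 4 ranks per node, 2 GPUs per node.
print([gpu_for_rank(r, ranks_per_node=4, gpus_per_node=2) for r in range(8)])
# [0, 1, 0, 1, 0, 1, 0, 1]
```

In an actual MPI program, rather than assuming block placement, the node-local rank can be obtained by splitting the communicator with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED` (MPI-3). The resulting index would then be passed to the GPU runtime's device-selection call.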
Mapping an algorithm to run on a GPU cluster is somewhat similar to mapping an algorithm to run on a traditional computer cluster. For example, rather than distributing pieces of an array from RAM, a texture is divided among the nodes of the GPU cluster.
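The decomposition step can be sketched as splitting a 2D texture into horizontal bands, one per node, with band sizes differing by at most one row. This is only one possible scheme (tiles or column strips are alternatives), and `split_rows` is a hypothetical helper. Algorithms that read neighbouring pixels would also need boundary ("halo") rows exchanged over the interconnect, which this sketch omits.

```python
from typing import List, Tuple


def split_rows(height: int, num_nodes: int) -> List[Tuple[int, int]]:
    """Divide rows [0, height) of a 2D texture into num_nodes contiguous
    bands whose sizes differ by at most one row.

    Returns a list of (start, stop) row ranges, one per node, in node order.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if height < 0:
        raise ValueError("height must be non-negative")
    base, extra = divmod(height, num_nodes)
    bands = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes each take one additional row.
        size = base + (1 if node < extra else 0)
        bands.append((start, start + size))
        start += size
    return bands


# A 1080-row texture over 4 nodes: 270 rows each.
print(split_rows(1080, 4))
# [(0, 270), (270, 540), (540, 810), (810, 1080)]
# An uneven case: 10 rows over 3 nodes.
print(split_rows(10, 3))
# [(0, 4), (4, 7), (7, 10)]
```

Each node would then upload only its band of rows to its own GPU, much as a conventional cluster would receive only its slice of an array.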