Nvidia DGX
Nvidia DGX is a line of Nvidia-produced servers and workstations that specialize in using general-purpose computing on GPUs (GPGPU) to accelerate deep learning applications.
DGX-1
DGX-1 servers feature eight GPUs based on Pascal or Volta daughter cards[1] with HBM2 memory, connected by an NVLink mesh network.[2]
The product line is intended to bridge the gap between GPUs and AI accelerators, in that the device has specific features that specialize it for deep learning workloads.[3] The initial Pascal-based DGX-1 delivered 170 teraflops of half-precision processing,[4] while the Volta-based upgrade increased this to 960 teraflops.[5]
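These system-level figures are consistent with the per-GPU specifications. A back-of-envelope check in Python, assuming a P100 delivers twice its 10.6 TFLOPS FP32 rate at half precision, and that the 960 TFLOPS figure implies 120 TFLOPS of FP16 tensor throughput per V100 (both per-GPU rates are assumptions drawn from published spec sheets, not from this article):

```python
# Back-of-envelope check of the DGX-1 half-precision figures.
NUM_GPUS = 8  # both DGX-1 variants carry eight GPUs

p100_fp16_tflops = 2 * 10.6   # assumed: P100 FP16 is 2x its FP32 rate
v100_tensor_tflops = 120.0    # assumed: implied per-V100 tensor rate

pascal_system = NUM_GPUS * p100_fp16_tflops    # ~169.6, rounds to 170
volta_system = NUM_GPUS * v100_tensor_tflops   # 960

print(f"Pascal DGX-1: {pascal_system:.1f} TFLOPS FP16")
print(f"Volta DGX-1:  {volta_system:.0f} TFLOPS FP16 tensor")
```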
DGX-2
The successor of the Nvidia DGX-1 is the Nvidia DGX-2, which uses sixteen 32 GB V100 (second-generation) cards in a single unit. It increases performance to up to 2 petaflops, offers 512 GB of shared memory for tackling larger problems, and uses NVSwitch to speed up internal communication.
Additionally, there is a higher-performance version of the DGX-2, the DGX-2H; its notable difference is the replacement of the dual Intel Xeon Platinum 8168 CPUs at 2.7 GHz with dual Intel Xeon Platinum 8174 CPUs at 3.1 GHz.[6]
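The DGX-2's headline figures follow directly from its GPU count. A minimal sketch, assuming each V100 contributes 32 GB of HBM2 and 125 TFLOPS of FP16 tensor throughput (per-GPU figures from the accelerator comparison below):

```python
# Aggregate DGX-2 capacity from per-GPU V100 specifications.
NUM_GPUS = 16
HBM2_PER_GPU_GB = 32           # 32 GB V100 variant
TENSOR_TFLOPS_PER_GPU = 125    # V100 FP16 tensor throughput

shared_memory_gb = NUM_GPUS * HBM2_PER_GPU_GB          # 512 GB
peak_pflops = NUM_GPUS * TENSOR_TFLOPS_PER_GPU / 1000  # 2.0 PFLOPS

print(f"{shared_memory_gb} GB shared memory, {peak_pflops} PFLOPS peak")
```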
DGX A100
Announced and released on May 14, 2020, the third generation of DGX server includes eight Ampere-based A100 accelerators.[7] Also included are 15 TB of PCIe gen 4 NVMe storage,[8] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[7]
Accelerators
Comparison of accelerators used in DGX:[7]
Accelerator | Architecture | FP32 CUDA Cores | Boost Clock | Memory Clock | Memory Bus Width | Memory Bandwidth | VRAM | Single Precision | Double Precision | INT8 Tensor | FP16 Tensor | TF32 Tensor | Interconnect | GPU Die | Die Size | Transistor Count | TDP | Manufacturing Process
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
A100 | Ampere | 6912 | ~1410 MHz | 2.4 Gbps HBM2 | 5120-bit | 1.6 TB/s | 40 GB | 19.5 TFLOPS | 9.7 TFLOPS | 624 TOPS | 312 TFLOPS | 156 TFLOPS | 600 GB/s | A100 | 826 mm² | 54.2 B | 400 W | TSMC 7N
V100 | Volta | 5120 | 1530 MHz | 1.75 Gbps HBM2 | 4096-bit | 900 GB/s | 16 GB / 32 GB | 15.7 TFLOPS | 7.8 TFLOPS | N/A | 125 TFLOPS | N/A | 300 GB/s | GV100 | 815 mm² | 21.1 B | 300 W / 350 W | TSMC 12 nm FFN
P100 | Pascal | 3584 | 1480 MHz | 1.4 Gbps HBM2 | 4096-bit | 720 GB/s | 16 GB | 10.6 TFLOPS | 5.3 TFLOPS | N/A | N/A | N/A | 160 GB/s | GP100 | 610 mm² | 15.3 B | 300 W | TSMC 16 nm FinFET
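The memory-bandwidth column can be reproduced from the memory clock (per-pin data rate) and the bus width; a quick sketch:

```python
# Peak HBM2 bandwidth: per-pin data rate (Gbps) times bus width (bits),
# divided by 8 to convert bits to bytes.
def hbm2_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Return peak memory bandwidth in GB/s."""
    return pin_rate_gbps * bus_width_bits / 8

print(hbm2_bandwidth_gbs(2.4, 5120))   # A100: 1536.0 GB/s (~1.6 TB/s)
print(hbm2_bandwidth_gbs(1.75, 4096))  # V100: 896.0 GB/s (~900 GB/s)
print(hbm2_bandwidth_gbs(1.4, 4096))   # P100: ~716.8 GB/s (~720 GB/s)
```

The slight gaps between the computed values and the rounded table entries (e.g. 896 vs. 900 GB/s) come from Nvidia's marketing rounding, not from the formula.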
References
- ^ "NVIDIA DGX-1" (PDF).
- ^ "Inside Pascal".
Eight GPU hybrid cube mesh architecture with NVLink
- ^ "Deep Learning Supercomputer".
- ^ "DGX-1 Deep Learning System" (PDF).
NVIDIA DGX-1 Delivers 75X Faster Training...Note: Caffe benchmark with AlexNet, training 1.28M images with 90 epochs
- ^ "DGX Server". DGX Server. Nvidia. Retrieved 7 September 2017.
- ^ "DGX-2 User Guide" (PDF). Nvidia. https://docs.nvidia.com/dgx/pdf/dgx2-user-guide.pdf
- ^ a b c Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
- ^ Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.