The product line is intended to bridge the gap between GPUs and AI accelerators: each device includes features that specialize it for deep learning workloads. The initial Pascal-based DGX-1 delivered 170 teraflops of half-precision processing, while the Volta-based upgrade increased this to 960 teraflops.
The successor of the Nvidia DGX-1 is the Nvidia DGX-2, which uses sixteen 32 GB V100 (second-generation) cards in a single unit. This increases performance to up to 2 petaflops, with 512 GB of shared memory for tackling larger problems, and uses NVSwitch to speed up internal communication.
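The 2-petaflop figure follows from per-card tensor arithmetic; a minimal sketch, assuming the V100's 125 TFLOPS FP16 tensor rating from the comparison table below:

```python
# Peak FP16 tensor throughput of a DGX-2: sixteen V100 cards,
# each rated at 125 TFLOPS for tensor-core FP16 operations.
v100_fp16_tensor_tflops = 125
num_gpus = 16

peak_pflops = v100_fp16_tensor_tflops * num_gpus / 1000
print(peak_pflops)  # 2.0 petaflops
```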
Additionally, there is a higher-performance version of the DGX-2, the DGX-2H; its notable difference is the replacement of the dual Intel Xeon Platinum 8168 CPUs @ 2.7 GHz with dual Intel Xeon Platinum 8174 CPUs @ 3.1 GHz.
DGX A100 Server
Announced and released on May 14, 2020, the third-generation DGX server includes eight Ampere-based A100 accelerators. Also included are 15 TB of PCIe gen 4 NVMe storage, two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 server was $199,000.
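The same per-card arithmetic gives the DGX A100's headline AI throughput; a sketch, assuming the A100's 312 TFLOPS FP16 tensor rating from the comparison table below:

```python
# Peak FP16 tensor throughput of a DGX A100 server:
# eight A100 accelerators at 312 TFLOPS each.
a100_fp16_tensor_tflops = 312
num_gpus = 8

peak_pflops = a100_fp16_tensor_tflops * num_gpus / 1000
print(peak_pflops)  # 2.496, i.e. roughly 2.5 petaflops
```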
Comparison of accelerators used in DGX:
| | A100 (Ampere) | V100 (Volta) | P100 (Pascal) |
|---|---|---|---|
| FP32 CUDA cores | 6912 | 5120 | 3584 |
| Boost clock | ~1410 MHz | 1530 MHz | 1480 MHz |
| Memory clock | 2.4 Gbps HBM2 | 1.75 Gbps HBM2 | 1.4 Gbps HBM2 |
| Memory bus width | 5120-bit | 4096-bit | 4096-bit |
| Memory bandwidth | 1.6 TB/s | 900 GB/s | 720 GB/s |
| VRAM | 40 GB | 16 GB / 32 GB | 16 GB |
| Single precision (FP32) | 19.5 TFLOPS | 15.7 TFLOPS | 10.6 TFLOPS |
| Double precision (FP64) | 9.7 TFLOPS | 7.8 TFLOPS | 5.3 TFLOPS |
| INT8 tensor | 624 TOPS | N/A | N/A |
| FP16 tensor | 312 TFLOPS | 125 TFLOPS | N/A |
| FP32 tensor | 156 TFLOPS | N/A | N/A |
| Interconnect (NVLink) | 600 GB/s | 300 GB/s | 160 GB/s |
| GPU die | A100 | GV100 | GP100 |
| Die size | 826 mm² | 815 mm² | 610 mm² |
| Transistor count | 54.2 B | 21.1 B | 15.3 B |
| TDP | 400 W | 300 W / 350 W | 300 W |
| Manufacturing process | TSMC 7N | TSMC 12 nm FFN | TSMC 16 nm FinFET |
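The memory bandwidth column can be reproduced from the memory clock and bus width: bandwidth = per-pin data rate × bus width ÷ 8 bits per byte. A minimal sketch:

```python
# Memory bandwidth (GB/s) = per-pin data rate (Gbps) * bus width (bits) / 8.
def hbm2_bandwidth_gbps(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Return peak memory bandwidth in GB/s."""
    return pin_rate_gbps * bus_width_bits / 8

print(hbm2_bandwidth_gbps(2.4, 5120))   # A100: 1536.0 GB/s (~1.6 TB/s)
print(hbm2_bandwidth_gbps(1.75, 4096))  # V100: 896.0 GB/s (~900 GB/s)
print(hbm2_bandwidth_gbps(1.4, 4096))   # P100: 716.8 GB/s (~720 GB/s)
```

The marketed figures are these raw values rounded to two significant digits.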