
Nvidia DGX

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Monkbot (talk | contribs) at 15:20, 31 January 2021 (Task 18 (cosmetic): eval 7 templates: hyphenate params (1×);). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Nvidia DGX is a line of Nvidia-produced servers and workstations that specialize in using general-purpose computing on GPUs (GPGPU) to accelerate deep learning applications.

DGX-1

DGX-1 servers feature eight GPUs based on Pascal or Volta daughter cards[1] with HBM2 memory, connected by an NVLink hybrid cube-mesh network.[2]

The product line is intended to bridge the gap between GPUs and AI accelerators: the device has specific features that specialize it for deep learning workloads.[3] The initial Pascal-based DGX-1 delivered 170 teraflops of half-precision processing,[4] while the Volta-based upgrade increased this to 960 teraflops.[5]
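As a rough consistency check on the Pascal figure, the DGX-1 total can be recovered from per-GPU numbers. This assumes that the P100 runs FP16 at twice its FP32 rate, which the sources cited here do not state, and it uses the 10.6 TFLOPS single-precision figure from the accelerator comparison:

```latex
8 \times \left(2 \times 10.6\ \text{TFLOPS}\right) = 169.6\ \text{TFLOPS} \approx 170\ \text{TFLOPS}
```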

DGX-2

The successor to the Nvidia DGX-1 is the Nvidia DGX-2, which uses sixteen 32 GB V100 (second-generation) cards in a single unit. This raises performance to up to 2 petaflops, provides 512 GB of shared memory for tackling larger problems, and uses NVSwitch to speed up internal communication.
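Both DGX-2 headline figures follow from multiplying per-GPU values by the sixteen V100 cards. This sketch assumes that the 2 petaflops figure refers to FP16 tensor throughput, taken as 125 TFLOPS per V100 from the accelerator comparison:

```latex
\begin{aligned}
16 \times 32\ \text{GB} &= 512\ \text{GB} \\
16 \times 125\ \text{TFLOPS} &= 2000\ \text{TFLOPS} = 2\ \text{PFLOPS}
\end{aligned}
```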

There is also a higher-performance version of the DGX-2, the DGX-2H. A notable difference is that it replaces the dual Intel Xeon Platinum 8168 CPUs at 2.7 GHz with dual Intel Xeon Platinum 8174 CPUs at 3.1 GHz.[6]

DGX A100

The third generation of DGX server was announced and released on May 14, 2020, and includes eight Ampere-based A100 accelerators.[7] It also includes 15 TB of PCIe Gen 4 NVMe storage,[8] two 64-core AMD EPYC 7742 (Rome) CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price of the DGX A100 was $199,000.[7]

Accelerators

Comparison of accelerators used in DGX:[7]

| Accelerator | A100 | V100 | P100 |
|---|---|---|---|
| Architecture | Ampere | Volta | Pascal |
| FP32 CUDA cores | 6912 | 5120 | 3584 |
| Boost clock | ~1410 MHz | 1530 MHz | 1480 MHz |
| Memory data rate | 2.4 Gbps HBM2 | 1.75 Gbps HBM2 | 1.4 Gbps HBM2 |
| Memory bus width | 5120-bit | 4096-bit | 4096-bit |
| Memory bandwidth | 1.6 TB/s | 900 GB/s | 720 GB/s |
| VRAM | 40 GB | 16 GB / 32 GB | 16 GB |
| Single precision (FP32) | 19.5 TFLOPS | 15.7 TFLOPS | 10.6 TFLOPS |
| Double precision (FP64) | 9.7 TFLOPS | 7.8 TFLOPS | 5.3 TFLOPS |
| INT8 Tensor | 624 TOPS | N/A | N/A |
| FP16 Tensor | 312 TFLOPS | 125 TFLOPS | N/A |
| FP32 Tensor (TF32) | 156 TFLOPS | N/A | N/A |
| Interconnect | 600 GB/s | 300 GB/s | 160 GB/s |
| GPU | GA100 | GV100 | GP100 |
| GPU die size | 826 mm² | 815 mm² | 610 mm² |
| Transistor count | 54.2B | 21.1B | 15.3B |
| TDP | 400 W | 300 W / 350 W | 300 W |
| Manufacturing process | TSMC 7N | TSMC 12nm FFN | TSMC 16nm FinFET |
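Two rows of the comparison can be cross-checked against the others. The sketch below assumes that peak memory bandwidth equals bus width times per-pin data rate divided by 8 bits per byte. It also assumes that peak FP32 throughput equals CUDA cores × 2 FLOPs per clock (one fused multiply-add) × boost clock. The `SPECS` dictionary and both function names are hypothetical and exist only for this illustration.

```python
# Hypothetical cross-check of the accelerator comparison table.
# Assumptions (not stated in the cited sources):
#   bandwidth (GB/s) = bus width (bits) * per-pin data rate (Gbit/s) / 8
#   FP32 peak (TFLOPS) = CUDA cores * 2 FLOPs per clock (one FMA) * boost clock

SPECS = {
    # name: (FP32 CUDA cores, boost clock in MHz, bus width in bits, data rate in Gbit/s per pin)
    "A100": (6912, 1410, 5120, 2.4),
    "V100": (5120, 1530, 4096, 1.75),
    "P100": (3584, 1480, 4096, 1.4),
}


def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s (8 bits per byte)."""
    return bus_width_bits * data_rate_gbps / 8


def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPS, counting one FMA as two operations."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


if __name__ == "__main__":
    for name, (cores, clock, width, rate) in SPECS.items():
        bw = memory_bandwidth_gb_s(width, rate)
        tf = fp32_tflops(cores, clock)
        print(f"{name}: {bw:.1f} GB/s, {tf:.1f} TFLOPS FP32")
```

Under these assumptions the FP32 results round to the listed 19.5, 15.7, and 10.6 TFLOPS. The bandwidth results (1536, 896, and 716.8 GB/s) are close to the listed 1.6 TB/s, 900 GB/s, and 720 GB/s. The small gaps are consistent with rounding in the table's data rates.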


References

  1. "Nvidia DGX-1" (PDF).
  2. "Inside Pascal". Eight-GPU hybrid cube mesh architecture with NVLink.
  3. "Deep Learning Supercomputer".
  4. "DGX-1 Deep Learning System" (PDF). "NVIDIA DGX-1 Delivers 75X Faster Training... Note: Caffe benchmark with AlexNet, training 1.28M images with 90 epochs".
  5. "DGX Server". DGX Server. Nvidia. Retrieved 7 September 2017.
  6. "DGX-2 User Guide" (PDF). Nvidia. https://docs.nvidia.com/dgx/pdf/dgx2-user-guide.pdf
  7. Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  8. Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.