NVLink: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 06:52, 28 October 2017

NVLink is a wire-based communications protocol for near-range semiconductor communications developed by Nvidia that can be used for data and control code transfers in processor systems between CPUs and GPUs and solely between GPUs. NVLink specifies point-to-point connections with data rates of 20 and 25 Gbit/s (v1.0/v2.0) per data lane in one data direction. Total data rates in real-world systems are 160 and 300 GByte/s (v1.0/v2.0) for the total system sum of input and output data streams.[1] NVLink products introduced to date focus on the high-performance application space. NVLink, first announced in March 2014, uses a proprietary High-Speed Signaling interconnect (NVHS) developed by Nvidia.[2]

The following table shows a comparison of relevant bus parameters for real-world semiconductors that all offer NVLink as one of their options:

Semiconductor | Interconnect | Transmission Rate (per lane) | Lanes per Sub-Link (out + in) | Sub-Link Data Rate (per data direction) | Sub-Link Count | Total Data Rate (out + in) | Total Lanes (out + in) | Total Data Rate (out + in)
Nvidia P100[3] | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GByte/s | 1 | 16 + 16 GByte/s[4] | 32 Ⓒ | 32 GByte/s
IBM Power9[5] | PCIe 4.0 | 16 GT/s | 16 + 16 Ⓑ | 256 Gbit/s = 32 GByte/s | 3 | 96 + 96 GByte/s | 48 | 192 GByte/s
Nvidia P100 | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GByte/s | 4 | 80 + 80 GByte/s | 64 | 160 GByte/s
IBM Power8+ | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GByte/s | 4 | 80 + 80 GByte/s | 64 | 160 GByte/s
Nvidia V100 | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 6[6] | 150 + 150 GByte/s | 96 | 300 GByte/s
IBM Power9[7] | NVLink 2.0 (BlueLink ports) | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 6 | 150 + 150 GByte/s | 96 | 300 GByte/s

Note: Data Rate columns were rounded by being approximated by the transmission rate; see the real-world performance paragraph below.
Ⓐ: sample value; NVLink sub-link bundling should be possible
Ⓑ: sample value; other fractions of the PCIe lane usage should be possible
Ⓒ: a single PCIe lane transfers data over a differential pair
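The data-rate columns above follow directly from the transmission rate and the lane count. As a minimal sketch of that arithmetic (function names are illustrative, not from any NVLink API):

```python
def sublink_rate_gbyte_s(transfer_rate_gt_s, lanes_per_direction):
    # Each lane carries transfer_rate Gbit/s per data direction;
    # dividing by 8 bits/byte gives GByte/s per direction.
    return transfer_rate_gt_s * lanes_per_direction / 8

def total_rate_gbyte_s(transfer_rate_gt_s, lanes_per_direction, sublinks):
    # Total (out + in): twice the per-direction rate, times the sub-link count.
    return 2 * sublink_rate_gbyte_s(transfer_rate_gt_s, lanes_per_direction) * sublinks

# NVLink 1.0 on the P100: 20 GT/s, 8 lanes per direction, 4 sub-links
print(sublink_rate_gbyte_s(20, 8))   # 20.0 GByte/s per direction
print(total_rate_gbyte_s(20, 8, 4))  # 160.0 GByte/s (out + in)

# NVLink 2.0 on the V100: 25 GT/s, 8 lanes per direction, 6 sub-links
print(total_rate_gbyte_s(25, 8, 6))  # 300.0 GByte/s (out + in)

# PCIe 3.0 x16 on the P100: 8 GT/s, 16 lanes per direction, 1 link
print(total_rate_gbyte_s(8, 16, 1))  # 32.0 GByte/s (out + in)
```

These reproduce the 160, 300 and 32 GByte/s totals in the table; as the note says, they are transmission-rate approximations, not achievable payload rates.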

Real-world performance can be determined by applying various encapsulation overheads as well as the link usage rate. Those overheads come from various sources:

  • 128b/130b line code
  • Link control characters
  • Transaction header
  • Buffering capabilities (depends on device)
  • DMA usage on computer side (depends on other software, usually negligible on benchmarks)

These physical limitations usually reduce the achievable data rate to between 90 and 95% of the transfer rate. NVLink benchmarks show an achievable transfer rate of about 35.3 GB/s (host to device) for a 40 GB/s (2 sub-links uplink) NVLink connection towards a P100 GPU in a system driven by a set of IBM Power8 CPUs.[8]
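The cited benchmark can be checked against the raw numbers: a quick sketch, assuming only the figures given in this article (the 128b/130b ratio is the line code's own payload fraction; all other overheads are lumped into the residual):

```python
# 128b/130b line code: 128 payload bits per 130 transmitted bits.
line_code_efficiency = 128 / 130

# Raw per-direction rate of a 2-sub-link NVLink 1.0 uplink (GByte/s): 2 x 20
raw_rate = 2 * 20

# Benchmark figure cited in the text (host to device, GByte/s)
measured = 35.3

# Line code alone would still permit roughly this payload rate:
print(raw_rate * line_code_efficiency)  # ~39.4 GByte/s

# Overall measured efficiency, covering line code, control characters,
# headers, buffering and DMA effects together:
print(measured / raw_rate)  # ~0.88
```

The measured ~88% sits slightly below the 90–95% range quoted for the physical-layer overheads alone, which is plausible once host-side effects are included.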

On 5 April 2016, Nvidia announced that NVLink would be implemented in the Pascal-microarchitecture-based GP100 GPU, as used, for example, in Nvidia Tesla P100 products.[9] With the introduction of the DGX-1 high-performance computer base, it became possible to have up to eight P100 modules in a single rack system connected to up to two host CPUs. The carrier board (...) allows for a dedicated board for routing the NVLink connections – each P100 requires 800 pins, 400 for PCIe + power and another 400 for the NVLinks, adding up to nearly 1600 board traces for NVLinks alone (...).[10] Each CPU has a direct connection to four P100 units via PCIe, and each P100 has one NVLink to each of the three other P100s in the same CPU group, plus one more NVLink to one P100 in the other CPU group. Each NVLink (link interface) offers a bidirectional 20 GB/s up and 20 GB/s down, with 4 links per GP100 GPU, for an aggregate bandwidth of 80 GB/s up and another 80 GB/s down.[11] NVLink supports routing, so that in the DGX-1 design every P100 can directly reach 4 of the other 7 P100s, while the remaining 3 are reachable with only one extra hop. According to depictions in Nvidia's blog-based publications from 2014, NVLink allows bundling of individual links for increased point-to-point performance, so that for example a design with two P100s and all links established between the two units would allow the full NVLink bandwidth of 80 GB/s between them.[12]
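The reachability claim above can be verified on a small graph model. This is a hypothetical link assignment consistent with the description (GPUs 0–3 form one CPU group, 4–7 the other; each GPU links to the three others in its group plus one "twin" in the other group), not Nvidia's documented DGX-1 cabling:

```python
from itertools import combinations

# Build the edge set: full NVLink mesh within each CPU group ...
links = set()
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    links.update(combinations(group, 2))
# ... plus one cross-group link per GPU (illustrative pairing)
for g in range(4):
    links.add((g, g + 4))

def degree(x):
    # Number of NVLinks attached to GPU x
    return sum(x in e for e in links)

def neighbours(x):
    # All GPUs sharing an edge with x
    return {v for e in links if x in e for v in e} - {x}

def hops(a, b):
    # 1 if directly linked, else 2 if they share a common neighbour
    if (min(a, b), max(a, b)) in links:
        return 1
    return 2 if neighbours(a) & neighbours(b) else None

# Every GPU has exactly 4 NVLinks, matching the GP100 spec ...
print([degree(g) for g in range(8)])  # [4, 4, 4, 4, 4, 4, 4, 4]
# ... and every pair of GPUs is reachable in at most two hops
print(max(hops(a, b) for a, b in combinations(range(8), 2)))  # 2
```

Each GPU directly reaches 4 of the other 7 (three in-group, one cross-group), and the remaining 3 need exactly one intermediate GPU, as the article states.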

The US Department of Energy contracted Nvidia and IBM around November 2014 to build and deliver (set for 2017) two supercomputers named "Summit" and "Sierra",[13] which will use NVLink for the node interconnects, while a variant of InfiniBand will be used for the system interconnects.[14] These systems will combine Nvidia's Volta architecture with the POWER9 family of CPUs. At GTC 2017 Nvidia presented its upcoming Volta generation of GPUs and indicated the integration of a revised version 2.0 of NVLink that would allow total I/O data rates of 300 GB/s for a single chip in this design. It further announced the option for pre-orders, with a delivery promise for Q3/2017, of the DGX-1 and DGX-Station high-performance computers that will be equipped with GPU modules of type V100 and have NVLink 2.0 realized in either a networked fashion (two groups of four V100 modules with inter-group connectivity) or a fully interconnected fashion of one group of four V100 modules.

References

  1. ^ "What Is NVLink?". Nvidia. 2014-11-14.
  2. ^ Nvidia NVLINK 2.0 arrives in IBM servers next year by Jon Worrel on fudzilla.com on August 24, 2016
  3. ^ All aboard the PCIe bus for Nvidia's Tesla P100 supercomputer grunt by Chris Williams at theregister.co.uk on June 20, 2016
  4. ^ NVLink Takes GPU Acceleration To The Next Level by Timothy Prickett Morgan at nextplatform.com on May 4, 2016
  5. ^ POWER9 Webinar presentation by IBM for Power Systems VUG by Jeff Stuecheli on January 26, 2017
  6. ^ GV100 Blockdiagramm in "GTC17: NVIDIA präsentiert die nächste GPU-Architektur Volta - Tesla V100 mit 5.120 Shadereinheiten und 16 GB HBM2" by Andreas Schilling on hardwareluxx.de on May 10, 2017
  7. ^ NVIDIA Volta GV100 GPU Chip For Summit Supercomputer Twice as Fast as Pascal P100 – Speculated To Hit 9.5 TFLOPs FP64 Compute by Hassan Mujtaba at wccftech.com on December 20, 2016
  8. ^ Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers by Eliot Eshelman on microway.com on January 26, 2017
  9. ^ "Inside Pascal: NVIDIA's Newest Computing Platform". 2016-04-05.
  10. ^ Anandtech.com
  11. ^ NVIDIA Unveils the DGX-1 HPC Server: 8 Teslas, 3U, Q2 2016 by anandtech.com on April, 2016
  12. ^ How NVLink Will Enable Faster, Easier Multi-GPU Computing by Mark Harris on November 14, 2014
  13. ^ "Whitepaper: Summit and Sierra Supercomputers" (PDF). 2014-11-01.
  14. ^ "Nvidia Volta, IBM POWER9 Land Contracts For New US Government Supercomputers". AnandTech. 2014-11-17.