Nvidia Tesla

This article is about GPGPU cards. For the GPU microarchitecture, see Tesla (microarchitecture).

Nvidia Tesla is Nvidia's brand name for its products targeting stream processing and general-purpose computing on GPUs (GPGPU). The products use GPUs from the G80 series onward. Both the microarchitecture of the initial GPUs ("Tesla") and the Tesla product line take their name from the pioneering electrical engineer Nikola Tesla.

Overview

With very high computational power (measured in floating-point operations per second, or FLOPS) compared with general-purpose microprocessors, Tesla products target the high-performance computing market.[1] As of 2012, Tesla products power some of the world's fastest supercomputers, including Titan at Oak Ridge National Laboratory and Tianhe-1A in Tianjin, China.

The main difference between Tesla products and the consumer-level GeForce and professional-level Quadro cards was the inability to output images to a display, although the latest Tesla C-class products include one dual-link DVI port.[2] For equivalent single-precision output, Fermi-based Nvidia GeForce cards deliver one quarter of the double-precision performance. Tesla products primarily operate:[3]

  • in simulations and in large scale calculations (especially floating-point calculations)
  • for high-end image generation for applications in professional and scientific fields
  • with the use of OpenCL or CUDA.
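The double-precision comparison above can be made concrete with a short calculation. The sketch below is illustrative only: it takes the one-quarter ratio stated in this section as given, uses the C2050 peak figures from the specifications table, and the function name is hypothetical.

```python
# Peak figures for the Fermi-based Tesla C2050, taken from the specifications table.
TESLA_SP_GFLOPS = 1030.4  # single precision, MUL+ADD
TESLA_DP_GFLOPS = 515.2   # double precision, FMA


def geforce_dp_estimate(tesla_dp_gflops, ratio=0.25):
    """Estimate the double-precision peak of a Fermi GeForce card that has
    the same single-precision peak as the Tesla card.

    ratio=0.25 is the "one quarter" figure stated in the text (an assumption
    here, not a measured value).
    """
    return tesla_dp_gflops * ratio


if __name__ == "__main__":
    estimate = geforce_dp_estimate(TESLA_DP_GFLOPS)
    print(f"Tesla DP peak:        {TESLA_DP_GFLOPS:.1f} GFLOPS")
    print(f"GeForce DP estimate:  {estimate:.1f} GFLOPS")
```

Under these assumptions a GeForce card matching the C2050's single-precision peak would reach roughly a quarter of its 515.2 GFLOPS double-precision peak.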

Nvidia intends to offer ARMv8 processor cores embedded into future Tesla GPUs as part of Project Denver.[4] This will be a 64-bit follow-on to the 32-bit Tegra chips.

Tesla itself will be followed by the TB/s Volta in 2016.[5]

Market

The defense industry currently accounts for less than a sixth of Tesla sales, but Sumit Gupta predicts further sales to the geospatial intelligence market.[6]

Specifications and configurations

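Configuration, APIs and clocks

| Configuration | Model | Micro-architecture | Compute capability (note 4) | OpenCL | GPUs | Core clock, each (MHz) | Thread processors (total) | Shader clock, each (MHz) |
|---|---|---|---|---|---|---|---|---|
| GPU Computing processor (note 1) | C870 | Tesla | 1.0 | ? | 1 | 600 | 128 | 1350 |
| Deskside Supercomputer (note 1) | D870 | Tesla | 1.0 | ? | 2 | 600 | 2 × 128 (256) | 1350 |
| GPU Computing Server (note 1) | S870 | Tesla | 1.0 | ? | 4 | 600 | 4 × 128 (512) | 1350 |
| C1060 Computing Processor (note 2) | C1060 | Tesla | 1.3 | ? | 1 | 602 | 240 | 1300 |
| S1075 1U[8] GPU Computing Server (notes 3, 4) | S1070 | Tesla | 1.3 | ? | 4 | 602 | 4 × 240 (960) | 1440 |
| C2050/C2070/C2075 GPU Computing Processor | C2050/C2070/C2075 | Fermi | 2.0 | ? | 1 | 575 | 448 | 1150 |
| M2050 GPU Computing Module | M2050 | Fermi | 2.0 | | 1 | 575 | 448 | 1150 |
| M2070/M2070Q[11] GPU Computing Module | M2070/M2070Q | Fermi | 2.0 | | 1 | 575 | 448 | 1150 |
| M2090[12][13][14] GPU Computing Module | M2090 | Fermi | 2.0 | | 1 | 650 | 512 | 1301 |
| S2050 1U GPU Computing System | S2050 | Fermi | 2.0 | | 4 | 575 | 4 × 448 (1792) | 1150 |
| K10 GPU Computing Module | K10 / GK104 | Kepler | 3.0 | ? | 2 | 745 | 1536 per GPU | 745 |
| K20 GPU Computing Module | GK110 | Kepler | 3.5 | 1.1 | 1 | 706 | 2496 | 706 |
| K20X GPU Computing Module | GK110 | Kepler | 3.5 | | 1 | 732 | 2688 | 732 |
| K40 GPU Computing Module | GK110 | Kepler | 3.5 | | 1 | | 2880 | 745 |
| K80 GPU Computing Module | GK210 | Kepler | 3.7 | | 2 | | 2496 × 2 (4992) | 562 (Base), 875 (Boost) |

Memory

| Product | Bandwidth, max. (GB/s) | Bus type | GPU bus width (bit) | Total size (MiB) | Clock (MHz) |
|---|---|---|---|---|---|
| C870 | 76.8 | GDDR3 | 384 | 1536 | 1600 |
| D870 | 153.6 | GDDR3 | 384 | 3072 | 1600 |
| S870 | 307.2 | GDDR3 | 384 | 6144 | 1600 |
| C1060 | 102.4 | GDDR3 | 512 | 4096 | 1600 |
| S1070 | 409.6 | GDDR3 | 512 | 16384 | 1600 |
| C2050/C2070/C2075 | 144 | GDDR5 | 384 | 3072/6144 (note 5) | 1500[9][10] |
| M2050 | 148.4 | GDDR5 | 384 | 3072 (note 5) | 1546 |
| M2070/M2070Q | 150.336 | GDDR5 | 384 | 6144 (note 5) | 1566 |
| M2090 | 177 | GDDR5 | 384 | 6144 (note 5) | 1848 |
| S2050 | 4 × 148.4 (593.6) | GDDR5 | 384 | 12288 (note 5) | 3092 |
| K10 | 160 per GPU | GDDR5 | 256 per GPU | 4096 per GPU | 2500 |
| K20 | 208 | GDDR5 | 320 | 5120 | 2600 |
| K20X | 250 | GDDR5 | 384 | 6144 | 2600 |
| K40 | 288 | GDDR5 | 384 | 12288 | 3004 |
| K80 | 240 per GPU | GDDR5 | 384 | 12288 per GPU | 2505 |

Processing power (peak, GFLOP)[7], TDP and form factor

| Product | Single precision (SP), total (MUL+ADD+SF) | Single precision (SP) MAD (MUL+ADD) | Double precision (DP) FMA | TDP (watts) | Form factor and features |
|---|---|---|---|---|---|
| C870 | 518.4 | 345.6 | N/A | 170.9 | Full-height video card |
| D870 | 1036.8 | 691.2 | N/A | 520 | Deskside system or Rack unit |
| S870 | 2073.6 | 1382.4 | N/A | | 1U Rack |
| C1060 | 933.12 | 622.08 | 77.76 | 187.8 | 2 slot video card |
| S1070 | 4147.2 | 2764.8 | 345.6 | | 1U Rack; IEEE 754-2008 capabilities |
| C2050/C2070/C2075 | 1288 | 1030.4 (note 6) | 515.2 | 238/247/225 | Full-height video card; IEEE 754-2008 FMA capabilities |
| M2050 | 1288 | 1030.4 (note 6) | 515.2 | 225 | Computing Module; IEEE 754-2008 FMA capabilities |
| M2070/M2070Q | 1288 | 1030.4 (note 6) | 515.2 | 225 | Computing Module; IEEE 754-2008 FMA capabilities |
| M2090 | ? | 1332.2 | 666.1 | 225 | Computing Module; IEEE 754-2008 FMA capabilities |
| S2050 | 5152 | 4121.6 (note 6) | 2060.8 | 900 | 1U Rack; IEEE 754-2008 FMA capabilities |
| K10 | 2288 per GPU | - | 95 per GPU | 225 | Computing Module; IEEE 754-2008 FMA capabilities |
| K20 | 3520 | - | 1170 | 225 | Computing Module; IEEE 754-2008 FMA capabilities |
| K20X | 3950 | - | 1310 | 235[15] | Computing Module; IEEE 754-2008 FMA capabilities |
| K40 | 4290 | | 1430 | 245 | Computing Module; IEEE 754-2008 FMA capabilities |
| K80 | 8740 | | 2910 | 300 | Computing Module; IEEE 754-2008 FMA capabilities |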
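Several of the peak figures in the table follow from the shader and memory columns. The sketch below shows one way to reproduce the C870 and C2050 rows. The formulas are inferred from the table's own numbers rather than taken from Nvidia documentation, and the function names, the `flops_per_cycle` values, and the `transfers_per_clock` factor are assumptions chosen because they match the listed values.

```python
def peak_gflops(thread_processors, shader_clock_mhz, flops_per_cycle):
    """Peak throughput in GFLOPS: processors x clock (MHz) x FLOPs per cycle / 1000."""
    return thread_processors * shader_clock_mhz * flops_per_cycle / 1000.0


def bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=1):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock / 1000.0


if __name__ == "__main__":
    # C870: 128 thread processors at 1350 MHz.
    # Assumption: MUL+ADD+SF counts as 3 FLOPs per cycle, MUL+ADD as 2.
    print("C870 SP total:", peak_gflops(128, 1350, 3))   # table lists 518.4
    print("C870 SP MAD:  ", peak_gflops(128, 1350, 2))   # table lists 345.6

    # C2050: 448 thread processors at 1150 MHz.
    # Assumption: double precision runs at half the MAD rate (1 FLOP per cycle).
    print("C2050 SP MAD: ", peak_gflops(448, 1150, 2))   # table lists 1030.4
    print("C2050 DP FMA: ", peak_gflops(448, 1150, 1))   # table lists 515.2

    # Memory: the GDDR3 clock in the table already matches the data rate,
    # while the GDDR5 rows match only with two transfers per listed clock
    # (an assumption inferred from the table values).
    print("C870 bandwidth: ", bandwidth_gb_s(384, 1600))      # table lists 76.8
    print("C2050 bandwidth:", bandwidth_gb_s(384, 1500, 2))   # table lists 144
```

Not every row follows these formulas exactly. The C1060 figures, for example, appear to be derived from a slightly different clock than the rounded 1300 MHz listed, so the sketch is best treated as a consistency check rather than a specification.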

Notes

  • Note 1: Specifications not published by Nvidia are assumed to match those of the GeForce 8800 GTX.
  • Note 2: Specifications not published by Nvidia are assumed to match those of the GeForce GTX 285.
  • Note 3: The 1U GPU computing server requires a host system or server, connected through a PCI Express card (a set-up similar to the Nvidia Quadro Plex).
  • Note 4: Core architecture version according to the CUDA programming guide.
  • Note 5: With ECC on, a portion of the dedicated memory is used for ECC bits, so the available user memory is reduced by 12.5% (e.g., 3 GB of total memory yields 2.625 GB of user-available memory).
  • Note 6: Fermi implements the fused multiply–add (FMA) instruction for both 32-bit single-precision and 64-bit double-precision floating-point numbers (GT200 supported FMA only in double precision). FMA improves upon multiply-add by retaining full precision in the intermediate stage.[16]
  • Performance figures are for single precision except where noted.
  • Nvidia Tesla supercomputers with up to eight Fermi GPUs are also available from manufacturers.
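The ECC overhead described in note 5 is simple arithmetic. A minimal sketch, assuming the 12.5% reduction applies uniformly to the total memory size (the function name is hypothetical):

```python
def user_memory_gb(total_gb, ecc_fraction=0.125):
    """Memory left for user data when ECC is enabled.

    ecc_fraction=0.125 is the 12.5% reduction given in note 5.
    """
    return total_gb * (1 - ecc_fraction)


if __name__ == "__main__":
    print(user_memory_gb(3))  # 2.625, matching the example in note 5
    print(user_memory_gb(6))  # 5.25, the same rule applied to a 6 GB card
```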

See also

References

External links