Nvidia Tesla
The Tesla graphics processing unit (GPU) is Nvidia's third brand of GPUs. It is based on high-end GPUs from the G80 onward, as well as on the Quadro lineup. Tesla is Nvidia's first dedicated general-purpose GPU product. The Tesla series takes its name from the pioneering Serbian electrical engineer Nikola Tesla.
Tesla overview
Because of their very high computational power (measured in floating-point operations per second, or FLOPS) compared to previous microprocessors, the Tesla products target the high-performance computing market.[1] The lack of ability to output images to a display[2] was the main difference between Tesla products and the consumer-level GeForce cards and the professional-level Quadro cards, but the latest Tesla C-class products include one dual-link DVI port.[3] For equivalent single-precision output, Fermi-based Nvidia GeForce cards have one quarter of the double-precision performance. Tesla products are primarily used:[4]
- in simulations and in large scale calculations (especially floating-point calculations)
- for high-end image generation for applications in professional and scientific fields
- with the use of OpenCL or CUDA.
As of 2011, Nvidia Teslas power the second-fastest supercomputer in the world, Tianhe-1A, in Tianjin, China.
Specifications and configurations
| Configuration | Model | # of GPUs | Core clock (MHz, each) | Thread processors (total) | Shader clock (MHz, each) | Memory bandwidth, max (GB/s) | Memory bus type | Memory bus width (bit, each GPU) | Memory size, total (MiB) | Memory clock (MHz) | Peak SP total (MUL+ADD+SF) (GFLOPS)[5] | Peak SP MAD (MUL+ADD) (GFLOPS)[5] | Peak DP FMA (GFLOPS)[5] | Compute capability [note 4] | TDP (W) | Form factor and features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU Computing Processor [note 1] | C870 | 1 | 600 | 128 | 1350 | 76.8 | GDDR3 | 384 | 1536 | 1600 | 518.4 | 345.6 | 0 | 1.0 | 170.9 | Full-height video card |
| Deskside Supercomputer [note 1] | D870 | 2 | 600 | 2 × 128 (256) | 1350 | 153.6 | GDDR3 | 384 | 3072 | 1600 | 1036.8 | 691.2 | 0 | 1.0 | 520 | Deskside system or rack unit |
| GPU Computing Server [note 1] | S870 | 4 | 600 | 4 × 128 (512) | 1350 | 307.2 | GDDR3 | 384 | 6144 | 1600 | 2073.6 | 1382.4 | 0 | 1.0 | | 1U rack |
| C1060 Computing Processor [note 2] | C1060 | 1 | 602 | 240 | 1300 | 102.4 | GDDR3 | 512 | 4096 | 1600 | 933.12 | 622.08 | 77.76 | 1.3 | 187.8 | 2-slot video card |
| S1075 1U[6] GPU Computing Server [notes 3, 4] | S1070 | 4 | 602 | 4 × 240 (960) | 1440 | 409.6 | GDDR3 | 512 | 16384 | 1600 | 4147.2 | 2764.8 | 345.6 | 1.3 | | 1U rack; IEEE 754-2008 capabilities |
| C2050/C2070/C2075 GPU Computing Processor | C2050/C2070/C2075 | 1 | 575 | 448 | 1150 | 144 | GDDR5 | 384 | 3072/6144 [note 5] | 3000 | 1288 | 1030.4 [note 6] | 515.2 | 2.0 | 238/247/225 | Full-height video card; IEEE 754-2008 FMA capabilities |
| M2050 GPU Computing Module | M2050 | 1 | 575 | 448 | 1150 | 148.4 | GDDR5 | 384 | 3072 [note 5] | 1546 | 1288 | 1030.4 [note 6] | 515.2 | 2.0 | 225 | Computing module; IEEE 754-2008 FMA capabilities |
| M2070/M2070Q[7] GPU Computing Module | M2070/M2070Q | 1 | 575 | 448 | 1150 | 150.336 | GDDR5 | 384 | 6144 [note 5] | 1566 | 1288 | 1030.4 [note 6] | 515.2 | 2.0 | 225 | Computing module; IEEE 754-2008 FMA capabilities |
| M2090[8][9][10] GPU Computing Module | M2090 | 1 | 650 | 512 | 1300 | 177 | GDDR5 | 384 | 6144 [note 5] | 1850 | 1331 | ? | 665 | 2.0 | 225 | Computing module; IEEE 754-2008 FMA capabilities |
| S2050 1U GPU Computing System | S2050 | 4 | 575 | 4 × 448 (1792) | 1150 | 4 × 148.4 (593.6) | GDDR5 | 384 | 12288 [note 5] | 3092 | 5152 | 4121.6 [note 6] | 2060.8 | 2.0 | 900 | 1U rack; IEEE 754-2008 FMA capabilities |
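The C870 row can be reproduced from its shader count and clocks with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the formulas are an assumption inferred from the column headings (2 operations per clock for MUL+ADD, 3 for MUL+ADD+SF, and the listed memory clock treated as the transfer rate), not a method stated by NVIDIA. Several Fermi rows appear to follow different conventions, so the same formulas will not match every row.

```python
def peak_gflops(thread_processors: int, shader_clock_mhz: float,
                flops_per_clock: int) -> float:
    """Peak GFLOPS, assuming every thread processor retires
    `flops_per_clock` floating-point operations on each shader clock."""
    return thread_processors * shader_clock_mhz * flops_per_clock / 1000.0


def memory_bandwidth_gbs(bus_width_bits: int, transfer_rate_mhz: float) -> float:
    """Peak memory bandwidth in GB/s, assuming one transfer of
    `bus_width_bits` bits per cycle at `transfer_rate_mhz`."""
    return bus_width_bits / 8 * transfer_rate_mhz / 1000.0


if __name__ == "__main__":
    # Tesla C870 figures taken from the table: 128 thread processors,
    # 1350 MHz shader clock, 384-bit bus, memory clock listed as 1600.
    print(f"SP MAD:    {peak_gflops(128, 1350, 2):.1f} GFLOPS")
    print(f"SP total:  {peak_gflops(128, 1350, 3):.1f} GFLOPS")
    print(f"Bandwidth: {memory_bandwidth_gbs(384, 1600):.1f} GB/s")
```

With the C870 inputs, the three values agree with the 345.6, 518.4, and 76.8 listed in the table.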
Notes
- 1 Specifications not published by NVIDIA are assumed to match those of the GeForce 8800 GTX.
- 2 Specifications not published by NVIDIA are assumed to match those of the GeForce GTX 285.
- 3 A host system/server is required to connect to the 1U GPU computing server through a PCI Express card (a set-up similar to that of the Nvidia Quadro Plex).
- 4 Core architecture version according to the CUDA programming guide.
- 5 With ECC on, a portion of the dedicated memory is used for ECC bits, so the available user memory is reduced by 12.5%. (e.g. 3 GB total memory yields 2.625 GB of user available memory.)
- 6 Fermi implements the new fused multiply–add (FMA) instruction for both 32-bit single-precision and 64-bit double-precision floating-point numbers (GT200 supported FMA only in double precision). FMA improves upon multiply-add by retaining full precision in the intermediate stage.[11]
- For the basic specifications of Tesla, refer to the GPU Computing Processor specifications.
- Performance figures are for single-precision except where noted.
- NVIDIA Tesla supercomputers with up to eight Fermi GPUs are also available from manufacturers.
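Notes 5 and 6 can both be checked numerically. The Python sketch below is a minimal illustration rather than anything GPU-specific: the function names are hypothetical, the ECC overhead is taken as the flat 12.5% stated in note 5, and the fused multiply–add is emulated on the CPU in 64-bit double precision with exact rational arithmetic (one rounding at the end) instead of calling a hardware FMA instruction.

```python
from fractions import Fraction


def usable_memory_gb(total_gb: float, ecc_overhead: float = 0.125) -> float:
    """User-available memory with ECC on, assuming a flat 12.5% overhead."""
    return total_gb * (1.0 - ecc_overhead)


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Unfused multiply-add: the product is rounded before the addition."""
    return a * b + c


if __name__ == "__main__":
    print(usable_memory_gb(3.0))  # 3 GB board, as in note 5

    # a*b is exactly 1 - 2**-60, which rounds to 1.0 as a double,
    # so the unfused version loses the small term entirely.
    a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
    print(separate_multiply_add(a, b, c))
    print(fused_multiply_add(a, b, c))
```

For these inputs the unfused version returns 0.0, while the emulated FMA returns -2^-60, which is the benefit of keeping full precision in the intermediate product.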
See also
- Nvidia Tesla Personal Supercomputer
- GeForce 8 series
- GeForce 200 Series
- GeForce 400 Series
- GeForce 500 Series
- CUDA
- GPGPU
- OpenCL
- Stream Processing
References
- [1] High Performance Computing - Supercomputing with Tesla GPUs
- [2] VR-Zone report
- [3] (untitled external link)
- [4] Tesla Technical Brief (PDF)
- [5] Nvidia Announces Tesla 20 Series
- [6] Difference between Tesla S1070 and S1075
- [7] NVidia Tesla M2050 & M2070/M2070Q Specs Online
- [8] TESLA M2090 Product brief
- [9] http://www.nvidia.com/docs/IO/43395/Tesla-M2090-Board-Specification.pdf
- [10] http://www.nvidia.com/docs/IO/105880/DS-Tesla-M-Class-Aug11.pdf
- [11] NVIDIA Fermi Compute Architecture Whitepaper (PDF, 855 KiB), page 13 of 22
External links
- NVIDIA Product Overview and Technical Brief
- NVIDIA's Tesla homepage
- Nvidia Tesla C2050 / C2070 GPU Computing Processor
- Nvidia Tesla S2050 GPU Computing System
- Nvidia Tesla C1060 Computing Processor
- Nvidia Tesla S1070
- Nvidia Tesla M1060 Processor
- Nvidia Parallel Nsight