Maxwell (microarchitecture)

From Wikipedia, the free encyclopedia
Nvidia Maxwell
History
Predecessor: Kepler
Successor: Pascal

Maxwell is the codename for a GPU microarchitecture developed by Nvidia as the successor to the Kepler microarchitecture. The Maxwell architecture was introduced in later models of the GeForce 700 series and is also used in the GeForce 800M series, GeForce 900 series, and Quadro Kxxx series, all manufactured in 28 nm.[1]

The first Maxwell-based products were the GeForce GTX 750 and the GeForce GTX 750 Ti, both released on February 18, 2014 and both based on the GM107 chip. Earlier GeForce 700 series GPUs had used Kepler chips with GK1xx code numbers. GM10x GPUs are also used in the GeForce 800M series and the Quadro Kxxx series.

A second generation of Maxwell-based products was introduced on September 18, 2014 with the GeForce GTX 970 and GeForce GTX 980, followed by the GeForce GTX 960 on January 22, 2015. These GPUs have GM2xx chip code numbers.

Maxwell introduced an all-new Streaming Multiprocessor (SM) design that, according to Nvidia, dramatically improves power efficiency.[2]

Maxwell introduced the sixth generation of PureVideo HD, along with CUDA Compute Capability 5.0 (first generation Maxwell) and 5.2 (second generation Maxwell).

First generation Maxwell (GM10x)

First generation Maxwell GM107/GM108 chips were released as the GeForce GTX 745, GTX 750/750 Ti and GTX 850M/860M (GM107) and the GTX 830M/840M (GM108). These chips add few consumer-facing features; Nvidia instead focused on power efficiency. Nvidia increased the L2 cache from 256 KiB on GK107 to 2 MiB on GM107, reducing the memory bandwidth needed. Accordingly, Nvidia cut the memory bus from 192 bits on GK106 to 128 bits on GM107, further saving power.[3]

Nvidia also redesigned Kepler's streaming multiprocessor (SMX), naming the new design SMM. The warp scheduler structure is inherited from Kepler: each scheduler can issue up to two mutually independent instructions, in order, from the same warp. The SMM is partitioned so that each of its 4 warp schedulers controls 1 set of 32 FP32 CUDA cores, 1 set of 8 load/store units, and 1 set of 8 special function units. In contrast, each Kepler SMX has 4 schedulers that dispatch to a shared pool of 6 sets of 32 FP32 CUDA cores, 2 sets of 16 load/store units, and 2 sets of 16 special function units.[4] In Kepler these units are connected by a crossbar that consumes power in order to let the schedulers share them; Maxwell removes this crossbar.[4] Texture units and FP64 CUDA cores are still shared.[3] The SMM therefore allows a finer-grained allocation of resources than the SMX, saving power when a workload does not benefit from shared resources. Nvidia claims that a 128-CUDA-core SMM delivers 90% of the performance of a 192-CUDA-core SMX.[3] In addition, each Graphics Processing Cluster (GPC) contains up to 4 SMX units in Kepler and up to 5 SMM units in first generation Maxwell.[3]
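
As a rough illustration of the SMX/SMM comparison above, the following C++ sketch tabulates the per-SM resources described in this section and derives the per-core throughput implied by Nvidia's 90% claim. The struct, constant, and function names are hypothetical, and the model deliberately ignores texture units and FP64 cores, which remain shared in both designs.

```cpp
#include <cassert>
#include <cmath>

// Simplified per-SM resource model (hypothetical). Texture units and
// FP64 cores are left out because both designs still share them.
struct SmResources {
    int schedulers;
    int fp32_cores;
    int load_store_units;
    int special_function_units;
};

// Kepler SMX: 4 schedulers share 6x32 FP32 cores, 2x16 LD/ST, 2x16 SFU.
constexpr SmResources kKeplerSmx{4, 6 * 32, 2 * 16, 2 * 16};

// Maxwell SMM: 4 partitions, each owning 32 FP32 cores, 8 LD/ST, 8 SFU.
constexpr SmResources kMaxwellSmm{4, 4 * 32, 4 * 8, 4 * 8};

// Per-core throughput of design A relative to design B, given a claim that
// A (with cores_a cores) reaches `fraction` of B's (cores_b cores) performance.
constexpr double per_core_ratio(int cores_a, int cores_b, double fraction) {
    return fraction * cores_b / cores_a;
}
```

Under these assumptions, an SMM keeps the same 32 load/store units and 32 special function units per SM as an SMX while dropping a third of the FP32 cores, and `per_core_ratio(128, 192, 0.9)` evaluates to about 1.35: each SMM core would deliver roughly 1.35 times the throughput of an SMX core. This ratio is derived from a vendor claim, not from a measurement.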

GM107 supports CUDA Compute Capability 5.0 compared to 3.5 on GK110/GK208 GPUs and 3.0 on GK10x GPUs. Dynamic Parallelism and HyperQ, two features in GK110/GK208 GPUs, are also supported across the entire Maxwell product line.

Maxwell provides native shared memory atomic operations for 32-bit integers and native shared memory 32-bit and 64-bit compare-and-swap (CAS), which can be used to implement other atomic functions.
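
The idea of building other atomic functions from compare-and-swap can be sketched on the CPU with C++ `std::atomic`. This is a host-side analogue, not Maxwell device code: it implements an atomic floating-point add by reading a 32-bit word, computing the new value, and attempting a CAS, retrying until no other thread has modified the word in between. In CUDA the same loop is typically written with `atomicCAS` on a shared-memory word. The helper names below are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Reinterpret a float as its raw IEEE-754 bit pattern, and back.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float bits_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomic float add built only from a 32-bit compare-and-swap.
// Returns the value held before the add, like CUDA's atomicAdd.
float atomic_add_float(std::atomic<std::uint32_t>& word, float value) {
    std::uint32_t expected = word.load();
    for (;;) {
        const float current = bits_float(expected);
        const std::uint32_t desired = float_bits(current + value);
        // Succeeds only if `word` still equals `expected`; on failure,
        // `expected` is refreshed with the latest value and we retry.
        if (word.compare_exchange_weak(expected, desired)) {
            return current;
        }
    }
}
```

The loop is lock-free but not wait-free: under heavy contention a thread may retry several times, which is one reason native atomic instructions are preferred where the hardware provides them.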

NVENC

Main article: Nvidia NVENC

Maxwell-based GPUs also contain the NVENC SIP block introduced with Kepler. On Maxwell, Nvidia's NVENC video encoder is 1.5 to 2 times faster than on Kepler-based GPUs, meaning it can encode video at 6 to 8 times playback speed.[3]
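
If the two NVENC figures are read as paired (1.5 times faster giving 6 times playback speed, 2 times faster giving 8 times), both ends of the range imply the same Kepler baseline of about 4 times playback speed. The sketch below makes that arithmetic explicit; the pairing is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>

// Wall-clock minutes needed to encode a clip at a given multiple of
// playback (real-time) speed.
constexpr double encode_minutes(double clip_minutes, double realtime_multiple) {
    return clip_minutes / realtime_multiple;
}

// Baseline speed implied by: new_multiple = speedup * baseline.
constexpr double implied_baseline(double new_multiple, double speedup) {
    return new_multiple / speedup;
}
```

For example, a hypothetical 60-minute recording would take `encode_minutes(60, 6)`, or 10 minutes, at 6 times playback speed and 7.5 minutes at 8 times.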

PureVideo

Main article: Nvidia PureVideo

Nvidia also claims an 8 to 10 times performance increase in PureVideo Feature Set E video decoding, due to the video decoder cache paired with improved memory efficiency. However, H.265 decoding is not fully hardware-accelerated; it relies on a mix of hardware and software decoding.[3] When decoding video, Maxwell GPUs use a new low-power state, "GC5", to conserve power.[3]

Second generation Maxwell (GM20x)

Second generation Maxwell introduced several new technologies: Dynamic Super Resolution,[5] Third Generation Delta Color Compression,[6] Multi-Pixel Programming Sampling,[7] Nvidia VXGI (Real-Time Voxel Global Illumination),[8] VR Direct,[9][10][11] Multi-Projection Acceleration,[6] and Multi-Frame Sampled Anti-Aliasing (MFAA).[12] However, support for Coverage-Sampling Anti-Aliasing (CSAA) was removed.[13] HDMI 2.0 support was also added.[14][15]

Second generation Maxwell also changed the ratio of ROPs to memory controllers from 8:1 to 16:1.[16] However, in the GTX 970 some of the ROPs are generally idle because there are not enough enabled SMMs to give them work, which reduces its maximum fill rate.[17]
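
Treating the ratio as the number of ROPs attached to each memory controller, the change can be expressed as a one-line calculation. The 64-bit controller width and the 256-bit bus used in the example are illustrative assumptions, not figures from this section, and the function name is hypothetical.

```cpp
#include <cassert>

// ROPs on a chip = number of memory controllers x ROPs per controller.
// Assumes one memory controller per 64 bits of bus width (a simplification).
constexpr int rop_count(int bus_width_bits, int rops_per_controller) {
    return (bus_width_bits / 64) * rops_per_controller;
}
```

On a 256-bit bus, `rop_count(256, 8)` gives 32 ROPs and `rop_count(256, 16)` gives 64, so doubling the ratio doubles the ROP count for the same bus width.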

Second generation Maxwell has up to 4 SMM units per GPC, compared to up to 5 SMM units per GPC in first generation Maxwell.[16]
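
The per-GPC limits translate directly into CUDA core counts. The sketch below assumes one GPC for GM107 and four for GM204 (figures not stated in this article) and 128 FP32 CUDA cores per SMM, as described for first generation Maxwell; the names are hypothetical.

```cpp
#include <cassert>

// FP32 CUDA cores per SMM (4 partitions x 32 cores).
constexpr int kCoresPerSmm = 128;

// Total FP32 CUDA cores for a chip with `gpcs` clusters of `smm_per_gpc` SMMs.
// Assumes every GPC is fully populated, which is not true of all products.
constexpr int total_cores(int gpcs, int smm_per_gpc) {
    return gpcs * smm_per_gpc * kCoresPerSmm;
}
```

With those assumptions, `total_cores(1, 5)` is 640 and `total_cores(4, 4)` is 2048, the CUDA core counts of the GeForce GTX 750 Ti and GTX 980 respectively.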

GM204 supports CUDA Compute Capability 5.2 compared to 5.0 on GM107/GM108 GPUs, 3.5 on GK110/GK208 GPUs and 3.0 on GK10x GPUs.[6][16][18]
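
A program that needs a particular feature can compare compute capabilities as (major, minor) pairs. The lookup table below contains only the chips and versions listed in this article, and the function names are hypothetical; real CUDA code would instead read the `major` and `minor` fields filled in by `cudaGetDeviceProperties`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Compute capability (major, minor) for the chips named in this article.
// Unknown chips map to {0, 0}.
std::pair<int, int> compute_capability(const std::string& chip) {
    static const std::map<std::string, std::pair<int, int>> table = {
        {"GK104", {3, 0}}, {"GK106", {3, 0}}, {"GK107", {3, 0}},
        {"GK110", {3, 5}}, {"GK208", {3, 5}},
        {"GM107", {5, 0}}, {"GM108", {5, 0}},
        {"GM204", {5, 2}},
    };
    auto it = table.find(chip);
    return it == table.end() ? std::make_pair(0, 0) : it->second;
}

// True if `cc` meets `required`; std::pair compares lexicographically,
// so the major version is checked first, then the minor version.
bool at_least(std::pair<int, int> cc, std::pair<int, int> required) {
    return cc >= required;
}
```

For example, `at_least(compute_capability("GM107"), {3, 5})` is true while the same check for GK104 is false, consistent with the statement that Dynamic Parallelism, a feature of the 3.5-level GK110/GK208 chips, is supported across the Maxwell line.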

Second generation Maxwell GM20x GPUs have an upgraded NVENC that supports HEVC encoding and adds H.264 encoding at 1440p/60 FPS and 4K/60 FPS, whereas NVENC on first generation GM10x GPUs supported H.264 encoding only up to 1080p/60 FPS.[11]
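
The jump in supported encoding modes is easier to see as a pixel rate. This sketch assumes that 1080p, 1440p, and 4K mean 1920×1080, 2560×1440, and 3840×2160 respectively; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pixels per second an encoder must process for a given resolution and
// frame rate (width x height x frames per second).
constexpr std::uint64_t pixel_rate(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t fps) {
    return width * height * fps;
}
```

At 60 FPS this gives 124,416,000 pixels per second for 1080p, 221,184,000 for 1440p, and 497,664,000 for 4K, so 4K/60 requires exactly four times the throughput of 1080p/60.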

After consumer complaints,[19] Nvidia revealed that it can disable individual units, each containing 256 KB of L2 cache and 8 ROPs, without disabling whole memory controllers.[20] This comes at the cost of dividing the memory bus into high-speed and low-speed segments that cannot be read at the same time: the single L2/ROP unit left managing both affected GDDR5 controllers shares its read return channel and its write data bus between them, so it cannot read from both controllers at once or write to both at once.[20] This arrangement is used in the GeForce GTX 970, which can therefore be described as having 3.5 GB in a high-speed segment on a 224-bit bus and 512 MB in a low-speed segment on a 32-bit bus.[20] Such a GPU can still reach its peak memory bandwidth, but only when one segment is performing a read while the other is performing a write.[20]
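
The GTX 970 split can be modeled as eight 32-bit GDDR5 controllers, seven in the high-speed segment and one in the low-speed segment. The 512 MB per controller and the 7 Gbit/s per-pin data rate are assumptions for illustration, and the names are hypothetical. The model does not capture the restriction that both segments cannot be read at the same time, so the sum of the two segment bandwidths is only an upper bound.

```cpp
#include <cassert>

// One segment of a GDDR5 bus built from 32-bit controllers (hypothetical model).
struct Segment {
    int controllers;        // number of 32-bit GDDR5 controllers
    int mb_per_controller;  // memory attached to each controller, in MB
};

constexpr int bus_width_bits(Segment s) { return s.controllers * 32; }
constexpr int capacity_mb(Segment s) { return s.controllers * s.mb_per_controller; }

// Peak bandwidth in GB/s for a per-pin data rate given in Gbit/s.
constexpr double bandwidth_gbs(Segment s, double gbps_per_pin) {
    return bus_width_bits(s) * gbps_per_pin / 8.0;
}

constexpr Segment kFast{7, 512};  // assumed GTX 970 high-speed segment
constexpr Segment kSlow{1, 512};  // assumed GTX 970 low-speed segment
```

Under these assumptions the high-speed segment offers a 224-bit bus and 3,584 MB (3.5 GB) at 196 GB/s, and the low-speed segment a 32-bit bus and 512 MB at 28 GB/s; their combined 224 GB/s is reachable only in the mixed read/write case.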

Successor

The architecture succeeding Maxwell is codenamed Pascal.[21] Nvidia has announced that Pascal GPUs will feature stacked DRAM, Unified Memory, and NVLink.[21]

References

  1. http://videocardz.com/50902/nvidia-geforce-gtx-880-gtx-870-coming-fall
  2. "5 Things You Should Know About the New Maxwell GPU Architecture". 2014-02-21.
  3. Smith, Ryan; T S, Ganesh (18 February 2014). "The NVIDIA GeForce GTX 750 Ti and GTX 750 Review: Maxwell Makes Its Move". AnandTech. Archived from the original on 18 February 2014. Retrieved 18 February 2014.
  4. http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/3
  5. http://www.geforce.com/whats-new/articles/dynamic-super-resolution-instantly-improves-your-games-with-4k-quality-graphics
  6. http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF
  7. http://www.geforce.com/hardware/technology/mfaa/technology
  8. http://www.geforce.com/whats-new/articles/maxwells-voxel-global-illumination-technology-introduces-gamers-to-the-next-generation-of-graphics
  9. http://www.geforce.com/whats-new/articles/maxwell-architecture-gpus-the-only-choice-for-virtual-reality-gaming
  10. http://blogs.nvidia.com/blog/2014/09/18/maxwell-virtual-reality/
  11. http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/5
  12. http://www.geforce.com/whats-new/articles/multi-frame-sampled-anti-aliasing-delivers-better-performance-and-superior-image-quality
  13. http://forums.realhardwarereviews.com/news/new-nvidia-maxwell-chips-do-not-support-fast-csaa/
  14. http://www.geforce.com/whats-new/articles/maxwell-architecture-gtx-980-970
  15. http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review
  16. http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/3
  17. http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980
  18. http://devblogs.nvidia.com/parallelforall/maxwell-most-advanced-cuda-gpu-ever-made/
  19. http://www.lazygamer.net/general-news/nvidias-gtx970-has-a-rather-serious-memory-allocation-bug/
  20. http://www.anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation/2
  21. http://blogs.nvidia.com/blog/2014/03/25/gpu-roadmap-pascal/