GeForce 900 series

GeForce 900 Series
Release date	September 2014
Codename	Maxwell
Models	GeForce Series; GeForce GT Series; GeForce GTX Series
Cards
Mid-range	GeForce GTX 960
High-end	GeForce GTX 970
Enthusiast	GeForce GTX 980
API support
DirectX	Direct3D 11.3 and Direct3D 12 (feature level 12_1)
OpenCL	1.2
OpenGL	OpenGL 4.5
History
Predecessor	GeForce 700 series
Successor	GeForce 1000 series

The GeForce 900 Series is a family of graphics processing units developed by Nvidia, used in desktop and laptop PCs. It serves as the high-end introduction for the Maxwell architecture (GM-codenamed chips), named after the Scottish theoretical physicist James Clerk Maxwell.

The Maxwell microarchitecture, the successor to the Kepler microarchitecture, will for the first time feature an integrated ARM CPU of its own.^[6] According to Nvidia's CEO Jen-Hsun Huang, this will make Maxwell GPUs more independent from the main CPU.^[7] Nvidia expects three major things from the Maxwell architecture: improved graphics capabilities, simplified programming, and better energy efficiency compared to the GeForce 700 Series and GeForce 600 Series.^[8]

Maxwell was announced in September 2010.^[9] The first GeForce consumer-class products based on the Maxwell architecture were released in early 2014.^[10] Nvidia is expected to release Maxwell-powered Tesla accelerator cards and Quadro professional graphics cards in late 2014. Eventually, the Maxwell architecture will also be used for mobile application processors in the Erista family of Tegra SoCs.

Architecture

First generation Maxwell (GM10x)

The first generation Maxwell chips, GM107 and GM108, were released as the GeForce GTX 745, GTX 750/750 Ti and GTX 850M/860M (GM107) and the GTX 830M/840M (GM108). These chips provide few additional consumer-facing features; Nvidia instead focused on power efficiency. Nvidia increased the amount of L2 cache from 256 KiB on GK107 to 2 MiB on GM107, reducing the memory bandwidth needed. Accordingly, Nvidia cut the memory bus from 192-bit on GK106 to 128-bit on GM107, further saving power.^[11]

Nvidia also changed the streaming multiprocessor design from that of Kepler (SMX), naming the new design SMM. The structure of the warp scheduler is inherited from Kepler: each scheduler can issue up to two instructions per cycle, provided they are independent of each other and in order from the same warp. The SMM is partitioned so that each of its 4 warp schedulers controls 1 set of 32 FP32 CUDA cores, 1 set of 8 load/store units, and 1 set of 8 special function units. This is in contrast to Kepler, where each SMX has 4 schedulers that schedule to a shared pool of 6 sets of 32 FP32 CUDA cores, 2 sets of 16 load/store units, and 2 sets of 16 special function units.^[12] In Kepler, these units are connected by a crossbar, which consumes power, so that the resources can be shared.^[12] This crossbar is removed in Maxwell.^[12] Texture units and FP64 CUDA cores are still shared.^[11]

SMM allows a finer-grained allocation of resources than SMX, saving power when the workload is not well suited to shared resources. Nvidia claims that a 128 CUDA core SMM has 90% of the performance of a 192 CUDA core SMX.^[11] In addition, each Graphics Processing Cluster (GPC) contains up to 4 SMX units in Kepler and up to 5 SMM units in first generation Maxwell.^[11]
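
The difference between the two multiprocessor layouts can be made concrete with a short Python sketch. It only restates the unit counts given above; the class and function names are illustrative, and treating Nvidia's 90% figure as a per-core throughput ratio is an assumption made for the arithmetic, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Multiprocessor:
    """Unit counts for one streaming multiprocessor, as quoted in the text."""
    name: str
    schedulers: int
    fp32_cores: int
    load_store_units: int
    special_function_units: int
    shared_pool: bool  # True if all schedulers draw from one shared pool


# Kepler SMX: 4 schedulers sharing 6x32 cores, 2x16 LD/ST units, 2x16 SFUs.
SMX = Multiprocessor("Kepler SMX", 4, 6 * 32, 2 * 16, 2 * 16, shared_pool=True)
# Maxwell SMM: 4 schedulers, each owning 32 cores, 8 LD/ST units, 8 SFUs.
SMM = Multiprocessor("Maxwell SMM", 4, 4 * 32, 4 * 8, 4 * 8, shared_pool=False)


def cores_per_scheduler(mp: Multiprocessor) -> float:
    """Average number of FP32 cores available per warp scheduler."""
    return mp.fp32_cores / mp.schedulers


def implied_per_core_ratio(smm_vs_smx: float = 0.90) -> float:
    """Per-core throughput of SMM relative to SMX implied by the 90% claim.

    Assumption: the claim compares whole multiprocessors on the same workload.
    """
    return smm_vs_smx * SMX.fp32_cores / SMM.fp32_cores


if __name__ == "__main__":
    for mp in (SMX, SMM):
        print(mp.name, mp.fp32_cores, "cores,", cores_per_scheduler(mp), "per scheduler")
    print("Implied per-core ratio:", round(implied_per_core_ratio(), 2))
```

Under that assumption, delivering 90% of the performance with two thirds of the cores corresponds to roughly 1.35 times the throughput per CUDA core.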

GM107 supports CUDA Compute Capability 5.0 compared to 3.5 on GK110/GK208 GPUs and 3.0 on GK10x GPUs. Dynamic Parallelism and HyperQ, two features in GK110/GK208 GPUs, are also supported across the entire Maxwell product line.

Maxwell provides native shared memory atomic operations for 32-bit integers and native shared memory 32-bit and 64-bit compare-and-swap (CAS), which can be used to implement other atomic functions.
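
As an illustration of how compare-and-swap can be used to build other atomic functions, the following Python sketch models a single shared word and implements an atomic maximum as a CAS retry loop. This is a host-side model only, not GPU code: the `Word` class, its lock (which stands in for the hardware's atomicity) and the `atomic_max` helper are all hypothetical names introduced for the example.

```python
import threading


class Word:
    """Model of one shared word that offers a compare-and-swap primitive."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> int:
        """Store `new` only if the word still holds `expected`.

        Always returns the value that was in the word before the call.
        """
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def atomic_max(word: Word, candidate: int) -> int:
    """Atomic max built from CAS; returns the value observed before the update."""
    old = word.load()
    while old < candidate:
        seen = word.compare_and_swap(old, candidate)
        if seen == old:   # nobody interfered, our value was stored
            break
        old = seen        # another thread changed the word; retry with its value
    return old


def run_demo(values, n_threads: int = 4) -> int:
    """Apply atomic_max to every value from several threads; return the result."""
    word = Word(0)

    def worker(chunk):
        for v in chunk:
            atomic_max(word, v)

    threads = [threading.Thread(target=worker, args=(values[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.load()
```

The same retry pattern (read, compute, attempt the swap, repeat on failure) applies to any read-modify-write operation that lacks a native atomic instruction.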

NVENC

Maxwell-based GPUs also contain the NVENC SIP block introduced with Kepler. Nvidia's video encoder, NVENC, is 1.5 to 2 times faster than on Kepler-based GPUs, meaning it can encode video at 6 to 8 times playback speed.^[11]

PureVideo

Nvidia also claims an 8 to 10 times performance increase in PureVideo Feature Set E video decoding, due to the video decoder cache paired with increases in memory efficiency. However, H.265 is not fully decoded in hardware; it relies on a mix of hardware and software decoding.^[11] When decoding video, Maxwell GPUs use a new low-power state, "GC5", to conserve power.^[11]

Second generation Maxwell (GM20x)

Second generation Maxwell introduced several new technologies: Dynamic Super Resolution,^[13] Third Generation Delta Color Compression,^[14] Multi-Pixel Programmable Sampling,^[15] Nvidia VXGI (Real-Time Voxel Global Illumination),^[16] VR Direct,^[17]^[18]^[19] Multi-Projection Acceleration,^[14] and Multi-Frame Sampled Anti-Aliasing (MFAA).^[20] However, support for Coverage-Sampling Anti-Aliasing (CSAA) was removed.^[21] HDMI 2.0 support was also added.^[22]^[23]

Second generation Maxwell also changed the ROP to memory controller ratio from 8:1 to 16:1.^[24] However, some of the ROPs are generally idle in the GTX 970 because there are not enough enabled SMMs to give them work to do, which reduces its maximum fill rate.^[25]

Second generation Maxwell also has up to 4 SMM units per GPC, compared to up to 5 SMM units per GPC in first generation Maxwell.^[24]

GM204 supports CUDA Compute Capability 5.2 compared to 5.0 on GM107/GM108 GPUs, 3.5 on GK110/GK208 GPUs and 3.0 on GK10x GPUs.^[14]^[24]^[26]

Second generation Maxwell GM20x GPUs have an upgraded NVENC that supports HEVC encoding and adds H.264 encoding at 1440p/60 FPS and 4K/60 FPS; NVENC on first generation Maxwell GM10x GPUs supported H.264 encoding only up to 1080p/60 FPS.^[19]

The GM206 GPU supports full fixed-function HEVC decoding.^[27]

GeForce 970 specifications controversy

Issues with the GeForce 970's performance were first raised by users who found that the cards, while featuring 4 GB of memory, rarely accessed memory over the 3.5 GB boundary. Further testing and investigation eventually led Nvidia to issue a statement that the card's initially announced specifications had been altered without notice before the card was made commercially available, and that the card took a performance hit once memory over the 3.5 GB limit was put into use.^[28]^[29]^[30]

The card's back-end hardware specifications, initially announced as being identical to those of the GeForce 980, differed in the amount of L2 cache (1.75 MB versus 2 MB in the GeForce 980) and the number of ROPs (56 versus 64 in the 980). Additionally, it was revealed that the card was designed to access its memory as a 3.5 GB section plus a 0.5 GB section, with access to the latter being 7 times slower than access to the former.^[31] The company then promised a specific driver modification to alleviate the performance issues caused by these cutbacks.^[32] However, Nvidia later clarified that the promise had been a miscommunication and that there would be no specific driver update for the GTX 970.^[33] Nvidia claimed that it would assist customers who wanted refunds in obtaining them.^[34] On February 26, 2015, Nvidia CEO Jen-Hsun Huang went on record on Nvidia's official blog to apologize for the incident.^[35]

Nvidia revealed that it is able to disable individual units, each containing 256 KB of L2 cache and 8 ROPs, without disabling whole memory controllers.^[36] This comes at the cost of dividing the memory bus into a high-speed segment and a low-speed segment. The two segments cannot be accessed at the same time unless one is being read while the other is being written, because the L2/ROP unit that manages both GDDR5 controllers shares its read return channel and write data bus between the two controllers and itself.^[36] This approach is used in the GeForce GTX 970, which can therefore be described as having 3.5 GB in a high-speed segment on a 224-bit bus and 512 MB in a low-speed segment on a 32-bit bus.^[36]
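
The figures in this section can be cross-checked with a little arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the total of eight L2/ROP units is inferred from the GTX 980's 2 MB of L2 cache, and the 7010 MT/s memory data rate is taken from the products table in this article.

```python
L2_KB_PER_UNIT = 256   # L2 cache per L2/ROP unit, from the text
ROPS_PER_UNIT = 8      # ROPs per L2/ROP unit, from the text
TOTAL_UNITS = 8        # inferred: 2 MB of L2 on the GTX 980 / 256 KB per unit
MEMORY_MTS = 7010      # memory data rate in MT/s, from the products table


def back_end(enabled_units: int) -> dict:
    """L2 cache (in MB) and ROP count for a given number of enabled units."""
    return {
        "l2_mb": enabled_units * L2_KB_PER_UNIT / 1024,
        "rops": enabled_units * ROPS_PER_UNIT,
    }


def bandwidth_gbs(bus_width_bits: int, data_rate_mts: int = MEMORY_MTS) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_mts / 1000


if __name__ == "__main__":
    print("GTX 980:", back_end(TOTAL_UNITS))       # all units enabled
    print("GTX 970:", back_end(TOTAL_UNITS - 1))   # one unit disabled
    print("Fast segment:", round(bandwidth_gbs(224), 1), "GB/s")
    print("Slow segment:", round(bandwidth_gbs(32), 1), "GB/s")
```

With one of the eight units disabled, the sketch reproduces the 1.75 MB of L2 cache and 56 ROPs quoted for the GTX 970, and the 224-bit and 32-bit segments work out to about 196 GB/s and 28 GB/s, which is the 7:1 ratio behind the "7 times slower" figure.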

Future

After Maxwell, the next architecture is code-named Pascal.^[37] Nvidia has announced that the Pascal GPU will feature stacked DRAM, Unified Memory, and NVLink.^[37]

Products

GeForce 900 (9xx) series

¹ Shader Processors : Texture mapping units : Render output units
² Pixel fillrate is calculated as the lowest of three numbers: number of ROPs multiplied by the base core clock speed, number of rasterizers multiplied by the number of fragments they can generate per rasterizer multiplied by the base core clock speed, and the number of streaming multiprocessors multiplied by the number of fragments per clock that they can output multiplied by the base clock rate.^[25]
³ Texture fillrate is calculated as the number of TMUs multiplied by the base core clock speed.
⁴ Single precision performance is calculated as 2 times the number of shaders multiplied by the base core clock speed.
⁵ Double precision performance of the GTX 980, GTX 970, and GTX 960 is 1/32 of single-precision performance.^[38]
⁶ SLI support lists the largest supported configuration; 4-way SLI connects up to 4 identical GPU cards. Cards that support 4-way SLI also support 3-way and 2-way SLI. A dual-GPU card, however, is already a 2-way SLI configuration internally, so it supports 4-way SLI with an identical dual-GPU card but does not support 3-way SLI.
⁷ Because one or more L2 cache/ROP units are disabled without disabling all of the memory controllers attached to them, the memory is segmented. One segment must be reading while the other is writing to achieve the peak speed. Since the peak speed is impossible to reach with pure reads or pure writes, the segments and their associated buses are listed separately in this table.
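
The formulas in notes 2 to 5 can be sketched in Python as follows. This is a minimal illustration rather than an official calculator: the function names are hypothetical, and the figures of 4 rasterizers producing 16 pixels per clock each and 4 pixels per clock per SMM are assumptions used for the GTX 970 and GTX 980 examples.

```python
def pixel_fillrate_gps(rops, rasterizers, px_per_rasterizer,
                       sms, px_per_sm, base_mhz):
    """Note 2: the lowest of the ROP, rasterizer and SM limits, in GP/s."""
    pixels_per_clock = min(rops,
                           rasterizers * px_per_rasterizer,
                           sms * px_per_sm)
    return pixels_per_clock * base_mhz / 1000


def texture_fillrate_gts(tmus, base_mhz):
    """Note 3: TMUs multiplied by the base core clock, in GT/s."""
    return tmus * base_mhz / 1000


def single_precision_gflops(shaders, base_mhz):
    """Note 4: 2 operations per shader per clock, in GFLOPS."""
    return 2 * shaders * base_mhz / 1000


def double_precision_gflops(shaders, base_mhz):
    """Note 5: 1/32 of the single-precision rate, in GFLOPS."""
    return single_precision_gflops(shaders, base_mhz) / 32


if __name__ == "__main__":
    # GTX 980: 2048:128:64 at 1126 MHz, 16 SMMs (assumed 4 px/clk each).
    print(round(pixel_fillrate_gps(64, 4, 16, 16, 4, 1126), 1))
    print(round(texture_fillrate_gts(128, 1126), 1))
    print(round(single_precision_gflops(2048, 1126)))
    # GTX 970: 1664:104:56 at 1050 MHz, 13 SMMs; the SMM limit is the lowest.
    print(round(pixel_fillrate_gps(56, 4, 16, 13, 4, 1050), 1))
```

For the GTX 970 the three limits are 56, 64 and 52 pixels per clock, so the SMM count rather than the ROP count sets the pixel fillrate, which matches the observation that some of its ROPs are generally idle.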

Model	Launch	Code name	Fab (nm)	Transistors (million)	Die size (mm²)	GPU count	Bus interface	Memory (MiB)⁷	Core config¹	Base core clock (MHz)	Boost core clock (MHz)	Memory clock (MT/s)	Pixel fillrate (GP/s)²	Texture fillrate (GT/s)³	Memory bandwidth (GB/s)⁷	Memory bus type	Memory bus width (bit)⁷	DirectX	OpenGL	OpenCL	Single precision (GFLOPS)⁴	Double precision (GFLOPS)⁵	GFLOPS/W (single precision)	TDP (watts)	SLI support⁶	Release price (USD)
GeForce GTX 960 ^[39]	Jan 22, 2015	GM206	28	2940	227	1	PCIe 3.0 x16	2048 4096	1024:64:32	1127	1178	7010	39.3	72.1	112	GDDR5	128	12.0^[2]^[5]	4.5	1.2	2308	72.1	19.2	120	2-way	$199
GeForce GTX 970 ^[40]	Sep 18, 2014	GM204	28	5200	398	1	PCIe 3.0 x16	3584+512 ^[41]	1664:104:56 ^[42]	1050	1178	7010	54.6	109.2	196+28 ^[43]	GDDR5	224+32 ^[36]	12.0^[2]^[5]	4.5	1.2^[44]	3494	109	24.1	145	3-way	$329
GeForce GTX 980 ^[45]	Sep 18, 2014	GM204	28	5200	398	1	PCIe 3.0 x16	4096	2048:128:64	1126	1216	7010	72.1	144	224	GDDR5	256	12.0^[2]^[5]	4.5	1.2^[1]	4612	144	28.0	165	4-way	$549
GeForce GTX Titan X ^[46]	Mar 17, 2015^[47]	GM200	28	8000	551^[48]	1	PCIe 3.0 x16	12288	3072:192:96^[49]	1100	1390	6008	106	211	288	GDDR5	384	12.0	4.5	1.2	6758^[50]	?	>22.5	<300^[51]^[52]	4-way^[53]	?

GeForce 900M (9xxM) series

Some implementations may use different specifications.

Model	Launch	Code name	Fab (nm)	Transistors (million)	Die size (mm²)	GPU count	Bus interface	Memory (MiB)	Core config¹	Base core clock (MHz)	Boost core clock (MHz)	Memory clock (MT/s)	Pixel fillrate (GP/s)²	Texture fillrate (GT/s)³	Memory bandwidth (GB/s)	Memory bus type	Memory bus width (bit)	DirectX	OpenGL	OpenCL	Single precision (GFLOPS)⁴	Double precision (GFLOPS)⁵	GFLOPS/W (single precision)	TDP (watts)	SLI support⁶
GeForce GTX 965M ^[54]^[55]	Jan 05, 2015	GM204	28	5200(?)	398(?)	1	PCIe 3.0 x16	2048	1024:64:32	944	???	5000	30.2	60.4	80	GDDR5	128	12.0^[2]^[5]	4.5	1.2^[1]	1933	60.41	Unknown	Unknown	Unknown
GeForce GTX 970M ^[56]	Oct 07, 2014	GM204	28	5200	398	1	PCIe 3.0 x16	3072 6144	1280:80:48	924	993	5012	37.0	73.9	120	GDDR5	192^[57]	12.0^[2]^[5]	4.5	1.2^[1]	2365	73.9	Unknown	Unknown	Yes
GeForce GTX 980M ^[58]	Oct 07, 2014	GM204	28	5200	398	1	PCIe 3.0 x16	4096 8192	1536:96:64	1038	1127	5012	49.8	99.6	160	GDDR5	256^[57]	12.0^[2]^[5]	4.5	1.2^[1]	3189	99.6	Unknown	Unknown	Yes

References

[1] http://www.techpowerup.com/gpudb/2621/geforce-gtx-980.html

[2] http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/4

[3] http://blogs.nvidia.com/blog/2014/09/19/maxwell-and-dx12-delivered/

[4] http://blogs.msdn.com/b/directx/archive/2014/09/18/directx-12-lights-up-nvidia-s-maxwell-editor-s-day.aspx

[5] http://www.anandtech.com/show/8544/microsoft-details-direct3d-113-12-new-features

[6] "Nvidia Maxwell to be first GPU with ARM CPU in 2013". Guru3d.com.

[7] "Nvidia Maxwell Graphics Processors to Have Integrated ARM General-Purpose Cores". xbitlabs.com.

[8] "Nvidia: Next-Generation Maxwell Architecture Will Break New Grounds". xbitlabs.com.

[9] http://www.anandtech.com/show/3939/gtc-2010-reporters-notebook-day-1-nvidia-announces-future-gpu-families-for-2011-and-2013

[10] http://www.geforce.com/whats-new/articles/introducing-the-geforce-gtx-750-class

[11] Smith, Ryan; T S, Ganesh (18 February 2014). "The NVIDIA GeForce GTX 750 Ti and GTX 750 Review: Maxwell Makes Its Move". AnandTech. Archived from the original on 18 February 2014. Retrieved 18 February 2014.

[12] http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/3

[13] http://www.geforce.com/whats-new/articles/dynamic-super-resolution-instantly-improves-your-games-with-4k-quality-graphics

[14] http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF

[15] http://www.geforce.com/hardware/technology/mfaa/technology

[16] http://www.geforce.com/whats-new/articles/maxwells-voxel-global-illumination-technology-introduces-gamers-to-the-next-generation-of-graphics

[17] http://www.geforce.com/whats-new/articles/maxwell-architecture-gpus-the-only-choice-for-virtual-reality-gaming

[18] http://blogs.nvidia.com/blog/2014/09/18/maxwell-virtual-reality/

[19] http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/5

[20] http://www.geforce.com/whats-new/articles/multi-frame-sampled-anti-aliasing-delivers-better-performance-and-superior-image-quality

[21] http://forums.realhardwarereviews.com/news/new-nvidia-maxwell-chips-do-not-support-fast-csaa/

[22] http://www.geforce.com/whats-new/articles/maxwell-architecture-gtx-980-970

[23] http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review

[24] http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/3

[25] http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980

[26] http://devblogs.nvidia.com/parallelforall/maxwell-most-advanced-cuda-gpu-ever-made/

[27] http://www.anandtech.com/show/8923/nvidia-launches-geforce-gtx-960

[28] "NVIDIA Discloses Full Memory Structure and Limitations of GTX 970". PCPer.

[29] "GeForce GTX 970 Memory Issue Fully Explained – Nvidia's Response". WCFTech.

[30] "Why Nvidia's GTX 970 slows down when using more than 3.5GB VRAM". PCGamer.

[31] "GeForce GTX 970: Correcting The Specs & Exploring Memory Allocation". AnandTech.

[32] "NVIDIA Working on New Driver For GeForce GTX 970 To Tune Memory Allocation Problems and Improve Performance". WCFTech.

[33] "NVIDIA clarifies no driver update for GTX 970 specifically". PC World.

[34] http://www.pcper.com/news/Graphics-Cards/NVIDIA-Plans-Driver-Update-GTX-970-Memory-Issue-Help-Returns

[35] "Nvidia CEO addresses GTX 970 controversy". PCGamer. 2015-02-26.

[36] http://www.anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation/2

[37] http://blogs.nvidia.com/blog/2014/03/25/gpu-roadmap-pascal/

[38] Smith, Ryan (September 18, 2014). "The NVIDIA GeForce GTX 980 Review: Maxwell Mark 2". AnandTech. p. 1. Retrieved September 19, 2014.

[39] GeForce GTX 960 | Specifications | GeForce

[40] GeForce GTX 970 | Specifications | GeForce

[41] http://www.pcper.com/news/Graphics-Cards/NVIDIA-Responds-GTX-970-35GB-Memory-Issue

[42] http://www.anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation

[43] http://www.anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation/4

[44] http://www.techpowerup.com/gpudb/2620/geforce-gtx-970.html

[45] GeForce GTX 980 | Specifications | GeForce

[46] http://blogs.nvidia.com/blog/2015/03/04/smaug/

[47] http://www.techpowerup.com/gpudb/2632/geforce-gtx-titan-x.html

[48] http://www.techpowerup.com/gpudb/2632/geforce-gtx-titan-x.html

[49] http://www.techpowerup.com/gpudb/2632/geforce-gtx-titan-x.html

[50] http://www.techpowerup.com/gpudb/2632/geforce-gtx-titan-x.html

[51] http://www.legitreviews.com/hands-nvidia-geforce-gtx-titan-x-12gb-video-card_159519

[52] The Titan X requires 6-pin and 8-pin power connectors, which provide at most 300 W (75 + 75 + 150).

[53] http://www.legitreviews.com/hands-nvidia-geforce-gtx-titan-x-12gb-video-card_159519

[54] GeForce GTX 965M | Specifications | GeForce

[55] http://www.techpowerup.com/gpudb/2634/geforce-gtx-965m.html

[56] GeForce GTX 970M | Specifications | GeForce

[57] http://www.hardware.fr/focus/106/gtx-970-3-5-go-224-bit-lieu-4-go-256-bit.html

[58] GeForce GTX 980M | Specifications | GeForce
