Teraflops Research Chip

General information
Launched	2006
Designed by	Intel Tera-Scale Computing Research Program
Performance
Max. CPU clock rate	5.67 GHz
Data width	38-bit
Architecture and classification
Instructions	96-bit VLIW
Physical specifications
Transistors	100,000,000
Cores	80
Socket	custom 1248-pin LGA (343 signal pins)
History
Successor	Xeon Phi

Intel Teraflops Research Chip (codenamed Polaris) is a research manycore processor containing 80 cores, using a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.^[1] It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die.^[2]^[3]^[4] Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.^[3] Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.^[4]

The processor was initially presented at the Intel Developer Forum on September 26, 2006^[5] and officially announced on February 11, 2007.^[6] A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.^[2]

Architecture

The chip consists of a 10×8 2D mesh network of cores and nominally operates at 4 GHz.^[nb 1] Each core, called a tile (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces; the router has a bandwidth of 80 GB/s and a latency of 1.25 ns at 4 GHz.^[2] The processing engine in each tile contains two independent 9-stage pipelined single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.^[3] Each FPMAC unit can perform two single-precision floating-point operations (a multiply and an add) per cycle, so each tile has a theoretical peak performance of 16 GFLOPS at the nominal 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.^[3] The custom instruction set includes instructions to send packets to and receive packets from the on-chip network, as well as instructions to put a particular tile to sleep and wake it up.^[4] Underneath each tile, a 256 KB SRAM module (codenamed Freya) was 3D-stacked, bringing memory closer to the processor and raising overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB (80 × 256 KB).^[7] The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.^[8]
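
The per-tile and whole-chip figures quoted above follow from simple arithmetic on the stated parameters. The Python sketch below reproduces them; the constant and function names are illustrative only, and it assumes, as the text does, that the 1.25 ns router latency is quoted at exactly 4 GHz.

```python
# Derived figures for the Teraflops Research Chip (sketch; names are illustrative).

TILES = 10 * 8                   # 10x8 mesh of tiles
FPMAC_PER_TILE = 2               # two independent FPMAC units per tile
FLOPS_PER_FPMAC_CYCLE = 2        # one multiply plus one add per cycle
SRAM_PER_TILE_KB = 256           # stacked "Freya" SRAM under each tile
ROUTER_LATENCY_NS_AT_4GHZ = 1.25 # router latency as quoted at 4 GHz


def tile_peak_gflops(freq_ghz: float) -> float:
    """Peak single-precision GFLOPS of one tile at the given clock."""
    return freq_ghz * FPMAC_PER_TILE * FLOPS_PER_FPMAC_CYCLE


def chip_peak_tflops(freq_ghz: float, tiles: int = TILES) -> float:
    """Peak TFLOPS of the chip with `tiles` tiles active."""
    return tile_peak_gflops(freq_ghz) * tiles / 1000


def total_sram_mb() -> float:
    """Total stacked SRAM in MB, counting 1 MB as 1024 KB."""
    return TILES * SRAM_PER_TILE_KB / 1024


def router_latency_cycles() -> float:
    """Router latency expressed in clock cycles at the quoted 4 GHz."""
    return ROUTER_LATENCY_NS_AT_4GHZ * 4.0


if __name__ == "__main__":
    print(tile_peak_gflops(4.0))    # 16.0
    print(chip_peak_tflops(4.0))    # 1.28
    print(total_sram_mb())          # 20.0
    print(router_latency_cycles())  # 5.0
```

At 4 GHz the whole chip reaches 1.28 TFLOPS, the figure in the title of the 2007 ISSCC paper,^[2] and the 1.25 ns router latency corresponds to 5 clock cycles.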

Other notable features of the Teraflops Research Chip include fine-grained power management, with 21 independent sleep regions per tile and dynamic tile sleep, and very high energy efficiency: 27 GFLOPS/W theoretical peak at 0.6 V and 19.4 GFLOPS/W achieved on the stencil application at 0.75 V.^[4]^[9]

Instruction types and their latency^[4]
Instruction type	Latency (cycles)
FPMAC	9
LOAD/STORE	2
SEND/RECEIVE	2
JUMP/BRANCH	1
STALL/WFD	?
SLEEP/WAKE	6
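
To relate these cycle counts to wall-clock time, they can be divided by the clock frequency. The Python sketch below does this at the nominal 4 GHz; the dictionary and function names are hypothetical, and it assumes that the listed cycle counts do not change with clock frequency. The unreported STALL/WFD latency is kept as `None` rather than guessed.

```python
from typing import Dict, Optional

# Cycle latencies from the table above; None marks the unreported STALL/WFD value.
LATENCY_CYCLES: Dict[str, Optional[int]] = {
    "FPMAC": 9,
    "LOAD/STORE": 2,
    "SEND/RECEIVE": 2,
    "JUMP/BRANCH": 1,
    "STALL/WFD": None,
    "SLEEP/WAKE": 6,
}


def latency_ns(instr_type: str, freq_ghz: float = 4.0) -> Optional[float]:
    """Convert a cycle latency to nanoseconds at the given clock.

    Assumes the cycle count is independent of frequency.
    """
    cycles = LATENCY_CYCLES[instr_type]
    if cycles is None:
        return None  # latency not reported in the source table
    return cycles / freq_ghz


if __name__ == "__main__":
    for name in LATENCY_CYCLES:
        ns = latency_ns(name)
        text = "unknown" if ns is None else f"{ns:.2f} ns"
        print(f"{name:13s} {text}")
```

At 4 GHz, for example, the 9-cycle FPMAC latency is 2.25 ns.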

Application performance of the Teraflops Research Chip^[nb 2]^[4]
Application	FLOP count	Average TFLOPS	% of peak TFLOPS	Active tiles
Stencil	358K	1.00	73.3%	80
SGEMM: Matrix multiplication	2.63M	0.51	37.5%	80
Spreadsheet	64.2K	0.45	33.2%	80
2D FFT	196K	0.02	2.73%	64
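
The percentage-of-peak column can be cross-checked against the operating point given in note [nb 2] (4.27 GHz) and the per-tile peak of 4 FLOPs per cycle (two FPMAC units, two FLOPs each). The Python sketch below is restricted to the three 80-tile rows, and its names are illustrative. Because the average TFLOPS values are rounded to two decimals, the recomputed percentages agree with the reported ones only to within a few tenths of a percentage point.

```python
# Cross-check of the "% of peak" column (sketch; 80-tile rows only).
FREQ_GHZ = 4.27           # operating point from note [nb 2]
FLOPS_PER_TILE_CYCLE = 4  # 2 FPMAC units x 2 FLOPs per cycle

# name: (average TFLOPS, reported % of peak, active tiles)
APPS = {
    "Stencil": (1.00, 73.3, 80),
    "SGEMM": (0.51, 37.5, 80),
    "Spreadsheet": (0.45, 33.2, 80),
}


def peak_tflops(freq_ghz: float, tiles: int) -> float:
    """Theoretical peak TFLOPS for the given clock and number of active tiles."""
    return freq_ghz * tiles * FLOPS_PER_TILE_CYCLE / 1000


def percent_of_peak(avg_tflops: float, freq_ghz: float, tiles: int) -> float:
    """Measured average throughput as a percentage of theoretical peak."""
    return 100 * avg_tflops / peak_tflops(freq_ghz, tiles)


if __name__ == "__main__":
    for name, (avg, reported, tiles) in APPS.items():
        est = percent_of_peak(avg, FREQ_GHZ, tiles)
        print(f"{name:12s} recomputed {est:5.1f}%   reported {reported:.1f}%")
```

With 80 tiles at 4.27 GHz the theoretical peak is about 1.37 TFLOPS, which is the denominator behind the percentages above.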

Experimental results of the Teraflops Research Chip^[nb 3]
$V_{CC}$	$f_{max}$^[nb 4]	Peak TFLOPS^[nb 5]	Power^[nb 6]	Temperature	Source
0.60 V	1.0 GHz	0.32 TFLOPS	11 W	110 °C	^[2]
0.675 V	1.0 GHz	0.32 TFLOPS	15.6 W	80 °C	^[4]
0.70 V	1.5 GHz	0.48 TFLOPS	25 W	110 °C	^[2]
0.70 V	1.35 GHz	0.43 TFLOPS	18 W	80 °C	^[4]
0.75 V	1.6 GHz	0.51 TFLOPS	21 W	80 °C	^[4]
0.80 V	2.1 GHz	0.67 TFLOPS	42 W	110 °C	^[2]
0.80 V	2.0 GHz	0.64 TFLOPS	26 W	80 °C	^[4]
0.85 V	2.4 GHz	0.77 TFLOPS	32 W	80 °C	^[4]
0.90 V	2.6 GHz	0.83 TFLOPS	70 W	110 °C	^[2]
0.90 V	2.85 GHz	0.91 TFLOPS	45 W	80 °C	^[4]
0.95 V	3.16 GHz	1.0 TFLOPS	62 W	80 °C	^[4]
1.00 V	3.13 GHz	1.0 TFLOPS	98 W	110 °C	^[2]
1.00 V	3.8 GHz	1.22 TFLOPS	78 W	80 °C	^[4]
1.05 V	4.2 GHz	1.34 TFLOPS	82 W	80 °C	^[4]
1.10 V	3.5 GHz	1.12 TFLOPS	135 W	110 °C	^[2]
1.10 V	4.5 GHz	1.44 TFLOPS	105 W	80 °C	^[4]
1.15 V	4.8 GHz	1.54 TFLOPS	128 W	80 °C	^[4]
1.20 V	4.0 GHz	1.28 TFLOPS	181 W	110 °C	^[2]
1.20 V	5.1 GHz	1.63 TFLOPS	152 W	80 °C	^[4]
1.25 V	5.3 GHz	1.70 TFLOPS	165 W	80 °C	^[4]
1.30 V	4.4 GHz	1.39 TFLOPS	?	110 °C	^[2]
1.30 V	5.5 GHz	1.76 TFLOPS	210 W	80 °C	^[4]
1.35 V	5.67 GHz	1.81 TFLOPS	230 W	80 °C	^[4]
1.40 V	4.8 GHz	1.52 TFLOPS	?	110 °C	^[2]
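
The peak-TFLOPS column follows the formula in note [nb 5]. The Python sketch below implements that formula and, as an extra derived figure not reported in the table, divides peak throughput by the measured power to give GFLOPS/W for a few 80 °C rows. Names are illustrative. Because the numerator is the extrapolated peak rather than a measured sustained rate, the resulting ratios should be read as nominal figures, not measured efficiencies.

```python
# Peak-throughput formula from note [nb 5], plus a derived GFLOPS/W ratio (sketch).
TILES = 80
FPMAC_PER_TILE = 2
FLOPS_PER_FPMAC_CYCLE = 2


def peak_tflops(f_max_ghz: float) -> float:
    """FLOPS_peak = f_max * 80 tiles * 2 FPMAC/tile * 2 FLOPS/(FPMAC*cycle), in TFLOPS."""
    return f_max_ghz * TILES * FPMAC_PER_TILE * FLOPS_PER_FPMAC_CYCLE / 1000


def peak_gflops_per_watt(f_max_ghz: float, power_w: float) -> float:
    """Nominal efficiency: extrapolated peak GFLOPS divided by measured power."""
    return peak_tflops(f_max_ghz) * 1000 / power_w


# Selected 80 °C rows from the table: (V_CC in V, f_max in GHz, power in W)
ROWS = [
    (0.675, 1.0, 15.6),
    (0.95, 3.16, 62.0),
    (1.35, 5.67, 230.0),
]

if __name__ == "__main__":
    for vcc, f_max, power in ROWS:
        print(f"{vcc:.3f} V: {peak_tflops(f_max):.2f} TFLOPS, "
              f"{peak_gflops_per_watt(f_max, power):.1f} GFLOPS/W")
```

At 0.95 V this gives about 1.0 TFLOPS at 62 W, roughly 16 GFLOPS/W, consistent with the project's goal of 1 TFLOPS below 100 W.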

Issues

Intel aimed to ease software development for this unusual architecture by creating a new programming model specifically for the chip, called Ct. The model never gained the following Intel had hoped for and was eventually incorporated into Intel Array Building Blocks, a now-defunct C++ library.

Notes

^ The chip was later shown by Intel to run at up to 5.67 GHz.
^ At 1.07 V and 4.27 GHz.
^ All measurements were taken with all 80 cores active.
^ Substantially higher frequencies at the same voltages (compared with the initial ISSCC report) were attained in 2008 using a custom cooling solution.
^ Values in italic were extrapolated by ${\text{FLOPS}}_{peak}=f_{max}\cdot 80{\text{ tiles}}\cdot 2{\tfrac {\text{FPMAC}}{\text{tile}}}\cdot 2{\tfrac {\text{FLOPS}}{{\text{FPMAC}}\cdot {\text{cycle}}}}$, where the maximum frequencies were manually extracted from plots and are therefore only approximate.
^ Values in italic were manually extracted from plots and are therefore only approximate.

References

^ Intel Corporation. "Teraflops Research Chip". Archived from the original on July 22, 2010.
^ Vangal, Sriram; Howard, Jason; Ruhl, Gregory; Dighe, Saurabh; Wilson, Howard; Tschanz, James; Finan, David; Iyer, Priya; Singh, Arvind; Jacob, Tiju; Jain, Shailendra (2007). "An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS". 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers: 98–589. doi:10.1109/ISSCC.2007.373606. ISBN 978-1-4244-0852-8. S2CID 20065641.
^ Peh, Li-Shiuan; Keckler, Stephen W.; Vangal, Sriram (2009), Keckler, Stephen W.; Olukotun, Kunle; Hofstee, H. Peter (eds.), "On-Chip Networks for Multicore Systems", Multicore Processors and Systems, Springer US, pp. 35–71, Bibcode:2009mps..book...35P, doi:10.1007/978-1-4419-0263-4_2, ISBN 978-1-4419-0262-7, retrieved 2020-05-14.
^ Vangal, S.R.; Howard, J.; Ruhl, G.; Dighe, S.; Wilson, H.; Tschanz, J.; Finan, D.; Singh, A.; Jacob, T.; Jain, S.; Erraguntla, V. (2008). "An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS". IEEE Journal of Solid-State Circuits. 43 (1): 29–41. Bibcode:2008IJSSC..43...29V. doi:10.1109/JSSC.2007.910957. ISSN 0018-9200. S2CID 15672087.
^ "Intel Develops Tera-Scale Research Chips". Intel News Release. 2006.
^ Intel Corporation (February 11, 2007). "Intel Research Advances 'Era Of Tera'". Intel Press Room. Archived from the original on April 13, 2009.
^ Bautista, Jerry (2008). "Tera-scale computing and interconnect challenges - 3D stacking considerations". 2008 IEEE Hot Chips 20 Symposium (HCS). Stanford, CA, USA: IEEE: 1–34. doi:10.1109/HOTCHIPS.2008.7476514. ISBN 978-1-4673-8871-9. S2CID 26400101.
^ Intel's Teraflops Research Chip (PDF). Intel Corporation. 2007. Archived (PDF) from the original on February 18, 2020.
^ Fossum, Tryggve (2007). High End MPSOC - The Personal Super Computer (PDF). MPSoC Conference 2007. p. 6.
