Tensor Processing Unit: Difference between revisions

Revision as of 01:17, 29 June 2019

A tensor processing unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning.

Overview

The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside its data centers for over a year.^[1]^[2] The chip was designed specifically for Google's TensorFlow framework, a symbolic math library which is used for machine learning applications such as neural networks.^[3] However, Google still uses CPUs and GPUs for other types of machine learning.^[1] Other vendors are also developing AI accelerator designs, aimed at embedded and robotics markets.

Google's TPUs are proprietary and are not commercially available, although on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service."^[4] Google has stated that TPUs were used in the AlphaGo versus Lee Sedol series of man-machine Go games,^[2] as well as in the AlphaZero system, which produced chess, shogi and Go playing programs from the game rules alone and went on to beat the leading programs in those games.^[5] Google has also used TPUs for Google Street View text processing, finding all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day. TPUs are also used in RankBrain, which Google uses to provide search results.^[6]

Compared to a graphics processing unit, the TPU is designed for a high volume of low-precision computation (e.g. as little as 8-bit precision)^[7] with more input/output operations per joule, and it lacks hardware for rasterisation/texture mapping.^[2] The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Google Distinguished Hardware Engineer Norman Jouppi.^[1]
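Because one watt is one joule per second, "operations per second per watt" and "operations per joule" describe the same energy-efficiency measure:

```latex
\frac{\text{operations/s}}{\text{W}}
  = \frac{\text{operations/s}}{\text{J/s}}
  = \frac{\text{operations}}{\text{J}}
```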

Products

First generation TPU

The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². The clock speed is 700 MHz and it has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory, and 4 MiB of 32-bit accumulators taking the results of a 256×256 systolic array of 8-bit multipliers.^[8] Within the TPU package is 8 GiB of dual-channel 2133 MHz DDR3 SDRAM offering 34 GB/s of bandwidth.^[9] Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions.^[8]
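To illustrate how a grid of multiply-accumulate cells computes a matrix product, here is a small cycle-by-cycle Python simulation with 8-bit inputs and 32-bit accumulators. It is a simplified sketch: it uses an output-stationary dataflow, in which each cell keeps its own accumulator while operands flow right and down, and it is not claimed to match the TPU's actual dataflow or instruction set. The function names are hypothetical. The peak-rate helper counts each multiply-accumulate as two operations, so 256×256 cells at 700 MHz give about 92 trillion operations per second.

```python
from typing import List

Matrix = List[List[int]]

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an n x k int8 matrix by a k x m int8 matrix on a simulated n x m cell grid."""
    n, k_dim, m = len(a), len(b), len(b[0])
    assert all(len(row) == k_dim for row in a), "inner dimensions must match"
    assert all(INT8_MIN <= v <= INT8_MAX for row in a + b for v in row), "inputs must be 8-bit"

    acc = [[0] * m for _ in range(n)]    # one 32-bit accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # operand each cell passes to its right neighbour
    b_reg = [[0] * m for _ in range(n)]  # operand each cell passes to the cell below

    # Cell (i, j) sees operand pair k at cycle i + j + k, so the last cell finishes
    # at cycle n + m + k_dim - 3.
    for t in range(n + m + k_dim - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Row i of `a` enters at the left edge delayed by i cycles;
                # column j of `b` enters at the top edge delayed by j cycles.
                if j == 0:
                    k = t - i
                    left = a[i][k] if 0 <= k < k_dim else 0
                else:
                    left = a_reg[i][j - 1]
                if i == 0:
                    k = t - j
                    top = b[k][j] if 0 <= k < k_dim else 0
                else:
                    top = b_reg[i - 1][j]
                acc[i][j] += left * top  # one multiply-accumulate per cell per cycle
                assert INT32_MIN <= acc[i][j] <= INT32_MAX, "accumulator overflow"
                new_a[i][j], new_b[i][j] = left, top
        a_reg, b_reg = new_a, new_b
    return acc


def peak_ops_per_second(rows: int, cols: int, clock_hz: float) -> float:
    """Peak rate, counting each cell's multiply and add as two operations per cycle."""
    return rows * cols * clock_hz * 2


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
    print(peak_ops_per_second(256, 256, 700e6) / 1e12)         # 91.7504 (trillion ops/s)
```

The output-stationary layout was chosen only because each result stays in one place, which makes the skewed operand timing easy to follow; a 256-term dot product of 8-bit values stays far below the 32-bit accumulator limit.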

Second generation TPU

The second-generation TPU was announced in May 2017.^[10] Google stated that the first-generation TPU design was limited by memory bandwidth, and that using 16 GB of High Bandwidth Memory in the second-generation design increased bandwidth to 600 GB/s and performance to 45 TFLOPS.^[9] The chips are arranged into four-chip modules with a performance of 180 TFLOPS.^[10] Then 64 of these modules are assembled into 256-chip pods with 11.5 PFLOPS of performance.^[10] Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google has stated that these second-generation TPUs will be available on the Google Compute Engine for use in TensorFlow applications.^[11]
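The module and pod figures follow from the per-chip number. A quick Python check, assuming peak throughput scales linearly with chip count (an idealization that ignores interconnect and other overheads); the constant and function names are hypothetical:

```python
# Figures quoted in the paragraph above (peak values, not measured throughput).
CHIP_TFLOPS = 45        # one second-generation TPU chip
CHIPS_PER_MODULE = 4    # four-chip module
MODULES_PER_POD = 64    # modules per pod


def module_tflops() -> int:
    """Peak TFLOPS of one four-chip module, assuming linear scaling."""
    return CHIP_TFLOPS * CHIPS_PER_MODULE


def pod_chips() -> int:
    """Number of chips in one pod."""
    return CHIPS_PER_MODULE * MODULES_PER_POD


def pod_pflops() -> float:
    """Peak PFLOPS of one pod (1 PFLOPS = 1,000 TFLOPS), assuming linear scaling."""
    return pod_chips() * CHIP_TFLOPS / 1000


if __name__ == "__main__":
    print(module_tflops())  # 180
    print(pod_chips())      # 256
    print(pod_pflops())     # 11.52
```

The result, 11.52 PFLOPS, is the 11.5 PFLOPS quoted above after rounding.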

Third generation TPU

The third-generation TPU was announced on May 8, 2018.^[12] Google announced that the processors themselves are twice as powerful as the second-generation TPUs, and would be deployed in pods with four times as many chips as the preceding generation.^[13]^[14] This amounts to an eightfold increase in performance per pod (with up to 1,024 chips per pod) compared to the second-generation TPU deployment.
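The eightfold figure is the product of the two stated ratios, assuming pod performance scales linearly with chip count:

```latex
\frac{P_{\text{pod, 3rd gen}}}{P_{\text{pod, 2nd gen}}}
  = \underbrace{2}_{\text{per-chip speedup}}
  \times \underbrace{\frac{1024}{256}}_{\text{chip-count ratio}}
  = 2 \times 4 = 8
```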

Edge TPU

In July 2018, Google announced the Edge TPU. The Edge TPU is Google’s purpose-built ASIC designed to run machine learning (ML) models for edge computing, so it is much smaller and consumes far less power than the TPUs hosted in Google data centers (also known as Cloud TPUs). In January 2019, Google made the Edge TPU available to developers with a line of products under the Coral brand.

The product offerings include a single-board computer (SBC), a system on module (SoM), a USB accessory, a Mini PCIe card, and an M.2 card. The Coral Dev Board (the SBC) and the Coral SoM both run Mendel Linux, a derivative of Debian. The USB, PCIe, and M.2 products function as add-ons to existing computer systems, and support Debian-based Linux systems on x86-64 and ARM64 hosts (including Raspberry Pi).

The machine learning runtime used to execute models on the Edge TPU is based on TensorFlow Lite.^[15] The Edge TPU can accelerate only forward-pass operations, so it is primarily useful for inference (although it is possible to perform lightweight transfer learning on the Edge TPU^[16]). The Edge TPU also supports only 8-bit math, so to be compatible with the Edge TPU, a network must be trained using TensorFlow's quantization-aware training technique.
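Quantization-aware training commonly works by inserting "fake quantization" into the forward pass: each value is rounded to an 8-bit grid and immediately mapped back to floating point, so the network learns weights that tolerate the rounding error it will meet at inference time. The following is a minimal pure-Python sketch, assuming an asymmetric (affine) scheme with one scale and zero point per tensor; the function names are hypothetical, and this is not TensorFlow's implementation.

```python
from typing import List, Tuple


def choose_qparams(lo: float, hi: float, num_bits: int = 8) -> Tuple[float, int]:
    """Pick a scale and zero point mapping [lo, hi] onto the integers 0..2**num_bits - 1."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # widen the range so 0.0 is exactly representable
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0  # avoid dividing by zero for an all-zero tensor
    zero_point = min(max(int(round(-lo / scale)), 0), qmax)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> int:
    """Round a real value to the nearest grid point and clamp it to the unsigned num_bits range."""
    qmax = 2 ** num_bits - 1
    return min(max(int(round(x / scale)) + zero_point, 0), qmax)


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to the real value it represents."""
    return scale * (q - zero_point)


def fake_quant(values: List[float], num_bits: int = 8) -> List[float]:
    """Quantize and immediately dequantize, as a training-time forward pass would."""
    scale, zero_point = choose_qparams(min(values), max(values), num_bits)
    return [dequantize(quantize(v, scale, zero_point, num_bits), scale, zero_point)
            for v in values]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.3, 1.0]
    print(fake_quant(weights))
```

Widening the range to include zero keeps 0.0 exact, which is commonly required so that zero padding introduces no error; every in-range value is reproduced to within half a quantization step.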

References

[1] "Google's Tensor Processing Unit explained: this is what the future of computing looks like". TechRadar. Retrieved 2017-01-19.
[2] Jouppi, Norm (May 18, 2016). "Google supercharges machine learning tasks with TPU custom chip". Google Cloud Platform Blog. Google. Retrieved 2017-01-22.
[3] "TensorFlow: Open source machine learning". YouTube. "It is machine learning software being used for various kinds of perceptual and language understanding tasks" (Jeffrey Dean, 0:47 of 2:17).
[4] "Google Makes Its Special A.I. Chips Available to Others". The New York Times. Retrieved 2018-02-12.
[5] McGourty, Colin (6 December 2017). "DeepMind's AlphaZero crushes chess". chess24.com.
[6] "Google's Tensor Processing Unit could advance Moore's Law 7 years into the future". PCWorld. Retrieved 2017-01-19.
[7] Armasu, Lucian (2016-05-19). "Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated)". Tom's Hardware. Retrieved 2016-06-26.
[8] Jouppi, Norman P.; Young, Cliff; Patil, Nishant; Patterson, David; Agrawal, Gaurav; Bajwa, Raminder; Bates, Sarah; Bhatia, Suresh; Boden, Nan; Borchers, Al; Boyle, Rick; Cantin, Pierre-luc; Chao, Clifford; Clark, Chris; Coriell, Jeremy; Daley, Mike; Dau, Matt; Dean, Jeffrey; Gelb, Ben; Ghaemmaghami, Tara Vazir; Gottipati, Rajendra; Gulland, William; Hagmann, Robert; Ho, C. Richard; Hogberg, Doug; Hu, John; Hundt, Robert; Hurt, Dan; Ibarz, Julian; Jaffey, Aaron; Jaworski, Alek; Kaplan, Alexander; Khaitan, Harshit; Koch, Andy; Kumar, Naveen; Lacy, Steve; Laudon, James; Law, James; Le, Diemthu; Leary, Chris; Liu, Zhuyuan; Lucke, Kyle; Lundin, Alan; MacKean, Gordon; Maggiore, Adriana; Mahony, Maire; Miller, Kieran; Nagarajan, Rahul; Narayanaswami, Ravi; Ni, Ray; Nix, Kathy; Norrie, Thomas; Omernick, Mark; Penukonda, Narayana; Phelps, Andy; Ross, Jonathan; Ross, Matt; Salek, Amir; Samadiani, Emad; Severn, Chris; Sizikov, Gregory; Snelham, Matthew; Souter, Jed; Steinberg, Dan; Swing, Andy; Tan, Mercedes; Thorson, Gregory; Tian, Bo; Toma, Horia; Tuttle, Erick; Vasudevan, Vijay; Walter, Richard; Wang, Walter; Wilcox, Eric; Yoon, Doe Hyun (June 26, 2017). In-Datacenter Performance Analysis of a Tensor Processing Unit™ (PDF). Toronto, Canada. Retrieved 17 November 2017.
[9] Kennedy, Patrick (22 August 2017). "Case Study on the Google TPU and GDDR5 from Hot Chips 29". Serve The Home. Retrieved 23 August 2017.
[10] Bright, Peter (17 May 2017). "Google brings 45 teraflops tensor flow processors to its compute cloud". Ars Technica. Retrieved 30 May 2017.
[11] Kennedy, Patrick (17 May 2017). "Google Cloud TPU Details Revealed". Serve The Home. Retrieved 30 May 2017.
[12] Frumusanu, Andre (8 May 2018). "Google I/O Opening Keynote Live-Blog". Retrieved 9 May 2018.
[13] Feldman, Michael (11 May 2018). "Google Offers Glimpse of Third-Generation TPU Processor". Top 500. Retrieved 14 May 2018.
[14] Teich, Paul (10 May 2018). "Tearing Apart Google's TPU 3.0 AI Coprocessor". The Next Platform. Retrieved 14 May 2018.
[15] "Bringing intelligence to the edge with Cloud IoT". Google Blog. 2018-07-25. Retrieved 2018-07-25.
[16] "Retrain an image classification model on-device". Coral. Retrieved 2019-05-03.

