Tensor Processing Unit



The TPU on a PCB. Source: Schneider 2017, (c) Google.

Tensor Processing Unit (TPU) is an ASIC announced by Google for executing Machine Learning (ML) algorithms. CPUs are general-purpose processors. GPUs are better suited for graphics and tasks that benefit from parallel execution. DSPs work well for signal processing tasks that typically require high mathematical precision. TPUs, on the other hand, are optimized for ML. While any of the other processor types could also be used for ML, TPUs are expected to deliver better performance per watt for ML. In fact, TPUs are said to catapult computing power seven years into the future, which is equivalent to three generations of Moore's Law. Tests show 14x better performance compared to GPUs.

Google claims that TPUs are tailored for running TensorFlow, which is an open-source software library for Machine Intelligence.


  • What is Google's interest in making the TPU?

    Google has claimed that "great software shines brightest with great hardware underneath." This is particularly true of ML, where a TPU offers software the requisite power to run faster and hence process more data. Google wants to use TPUs to power its own ML algorithms. As of May 2016, more than 100 teams were said to be using ML within Google. Google Now, Street View, Inbox Smart Reply, RankBrain and voice search are products that already benefit from TPU hardware. AlphaGo used TPUs to defeat Go world champion Lee Sedol.

    Beyond Google's internal projects, TPUs can offer an advantage for all ML applications implemented in TensorFlow. ML applications looking to run on a cloud infrastructure may prefer Google Cloud Platform powered by TPUs. Likewise, TPUs may be a differentiator for Google Cloud Platform when application developers select an ML service API for their applications. For example, Google Cloud Machine Learning, a managed ML service from Google, will directly benefit from TPUs.

    There's also the claim that the TPU may be Google's answer to Intel's Xeon processors that dominate datacenters.

  • Can TPUs be used for ML frameworks other than TensorFlow?

    TensorFlow is not the only framework for ML. More specifically, there are multiple frameworks for Deep Learning (DL). However, Google has not disclosed if TensorFlow algorithms are hardwired into the TPU or if the TPU is a generic accelerator for ML.

  • How is TPU able to achieve its superior performance compared to other processor types?

    Since the TPU is an ASIC, its optimizations are hardwired into the chip. This gives it a speed advantage over CPUs, DSPs and GPUs. Compared to traditional ASICs, such as those used for video stream decoding, the TPU's ability to process more data comes from the fact that it sacrifices precision. For example, instead of 32-bit arithmetic, 8-bit arithmetic may be employed, with extra precision used only when required. A TPU can therefore do more operations per second with the same amount of silicon. The TPU can be considered an AI Accelerator, an emerging class of processor specialized for Machine Learning and Artificial Neural Networks.
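    As a rough sketch of the idea (pure Python; the symmetric scale scheme and the sample weights are made-up illustrations, not the TPU's actual encoding), 32-bit floating-point weights can be packed into signed 8-bit integers plus one shared scale factor:

```python
def quantize(values, num_bits=8):
    """Map floats onto signed num_bits integers with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]

weights = [0.81, -0.37, 0.05, -0.99]
q, scale = quantize(weights)
approx = dequantize(q, scale)

# Each weight now fits in one byte instead of four; the round trip
# stays within one quantization step (scale) of the original value.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
```

    One byte per weight instead of four means more values moved and multiplied per unit of silicon and memory bandwidth, at the cost of a bounded rounding error.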

  • Doesn't low-precision arithmetic reduce the accuracy of calculations?

    Research has shown that deep learning algorithms are largely unaffected by low-precision arithmetic. In fact, low-precision arithmetic can be used for both training and inference. This is because ML is essentially probabilistic in nature and high-precision arithmetic is unnecessary. One writer reported that "having more data that is less precise yield better results than having half as much data that was more precise." In fact, adding noise during training can even improve performance.

    One report claimed that Google intends to use TPUs only for inference. The report added that low-precision arithmetic is not suited for training.
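    A toy sketch makes the claim concrete (pure Python; the single "neuron", its weights and inputs are invented for illustration): computing a weighted sum with 8-bit integers lands close to the full-precision result, and the sign of the output, which is what the decision depends on, is unchanged:

```python
def neuron(weights, inputs):
    """Full-precision weighted sum."""
    return sum(w * x for w, x in zip(weights, inputs))

def to_int8(values):
    """Quantize floats to signed 8-bit codes plus a shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

weights = [0.42, -0.17, 0.93, -0.61]
inputs  = [0.30,  0.80, 0.25,  0.50]

exact = neuron(weights, inputs)

qw, sw = to_int8(weights)
qx, sx = to_int8(inputs)
# Integer multiply-accumulate, rescaled once at the end.
approx = sum(w * x for w, x in zip(qw, qx)) * sw * sx

# The two results are close, and the decision (the sign) agrees.
assert abs(exact - approx) < 0.01
assert (exact > 0) == (approx > 0)
```

    Individual products lose precision, but the accumulated result, and hence the probabilistic decision built on it, rarely flips.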

  • What's the competition for Google's TPU?

    Google's TPU is in fact used only within Google, at least for now. Nvidia dominates the ML processor market with its GPUs. Nvidia's Tesla GPUs, based on the Pascal architecture, are specialized for ML and can be used for either training or inference. Movidius makes Vision Processing Units (VPUs), named Myriad 2, that offer visual intelligence at the device level. IBM's own chip, named TrueNorth, is based on a project that built the digital equivalent of a rodent's brain; it's meant to bring deep learning to devices for the purpose of inference. Intel announced in November 2016 an AI processor named Nervana that may come to market by end of 2017. Nervana is designed for both training and inference. Microsoft, for its part, has been using FPGAs in its datacenters, since these can be reconfigured easily, unlike ASICs; such configurability matters when algorithms change frequently. Qualcomm announced in January 2017 that it has optimized TensorFlow for the Hexagon 682 DSP. ARM is promoting its Mali GPUs to offload ML processing from its Cortex CPUs.

  • Is Google's TPU in any way connected to SGI's product of the same name?

    No. Silicon Graphics had something called a TPU in its workstations in the 2000s. It was an advanced DSP that used dynamic shared-memory access. This has nothing to do with Google's TPU.


  1. Armasu, Lucian. 2016. "Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency." Tom's Hardware. May 19. Retrieved 2017-02-20.
  2. Bright, Peter. 2016. "Programmable chips turning Azure into a supercomputing powerhouse." Ars Technica. September 28. Retrieved 2017-02-20.
  3. Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. 2015. "Training deep neural networks with low precision multiplications." September 23. arXiv. Retrieved 2017-02-20.
  4. Davies, Jem. 2016. "ARM and Machine Learning." ARM. December 12. Retrieved 2017-02-20.
  5. Freund, Karl. 2016. "Google's TPU Chip Creates More Questions Than Answers." Forbes. May 26. Retrieved 2017-02-20.
  6. Gupta, Suyog, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. "Deep Learning with Limited Numerical Precision." February 9. arXiv. Retrieved 2017-02-20.
  7. Jacobowitz, P.J. 2017. "TensorFlow machine learning now optimized for the Snapdragon 835 and Hexagon 682 DSP." Qualcomm. January 10. Retrieved 2017-02-20.
  8. Jouppi, Norm. 2016. "Google supercharges machine learning tasks with TPU custom chip." Google Cloud Platform Blog. May 18. Retrieved 2017-02-20.
  9. Metz, Cade. 2015. "IBM's 'Rodent Brain' Chip Could Make Our Phones Hyper-Smart." Wired. August 17. Retrieved 2017-02-20.
  10. Metz, Cade. 2016. "Intel Looks to a New Chip to Power the Coming Age of AI." Wired. November 18. Retrieved 2017-02-20.
  11. Morgan, Timothy Prickett. 2016. "Nvidia Pushes Deep Learning Inference With New Pascal GPUs." The Next Platform. September 13. Retrieved 2017-02-20.
  12. Nvidia. 2016. "Deep Learning Frameworks." Nvidia. April 5. Updated February 9, 2017. Retrieved 2017-02-20.
  13. Osborne, Joe. 2016. "Google's Tensor Processing Unit explained: this is what the future of computing looks like." Tech Radar India. August 23. Retrieved 2017-02-20.
  14. Racanelli, Heidi. 2000. SGI Tensor Processing Unit (TPU) XIO Board Introduction. Document Number 007-4222-002. Silicon Graphics, Inc. Retrieved 2017-02-20.
  15. Schneider, David. 2017. "Google Details Tensor Chip Powers." IEEE Spectrum. April 6. Retrieved 2017-04-11.
  16. Singh, Akash. 2016. "What is the difference among CPU, GPU, APU, FPGA, DSP, and Intel MIC?" Quora. Updated May 25. Retrieved 2017-02-20.
  17. Ung, Gordon Mah. 2016. "Google's Tensor Processing Unit could advance Moore's Law 7 years into the future." PC World. May 18. Retrieved 2017-02-20.
  18. Wikipedia. 2017. AI accelerator. Updated February 14. Retrieved 2017-02-20.
  19. Yegulalp, Serdar. 2016. "13 frameworks for mastering machine learning." InfoWorld. January 28. Retrieved 2017-02-20.



Milestones

Google announces that it's been using TPUs in its data centers for ML for more than a year.


See Also

  • Machine Learning
  • Deep Learning
  • TensorFlow
  • Processor types
  • GPU
  • ASIC

Further Reading

  1. Warden, Pete. 2015. "Why are Eight Bits Enough for Deep Neural Networks?" Pete Warden's blog. May 23. Retrieved 2017-02-20.

Top Contributors

Last update: 2017-04-11 03:52:03 by tintin
Creation: 2017-02-20 18:34:31 by tintin
