Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper examines TPU performance, with a focus on AI workloads and their deployment in edge computing. We first give an overview of TPUs: their design in relation to neural networks, their general architecture, their compilation techniques, and their supporting frameworks. We then compare the performance of Cloud and Edge TPUs against counterpart chip architectures and discuss how TPUs can be used to speed up AI workloads. The results show that TPUs can provide significant performance improvements in both cloud and edge computing. Finally, we identify two needs for further research: deploying more architectures on the Edge TPU, and developing more robust comparisons in edge computing.