Recent trends in deep learning (DL) have established hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications, such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in the design of DL accelerators capable of meeting the performance requirements of HPC applications. In particular, it highlights the most advanced approaches to deep learning acceleration, including not only GPU- and TPU-based accelerators but also design-specific hardware accelerators such as FPGA-based and ASIC-based accelerators, Neural Processing Units, and open hardware RISC-V-based accelerators and co-processors. The survey also describes accelerators based on emerging memory technologies and computing paradigms, such as 3D-stacked Processor-In-Memory, non-volatile memories (mainly Resistive RAM and Phase Change Memories) used to implement in-memory computing, Neuromorphic Processing Units, and accelerators based on Multi-Chip Modules. Among emerging technologies, we also include some insights into quantum-based accelerators and photonics. To conclude, the survey classifies the most influential architectures and technologies proposed in recent years, with the purpose of offering the reader a comprehensive perspective on the rapidly evolving field of deep learning.