Recent trends in deep learning (DL) have established hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications, such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators able to meet the performance requirements of HPC applications. In particular, it highlights the most advanced approaches to supporting deep learning acceleration, including not only GPU- and TPU-based accelerators but also design-specific hardware accelerators such as FPGA-based and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators, and co-processors. The survey also describes accelerators based on emerging memory technologies and computing paradigms, such as 3D-stacked Processor-In-Memory, non-volatile memories (mainly Resistive RAM and Phase Change Memories) used to implement in-memory computing, Neuromorphic Processing Units, and accelerators based on Multi-Chip Modules. The survey classifies the most influential architectures and technologies proposed in recent years, with the aim of offering the reader a comprehensive perspective on the rapidly evolving field of deep learning. Finally, it provides some insights into future challenges for DL accelerators, such as quantum accelerators and photonics.