Advances in Deep Learning (DL) are driven by efficient Deep Neural Network (DNN) design and new hardware accelerators. Current DNN design is primarily tailored for general-purpose use and deployment on commercially viable platforms. Inference at the edge, however, requires low-latency, compact, power-efficient models and must be cost-effective. Digital processors based on typical von Neumann architectures are poorly suited to edge AI because of the large amount of data that must move in and out of memory. In contrast, analog/mixed-signal in-memory computing hardware accelerators can overcome the memory wall of von Neumann architectures when accelerating inference workloads. They offer increased area and power efficiency, which are paramount in resource-constrained edge environments. In this paper, we propose AnalogNAS, a framework for automated DNN design targeting deployment on analog In-Memory Computing (IMC) inference accelerators. We conduct extensive hardware simulations to demonstrate the performance of AnalogNAS relative to State-Of-The-Art (SOTA) models in terms of accuracy and deployment efficiency on various Tiny Machine Learning (TinyML) tasks. We also present experimental results showing that AnalogNAS models achieve higher accuracy than SOTA models when implemented on a 64-core IMC chip based on Phase Change Memory (PCM). The AnalogNAS search code is released at https://github.com/IBM/analog-nas