The growing volume of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and increasing data privacy concerns are driving a transition from cloud-based to edge-based processing. The limited energy and computational resources available at the edge, in turn, push a shift from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques make it possible to implement neural networks on such limited hardware resources. Quantization is one of the most efficient of these techniques, reducing memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to their IMC hardware implementation. Moreover, it discusses open challenges, QNN design requirements, recommendations, and perspectives, and presents a roadmap for IMC-based QNN hardware.
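As a minimal illustration of how quantization shrinks the memory footprint, the sketch below applies symmetric uniform 8-bit quantization with a single per-tensor scale to a small, hypothetical weight vector. The scheme (q = round(w / s), with s = max|w| / 127) and the function names `quantize_symmetric_int8` and `dequantize` are assumptions chosen for illustration, not a method prescribed by this review. Storing each weight as a signed 8-bit integer instead of a 32-bit float cuts storage by 4x, at the cost of a rounding error of at most half a quantization step per weight.

```python
from array import array


def quantize_symmetric_int8(weights):
    """Hypothetical helper: map float weights to int8 with one per-tensor scale.

    Symmetric uniform quantization: s = max|w| / 127, q = round(w / s).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 codes and the scale."""
    return [v * scale for v in q]


# Hypothetical weights stored as 32-bit floats (typecode "f").
weights = array("f", [0.5, -1.27, 0.01, 1.0, -0.3, 0.76])
q, scale = quantize_symmetric_int8(weights)
print("int8 codes:", list(q))
print(weights.itemsize * len(weights), "bytes ->", q.itemsize * len(q), "bytes")
```

The per-tensor scale keeps the example short; finer-grained (for example, per-channel) scales and lower bit widths are common variations that trade extra metadata for accuracy or further compression.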