Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including the weight quantization strategy (i.e., data types and bit-widths) and the mapping (i.e., placement and scheduling of DNN elementary operations on the hardware units of the accelerator). We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings that utilize the hardware resources more effectively. CNNs utilizing quantized weights and activations, together with suitable mappings, can significantly improve trade-offs among accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations. To find, analyze, and exploit these mappings, we: (i) extend a general-purpose state-of-the-art mapping tool (Timeloop) to support mixed quantization, a feature it does not currently offer; (ii) propose an efficient multi-objective optimization algorithm to find the most suitable bit-widths and mapping for each DNN layer executed on the accelerator; and (iii) conduct a detailed experimental evaluation to validate the proposed method. On two CNNs (MobileNetV1 and MobileNetV2) and two accelerators (Eyeriss and Simba), we show that, for a given quality metric (such as accuracy on ImageNet), energy savings of up to 37% are achievable without any accuracy drop.