Deep neural networks have driven significant recent advances in machine learning. However, these networks carry a huge number of parameters, whose storage and computation increase hardware cost and pose design challenges. Compression approaches have therefore been proposed to enable efficient accelerators. One important approach to deep neural network compression is quantization, in which full-precision values are stored at low bit-width. In addition to saving memory, quantization allows costly operations to be replaced by simple, low-cost ones. Because of its flexibility and its influence on efficient hardware design, many DNN quantization methods have been proposed in recent years. An integrated report is therefore essential for better understanding, analysis, and comparison. In this paper, we provide a comprehensive survey. We describe the quantization concepts and categorize the methods from different perspectives. We discuss the use of a scale factor to match the quantization levels to the distribution of the full-precision values, and we describe clustering-based methods. For the first time, we comprehensively review the training of quantized deep neural networks and the use of the Straight-Through Estimator. We also describe how operations are simplified in quantized deep convolutional neural networks and explain the sensitivity of different layers to quantization. Finally, we discuss the evaluation of quantization methods and compare the accuracy of previous methods at various bit-widths for weights and activations on CIFAR-10 and the large-scale ImageNet dataset.
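To make the idea of storing full-precision values at low bit-width with a scale factor concrete, the following is a minimal sketch of symmetric uniform quantization. It is an illustration only: the max-absolute-value choice of scale, the function names `quantize_uniform` and `dequantize`, and the sample weights are all assumptions made here, not a method taken from the survey.

```python
def quantize_uniform(values, bits=4):
    """Map floats to signed integer codes in [-qmax, qmax].

    Assumption: the scale factor is chosen so the largest magnitude
    lands on the outermost quantization level (max-abs scaling).
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed codes
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate full-precision values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical full-precision weights, used only for illustration.
weights = [0.31, -0.12, 0.05, -0.44, 0.27, 0.0, 0.18, -0.29]
codes, scale = quantize_uniform(weights, bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

With this scaling, the rounding error of each value is bounded by half a quantization step (`scale / 2`), so a scale that fits the value distribution more tightly generally gives a smaller reconstruction error at the same bit-width.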