Post-training quantization (PTQ) is widely regarded as one of the most practical and efficient compression methods, owing to its data privacy and low computation cost. We argue that PTQ methods suffer from an overlooked oscillation problem. In this paper, we take the initiative to explore this problem and present a theoretical proof explaining why it is essential in PTQ. We then address it by introducing a principled and generalized theoretical framework. In particular, we first formulate the oscillation in PTQ and prove that it is caused by differences in module capacity. To this end, we define the module capacity (ModCap) under data-dependent and data-free scenarios, and use the differentials between adjacent modules to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials and jointly optimizing and quantizing the corresponding modules. Extensive experiments demonstrate that our method reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. The gain is more significant on small-model quantization, e.g., our method surpasses BRECQ by 6.61% on MobileNetV2×0.5.
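The selection step described above can be illustrated with a minimal sketch. It assumes that a ModCap value has already been computed for every module; the abstract does not give the ModCap formula, so the capacities below are made-up numbers. Treating the differential as an absolute difference, breaking ties by position, and the helper names `select_top_k_pairs` and `merge_into_groups` are likewise assumptions for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def select_top_k_pairs(capacities: List[float], k: int) -> List[Tuple[int, int]]:
    """Return the k adjacent module pairs with the largest capacity differential.

    Assumption: the differential is the absolute difference between the
    capacities of neighbouring modules; ties are broken by position.
    """
    diffs = [
        (abs(capacities[i + 1] - capacities[i]), i)
        for i in range(len(capacities) - 1)
    ]
    # Largest differential first, then lowest index for equal differentials.
    top = sorted(diffs, key=lambda d: (-d[0], d[1]))[:k]
    return sorted((i, i + 1) for _, i in top)


def merge_into_groups(num_modules: int, pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge the selected adjacent pairs into contiguous groups of module indices.

    Each group stands for modules that would be optimized and quantized jointly.
    """
    joined = {left for left, _ in pairs}  # left index of every selected pair
    groups: List[List[int]] = []
    current = [0]
    for i in range(1, num_modules):
        if (i - 1) in joined:
            current.append(i)  # module i is joined with module i - 1
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups


# Hypothetical per-module capacities, for illustration only.
capacities = [1.0, 4.0, 3.5, 0.5, 0.6]
pairs = select_top_k_pairs(capacities, k=2)
groups = merge_into_groups(len(capacities), pairs)
print(pairs)   # [(0, 1), (2, 3)]
print(groups)  # [[0, 1], [2, 3], [4]]
```

In this toy example the two largest differentials lie between modules 0 and 1 and between modules 2 and 3, so those pairs are grouped for joint optimization while module 4 is handled on its own.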