Post-training quantization (PTQ) is widely regarded as one of the most practical compression methods, owing to its data-privacy benefits and low computation cost. We argue that PTQ methods suffer from an overlooked problem: oscillation. In this paper, we take the initiative to explore this problem and present a theoretical proof explaining why it is essential in PTQ. We then address it with a principled and generalized framework. In particular, we first formulate the oscillation in PTQ and prove that it is caused by the difference in module capacity. To this end, we define the module capacity (ModCap) under data-dependent and data-free scenarios, and use the differentials between adjacent modules to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials and jointly optimizing and quantizing the corresponding modules. Extensive experiments demonstrate that our method reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. The gain is more significant for small-model quantization: on MobileNetV2*0.5, our method surpasses BRECQ by 6.61%.
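The selection step can be illustrated with a minimal sketch. It assumes that a scalar capacity has already been computed for each module; the abstract does not give the ModCap formula, so the capacity values below are made up, and the function names `adjacent_differentials` and `group_modules` are hypothetical. Merging the two modules on either side of a selected differential into one jointly optimized block is one plausible reading of the procedure, not a confirmed implementation.

```python
from typing import List


def adjacent_differentials(capacities: List[float]) -> List[float]:
    """Absolute capacity difference between each pair of adjacent modules."""
    return [abs(b - a) for a, b in zip(capacities, capacities[1:])]


def group_modules(capacities: List[float], k: int) -> List[List[int]]:
    """Merge adjacent modules across the k largest capacity differentials.

    Differential i sits between module i and module i + 1. Modules joined
    by a selected differential end up in the same group, which stands in
    for "jointly optimized and quantized" in this sketch.
    """
    if not capacities:
        return []
    diffs = adjacent_differentials(capacities)
    # Indices of the k largest differentials (ties keep the earlier index).
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    selected = set(ranked[:max(k, 0)])

    groups = [[0]]
    for i in range(len(diffs)):
        if i in selected:
            groups[-1].append(i + 1)   # join module i + 1 to the current group
        else:
            groups.append([i + 1])     # start a new group
    return groups


# Hypothetical capacity values for five consecutive modules.
capacities = [1.0, 1.2, 3.0, 2.9, 0.5]
groups = group_modules(capacities, k=2)
print(groups)
```

With these made-up values, the two largest differentials lie between modules 1 and 2 and between modules 3 and 4, so those pairs are grouped while module 0 stays on its own. Each resulting group would then be passed to whichever PTQ reconstruction routine is in use.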