Model reparameterization is a widely adopted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often cause significant accuracy degradation when applied to reparameterized models. The main cause is channel-specific and sample-specific outliers, which appear only in particular samples and channels and distort the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterized models. Unlike previous frameworks, which use Mean Squared Error (MSE) as the calibration metric, we use Mean Absolute Error (MAE) to mitigate the influence of outliers on the quantization parameters. Our framework comprises two main components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution followed by an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Across-block Calibration, in turn, uses the error measured at the stage output as supervision to address the gradient problem introduced by MAE and to strengthen the inter-layer correlation of the quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1\% for 8-bit PTQ and 2\% for 6-bit PTQ. The code is available at \url{https://github.com/ilur98/DLMC-QUANT}.
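As a rough illustration of why the error metric matters when a few outliers are present, the sketch below picks a symmetric 8-bit quantization scale by grid search, once under MSE and once under MAE. It is a toy example under stated assumptions and not the RepAPQ calibration procedure: the helper names `fake_quantize` and `search_scale`, the synthetic activations (a Gaussian bulk plus ten large outliers), and the grid search itself are all hypothetical choices made for illustration.

```python
import numpy as np


def fake_quantize(x, scale, num_bits=8):
    """Symmetric uniform quantize-dequantize (illustrative helper)."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale


def mse(err):
    return np.mean(err ** 2)


def mae(err):
    return np.mean(np.abs(err))


def search_scale(x, metric, num_bits=8, num_candidates=200):
    """Grid-search the scale that minimizes `metric` of the quantization error."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.abs(x).max()
    best_scale, best_err = None, np.inf
    # Candidate clipping thresholds range from 1% to 100% of max |x|.
    for frac in np.linspace(0.01, 1.0, num_candidates):
        scale = frac * max_abs / qmax
        err = metric(x - fake_quantize(x, scale, num_bits))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale


# Synthetic activations (assumption): a well-behaved bulk plus a few outliers.
rng = np.random.default_rng(0)
bulk = rng.normal(0.0, 1.0, size=10_000)
outliers = np.full(10, 50.0)
activations = np.concatenate([bulk, outliers])

scale_mse = search_scale(activations, mse)
scale_mae = search_scale(activations, mae)
print(f"scale chosen by MSE: {scale_mse:.4f}")
print(f"scale chosen by MAE: {scale_mae:.4f}")
```

In this toy setup the squared penalty on clipped outliers pushes the MSE-selected scale toward covering the full range, while the MAE-selected scale stays closer to the bulk of the values and keeps finer resolution for them.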