Super-resolution (SR) networks have been studied for years, and their mobile and lightweight variants have recently gained considerable popularity. Quantization, the process of reducing the precision of network parameters (typically from FP32 to INT8), is also applied to SR networks to make them suitable for mobile deployment. This study focuses on an important but largely overlooked step of post-training quantization (PTQ): the representative dataset (RD), which is used to calibrate the quantization range for PTQ. We propose a novel pipeline, the clip-free quantization pipeline (CFQP), supported by extensive experimental justification, that augments RD images using only the outputs of the FP32 model. Using the proposed pipeline for the RD, we can eliminate the clipped activation layers that nearly all mobile SR methods rely on to make the model more robust to PTQ, at the cost of a large runtime overhead. Removing clipped activations with our method yields improved overall stability, inference runtime reduced by up to 54% on some SR models, and better visual quality than INT8 clipped models. It even outperforms some non-quantized FP32 models in both runtime and visual quality, without any need for retraining with clipped activations.
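To make the role of the representative dataset concrete, the following is a minimal sketch of generic min/max range calibration for affine INT8 quantization. It is not the CFQP pipeline itself, whose augmentation procedure is not detailed in this abstract. All function names (`calibrate_range`, `quant_params`, `quantize`, `dequantize`) are hypothetical, and activations are represented as plain lists of floats for simplicity. The sketch shows that an activation falling outside the range observed on the RD saturates after quantization, which is the kind of failure that clipped activations are usually introduced to guard against.

```python
# Illustrative sketch only: generic min/max PTQ calibration, not the authors' method.

def calibrate_range(samples):
    """Return the (min, max) activation range observed over the representative samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # Force the range to contain zero so that 0.0 is always representable.
    return min(lo, 0.0), max(hi, 0.0)

def quant_params(lo, hi, qmin=-128, qmax=127):
    """Derive the affine scale and zero point for an INT8 range [qmin, qmax]."""
    if hi == lo:
        raise ValueError("degenerate calibration range")
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to INT8, saturating values outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an INT8 value back to its approximate float value."""
    return (q - zero_point) * scale

# A narrow representative set never sees activations above 0.6 ...
narrow = [[0.0, 0.5], [0.1, 0.6]]
# ... while a wider one covers the full [0, 1] range.
wide = [[0.0, 0.5], [0.2, 1.0]]

for name, rd in (("narrow", narrow), ("wide", wide)):
    scale, zp = quant_params(*calibrate_range(rd))
    restored = dequantize(quantize(1.0, scale, zp), scale, zp)
    print(name, "RD: activation 1.0 is restored as", round(restored, 3))
```

With the narrow set, the activation 1.0 saturates at the top of the calibrated range; with the wider set it is recovered almost exactly. This is why the choice of representative images directly affects post-quantization quality.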