Diffusion models have shown superior performance in real-world video super-resolution (VSR). However, their slow inference and heavy resource consumption hinder practical deployment. Quantization offers a potential way to compress VSR models, but quantizing them is challenging because of their temporal characteristics and high fidelity requirements. To address these issues, we propose QuantVSR, a low-bit quantization model for real-world VSR. We introduce a spatio-temporal complexity-aware (STCA) mechanism that first uses a calibration dataset to measure the spatial and temporal complexity of each layer. Based on these statistics, we allocate a layer-specific rank to each layer's low-rank full-precision (FP) auxiliary branch. We then jointly refine the FP and low-bit branches. In addition, we propose a learnable bias alignment (LBA) module to reduce biased quantization errors. Extensive experiments on synthetic and real-world datasets demonstrate that our method achieves performance comparable to the FP model and significantly outperforms recent leading low-bit quantization methods. Code is available at: https://github.com/bowenchai/QuantVSR.
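To make the STCA idea concrete, the following is a minimal NumPy sketch, not the released implementation. Everything in it is an assumption made for illustration: the complexity measures (spatial variance and mean frame-to-frame difference), the linear score-to-rank mapping with bounds `r_min` and `r_max`, the SVD-based split into an FP low-rank branch plus a quantized residual, and the closed-form `bias_correction`, which is classical bias correction and could at most serve as an initial value for a learnable bias. All function names are hypothetical.

```python
import numpy as np


def spatial_complexity(acts):
    """Assumed metric: variance over H and W, averaged over frames and channels.

    acts has the (hypothetical) layout (T, H, W, C) for one layer.
    """
    return float(acts.var(axis=(1, 2)).mean())


def temporal_complexity(acts):
    """Assumed metric: mean absolute difference between consecutive frames (T >= 2)."""
    return float(np.abs(np.diff(acts, axis=0)).mean())


def allocate_ranks(scores, r_min=4, r_max=32):
    """Map per-layer scores linearly onto integer ranks in [r_min, r_max]."""
    s = np.asarray(scores, dtype=np.float64)
    span = s.max() - s.min()
    norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return (r_min + np.round(norm * (r_max - r_min))).astype(int)


def quantize_uniform(w, bits=4):
    """Symmetric per-tensor fake quantization: round to a low-bit grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return w.copy()
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def split_weight(w, rank, bits=4):
    """Keep the top-`rank` singular directions in FP and quantize the residual.

    Returns (a, b, residual_q) such that w is approximated by a @ b + residual_q.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (in, rank), full precision
    b = vt[:rank]                # shape (rank, out), full precision
    residual_q = quantize_uniform(w - a @ b, bits)
    return a, b, residual_q


def bias_correction(x, w_fp, w_q):
    """Per-channel mean output error on calibration inputs x of shape (N, in)."""
    return (x @ w_fp - x @ w_q).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy "layers" whose calibration activations differ in variability.
    layer_acts = [rng.normal(scale=k, size=(4, 8, 8, 16)) for k in (0.5, 1.0, 2.0)]
    scores = [spatial_complexity(a) + temporal_complexity(a) for a in layer_acts]
    print("ranks per layer:", allocate_ranks(scores).tolist())
```

In this sketch, layers whose calibration activations vary more across space and time receive a larger FP rank, so more of their weight is kept out of the low-bit path. Summing the two scores is only one possible way to combine them; the abstract does not say how the two complexities are weighted.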