One-step diffusion models have demonstrated promising capability and fast inference in real-world video super-resolution (VSR). Nevertheless, the substantial model size and high computational cost of Diffusion Transformers (DiTs) limit downstream applications. While low-bit quantization is a common approach to model compression, the effectiveness of quantized models is challenged by the high dynamic range of input latents and diverse layer behaviors. To address these challenges, we introduce LSGQuant, a layer-sensitivity-guided quantization approach for one-step diffusion-based real-world VSR. Our method incorporates a Dynamic Range Adaptive Quantizer (DRAQ) to fit video token activations. Furthermore, we estimate layer sensitivity by analyzing layer-wise statistics during calibration and implement a Variance-Oriented Layer Training Strategy (VOLTS). We also introduce Quantization-Aware Optimization (QAO) to jointly refine the quantized branch and a retained high-precision branch. Extensive experiments demonstrate that our method achieves performance close to the original full-precision model and significantly exceeds existing quantization techniques. Code is available at: https://github.com/zhengchen1999/LSGQuant.
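For intuition only, the sketch below illustrates two generic ideas the abstract alludes to: per-token range-adaptive fake quantization (activations with high dynamic range get per-token scales) and ranking layers by their quantization error on calibration activations. It is not the paper's actual DRAQ or VOLTS implementation; all function names and the toy data are hypothetical, and PyTorch is assumed.

```python
import torch

def per_token_quantize(x, n_bits=4):
    """Asymmetric fake quantization where each token (row) receives its own
    scale/zero-point derived from its dynamic range (min/max)."""
    qmax = 2 ** n_bits - 1
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = (-x_min / scale).round()
    q = (x / scale + zero_point).round().clamp(0, qmax)
    return (q - zero_point) * scale  # dequantized output

def layer_sensitivity(calib_acts, n_bits=4):
    """Rank layers by quantization MSE on calibration activations;
    larger error suggests a more quantization-sensitive layer."""
    scores = {}
    for name, act in calib_acts.items():
        err = (per_token_quantize(act, n_bits) - act).pow(2).mean().item()
        scores[name] = err
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy calibration activations: (tokens, channels) per hypothetical layer.
    calib = {f"block{i}.proj": torch.randn(64, 128) * (i + 1) for i in range(4)}
    for name, score in layer_sensitivity(calib):
        print(f"{name}: quant MSE = {score:.4f}")
```

Under this sketch, layers whose activations span a wider range incur larger quantization error and would be prioritized during sensitivity-aware training; the actual criteria used by LSGQuant are described in the paper and repository.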