Recently, Diffusion Transformers (DiTs) have emerged in Real-World Image Super-Resolution (Real-ISR) for generating high-quality textures, yet their heavy inference burden hinders real-world deployment. While Post-Training Quantization (PTQ) is a promising route to acceleration, existing super-resolution quantization methods mostly target U-Net architectures, whereas generic DiT quantization is typically designed for text-to-image tasks. Directly applying these methods to DiT-based super-resolution models severely degrades local textures. We therefore propose Q-DiT4SR, the first PTQ framework specifically tailored for DiT-based Real-ISR. First, we introduce H-SVD, a hierarchical SVD that integrates a global low-rank branch with a local block-wise rank-1 branch under a matched parameter budget. Second, we propose Variance-aware Spatio-Temporal Mixed Precision: the spatial component (VaSMP) allocates cross-layer weight bit-widths in a data-free manner based on rate-distortion theory, while the temporal component (VaTMP) schedules intra-layer activation precision across diffusion timesteps via dynamic programming with minimal calibration. Experiments on multiple real-world datasets demonstrate that Q-DiT4SR achieves state-of-the-art performance under both W4A6 and W4A4 settings. Notably, the W4A4 configuration reduces model size by 5.8$\times$ and computational operations by over 60$\times$. Our code and models will be available at https://github.com/xunzhang1128/Q-DiT4SR.
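The abstract does not give H-SVD's exact formulation, but the described structure (a global low-rank branch plus local block-wise rank-1 corrections) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; `h_svd_approx`, `global_rank`, and `block` are hypothetical names and parameters chosen for the example:

```python
import numpy as np

def h_svd_approx(W, global_rank=8, block=32):
    """Approximate W with a global truncated SVD plus per-block
    rank-1 corrections of the residual (assumes block divides W's dims)."""
    # Global low-rank branch: best rank-r approximation of the full matrix.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_global = U[:, :global_rank] * S[:global_rank] @ Vt[:global_rank]

    # Local branch: best rank-1 approximation of each residual tile,
    # capturing block-local texture structure the global branch misses.
    R = W - W_global
    W_local = np.zeros_like(W)
    for i in range(0, W.shape[0], block):
        for j in range(0, W.shape[1], block):
            u, s, vt = np.linalg.svd(R[i:i + block, j:j + block],
                                     full_matrices=False)
            W_local[i:i + block, j:j + block] = s[0] * np.outer(u[:, 0], vt[0])

    return W_global + W_local
```

Because each tile's rank-1 correction is the best rank-1 fit to that residual tile, the combined approximation never has larger Frobenius error than the global branch alone at the same total rank budget.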