Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on computation-heavy and inflexible calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Bounded-init Grid Refinement (BGR) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration-data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately 2$\times$ speedup over full-precision baselines on advanced DiT models while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.