Video diffusion models (VDMs) have recently attracted significant attention for their ability to generate coherent and realistic video content. However, processing the features of multiple frames concurrently, combined with the considerable model size, results in high latency and heavy memory consumption, which hinders their broader application. Post-training quantization (PTQ) is an effective technique for reducing memory footprint and improving computational efficiency. We observe that, unlike in image diffusion models, the temporal features, which are integrated into all frame features, exhibit pronounced skewness. Furthermore, we identify significant inter-channel disparities and asymmetries in the activations of video diffusion models, which cause individual channels to cover only a small portion of the quantization levels and make quantization more challenging. To address these issues, we introduce QVD, the first PTQ strategy tailored for video diffusion models. Specifically, we propose the High Temporal Discriminability Quantization (HTDQ) method for temporal features, which preserves the high discriminability of the quantized features and thus provides precise temporal guidance for all video frames. In addition, we present the Scattered Channel Range Integration (SCRI) method, which aims to improve the coverage of quantization levels across individual channels. Experiments across various models, datasets, and bit-width settings demonstrate the effectiveness of QVD on diverse metrics. In particular, we achieve near-lossless performance on W8A8 (8-bit weights and 8-bit activations), outperforming current methods by 205.12 in FVD.
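The coverage problem described in the abstract can be illustrated with a minimal sketch. It is not the HTDQ or SCRI method; it only shows, under assumed synthetic activations, how a single per-tensor quantization range leaves narrow-range channels using very few of the available integer levels. The helper names `quantize_per_tensor` and `channel_coverage`, the channel scales and offsets, and the coverage definition (distinct levels used divided by the number of levels) are all assumptions made for illustration.

```python
import numpy as np

def quantize_per_tensor(x, n_bits=8):
    """Uniform asymmetric quantization with one range shared by the whole tensor."""
    qmax = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax  # step size between adjacent quantization levels
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.int64)
    return q

def channel_coverage(q, n_bits=8):
    """Fraction of the 2**n_bits integer levels that each channel (row) actually uses."""
    levels = 2 ** n_bits
    return np.array([np.unique(row).size / levels for row in q])

rng = np.random.default_rng(0)

# Hypothetical activations: 4 channels x 4096 samples, with very different
# ranges and offsets to mimic inter-channel disparity and asymmetry.
scales = np.array([0.05, 0.5, 2.0, 8.0])[:, None]
offsets = np.array([0.0, 1.0, -3.0, 5.0])[:, None]
acts = rng.normal(size=(4, 4096)) * scales + offsets

q = quantize_per_tensor(acts, n_bits=8)
coverage = channel_coverage(q, n_bits=8)

for c, cov in enumerate(coverage):
    print(f"channel {c}: uses {cov:.1%} of the 256 levels")
```

In this toy setup the widest channel dictates the shared range, so the narrowest channel collapses onto a handful of levels. That is the kind of low per-channel coverage the abstract attributes to inter-channel disparities.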