QVD: Post-training Quantization for Video Diffusion Models

Recently, video diffusion models (VDMs) have garnered significant attention for their notable advances in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with their considerable model size, results in high latency and extensive memory consumption, hindering broader application. Post-training quantization (PTQ) is an effective technique for reducing memory footprint and improving computational efficiency. Unlike in image diffusion, we observe that the temporal features, which are integrated into all frame features, exhibit pronounced skewness. Furthermore, we identify significant inter-channel disparities and asymmetries in the activations of video diffusion models, which cause individual channels to cover only a small fraction of the quantization levels and make quantization more challenging. To address these issues, we introduce QVD, the first PTQ strategy tailored for video diffusion models. Specifically, we propose the High Temporal Discriminability Quantization (HTDQ) method for temporal features, which retains the high discriminability of the quantized features and thereby provides precise temporal guidance for all video frames. In addition, we present the Scattered Channel Range Integration (SCRI) method, which aims to improve the coverage of quantization levels across individual channels. Experiments across various models, datasets, and bit-width settings demonstrate the effectiveness of QVD on diverse metrics. In particular, QVD achieves near-lossless performance under W8A8, outperforming existing methods by 205.12 in FVD.
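The "low coverage of quantization levels by individual channels" problem can be illustrated with a standard per-tensor asymmetric (min-max, affine) uniform quantizer. The sketch below is a hypothetical illustration, not QVD's HTDQ or SCRI, whose details are not given here. It assumes three synthetic channels with made-up ranges, calibrates a single 8-bit grid over all of them, and counts how many of the 256 levels each channel actually occupies. All function names are illustrative.

```python
import random


def minmax_affine_params(values, n_bits=8):
    """Hypothetical helper: asymmetric min-max calibration -> (scale, zero_point)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 representable
    qmax = 2 ** n_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, n_bits=8):
    """Map a float to an integer level in [0, 2**n_bits - 1]."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** n_bits - 1, q))


def channel_level_coverage(channels, n_bits=8):
    """Fraction of the 2**n_bits levels each channel occupies under ONE shared (per-tensor) grid."""
    all_values = [v for ch in channels for v in ch]
    scale, zp = minmax_affine_params(all_values, n_bits)
    coverage = []
    for ch in channels:
        used_levels = {quantize(v, scale, zp, n_bits) for v in ch}
        coverage.append(len(used_levels) / 2 ** n_bits)
    return coverage


# Synthetic activations with disparate, asymmetric per-channel ranges (assumed values).
rng = random.Random(0)
channels = [
    [rng.uniform(-0.1, 0.1) for _ in range(1000)],  # narrow, symmetric
    [rng.uniform(0.0, 20.0) for _ in range(1000)],  # wide, one-sided (asymmetric)
    [rng.uniform(-1.0, 0.5) for _ in range(1000)],  # moderate, skewed
]
coverage = channel_level_coverage(channels)
for i, c in enumerate(coverage):
    print(f"channel {i}: {c:.1%} of levels used")
```

Because the wide channel dictates the shared scale, the narrow channel collapses onto only a handful of integer levels while the wide one uses most of the grid. That mismatch is the kind of coverage gap SCRI is described as targeting.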