To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, which creates an architectural gap when full attention is replaced by causal attention. Existing approaches, however, do not bridge this gap theoretically. They initialize the AR student via ODE distillation, which requires frame-level injectivity: each noisy frame must map to a unique clean frame under the PF-ODE of an AR teacher. Distilling an AR student from a bidirectional teacher violates this condition, so the student cannot recover the teacher's flow map and instead converges to a conditional-expectation solution, which degrades performance. To address this issue, we propose Causal Forcing, which uses an AR teacher for ODE initialization and thereby bridges the architectural gap. Empirical results show that our method outperforms all baselines across all metrics, surpassing the SOTA Self Forcing by 19.3\% in Dynamic Degree, 8.7\% in VisionReward, and 16.7\% in Instruction Following. Project page and code: \href{https://thu-ml.github.io/CausalForcing.github.io/}{https://thu-ml.github.io/CausalForcing.github.io/}
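The conditional-expectation claim can be sketched as follows; the notation here is illustrative and not taken from the paper's formal setup. ODE distillation trains a student $f_\theta$ to regress the teacher's PF-ODE endpoint from a noisy frame:
\[
\min_\theta \; \mathbb{E}_{(x_t, x_0)} \left\| f_\theta(x_t) - x_0 \right\|^2 .
\]
If the teacher's flow map $x_t \mapsto x_0$ is injective at the frame level, this regression has the flow map itself as its minimizer. If instead several distinct clean frames $x_0^{(1)} \neq x_0^{(2)}$ are paired with the same noisy frame $x_t$ (as happens when a bidirectional teacher conditions each frame on future context unavailable to the causal student), the $L^2$ minimizer is the average
\[
f^\ast(x_t) = \mathbb{E}\left[ x_0 \mid x_t \right],
\]
a blend of the admissible targets rather than any single one, which is the degraded conditional-expectation solution described above.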