Motion-Conditioned Diffusion Model for Controllable Video Synthesis

April 27, 2023

Tsai-Shien Chen,Chieh Hubert Lin,Hung-Yu Tseng,Tsung-Yi Lin,Ming-Hsuan Yang

From arXiv. Project page: https://tsaishien-chen.github.io/MCDiff/

Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.
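The two-stage data flow described in the abstract (sparse strokes, then dense motion, then a synthesized future frame) can be illustrated with a toy sketch. This is not the MCDiff implementation: `complete_flow` swaps the learned flow completion model for a simple Gaussian spreading of each stroke, `synthesize_next_frame` swaps the diffusion model for a plain forward warp, and the frame-by-frame loop in `generate_video` is an assumption of this sketch. All function names and the `(y, x, dy, dx)` stroke layout are hypothetical.

```python
import math


def complete_flow(strokes, height, width, sigma=1.5):
    """Toy stand-in for a learned flow completion model (assumption).

    Each stroke is (y, x, dy, dx). Its displacement is spread to nearby
    pixels with an unnormalized Gaussian weight, giving a dense flow field.
    """
    flow = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy_sum = dx_sum = 0.0
            for sy, sx, dy, dx in strokes:
                dist2 = (y - sy) ** 2 + (x - sx) ** 2
                weight = math.exp(-dist2 / (2.0 * sigma ** 2))
                dy_sum += weight * dy
                dx_sum += weight * dx
            flow[y][x] = (dy_sum, dx_sum)
    return flow


def synthesize_next_frame(frame, flow):
    """Toy stand-in for the diffusion model (assumption).

    Forward-warps every pixel along its flow vector. A real generative
    model would also fill in disoccluded regions; here they stay at 0.
    """
    height, width = len(frame), len(frame[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = flow[y][x]
            # Round to the nearest pixel and clamp to the image bounds.
            ty = min(max(round(y + dy), 0), height - 1)
            tx = min(max(round(x + dx), 0), width - 1)
            # Keep the brightest value when several pixels land together.
            out[ty][tx] = max(out[ty][tx], frame[y][x])
    return out


def generate_video(first_frame, strokes_per_step):
    """Chain the two stages, one future frame per set of strokes."""
    frames = [first_frame]
    for strokes in strokes_per_step:
        height, width = len(frames[-1]), len(frames[-1][0])
        flow = complete_flow(strokes, height, width)
        frames.append(synthesize_next_frame(frames[-1], flow))
    return frames
```

For example, on a 5x7 frame with one bright pixel at row 2, column 1, the stroke sets `[(2, 1, 0.0, 2.0)]` and `[(2, 3, 0.0, 2.0)]` move that pixel two columns to the right in each of the two generated frames.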
