Diffusion models have made great progress in image animation owing to their powerful generative capabilities. However, maintaining spatio-temporal consistency with the detailed information of the input static image over time (e.g., its style, background, and objects) and ensuring smoothness in animated video narratives guided by textual prompts remain challenging. In this paper, we introduce Cinemo, a novel image animation approach that achieves better motion controllability, as well as stronger temporal consistency and smoothness. To accomplish this, we propose three effective strategies applied at the training and inference stages of Cinemo. At the training stage, Cinemo learns the distribution of motion residuals via a motion diffusion model, rather than directly predicting subsequent frames. In addition, a structural similarity index (SSIM)-based strategy is proposed to give Cinemo finer control over motion intensity. At the inference stage, a noise refinement technique based on the discrete cosine transform is introduced to mitigate sudden motion changes. Together, these three strategies enable Cinemo to produce highly consistent, smooth, and motion-controllable results. Compared to previous methods, Cinemo offers simpler and more precise user controllability. Extensive experiments against several state-of-the-art methods, including both commercial tools and research approaches, across multiple metrics demonstrate the effectiveness and superiority of our proposed approach.
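The abstract does not specify how motion intensity is quantified for the SSIM-based control strategy. As a hedged illustration only (not the paper's actual formulation), one plausible signal is the average structural dissimilarity between consecutive frames of a training clip, computed here with a simplified global (non-windowed) SSIM; the function names `ssim_global` and `motion_intensity` are hypothetical:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Simplified global (non-windowed) SSIM between two grayscale images.

    Uses the standard SSIM stabilization constants C1 = (0.01*L)^2 and
    C2 = (0.03*L)^2, where L is the dynamic range of the pixel values.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def motion_intensity(frames):
    """Average (1 - SSIM) over consecutive frame pairs.

    A static clip scores ~0; larger values indicate stronger motion,
    so the score could serve as a conditioning signal for intensity control.
    """
    scores = [1.0 - ssim_global(a, b) for a, b in zip(frames[:-1], frames[1:])]
    return float(np.mean(scores))
```

Under this sketch, a user-supplied target intensity would be matched against scores of this kind during training, giving a scalar knob for motion strength at inference.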
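The abstract likewise leaves the DCT-based noise refinement unspecified. A minimal sketch, assuming one possible reading: the low-frequency DCT coefficients of the initial sampling noise are replaced with those of the input image, anchoring coarse structure while leaving high-frequency noise free. The helper names `dct_matrix` and `refine_noise`, and the `keep` cutoff, are illustrative assumptions, not the paper's method:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n), so inverse = transpose."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the sqrt(1/n) normalization
    return c

def refine_noise(noise, image, keep=8):
    """Swap the lowest `keep` x `keep` 2-D DCT coefficients of the noise
    for those of the input image; assumes square single-channel arrays."""
    n = noise.shape[0]
    c = dct_matrix(n)
    noise_dct = c @ noise @ c.T   # forward 2-D DCT
    image_dct = c @ image @ c.T
    noise_dct[:keep, :keep] = image_dct[:keep, :keep]
    return c.T @ noise_dct @ c    # inverse 2-D DCT (orthonormal basis)
```

Because the basis is orthonormal, the round trip is exact, so only the chosen low-frequency block of the noise is altered; in principle this damps abrupt frame-to-frame changes at the start of sampling.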