AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

With the advance of text-to-image (T2I) diffusion models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, anyone can turn their imagination into high-quality images at an affordable cost. However, adding motion dynamics to existing high-quality personalized T2I models and enabling them to generate animations remains an open challenge. In this paper, we present AnimateDiff, a practical framework for animating personalized T2I models without requiring model-specific tuning. At the core of our framework is a plug-and-play motion module that can be trained once and seamlessly integrated into any personalized T2I model originating from the same base T2I model. Through our proposed training strategy, the motion module effectively learns transferable motion priors from real-world videos. Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator. We further propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns, such as different shot types, at a low training and data collection cost. We evaluate AnimateDiff and MotionLoRA on several representative, publicly available personalized T2I models collected from the community. The results demonstrate that our approaches help these models generate temporally smooth animation clips while preserving visual quality and motion diversity. Code and pre-trained weights are available at https://github.com/guoyww/AnimateDiff.
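
The plug-and-play idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real motion module is a learned network trained on videos, whereas here it is replaced by a fixed blend toward the mean over frames, purely to show the wiring. All names (`temporal_mix`, `animate`, `base_layer`, `personalized_layer`) and the `weight` parameter are hypothetical. The sketch assumes the image model processes each frame independently and the motion module only mixes information along the time axis, so the same module can be attached to different image layers without changing them.

```python
from typing import Callable, List

Frame = List[float]   # toy stand-in for one frame's feature vector
Video = List[Frame]   # frames ordered in time


def temporal_mix(video: Video, weight: float) -> Video:
    """Toy 'motion module' (assumption): residual blend of every frame
    toward the per-dimension mean over time. weight=0.0 is the identity."""
    n_frames = len(video)
    dim = len(video[0])
    mean = [sum(frame[d] for frame in video) / n_frames for d in range(dim)]
    return [[x + weight * (m - x) for x, m in zip(frame, mean)]
            for frame in video]


def animate(image_layer: Callable[[Frame], Frame],
            video: Video, weight: float) -> Video:
    """Run the (frozen) image layer frame by frame, then apply the
    shared temporal module across frames."""
    per_frame = [image_layer(frame) for frame in video]
    return temporal_mix(per_frame, weight)


def base_layer(frame: Frame) -> Frame:
    """Hypothetical base T2I layer."""
    return [2.0 * x for x in frame]


def personalized_layer(frame: Frame) -> Frame:
    """Hypothetical personalized variant derived from the same base."""
    return [2.0 * x + 1.0 for x in frame]


clip = [[0.0], [2.0]]  # two frames, one feature each

# The same temporal module is reused with both image layers.
base_out = animate(base_layer, clip, weight=0.5)
personal_out = animate(personalized_layer, clip, weight=0.5)

# With weight=0.0 the module is a no-op, so per-frame outputs are untouched.
untouched = animate(personalized_layer, clip, weight=0.0)
```

With `weight=0.0` the output equals the per-frame output of the image layer, which mirrors the claim that the image model itself needs no model-specific tuning; a nonzero weight pulls neighbouring frames toward each other, a crude stand-in for temporal smoothness.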

