MotionDirector: Motion Customization of Text-to-Video Diffusion Models

Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse video generation. Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion. Examples include generating a video of a car moving in a prescribed manner under specific camera movements to make a movie, or a video illustrating how a bear would lift weights to inspire creators. Adaptation methods have been developed for customizing appearance, such as subject or style, but remain unexplored for motion. It is straightforward to extend mainstream adaptation methods to motion customization, including full model tuning, parameter-efficient tuning of additional layers, and Low-Rank Adaptations (LoRAs). However, the motion concept learned by these methods is often coupled with the limited appearances in the training videos, making it difficult to generalize the customized motion to other appearances. To overcome this challenge, we propose MotionDirector, with a dual-path LoRA architecture that decouples the learning of appearance and motion. Further, we design a novel appearance-debiased temporal loss to mitigate the influence of appearance on the temporal training objective. Experimental results show that the proposed method can generate videos of diverse appearances for the customized motions. Our method also supports various downstream applications, such as mixing the appearance of one video with the motion of another, and animating a single image with customized motions. Our code and model weights will be released.
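As a rough illustration of the Low-Rank Adaptation idea the abstract builds on, the sketch below adds a trainable low-rank update to a frozen weight matrix and lets the adapter be switched off. It is a generic LoRA layer in plain Python, not the MotionDirector implementation: the class name `LoRALinear`, the `enabled` flag, the initialization values, and the toy numbers are all assumptions made for this example. In a dual-path setup one could imagine keeping separate adapters of this kind for appearance and for motion, but the abstract does not specify how the paths are wired.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight            # frozen base weight, shape out_dim x in_dim
        self.scale = alpha / rank
        # A starts small and random, B starts at zero,
        # so the adapter leaves the base output unchanged before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.enabled = True             # allows dropping the adapter at inference

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.enabled:
            delta = matvec(self.B, matvec(self.A, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


# Toy usage: a 2x2 identity base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
layer = LoRALinear(W, rank=1)
x = [2.0, 3.0]
base = layer.forward(x)        # same as W @ x, because B is still zero

# Pretend training has updated the adapter matrices.
layer.A = [[1.0, 1.0]]
layer.B = [[1.0], [0.0]]
adapted = layer.forward(x)     # base output plus the low-rank correction

layer.enabled = False
restored = layer.forward(x)    # adapter switched off: base behaviour again
```

Only `A` and `B` would be trained in such a layer, which is what makes the approach parameter-efficient compared with full model tuning; the `enabled` switch shows why separately stored adapters can be combined or dropped without touching the base model.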

