Motion in a video consists primarily of camera motion, caused by movement of the camera, and object motion, caused by movement of objects in the scene. Accurate control of both camera and object motion is essential for video generation. However, existing works either focus mainly on one type of motion or do not clearly distinguish between the two, which limits their controllability and the diversity of the motion they can produce. This paper therefore presents MotionCtrl, a unified and flexible motion controller for video generation that controls camera motion and object motion effectively and independently. The architecture and training strategy of MotionCtrl are designed around the inherent properties of camera motion and object motion, and around the imperfections of the available training data. Compared with previous methods, MotionCtrl offers three main advantages: 1) It controls camera motion and object motion effectively and independently, enabling finer-grained motion control and flexible, diverse combinations of the two types of motion. 2) Its motion conditions are given by camera poses and trajectories, which are appearance-free and therefore have minimal impact on the appearance or shape of objects in the generated videos. 3) It is a relatively generalizable model: once trained, it can adapt to a wide range of camera poses and trajectories. Extensive qualitative and quantitative experiments demonstrate the superiority of MotionCtrl over existing methods. Project Page: https://wzhouxiff.github.io/projects/MotionCtrl/
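The abstract states that MotionCtrl's motion conditions are camera poses and trajectories, and that these carry no appearance information. The following is a minimal sketch of what such appearance-free conditions could look like. It assumes each camera pose is a per-frame 3x4 extrinsic matrix [R | t] and each object trajectory is one (x, y) pixel position per frame. The helper names `camera_condition` and `trajectory_condition`, and the sparse displacement-map encoding, are hypothetical illustrations and not MotionCtrl's released code.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a 3x4 [R | t] camera extrinsic and a per-frame displacement map.
Matrix3x4 = Sequence[Sequence[float]]
DisplacementMap = List[List[Tuple[float, float]]]


def camera_condition(poses: Sequence[Matrix3x4]) -> List[List[float]]:
    """Flatten each per-frame 3x4 camera pose [R | t] into a 12-dim vector.

    Assumption: camera motion is conditioned on one pose per frame; the
    result contains only geometry, with no pixel (appearance) data.
    """
    cond = []
    for pose in poses:
        if len(pose) != 3 or any(len(row) != 4 for row in pose):
            raise ValueError("each camera pose must be a 3x4 matrix")
        cond.append([float(v) for row in pose for v in row])
    return cond


def trajectory_condition(
    points: Sequence[Tuple[int, int]], height: int, width: int
) -> List[DisplacementMap]:
    """Encode one object trajectory as sparse per-frame displacement maps.

    maps[f][y][x] holds (dx, dy), the move from frame f-1 to frame f, at the
    point's location in frame f. Every other cell, and all of frame 0, is (0, 0).
    Assumption: an illustrative encoding, not necessarily the paper's.
    """
    maps = []
    for f, (x, y) in enumerate(points):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"trajectory point {(x, y)} is outside the frame")
        # Each row is a distinct list; the shared tuples are immutable, so this is safe.
        frame = [[(0.0, 0.0)] * width for _ in range(height)]
        if f > 0:
            px, py = points[f - 1]
            frame[y][x] = (float(x - px), float(y - py))
        maps.append(frame)
    return maps


if __name__ == "__main__":
    # Camera translating along x with a fixed (identity) rotation.
    poses = [[[1, 0, 0, i], [0, 1, 0, 0], [0, 0, 1, 0]] for i in range(3)]
    print(camera_condition(poses)[2])
    # An object moving right, then diagonally, in a 4x5 frame.
    maps = trajectory_condition([(1, 1), (2, 1), (3, 2)], height=4, width=5)
    print(maps[2][2][3])
```

Because the two conditions are built from separate inputs, either can be changed without touching the other. This mirrors, in a simplified way, the independent control of camera and object motion that the abstract describes.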