Motion in a video consists primarily of camera motion, induced by camera movement, and object motion, resulting from the movement of objects. Accurate control of both camera and object motion is essential for video generation. However, existing works either focus mainly on one type of motion or do not clearly distinguish between the two, which limits their control capability and diversity. This paper therefore presents MotionCtrl, a unified and flexible motion controller for video generation, designed to control camera motion and object motion effectively and independently. The architecture and training strategy of MotionCtrl are carefully devised to account for the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It controls camera motion and object motion effectively and independently, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and object trajectories, which are appearance-free and have minimal impact on the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that, once trained, can adapt to a wide array of camera poses and trajectories. Extensive qualitative and quantitative experiments demonstrate the superiority of MotionCtrl over existing methods.
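To make "appearance-free" motion conditions concrete, the following is a minimal sketch under stated assumptions. It is not MotionCtrl's implementation. It assumes that each camera pose is a 3x4 extrinsic matrix [R | t] flattened to 12 numbers per frame. It also assumes that an object trajectory is a list of per-frame (x, y) pixel positions, converted into sparse per-frame displacement maps. The function names `camera_condition` and `trajectory_condition` are hypothetical. Neither condition contains any pixel values from the video, which is the sense in which such conditions are appearance-free.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one 3x4 camera extrinsic matrix [R | t] per frame.
Matrix3x4 = Sequence[Sequence[float]]


def camera_condition(poses: Sequence[Matrix3x4]) -> List[List[float]]:
    """Flatten each 3x4 [R | t] matrix into a 12-dim vector, one per frame (assumed encoding)."""
    cond = []
    for pose in poses:
        if len(pose) != 3 or any(len(row) != 4 for row in pose):
            raise ValueError("each camera pose must be a 3x4 [R | t] matrix")
        cond.append([float(v) for row in pose for v in row])
    return cond


def trajectory_condition(
    points: Sequence[Tuple[int, int]], height: int, width: int
) -> List[List[List[Tuple[float, float]]]]:
    """Turn per-frame (x, y) positions of one object into sparse displacement maps.

    Returns maps indexed as maps[t][y][x]. Frame t is (0, 0) everywhere except at
    points[t], which stores the displacement (dx, dy) from frame t-1 to frame t.
    Frame 0 has no previous position, so its map is all zeros (assumed convention).
    """
    maps = []
    for t, (x, y) in enumerate(points):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"point {(x, y)} at frame {t} is outside the frame")
        # Each row is a separate list; tuples are immutable, so sharing them is safe.
        frame = [[(0.0, 0.0)] * width for _ in range(height)]
        if t > 0:
            px, py = points[t - 1]
            frame[y][x] = (float(x - px), float(y - py))
        maps.append(frame)
    return maps


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    shifted = [[1, 0, 0, 0.5], [0, 1, 0, 0], [0, 0, 1, 0]]  # camera translated along x
    cam = camera_condition([identity, shifted])
    print(cam[1])  # [1.0, 0.0, 0.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

    traj = trajectory_condition([(1, 1), (2, 1), (3, 2)], height=4, width=4)
    print(traj[2][2][3])  # (1.0, 1.0): the object moved one pixel right and one down
```

Because the two conditions are produced by separate functions, a camera path and an object path can be specified and varied independently. This mirrors, in spirit only, the abstract's point that camera motion and object motion are controlled independently.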