Motion-controllable image animation is a fundamental task with a wide range of potential applications. Recent works have made progress in controlling camera or object motion via various motion representations, but they still struggle to support collaborative camera and object motion control with adaptive control granularity. To this end, we introduce a 3D-aware motion representation and propose an image animation framework, called Perception-as-Control, to achieve fine-grained collaborative motion control. Specifically, we construct the 3D-aware motion representation from a reference image, manipulate it according to interpreted user instructions, and perceive it from different viewpoints. In this way, camera and object motions are transformed into intuitive and consistent visual changes. Our framework then leverages the perception results as motion control signals, enabling it to support various motion-related video synthesis tasks in a unified and flexible manner. Experiments demonstrate the superiority of the proposed approach. For more details and qualitative results, please refer to our anonymous project webpage: https://chen-yingjie.github.io/projects/Perception-as-Control.