We introduce MoCA, a Motion-Conditioned Image Animation approach for video editing. It decomposes the video editing problem into two steps: image editing, followed by motion-conditioned image animation. Furthermore, given the lack of robust evaluation datasets for video editing, we introduce a new benchmark that measures edit capability across a wide variety of tasks, such as object replacement, background changes, style changes, and motion edits. We present a comprehensive human evaluation of the latest video editing methods, along with MoCA, on our proposed benchmark. MoCA establishes a new state of the art, achieving a higher human-preference win rate than notable recent approaches, including Dreamix (63%), MasaCtrl (75%), and Tune-A-Video (72%), with especially significant improvements for motion edits.
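The human-preference win rate reported above can be illustrated with a minimal sketch. The abstract does not describe the evaluation protocol, so this assumes the simplest reading: each rater sees one pairwise comparison and names the preferred method, and the win rate is the fraction of comparisons a method wins. The function name `win_rate`, the label `"baseline"`, and the votes are hypothetical and are not data from the paper.

```python
def win_rate(judgments, method="MoCA"):
    """Return the fraction of pairwise comparisons in which `method` was preferred.

    `judgments` is a list of labels, one per comparison, naming the preferred
    method. Ties are not modeled here (an assumption of this sketch).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    wins = sum(1 for preferred in judgments if preferred == method)
    return wins / len(judgments)


# Made-up votes for illustration only, not results from the benchmark.
votes = ["MoCA", "baseline", "MoCA", "MoCA", "baseline"]
rate = win_rate(votes)
print(f"{rate:.0%}")  # prints "60%"
```

Under this reading, a win rate above 50% means raters preferred the method more often than the alternative it was compared against.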