The rapid development of diffusion models (DMs) has significantly advanced image and video applications, making "what you want is what you see" a reality. Among these applications, video editing has attracted substantial attention and a swift rise in research activity, which calls for a comprehensive and systematic review of the existing literature. This paper reviews diffusion model-based video editing techniques, covering both theoretical foundations and practical applications. We begin with an overview of the mathematical formulation and the key methods in the image domain. We then categorize video editing approaches by the inherent connections among their core technologies, tracing their evolutionary trajectory. The paper also examines novel applications, including point-based editing and pose-guided human video editing. Additionally, we present a comprehensive comparison using our newly introduced V2VBench. Building on the progress achieved to date, the paper concludes with ongoing challenges and potential directions for future research.