In the dynamic field of digital content creation using generative models, state-of-the-art video editing models still fall short of the quality and control that users desire. Previous works on video editing either extend image-based generative models in a zero-shot manner or require extensive fine-tuning, both of which can hinder the production of fluid video edits. Furthermore, these methods frequently rely on textual input as the sole editing guidance, which introduces ambiguity and limits the types of edits they can perform. Recognizing these challenges, we introduce AnyV2V, a novel tuning-free paradigm that simplifies video editing into two primary steps: (1) employing an off-the-shelf image editing model to modify the first frame, and (2) utilizing an existing image-to-video generation model to produce the edited video through temporal feature injection. AnyV2V can leverage any existing image editing tool to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which previous methods could not achieve. AnyV2V also supports videos of arbitrary length. Our evaluation shows that AnyV2V achieves CLIP scores comparable to those of baseline methods. Moreover, AnyV2V significantly outperforms these baselines in human evaluations, demonstrating notable improvements in visual consistency with the source video while producing high-quality edits across all editing tasks.
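To make the two-step pipeline concrete, the following is a minimal Python sketch of the control flow. The callables `edit_first_frame` and `i2v_generate` are hypothetical placeholders standing in for any off-the-shelf image editor and any image-to-video model with temporal feature injection; they illustrate the paradigm's structure under stated assumptions and are not part of a released AnyV2V API.

```python
from typing import Callable, List
import numpy as np

# A video frame, assumed to be an H x W x 3 uint8 array.
Frame = np.ndarray


def any_v2v(
    source_frames: List[Frame],
    edit_first_frame: Callable[[Frame], Frame],
    i2v_generate: Callable[[Frame, List[Frame]], List[Frame]],
) -> List[Frame]:
    """Sketch of the AnyV2V two-step pipeline.

    Step 1: edit the first frame with any off-the-shelf image
            editing model (prompt-based, reference-based, etc.).
    Step 2: regenerate the full clip with an image-to-video model
            conditioned on the edited frame, while temporal features
            from the source video are injected during sampling so the
            result stays consistent with the original motion.
    """
    # Step 1: any image editor can be plugged in here.
    edited_first = edit_first_frame(source_frames[0])

    # Step 2: the I2V model conditions on the edited first frame;
    # the source frames supply the features to inject (hypothetical
    # signature -- the injection mechanics live inside i2v_generate).
    return i2v_generate(edited_first, source_frames)
```

Because both stages are treated as black boxes behind these two callables, the sketch reflects the tuning-free claim: swapping in a different image editor or image-to-video backbone changes only the arguments, not the pipeline.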