In the dynamic field of digital content creation with generative models, state-of-the-art video editing models still do not offer the level of quality and control that users desire. Previous works on video editing either extended image-based generative models in a zero-shot manner or required extensive fine-tuning, both of which hinder the production of fluid video edits. Furthermore, these methods frequently rely on textual input as the sole editing guidance, which introduces ambiguity and limits the types of edits they can perform. Recognizing these challenges, we introduce AnyV2V, a novel tuning-free paradigm that simplifies video editing into two primary steps: (1) employing an off-the-shelf image editing model to modify the first frame, and (2) utilizing an existing image-to-video generation model to generate the edited video through temporal feature injection. AnyV2V can leverage any existing image editing tool to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods. AnyV2V also supports videos of arbitrary length. Our evaluation shows that AnyV2V outperforms baseline methods by a significant margin in both automatic and human evaluations, maintaining visual consistency with the source video while achieving high-quality edits across all editing tasks.
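Conceptually, the two-step paradigm can be summarized in the following sketch. This is a minimal illustration rather than the paper's implementation: the function name `anyv2v_edit`, the `I2VModel` interface, and its `invert`/`generate` methods are all hypothetical placeholders standing in for whichever off-the-shelf image editor and image-to-video model are plugged in.

```python
from typing import Any, Callable, List, Protocol


class I2VModel(Protocol):
    """Hypothetical interface for an off-the-shelf image-to-video model."""

    def invert(self, frames: List[Any]) -> Any:
        """Invert the source video to recover its latents/features."""
        ...

    def generate(self, first_frame: Any, injected_features: Any) -> List[Any]:
        """Regenerate the video from the edited first frame, injecting the
        source video's temporal features during denoising to preserve the
        original motion and layout."""
        ...


def anyv2v_edit(
    source_frames: List[Any],
    edit: Callable[[Any], Any],  # any off-the-shelf image editing model
    i2v_model: I2VModel,
) -> List[Any]:
    """Tuning-free video editing in two steps (hypothetical API)."""
    # Step 1: modify only the first frame with an image editing model
    # (prompt-based edit, style transfer, subject swap, identity edit, ...).
    edited_first_frame = edit(source_frames[0])

    # Step 2: invert the source video, then generate the edited video
    # conditioned on the edited first frame, injecting the source video's
    # features so motion and appearance consistency carry over.
    source_features = i2v_model.invert(source_frames)
    return i2v_model.generate(
        first_frame=edited_first_frame,
        injected_features=source_features,
    )
```

Because the two steps only require forward passes through existing models, any image editor and any image-to-video backbone can be swapped in without fine-tuning.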