Text-driven video editing has advanced rapidly in recent years, yet evaluating edited videos remains a considerable challenge. Existing metrics often fail to align with human perception, and effective quantitative metrics designed for video editing are still lacking. To address this, we introduce VE-Bench, a benchmark suite tailored to the assessment of text-driven video editing. The suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB comprises a diverse set of source videos covering various motions and subjects, multiple distinct editing prompts, editing results from 8 different models, and the corresponding Mean Opinion Scores (MOS) from 24 human annotators. Building on VE-Bench DB, we further propose VE-Bench QA, a quantitative, human-aligned measure for the text-driven video editing task. In addition to the aesthetic, distortion, and other visual quality indicators that traditional VQA methods emphasize, VE-Bench QA focuses on text-video alignment and on modeling the relevance between source and edited videos. It introduces a new assessment network for video editing that attains superior performance in alignment with human preferences. To the best of our knowledge, VE-Bench provides the first quality assessment dataset for video editing and an effective, subjectively aligned quantitative metric for this domain. All data and code will be publicly available at https://github.com/littlespray/VE-Bench.
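To make the notions of MOS and human alignment concrete, the following is a minimal sketch, not the authors' evaluation protocol. It assumes that a MOS is the plain mean of the annotators' raw ratings for a video, and that alignment is measured with the Spearman rank correlation between a metric's predictions and the MOS; the abstract does not state which aggregation or correlation measure VE-Bench uses. All function names, video identifiers, and numbers are hypothetical.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each video id to the mean of its annotator ratings (assumed raw mean)."""
    return {video_id: mean(scores) for video_id, scores in ratings.items()}


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings from three annotators for three edited videos.
ratings = {"clip_a": [4, 5, 4], "clip_b": [2, 3, 2], "clip_c": [3, 3, 4]}
# Hypothetical scores produced by some automatic quality metric.
predicted = {"clip_a": 0.9, "clip_b": 0.2, "clip_c": 0.5}

mos = mean_opinion_scores(ratings)
video_ids = sorted(ratings)
srcc = spearman([predicted[v] for v in video_ids], [mos[v] for v in video_ids])
print(f"SRCC between metric and MOS: {srcc:.3f}")
```

In this toy case the metric orders the three clips exactly as the annotators do, so the correlation is 1. Subjective studies often normalize scores per annotator before averaging, which this sketch omits.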