In recent years, open-source efforts such as Senorita-2M have pushed video editing toward natural-language instructions. However, currently available public datasets focus predominantly on local editing or style transfer, which largely preserve the original scene structure and are therefore easier to scale. In contrast, background replacement, a task central to creative applications such as film production and advertising, requires synthesizing entirely new, temporally consistent scenes while maintaining accurate foreground-background interactions, which makes large-scale data generation significantly more challenging. Consequently, this complex task remains largely underexplored owing to a scarcity of high-quality training data. The gap is evident in the poor performance of state-of-the-art models such as Kiwi-Edit: OpenVE-3M, the primary open-source dataset that covers this task, frequently contains static, unnatural backgrounds. In this paper, we trace this quality degradation to the lack of precise background guidance during data synthesis. Accordingly, we design a scalable pipeline that generates foreground and background guidance in a decoupled manner and applies strict quality filtering. Building on this pipeline, we introduce Sparkle, a dataset of ~140K video pairs spanning five common background-change themes, together with Sparkle-Bench, the largest evaluation benchmark tailored to background replacement to date. Experiments demonstrate that our dataset and the model trained on it deliver substantially better performance than all existing baselines on both OpenVE-Bench and Sparkle-Bench. Our dataset, benchmark, and model are fully open-sourced at https://showlab.github.io/Sparkle/.