Object removal, a sub-task of image inpainting, has attracted significant attention in recent years. Existing object removal datasets serve as a valuable foundation for model validation and optimization. However, they mainly rely on inpainting techniques to generate pseudo-removed results, which leads to a distribution gap between synthetic and real-world data. While some real-world datasets mitigate this issue, they face challenges such as limited scalability, high annotation costs, and unrealistic representations of lighting and shadows. To address these limitations, we propose a novel video-based annotation pipeline for constructing a realistic, illumination-aware object removal dataset. Using this pipeline, we introduce VDOR, a dataset designed specifically for object removal, which comprises triplets of an original frame containing the object, a background image without the object, and the corresponding mask. By drawing on continuous real-world video frames, we minimize the distribution gap and capture realistic lighting and shadow variations, keeping the data closely aligned with real-world scenarios. Our approach significantly reduces annotation effort while providing a robust foundation for advancing object removal research.