Unsupervised video object segmentation (VOS), also known as video salient object detection, aims to detect the most prominent object in a video at the pixel level. Recently, two-stream approaches that leverage both RGB images and optical flow maps have gained significant attention. However, the limited amount of training data remains a substantial challenge. In this study, we propose a novel data generation method that simulates fake optical flows from single images, thereby creating large-scale training data for stable network learning. Inspired by the observation that optical flow maps are highly dependent on depth maps, we generate fake optical flows by refining and augmenting the estimated depth maps of each image. By incorporating our simulated image-flow pairs, we achieve new state-of-the-art performance on all public benchmark datasets without relying on complex modules. We believe that our data generation method represents a potential breakthrough for future VOS research.
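The core idea of simulating a flow map from a depth map can be illustrated with a minimal sketch. The function below is a hypothetical toy version, not the paper's actual pipeline: it assumes flow magnitude grows for nearer pixels (as under a translating camera) and applies a single global motion direction; the paper's refinement and augmentation steps are omitted.

```python
import numpy as np

def simulate_flow_from_depth(depth, direction=(1.0, 0.0), scale=20.0):
    """Toy sketch: build a fake 2-channel optical flow from a depth map.

    Assumption (not from the paper): flow magnitude is inversely
    related to normalized depth, so nearer pixels move more, along
    one global direction `direction` scaled by `scale`.
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # normalize to [0, 1]
    magnitude = scale * (1.0 - d)                   # near pixels -> larger motion
    dx, dy = direction
    # Stack horizontal and vertical components into an HxWx2 flow field
    flow = np.stack([magnitude * dx, magnitude * dy], axis=-1)
    return flow

# Toy usage: a 4x4 depth ramp increasing left to right
depth = np.tile(np.linspace(0.0, 1.0, 4), (4, 1))
flow = simulate_flow_from_depth(depth)
print(flow.shape)  # (4, 4, 2)
```

In a real data-generation setting, the image paired with this simulated flow would form one training sample for a two-stream network; the direction and scale could be randomized per image as a form of augmentation.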