Detecting and segmenting moving objects from a moving monocular camera is challenging in the presence of unknown camera motion, diverse object motions, and complex scene structures. Most existing methods rely on a single motion cue to perform motion segmentation, which is often insufficient across different complex environments. While a few recent deep-learning-based methods can combine multiple motion cues to achieve improved accuracy, they depend heavily on large datasets and extensive annotations, making them less adaptable to new scenarios. To address these limitations, we propose a novel monocular dense segmentation method that achieves state-of-the-art motion segmentation results in a zero-shot manner. The proposed method synergistically combines the strengths of deep learning and geometric model fusion by performing geometric model fusion on object proposals. Experiments show that our method achieves competitive results on several motion segmentation datasets and even surpasses some state-of-the-art supervised methods on certain benchmarks, despite not being trained on any data. We also present an ablation study demonstrating the effectiveness of combining different geometric models for motion segmentation, highlighting the value of our geometric model fusion strategy.