Omnidirectional depth perception is essential for mobile robotics applications that require scene understanding across a full 360° field of view. Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps without relying on expensive active sensing. However, because real-world data is scarce, existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments, depth ranges, and lighting conditions. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation within an iterative optimization-based stereo matching architecture. We introduce a dedicated two-stage training strategy: the first stage utilizes the relative monocular depth features for omnidirectional stereo matching, and the second performs scale-invariant fine-tuning. DFI-OmniStereo achieves state-of-the-art results on the real-world Helvipad dataset, reducing the disparity mean absolute error (MAE) by approximately 16% compared to the previous best omnidirectional stereo method.