Localization and mapping are core perceptual capabilities for underwater robots. Stereo cameras provide a low-cost means of directly estimating metric depth to support these tasks. However, despite recent advances in stereo depth estimation on land, computing depth from image pairs in underwater scenes remains challenging. In underwater environments, images are degraded by light attenuation, visual artifacts, and dynamic lighting conditions. Furthermore, real-world underwater scenes frequently lack the rich texture needed for stereo depth estimation and 3D reconstruction. As a result, stereo estimation networks trained on in-air data do not transfer directly to the underwater domain. In addition, real-world underwater stereo datasets for supervised training of neural networks are scarce. Errors from poor underwater depth estimation compound in stereo-based Simultaneous Localization and Mapping (SLAM) algorithms, making depth estimation a fundamental challenge for underwater robot perception. To address these challenges, we propose a novel framework that enables sim-to-real training of underwater stereo disparity estimation networks using simulated data and self-supervised finetuning. We leverage our learned depth predictions to develop \algname, a novel framework for real-time underwater SLAM that fuses stereo cameras with IMU, barometric, and Doppler Velocity Log (DVL) measurements. Lastly, we collect a challenging real-world dataset of shipwreck surveys using an underwater robot. Our dataset features over 24,000 stereo pairs, along with high-quality, dense photogrammetry models and reference trajectories for evaluation. Through extensive experiments, we demonstrate the advantages of the proposed training approach on real-world data for improving stereo estimation in the underwater domain and for enabling accurate trajectory estimation and 3D reconstruction of complex shipwreck sites.