Accurate 3D reconstruction in visually degraded underwater environments remains a formidable challenge. Single-modality approaches are insufficient: vision-based methods fail due to poor visibility and geometric constraints, while sonar is crippled by inherent elevation ambiguity and low resolution. Consequently, prior fusion techniques rely on heuristics and flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel end-to-end deep learning framework that overcomes these limitations by adapting the principled plane sweep algorithm for cross-modal fusion between sonar and visual data. Extensive experiments in both high-fidelity simulation and real-world environments demonstrate that SonarSweep consistently generates dense and accurate depth maps, significantly outperforming state-of-the-art methods across challenging conditions, particularly in high turbidity. To foster further research, we will publicly release our code and a novel dataset featuring synchronized stereo-camera and sonar data, the first of its kind.