Monocular SLAM has received considerable attention because it requires only RGB input and avoids the constraints of more complex sensors. However, existing monocular SLAM systems are designed for bounded scenes, which restricts their applicability. To address this limitation, we propose MoD-SLAM, the first monocular NeRF-based dense mapping method that enables real-time 3D reconstruction in unbounded scenes. Specifically, we introduce a Gaussian-based unbounded scene representation to address the challenge of mapping scenes without boundaries, which is essential for extending SLAM to such scenes. Moreover, a depth estimation module in the front end extracts accurate prior depth values to supervise the mapping and tracking processes. By introducing a robust depth loss term into the tracking process, our SLAM system achieves more precise pose estimation in large-scale scenes. Our experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving the accuracy of 3D reconstruction and localization by up to 30% and 15%, respectively, compared with existing state-of-the-art monocular SLAM systems.