Neural implicit representations have recently proven effective in many fields, including Simultaneous Localization and Mapping (SLAM). Current neural SLAM methods achieve strong results when reconstructing bounded scenes, but they rely on RGB-D input. Neural SLAM that uses only RGB images cannot recover the scale of the scene accurately, and it suffers from scale drift caused by errors accumulated during tracking. To overcome these limitations, we present MoD-SLAM, a monocular dense mapping method that supports global pose optimization and 3D reconstruction in real time in unbounded scenes. Refining the scene reconstruction with monocular depth estimation and updating camera poses through loop closure detection enables detailed and precise reconstruction of large scenes. Compared to previous work, our approach is more robust, scalable, and versatile. Our experiments demonstrate that MoD-SLAM achieves better mapping performance than prior neural SLAM methods, especially in large unbounded scenes.