Neural implicit representations have recently been applied in many fields, including Simultaneous Localization and Mapping (SLAM). Current neural SLAM methods achieve strong results when reconstructing bounded scenes, but they rely on RGB-D input. Neural SLAM based only on RGB images cannot recover the scale of the scene accurately, and it suffers from scale drift caused by errors accumulated during tracking. To overcome these limitations, we present MoD-SLAM, a monocular dense mapping method that performs global pose optimization and 3D reconstruction in real time in unbounded scenes. Monocular depth estimation is used to optimize scene reconstruction, and loop closure detection is used to update camera poses, which together enable detailed and precise reconstruction of large scenes. Compared to previous work, our approach is more robust, scalable, and versatile. Our experiments demonstrate that MoD-SLAM achieves better mapping performance than prior neural SLAM methods, especially in large unbounded scenes.