Neural implicit representations have recently been applied in many fields, including Simultaneous Localization And Mapping (SLAM). Current neural SLAM methods achieve strong results when reconstructing bounded scenes, but they rely on RGB-D input. Neural SLAM that uses only RGB images cannot accurately recover the scale of the scene, and it also suffers from scale drift caused by errors accumulated during tracking. To overcome these limitations, we present MoD-SLAM, a monocular dense mapping method that enables global pose optimization and real-time 3D reconstruction in unbounded scenes. Using monocular depth estimation to optimize scene reconstruction and loop closure detection to update camera poses enables detailed and precise reconstruction of large scenes. Compared to previous work, our approach is more robust, scalable, and versatile. Our experiments demonstrate that MoD-SLAM achieves better mapping performance than prior neural SLAM methods, especially in large unbounded scenes.