There is an emerging trend of using neural implicit functions for map representation in Simultaneous Localization and Mapping (SLAM). Some pioneering works have achieved encouraging results on RGB-D SLAM. In this paper, we present a dense RGB SLAM method with a neural implicit map representation. To reach this challenging goal without depth input, we introduce a hierarchical feature volume to support the implicit map decoder. This design effectively fuses shape cues across different scales to facilitate map reconstruction. Our method simultaneously solves for the camera motion and the neural implicit map by matching the rendered and input video frames. To ease optimization, we further propose a photometric warping loss in the spirit of multi-view stereo to better constrain the camera pose and scene geometry. We evaluate our method on commonly used benchmarks and compare it with modern RGB and RGB-D SLAM systems. Our method achieves more favorable results than previous methods and even surpasses some recent RGB-D SLAM methods. The code is available at poptree.github.io/DIM-SLAM/.
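To make the idea of a photometric warping loss concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera, grayscale images, and a simple per-pixel L1 difference; the abstract does not specify the actual loss. The function names `warp_pixels`, `bilinear_sample`, and `photometric_warping_loss` are hypothetical, chosen only for illustration. The sketch back-projects reference pixels with a candidate depth, moves them into a second view with a relative pose, and compares intensities at the projected locations.

```python
import numpy as np


def warp_pixels(uv, depth, K, R, t):
    """Project reference-frame pixels into a source frame.

    uv:    (N, 2) pixel coordinates (u = x, v = y) in the reference image.
    depth: (N,) depths along the reference camera z-axis.
    K:     (3, 3) pinhole intrinsics (assumed shared by both views).
    R, t:  rotation (3, 3) and translation (3,) mapping reference-camera
           coordinates to source-camera coordinates.
    """
    ones = np.ones((uv.shape[0], 1))
    rays = np.linalg.inv(K) @ np.hstack([uv, ones]).T  # (3, N), z = 1
    pts_ref = rays * depth                              # back-project to 3D
    pts_src = R @ pts_ref + t.reshape(3, 1)             # change camera frame
    proj = K @ pts_src                                  # project to image
    return (proj[:2] / proj[2:3]).T                     # (N, 2)


def bilinear_sample(img, uv):
    """Sample a grayscale image (H, W) at float pixel coordinates."""
    h, w = img.shape
    # Clamp so that the 2x2 neighbourhood stays inside the image.
    u = np.clip(uv[:, 0], 0, w - 1 - 1e-6)
    v = np.clip(uv[:, 1], 0, h - 1 - 1e-6)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * img[v0, u0]
            + du * (1 - dv) * img[v0, u0 + 1]
            + (1 - du) * dv * img[v0 + 1, u0]
            + du * dv * img[v0 + 1, u0 + 1])


def photometric_warping_loss(img_ref, img_src, uv, depth, K, R, t):
    """Mean absolute intensity difference between reference pixels and
    their warped locations in the source image (illustrative choice)."""
    uv = np.asarray(uv, dtype=float)
    uv_src = warp_pixels(uv, depth, K, R, t)
    i_ref = bilinear_sample(img_ref, uv)
    i_src = bilinear_sample(img_src, uv_src)
    return float(np.mean(np.abs(i_ref - i_src)))


# Toy scene: a horizontal intensity ramp observed by two views whose
# relative translation shifts the image by 5 pixels at depth 2.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
img_ref = np.tile(np.arange(64.0), (48, 1))
img_src = img_ref - 5.0
uv = np.array([[10.0, 10.0], [20.0, 15.0], [40.0, 30.0]])

loss_true_depth = photometric_warping_loss(
    img_ref, img_src, uv, np.full(3, 2.0), K, R, t)
loss_wrong_depth = photometric_warping_loss(
    img_ref, img_src, uv, np.full(3, 4.0), K, R, t)
```

In this toy setup the loss is smallest when the candidate depth matches the depth that generated the second view, which is the property that lets such a loss constrain both scene geometry and camera pose during optimization.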