Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, previous work in this direction either relies on RGB-D sensors or requires a separate monocular SLAM approach for camera tracking, and does not produce high-fidelity dense 3D scene reconstructions. In this paper, we present NICER-SLAM, a dense RGB SLAM system that simultaneously optimizes camera poses and a hierarchical neural implicit map representation, which also allows for high-quality novel view synthesis. To facilitate the optimization process for mapping, we integrate additional supervision signals, including easy-to-obtain monocular geometric cues and optical flow, and introduce a simple warping loss to further enforce geometric consistency. Moreover, to boost performance in complicated indoor scenes, we propose a locally adaptive transformation from signed distance functions (SDFs) to density in the volume rendering equation. On both synthetic and real-world datasets, we demonstrate strong performance in dense mapping, tracking, and novel view synthesis that is even competitive with recent RGB-D SLAM systems.
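To make the SDF-to-density idea concrete, the following is a minimal NumPy sketch, not the NICER-SLAM implementation. The abstract does not state the transformation, so the sketch assumes the Laplace-CDF mapping popularized by VolSDF, with the convention that the SDF is positive outside surfaces. The names `sdf_to_density` and `render_weights`, and the idea of passing a per-sample `beta` array to model local adaptivity, are illustrative assumptions.

```python
import numpy as np


def sdf_to_density(sdf, beta):
    """Map signed distances to volume density (assumed VolSDF-style form).

    sdf  : signed distances, positive outside the surface (assumption).
    beta : scalar or per-sample array controlling how sharply density
           rises across the surface; a per-sample array stands in for a
           locally adaptive parameter (hypothetical setup).
    """
    sdf = np.asarray(sdf, dtype=np.float64)
    beta = np.broadcast_to(np.asarray(beta, dtype=np.float64), sdf.shape)
    # The exponent is always <= 0, so this cannot overflow.
    half_exp = 0.5 * np.exp(-np.abs(sdf) / beta)
    # Laplace CDF evaluated at -sdf: small outside, close to 1 inside.
    psi = np.where(sdf > 0, half_exp, 1.0 - half_exp)
    return psi / beta


def render_weights(density, deltas):
    """Standard volume rendering weights for samples along one ray."""
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance before each sample: product of (1 - alpha) so far.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha


# Example: a ray that crosses a surface halfway along its samples.
sdf_along_ray = np.linspace(0.5, -0.5, 64)
deltas = np.full(64, 1.0 / 63.0)
beta_per_sample = np.full(64, 0.05)  # could vary per region of the scene
weights = render_weights(sdf_to_density(sdf_along_ray, beta_per_sample), deltas)
```

A smaller `beta` concentrates the rendering weights more tightly around the zero crossing of the SDF, which is why letting it vary by region can help in scenes that mix fine detail with large flat surfaces.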