We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses sequentially and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resulting in an efficient and accurate dense visual SLAM method. Our scene representation consists of multi-scale axis-aligned perpendicular feature planes and shallow decoders that, for each point in the continuous space, decode the interpolated features into Truncated Signed Distance Field (TSDF) and RGB values. Our extensive experiments on three standard datasets (Replica, ScanNet, and TUM RGB-D) show that ESLAM improves the 3D reconstruction and camera localization accuracy of state-of-the-art dense visual SLAM methods by more than 50%, while running up to 10 times faster and requiring no pre-training.
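The scene representation described in the abstract (multi-scale axis-aligned feature planes queried by interpolation, then decoded by shallow networks into TSDF and RGB) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The names `bilinear` and `PlaneScene`, the plane resolutions, the channel and hidden sizes, the choice to sum features over the three planes and concatenate over scales, the shared hidden layer, and the `tanh`/sigmoid output activations are all assumptions made for illustration. Points are assumed to be normalized to the unit cube, and the weights are random rather than optimized.

```python
import numpy as np


def bilinear(plane, uv):
    """Bilinearly interpolate a (H, W, C) feature plane at coords uv (N, 2) in [0, 1]."""
    h, w, _ = plane.shape
    x = uv[:, 0] * (w - 1)
    y = uv[:, 1] * (h - 1)
    # Clamp the lower cell index so that x0 + 1 and y0 + 1 stay inside the grid.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - fx) + plane[y0, x0 + 1] * fx
    bottom = plane[y0 + 1, x0] * (1 - fx) + plane[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


class PlaneScene:
    """Hypothetical multi-scale tri-plane scene with a shallow decoder."""

    def __init__(self, resolutions=(8, 32), channels=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # One (xy, xz, yz) triple of feature planes per scale (coarse, fine).
        self.planes = [
            [rng.normal(0.0, 0.1, (r, r, channels)) for _ in range(3)]
            for r in resolutions
        ]
        in_dim = channels * len(resolutions)
        # Shallow decoder: one shared hidden layer, then two linear heads (assumption).
        self.w_hidden = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w_tsdf = rng.normal(0.0, 0.1, (hidden, 1))
        self.w_rgb = rng.normal(0.0, 0.1, (hidden, 3))

    def features(self, pts):
        """Project pts (N, 3) onto each plane, interpolate, sum per scale, concatenate."""
        axes = ((0, 1), (0, 2), (1, 2))
        per_scale = [
            sum(bilinear(p, pts[:, list(a)]) for p, a in zip(triple, axes))
            for triple in self.planes
        ]
        return np.concatenate(per_scale, axis=1)

    def decode(self, pts):
        """Return (tsdf, rgb) for pts (N, 3) in the unit cube."""
        hidden = np.maximum(self.features(pts) @ self.w_hidden, 0.0)  # ReLU
        tsdf = np.tanh(hidden @ self.w_tsdf)[:, 0]                # bounded to (-1, 1)
        rgb = 1.0 / (1.0 + np.exp(-(hidden @ self.w_rgb)))        # bounded to (0, 1)
        return tsdf, rgb


scene = PlaneScene()
points = np.array([[0.0, 0.0, 0.0], [0.5, 0.25, 1.0]])
tsdf, rgb = scene.decode(points)
print(tsdf.shape, rgb.shape)
```

The point of the sketch is the memory layout: three 2D planes per scale grow quadratically with resolution, whereas a dense 3D feature grid grows cubically. In a full SLAM system the plane features, decoder weights, and camera poses would be optimized against the incoming RGB-D frames, which this sketch does not attempt.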