PLGSLAM: Progressive Neural Scene Representation with Local to Global Bundle Adjustment

Neural implicit scene representations have recently shown encouraging results in dense visual SLAM. However, existing methods produce low-quality scene reconstructions and inaccurate localization when scaling up to large indoor scenes and long sequences. These limitations stem mainly from their single, global radiance field with finite capacity, which does not adapt to large scenes. Their end-to-end pose networks are also not robust enough to the cumulative errors that grow in large scenes. To this end, we present PLGSLAM, a neural visual SLAM system that performs high-fidelity surface reconstruction and robust camera tracking in real time. To handle large-scale indoor scenes, PLGSLAM proposes a progressive scene representation method that dynamically allocates new local scene representations, each trained with the frames in a local sliding window. This allows us to scale up to larger indoor scenes and improves robustness, even under pose drift. Within each local scene representation, PLGSLAM uses tri-planes to capture local high-frequency features. We also incorporate multi-layer perceptron (MLP) networks for low-frequency features, smoothness, and scene completion in unobserved areas. Moreover, we propose a local-to-global bundle adjustment method with a global keyframe database to address the increased pose drift on long sequences. Experimental results demonstrate that PLGSLAM achieves state-of-the-art scene reconstruction and tracking performance across various datasets and scenarios, in both small and large-scale indoor environments. The code will be open-sourced upon paper acceptance.
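
A minimal, hypothetical Python sketch can make the progressive allocation and the tri-plane feature lookup concrete. Everything in it is an assumption for illustration and is not PLGSLAM's implementation. The names `LocalTriPlane`, `ProgressiveMap`, `make_plane`, and `bilinear` are invented. The allocation rule is also invented: a new cube centered on the camera is opened once the camera leaves the active cube. The sketch further assumes a fixed-length frame window and sums the xy, xz, and yz plane features. The MLP decoder, training, tracking, and bundle adjustment are all omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

Vec3 = Tuple[float, float, float]
Plane = List[List[List[float]]]  # feature grid indexed [row][col][channel]


def make_plane(res: int, channels: int, value: float = 0.0) -> Plane:
    """Create a res x res grid of feature vectors, all set to `value`."""
    return [[[value] * channels for _ in range(res)] for _ in range(res)]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate the plane at normalized coords (u, v), clamped to [0, 1]."""
    h, w = len(plane), len(plane[0])
    x = min(max(u, 0.0), 1.0) * (w - 1)  # u runs along columns
    y = min(max(v, 0.0), 1.0) * (h - 1)  # v runs along rows
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    out = []
    for k in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][k] + tx * plane[y0][x1][k]
        bottom = (1 - tx) * plane[y1][x0][k] + tx * plane[y1][x1][k]
        out.append((1 - ty) * top + ty * bottom)
    return out


@dataclass
class LocalTriPlane:
    """Hypothetical local representation: three feature planes over an axis-aligned cube."""
    center: Vec3
    half_size: float
    res: int = 8
    channels: int = 4
    planes: List[Plane] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # planes[0] = xy, planes[1] = xz, planes[2] = yz (zero-initialized; no training here)
        self.planes = [make_plane(self.res, self.channels) for _ in range(3)]

    def contains(self, p: Vec3) -> bool:
        return all(abs(p[i] - self.center[i]) <= self.half_size for i in range(3))

    def feature(self, p: Vec3) -> List[float]:
        # Normalize p into [0, 1]^3 relative to this local cube.
        n = [(p[i] - self.center[i] + self.half_size) / (2 * self.half_size) for i in range(3)]
        f_xy = bilinear(self.planes[0], n[0], n[1])
        f_xz = bilinear(self.planes[1], n[0], n[2])
        f_yz = bilinear(self.planes[2], n[1], n[2])
        # Assumption: aggregate the three plane projections by summation.
        return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


class ProgressiveMap:
    """Hypothetical progressive map: opens a new local tri-plane when the camera leaves the active one."""

    def __init__(self, half_size: float, window_size: int) -> None:
        self.half_size = half_size
        self.submaps: List[LocalTriPlane] = []
        self.active = -1
        self.window: Deque[int] = deque(maxlen=window_size)  # most recent frame ids

    def update(self, frame_id: int, cam_pos: Vec3) -> int:
        self.window.append(frame_id)
        if self.active < 0 or not self.submaps[self.active].contains(cam_pos):
            # Assumed allocation rule: new cube centered on the current camera position.
            self.submaps.append(LocalTriPlane(center=cam_pos, half_size=self.half_size))
            self.active = len(self.submaps) - 1
        return self.active

    def training_frames(self) -> List[int]:
        # Frames in the sliding window that would supervise the active local representation.
        return list(self.window)
```

For example, with `ProgressiveMap(half_size=1.0, window_size=3)`, frames at x = 0.0 and 0.5 stay in submap 0, and a frame at x = 2.0 opens submap 1. The sketch never switches back to an earlier local representation when the camera revisits a region, because it only checks the active cube. The abstract does not say how revisits are handled.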
