Neural implicit scene representations have recently shown encouraging results in dense visual SLAM. However, existing methods yield low-quality scene reconstruction and inaccurate localization when scaled up to large indoor scenes and long sequences. These limitations stem mainly from their reliance on a single global radiance field of finite capacity, which does not adapt to large scenes. Their end-to-end pose networks are also not robust to the cumulative errors that grow in large scenes. To this end, we introduce PLGSLAM, a neural visual SLAM system capable of high-fidelity surface reconstruction and robust camera tracking in real time. To handle large-scale indoor scenes, PLGSLAM uses a progressive scene representation method that dynamically allocates new local scene representations, each trained with frames within a local sliding window. This allows us to scale up to larger indoor scenes and improves robustness, even under pose drift. Within each local scene representation, PLGSLAM combines tri-planes for local high-frequency features with multi-layer perceptron (MLP) networks for low-frequency features, achieving smoothness and scene completion in unobserved areas. Moreover, we propose a local-to-global bundle adjustment method with a global keyframe database to address the increased pose drift on long sequences. Experimental results demonstrate that PLGSLAM achieves state-of-the-art scene reconstruction and tracking performance across various datasets and scenarios, in both small and large-scale indoor environments.
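The progressive allocation idea can be illustrated with a minimal sketch. It is not the PLGSLAM implementation: the abstract does not state the allocation criterion, so the distance threshold `radius`, the names `LocalMap` and `ProgressiveMapper`, and the use of frame ids in place of real image data are all assumptions made for illustration. The sketch only shows how a new local representation might be allocated as the camera moves, with each one keeping its own sliding window of recent frames.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LocalMap:
    """Hypothetical stand-in for one local scene representation."""
    center: tuple                                  # camera position at allocation time
    window: deque = field(default_factory=deque)   # sliding window of frame ids


class ProgressiveMapper:
    """Allocates a new LocalMap when the camera leaves the active one.

    The distance-based trigger is an assumption; the source only says that
    new local representations are allocated dynamically.
    """

    def __init__(self, radius=2.0, window_size=5):
        self.radius = radius
        self.window_size = window_size
        self.maps = []

    def add_frame(self, frame_id, position):
        """Assign a frame to the active local map and return that map's index."""
        needs_new_map = (
            not self.maps
            or math.dist(position, self.maps[-1].center) > self.radius
        )
        if needs_new_map:
            # A bounded deque drops the oldest frame once the window is full.
            self.maps.append(
                LocalMap(center=tuple(position),
                         window=deque(maxlen=self.window_size))
            )
        self.maps[-1].window.append(frame_id)
        return len(self.maps) - 1


# Usage: a camera moving along the x axis in 0.5 m steps.
mapper = ProgressiveMapper(radius=2.0, window_size=3)
assignments = [mapper.add_frame(i, (0.5 * i, 0.0, 0.0)) for i in range(10)]
```

In this toy trajectory the first five frames fall into the first local map and the remaining five into a second one, and each map retains only its three most recent frames. A real system would train the local representation on the frames in the window rather than merely storing their ids.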