We present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. While existing Neural SLAM and 3DGS-based SLAM methods often trade off rendering quality against geometry accuracy, we demonstrate that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and to use 3D Gaussian splatting as the core map representation for efficient scene modeling. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates, explicitly deforming the 3D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy that improves scale consistency of the prior depths, yielding finer depth details. Through extensive experiments on Replica, ScanNet, and ScanNet++, we demonstrate significant improvements over existing Neural SLAM methods, even surpassing RGB-D-based methods in both reconstruction and rendering quality. The project page and source code will be made available at https://hi-slam2.github.io/.
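The grid-based scale alignment idea can be illustrated with a minimal sketch: partition the image into cells and fit a per-cell least-squares scale between the monocular prior depth and the SLAM-estimated depth, rather than a single global scale. Note this is an illustrative assumption of how such a strategy could work, not the paper's actual formulation; the function name, grid layout, and nearest-neighbor upsampling are all hypothetical.

```python
import numpy as np

def grid_scale_align(prior_depth, slam_depth, valid_mask, grid=8):
    """Hypothetical sketch: align a monocular prior depth map to
    SLAM depth with one least-squares scale per grid cell, falling
    back to a global scale in cells with no valid observations."""
    H, W = prior_depth.shape
    # global fallback scale minimizing ||s * prior - slam||^2 over valid pixels
    den = (prior_depth**2 * valid_mask).sum()
    global_s = (slam_depth * prior_depth * valid_mask).sum() / den if den > 0 else 1.0

    scales = np.full((grid, grid), global_s)
    ys = np.linspace(0, H, grid + 1, dtype=int)
    xs = np.linspace(0, W, grid + 1, dtype=int)
    for i in range(grid):
        for j in range(grid):
            p = prior_depth[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            s = slam_depth[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            m = valid_mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            cell_den = (p**2 * m).sum()
            if cell_den > 0:
                scales[i, j] = (s * p * m).sum() / cell_den

    # expand the per-cell scale field back to full resolution (nearest)
    row_idx = np.minimum(np.arange(H) * grid // H, grid - 1)
    col_idx = np.minimum(np.arange(W) * grid // W, grid - 1)
    scale_map = scales[row_idx[:, None], col_idx[None, :]]
    return prior_depth * scale_map
```

A per-cell scale (instead of one global factor) lets the alignment absorb spatially varying scale drift in the monocular prior while the least-squares fit keeps each cell consistent with the SLAM depth.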