(Visual) Simultaneous Localization and Mapping (SLAM) remains a fundamental challenge in enabling autonomous systems to navigate and understand large-scale environments. Traditional SLAM approaches struggle to balance efficiency and accuracy, particularly in large-scale settings where scene reconstruction and Bundle Adjustment (BA) require extensive computational resources. However, this scene reconstruction, in the form of sparse point clouds of visual landmarks, is often used only within the SLAM system, because navigation and planning methods require different map representations. In this work, we therefore investigate a more scalable Visual SLAM (VSLAM) approach without reconstruction, based mainly on two-view loop closures. By restricting the map to a sparse keyframed pose graph without dense geometry representations, our '2GO' system achieves efficient optimization with competitive absolute trajectory accuracy. In particular, we find that recent advancements in image matching and monocular depth priors enable very accurate trajectory optimization without BA. We conduct extensive experiments on diverse datasets, including large-scale scenarios, and provide a detailed analysis of the trade-offs between runtime, accuracy, and map size. Our results demonstrate that this streamlined approach supports real-time performance, scales well in map size and trajectory duration, and effectively broadens the capabilities of VSLAM for long-duration deployments in large environments.