We present ViSTA-SLAM, a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-view association (STA) model as the frontend, which simultaneously estimates relative camera poses and regresses local pointmaps from only two RGB images. This design significantly reduces model complexity (our frontend is only 35\% the size of comparable state-of-the-art methods) while enhancing the quality of the two-view constraints used in the pipeline. In the backend, we construct a specially designed Sim(3) pose graph that incorporates loop closures to address accumulated drift. Extensive experiments demonstrate that our approach achieves superior performance in both camera tracking and dense 3D reconstruction quality compared to current methods. GitHub repository: https://github.com/zhangganlin/vista-slam
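To make the Sim(3) pose-graph backend concrete, here is a minimal, illustrative sketch of the relative-pose constraint such a graph optimizes. This is not the paper's implementation: the element representation `(s, R, t)` and the residual parameterization are assumptions chosen for clarity. Each edge compares a measured relative transform `Zij` (e.g. from the two-view frontend or a loop closure) against the relative transform predicted by the current pose estimates.

```python
import numpy as np

# Illustrative sketch of a Sim(3) pose-graph edge residual (not the paper's code).
# A Sim(3) element (s, R, t) acts on a point x as: x -> s * R @ x + t,
# where s is a scale factor, R a rotation matrix, and t a translation.

def sim3_compose(a, b):
    """Composition (a ∘ b)(x) = a(b(x))."""
    sa, Ra, ta = a
    sb, Rb, tb = b
    return (sa * sb, Ra @ Rb, sa * Ra @ tb + ta)

def sim3_inverse(a):
    """Inverse transform: x -> (1/s) * R.T @ (x - t)."""
    s, R, t = a
    Rinv = R.T
    return (1.0 / s, Rinv, -(1.0 / s) * Rinv @ t)

def relative_residual(Ti, Tj, Zij):
    """Error between measured relative transform Zij and predicted inv(Ti) ∘ Tj.

    Returns a small vector (log-scale, rotation angle, translation) that is
    zero when the poses are perfectly consistent with the measurement.
    """
    pred = sim3_compose(sim3_inverse(Ti), Tj)
    err = sim3_compose(sim3_inverse(Zij), pred)
    s, R, t = err
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return np.concatenate([[np.log(s)], [angle], t])
```

A pose-graph optimizer would minimize the sum of squared residuals over all odometry and loop-closure edges; optimizing over scale as well as rotation and translation is what lets loop closures correct the scale drift inherent to monocular SLAM.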