Traditional visual-inertial SLAM systems often struggle to remain stable under low-light or motion-blur conditions, which can lead to loss of trajectory tracking. High accuracy and robustness are essential for the long-term, stable localization capability of a SLAM system. To address the challenge of improving the robustness and accuracy of visual-inertial SLAM, this paper proposes SuperVINS, a real-time visual-inertial SLAM framework designed for challenging imaging conditions. In contrast to geometric modeling, deep learning features can fully exploit the implicit information in images that geometric features often fail to capture. SuperVINS is therefore developed as an enhancement of VINS-Fusion: it integrates the SuperPoint deep neural network for feature point extraction and loop closure detection, and the LightGlue deep neural network for feature association in front-end matching. A feature matching enhancement strategy based on the RANSAC algorithm is proposed, which allows the system to set different masks and RANSAC thresholds for different environments, balancing computational cost against localization accuracy. In addition, the system supports flexible training of environment-specific SuperPoint bags of words for loop closure detection. The system performs localization and mapping in real time. Experiments on the well-known EuRoC dataset demonstrate that SuperVINS is comparable to other visual-inertial SLAM systems in accuracy and robustness on the most challenging sequences. This paper analyzes the advantages of SuperVINS in terms of accuracy, real-time performance, and robustness. To facilitate knowledge exchange within the field, the code for this paper is publicly available.
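To illustrate the RANSAC-thresholded match-filtering idea mentioned above, the following is a minimal, self-contained sketch. It is not the SuperVINS implementation (which filters SuperPoint/LightGlue matches; production systems typically fit a fundamental matrix or homography, e.g. via OpenCV): here we fit only a 2-D translation model, and the function name, iteration count, and threshold are illustrative assumptions. The key point is the tunable inlier threshold, analogous to the per-environment RANSAC thresholds described in the abstract.

```python
import numpy as np

def ransac_filter_matches(pts_a, pts_b, threshold=3.0, iters=200, seed=0):
    """Keep only matches consistent with a dominant 2-D translation.

    pts_a, pts_b: (N, 2) arrays of matched keypoint coordinates.
    threshold: inlier reprojection threshold in pixels (tunable per
    environment, trading accuracy against how many matches survive).
    Illustrative stand-in for model-based RANSAC match filtering.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    for _ in range(iters):
        # Minimal sample: one correspondence fixes a translation hypothesis.
        i = rng.integers(len(pts_a))
        t = pts_b[i] - pts_a[i]
        # Residual of every match under this hypothesis.
        residuals = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

A stricter threshold rejects more mismatches at the cost of discarding borderline correct matches; a looser one keeps more matches for cheaper tracking, which mirrors the computational-cost/accuracy balance the strategy is designed to expose.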