Accurate localization is critical to autonomous driving missions such as environmental mapping and survivor search. In visually challenging environments, including low-light conditions, overexposure, illumination changes, and high parallax, the performance of conventional visual odometry methods degrades significantly, undermining robust robotic navigation. To address these challenges, researchers have recently proposed LiDAR-inertial-visual odometry (LIVO) frameworks, which integrate LiDAR, IMU, and camera sensors. This paper extends a FAST-LIVO2-based framework by introducing a hybrid approach that integrates direct photometric methods with descriptor-based feature matching. For the descriptor-based feature matching, this work proposes four descriptor-matcher pairings: ORB with Hamming distance, SuperPoint with SuperGlue, SuperPoint with LightGlue, and XFeat with mutual nearest-neighbor matching. The proposed configurations are benchmarked on accuracy, computational cost, and feature-tracking stability, enabling a quantitative comparison of the adaptability and applicability of the visual descriptors. The experimental results show that the proposed hybrid approach outperforms the conventional sparse-direct method. Whereas the sparse-direct method often fails to converge in regions where illumination changes cause photometric inconsistency, the proposed approach maintains robust performance under the same conditions. Furthermore, the hybrid approach with learning-based descriptors enables robust and reliable visual state estimation across challenging environments.
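To make two of the matching strategies named above concrete, the following is a minimal sketch of Hamming-distance matching for packed binary descriptors (as ORB produces) and of a mutual nearest-neighbor check that can be applied to any distance matrix. It is an illustration only and is not taken from the paper's implementation: the function names, the NumPy-only brute-force search, and the use of cosine distance for float descriptors are all assumptions.

```python
import numpy as np


def hamming_distance_matrix(a, b):
    """Pairwise Hamming distances between packed binary descriptors.

    a, b: uint8 arrays of shape (N, B) and (M, B), e.g. B = 32 for 256-bit ORB.
    (Hypothetical helper; brute force, so memory grows as N * M * B.)
    """
    # XOR every pair of rows, then count the differing bits.
    xor = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between float descriptors (assumed metric)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def mutual_nearest_neighbors(dist):
    """Return (i, j) pairs where i and j are each other's nearest neighbor."""
    nn_ab = dist.argmin(axis=1)  # best column for each row
    nn_ba = dist.argmin(axis=0)  # best row for each column
    idx_a = np.arange(dist.shape[0])
    keep = nn_ba[nn_ab] == idx_a  # keep only matches that agree both ways
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)


# Usage: match a set of random 256-bit descriptors against a reversed copy.
rng = np.random.default_rng(0)
desc_a = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
desc_b = desc_a[::-1].copy()
matches = mutual_nearest_neighbors(hamming_distance_matrix(desc_a, desc_b))
```

The mutual check discards one-sided matches, which is a simple way to reject ambiguous correspondences without a learned matcher; a practical system would typically also apply a distance threshold.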