Global localization is a critical problem in autonomous navigation: it enables precise positioning without reliance on GPS. Modern global localization techniques often depend on dense LiDAR maps, which are precise but require extensive storage and computational resources. Recent approaches have explored alternatives such as sparse maps and learned features, but these suffer from poor robustness and generalization. We propose SparseLoc, a global localization framework that leverages vision-language foundation models to generate sparse, semantic-topometric maps in a zero-shot manner. SparseLoc combines this map representation with a Monte Carlo localization scheme enhanced by a novel late optimization strategy that improves pose estimation. By constructing compact yet highly discriminative maps and refining localization through a carefully designed optimization schedule, SparseLoc overcomes the limitations of existing techniques and offers a more efficient and robust solution for global localization. Our system achieves more than a 5× improvement in localization accuracy over existing sparse mapping techniques. Despite using only 1/500th of the points required by dense mapping methods, it achieves comparable performance, maintaining an average global localization error below 5 m and 2° on KITTI sequences.
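To make the Monte Carlo localization idea concrete, the following is a minimal 2D particle-filter sketch against a sparse map of labeled landmarks. It is not SparseLoc's implementation: the landmark list, the range-bearing observation model, the noise parameters, and the function names (`expected_observation`, `likelihood`, `mcl_step`) are all assumptions made for illustration, and the late optimization strategy mentioned in the abstract is not modeled here.

```python
import math
import random

# Hypothetical sparse semantic map: (label, x, y) in metres.
LANDMARKS = [("tree", 2.0, 8.0), ("sign", 10.0, 3.0), ("pole", 15.0, 12.0)]


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def expected_observation(pose, landmark):
    """Range and bearing to a landmark from pose = (x, y, heading)."""
    x, y, theta = pose
    _, lx, ly = landmark
    dx, dy = lx - x, ly - y
    return math.hypot(dx, dy), wrap_angle(math.atan2(dy, dx) - theta)


def likelihood(pose, observations, sigma_r=0.5, sigma_b=0.1):
    """Unnormalized Gaussian likelihood of (label, range, bearing) observations.

    Assumes each label identifies exactly one landmark (illustrative only).
    """
    by_label = {lm[0]: lm for lm in LANDMARKS}
    w = 1.0
    for label, r_obs, b_obs in observations:
        r_exp, b_exp = expected_observation(pose, by_label[label])
        w *= math.exp(-0.5 * ((r_obs - r_exp) / sigma_r) ** 2)
        w *= math.exp(-0.5 * (wrap_angle(b_obs - b_exp) / sigma_b) ** 2)
    return w


def mcl_step(particles, motion, observations, rng):
    """One predict / weight / resample cycle over a list of (x, y, heading)."""
    d, dtheta = motion
    moved = []
    for x, y, th in particles:
        # Predict: apply the motion command with additive Gaussian noise.
        th2 = th + dtheta + rng.gauss(0.0, 0.02)
        step = d + rng.gauss(0.0, 0.1)
        moved.append((x + step * math.cos(th2), y + step * math.sin(th2), th2))
    # Weight: score each particle against the sparse map.
    weights = [likelihood(p, observations) for p in moved]
    if sum(weights) == 0.0:
        return moved  # no particle explains the data; skip resampling
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))
```

In this sketch the particles start spread over the whole map, which is what makes the problem global rather than local; repeated calls to `mcl_step` concentrate them around poses consistent with the observed landmarks.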