Robust localization is a cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors. Traditional localization approaches rely on high-definition (HD) maps, which consist of precisely annotated landmarks. However, building HD maps is expensive and difficult to scale. Given these limitations, leveraging navigation maps has emerged as a promising low-cost alternative for localization. Current navigation-map-based approaches can achieve highly accurate localization, but their complex matching strategies incur inference latency that fails to meet real-time demands. To address these limitations, we propose a novel transformer-based neural re-localization method. Inspired by image registration, our approach performs coarse-to-fine neural feature registration between the navigation map and visual bird's-eye-view (BEV) features. Our method significantly outperforms the current state of the art, OrienterNet, on both the nuScenes and Argoverse datasets, improving localization accuracy by nearly 10%/20% and inference speed by 30/16 FPS under the single-view and surround-view input settings, respectively. We highlight that our research presents an HD-map-free localization method for autonomous driving, offering cost-effective, reliable, and scalable performance in challenging driving environments.
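For intuition only, below is a minimal PyTorch sketch of what coarse-to-fine cross-attention registration between map and BEV features could look like. Everything here is an assumption for illustration: the class name, feature dimensions, the two-stage residual pose head, and the mean-pooled 3-DoF regression are not taken from the paper's actual architecture.

```python
import torch
import torch.nn as nn


class CoarseToFineRegistration(nn.Module):
    """Hypothetical sketch: register BEV feature tokens against
    navigation-map feature tokens with cross-attention at two scales."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Cross-attention where BEV tokens query the map tokens.
        self.coarse_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fine_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Assumed pose head: regress a 3-DoF offset (dx, dy, dyaw)
        # from the pooled fused features.
        self.pose_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def _register(self, attn, bev, map_feat):
        # Fuse map context into BEV tokens, then pool and regress a pose.
        fused, _ = attn(query=bev, key=map_feat, value=map_feat)
        return self.pose_head(fused.mean(dim=1))  # (B, 3)

    def forward(self, bev_coarse, map_coarse, bev_fine, map_fine):
        # Coarse stage: rough pose estimate on downsampled feature grids.
        coarse_pose = self._register(self.coarse_attn, bev_coarse, map_coarse)
        # Fine stage: residual refinement on full-resolution features
        # (a real system would first warp the fine map crop by coarse_pose).
        fine_pose = self._register(self.fine_attn, bev_fine, map_fine)
        return coarse_pose + fine_pose


# Toy usage: flattened H*W feature grids treated as token sequences.
model = CoarseToFineRegistration()
bev_c, map_c = torch.randn(2, 64, 128), torch.randn(2, 256, 128)
bev_f, map_f = torch.randn(2, 256, 128), torch.randn(2, 1024, 128)
pose = model(bev_c, map_c, bev_f, map_f)
print(pose.shape)  # torch.Size([2, 3])
```

The coarse stage operates on cheap downsampled grids to bound the search, and the fine stage only has to explain the residual misalignment, which is one plausible way a registration-style design keeps latency low enough for real-time use.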