We present a novel framework that uses Energy-Based Models (EBMs) to localize a ground vehicle equipped with a range sensor against satellite imagery in the absence of GPS. Lidar sensors have become ubiquitous on autonomous vehicles for describing their surrounding environment, and the map priors used for localization are typically built with the same sensor modality. However, building maps with range sensors is often expensive and time-consuming. We instead use satellite images as map priors, since they are widely available, easily accessible, and provide comprehensive coverage. We propose a method based on convolutional transformers that performs accurate metric-level localization in a cross-modal manner, a task made challenging by the drastic difference in appearance between sparse range sensor readings and rich satellite imagery. We train our model end-to-end and demonstrate that our approach achieves higher accuracy than the state of the art on KITTI, Pandaset, and a custom dataset.