This paper proposes a fine-grained self-localization method for outdoor robotics that uses a flexible number of onboard cameras and readily accessible satellite images. It addresses a limitation of existing cross-view localization methods, which struggle with noise sources such as moving objects and seasonal variations. It is the first sparse, visual-only method that enhances perception in dynamic environments by detecting view-consistent key points and their corresponding deep features in the ground and satellite views, removing off-the-ground objects, and establishing a homography transformation between the two views. Moreover, the method incorporates a spatial embedding that leverages camera intrinsic and extrinsic information to reduce the ambiguity of purely visual matching, which improves feature matching and overall pose estimation accuracy. The method generalizes well, is robust to environmental changes, and requires only geo-poses as ground truth. Extensive experiments on the KITTI and Ford Multi-AV Seasonal datasets demonstrate that the proposed method outperforms existing state-of-the-art methods, achieving median spatial errors below $0.5$ meters along the lateral and longitudinal directions and a median orientation error below $2$ degrees.
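The homography between the ground and satellite views mentioned in the abstract can be illustrated with the standard planar homography induced by the ground plane. The sketch below is not the paper's implementation: it assumes a pinhole camera, a flat ground plane at $Z = 0$, and world-to-camera extrinsics of the form $x_{cam} = R\,x_{world} + t$. The function names `ground_homography` and `apply_homography` and the numeric camera parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def ground_homography(K, R, t):
    """Homography mapping ground-plane points (X, Y, 1), with Z = 0, to pixels.

    Assumes world-to-camera extrinsics: x_cam = R @ x_world + t.
    """
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    # With Z = 0 the third column of R drops out: p ~ K [r1 r2 t] (X, Y, 1)^T
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def apply_homography(H, points):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    homog = np.column_stack((pts, np.ones(len(pts))))
    mapped = homog @ H.T
    # Divide by the homogeneous coordinate to return to 2D
    return mapped[:, :2] / mapped[:, 2:3]


# Illustrative (assumed) setup: world X right, Y forward, Z up;
# camera 1.65 m above the ground, looking along world Y.
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
t = np.array([0.0, 1.65, 0.0])

H = ground_homography(K, R, t)
pixels = apply_homography(H, [[2.0, 10.0]])          # ground point -> image pixel
ground = apply_homography(np.linalg.inv(H), pixels)  # image pixel -> ground point
```

Because a satellite image is, to a first approximation, a scaled top-down view of the same ground plane, inverting `H` maps ground-view pixels into overhead coordinates. This holds only for points on the ground, which is consistent with the abstract's step of removing off-the-ground objects before relating the two views.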