Accurate localisation in planetary robotics enables the advanced autonomy required to support the increased scale and scope of future missions. The successes of the Ingenuity helicopter and multiple planetary orbiters lay the groundwork for future missions that use ground-aerial robotic teams. In this paper, we consider rovers that use machine learning to localise themselves within a local aerial map, taking monocular ground-view RGB images with a limited field of view as input. A key consideration for machine learning methods is that real space data with ground-truth position labels suitable for training is scarce. In this work, we propose a novel method of localising rovers in an aerial map using cross-view-localising dual-encoder deep neural networks. We leverage semantic segmentation with vision foundation models and high-volume synthetic data to bridge the domain gap to real images. We also contribute a new cross-view dataset of real-world rover trajectories with corresponding ground-truth localisation data captured in a planetary analogue facility, plus a high-volume dataset of analogous synthetic image pairs. Combining the cross-view networks with particle filters for state estimation enables accurate position estimation over both simple and complex trajectories from sequences of ground-view images.
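The abstract's final claim can be illustrated with a minimal particle-filter sketch. This is not the paper's implementation: the `similarity` function below is a hypothetical stand-in for the dual-encoder cross-view score (in the actual method, it would compare a ground-view image embedding against aerial-map tile embeddings), replaced here by a Gaussian of distance to an assumed true rover position so the example is self-contained and runnable.

```python
import math
import random

random.seed(0)

# Assumed ground-truth rover position for this toy example only.
TRUE_POS = (4.0, 6.0)

def similarity(particle, sigma=1.0):
    """Hypothetical stand-in for the cross-view network score.

    In the paper's method this would be the learned similarity between the
    ground-view embedding and the aerial-map embedding at the particle's
    location; here we fake it with a Gaussian of distance to TRUE_POS.
    """
    dx = particle[0] - TRUE_POS[0]
    dy = particle[1] - TRUE_POS[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def resample(particles, weights):
    """Systematic resampling: draw particles in proportion to their weights."""
    n = len(particles)
    step = sum(weights) / n
    u = random.uniform(0.0, step)
    c, i, out = weights[0], 0, []
    for _ in range(n):
        while u > c:
            i += 1
            c += weights[i]
        out.append(particles[i])
        u += step
    return out

def particle_filter(n=500, steps=10, map_size=10.0):
    # Initialise particles uniformly over the local aerial map.
    particles = [(random.uniform(0.0, map_size), random.uniform(0.0, map_size))
                 for _ in range(n)]
    for _ in range(steps):
        # Motion update: small random drift modelling odometry noise.
        particles = [(x + random.gauss(0.0, 0.1), y + random.gauss(0.0, 0.1))
                     for x, y in particles]
        # Measurement update: weight each particle by cross-view similarity.
        weights = [similarity(p) for p in particles]
        particles = resample(particles, weights)
    # Position estimate: mean of the resampled particle cloud.
    mx = sum(p[0] for p in particles) / n
    my = sum(p[1] for p in particles) / n
    return mx, my

estimate = particle_filter()
```

Under these toy assumptions the particle cloud concentrates near the true position within a few measurement updates; in the paper's setting, each update would instead be driven by the network's similarity between one ground-view image and the aerial map.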