Spatial understanding from vision is crucial for robots operating in unstructured environments. In the real world, spatial understanding is often an ill-posed problem. A number of powerful classical methods accurately regress relative pose; however, these approaches often lack the ability to leverage data-derived priors to resolve ambiguities. In multi-robot systems, these challenges are exacerbated by the need for accurate and frequent position estimates of cooperating agents. To this end, we propose CoViS-Net, a cooperative, multi-robot, visual spatial foundation model that learns spatial priors from data. Unlike prior work evaluated primarily on offline datasets, we design our model specifically for online evaluation and real-world deployment on cooperative robots. Our model is completely decentralized, platform-agnostic, executable in real time using onboard compute, and does not require existing network infrastructure. In this work, we focus on two tasks: relative pose estimation and local Bird's Eye View (BEV) prediction. Unlike classical approaches, we show that our model can accurately predict relative poses without requiring camera overlap, and can predict BEVs of regions not visible to the ego-agent. We demonstrate our model on a multi-robot formation control task outside the confines of the laboratory.