We study Cooperative Spatial Intelligence: the ability of decentralized embodied agents to coordinate effectively under dynamic environmental constraints in city-scale outdoor environments. We introduce Sentinel Challenge, a benchmark in which multiple agents must communicate in natural language to agree on a mutually safe and convenient meeting point. Each agent must then navigate to that point safely, avoiding dynamic sentinels that patrol the area, with the help of a tool that provides coarse spatial information. To address this challenge, we propose CoSaR (Cooperative Spatial Reasoning and Planning), a framework that bridges the high-level communication and planning abilities of foundation models with the precision of classical spatial navigation algorithms. CoSaR enables agents to exchange situational updates, reason over evolving spatial constraints, and collaboratively replan trajectories. Evaluated across 14 city-level scenes with 3 to 5 agents, CoSaR consistently leads to faster gathering, shorter path lengths, and improved safety. Our results demonstrate that integrating dynamic communication with spatial reasoning is essential for robust multi-agent cooperation. By formalizing this new setting and providing a scalable benchmark, we aim to build a foundation for advancing cooperative spatial intelligence in embodied multi-agent systems. The code and challenge are available at https://github.com/UMass-Embodied-AGI/Sentinel.