Successful cooperation among decentralized agents requires each agent to quickly adapt its plan to the behavior of other agents. In scenarios where agents cannot confidently predict one another's intentions and plans, language communication can be crucial for ensuring safety. In this work, we focus on path-level cooperation, in which agents must adapt their paths to one another in order to avoid collisions or to collaborate physically, for example in joint carrying. In particular, we propose a safe and interpretable multimodal path planning method, CaPE (Code as Path Editor), which generates and updates path plans for an agent based on the environment and language communication from other agents. CaPE leverages a vision-language model (VLM) to synthesize a path editing program that is verified by a model-based planner, grounding communication in path plan updates in a safe and interpretable way. We evaluate our approach in diverse simulated and real-world scenarios, including multi-robot and human-robot cooperation in autonomous driving, household, and joint carrying tasks. Experimental results demonstrate that CaPE can be integrated into different robotic systems as a plug-and-play module, greatly enhancing a robot's ability to align its plan with language communication from other robots or humans. We also show that combining VLM-based synthesis of path editing programs with model-based safety verification enables robots to achieve open-ended cooperation while maintaining safety and interpretability.
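The idea of verifying a path editing program before applying it can be illustrated with a minimal Python sketch. This is not CaPE's implementation: the grid world, the `shift_segment` edit primitive, `is_safe`, and `apply_verified` are hypothetical names introduced here for illustration, and the hand-written edits stand in for a program that a VLM would synthesize from a message such as "please pass on the other side".

```python
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]            # (x, y) grid coordinates
Path = List[Cell]                 # ordered waypoints
PathEdit = Callable[[Path], Path] # a path editing program: path in, path out


def is_safe(path: Path, obstacles: Set[Cell], width: int, height: int) -> bool:
    """Stand-in for a model-based safety check (an assumption, not CaPE's planner).

    A path is accepted if every waypoint is in bounds and obstacle-free, and
    consecutive waypoints are at most one cell apart (diagonals allowed).
    """
    if not path:
        return False
    for x, y in path:
        if not (0 <= x < width and 0 <= y < height):
            return False
        if (x, y) in obstacles:
            return False
    return all(
        max(abs(ax - bx), abs(ay - by)) <= 1
        for (ax, ay), (bx, by) in zip(path, path[1:])
    )


def shift_segment(start: int, end: int, dy: int) -> PathEdit:
    """Hypothetical edit primitive: shift waypoints[start:end] by dy rows."""
    def edit(path: Path) -> Path:
        return [
            (x, y + dy) if start <= i < end else (x, y)
            for i, (x, y) in enumerate(path)
        ]
    return edit


def apply_verified(
    path: Path, edit: PathEdit, obstacles: Set[Cell], width: int, height: int
) -> Tuple[Path, bool]:
    """Apply the edit only if the edited path passes the safety check.

    Returns (path, accepted). A rejected edit leaves the original path intact.
    """
    candidate = edit(path)
    if is_safe(candidate, obstacles, width, height):
        return candidate, True
    return path, False


if __name__ == "__main__":
    straight = [(x, 1) for x in range(5)]   # horizontal path along row 1
    obstacles = {(2, 0)}                    # one blocked cell in row 0
    for name, dy in (("shift up", 1), ("shift down", -1)):
        new_path, ok = apply_verified(
            straight, shift_segment(1, 4, dy), obstacles, width=5, height=3
        )
        print(name, "accepted" if ok else "rejected", new_path)
```

In this toy setting the edit that moves the middle of the path into the free row is accepted, while the edit that would route through the blocked cell is rejected and the original path is kept. Keeping the edit as an explicit, inspectable function is what makes the update interpretable, and routing it through a separate check is what keeps an unsafe suggestion from being executed.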