CoLF: Learning Consistent Leader-Follower Policies for Vision-Language-Guided Multi-Robot Cooperative Transport

We address vision-language-guided multi-robot cooperative transport, where each robot grounds natural-language instructions from onboard camera observations. A key challenge in this decentralized setting is perceptual misalignment across robots: viewpoint differences and language ambiguity can yield inconsistent interpretations and degrade cooperative transport. To mitigate this problem, we adopt a dependent leader-follower design, where one robot serves as the leader and the other as the follower. Although such a leader-follower structure appears straightforward, without explicit inductive biases, learning with independent and symmetric agents often yields symmetric or unstable behaviors. To address this challenge, we propose Consistent Leader-Follower (CoLF), a multi-agent reinforcement learning (MARL) framework for stable leader-follower role differentiation. CoLF consists of two key components: (1) an asymmetric policy design that induces leader-follower role differentiation, and (2) a mutual-information-based training objective that maximizes a variational lower bound, encouraging the follower to predict the leader's action from its local observation. The leader and follower policies are jointly optimized under the centralized training and decentralized execution (CTDE) framework to balance task execution and consistent cooperative behaviors. We validate CoLF in both simulation and real-robot experiments using two quadruped robots. The demonstration video is available at https://sites.google.com/view/colf/.

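The abstract does not spell out the exact form of the variational lower bound, so the following is only an illustrative sketch under one common assumption: the Barber-Agakov bound I(a; o) >= H(a) + E[log q(a | o)], where a stands for the leader's action, o for the follower's local observation, and q for the follower's predictor of the leader's action. The toy distribution and all names (`JOINT`, `IMPERFECT_Q`, `variational_lower_bound`, and so on) are hypothetical and not taken from the paper. The sketch checks numerically that a better predictor gives a tighter bound, which is why maximizing the bound pushes the follower toward predicting the leader's action.

```python
import math

# Hypothetical toy joint distribution p(a, o) over the leader's action a
# and the follower's local observation o (values chosen for illustration only).
JOINT = {
    (0, 0): 0.30, (1, 0): 0.10, (2, 0): 0.10,
    (0, 1): 0.05, (1, 1): 0.15, (2, 1): 0.30,
}


def marginals(joint):
    """Return the marginals p(a) and p(o) of a joint table keyed by (a, o)."""
    p_a, p_o = {}, {}
    for (a, o), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_o[o] = p_o.get(o, 0.0) + p
    return p_a, p_o


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def mutual_information(joint):
    """Exact mutual information I(a; o) in nats."""
    p_a, p_o = marginals(joint)
    return sum(
        p * math.log(p / (p_a[a] * p_o[o]))
        for (a, o), p in joint.items()
        if p > 0
    )


def variational_lower_bound(joint, q):
    """Assumed Barber-Agakov bound: H(a) + E_{p(a,o)}[log q(a | o)].

    q[o][a] is the follower's predicted probability of leader action a
    given its own observation o.
    """
    p_a, _ = marginals(joint)
    expected_log_q = sum(
        p * math.log(q[o][a]) for (a, o), p in joint.items() if p > 0
    )
    return entropy(p_a) + expected_log_q


def exact_posterior(joint):
    """The ideal predictor q(a | o) = p(a | o), which makes the bound tight."""
    _, p_o = marginals(joint)
    q = {}
    for (a, o), p in joint.items():
        q.setdefault(o, {})[a] = p / p_o[o]
    return q


# Three predictors of increasing quality.
UNIFORM_Q = {o: {a: 1.0 / 3.0 for a in range(3)} for o in range(2)}
IMPERFECT_Q = {
    0: {0: 0.5, 1: 0.25, 2: 0.25},
    1: {0: 0.2, 1: 0.3, 2: 0.5},
}

mi = mutual_information(JOINT)
bound_uniform = variational_lower_bound(JOINT, UNIFORM_Q)
bound_imperfect = variational_lower_bound(JOINT, IMPERFECT_Q)
bound_exact = variational_lower_bound(JOINT, exact_posterior(JOINT))

print(f"I(a; o)            = {mi:.4f}")
print(f"bound (uniform q)  = {bound_uniform:.4f}")
print(f"bound (imperfect q) = {bound_imperfect:.4f}")
print(f"bound (exact q)    = {bound_exact:.4f}")
```

In a learned setting, q would presumably be a neural network conditioned on the follower's observation, and the expectation would be estimated from sampled rollouts rather than computed from a known table; both of those details are assumptions here, not statements about CoLF's implementation.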