In multi-robot systems, achieving coordinated missions remains a significant challenge due to the coupled nature of coordination behaviors and the lack of global information available to individual robots. To mitigate these challenges, this paper introduces Bi-level Coordination Learning (Bi-CL), a novel approach that leverages a bi-level optimization structure within a centralized training and decentralized execution paradigm. Our bi-level reformulation decomposes the original problem into a reinforcement learning level with a reduced action space and an imitation learning level that obtains demonstrations from a global optimizer. Both levels contribute to improved learning efficiency and scalability. We note that robots' incomplete information leads to mismatches between the two levels of learning models. To address this, Bi-CL further integrates an alignment penalty mechanism that aims to minimize the discrepancy between the two levels without degrading their training efficiency. We introduce a running example to conceptualize the problem formulation and apply Bi-CL to two variations of this example: route-based and graph-based scenarios. Simulation results demonstrate that Bi-CL learns more efficiently than traditional multi-agent reinforcement learning baselines for multi-robot coordination while achieving comparable performance.