Multi-Robot Systems (MRS) have attracted widespread research interest and enabled a wide range of applications, especially in cooperative control. Yet little attention has been paid to the compound ability of formation, monitoring and defence in decentralized large-scale MRS for pursuit avoidance, a task that places stringent demands on coordination and adaptability. In this paper, we propose a decentralized Imitation learning based Alternative Multi-Agent Proximal Policy Optimization (IA-MAPPO) algorithm that provides a flexible and communication-efficient solution for executing the pursuit avoidance task in a well-formed swarm. Specifically, we first devise a policy-distillation-based MAPPO executor that, in a centralized manner, can accomplish multiple formations and swiftly switch between them. We then use imitation learning to decentralize the formation controller, reducing communication overhead and improving scalability. Finally, alternative training is employed to compensate for the performance loss incurred by decentralization. Simulation results validate the effectiveness of IA-MAPPO, and extensive ablation experiments further show that its performance is comparable to that of a centralized solution while communication overhead is significantly reduced.
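The abstract does not state the distillation objective behind the policy-distillation-based executor. A common choice is to train a single student policy, conditioned on the target formation, to minimize KL(teacher || student) against several formation-specific teacher policies. The following is a minimal sketch of that generic technique, not the authors' implementation. It assumes discrete actions, a linear softmax student whose input is a one-hot formation ID concatenated with a local observation, and hypothetical teacher distributions. For softmax logits, the gradient of KL(p || q) with respect to the logits is q - p, which is the update used below.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def features(formation_id, n_formations, obs):
    """Hypothetical student input: one-hot formation ID followed by the local observation."""
    onehot = [1.0 if i == formation_id else 0.0 for i in range(n_formations)]
    return onehot + list(obs)


def student_probs(W, formation_id, obs, n_formations):
    """Action distribution of the shared linear softmax student."""
    x = features(formation_id, n_formations, obs)
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])


def distill(dataset, n_formations, n_actions, lr=0.5, epochs=500):
    """Distill several formation teachers into ONE student by SGD on KL(teacher || student).

    dataset: list of (formation_id, obs, teacher_probs) tuples (hypothetical teacher data).
    """
    dim = n_formations + len(dataset[0][1])
    W = [[0.0] * dim for _ in range(n_actions)]  # shared weights for all formations
    for _ in range(epochs):
        for f, obs, p in dataset:
            x = features(f, n_formations, obs)
            q = student_probs(W, f, obs, n_formations)
            for a in range(n_actions):
                g = q[a] - p[a]  # d KL / d logit_a for a softmax student
                for j in range(dim):
                    W[a][j] -= lr * g * x[j]
    return W


# Hypothetical teachers: same local observation, different preferred action per formation.
data = [
    (0, (1.0,), [0.8, 0.1, 0.1]),  # formation 0 teacher favours action 0
    (1, (1.0,), [0.1, 0.1, 0.8]),  # formation 1 teacher favours action 2
]
W = distill(data, n_formations=2, n_actions=3)
for f, obs, p in data:
    q = student_probs(W, f, obs, n_formations=2)
    print(f, [round(v, 2) for v in q], round(kl(p, q), 4))
```

Conditioning on the formation ID is what lets a single set of shared weights reproduce different teacher behaviours, so switching formations only changes the input rather than the policy. The paper's actual network, observation space and loss may differ.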