Formation control of unmanned vehicles has received considerable interest in recent years, driven by progress in autonomous systems and the need for multiple vehicles to carry out diverse missions. In this paper, we address behavior-based formation control of mobile robots using safe multi-agent reinforcement learning (MARL), which ensures the safety of the robots by eliminating all collisions during training and execution. To ensure safety, we implement distributed model predictive control safety filters that override unsafe actions. We focus on achieving behavior-based formation without individual reference targets for the robots, using targets for the centroid of the formation instead. This formulation facilitates the deployment of formation control on real robots and improves the scalability of our approach to larger numbers of robots. Optimization-based controllers cannot address this task without specific individual reference targets for the robots and information about the relative locations of the robots, so we use MARL to train the robots. Moreover, to account for the interactions between the agents, we use attention-based critics to improve the training process. We train the agents in simulation and then demonstrate the resulting behavior on real Turtlebot robots. We show that, despite the agents having very limited information, the desired behavior can still be achieved safely.