The pursuit-evasion (PE) problem is a critical challenge in multi-robot systems (MRS). While reinforcement learning (RL) has shown promise in addressing PE tasks, prior research has focused primarily on single-target pursuit, with limited exploration of multi-target encirclement, particularly in large-scale settings. This paper proposes a Transformer-Enhanced Reinforcement Learning (TERL) framework for large-scale multi-target encirclement. By integrating a transformer-based policy network with a target-selection mechanism, TERL enables robots to adaptively prioritize targets and coordinate safely. Results show that TERL outperforms existing RL-based methods in encirclement success rate and task completion time, while maintaining strong performance in large-scale scenarios. Notably, TERL trained on small-scale scenarios (15 pursuers, 4 targets) generalizes effectively to large-scale settings (80 pursuers, 20 targets) without retraining, achieving a 100% success rate.
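To make the architectural idea concrete, the following is a minimal, hypothetical sketch (not the authors' released implementation) of how a transformer-based policy with an explicit target-selection head might be wired up: each observed entity is embedded as a token, a transformer encoder produces contextual features, and separate heads score candidate targets and emit action logits. All dimensions, entity counts, and head structures here are illustrative assumptions.

```python
# Hypothetical sketch of a transformer policy with a target-selection head.
# Not the authors' code; feature sizes and heads are illustrative assumptions.
import torch
import torch.nn as nn

class TERLStylePolicy(nn.Module):
    def __init__(self, obs_dim=8, d_model=64, n_heads=4, n_layers=2, n_actions=9):
        super().__init__()
        # Embed each observed entity (ego pursuer and candidate targets) as a token.
        self.embed = nn.Linear(obs_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Target-selection head: one score per target token.
        self.target_head = nn.Linear(d_model, 1)
        # Action head: discrete action logits from the ego token.
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, ego_obs, target_obs):
        # ego_obs: (B, 1, obs_dim); target_obs: (B, n_targets, obs_dim)
        tokens = self.embed(torch.cat([ego_obs, target_obs], dim=1))
        h = self.encoder(tokens)                                  # (B, 1 + n_targets, d_model)
        target_scores = self.target_head(h[:, 1:]).squeeze(-1)    # (B, n_targets)
        action_logits = self.action_head(h[:, 0])                 # (B, n_actions)
        return action_logits, target_scores

# Usage: a batch of 2 pursuers, each observing 4 candidate targets.
policy = TERLStylePolicy()
action_logits, target_scores = policy(torch.randn(2, 1, 8), torch.randn(2, 4, 8))
```

Because attention operates over a variable-length set of entity tokens, a policy of this shape can in principle accept more pursuers and targets at test time than it saw during training, which is consistent with the scale generalization the abstract reports.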