Integrating rule-based policies into reinforcement learning promises to improve data efficiency and generalization in cooperative pursuit problems. However, most implementations do not properly distinguish the influence of individual neighboring robots in either the observation embedding or the inter-robot interaction rules, which leads to information loss and inefficient cooperation. This paper proposes a cooperative pursuit algorithm, Decentralized Adaptive COOperative Pursuit via Attention (DACOOP-A), that augments reinforcement learning with an artificial potential field and an attention mechanism. An attention-based framework emphasizes important neighbors by integrating the learned attention scores into the observation embedding and the inter-robot interaction rules simultaneously. KL divergence regularization is introduced to alleviate the resulting learning instability. Numerical simulations demonstrate improvements in data efficiency and generalization, and extensive quantitative analysis and ablation studies illustrate the advantages of the proposed modules. Real-world experiments demonstrate the feasibility of deploying DACOOP-A on physical systems.
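The abstract's central idea, reusing one set of attention scores in both the observation embedding and the inter-robot interaction rule, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scaled dot-product scoring, the inverse-distance repulsion term, the function names, and the choice of which two distributions the KL term compares (the abstract does not specify them).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_scores(query, keys):
    """Scaled dot-product scores of one robot over its n neighbors.

    query: (d,) feature of the ego robot; keys: (n, d) neighbor features.
    The scoring function is an assumed, generic choice.
    """
    d = query.shape[0]
    return softmax(keys @ query / np.sqrt(d))

def embed_neighbors(scores, values):
    """Attention-weighted observation embedding: (n,) @ (n, d) -> (d,)."""
    return scores @ values

def weighted_repulsion(scores, own_pos, neighbor_pos, eps=1e-6):
    """Potential-field style repulsion from neighbors, scaled by attention.

    Each neighbor pushes the ego robot away with magnitude ~ 1/distance
    (an assumed form), and the same scores used for the embedding weight
    each neighbor's contribution.
    """
    diff = own_pos - neighbor_pos                      # (n, 2), points away from neighbor
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    forces = diff / (dist ** 2 + eps)
    return (scores[:, None] * forces).sum(axis=0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions.

    Which distributions the paper regularizes is not stated in the
    abstract; this is only the generic quantity.
    """
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical usage with three neighbors and 4-dimensional features.
rng = np.random.default_rng(0)
query = rng.normal(size=4)
keys = rng.normal(size=(3, 4))
scores = attention_scores(query, keys)
embedding = embed_neighbors(scores, keys)
force = weighted_repulsion(scores, np.zeros(2), rng.normal(size=(3, 2)))
penalty = kl_divergence(scores, np.full(3, 1.0 / 3.0))
```

Sharing `scores` between `embed_neighbors` and `weighted_repulsion` is the point of the sketch: a neighbor that the policy attends to strongly also dominates the rule-based interaction term.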