In cooperative multi-agent reinforcement learning (CMARL), agents must balance self-exploration against team collaboration. Without coordination, agents can hardly accomplish the team task; without enough individual exploration, they become trapped in a local optimum where only easy cooperation is reached. Recent works concentrate mainly on coordinated exploration among agents, which makes the state space to be explored grow exponentially. To address this issue, we propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to succeed in team tasks by adaptively finding a trade-off between self-exploration and team cooperation. In SMMAE, we train an independent exploration policy for each agent to maximize its own visited state space. Each agent also learns an adjustable exploration probability based on the stability of the joint team policy. Experiments on highly cooperative tasks in the StarCraft II micromanagement benchmark (SMAC) demonstrate that SMMAE explores task-related states more efficiently, accomplishes coordinated behaviours, and improves learning performance.
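
The two per-agent ingredients named above, an exploration policy rewarded for visiting new states and an exploration probability tied to the stability of the team policy, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and not the method's actual formulation: the class `SelfMotivatedAgent` and its method names are hypothetical, the novelty reward is a simple count-based bonus, stability is approximated by the variance of recent team-value estimates, and the choice to explore more when the team value is steadier is our own guess at one plausible rule.

```python
import math
import random
from collections import Counter, deque


class SelfMotivatedAgent:
    """Hypothetical per-agent wrapper; names and formulas are illustrative only."""

    def __init__(self, max_explore_prob=0.5, window=20, seed=0):
        self.max_explore_prob = max_explore_prob
        self.visit_counts = Counter()             # visits per hashable local state
        self.team_values = deque(maxlen=window)   # recent joint-value estimates
        self.rng = random.Random(seed)

    def novelty_bonus(self, state):
        """Assumed count-based intrinsic reward: rarely visited states pay more."""
        self.visit_counts[state] += 1
        return 1.0 / math.sqrt(self.visit_counts[state])

    def record_team_value(self, value):
        """Store the latest estimate of the joint team value."""
        self.team_values.append(value)

    def explore_prob(self):
        """Assumed rule: the steadier the recent team value, the more exploration."""
        if len(self.team_values) < 2:
            return 0.0
        n = len(self.team_values)
        mean = sum(self.team_values) / n
        var = sum((v - mean) ** 2 for v in self.team_values) / n
        stability = 1.0 / (1.0 + var)  # in (0, 1]; 1 means no fluctuation
        return self.max_explore_prob * stability

    def act(self, team_action, explore_action):
        """Use the exploration policy's action with probability explore_prob()."""
        if self.rng.random() < self.explore_prob():
            return explore_action
        return team_action
```

In this sketch, `novelty_bonus` would serve as the reward for training the agent's exploration policy, while `act` mixes that policy with the team policy at each step. The variance-based stability proxy is only a stand-in; any measure of how much the joint policy is still changing could take its place.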