Deep multi-agent reinforcement learning (MARL) has recently gained significant popularity owing to its success in various cooperative multi-agent tasks. However, exploration remains a challenging problem in MARL because each agent observes the environment only partially and the exploration space can grow exponentially with the number of agents. To address the scalability of the exploration space, we first define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. We then propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit states in diverse formations by guiding them to be well aware of their current formation based solely on their own observations. Numerical results show that the proposed FoX framework significantly outperforms state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse StarCraft II Multi-Agent Challenge (SMAC) tasks.