Exploration techniques that have proven successful in deep reinforcement learning have been integrated into multi-agent reinforcement learning (MARL) algorithms. Exploration in MARL is nonetheless harder, because agents must coordinate to achieve comprehensive state coverage, and they often struggle to agree on which kinds of states are worth exploring. To address this limitation, we introduce \textbf{M}ulti-agent \textbf{E}xploration based on \textbf{S}ub-state \textbf{E}ntropy (MESE). MESE encourages agents to explore cooperatively by guiding them toward consensus through an extra team reward. This reward is computed from the novelty of the current sub-state that merits cooperative exploration. MESE selects the sub-state with a conditioned entropy approach and estimates the entropy with a particle-based estimator. As a plug-and-play module, MESE can be integrated into most existing MARL algorithms. Our experiments demonstrate that MESE can substantially improve the performance of MAPPO on various tasks in the StarCraft Multi-Agent Challenge (SMAC).
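As a rough illustration of the particle-based idea, the sketch below scores each sub-state in a batch by the log distance to its k-th nearest neighbour, a common form of particle-based novelty bonus, and adds that score to the environment reward as a shared team bonus. It is not the MESE implementation: the function names, the fixed `sub_state_dims` index tuple, and the parameters `k`, `c`, and `beta` are assumptions made for illustration, and the conditioned-entropy selection of the sub-state is not modelled here.

```python
import math


def knn_novelty(particles, k=3, c=1.0):
    """Novelty score per particle: log(c + distance to its k-th nearest neighbour).

    Particles far from their neighbours lie in sparsely visited regions,
    so they receive a larger score.
    """
    if len(particles) <= k:
        raise ValueError("need more than k particles")
    scores = []
    for i, p in enumerate(particles):
        # Distances from particle i to every other particle, ascending.
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(particles) if j != i
        )
        scores.append(math.log(c + dists[k - 1]))
    return scores


def add_team_bonus(env_rewards, states, sub_state_dims, beta=0.1, k=3):
    """Add a shared novelty bonus, computed on a sub-state, to each reward.

    `sub_state_dims` is a hypothetical, hand-picked tuple of state indices;
    how the sub-state is actually chosen is outside this sketch.
    """
    sub_states = [tuple(s[d] for d in sub_state_dims) for s in states]
    bonus = knn_novelty(sub_states, k=k)
    return [r + beta * b for r, b in zip(env_rewards, bonus)]


if __name__ == "__main__":
    # Three clustered states and one outlier; only the first two
    # dimensions are treated as the sub-state.
    states = [(0.0, 0.0, 5.0), (0.0, 1.0, 7.0), (1.0, 0.0, 9.0), (10.0, 10.0, 1.0)]
    shaped = add_team_bonus([0.0] * 4, states, sub_state_dims=(0, 1), beta=1.0, k=1)
    print(shaped)  # the outlier (last entry) receives the largest bonus
```

Because the bonus is a single team-level quantity rather than a per-agent one, every agent is rewarded for reaching the same under-visited sub-states, which is the consensus effect described above.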