In multi-agent reinforcement learning (MARL), effective exploration is critical, especially in sparse-reward environments. Although introducing global intrinsic rewards can foster exploration in such settings, it often complicates credit assignment among agents. To address this difficulty, we propose Individual Contributions as intrinsic Exploration Scaffolds (ICES), a novel approach that motivates exploration by assessing each agent's contribution from a global view. In particular, ICES constructs exploration scaffolds with Bayesian surprise, leveraging global transition information during centralized training. These scaffolds, used only in training, guide individual agents towards actions that significantly impact the global latent state transitions. Additionally, ICES separates exploration policies from exploitation policies, enabling the former to utilize privileged global information during training. Extensive experiments on cooperative benchmark tasks with sparse rewards, including Google Research Football (GRF) and the StarCraft Multi-Agent Challenge (SMAC), demonstrate that ICES exhibits superior exploration capabilities compared with baselines. The code is publicly available at https://github.com/LXXXXR/ICES.
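To make the idea of a per-agent Bayesian-surprise signal concrete, the following is a minimal sketch and not the released ICES implementation. It assumes, purely for illustration, that the global latent next state is modelled as a diagonal Gaussian, and that an agent's contribution is scored as the KL divergence between a prediction that sees that agent's action (the "posterior") and one that does not (the "prior"). The function names `gaussian_kl` and `individual_surprise`, the array shapes, and the Gaussian parameterization are all assumptions introduced here; the abstract does not specify them.

```python
import numpy as np


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians, summed over the last axis."""
    mu_q, var_q, mu_p, var_p = (
        np.asarray(x, dtype=float) for x in (mu_q, var_q, mu_p, var_p)
    )
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )


def individual_surprise(post_mu, post_var, prior_mu, prior_var):
    """Hypothetical per-agent intrinsic signal.

    All inputs have shape (n_agents, latent_dim). Row i of the "posterior"
    is the latent prediction that conditions on agent i's action; row i of
    the "prior" is the prediction without it (an assumption of this sketch).
    Returns one non-negative score per agent.
    """
    return gaussian_kl(post_mu, post_var, prior_mu, prior_var)


# Toy example: two agents, three latent dimensions, unit variances.
prior_mu = np.zeros((2, 3))
prior_var = np.ones((2, 3))
post_mu = np.array([[0.0, 0.0, 0.0],   # agent 0: action changes nothing
                    [1.0, 0.0, 0.0]])  # agent 1: action shifts one latent dim
post_var = np.ones((2, 3))

scores = individual_surprise(post_mu, post_var, prior_mu, prior_var)
```

In this toy setup the agent whose action leaves the predicted latent state unchanged receives a score of zero, while the agent whose action shifts the prediction receives a positive score. In a training loop, such scores would only be available during centralized training, consistent with the abstract's statement that the scaffolds are used only in training.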