Effective exploration is crucial to discovering optimal strategies for multi-agent reinforcement learning (MARL) in complex coordination tasks. Existing methods mainly use intrinsic rewards to enable committed exploration, or use role-based learning to decompose joint action spaces, instead of directly conducting a collective search in the entire action-observation space. However, they often struggle to obtain the specific joint action sequences needed to reach successful states in long-horizon tasks. To address this limitation, we propose Imagine, Initialize, and Explore (IIE), a novel method for efficient multi-agent exploration in complex scenarios. IIE employs a transformer model to imagine how the agents reach a critical state in which they can influence each other's transition functions. We then initialize the environment at this state using a simulator before the exploration phase. We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively. The prompt consists of the timestep-to-go, return-to-go, influence value, and a one-shot demonstration; it specifies the desired state and trajectory and guides the action generation. By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important under-explored regions. Despite its simplicity, our method empirically outperforms multi-agent exploration baselines on the StarCraft Multi-Agent Challenge (SMAC) and SMACv2 environments. In particular, IIE shows improved performance on the sparse-reward SMAC tasks and produces more effective curricula over the initialized states than other generative methods, such as CVAE-GAN and diffusion models.
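To make the prompt structure concrete, the following is a minimal Python sketch of how the timestep-to-go and return-to-go entries could be computed for a trajectory segment that ends at a critical state. It is an illustration under stated assumptions, not the paper's implementation: the names `PromptToken`, `returns_to_go`, and `build_prompts` are hypothetical, the return-to-go is assumed to be the undiscounted sum of remaining rewards, the influence values are taken as given inputs because the abstract does not define how they are computed, and the one-shot demonstration is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PromptToken:
    """Hypothetical container for the scalar parts of one prompt."""
    timestep_to_go: int   # steps remaining until the critical state
    return_to_go: float   # assumed: undiscounted sum of remaining rewards
    influence: float      # supplied by the caller; not computed here


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums of the rewards: rtg[t] = rewards[t] + ... + rewards[-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry accumulates all later rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_prompts(
    rewards: Sequence[float], influences: Sequence[float]
) -> List[PromptToken]:
    """Build one prompt per timestep of a segment ending at the critical state."""
    if len(rewards) != len(influences):
        raise ValueError("rewards and influences must have the same length")
    horizon = len(rewards)
    rtg = returns_to_go(rewards)
    return [
        PromptToken(
            timestep_to_go=horizon - 1 - t,
            return_to_go=rtg[t],
            influence=influences[t],
        )
        for t in range(horizon)
    ]


if __name__ == "__main__":
    prompts = build_prompts([0.0, 0.0, 1.0, 2.0], [0.1, 0.2, 0.3, 0.4])
    for p in prompts:
        print(p)
```

In an autoregressive model of this kind, such prompt values would be interleaved with the state, observation, action, and reward tokens at each timestep; how IIE tokenizes and embeds them is not specified in the abstract.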