Cooperative multi-agent reinforcement learning (MARL) has been widely applied to complex coordination tasks. Value decomposition methods are popular in MARL, but they struggle on tasks with non-monotonic returns, which restricts their general application. Our work highlights the significance of joint intentions in cooperation, which can overcome non-monotonic problems and increase the interpretability of the learning process. To this end, we present a novel MARL method that leverages learnable joint intentions. Our method employs a hierarchical framework consisting of a joint intention policy and a behavior policy to formulate the optimal cooperative policy. The joint intentions are learned autonomously in a latent space through unsupervised learning, which makes the method adaptable to different agent configurations. Our results demonstrate significant performance improvements on both the StarCraft micromanagement benchmark and challenging MAgent domains, showcasing the effectiveness of our method in learning meaningful joint intentions.