In this paper, we propose a novel model-based multi-agent reinforcement learning approach, the Value Decomposition Framework with Disentangled World Model, which reduces the sample complexity of training multiple agents that interact in the same environment to achieve a common goal. Because multi-agent systems pose scalability and non-stationarity problems, model-free methods require a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to disentangle the environment dynamics and produce imagined outcomes from past experience rather than sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is combined with a value-based framework to predict the joint action-value function and optimize the overall training objective. We present experimental results on Easy, Hard, and Super-Hard StarCraft II micromanagement challenges, which show that our method achieves high sample efficiency and outperforms the baselines in defeating the enemy armies.