Despite their potential in real-world applications, multi-agent reinforcement learning (MARL) algorithms often suffer from high sample complexity. To address this issue, we present a novel model-based MARL algorithm, BiLL (Bi-Level Latent Variable Model-based Learning), that learns a bi-level latent variable model from high-dimensional inputs. At the top level, the model learns latent representations of the global state, which encode global information relevant to behavior learning. At the bottom level, it learns latent representations for each agent, given the global latent representations from the top level. The model generates latent trajectories to use for policy learning. We evaluate our algorithm on complex multi-agent tasks in the challenging SMAC and Flatland environments. Our algorithm outperforms state-of-the-art model-free and model-based baselines in sample efficiency, including on two extremely challenging Super Hard SMAC maps.
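The bi-level structure described above can be illustrated with a minimal sketch. It is not the BiLL architecture: the function name `rollout_bilevel`, the dimensions, and the random linear maps standing in for learned networks are all assumptions made for illustration, and actions, observations, and training are omitted. It only shows the dependency pattern, in which a global latent is advanced first and each agent's latent is then advanced conditioned on it.

```python
import math
import random


def rollout_bilevel(n_agents=3, horizon=5, global_dim=4, agent_dim=2, seed=0):
    """Generate a toy latent trajectory with a two-level dependency structure.

    All weights are random placeholders (hypothetical), not learned models.
    """
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

    def step(weights, vec):
        # tanh(W @ vec) plus small Gaussian noise, standing in for
        # sampling from a learned latent transition model.
        return [
            math.tanh(sum(w * v for w, v in zip(row, vec))) + rng.gauss(0.0, 0.1)
            for row in weights
        ]

    w_global = rand_matrix(global_dim, global_dim)
    w_agents = [rand_matrix(agent_dim, agent_dim + global_dim) for _ in range(n_agents)]

    z_global = [rng.gauss(0.0, 1.0) for _ in range(global_dim)]
    z_agents = [[rng.gauss(0.0, 1.0) for _ in range(agent_dim)] for _ in range(n_agents)]

    trajectory = []
    for _ in range(horizon):
        # Top level: advance the global latent.
        z_global = step(w_global, z_global)
        # Bottom level: advance each agent latent given the new global latent.
        z_agents = [step(w, z + z_global) for w, z in zip(w_agents, z_agents)]
        trajectory.append((z_global, z_agents))
    return trajectory
```

Calling `rollout_bilevel()` returns a list of `(global_latent, agent_latents)` pairs, one per imagined step. In a model-based setup of the kind the abstract describes, trajectories like these would be what the policy is trained on.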