Deep cooperative multi-agent reinforcement learning has achieved remarkable success on a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition and leave the interactions between entities entangled, which easily leads to overfitting on noisy interactions. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method that disentangles not only the joint value function into agent-wise value functions for decentralized execution, but also the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities. OPT facilitates filtering out noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among the discovered interaction prototypes. The model then selectively restructures these prototypes into a compact interaction pattern through an aggregator with learnable weights. To alleviate the training instability caused by partial observability, we propose maximizing the mutual information between the aggregation weights and the history behaviors of each agent. Experiments on both single-task and multi-task benchmarks demonstrate that the proposed method outperforms state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/OPT.
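To make the prototype and aggregation steps more concrete, here is a minimal NumPy sketch. It is not the OPT implementation from the linked repository, and it rests on several assumptions. Each interaction prototype is modeled as one softmax attention map from agents to entities, with one attention head per prototype. Sparsity is measured by row entropy, and disagreement by the pairwise L1 distance between prototypes. The aggregator uses per-agent softmax weights over prototypes. All function names are hypothetical, and the mutual-information term is not sketched.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def interaction_prototypes(queries, keys):
    """Hypothetical prototype construction (assumption: one attention head per prototype).

    queries: (K, A, d) agent queries, keys: (K, E, d) entity keys.
    Returns (K, A, E): for each prototype k, a row-stochastic agent-to-entity attention map.
    """
    d = queries.shape[-1]
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores, axis=-1)


def sparsity_loss(protos, eps=1e-12):
    # Assumed sparsity term: mean entropy of each attention row (lower means sparser).
    entropy = -(protos * np.log(protos + eps)).sum(axis=-1)
    return entropy.mean()


def disagreement_loss(protos):
    # Assumed diversity term: negative mean pairwise L1 distance between prototypes,
    # so minimizing it pushes prototypes apart.
    K = protos.shape[0]
    total, pairs = 0.0, 0
    for i in range(K):
        for j in range(i + 1, K):
            total += np.abs(protos[i] - protos[j]).sum(axis=-1).mean()
            pairs += 1
    return -total / max(pairs, 1)


def aggregate(protos, logits):
    """Hypothetical aggregator: per-agent learnable weights over the K prototypes.

    protos: (K, A, E), logits: (A, K).
    Returns the compact pattern (A, E) and the weights (A, K).
    """
    weights = softmax(logits, axis=-1)
    pattern = np.einsum("ak,kae->ae", weights, protos)
    return pattern, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, A, E, d = 4, 3, 6, 8  # prototypes, agents, entities, feature size
    protos = interaction_prototypes(rng.normal(size=(K, A, d)), rng.normal(size=(K, E, d)))
    pattern, weights = aggregate(protos, rng.normal(size=(A, K)))
    print(pattern.shape, sparsity_loss(protos), disagreement_loss(protos))
```

In this sketch, the aggregated pattern is a convex combination of row-stochastic prototypes, so each agent's row still sums to one. In a real training loop, the logits would be produced by a network conditioned on each agent's history, which is where the abstract's mutual-information objective would apply.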