Value decomposition is widely used in cooperative multi-agent reinforcement learning; however, its implicit credit assignment mechanism is not yet fully understood because it relies on black-box networks. In this work, we study an interpretable value decomposition framework based on the family of generalized additive models. We present a novel method, named Neural Attention Additive Q-learning (N$\text{A}^\text{2}$Q), that provides inherent intelligibility of collaboration behavior. N$\text{A}^\text{2}$Q can explicitly factorize the optimal joint policy, induced by enriched shape functions that model all possible coalitions of agents, into individual policies. Moreover, we construct identity semantics to promote estimating credits together with the global state and individual value functions, where local semantic masks help us diagnose whether each agent captures task-relevant information. Extensive experiments show that N$\text{A}^\text{2}$Q consistently outperforms a range of state-of-the-art methods on all challenging tasks while yielding human-like interpretability.
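
As a rough illustration of the additive structure described above, a generalized additive model over agent coalitions could take the following form. This is a sketch under assumed notation, not necessarily the exact formulation used by the method: $Q_i$ denotes the individual value function of agent $i$ (with $n$ agents in total), $f_0$ a bias term, $f_i$, $f_{ij}$, and so on the shape functions for coalitions of increasing size, and $\alpha_i$, $\alpha_{ij}$, and so on non-negative credit weights. All of these symbols are introduced here purely for illustration.

$$
Q_{\text{tot}} \approx f_0 + \sum_{i=1}^{n} \alpha_i f_i(Q_i) + \sum_{i<j} \alpha_{ij} f_{ij}(Q_i, Q_j) + \cdots + \alpha_{1 \cdots n} f_{1 \cdots n}(Q_1, \ldots, Q_n)
$$

Under this reading, each term can be inspected on its own, which is what would make the credit assigned to a single agent or to a coalition intelligible.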