Value decomposition is widely used in cooperative multi-agent reinforcement learning; however, its implicit credit assignment mechanism is not yet fully understood because it relies on black-box networks. In this work, we study an interpretable value decomposition framework based on the family of generalized additive models. We present a novel method, named Neural Attention Additive Q-learning~(N$\text{A}^\text{2}$Q), that provides inherent intelligibility of collaboration behavior. N$\text{A}^\text{2}$Q can explicitly factorize the optimal joint policy, induced by enriched shape functions that model all possible coalitions of agents, into individual policies. Moreover, we construct identity semantics to help estimate credits together with the global state and individual value functions, where local semantic masks help us diagnose whether each agent captures task-relevant information. Extensive experiments show that N$\text{A}^\text{2}$Q consistently achieves superior performance compared to different state-of-the-art methods on all challenging tasks, while yielding human-like interpretability.