Measuring the contribution of individual agents is challenging in cooperative multi-agent reinforcement learning (MARL), where team performance is typically inferred from a single shared global reward. Shapley values are arguably among the best current approaches to measuring individual agent contributions. However, computing them is expensive, because the computational cost grows exponentially with the number of agents. In this paper, we adapt difference rewards into an efficient method for quantifying the contribution of individual agents, referred to as Agent Importance, whose computational complexity is linear in the number of agents. We show empirically that the computed values are strongly correlated with the true Shapley values and, in environments where ground-truth individual agent rewards are available, with those underlying rewards as well. We demonstrate how Agent Importance can help study MARL systems by diagnosing algorithmic failures discovered in prior MARL benchmarking work. Our analysis illustrates the value of Agent Importance as an explainability component for future MARL benchmarks.
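The cost gap between exact Shapley values and a difference-reward style estimate can be illustrated on a toy problem. The sketch below is not the paper's implementation: the reward function `team_reward`, the `skills` vector, and the way an agent is "removed" (simply dropping it from the coalition) are all hypothetical choices made for illustration, and the abstract does not specify how Agent Importance handles removal in a real environment.

```python
from itertools import combinations
from math import factorial

# Hypothetical per-agent skill levels; agent 3 contributes nothing.
skills = [1.0, 2.0, 3.0, 0.0]


def team_reward(coalition):
    """Toy shared reward (an assumption): summed skills plus a synergy
    bonus of 2.0 when agents 0 and 1 are both present."""
    reward = sum(skills[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        reward += 2.0
    return reward


def shapley_values(n, reward_fn):
    """Exact Shapley values; enumerates every subset of the other agents,
    so the number of reward evaluations grows exponentially in n."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that agent i joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                values[i] += weight * (reward_fn(s | {i}) - reward_fn(s))
    return values


def difference_contributions(n, reward_fn):
    """Difference-reward style estimate: full-team reward minus the reward
    with agent i removed. Needs only n + 1 reward evaluations."""
    everyone = frozenset(range(n))
    full = reward_fn(everyone)
    return [full - reward_fn(everyone - {i}) for i in range(n)]


shapley = shapley_values(len(skills), team_reward)
difference = difference_contributions(len(skills), team_reward)
print("Shapley:   ", [round(v, 3) for v in shapley])
print("Difference:", [round(v, 3) for v in difference])
```

In this toy game the two measures agree on which agent is useless and assign the same value to agent 2, but they differ on agents 0 and 1: the difference estimate credits each of them with the whole synergy bonus, whereas the Shapley value splits it between them. The linear-cost estimate is therefore best read as a correlate of the Shapley value, not a replacement that matches it exactly.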