Learning to collaborate has seen significant progress in multi-agent reinforcement learning (MARL). However, promoting coordination among agents and improving their exploration remain open challenges. In multi-agent environments, interactions between agents are limited to specific situations, so effective collaboration requires a nuanced understanding of when and how one agent's actions influence others. To this end, we propose a novel MARL algorithm, Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning (SCIC), which incorporates an intrinsic reward mechanism based on a new cooperation criterion: the situation-dependent causal influence among agents. Using causal intervention and conditional mutual information, our approach detects inter-agent causal influences in specific situations according to this criterion. This helps agents explore states in which they can positively affect other agents, thereby promoting cooperation. The resulting update links coordinated exploration with intrinsic reward distribution, which enhances overall collaboration and performance. Experimental results on various MARL benchmarks demonstrate the superiority of our method over state-of-the-art approaches.
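To make the idea of a situation-dependent influence score concrete, the following is a minimal sketch, not the SCIC implementation. It assumes a small discrete setting in which, for one fixed situation s, we already know agent i's policy pi(a_i | s) and the interventional distribution P(s'_j | s, do(a_i)) over agent j's next state. Under these assumptions, the conditional mutual information I(A_i; S'_j | S = s) is the policy-weighted KL divergence between each interventional distribution and their mixture. The function names, the table-based inputs, and the weight `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def situation_influence(policy_i, transition_j, eps=1e-12):
    """Influence of agent i on agent j in one fixed situation s (in nats).

    Hypothetical inputs (assumptions, not taken from the paper):
      policy_i:     shape (A,), pi(a_i | s).
      transition_j: shape (A, S), row a is P(s'_j | s, do(a_i = a)).

    Returns I(A_i; S'_j | S = s), computed as
    sum_a pi(a | s) * KL(P(. | s, do(a)) || P(. | s)).
    """
    policy_i = np.asarray(policy_i, dtype=float)
    transition_j = np.asarray(transition_j, dtype=float)
    # Mixture over agent i's actions: P(s'_j | s).
    marginal = policy_i @ transition_j
    # eps keeps the log finite; zero-probability entries contribute 0.
    log_ratio = np.log((transition_j + eps) / (marginal + eps))
    kl_per_action = np.sum(transition_j * log_ratio, axis=1)
    return float(policy_i @ kl_per_action)


def shaped_reward(extrinsic, influence, beta=0.1):
    """Add the influence score as an intrinsic bonus (beta is an assumed weight)."""
    return extrinsic + beta * influence


# Situation 1: agent j's next state ignores agent i's action.
no_effect = situation_influence([0.5, 0.5], [[0.7, 0.3], [0.7, 0.3]])
# Situation 2: agent i's action fully determines agent j's next state.
full_effect = situation_influence([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
print(no_effect, full_effect)
```

In the first situation the score is zero, so no intrinsic bonus is paid. In the second it is approximately log 2, so agent i is rewarded for visiting a situation in which it can affect agent j. A learned setting would replace the tables with estimated models, which this sketch does not cover.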