Offline multi-agent reinforcement learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents is challenging in offline settings because interaction with the environment is prohibited. In this paper, we propose a new framework, Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network that captures the relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA learns each agent's contribution by analyzing the causal relationships of its individual reward, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate seamlessly with various offline MARL methods. Theoretically, we prove that, in the offline-dataset setting, the underlying causal structure and the function generating the agents' individual rewards are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also improves performance when integrated with other backbones.