In cooperative Multi-Agent Reinforcement Learning (MARL), agents must learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up with sub-optimal policies that do not contribute to the team's objective. Such agents are called lazy agents because of their non-cooperative behaviours, which may arise from failing to understand whether they caused the rewards. As a consequence, we observe that cooperative behaviours do not necessarily emerge as a byproduct of a team being able to solve a task. In this paper, we investigate applications of causality in MARL and how it can be used to penalise these lazy agents. We observe that causality estimates can improve credit assignment to the agents, and we show how they can be leveraged to improve independent learning in MARL. Furthermore, we investigate how Amortized Causal Discovery can be used to automate causality detection within MARL environments. The results demonstrate that causal relations between individual observations and the team reward can be used to detect and punish lazy agents, leading them to develop more intelligent behaviours. This improves not only the overall performance of the team but also the individual capabilities of the agents. In addition, the results show that Amortized Causal Discovery can efficiently find causal relations in MARL.
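As an illustration of the idea of using causal relations to detect and punish lazy agents, the following is a minimal sketch and not the method of the paper. It assumes that some causal estimator (not shown here) has already produced one boolean flag per agent indicating whether that agent's observations were judged to have caused the team reward. The function name `assign_individual_rewards`, the `caused` flags, and the `lazy_penalty` value are all hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def assign_individual_rewards(
    team_reward: float,
    caused: Sequence[bool],
    lazy_penalty: float = -1.0,
) -> List[float]:
    """Split a shared team reward into per-agent rewards.

    Hypothetical rule (an assumption for illustration): an agent flagged
    as having caused the team reward receives it unchanged, while any
    other agent receives a fixed penalty instead.
    """
    return [team_reward if flag else lazy_penalty for flag in caused]


# Three independent learners; the second is judged not to have contributed.
rewards = assign_individual_rewards(10.0, [True, False, True])
print(rewards)
```

Under these assumptions, each independent learner would train on its own entry of `rewards` rather than on the shared team reward, so an agent that did not contribute no longer benefits from the work of its teammates.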