Emergent cooperation in classical Multi-Agent Reinforcement Learning has gained significant attention, particularly in the context of Sequential Social Dilemmas (SSDs). While classical reinforcement learning approaches have demonstrated the capability to achieve emergent cooperation, research on extending these methods to Quantum Multi-Agent Reinforcement Learning remains limited, particularly with respect to communication-based approaches. In this paper, we apply four communication approaches to quantum Q-Learning agents: the Mutual Acknowledgment Token Exchange (MATE) protocol, its extension Mutually Endorsed Distributed Incentive Acknowledgment Token Exchange (MEDIATE), the peer rewarding mechanism Gifting, and Reinforced Inter-Agent Learning (RIAL). We evaluate these approaches in three SSDs: the Iterated Prisoner's Dilemma, the Iterated Stag Hunt, and the Iterated Game of Chicken. Our experimental results show that MATE with the temporal-difference measure (MATE\textsubscript{TD}), AutoMATE, MEDIATE-I, and MEDIATE-S achieved high cooperation levels across all three dilemmas, demonstrating that communication is a viable mechanism for fostering emergent cooperation in Quantum Multi-Agent Reinforcement Learning.
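To make the dilemma structure concrete, the following is a minimal sketch of the Iterated Prisoner's Dilemma, one of the three SSDs named above. The payoff values (temptation $T=5$, reward $R=3$, punishment $P=1$, sucker's payoff $S=0$) are standard illustrative choices and not necessarily those used in this paper's experiments; the policies and function names are likewise hypothetical.

```python
C, D = 0, 1  # cooperate, defect

# PAYOFFS[(a1, a2)] -> (reward for agent 1, reward for agent 2),
# using the standard ordering T > R > P > S.
PAYOFFS = {
    (C, C): (3, 3),  # mutual cooperation: reward R
    (C, D): (0, 5),  # sucker's payoff S vs. temptation T
    (D, C): (5, 0),
    (D, D): (1, 1),  # mutual defection: punishment P
}

def play_iterated(policy1, policy2, steps=10):
    """Play the IPD for a fixed number of rounds; return cumulative rewards."""
    total1 = total2 = 0
    last = (C, C)  # each agent conditions on the previous joint action
    for _ in range(steps):
        a1, a2 = policy1(last), policy2(last)
        r1, r2 = PAYOFFS[(a1, a2)]
        total1, total2, last = total1 + r1, total2 + r2, (a1, a2)
    return total1, total2

# Independently greedy agents settle on the deficient equilibrium (D, D),
# earning less per round than mutual cooperators would -- the gap that
# mechanisms like MATE, MEDIATE, Gifting, and RIAL aim to close.
always_defect = lambda last: D
always_cooperate = lambda last: C
print(play_iterated(always_defect, always_defect))        # (10, 10)
print(play_iterated(always_cooperate, always_cooperate))  # (30, 30)
```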