The significance of network structures in promoting group cooperation within social dilemmas has been widely recognized. Prior studies attribute this facilitation to the assortment of strategies driven by spatial interactions. Although reinforcement learning has been employed to investigate the impact of dynamic interaction on the evolution of cooperation, it remains unclear how agents develop neighbour selection behaviours and how strategic assortment forms within an explicit interaction structure. To address this, our study introduces a computational framework based on multi-agent reinforcement learning in the spatial Prisoner's Dilemma game. This framework allows agents to select both their dilemma strategies and their interacting neighbours based on long-term experience, in contrast to existing research that relies on preset social norms or external incentives. By modelling each agent with two distinct Q-networks, we disentangle the coevolutionary dynamics between cooperation and interaction. The results indicate that long-term experience enables agents to identify non-cooperative neighbours and to prefer interacting with cooperative ones. This emergent self-organizing behaviour leads to the clustering of agents with similar strategies, thereby increasing network reciprocity and enhancing group cooperation.
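To make the two-learner design concrete, the following is a minimal illustrative sketch, not the implementation used in this study. It replaces the two Q-networks with two stateless tabular value functions per agent, one over dilemma strategies and one over neighbours. The payoff values, the ring lattice, the learning parameters, and all names (`Agent`, `ring_lattice`, `simulate`) are assumptions introduced for illustration only, and the sketch makes no claim to reproduce the reported results.

```python
import random

# Assumed Prisoner's Dilemma payoffs (T > R > P > S); not taken from the paper.
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
ACTIONS = ("C", "D")


class Agent:
    """Agent with two separate value tables (tabular stand-ins for two Q-networks)."""

    def __init__(self, neighbours, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.neighbours = list(neighbours)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_strategy = {a: 0.0 for a in ACTIONS}            # value of C / D
        self.q_neighbour = {n: 0.0 for n in self.neighbours}   # value of each partner

    def _epsilon_greedy(self, table, rng):
        # Explore with probability epsilon, otherwise pick a highest-valued key.
        if rng.random() < self.epsilon:
            return rng.choice(list(table))
        best = max(table.values())
        return rng.choice([k for k, v in table.items() if v == best])

    def choose_strategy(self, rng):
        return self._epsilon_greedy(self.q_strategy, rng)

    def choose_neighbour(self, rng):
        return self._epsilon_greedy(self.q_neighbour, rng)

    def _update(self, table, key, reward):
        # Stateless Q-learning update: bootstrap from the table's current maximum.
        target = reward + self.gamma * max(table.values())
        table[key] += self.alpha * (target - table[key])

    def learn(self, strategy, neighbour, reward):
        # Both learners receive the same payoff but update independently.
        self._update(self.q_strategy, strategy, reward)
        self._update(self.q_neighbour, neighbour, reward)


def ring_lattice(n):
    """Assumed interaction structure: each agent is linked to its two ring neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def simulate(n_agents=20, rounds=200, seed=0):
    rng = random.Random(seed)
    graph = ring_lattice(n_agents)
    agents = {i: Agent(graph[i]) for i in graph}
    coop_fraction = []
    for _ in range(rounds):
        # Every agent commits to one strategy for the round.
        strategies = {i: a.choose_strategy(rng) for i, a in agents.items()}
        for i, agent in agents.items():
            j = agent.choose_neighbour(rng)                 # partner selection
            reward = PAYOFF[(strategies[i], strategies[j])]
            agent.learn(strategies[i], j, reward)           # only the initiator learns here
        coop_fraction.append(sum(s == "C" for s in strategies.values()) / n_agents)
    return agents, coop_fraction
```

Calling `simulate()` returns the agents and the per-round fraction of cooperators; inspecting an agent's `q_neighbour` table shows which neighbours it has come to prefer. Keeping the two tables separate mirrors the idea of disentangling strategy choice from partner choice, although a neural Q-network conditioned on interaction history would be needed to capture the long-term experience described above.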