Cooperation is fundamental in Multi-Agent Systems (MAS) and Multi-Agent Reinforcement Learning (MARL), where agents must often balance individual gains against collective rewards. This paper investigates strategies for inducing cooperation in game-theoretic scenarios, specifically the Iterated Prisoner's Dilemma, in which agents must optimize both individual and group outcomes. Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games. Modifications are proposed under which encouraging group rewards also yields higher individual gains, addressing real-world dilemmas seen in distributed systems. The study extends to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination become challenging. Using mean-field game theory, the paper establishes equilibrium solutions and reward structures for infinitely large agent sets in repeated games. Finally, practical insights are offered through simulations using the Multi-Agent Posthumous Credit Assignment (MA-POCA) trainer, and the paper explores adapting simulation algorithms to create scenarios that favor cooperation for group rewards. These practical implementations bridge theoretical concepts with real-world applications.
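To make the tension between individual and group outcomes concrete, the following is a minimal sketch of a two-player Iterated Prisoner's Dilemma. The payoff values (T=5, R=3, P=1, S=0), the two strategies, and the helper names `PAYOFF`, `tit_for_tat`, `always_defect`, and `play` are illustrative assumptions; they are not taken from the paper and do not represent its proposed modifications or its mean-field formulation.

```python
# Illustrative payoffs (assumed, not from the paper): T=5 > R=3 > P=1 > S=0.
# Keys are (move_a, move_b); values are (payoff_a, payoff_b).
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # A is exploited
    ("D", "C"): (5, 0),  # A exploits B
    ("D", "D"): (1, 1),  # mutual defection
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect in every round, regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated game and return the cumulative scores (a, b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the opponent's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # (30, 30)
    print(play(always_defect, always_defect))  # (10, 10)
    print(play(tit_for_tat, always_defect))    # (9, 14)
```

Under these assumed payoffs, two cooperating agents earn 30 each over ten rounds (group total 60), while two defectors earn only 10 each (group total 20). A defector facing tit-for-tat gains 14 against the cooperator's 9, which is less than either agent earns under sustained mutual cooperation.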