Cooperation is fundamental in Multi-Agent Systems (MAS) and Multi-Agent Reinforcement Learning (MARL), where agents must often balance individual gains against collective rewards. This paper investigates strategies for inducing cooperation in game-theoretic scenarios, specifically the Iterated Prisoner's Dilemma, in which agents must optimize both individual and group outcomes. Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games. Modifications are proposed under which encouraging group rewards also yields higher individual gains, addressing real-world dilemmas seen in distributed systems. The study extends to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination are challenging. Using mean-field game theory, equilibrium solutions and reward structures are established for infinitely large agent sets in repeated games. Finally, simulations with the Multi-Agent Posthumous Credit Assignment (MA-POCA) trainer offer practical insights, and the paper explores adapting the simulation algorithms to create scenarios that favor cooperation for group rewards. These implementations connect the theoretical concepts to real-world applications.