Cooperation is fundamental in Multi-Agent Systems (MAS) and Multi-Agent Reinforcement Learning (MARL), where agents must often balance individual gains against collective rewards. This paper investigates strategies for inducing cooperation in game-theoretic scenarios, specifically the Iterated Prisoner's Dilemma, in which agents must optimize both individual and group outcomes. Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games. Modifications are proposed under which encouraging group rewards also yields higher individual gains, addressing real-world dilemmas seen in distributed systems. The study then extends to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination become challenging. Using mean-field game theory, equilibrium solutions and reward structures are established for infinitely large agent sets in repeated games. Finally, practical insights are offered through simulations using the Multi-Agent Posthumous Credit Assignment (MA-POCA) trainer, and the paper explores how simulation algorithms can be adapted to create scenarios that favor cooperation for group rewards. These implementations connect the theoretical results to real-world applications.
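For readers unfamiliar with the setting, the tension between individual and group outcomes in the Iterated Prisoner's Dilemma can be illustrated with a minimal two-player sketch. The payoff values (5, 3, 1, 0), the two strategies, and the helper names `tit_for_tat`, `always_defect`, and `play` are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# Illustrative payoffs (assumed, not from the paper): T=5 > R=3 > P=1 > S=0.
# Keys are (move_a, move_b); values are (payoff_a, payoff_b).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A strategy maps the opponent's past moves to the next move ("C" or "D").
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history: List[str]) -> str:
    """Defect in every round regardless of history."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a repeated game and return the cumulative payoffs (a, b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_b)
        move_b = b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # mutual cooperation
    print(play(always_defect, always_defect))  # mutual defection
    print(play(tit_for_tat, always_defect))    # exploited once, then mutual defection
```

Under these assumed payoffs, two cooperating players each earn more over ten rounds than two defecting players, which is the kind of alignment between group and individual reward that the abstract describes.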