Among the research topics in multi-agent learning, mixed-motive cooperation is one of the most prominent challenges, primarily due to the mismatch between individual and collective goals. Cutting-edge research focuses on incorporating domain knowledge into rewards and introducing additional mechanisms to incentivize cooperation. However, these approaches often suffer from shortcomings such as heavy manual design effort and a lack of theoretical grounding. To close this gap, we model the mixed-motive game as a differentiable game, which makes the learning dynamics toward cooperation easier to illuminate. More specifically, we introduce a novel optimization method named \textbf{\textit{A}}ltruistic \textbf{\textit{G}}radient \textbf{\textit{A}}djustment (\textbf{\textit{AgA}}), which employs gradient adjustments to progressively align individual and collective objectives. Furthermore, we theoretically prove that AgA effectively attracts gradients to stable fixed points of the collective objective while accounting for individual interests, and we validate these claims with empirical evidence. We evaluate AgA on benchmark environments for mixed-motive collaboration with small-scale agents, including the two-player public goods game and the sequential social dilemma games Cleanup and Harvest, as well as on our self-developed large-scale environment in the game StarCraft II.
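To give a concrete sense of the gradient-adjustment idea, the following is a minimal sketch of aligning individual and collective objectives in a differentiable game. The game (a hand-picked quadratic loss for each player), the blending weight `alpha`, and all function names are illustrative assumptions; this is the generic adjustment principle, not the paper's exact AgA update rule.

```python
import numpy as np

# Hypothetical two-player differentiable game: player i controls x[i] and
# minimizes an individual loss; the collective loss is the sum of both.
# Generic gradient-adjustment sketch (NOT the paper's exact AgA rule): each
# player's gradient is blended with the collective-loss gradient so that
# updates progressively align individual and collective objectives.

def individual_grads(x):
    # Illustrative mixed-motive quadratic game:
    # player 0 minimizes x0^2 + x0*x1, player 1 minimizes x1^2 - x0*x1
    g0 = 2 * x[0] + x[1]
    g1 = 2 * x[1] - x[0]
    return np.array([g0, g1])

def collective_grad(x):
    # Gradient of the summed (collective) loss: the cross terms cancel,
    # so the collective loss reduces to x0^2 + x1^2 here.
    return np.array([2 * x[0], 2 * x[1]])

def adjusted_step(x, lr=0.1, alpha=0.5):
    # Blend individual and collective gradients; alpha is an assumed
    # alignment weight (alpha=0 recovers purely selfish gradient play).
    g = (1 - alpha) * individual_grads(x) + alpha * collective_grad(x)
    return x - lr * g

x = np.array([1.0, -1.5])
for _ in range(200):
    x = adjusted_step(x)
# The iterates converge toward the collective optimum (0, 0).
```

With `alpha=0.5`, the blended update is a contraction for this game, so the joint strategy is driven to the collective objective's stable fixed point while the individual gradients still shape the trajectory.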