In multi-agent learning, mixed-motive cooperation is especially challenging because individual and collective goals inherently conflict. Current research in this domain primarily focuses on incorporating domain knowledge into rewards or introducing additional mechanisms to foster cooperation. However, many of these methods suffer from high manual design costs and lack theoretically grounded convergence guarantees. To address this gap, we model the mixed-motive game as a differentiable game and study its learning dynamics. We introduce a novel optimization method named Altruistic Gradient Adjustment (AgA) that employs gradient adjustments to align individual and collective objectives. Furthermore, we prove that selecting an appropriate alignment weight in AgA can accelerate convergence towards the desired solutions while avoiding the undesired ones. Visualizations of the learning dynamics show that AgA aligns individual and collective objectives. Additionally, evaluations on established mixed-motive benchmarks such as the public goods game, Cleanup, Harvest, and our modified mixed-motive SMAC environment validate AgA's capability to facilitate altruistic and fair collaboration.
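To make the conflict between individual and collective objectives concrete, the following minimal sketch uses the textbook linear public goods game, in which each of `n` agents contributes `c_i` in `[0, 1]` and receives `u_i = e - c_i + (r / n) * sum_j c_j`. The payoff form, the parameter values, and the blending weight `lam` are illustrative assumptions; in particular, `blended_grad` is a plain weighted sum chosen for illustration and is not the AgA update rule, which the abstract does not specify.

```python
# Illustrative sketch only: a linear public goods game with symmetric agents.
# The payoff form, parameters, and the weight `lam` are assumptions, and
# `blended_grad` is NOT the AgA update rule.

def individual_grad(n, r):
    """d u_i / d c_i for u_i = e - c_i + (r / n) * sum_j c_j."""
    return -1.0 + r / n

def collective_grad(n, r):
    """d W / d c_i for the welfare W = sum_i u_i = n * e + (r - 1) * sum_j c_j."""
    return r - 1.0

def blended_grad(n, r, lam):
    """Hypothetical blend: individual gradient plus lam times the collective one."""
    return individual_grad(n, r) + lam * collective_grad(n, r)

def simulate(n=4, r=2.0, lam=0.0, lr=0.1, steps=50, c0=0.5):
    """Projected gradient ascent on one agent's contribution, clipped to [0, 1].

    Because the game is linear, each agent's gradient does not depend on the
    other agents' contributions, so tracking one agent is enough.
    """
    c = c0
    for _ in range(steps):
        c = min(1.0, max(0.0, c + lr * blended_grad(n, r, lam)))
    return c

if __name__ == "__main__":
    # With 1 < r < n the two gradients have opposite signs.
    print("individual gradient:", individual_grad(4, 2.0))
    print("collective gradient:", collective_grad(4, 2.0))
    print("selfish learner ends at c =", simulate(lam=0.0))
    print("blended learner ends at c =", simulate(lam=1.0))
```

Whenever `1 < r < n`, the individual gradient is negative while the collective gradient is positive, so a purely self-interested learner drifts to zero contribution even though full contribution maximizes total welfare. This gap is what gradient-alignment methods aim to close.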