The majority of Multi-Agent Reinforcement Learning (MARL) literature equates the cooperation of self-interested agents in mixed environments with the problem of social welfare maximization, allowing agents to arbitrarily share rewards and private information. This results in agents that forgo their individual goals in favour of the social good, and such agents can be exploited by selfish defectors. We argue that cooperation also requires agents' identities and boundaries to be respected by ensuring that the emergent behaviour is an equilibrium, i.e., a convention from which no agent can deviate and receive a higher individual payoff. Inspired by advances in mechanism design, we propose to solve the problem of cooperation, defined as finding a socially beneficial equilibrium, by using mediators. A mediator is a benevolent entity that may act on behalf of agents, but only for the agents that agree to it. We show how a mediator can be trained alongside agents with policy gradients to maximize social welfare subject to constraints that encourage agents to cooperate through the mediator. Our experiments in matrix and iterative games highlight the potential power of applying mediators in MARL.
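To make the mediator idea concrete, the following is a minimal sketch on a one-shot Prisoner's Dilemma. It is an illustration only, not the training procedure from the abstract: the payoff values, the action label "M" (delegate to the mediator), and the hand-coded mediator rule are all assumptions made for this example, whereas the abstract describes a mediator that is learned with policy gradients. The sketch checks by enumeration which joint actions are equilibria once delegation is available.

```python
from itertools import product

# Hypothetical Prisoner's Dilemma payoffs (player 0, player 1).
# These numbers are an assumption for illustration, not taken from the paper.
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}

# "M" means "let the mediator act on my behalf"; the label is hypothetical.
ACTIONS = ("C", "D", "M")


def mediator_action(delegates):
    """Hand-coded mediator rule (an assumption, not a learned policy):
    cooperate if both agents delegate, otherwise defect for the lone delegator."""
    return "C" if all(delegates) else "D"


def mediated_payoff(a0, a1):
    """Payoffs in the augmented game where each agent may delegate."""
    delegates = (a0 == "M", a1 == "M")
    m = mediator_action(delegates)
    b0 = m if delegates[0] else a0  # mediator acts only for agents that agreed
    b1 = m if delegates[1] else a1
    return PD[(b0, b1)]


def is_nash(a0, a1):
    """True if neither agent gains by a unilateral deviation."""
    u0, u1 = mediated_payoff(a0, a1)
    no_dev0 = all(mediated_payoff(d, a1)[0] <= u0 for d in ACTIONS)
    no_dev1 = all(mediated_payoff(a0, d)[1] <= u1 for d in ACTIONS)
    return no_dev0 and no_dev1


equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(*p)]
print(equilibria)
```

Under these assumed payoffs, mutual cooperation without a mediator is not an equilibrium because each agent gains by defecting, while mutual delegation is: an agent that leaves the mediator faces a partner for whom the mediator now defects. This is the sense in which a mediator can turn the socially beneficial outcome into a convention that no agent wants to deviate from.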