Multi-agent systems (MAS) need to cope adaptively with dynamic environments, changing agent populations, and diverse tasks. However, most multi-agent systems cannot easily handle these challenges because of the complexity of the state and task spaces. Social impact theory treats the complex influencing factors as forces acting on an agent, which emanate from the environment, other agents, and the agent's intrinsic motivation and are collectively termed social forces. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To model the social forces in a non-trivial way, we further introduce a data-driven method that employs denoising score matching to learn social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interaction, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without online interaction, 2) they transfer across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they scale as the number of agents increases.
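To make the idea of learning a gradient field from offline samples more concrete, the following is a minimal sketch of denoising score matching, not the method as implemented in this work. It assumes a single attractive force whose offline samples cluster around a hypothetical goal position, and it swaps the neural score network one would normally train for a linear model fitted in closed form by least squares. All names (`fit_linear_score_dsm`, `gradient_field`, `goal`, `offline_samples`) and all numeric settings are illustrative assumptions.

```python
import numpy as np


def fit_linear_score_dsm(samples, sigma, rng):
    """Fit a linear score model s(x) = x @ W + b with denoising score matching.

    Assumption: a linear model solved by least squares stands in for the
    neural network that would normally be trained by gradient descent.
    """
    noise = rng.standard_normal(samples.shape)
    perturbed = samples + sigma * noise
    # DSM regression target: the score of the Gaussian perturbation kernel,
    # grad_{x~} log q(x~ | x) = -(x~ - x) / sigma^2 = -noise / sigma.
    target = -noise / sigma
    # Append a column of ones so the bias b is fitted jointly with W.
    design = np.hstack([perturbed, np.ones((len(perturbed), 1))])
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    return params[:-1], params[-1]


def gradient_field(W, b, positions):
    """Evaluate the learned gradient field at each row of `positions`."""
    return positions @ W + b


rng = np.random.default_rng(0)

# Hypothetical offline samples of an attractive outcome: positions near a goal.
goal = np.array([2.0, -1.0])
offline_samples = goal + 0.3 * rng.standard_normal((20000, 2))

W, b = fit_linear_score_dsm(offline_samples, sigma=0.5, rng=rng)

# The gradient at an agent's position can serve as part of its state representation.
agent_position = np.array([[0.0, 0.0]])
grad = gradient_field(W, b, agent_position)[0]
```

Under these assumptions the learned gradient at the agent's position points toward the goal, which is the sense in which such a field encodes an attractive force. A repulsive force could be sketched the same way by negating the field learned from samples of the outcome to avoid, though that choice is also an assumption rather than something stated here.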