Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.