Modern reinforcement learning (RL) algorithms can outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges: successful cooperation in mixed-motive groups of agents depends on a delicate balance between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge are similar in structure to those studied in chickens, mice, fish, and other species.