Modern reinforcement learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.