Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.