Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.