The primary focus of multi-agent reinforcement learning (MARL) has been to study interactions among a fixed number of agents embedded in an environment. However, in the real world, the number of agents is neither fixed nor known a priori. Moreover, an agent can decide to create other agents (for example, a cell may divide, or a company may spin off a division). In this paper, we propose a framework that allows agents to create other agents; we call this a fluid-agent environment. We present game-theoretic solution concepts for fluid-agent games and empirically evaluate the performance of several MARL algorithms within this framework. Our experiments include fluid variants of established benchmarks such as Predator-Prey and Level-Based Foraging, where agents can dynamically spawn, as well as a new environment we introduce that highlights how fluidity can unlock novel solution strategies beyond those observed in fixed-population settings. We demonstrate that this framework yields agent teams that adjust their size dynamically to match environmental demands.