Autotelic Reinforcement Learning in Multi-Agent Environments

In the intrinsically motivated skills acquisition problem, the agent is placed in an environment without any predefined goals and must acquire an open-ended repertoire of skills. To do so, the agent needs to be autotelic (from the Greek auto, "self", and telos, "end goal"): it must generate its own goals and learn to achieve them, driven by intrinsic motivation rather than external supervision. Autotelic agents have so far been studied in isolation, yet many applications of open-ended learning involve groups of agents. Multi-agent environments pose an additional challenge for autotelic agents: to discover and master goals that require cooperation, agents must pursue them simultaneously, but they are unlikely to do so if each samples its goals independently. In this work, we propose a new learning paradigm for modeling such settings, the Decentralized Intrinsically Motivated Skills Acquisition Problem (Dec-IMSAP), and use it to solve cooperative navigation tasks. First, we show that agents that set their goals independently fail to master the full diversity of goals. Then, we show that a sufficient condition for mastering it is that the group aligns its goals, i.e., that all agents pursue the same cooperative goal. Our empirical analysis shows that alignment enables specialization, an efficient strategy for cooperation. Finally, we introduce the Goal-coordination game, a fully decentralized emergent communication algorithm in which goal alignment emerges from the maximization of individual rewards in multi-goal cooperative environments, and show that it matches the performance of a centralized training baseline that guarantees aligned goals. To our knowledge, this is the first contribution addressing intrinsically motivated multi-agent goal exploration in a decentralized training paradigm.
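Why independent goal sampling rarely produces cooperation can be made concrete with a little arithmetic. The sketch below assumes, for illustration only, that every agent draws uniformly and independently from the same set of G cooperative goals; the function names are hypothetical and are not part of Dec-IMSAP or the Goal-coordination game. Under that assumption, N agents all pick the same goal with probability G * (1/G)^N = (1/G)^(N-1), which a short Monte Carlo simulation confirms.

```python
import random


def alignment_probability(num_goals: int, num_agents: int) -> float:
    """Exact probability that all agents sample the same goal, assuming each
    agent draws uniformly and independently from num_goals shared goals
    (an illustrative assumption, not the paper's sampling scheme)."""
    # Any of the num_goals goals can be the shared one, and every agent must
    # pick it: num_goals * (1 / num_goals) ** num_agents = (1 / num_goals) ** (num_agents - 1).
    return (1.0 / num_goals) ** (num_agents - 1)


def simulate_alignment(num_goals: int, num_agents: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability under the same assumption."""
    rng = random.Random(seed)
    aligned = 0
    for _ in range(trials):
        # Each agent independently samples a goal index in [0, num_goals).
        goals = [rng.randrange(num_goals) for _ in range(num_agents)]
        if len(set(goals)) == 1:  # every agent chose the same goal
            aligned += 1
    return aligned / trials


if __name__ == "__main__":
    for g in (2, 10, 50):
        exact = alignment_probability(g, num_agents=2)
        est = simulate_alignment(g, num_agents=2, trials=100_000)
        print(f"goals={g:3d}  exact={exact:.4f}  simulated={est:.4f}")
```

Even with only two agents, alignment happens in just 1/G of episodes, and the probability shrinks geometrically as agents are added. An aligned group, in contrast, pursues the same cooperative goal in every episode.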