Recent work has shown that intricate cooperative behaviors can emerge in agents trained with meta-reinforcement learning on open-ended task distributions using self-play. While these results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training over an open-ended distribution of tasks. In this work, we therefore investigate the emergence of collective exploration strategies in a setting where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end, we introduce a novel environment with an open-ended, procedurally generated task space that dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training, the agents learn collective exploration strategies that allow them to solve novel tasks never encountered during training. We further find that the agents' learned collective exploration strategies extend to an open-ended task setting, allowing them to solve task trees twice as deep as those seen during training. Our open-source code and videos of the agents can be found on our companion website.