Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find that it often fails to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigates the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP), where "scaling up" has produced increasingly capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for training compute-optimal IL agents. Finally, we forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by 1.5x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a variety of single-agent games and establishes the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
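The power laws described above relate IL loss to compute budget. As a minimal sketch of how such a law can be fit, the snippet below performs a least-squares fit of loss ≈ a · C^(−b) in log-log space; the (FLOPs, loss) pairs are synthetic placeholders, not the paper's measurements, and the fitted exponent is purely illustrative.

```python
import numpy as np

# Hypothetical (compute, loss) measurements -- NOT data from the paper.
flops = np.array([1e14, 1e15, 1e16, 1e17, 1e18])
loss = np.array([2.10, 1.65, 1.31, 1.05, 0.84])

# A power law loss = a * C**(-b) is linear in log-log space:
#   log(loss) = log(a) - b * log(C),
# so a degree-1 polynomial fit recovers the slope and intercept.
slope, intercept = np.polyfit(np.log(flops), np.log(loss), 1)
a = np.exp(intercept)  # prefactor
b = -slope             # positive scaling exponent

def predicted_loss(compute_flops: float) -> float:
    """Predict IL loss at a given compute budget under the fitted power law."""
    return a * compute_flops ** slope
```

Given such a fit, one can forecast the loss achievable at a larger compute budget before committing to the training run, which is the spirit of the compute-optimal forecasting described in the abstract.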