Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet many works find that it often fails to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigates the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP), where "scaling up" has resulted in increasingly capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for training compute-optimal IL agents. Finally, we forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by 1.5x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a variety of single-agent games and shows the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
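To illustrate the kind of power-law relationship the abstract refers to, the sketch below fits a line in log-log space between compute budget (FLOPs) and loss. This is a minimal sketch on synthetic data; the coefficient and exponent values are illustrative assumptions, not results from the paper.

```python
import numpy as np

# Hypothetical compute budgets (FLOPs) and IL losses that follow
# an assumed power law: loss = A * C^(-alpha), here A=5.0, alpha=0.07.
flops = np.array([1e14, 1e15, 1e16, 1e17, 1e18])
loss = 5.0 * flops ** -0.07

# A power law is linear in log-log space, so an ordinary least-squares
# fit on the logs recovers the exponent (slope) and coefficient.
slope, intercept = np.polyfit(np.log(flops), np.log(loss), 1)
alpha = -slope          # fitted scaling exponent
A = np.exp(intercept)   # fitted coefficient
```

With real training runs, the fitted curve can then be used to pick the loss-minimizing model size for a given compute budget, which is the "compute-optimal" recipe the abstract describes.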