Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet many works find that IL, while powerful, often fails to fully recover the underlying expert behavior. None of these works, however, investigates in depth the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP), where "scaling up" has produced increasingly capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find that they outperform the prior state of the art by at least 2x in all settings. Our work demonstrates both the scaling behavior of imitation learning in a challenging domain and the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
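As a rough illustration of what a power law relating compute budget to compute-optimal model size can look like in practice, the sketch below fits y = a * x^b by ordinary least squares in log-log space and extrapolates to a larger budget. It is not the authors' fitting procedure. The function names `fit_power_law` and `predict`, the FLOP budgets, and the coefficient and exponent used to generate the data are all hypothetical, chosen only to make the example self-checking.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares in log-log space; return (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("a power-law fit needs strictly positive x and y")
    # log y = log a + b * log x, so a degree-1 fit returns (slope, intercept).
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(log_a)), float(b)


def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * np.asarray(x, dtype=float) ** b


# Synthetic data for illustration only; these are NOT results from the paper.
compute = np.array([1e13, 1e14, 1e15, 1e16, 1e17])  # hypothetical FLOP budgets
optimal_params = 0.5 * compute ** 0.6  # made-up compute-optimal model sizes

a, b = fit_power_law(compute, optimal_params)
forecast = float(predict(a, b, 1e18))  # extrapolate to an unseen budget
print(f"a={a:.3g}, b={b:.3g}, forecast model size at 1e18 FLOPs: {forecast:.3g}")
```

Fitting in log-log space turns the power law into a straight line, so the exponent is just the slope. With real measurements the points would be noisy, and the extrapolated forecast should be treated with corresponding caution.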