Humans learn by interacting with their environments and perceiving the outcomes of their actions. A landmark in artificial intelligence has been the development of deep reinforcement learning (dRL) algorithms capable of doing the same in video games, on par with or better than humans. However, it remains unclear whether the successes of dRL models reflect advances in visual representation learning, the effectiveness of reinforcement learning algorithms at discovering better policies, or both. To address this question, we introduce the Learning Challenge Diagnosticator (LCD), a tool that separately measures the perceptual and reinforcement learning demands of a task. We use the LCD to discover a novel taxonomy of challenges in the Procgen benchmark, and demonstrate that these predictions are both highly reliable and able to guide algorithmic development. More broadly, the LCD reveals multiple failure cases that can occur when optimizing dRL algorithms over entire video game benchmarks like Procgen, and provides a pathway towards more efficient progress.