Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous partially-observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluate vision-language representations pretrained on natural image captioning datasets. We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance in 3D simulated environments. We also characterize why and how language provides useful abstractions for exploration by considering the impacts of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach in two very different task domains: one that stresses the identification and manipulation of everyday objects, and one that requires navigational exploration in an expansive world. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments.
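To make the idea of defining novelty over a representation more concrete, here is a minimal sketch of a nearest-neighbour novelty bonus computed in an embedding space. It is an illustration under stated assumptions, not the method evaluated in this work: the function names (`novelty_bonus`, `l2_normalize`), the choice of mean k-nearest-neighbour distance, and the constant bonus for an empty memory are all hypothetical, and the embeddings are random vectors standing in for the output of a pretrained vision-language encoder.

```python
import numpy as np


def l2_normalize(v, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    return v / (np.linalg.norm(v) + eps)


def novelty_bonus(embedding, memory, k=3):
    """Mean Euclidean distance from `embedding` to its k nearest neighbours.

    `memory` is a list of embeddings of previously visited states.
    This scoring rule is an assumption made for illustration only.
    """
    if len(memory) == 0:
        return 1.0  # arbitrary constant bonus when nothing has been seen yet
    dists = np.linalg.norm(np.stack(memory) - embedding, axis=1)
    k = min(k, len(dists))
    nearest = np.sort(dists)[:k]
    return float(nearest.mean())


# Random stand-ins for encoder outputs (no real model is called here).
rng = np.random.default_rng(0)
seen = l2_normalize(rng.normal(size=16))
similar = l2_normalize(seen + 0.01 * rng.normal(size=16))  # near-duplicate state
different = l2_normalize(rng.normal(size=16))              # unrelated state

memory = [seen]
bonus_similar = novelty_bonus(similar, memory)
bonus_different = novelty_bonus(different, memory)
```

In this sketch, a state whose embedding lies close to something already in memory receives a small bonus, while a state that is far away in the embedding space receives a larger one. The quality of the bonus therefore depends entirely on which differences the representation treats as meaningful.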