High sample complexity has long been a challenge for RL. Humans, by contrast, learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies, or of task-specific environmental dynamics and reward structures. We therefore hypothesize that the ability to use human-written instruction manuals to assist in learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework, which speeds up RL algorithms on Atari games by reading the manuals released by the Atari game developers. Our framework consists of a QA Extraction module, which extracts and summarizes relevant information from the manual, and a Reasoning module, which evaluates object-agent interactions based on that information. An auxiliary reward is then provided to a standard A2C RL agent whenever an interaction is detected. Experimentally, various RL algorithms obtain significant improvements in performance and training speed when assisted by our design.
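To make the auxiliary-reward step concrete, the following is a minimal sketch of how such a signal might be added to the environment reward. It is not the authors' implementation. It assumes that interactions are detected by bounding-box overlap, that the Reasoning module's output can be reduced to a per-object judgment of +1 (beneficial) or -1 (harmful), and that the bonus magnitude is a free parameter. The names `Box`, `overlaps`, `auxiliary_reward`, `shaped_reward`, and the object labels in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical representation of a detected object)."""
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes intersect with positive area.

    Assumption: an 'interaction' is approximated by box overlap.
    Boxes that only touch at an edge do not count.
    """
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def auxiliary_reward(agent: Box,
                     objects: Dict[str, Box],
                     judgments: Dict[str, int],
                     bonus: float = 1.0) -> float:
    """Sum a signed bonus over every object the agent currently overlaps.

    `judgments` stands in for the Reasoning module's output (assumed format):
    +1 if the manual suggests the interaction is beneficial, -1 if harmful.
    Objects without a judgment contribute nothing.
    """
    total = 0.0
    for name, box in objects.items():
        if name in judgments and overlaps(agent, box):
            total += bonus * judgments[name]
    return total


def shaped_reward(env_reward: float,
                  agent: Box,
                  objects: Dict[str, Box],
                  judgments: Dict[str, int],
                  bonus: float = 1.0) -> float:
    """Environment reward plus the auxiliary reward passed to the RL agent."""
    return env_reward + auxiliary_reward(agent, objects, judgments, bonus)
```

As a usage example under the same assumptions, with `judgments = {"flag": 1, "tree": -1}`, an agent overlapping the hypothetical `"flag"` object receives the environment reward plus `bonus`, while an agent overlapping `"tree"` receives the environment reward minus `bonus`. The RL agent itself is unchanged; it only sees the shaped scalar.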