A longstanding goal in reinforcement learning is to build intelligent agents that, like humans and animals, learn quickly and transfer skills flexibly. This paper investigates the integration of two frameworks that address these goals: episodic control and successor features. Episodic control is a cognitively inspired approach that relies on episodic memory, an instance-based model of an agent's experiences. Successor features and generalized policy improvement (SF&GPI), in turn, form a meta- and transfer-learning framework that lets an agent learn policies for some tasks and efficiently reuse them on later tasks with different reward functions. Individually, these two techniques have produced strong results, substantially improving sample efficiency and enabling the reuse of previously learned policies. We therefore outline a combination of both approaches in a single reinforcement learning framework and empirically illustrate its benefits.
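The two building blocks named above can be illustrated with a short Python sketch. It is not the combined method proposed in this paper. It shows simplified textbook versions of each component under stated assumptions. The first is a tabular episodic memory that keeps the best discounted return observed for each (state, action) pair, in the style of model-free episodic control; exact-match lookup only, with no nearest-neighbour estimate for unseen states. The second is GPI over successor features, assuming rewards are linear in features, r = φ·w, so that Q^{π_i}(s, a) = ψ^{π_i}(s, a)·w. All names (`EpisodicMemory`, `discounted_returns`, `gpi_action`) and array layouts are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step of an episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


class EpisodicMemory:
    """Hypothetical tabular episodic memory: stores, for each (state, action),
    the highest return seen so far (simplified model-free episodic control)."""

    def __init__(self):
        self.table = {}

    def update(self, state, action, ret):
        key = (state, action)
        # Keep the maximum return ever observed after taking `action` in `state`.
        self.table[key] = max(self.table.get(key, -np.inf), ret)

    def value(self, state, action, default=0.0):
        # Exact-match lookup; unseen pairs fall back to `default`.
        return self.table.get((state, action), default)


def gpi_action(psi, w, state):
    """Generalized policy improvement with successor features (assumed layout).

    psi: shape (n_policies, n_states, n_actions, d), successor features
         psi^{pi_i}(s, a) of previously learned policies.
    w:   shape (d,), reward weights of the new task (r = phi . w assumed).
    """
    # Q^{pi_i}(s, a) = psi^{pi_i}(s, a) . w for every stored policy i.
    q = psi[:, state] @ w  # shape (n_policies, n_actions)
    # GPI: act greedily with respect to the maximum over all stored policies.
    return int(np.argmax(q.max(axis=0)))
```

For example, with two stored policies in a single state, where policy 0's successor features are `[1, 0]` for action 0 and policy 1's are `[0, 2]` for action 1, the new task `w = [1, 1]` yields action 1, while `w = [3, 1]` yields action 0. The memory design reflects the optimistic "best return seen" update that makes episodic control sample-efficient, whereas GPI needs only a new weight vector `w`, not new learning, to act on a new reward function.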