The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings, which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
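For readers unfamiliar with SF&GPI, the following is a minimal sketch of the standard action-selection rule, not of SFK or CSFA themselves. It assumes the usual formulation in which each known policy i has successor features ψ_i(s, a), a task is encoded by a vector w, and action values are the dot product ψ_i(s, a) · w. The function name `gpi_action`, the array shapes, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gpi_action(psi, w):
    """Pick an action by Generalized Policy Improvement over known policies.

    psi: array of shape (n_policies, n_actions, d), the successor features
         of each known policy at the current state (assumed given).
    w:   array of shape (d,), the task encoding (assumed given).
    """
    # Q-value of every (policy, action) pair on the task encoded by w.
    q = psi @ w                    # shape: (n_policies, n_actions)
    # For each action, keep the value promised by the best known policy.
    q_best = q.max(axis=0)         # shape: (n_actions,)
    # Act greedily with respect to that maximum.
    return int(q_best.argmax()), q


# Toy example: two known policies, two actions, two state-features.
psi = np.array([
    [[1.0, 0.0], [0.2, 0.1]],      # successor features under policy 0
    [[0.0, 0.3], [0.1, 1.0]],      # successor features under policy 1
])
w = np.array([0.5, 1.0])           # hypothetical task encoding

action, q = gpi_action(psi, w)
print(action)                      # index of the selected action
print(q)                           # per-policy, per-action values
```

In this sketch both `psi` and `w` are supplied by hand. The setting described above differs precisely in that the state-features underlying `psi` and the task encoding `w` are discovered rather than hand-designed.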