Although reinforcement learning (RL) can solve many challenging sequential decision-making problems, achieving zero-shot transfer across related tasks remains difficult. The core problem is finding a good representation of the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm that represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent obtains a coherent vector representation that captures how the current task relates to previously seen tasks. The agent can therefore transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
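To make the idea of "a weighted combination of learned, non-linear basis functions" concrete, here is a minimal sketch in plain Python. It makes several assumptions that the abstract does not state. The basis functions are fixed stand-ins (`math.sin`, `math.cos`, `math.tanh`) for what would be learned neural networks. The coefficients are computed by ordinary least squares over sampled input/output pairs, which is one plausible choice rather than necessarily the authors' procedure. The names `BASIS`, `encode`, `decode`, and `solve` are hypothetical. The coefficient vector returned by `encode` plays the role of the task representation: obtaining it for a new function requires only data and a small linear solve, not any gradient training.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-ins for learned, non-linear basis functions.
# In a trained function encoder these would be neural networks; here they
# are fixed so the example runs without any training (assumption).
BASIS: List[Callable[[float], float]] = [math.sin, math.cos, math.tanh]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def encode(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Return coefficients c minimising sum_j (y_j - sum_i c_i * g_i(x_j))^2.

    The coefficient vector is the (assumed) task representation.
    """
    k = len(BASIS)
    phi = [[g(x) for g in BASIS] for x in xs]  # design matrix, one row per sample
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve(gram, rhs)  # normal equations: (Phi^T Phi) c = Phi^T y


def decode(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the represented function as a weighted sum of basis functions."""
    return sum(c * g(x) for c, g in zip(coeffs, BASIS))


if __name__ == "__main__":
    # A "new task" whose function lies in the span of the basis.
    xs = [-2.0 + 0.1 * j for j in range(41)]
    target = lambda x: 2.0 * math.sin(x) - 0.5 * math.tanh(x)
    coeffs = encode(xs, [target(x) for x in xs])
    print(coeffs)  # coefficients close to [2.0, 0.0, -0.5]
    print(decode(coeffs, 0.3), target(0.3))  # the two values agree closely
```

In an RL setting under these assumptions, the same `encode` call would be applied to observed rewards or transitions from the current task, and the resulting coefficient vector would be passed to the policy alongside the state. Because a related task only changes the coefficients, not the basis, the agent can in principle adapt at run time without retraining.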