Deep reinforcement learning has made significant progress in extracting useful representations from high-dimensional inputs, albeit with the help of hand-crafted auxiliary tasks and pseudo-rewards. Automatically learning such representations in an object-centric manner, geared towards control and fast adaptation, remains an open research problem. In this paper, we introduce a method that attempts to discover meaningful features from objects, translates them into temporally coherent "question" functions, and leverages the resulting learned general value functions for control. We compare our approach with state-of-the-art techniques and several ablations, and show competitive performance in both stationary and non-stationary settings. Finally, we investigate the discovered general value functions and show through qualitative analysis that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, which facilitates fast adaptation.