Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we present a method based on singular value decomposition (SVD) that yields representations preserving the underlying transition structure of the domain. Interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing a pseudo-count estimate at no extra cost. To scale this decomposition to large domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and permits mini-batch training. Further, drawing inspiration from predictive state representations, we extend the decomposition method to partially observable environments. In experiments on multi-task settings with partially observable domains, we show that the proposed method not only learns useful representations on DM-Lab-30 environments (whose inputs include language instructions, pixel images, and rewards, among others) but is also effective on hard exploration tasks in DM-Hard-8 environments.
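As a toy illustration of the basic idea of decomposing a transition matrix, the following sketch factorizes a small, explicitly constructed tabular matrix with a truncated SVD and reads off low-dimensional state representations. It is not the scalable algorithm described above, which by design never builds the transition matrix, and it does not cover the pseudo-count property or the partially observable extension. The random-walk chain, the rank `k`, the function names, and the choice to split the singular values evenly between the two factors are all assumptions made for illustration.

```python
import numpy as np


def random_walk_transition_matrix(n_states: int) -> np.ndarray:
    """Row-stochastic matrix for a random walk on a chain (hypothetical toy domain).

    From each state the walker moves left or right with probability 0.5;
    at the two ends, the move that would leave the chain keeps it in place.
    """
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        left = max(s - 1, 0)
        right = min(s + 1, n_states - 1)
        P[s, left] += 0.5
        P[s, right] += 0.5
    return P


def svd_representations(P: np.ndarray, k: int):
    """Rank-k factorization P ~= phi @ psi.T obtained from the SVD of P.

    Returns phi (one k-dimensional row per current state), psi (one
    k-dimensional row per next state), and the full vector of singular values.
    Splitting sqrt(S) across both factors is an illustrative assumption.
    """
    U, S, Vt = np.linalg.svd(P, full_matrices=False)
    scale = np.sqrt(S[:k])
    phi = U[:, :k] * scale       # representations of current states
    psi = Vt[:k, :].T * scale    # representations of next states
    return phi, psi, S


P = random_walk_transition_matrix(6)
phi, psi, S = svd_representations(P, k=3)

# Frobenius-norm error of the rank-3 reconstruction of P.
P_hat = phi @ psi.T
err = np.linalg.norm(P - P_hat)
print("representation shape:", phi.shape)
print("rank-3 reconstruction error:", err)
```

In this sketch, each row of `phi` is a 3-dimensional representation of a state, and the reconstruction error shows how much of the transition structure the truncated factors retain.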