Representation learning lies at the heart of deep learning's empirical success in dealing with the curse of dimensionality. However, its power has not yet been fully exploited in reinforcement learning (RL), owing to (i) the trade-off between expressiveness and tractability and (ii) the coupling between exploration and representation learning. In this paper, we first show that, under certain noise assumptions in the stochastic control model, the linear spectral features of the corresponding Markov transition operator can be obtained in closed form for free. Based on this observation, we propose Spectral Dynamics Embedding (SPEDE), which breaks this trade-off and achieves optimistic exploration for representation learning by exploiting the structure of the noise. We provide a rigorous theoretical analysis of SPEDE and demonstrate its superior practical performance over existing state-of-the-art empirical algorithms on several benchmarks.
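The abstract does not spell out which noise assumption yields the closed-form spectral features. The sketch below assumes one common instance, additive isotropic Gaussian noise, s' = f(s, a) + ε with ε ~ N(0, σ²I); this is an illustrative assumption, not a statement of SPEDE's exact setting. Under it, the transition density is a scaled Gaussian kernel between f(s, a) and s', and random Fourier features factor that kernel into an inner product ⟨φ(f(s, a)), μ(s')⟩, so a linear feature of (s, a) comes directly from the dynamics without a separate learning step. The dynamics function `toy_dynamics` and all numeric values are hypothetical.

```python
import numpy as np


def gaussian_transition_density(s_next, mean, sigma):
    """Exact density of s' ~ N(mean, sigma^2 I) (assumed Gaussian noise model)."""
    d = s_next.shape[-1]
    sq_dist = np.sum((s_next - mean) ** 2, axis=-1)
    return (2 * np.pi * sigma**2) ** (-d / 2) * np.exp(-sq_dist / (2 * sigma**2))


def random_fourier_features(x, W, b):
    """phi(x) = sqrt(2/m) cos(W x + b); phi(x) . phi(y) approximates exp(-|x-y|^2 / (2 sigma^2))
    when the rows of W are drawn from N(0, sigma^-2 I) and b from Uniform[0, 2*pi]."""
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(x @ W.T + b)


def spectral_density(s_next, mean, sigma, num_features=20000, seed=0):
    """Approximate p(s' | s, a) as <phi(f(s, a)), mu(s')> using random Fourier features."""
    rng = np.random.default_rng(seed)
    d = mean.shape[-1]
    W = rng.normal(scale=1.0 / sigma, size=(num_features, d))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)
    # The (s, a) feature depends on (s, a) only through f(s, a).
    phi = random_fourier_features(mean, W, b)
    # The s' feature absorbs the Gaussian normalizing constant.
    mu = (2 * np.pi * sigma**2) ** (-d / 2) * random_fourier_features(s_next, W, b)
    return phi @ mu


def toy_dynamics(s, a):
    """Hypothetical deterministic part f(s, a) of the control model, for illustration only."""
    return 0.9 * s + 0.1 * a


if __name__ == "__main__":
    s, a = np.array([0.2, -0.1]), np.array([1.0, 0.5])
    mean = toy_dynamics(s, a)
    s_next = mean + np.array([0.1, -0.1])
    print(gaussian_transition_density(s_next, mean, 0.5), spectral_density(s_next, mean, 0.5))
```

The approximation error shrinks as `num_features` grows (Monte Carlo rate); the point of the sketch is only that, under the Gaussian assumption, the factorization is available in closed form once f is known.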