Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because they are constructed for state-only aggregation and derived from a policy-dependent transition kernel, without considering the issue of exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. In addition, an experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
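To make the idea of a spectral state-action representation concrete, the following minimal sketch assumes a toy low-rank factorization of the transition kernel, P(s' | s, a) = sum_k phi(s, a)[k] * mu[k][s']. The feature table `phi`, the factor table `mu`, the value vector `V`, and all numbers are hypothetical and chosen only for illustration; this is not the SPEDER algorithm itself. It shows why such a factorization is useful: the expected next-state value becomes linear in the state-action features.

```python
# Toy low-rank MDP with 2 states, 2 actions, and 2 latent factors.
# All tables below are illustrative assumptions, not values from the paper.

STATES = [0, 1]
ACTIONS = [0, 1]
NUM_FACTORS = 2

# phi[(s, a)] is a hypothetical state-action feature vector.
# Each vector is a distribution over latent factors, so the induced
# transition kernel is a valid probability distribution.
phi = {
    (0, 0): [1.0, 0.0],
    (0, 1): [0.5, 0.5],
    (1, 0): [0.0, 1.0],
    (1, 1): [0.25, 0.75],
}

# mu[k][s_next] is a hypothetical next-state distribution for latent factor k.
mu = [
    [0.9, 0.1],
    [0.2, 0.8],
]


def transition(s, a):
    """Return P(. | s, a) assembled from the factorization phi(s, a)^T mu."""
    return [
        sum(phi[(s, a)][k] * mu[k][s_next] for k in range(NUM_FACTORS))
        for s_next in STATES
    ]


def expected_next_value_direct(s, a, values):
    """Compute E[V(s') | s, a] by summing over next states."""
    probs = transition(s, a)
    return sum(probs[s_next] * values[s_next] for s_next in STATES)


def expected_next_value_linear(s, a, values):
    """Compute the same quantity as an inner product phi(s, a)^T w."""
    # w[k] = sum_{s'} mu[k][s'] * V(s') depends on V but not on (s, a).
    w = [
        sum(mu[k][s_next] * values[s_next] for s_next in STATES)
        for k in range(NUM_FACTORS)
    ]
    return sum(phi[(s, a)][k] * w[k] for k in range(NUM_FACTORS))


V = [1.0, 3.0]  # an arbitrary value function used for the check

for s in STATES:
    for a in ACTIONS:
        direct = expected_next_value_direct(s, a, V)
        linear = expected_next_value_linear(s, a, V)
        print((s, a), round(direct, 6), round(linear, 6))
```

Both columns printed for each state-action pair agree, which illustrates that under this assumed factorization any value backup can be expressed as a linear function of phi(s, a). In practice the features are not given and must be learned from data, which is the problem a spectral method addresses.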